Date: Thu, 3 Apr 2025 09:21:50 -0700
From: Kees Cook
To: Michal Hocko
Cc: Shakeel Butt, Dave Chinner, Yafang Shao, Harry Yoo, joel.granados@kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Josef Bacik, linux-mm@kvack.org, Vlastimil Babka
Subject: Re: [PATCH] mm: kvmalloc: make kmalloc fast path real fast path
Message-ID: <202504030920.EB65CCA2@keescook>
References: <20250401073046.51121-1-laoar.shao@gmail.com> <3315D21B-0772-4312-BCFB-402F408B0EF6@kernel.org>

On Thu, Apr 03, 2025 at 09:43:39AM +0200,
Michal Hocko wrote:
> There are users like xfs which need larger allocations with NOFAIL
> sementic. They are not using kvmalloc currently because the current
> implementation tries too hard to allocate through the kmalloc path
> which causes a lot of direct reclaim and compaction and that hurts
> performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
> CIL shadow buffers") for more details).
>
> kvmalloc does support __GFP_RETRY_MAYFAIL semantic to express that
> kmalloc (physically contiguous) allocation is preferred and we should go
> more aggressive to make it happen. There is currently no way to express
> that kmalloc should be very lightweight and as it has been argued [1]
> this mode should be default to support kvmalloc(NOFAIL) with a
> lightweight kmalloc path which is currently impossible to express as
> __GFP_NOFAIL cannot be combined by any other reclaim modifiers.
>
> This patch makes all kmalloc allocations GFP_NOWAIT unless
> __GFP_RETRY_MAYFAIL is provided to kvmalloc. This allows to support both
> fail fast and retry hard on physically contiguous memory with vmalloc
> fallback.
>
> There is a potential downside that relatively small allocations (smaller
> than PAGE_ALLOC_COSTLY_ORDER) could fallback to vmalloc too easily and
> cause page block fragmentation. We cannot really rule that out but it
> seems that xlog_cil_kvmalloc use doesn't indicate this to be happening.
>
> [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@dread.disaster.area/T/#u
> Signed-off-by: Michal Hocko

Thanks for finding a solution for this! It makes way more sense to me to
kick over to vmap by default for kvmalloc users.
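For anyone skimming the thread, the flag handling described in the
changelog above can be modelled in userspace roughly as below. This is
only a sketch of my reading of the description: model_gfp_t, the MODEL_*
bits and the model_* function names are made up for illustration and are
not the kernel's definitions or values, and the real kmalloc_gfp_adjust()
in mm/slub.c also takes a size argument that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; NOT the kernel's real values. */
typedef unsigned int model_gfp_t;
#define MODEL_DIRECT_RECLAIM  0x1u  /* caller may reclaim/compact synchronously */
#define MODEL_KSWAPD_RECLAIM  0x2u  /* caller may wake background reclaim */
#define MODEL_RETRY_MAYFAIL   0x4u  /* caller prefers physically contiguous memory */
#define MODEL_KERNEL  (MODEL_DIRECT_RECLAIM | MODEL_KSWAPD_RECLAIM)

/* Flags used for the first (kmalloc) attempt, before any vmalloc fallback. */
static model_gfp_t model_kmalloc_attempt_flags(model_gfp_t flags)
{
	/* Without RETRY_MAYFAIL, drop direct reclaim so the attempt fails fast. */
	if (!(flags & MODEL_RETRY_MAYFAIL))
		flags &= ~MODEL_DIRECT_RECLAIM;
	return flags;
}

/* True if the kmalloc attempt is still allowed to reclaim synchronously. */
static bool model_may_direct_reclaim(model_gfp_t flags)
{
	return (model_kmalloc_attempt_flags(flags) & MODEL_DIRECT_RECLAIM) != 0;
}

/* Walk through the two cases the changelog distinguishes. */
static void model_demo(void)
{
	/* Default request: fail fast, but background reclaim is still woken. */
	assert(!model_may_direct_reclaim(MODEL_KERNEL));
	assert(model_kmalloc_attempt_flags(MODEL_KERNEL) & MODEL_KSWAPD_RECLAIM);

	/* Caller asked to retry hard: direct reclaim stays enabled. */
	assert(model_may_direct_reclaim(MODEL_KERNEL | MODEL_RETRY_MAYFAIL));
}
```

The point of the model is just that the default request keeps the
background-reclaim bit and loses only the synchronous one, which is what
makes the early kick over to the vmalloc fallback cheap.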
> ---
>  mm/slub.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b46f87662e71..2da40c2f6478 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
>  	 * We want to attempt a large physically contiguous block first because
>  	 * it is less likely to fragment multiple larger blocks and therefore
>  	 * contribute to a long term fragmentation less than vmalloc fallback.
> -	 * However make sure that larger requests are not too disruptive - no
> -	 * OOM killer and no allocation failure warnings as we have a fallback.
> +	 * However make sure that larger requests are not too disruptive - i.e.
> +	 * do not direct reclaim unless physically continuous memory is preferred
> +	 * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
> +	 * working in the background but the allocation itself.

I think a word is missing here? "...but do the allocation..." or
"...allocation itself happens" ?

-- 
Kees Cook