Date: Thu, 3 Apr 2025 10:24:56 +0200
Subject: Re: [PATCH] mm: kvmalloc: make kmalloc fast path real fast path
To: Michal Hocko, Shakeel Butt
Cc: Dave Chinner, Yafang Shao, Harry Yoo, Kees Cook,
 joel.granados@kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Josef Bacik, linux-mm@kvack.org
References: <20250401073046.51121-1-laoar.shao@gmail.com>
 <3315D21B-0772-4312-BCFB-402F408B0EF6@kernel.org>
From: Vlastimil Babka

On 4/3/25 09:43, Michal Hocko wrote:
> There are users like xfs which need larger allocations with NOFAIL
> semantic. They are not using kvmalloc currently because the current
> implementation tries too hard to allocate through the kmalloc path
> which causes a lot of direct reclaim and compaction and that hurts
> performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
> CIL shadow buffers") for more details).
>
> kvmalloc does support __GFP_RETRY_MAYFAIL semantic to express that
> kmalloc (physically contiguous) allocation is preferred and we should go
> more aggressive to make it happen. There is currently no way to express
> that kmalloc should be very lightweight and as it has been argued [1]
> this mode should be default to support kvmalloc(NOFAIL) with a
> lightweight kmalloc path which is currently impossible to express as
> __GFP_NOFAIL cannot be combined with any other reclaim modifiers.
>
> This patch makes all kmalloc allocations GFP_NOWAIT unless
> __GFP_RETRY_MAYFAIL is provided to kvmalloc. This allows supporting both
> fail fast and retry hard on physically contiguous memory with vmalloc
> fallback.
>
> There is a potential downside that relatively small allocations (smaller
> than PAGE_ALLOC_COSTLY_ORDER) could fall back to vmalloc too easily and
> cause page block fragmentation.
> We cannot really rule that out but it
> seems that xlog_cil_kvmalloc use doesn't indicate this to be happening.
>
> [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@dread.disaster.area/T/#u
> Signed-off-by: Michal Hocko

Looks like a step in the right direction, but is that enough?

- to replace xlog_kvmalloc(), we need to deal with kvmalloc() passing
  VM_ALLOW_HUGE_VMAP, so we don't end up with GFP_KERNEL huge allocation
  anyway (in practice maybe it wouldn't happen because "size >= PMD_SIZE"
  required for the huge vmalloc is never true for current xlog_kvmalloc()
  users but dunno if we can rely on that).

  Maybe it's a bad idea to use VM_ALLOW_HUGE_VMAP in kvmalloc() anyway?
  Since we're in a vmalloc fallback which means the huge allocations
  failed anyway for the kmalloc() part. Maybe there's some grey area where
  it makes sense, with size much larger than PMD_SIZE, e.g. exceeding
  MAX_PAGE_ORDER where we can't kmalloc() anyway so at least try to
  assemble the allocation from huge vmalloc. Maybe tie it to such a size
  check, or require __GFP_RETRY_MAYFAIL to activate VM_ALLOW_HUGE_VMAP?

- we're still not addressing the original issue of high kcompactd
  activity, but maybe the answer is that it needs to be investigated more
  (why deferred compaction doesn't limit it) instead of trying to suppress
  it from kvmalloc()

> ---
>  mm/slub.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b46f87662e71..2da40c2f6478 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
>  	 * We want to attempt a large physically contiguous block first because
>  	 * it is less likely to fragment multiple larger blocks and therefore
>  	 * contribute to a long term fragmentation less than vmalloc fallback.
> -	 * However make sure that larger requests are not too disruptive - no
> -	 * OOM killer and no allocation failure warnings as we have a fallback.
> +	 * However make sure that larger requests are not too disruptive - i.e.
> +	 * do not direct reclaim unless physically continuous memory is preferred
> +	 * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
> +	 * working in the background but the allocation itself.
>  	 */
>  	if (size > PAGE_SIZE) {
>  		flags |= __GFP_NOWARN;
>
>  		if (!(flags & __GFP_RETRY_MAYFAIL))
> -			flags |= __GFP_NORETRY;
> +			flags &= ~__GFP_DIRECT_RECLAIM;
>
>  		/* nofail semantic is implemented by the vmalloc fallback */
>  		flags &= ~__GFP_NOFAIL;
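
For readers following along, the effect of the quoted hunk on the flags
can be sketched in plain userspace C. This is only an illustration
modelled on the hunk above, not the kernel code: the DEMO_* bit values,
demo_gfp_t, DEMO_PAGE_SIZE and the demo_*() function names are all
made-up stand-ins, and only the relationships between the flags are
meant to match.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in flag bits for illustration only. These are NOT the kernel's
 * real __GFP_* values; only the relationships between them matter here.
 */
typedef unsigned int demo_gfp_t;

#define DEMO_DIRECT_RECLAIM	0x01u	/* caller may reclaim/compact itself */
#define DEMO_KSWAPD_RECLAIM	0x02u	/* may wake background reclaim */
#define DEMO_IO			0x04u
#define DEMO_FS			0x08u
#define DEMO_NOWARN		0x10u
#define DEMO_NORETRY		0x20u
#define DEMO_RETRY_MAYFAIL	0x40u
#define DEMO_NOFAIL		0x80u

#define DEMO_GFP_KERNEL \
	(DEMO_DIRECT_RECLAIM | DEMO_KSWAPD_RECLAIM | DEMO_IO | DEMO_FS)

#define DEMO_PAGE_SIZE		((size_t)4096)	/* assumed page size */

/* Behaviour before the patch: keep direct reclaim, but give up early. */
demo_gfp_t demo_adjust_old(demo_gfp_t flags, size_t size)
{
	if (size > DEMO_PAGE_SIZE) {
		flags |= DEMO_NOWARN;
		if (!(flags & DEMO_RETRY_MAYFAIL))
			flags |= DEMO_NORETRY;
		/* nofail is left to the vmalloc fallback */
		flags &= ~DEMO_NOFAIL;
	}
	return flags;
}

/* Behaviour after the patch: no direct reclaim unless RETRY_MAYFAIL. */
demo_gfp_t demo_adjust_new(demo_gfp_t flags, size_t size)
{
	if (size > DEMO_PAGE_SIZE) {
		flags |= DEMO_NOWARN;
		if (!(flags & DEMO_RETRY_MAYFAIL))
			flags &= ~DEMO_DIRECT_RECLAIM;
		/* nofail is left to the vmalloc fallback */
		flags &= ~DEMO_NOFAIL;
	}
	return flags;
}

/* Small self-check of the two variants for a two-page request. */
void demo_check(void)
{
	size_t big = 2 * DEMO_PAGE_SIZE;
	demo_gfp_t old_flags = demo_adjust_old(DEMO_GFP_KERNEL, big);
	demo_gfp_t new_flags = demo_adjust_new(DEMO_GFP_KERNEL, big);

	/* old: still enters direct reclaim, only with NORETRY set */
	assert(old_flags & DEMO_DIRECT_RECLAIM);
	assert(old_flags & DEMO_NORETRY);

	/* new: direct reclaim is gone, background reclaim is still woken */
	assert(!(new_flags & DEMO_DIRECT_RECLAIM));
	assert(new_flags & DEMO_KSWAPD_RECLAIM);
}
```

In this model a request with DEMO_RETRY_MAYFAIL keeps direct reclaim in
both variants, and requests of at most one page are returned unchanged,
which mirrors the "size > PAGE_SIZE" check in the hunk. The sketch says
nothing about VM_ALLOW_HUGE_VMAP or about what the vmalloc fallback does
with the caller's original flags.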