Message-ID: <9863d6b8-cb6d-4555-b35e-38d495f3afbd@suse.cz>
Date: Thu, 18 Apr 2024 12:01:59 +0200
From: Vlastimil Babka
To: Jianfeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, junxiao.bi@oracle.com
Subject: Re: [PATCH v2 1/1] slub: limit number of slabs to scan in count_partial()
In-Reply-To: <20240417185938.5237-2-jianfeng.w.wang@oracle.com>
References: <20240417185938.5237-1-jianfeng.w.wang@oracle.com> <20240417185938.5237-2-jianfeng.w.wang@oracle.com>

On 4/17/24 20:59, Jianfeng Wang wrote:
> When reading "/proc/slabinfo", the kernel needs to report the number
> of free objects for each kmem_cache. The current implementation uses
> count_partial() to count the number of free objects by scanning each

Hi, thanks. I wanted to apply this patch, but then I realized that besides
slabinfo, we use the same function for sysfs and slab_out_of_memory(), and
it's not always counting free objects. Someone debugging with sysfs may
expect exact counts and be willing to pay the price, but we probably don't
want slab_out_of_memory() to be slow and precise, so that one is more like
slabinfo.

So what I propose is to create a new variant of count_partial(), called e.g.
count_partial_free_approx(), which has no get_count parameter but hardcodes
what count_free() does. Then use this new function only for slabinfo and
slab_out_of_memory(), leaving the other count_partial() users unchanged.

Another benefit is that we remove the overhead of the indirect get_count()
call, which may be nontrivial with the current CPU vulnerability mitigations,
so it's good to avoid it for slabinfo and OOM reports.

Thanks!

> kmem_cache_node's list of partial slabs and summing free objects
> from every partial slab in the list. This process must hold the per
> kmem_cache_node spinlock with IRQs disabled, and may take a long time.
> Consequently, it can block slab allocations on other CPU cores and
> cause timeouts for network devices and so on when the partial list
> is long. In production, even the NMI watchdog can be triggered by
> this: e.g., for "buffer_head", the number of partial slabs was
> observed to be ~1M in one kmem_cache_node. This problem was also
> confirmed by several others [1-3].
>
> Iterating a partial list to get the exact count of objects can cause
> soft lockups for a long list with or without the lock (e.g., if
> preemption is disabled), and is not very useful either: the object
> count can change right after the lock is released. The approach of
> maintaining free-object counters requires atomic operations on the
> fast path [3].
>
> So, the fix is to limit the number of slabs to scan in count_partial().
> Suppose the limit is N. If the list's length is not greater than N,
> output the exact count by traversing the whole list; if its length is
> greater than N, output an approximate count by traversing a subset
> of the list. The proposed method is to scan N/2 slabs from the list's
> head and the other N/2 slabs from the tail. For a partial list with
> ~280K slabs, benchmarks show that this approach performs better than
> just counting from the list's head after slabs get sorted by
> kmem_cache_shrink(). Default the limit to 10000, as it produces an
> approximation within 1% of the exact count for both scenarios.
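The head+tail sampling described above can be modeled outside the kernel as a small C sketch. Everything in it is a hypothetical stand-in, not the patch's code: an array of per-slab free counts replaces the struct slab partial list, approx_free() and mult_frac_ul() are illustrative names, total_objs plays the role of node_nr_objs(), and the head_and_tail flag exists only to compare against head-only counting.

```c
#include <stdbool.h>

/* Overflow-safe x * n / d, modeled on the kernel's mult_frac() macro. */
static unsigned long mult_frac_ul(unsigned long x, unsigned long n,
				  unsigned long d)
{
	unsigned long q = x / d, r = x % d;

	return q * n + r * n / d;
}

/*
 * Hypothetical model: estimate the free objects on a partial list of 'nr'
 * slabs from their per-slab free counts (in list order). Scans at most
 * 'limit' entries (assumes limit > 0): half from the head and the rest from
 * the tail when head_and_tail is set, otherwise all from the head. The sum
 * is then extrapolated to 'nr' slabs and capped at 'total_objs', which
 * stands in for node_nr_objs().
 */
static unsigned long approx_free(const unsigned int *free_objs,
				 unsigned long nr, unsigned long limit,
				 unsigned long total_objs, bool head_and_tail)
{
	unsigned long x = 0, scanned = 0, i;

	/* Short list: exact count, as in the patch. */
	if (nr <= limit) {
		for (i = 0; i < nr; i++)
			x += free_objs[i];
		return x;
	}

	unsigned long head = head_and_tail ? limit / 2 : limit;

	for (i = 0; i < head; i++, scanned++)
		x += free_objs[i];
	/* Walk back from the tail until 'limit' slabs have been scanned. */
	for (i = nr; scanned < limit; scanned++)
		x += free_objs[--i];

	x = mult_frac_ul(x, nr, scanned);
	return x < total_objs ? x : total_objs;
}
```

As a usage example, with 20 slabs whose free counts are 0, 1, ..., 19 (sorted ascending, loosely imitating the skew the commit message attributes to kmem_cache_shrink()) and a limit of 4, head-only counting estimates 30 while head+tail recovers the exact 190. That is only because a linear ramp is symmetric about its midpoint; real partial lists are not, which is consistent with the residual error in the benchmarks.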
>
> Benchmarks: Diff = (exact - approximated) / exact
> * Normal case (w/o kmem_cache_shrink()):
> | MAX_TO_SCAN | Diff (count from head) | Diff (count head+tail) |
> | 1000        | 0.43 %                 | 1.09 %                 |
> | 5000        | 0.06 %                 | 0.37 %                 |
> | 10000       | 0.02 %                 | 0.16 %                 |
> | 20000       | 0.009 %                | -0.003 %               |
>
> * Skewed case (w/ kmem_cache_shrink()):
> | MAX_TO_SCAN | Diff (count from head) | Diff (count head+tail) |
> | 1000        | 12.46 %                | 6.75 %                 |
> | 5000        | 5.38 %                 | 1.27 %                 |
> | 10000       | 4.99 %                 | 0.22 %                 |
> | 20000       | 4.86 %                 | -0.06 %                |
>
> [1] https://lore.kernel.org/linux-mm/alpine.DEB.2.21.2003031602460.1537@www.lameter.com/T/
> [2] https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2008071258020.55871@www.lameter.com/T/
> [3] https://lore.kernel.org/lkml/1e01092b-140d-2bab-aeba-321a74a194ee@linux.com/T/
>
> Signed-off-by: Jianfeng Wang
> ---
>  mm/slub.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 1bb2a93cf7b6..7e34f2f0ba85 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3213,6 +3213,8 @@ static inline bool free_debug_processing(struct kmem_cache *s,
>  #endif /* CONFIG_SLUB_DEBUG */
>
>  #if defined(CONFIG_SLUB_DEBUG) || defined(SLAB_SUPPORTS_SYSFS)
> +#define MAX_PARTIAL_TO_SCAN 10000
> +
>  static unsigned long count_partial(struct kmem_cache_node *n,
>  					int (*get_count)(struct slab *))
>  {
> @@ -3221,8 +3223,30 @@ static unsigned long count_partial(struct kmem_cache_node *n,
>  	struct slab *slab;
>
>  	spin_lock_irqsave(&n->list_lock, flags);
> -	list_for_each_entry(slab, &n->partial, slab_list)
> -		x += get_count(slab);
> +	if (n->nr_partial <= MAX_PARTIAL_TO_SCAN) {
> +		list_for_each_entry(slab, &n->partial, slab_list)
> +			x += get_count(slab);
> +	} else {
> +		/*
> +		 * For a long list, approximate the total count of objects in
> +		 * it to meet the limit on the number of slabs to scan.
> +		 * Scan from both the list's head and tail for better accuracy.
> +		 */
> +		unsigned long scanned = 0;
> +
> +		list_for_each_entry(slab, &n->partial, slab_list) {
> +			x += get_count(slab);
> +			if (++scanned == MAX_PARTIAL_TO_SCAN / 2)
> +				break;
> +		}
> +		list_for_each_entry_reverse(slab, &n->partial, slab_list) {
> +			x += get_count(slab);
> +			if (++scanned == MAX_PARTIAL_TO_SCAN)
> +				break;
> +		}
> +		x = mult_frac(x, n->nr_partial, scanned);
> +		x = min(x, node_nr_objs(n));
> +	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return x;
>  }
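The split suggested in the reply (keep the exact count_partial() with its get_count callback for sysfs, add a count_partial_free_approx()-style variant for slabinfo and slab_out_of_memory()) can be sketched as a userspace model. struct model_slab, the model_* names and the array standing in for the partial list are assumptions for illustration only; this is not proposed kernel code. The free-count arithmetic mirrors count_free() (objects - inuse), and the approximate variant calls it directly rather than through a function pointer.

```c
#define MAX_PARTIAL_TO_SCAN 10000UL

/* Hypothetical userspace stand-in for struct slab. */
struct model_slab {
	unsigned int objects;	/* object slots in the slab */
	unsigned int inuse;	/* slots currently allocated */
};

/* Same arithmetic as SLUB's count_free(): free = objects - inuse. */
static unsigned long model_count_free(const struct model_slab *s)
{
	return s->objects - s->inuse;
}

/* Unchanged path (e.g. sysfs): exact walk through a get_count callback. */
static unsigned long model_count_partial(const struct model_slab *slabs,
		unsigned long nr,
		unsigned long (*get_count)(const struct model_slab *))
{
	unsigned long x = 0;

	for (unsigned long i = 0; i < nr; i++)
		x += get_count(&slabs[i]);	/* indirect call per slab */
	return x;
}

/*
 * Suggested variant (slabinfo, OOM report): no callback; the free count is
 * computed by a direct, inlinable call, with head+tail sampling for long
 * lists. 'node_objs' stands in for node_nr_objs().
 */
static unsigned long model_count_partial_free_approx(
		const struct model_slab *slabs, unsigned long nr,
		unsigned long node_objs)
{
	unsigned long x = 0, scanned = 0, i;

	if (nr <= MAX_PARTIAL_TO_SCAN) {
		for (i = 0; i < nr; i++)
			x += model_count_free(&slabs[i]);
		return x;
	}
	for (i = 0; i < MAX_PARTIAL_TO_SCAN / 2; i++, scanned++)
		x += model_count_free(&slabs[i]);
	for (i = nr; scanned < MAX_PARTIAL_TO_SCAN; scanned++)
		x += model_count_free(&slabs[--i]);

	/* Overflow-safe x * nr / scanned, the way mult_frac() computes it. */
	x = (x / scanned) * nr + (x % scanned) * nr / scanned;
	return x < node_objs ? x : node_objs;
}
```

For example, with 20000 slabs of 8 objects each and inuse cycling 0..7, both functions return 90000: the exact walk because it visits every slab, the approximate one because the 5000 head and 5000 tail slabs happen to have the same distribution as the whole list.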