From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 15 Jan 2026 18:12:44 +0800
From: Zhao Liu <zhao1.liu@intel.com>
To: Vlastimil Babka, Hao Li
Cc: akpm@linux-foundation.org, harry.yoo@oracle.com, cl@gentwo.org,
	rientjes@google.com, roman.gushchin@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, tim.c.chen@intel.com,
	yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] slub: keep empty main sheaf as spare in __pcs_replace_empty_main()
Message-ID: 
References: <20251210002629.34448-1-haoli.tcs@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
Hi Babka & Hao,

> Thanks, LGTM. We can make it smaller though. Adding to slab/for-next
> adjusted like this:
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f21b2f0c6f5a..ad71f01571f0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5052,7 +5052,11 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
>  	 */
>
>  	if (pcs->main->size == 0) {
> -		barn_put_empty_sheaf(barn, pcs->main);
> +		if (!pcs->spare) {
> +			pcs->spare = pcs->main;
> +		} else {
> +			barn_put_empty_sheaf(barn, pcs->main);
> +		}
>  		pcs->main = full;
>  		return pcs;
>  	}

I noticed the previous lkp regression report and tested this fix:

* will-it-scale.per_process_ops

Compared with v6.19-rc4 (f0b9d8eb98df), with this fix, I have these
results:

nr_tasks      Delta
       1    + 3.593%
       8    + 3.094%
      64    +60.247%
     128    +49.344%
     192    +27.500%
     256    -12.077%

For the cases with nr_tasks 1-192, there are improvements. I think
this is expected, since the pre-cached spare sheaf reduces spinlock
contention by cutting down on barn_put_empty_sheaf() and
barn_get_empty_sheaf() calls.

So (maybe too late),

Tested-by: Zhao Liu <zhao1.liu@intel.com>

But I find there are two more questions that might need consideration.

# Question 1: Regression for 256 tasks

For the above test, the case with nr_tasks = 256 shows a slight
regression. I did more testing:

(This is a single-round test; the 256-tasks data has jitter.)
nr_tasks      Delta
     244    + 0.308%
     248    - 0.805%
     252    +12.070%
     256    -11.441%
     258    + 2.070%
     260    + 1.252%
     264    + 2.369%
     268    -11.479%
     272    + 2.130%
     292    + 8.714%
     296    +10.905%
     298    +17.196%
     300    +11.783%
     302    + 6.620%
     304    + 3.112%
     308    - 5.924%

It can be seen that most cases show improvement, though a few show a
slight regression. My machine is a GNR system with 2 sockets and the
following NUMA topology:

NUMA:
  NUMA node(s):       4
  NUMA node0 CPU(s):  0-42,172-214
  NUMA node1 CPU(s):  43-85,215-257
  NUMA node2 CPU(s):  86-128,258-300
  NUMA node3 CPU(s):  129-171,301-343

Since I set the CPU affinity on the cores, the 256-tasks case is
roughly the point at which node 0 and node 1 are filled.

The following is the perf data comparing the two tests, w/o fix vs.
with this fix:

# Baseline  Delta Abs  Shared Object     Symbol
# ........  .........  ................  ....................................
#
    61.76%     +4.78%  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
     0.93%     -0.32%  [kernel.vmlinux]  [k] __slab_free
     0.39%     -0.31%  [kernel.vmlinux]  [k] barn_get_empty_sheaf
     1.35%     -0.30%  [kernel.vmlinux]  [k] mas_leaf_max_gap
     3.22%     -0.30%  [kernel.vmlinux]  [k] __kmem_cache_alloc_bulk
     1.73%     -0.20%  [kernel.vmlinux]  [k] __cond_resched
     0.52%     -0.19%  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
     0.92%     +0.18%  [kernel.vmlinux]  [k] _raw_spin_lock
     1.91%     -0.15%  [kernel.vmlinux]  [k] zap_pmd_range.isra.0
     1.37%     -0.13%  [kernel.vmlinux]  [k] mas_wr_node_store
     1.29%     -0.12%  [kernel.vmlinux]  [k] free_pud_range
     0.92%     -0.11%  [kernel.vmlinux]  [k] __mmap_region
     0.12%     -0.11%  [kernel.vmlinux]  [k] barn_put_empty_sheaf
     0.20%     -0.09%  [kernel.vmlinux]  [k] barn_replace_empty_sheaf
     0.31%     +0.09%  [kernel.vmlinux]  [k] get_partial_node
     0.29%     -0.07%  [kernel.vmlinux]  [k] __rcu_free_sheaf_prepare
     0.12%     -0.07%  [kernel.vmlinux]  [k] intel_idle_xstate
     0.21%     -0.07%  [kernel.vmlinux]  [k] __kfree_rcu_sheaf
     0.26%     -0.07%  [kernel.vmlinux]  [k] down_write
     0.53%     -0.06%  libc.so.6         [.] __mmap
     0.66%     -0.06%  [kernel.vmlinux]  [k] mas_walk
     0.48%     -0.06%  [kernel.vmlinux]  [k] mas_prev_slot
     0.45%     -0.06%  [kernel.vmlinux]  [k] mas_find
     0.38%     -0.06%  [kernel.vmlinux]  [k] mas_wr_store_type
     0.23%     -0.06%  [kernel.vmlinux]  [k] do_vmi_align_munmap
     0.21%     -0.05%  [kernel.vmlinux]  [k] perf_event_mmap_event
     0.32%     -0.05%  [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
     0.19%     -0.05%  [kernel.vmlinux]  [k] downgrade_write
     0.59%     -0.05%  [kernel.vmlinux]  [k] mas_next_slot
     0.31%     -0.05%  [kernel.vmlinux]  [k] __mmap_new_vma
     0.44%     -0.05%  [kernel.vmlinux]  [k] kmem_cache_alloc_noprof
     0.28%     -0.05%  [kernel.vmlinux]  [k] __vma_enter_locked
     0.41%     -0.05%  [kernel.vmlinux]  [k] memcpy
     0.48%     -0.04%  [kernel.vmlinux]  [k] mas_store_gfp
     0.14%     +0.04%  [kernel.vmlinux]  [k] __put_partials
     0.19%     -0.04%  [kernel.vmlinux]  [k] mas_empty_area_rev
     0.30%     -0.04%  [kernel.vmlinux]  [k] do_syscall_64
     0.25%     -0.04%  [kernel.vmlinux]  [k] mas_preallocate
     0.15%     -0.04%  [kernel.vmlinux]  [k] rcu_free_sheaf
     0.22%     -0.04%  [kernel.vmlinux]  [k] entry_SYSCALL_64
     0.49%     -0.04%  libc.so.6         [.] __munmap
     0.91%     -0.04%  [kernel.vmlinux]  [k] rcu_all_qs
     0.21%     -0.04%  [kernel.vmlinux]  [k] __vm_munmap
     0.24%     -0.04%  [kernel.vmlinux]  [k] mas_store_prealloc
     0.19%     -0.04%  [kernel.vmlinux]  [k] __kmalloc_cache_noprof
     0.34%     -0.04%  [kernel.vmlinux]  [k] build_detached_freelist
     0.19%     -0.03%  [kernel.vmlinux]  [k] vms_complete_munmap_vmas
     0.36%     -0.03%  [kernel.vmlinux]  [k] mas_rev_awalk
     0.05%     -0.03%  [kernel.vmlinux]  [k] shuffle_freelist
     0.19%     -0.03%  [kernel.vmlinux]  [k] down_write_killable
     0.19%     -0.03%  [kernel.vmlinux]  [k] kmem_cache_free
     0.27%     -0.03%  [kernel.vmlinux]  [k] up_write
     0.13%     -0.03%  [kernel.vmlinux]  [k] vm_area_alloc
     0.18%     -0.03%  [kernel.vmlinux]  [k] arch_get_unmapped_area_topdown
     0.08%     -0.03%  [kernel.vmlinux]  [k] userfaultfd_unmap_complete
     0.10%     -0.03%  [kernel.vmlinux]  [k] tlb_gather_mmu
     0.30%     -0.02%  [kernel.vmlinux]  [k] ___slab_alloc

I think the interesting item is "get_partial_node".
It seems this fix makes get_partial_node slightly more frequent. Hmm,
however, I still can't figure out why this is happening. Do you have
any thoughts on it?

# Question 2: sheaf capacity

Back to the original commit which triggered the lkp regression. I did
more testing to check whether this fix could completely close the
regression gap. The baseline is commit 3accabda4 ("mm, vma: use percpu
sheaves for vm_area_struct cache"), and its next commit 59faa4da7cd4
("maple_tree: use percpu sheaves for maple_node_cache") has the
regression. I compared v6.19-rc4 (f0b9d8eb98df) w/o fix & with fix
against the baseline:

nr_tasks     w/o fix     with fix
       1    - 3.643%     - 0.181%
       8    -12.523%     - 9.816%
      64    -50.378%     -20.482%
     128    -36.736%     - 5.518%
     192    -22.963%     - 1.777%
     256    -32.926%     -41.026%

It appears that under extreme conditions, the regression remains
significant. I remembered your suggestion about a larger capacity and
did the following testing:

nr_tasks  59faa4da7cd4  59faa4da7cd4     59faa4da7cd4   59faa4da7cd4    59faa4da7cd4
                        (with this fix)  (cap: 32->64)  (cap: 32->128)  (cap: 32->256)
       1     - 8.789%      - 8.805%        - 8.185%       - 9.912%        - 8.673%
       8     -12.256%      - 9.219%        -10.460%       -10.070%        - 8.819%
      64     -38.915%      - 8.172%        - 4.700%       + 4.571%        + 8.793%
     128     - 8.032%      +11.377%        +23.232%       +26.940%        +30.573%
     192     - 1.220%      + 9.758%        +20.573%       +22.645%        +25.768%
     256     - 6.570%      + 9.967%        +21.663%       +30.103%        +33.876%

Comparing with the baseline (3accabda4), a larger capacity could
significantly improve the sheaves' scalability. So I'd like to know
whether you think dynamically or adaptively adjusting the capacity is
a worthwhile idea.

Thanks for your patience.

Regards,
Zhao