linux-mm.kvack.org archive mirror
From: Ming Lei <ming.lei@redhat.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Harry Yoo <harry.yoo@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, Hao Li <hao.li@linux.dev>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [Regression] mm:slab/sheaves: severe performance regression in cross-CPU slab allocation
Date: Fri, 6 Mar 2026 18:22:37 +0800	[thread overview]
Message-ID: <aaqq7YmUcOht3GWH@fedora> (raw)
In-Reply-To: <d5cd4eb4-1703-41af-9c71-842ae33edbdf@kernel.org>

On Fri, Mar 06, 2026 at 09:47:27AM +0100, Vlastimil Babka (SUSE) wrote:
> On 3/6/26 05:55, Harry Yoo wrote:
> > On Thu, Feb 26, 2026 at 07:02:11PM +0100, Vlastimil Babka (SUSE) wrote:
> >> On 2/25/26 10:31, Ming Lei wrote:
> >> > Hi Vlastimil,
> >> > 
> >> > On Wed, Feb 25, 2026 at 09:45:03AM +0100, Vlastimil Babka (SUSE) wrote:
> >> >> On 2/24/26 21:27, Vlastimil Babka wrote:
> >> >> > 
> >> >> > It made sense to me not to refill sheaves when we can't reclaim, but I
> >> >> > didn't anticipate this interaction with mempools. We could change them
> >> >> > but there might be others using a similar pattern. Maybe it would be for
> >> >> > the best to just drop that heuristic from __pcs_replace_empty_main()
> >> >> > (but carefully as some deadlock avoidance depends on it, we might need
> >> >> > to e.g. replace it with gfpflags_allow_spinning()). I'll send a patch
> >> >> > tomorrow to test this theory, unless someone beats me to it (feel free to).
> >> >> Could you try this then, please? Thanks!
> >> > 
> >> > Thanks for working on this issue!
> >> > 
> >> > Unfortunately the patch doesn't make a difference on IOPS in the perf test;
> >> > the perf profile collected on the Linus tree (basically 7.0-rc1 with your patch) follows:
> >> 
> >> what about this patch in addition to the previous one? Thanks.
> >> 
> >> ----8<----
> >> From d3e8118c078996d1372a9f89285179d93971fdb2 Mon Sep 17 00:00:00 2001
> >> From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
> >> Date: Thu, 26 Feb 2026 18:59:56 +0100
> >> Subject: [PATCH] mm/slab: put barn on every online node
> >> 
> >> Including memoryless nodes.
> >> 
> >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> >> ---
> > 
> > Just taking a quick glance...
> > 
> >> @@ -6121,7 +6122,8 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> >>  	if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
> >>  		return;
> >>  
> >> -	if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id())
> >> +	if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> >> +			|| !node_isset(slab_nid(slab), slab_nodes))
> > 
> > I think you intended !node_isset(numa_mem_id(), slab_nodes)?
> > 
> > "Skip freeing to pcs if it's a remote free, but memoryless nodes are
> >  an exception".
> 
> Indeed, thanks! Ming, could you retry with that fixed up please?

After applying the following change, IOPS is ~25M:

- delta change on top of the two patches

diff --git a/mm/slub.c b/mm/slub.c
index 085fe49eec68..56fe8bd956c0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6142,7 +6142,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
                return;
 
        if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
-                       || !node_isset(slab_nid(slab), slab_nodes))
+                       || !node_isset(numa_mem_id(), slab_nodes))
            && likely(!slab_test_pfmemalloc(slab))) {
                if (likely(free_to_pcs(s, object, true)))
                        return;


- slab stat on patched `815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next`

# (cd /sys/kernel/slab/bio-256/ && find . -type f -exec grep -aH . {} \;)
./remote_node_defrag_ratio:100
./total_objects:7395 N1=3876 N5=3519
./alloc_fastpath:507619662 C0=70 C1=27608632 C3=28990301 C5=35098386 C6=9 C7=35782152 C8=115 C9=31757274 C10=32 C11=30087065 C12=34 C13=31615065 C14=7 C15=31798233 C17=30695955 C18=128 C19=32204853 C20=64 C21=36842392 C23=36212376 C25=30013640 C27=29055001 C29=29990232 C30=48 C31=29867595 C36=2 C50=1
./cpu_slabs:0
./objects:7232 N1=3816 N5=3416
./sheaf_return_slow:0
./objects_partial:500 N1=195 N5=305
./sheaf_return_fast:0
./cpu_partial:0
./free_slowpath:20 C4=20
./barn_get_fail:260 C1=6 C3=26 C5=26 C7=7 C9=5 C10=2 C11=26 C12=2 C13=10 C14=1 C15=19 C17=8 C18=5 C19=19 C20=1 C21=9 C23=22 C25=11 C27=21 C29=26 C31=6 C36=1 C50=1
./sheaf_prefill_oversize:0
./skip_kfence:0
./min_partial:5
./order_fallback:0
./sheaf_capacity:28
./sheaf_flush:28 C24=28
./free_rcu_sheaf:0
./sheaf_alloc:178 C0=4 C2=9 C3=1 C4=9 C5=65 C6=4 C8=5 C10=8 C11=1 C12=4 C13=1 C14=8 C15=1 C16=5 C18=8 C19=1 C20=3 C22=10 C23=1 C24=5 C25=1 C26=7 C27=1 C28=10 C29=1 C30=2 C31=1 C36=1 C50=1
./sheaf_free:0
./sheaf_prefill_slow:0
./sheaf_prefill_fast:0
./poison:0
./red_zone:0
./free_slab:0
./slabs:145 N1=76 N5=69
./barn_get:18129029 C0=3 C1=986017 C3=1035342 C5=1253488 C6=1 C7=1277927 C8=5 C9=1134184 C11=1074513 C13=1129100 C15=1135633 C17=1096277 C19=1150155 C20=2 C21=1315791 C23=1293278 C25=1071905 C27=1037658 C29=1071054 C30=2 C31=1066694
./alloc_slowpath:0
./destroy_by_rcu:1
./free_rcu_sheaf_fail:0
./barn_put:18129105 C0=986015 C2=1035357 C4=1253502 C6=1277924 C8=1134182 C10=1074529 C12=1129101 C14=1135641 C16=1096273 C18=1150168 C20=1315792 C22=1293288 C24=1071905 C26=1037668 C28=1071069 C30=1066691
./usersize:0
./sanity_checks:0
./barn_put_fail:1 C24=1
./align:64
./alloc_node_mismatch:0
./alloc_slab:145 C1=3 C3=19 C5=6 C7=3 C9=3 C10=2 C11=18 C12=2 C13=6 C14=1 C15=12 C17=8 C18=3 C19=12 C21=2 C23=5 C25=7 C27=12 C29=15 C31=4 C36=1 C50=1
./free_remove_partial:0
./aliases:0
./store_user:0
./trace:0
./reclaim_account:0
./order:2
./sheaf_refill:7280 C1=168 C3=728 C5=728 C7=196 C9=140 C10=56 C11=728 C12=56 C13=280 C14=28 C15=532 C17=224 C18=140 C19=532 C20=28 C21=252 C23=616 C25=308 C27=588 C29=728 C31=168 C36=28 C50=28
./object_size:256
./free_fastpath:507615526 C0=27608438 C2=28990052 C4=35098103 C6=35781903 C8=31757101 C10=30086841 C12=31614841 C14=31797983 C16=30695700 C18=32204722 C19=1 C20=36842201 C22=36212117 C24=30013416 C26=29054742 C28=29989974 C30=29867383 C31=4 C39=2 C47=2
./hwcache_align:1
./cmpxchg_double_fail:0
./objs_per_slab:51
./partial:13 N1=5 N5=8
./slabs_cpu_partial:0(0)
./free_add_partial:117 C1=3 C3=7 C5=19 C7=4 C9=2 C11=8 C13=4 C15=7 C18=2 C19=7 C20=1 C21=7 C23=17 C24=3 C25=4 C27=9 C29=11 C31=2
./slab_size:320
./cache_dma:0





Thanks,
Ming



Thread overview: 31+ messages
2026-02-24  2:52 Ming Lei
2026-02-24  5:00 ` Harry Yoo
2026-02-24  9:07   ` Ming Lei
2026-02-25  5:32     ` Hao Li
2026-02-25  6:54       ` Harry Yoo
2026-02-25  7:06         ` Hao Li
2026-02-25  7:19           ` Harry Yoo
2026-02-25  8:19             ` Hao Li
2026-02-25  8:41               ` Harry Yoo
2026-02-25  8:54                 ` Hao Li
2026-02-25  8:21             ` Harry Yoo
2026-02-24  6:51 ` Hao Li
2026-02-24  7:10   ` Harry Yoo
2026-02-24  7:41     ` Hao Li
2026-02-24 20:27 ` Vlastimil Babka
2026-02-25  5:24   ` Harry Yoo
2026-02-25  8:45   ` Vlastimil Babka (SUSE)
2026-02-25  9:31     ` Ming Lei
2026-02-25 11:29       ` Vlastimil Babka (SUSE)
2026-02-25 12:24         ` Ming Lei
2026-02-25 13:22           ` Vlastimil Babka (SUSE)
2026-02-26 18:02       ` Vlastimil Babka (SUSE)
2026-02-27  9:23         ` Ming Lei
2026-03-05 13:05           ` Vlastimil Babka (SUSE)
2026-03-05 15:48             ` Ming Lei
2026-03-06  1:01               ` Ming Lei
2026-03-06  4:17               ` Hao Li
2026-03-06  4:55         ` Harry Yoo
2026-03-06  8:32           ` Hao Li
2026-03-06  8:47           ` Vlastimil Babka (SUSE)
2026-03-06 10:22             ` Ming Lei [this message]
