From: Ming Lei <ming.lei@redhat.com>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-block@vger.kernel.org, Hao Li <hao.li@linux.dev>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [Regression] mm:slab/sheaves: severe performance regression in cross-CPU slab allocation
Date: Wed, 11 Mar 2026 18:43:14 +0800
Message-ID: <CAFj5m9KutWuGfPN4cfAd-CjFNTCC+fLw0DtMsFFQrRBEeK-8LQ@mail.gmail.com>
In-Reply-To: <abFA17ZmsW6RgcYI@fedora>

On Wed, Mar 11, 2026 at 6:16 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Mar 11, 2026 at 10:10:13AM +0900, Harry Yoo wrote:
> > On Fri, Mar 06, 2026 at 06:22:37PM +0800, Ming Lei wrote:
> > > On Fri, Mar 06, 2026 at 09:47:27AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > On 3/6/26 05:55, Harry Yoo wrote:
> > > > > On Thu, Feb 26, 2026 at 07:02:11PM +0100, Vlastimil Babka (SUSE) wrote:
> > > > >> On 2/25/26 10:31, Ming Lei wrote:
> > > > >> > Hi Vlastimil,
> > > > >> >
> > > > >> > On Wed, Feb 25, 2026 at 09:45:03AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > >> >> On 2/24/26 21:27, Vlastimil Babka wrote:
> > > > >> >> >
> > > > >> >> > It made sense to me not to refill sheaves when we can't reclaim, but I
> > > > >> >> > didn't anticipate this interaction with mempools. We could change them
> > > > >> >> > but there might be others using a similar pattern. Maybe it would be for
> > > > >> >> > the best to just drop that heuristic from __pcs_replace_empty_main()
> > > > >> >> > (but carefully as some deadlock avoidance depends on it, we might need
> > > > >> >> > to e.g. replace it with gfpflags_allow_spinning()). I'll send a patch
> > > > >> >> > tomorrow to test this theory, unless someone beats me to it (feel free to).
> > > > >> >> Could you try this then, please? Thanks!
> > > > >> >
> > > > >> > Thanks for working on this issue!
> > > > >> >
> > > > >> > Unfortunately the patch makes no difference to IOPS in the perf test.
> > > > >> > The perf profile collected on Linus' tree (basically 7.0-rc1 with your patch) follows:
> > > > >>
> > > > >> what about this patch in addition to the previous one? Thanks.
> > > > >>
> > > > >> ----8<----
> > > > >> From d3e8118c078996d1372a9f89285179d93971fdb2 Mon Sep 17 00:00:00 2001
> > > > >> From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
> > > > >> Date: Thu, 26 Feb 2026 18:59:56 +0100
> > > > >> Subject: [PATCH] mm/slab: put barn on every online node
> > > > >>
> > > > >> Including memoryless nodes.
> > > > >>
> > > > >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > > > >> ---
> > > > >
> > > > > Just taking a quick grasp...
> > > > >
> > > > >> @@ -6121,7 +6122,8 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> > > > >> if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
> > > > >> return;
> > > > >>
> > > > >> - if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id())
> > > > >> + if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> > > > >> + || !node_isset(slab_nid(slab), slab_nodes))
> > > > >
> > > > > I think you intended !node_isset(numa_mem_id(), slab_nodes)?
> > > > >
> > > > > "Skip freeing to pcs if it's a remote free, but memoryless nodes are
> > > > > an exception".
> > > >
> > > > Indeed, thanks! Ming, could you retry with that fixed up please?
> > >
> > > After applying the following change, IOPS is ~25M:
> > >
> > > - delta change on the two patches
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 085fe49eec68..56fe8bd956c0 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -6142,7 +6142,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> > > return;
> > >
> > > if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> > > - || !node_isset(slab_nid(slab), slab_nodes))
> > > + || !node_isset(numa_mem_id(), slab_nodes))
> > > && likely(!slab_test_pfmemalloc(slab))) {
> > > if (likely(free_to_pcs(s, object, true)))
> > > return;
> > >
> >
> > Hi Ming, thanks a lot for helping with testing!
> >
> > The stats look quite fine to me, but we're still seeing suboptimal IOPS.
> >
> > > - slab stat on patched `815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next`
> >
> > Does that not include Vlastimil's (fb1091febd66 mm/slab: allow sheaf
> > refill if blocking is not allowed)?
>
> No, because fb1091febd66 isn't included in `815c8e35511d Merge branch
> 'slab/for-7.0/sheaves' into slab/for-next`.
>
> >
> > Next time you test it, could you please test on top of 7.0-rc3 w/
> > the memoryless node patch (w/ the delta above) applied?
>
> IOPS is the same between `815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next`
> and 7.0-rc3 with the two patches.
>
> IMO, it should be easier to compare & investigate by focusing on
> 815c8e35511d, given there are only 41 patches between v6.19-rc5 and
> commit 815c8e35511d.
>
> >
> > Also, let us check a few things...
> >
> > 1) Does bumping up sheaf capacity change the slab stats & IOPS?
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 0c906fefc31b..5207279417e2 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -7611,13 +7611,13 @@ static unsigned int calculate_sheaf_capacity(struct kmem_cache *s,
> > * should result in similar lock contention (barn or list_lock)
> > */
> > if (s->size >= PAGE_SIZE)
> > - capacity = 4;
> > + capacity = 6;
> > else if (s->size >= 1024)
> > - capacity = 12;
> > + capacity = 24;
> > else if (s->size >= 256)
> > - capacity = 26;
> > + capacity = 52;
> > else
> > - capacity = 60;
> > + capacity = 120;
> >
> > /* Increment capacity to make sheaf exactly a kmalloc size bucket */
> > size = struct_size_t(struct slab_sheaf, objects, capacity);
>
> IOPS increases from 24M to 29M with this patch, on top of 7.0-rc3 plus the
> patchset Vlastimil posted today.

BTW, the improvement looks unstable: sometimes it reaches 28–29M, but sometimes
it stays at 25–26M.

Thanks,
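
For reference, the tier selection touched by the sheaf capacity diff quoted
above can be sketched in userspace C. This is only an illustration under stated
assumptions: sketch_sheaf_capacity(), sketch_print_tiers() and SKETCH_PAGE_SIZE
are hypothetical names, a 4 KiB page size is assumed, and the later rounding of
the sheaf to a kmalloc size bucket done in calculate_sheaf_capacity() is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper mirroring only the if/else ladder from the quoted diff.
 * bumped == 0 returns the original values, bumped != 0 the test values.
 * The kmalloc-bucket rounding that follows in the kernel is not modelled.
 */
static unsigned int sketch_sheaf_capacity(unsigned int obj_size, int bumped)
{
	if (obj_size >= SKETCH_PAGE_SIZE)
		return bumped ? 6 : 4;
	if (obj_size >= 1024)
		return bumped ? 24 : 12;
	if (obj_size >= 256)
		return bumped ? 52 : 26;
	return bumped ? 120 : 60;
}

/* Print the original and bumped capacity for one object size per tier. */
static void sketch_print_tiers(void)
{
	static const unsigned int sizes[] = { 64, 256, 1024, 4096 };

	for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		unsigned int base = sketch_sheaf_capacity(sizes[i], 0);
		unsigned int more = sketch_sheaf_capacity(sizes[i], 1);

		/* The test patch only ever raises the capacity. */
		assert(more > base);
		printf("size %4u: capacity %3u -> %3u\n", sizes[i], base, more);
	}
}
```

Note that the >= PAGE_SIZE tier goes from 4 to 6 rather than doubling, while
the other three tiers are doubled.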