From: Hao Li <hao.li@linux.dev>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Harry Yoo <harry.yoo@oracle.com>,
Petr Tesarik <ptesarik@suse.com>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>,
Suren Baghdasaryan <surenb@google.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Alexei Starovoitov <ast@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org,
kasan-dev@googlegroups.com,
kernel test robot <oliver.sang@intel.com>,
stable@vger.kernel.org, "Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves
Date: Fri, 30 Jan 2026 12:38:48 +0800
Message-ID: <k3ntrr6kyekjwh2yeawk2pvtiilnoltsxipdzdgzaby2cdon6c@yknpymvklz4y>
In-Reply-To: <aewj4cm6qojpm25qbn5pf75jg3xdd5zue2t4lvxtvgjbhoc3rx@b5u5pysccldy>
On Thu, Jan 29, 2026 at 11:44:21AM -0500, Liam R. Howlett wrote:
> * Hao Li <hao.li@linux.dev> [260129 11:07]:
> > On Thu, Jan 29, 2026 at 04:28:01PM +0100, Vlastimil Babka wrote:
> > > On 1/29/26 16:18, Hao Li wrote:
> > > > Hi Vlastimil,
> > > >
> > > > I conducted a detailed performance evaluation of each patch on my setup.
> > >
> > > Thanks! What was the benchmark(s) used?
>
> Yes, thank you for running the benchmarks!
>
> >
> > I'm currently using the mmap2 test case from will-it-scale. The machine is still
> > an AMD 2-socket system, with 2 nodes per socket, totaling 192 CPUs, with SMT
> > disabled. For each test run, I used 64, 128, and 192 processes respectively.
>
> What about the other tests you ran in the detailed evaluation, were
> there other regressions? It might be worth including the list of tests
> that showed issues and some of the raw results (maybe at the end of your
> email) to show what you saw more clearly. I did notice you had done
> this previously.
Hi Liam,
I only ran the mmap2 test case of will-it-scale. I now have some new test
results and will share the raw data later.
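
For reference, the core of the load is just a tight per-process loop of
mmap()/munmap() of a 128MB region. Below is a minimal standalone sketch of
that kind of loop, not the actual will-it-scale harness (which runs one
worker per requested process count and reports an iteration rate); the
MEMSIZE/LOOPS names and the anonymous mapping are only illustrative, as the
real mmap2 testcase may use a file-backed mapping:

/*
 * Simplified sketch of an mmap/munmap stress loop in the spirit of the
 * will-it-scale mmap testcases. Each iteration creates and destroys a
 * 128MB VMA, which is what puts pressure on vm_area_struct and
 * maple_node allocations in the kernel.
 */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>

#define MEMSIZE	(128UL * 1024 * 1024)	/* 128MB, as discussed above */
#define LOOPS	100000UL		/* arbitrary iteration count */

int main(void)
{
	unsigned long i;

	for (i = 0; i < LOOPS; i++) {
		void *p = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		assert(p != MAP_FAILED);
		munmap(p, MEMSIZE);
	}

	printf("%lu iterations\n", i);
	return 0;
}
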
>
> Was the regression in the threaded or processes version of mmap2?
It's the processes version.
>
> >
> > > Importantly, does it rely on vma/maple_node objects?
> >
> > Yes, this test primarily puts a lot of pressure on maple_node.
> >
> > > So previously those would become kind of double
> > > cached by both sheaves and cpu (partial) slabs (and thus hopefully benefited
> > > more than they should) since sheaves introduction in 6.18, and now they are
> > > not double cached anymore?
> >
> > Exactly, since version 6.18, maple_node has indeed benefited from a dual-layer
> > cache.
> >
> > I did wonder if this isn't a performance regression but rather the
> > performance returning to its baseline after removing one layer of caching.
> >
> > However, verifying this idea would require completely disabling the sheaf
> > mechanism on version 6.19-rc5 while leaving the rest of the SLUB code untouched.
> > It would be great to hear any suggestions on how this might be approached.
>
> You could use perf record to capture the differences on the two kernels.
> You could also use perf to look at the differences between the three kernel
> versions:
> 1. pre-sheaves entirely
> 2. the 'dual layer' cache
> 3. The final version
That's right; this is exactly the test I just completed. I will send the
results in a separate email later.
>
> In these scenarios, it's not worth looking at the numbers, but just the
> differences since the debug required to get meaningful information makes
> the results hugely slow and, potentially, not as consistent. Sometimes
> I run them multiple times to ensure what I'm seeing makes sense for a
> particular comparison (and the server didn't just rotate the logs or
> whatever..)
Yes, that's right, and it is important. I also ran the test multiple times to
check that the data was stable and took the average.
>
> >
> > >
> > > > During my tests, I observed two points in the series where performance
> > > > regressions occurred:
> > > >
> > > > Patch 10: I noticed a ~16% regression in my environment. My hypothesis is
> > > > that with this patch, the allocation fast path bypasses the percpu partial
> > > > list, leading to increased contention on the node list.
> > >
> > > That makes sense.
> > >
> > > > Patch 12: This patch seems to introduce an additional ~9.7% regression. I
> > > > suspect this might be because the free path also loses buffering from the
> > > > percpu partial list, further exacerbating node list contention.
> > >
> > > Hmm yeah... we did put the previously full slabs there, avoiding the lock.
> > >
> > > > These are the only two patches in the series where I observed noticeable
> > > > regressions. The rest of the patches did not show significant performance
> > > > changes in my tests.
> > > >
> > > > I hope these test results are helpful.
> > >
> > > They are, thanks. I'd however hope it's just some particular test that has
> > > these regressions,
> >
> > Yes, I hope so too. And the mmap2 test case is indeed quite extreme.
> >
> > > which can be explained by the loss of double caching.
> >
> > If we could compare it with a version that only uses the
> > CPU partial list, the answer might become clearer.
>
> In my experience, micro-benchmarks are good at identifying specific
> failure points of a patch set, but unless an entire area of benchmarks
> regresses (i.e. all mmap threaded), then they rarely tell the whole story.
Yes, this makes sense to me.
>
> Are the benchmarks consistently slower? This specific test is sensitive
> to alignment because of the 128MB mmap/munmap operation. Sometimes, you
> will see a huge spike at a particular process/thread count that moves
> around in tests like this. Was your run consistently lower?
Yes, my test results have been quite stable, probably because the machine was
relatively idle.
Thanks for your reply and the discussion!
--
Thanks,
Hao
>
> Thanks,
> Liam
>