linux-mm.kvack.org archive mirror
From: <hu.shengming@zte.com.cn>
To: <harry@kernel.org>
Cc: <vbabka@kernel.org>, <akpm@linux-foundation.org>,
	<hao.li@linux.dev>, <cl@gentwo.org>, <rientjes@google.com>,
	<roman.gushchin@linux.dev>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <zhang.run@zte.com.cn>,
	<xu.xin16@zte.com.cn>, <yang.tao172@zte.com.cn>,
	<yang.yang29@zte.com.cn>
Subject: Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
Date: Mon, 13 Apr 2026 13:14:23 +0800 (CST)	[thread overview]
Message-ID: <20260413131423382u868NVr2RkcvDe0Ii3ERj@zte.com.cn> (raw)
In-Reply-To: <adxm5az9EfHr2aYg@hyeyoo>

Harry wrote:
> On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> > From: Shengming Hu <hu.shengming@zte.com.cn>
> > 
> > Allocations from a fresh slab can consume all of its objects, and the
> > freelist built during slab allocation is discarded immediately as a result.
> > 
> > Instead of special-casing the whole-slab bulk refill case, defer freelist
> > construction until after objects are emitted from a fresh slab.
> > new_slab() now only allocates the slab and initializes its metadata.
> > refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
> > emit objects directly, building a freelist only for the objects left
> > unallocated; the same change is applied to alloc_single_from_new_slab().
> > 
> > To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
> > small iterator abstraction for walking free objects in allocation order.
> > The iterator is used both for filling the sheaf and for building the
> > freelist of the remaining objects.
> > 
> > Also mark setup_object() inline. After this optimization, the compiler no
> > longer consistently inlines this helper in the hot path, which can hurt
> > performance. Explicitly marking it inline restores the expected code
> > generation.
> > 
> > This reduces per-object overhead when allocating from a fresh slab.
> > The most direct benefit is in the paths that allocate objects first and
> > only build a freelist for the remainder afterward: bulk allocation from
> > a new slab in refill_objects(), single-object allocation from a new slab
> > in ___slab_alloc(), and the corresponding early-boot paths that now use
> > the same deferred-freelist scheme. Since refill_objects() is also used to
> > refill sheaves, the optimization is not limited to the small set of
> > kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
> > workloads may benefit as well when they refill from a fresh slab.
> > 
> > In slub_bulk_bench, the time per object drops by about 32% to 71% with
> > CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
> > CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
> > cost removed by this change: each iteration allocates exactly
> > slab->objects from a fresh slab. That makes it a near best-case scenario
> > for deferred freelist construction, because the old path still built a
> > full freelist even when no objects remained, while the new path avoids
> > that work. Realistic workloads may see smaller end-to-end gains depending
> > on how often allocations reach this fresh-slab refill path.
> > 
> > Benchmark results (slub_bulk_bench):
> > Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
> > Kernel: Linux 7.0.0-rc7-next-20260407
> > Config: x86_64_defconfig
> > Cpu: 0
> > Rounds: 20
> > Total: 256MB
> 
> [...]
> 
> Hi Shengming, it's been great to see how this patch has been improved
> since v1 to where it is now. Thanks for taking the feedback and steadily
> improving things along the way.
> 

Hi Harry,

Thank you very much for your helpful reviews and suggestions from v1 through v5.
I really appreciate your patience and professionalism throughout the review process,
and I have learned a lot from your feedback.

> I think this is getting pretty close to being ready for mainline,
> with just one little thing to fix in the code.
> 
> Other reviewers/maintainers may also take a look and leave comments
> when they get a chance.
> 

I am also looking forward to any further comments or suggestions from
other reviewers and maintainers.

> > Link: https://github.com/HSM6236/slub_bulk_test.git
> > Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
> > ---
> 
> If you think it's appropriate, please feel free to add:
> Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>
> 

Sure, I will add:

  Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>

Thanks again for your continued review and guidance.

> In case this was assisted by AI or other tools, please disclose that
> according to the process document:
> 
>   https://docs.kernel.org/process/generated-content.html
>   https://docs.kernel.org/process/coding-assistants.html
> 
> Not that I think this was assisted by AI, just mentioning because
> sometimes people using tools to develop the kernel are not aware that
> they need to disclose the fact. It wouldn't hurt to remind people :-)
> 

Regarding AI disclosure: I only used an AI tool to polish the English wording
of the commit message, since I am not fully confident in my English writing. :-)

As I understand it, the documentation says that "spelling and grammar fix ups,
like rephrasing to imperative voice" are out of scope, so I believe an
Assisted-by tag is not needed in this case. Please let me know if you think otherwise.

> > Changes in v5:
> > - Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
> > - Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
> > - Refine the commit message with more precise test coverage descriptions.
> > - Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/
> > 
> > ---
> >  mm/slab.h |  10 ++
> >  mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
> >  2 files changed, 147 insertions(+), 142 deletions(-)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 4927407c9699..9ff8af8c2f73 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> >           * corruption in theory could cause that.
> >           * Leak memory of allocated slab.
> >           */
> > -        if (!allow_spin)
> > -            spin_unlock_irqrestore(&n->list_lock, flags);
> >          return NULL;
> >      }
> >  
> > -    if (allow_spin)
> > +    n = get_node(s, slab_nid(slab));
> > +    if (allow_spin) {
> >          spin_lock_irqsave(&n->list_lock, flags);
> > +    } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> > +        /*
> > +         * Unlucky, discard newly allocated slab.
> > +         * The slab is not fully free, but it's fine as
> > +         * objects are not allocated to users.
> > +         */
> > +        free_new_slab_nolock(s, slab);
> > +        return NULL;
> > +    }
> >  
> > -    if (slab->inuse == slab->objects)
> > -        add_full(s, n, slab);
> > -    else
> > +    if (needs_add_partial)
> >          add_partial(n, slab, ADD_TO_HEAD);
> > +    else
> > +        add_full(s, n, slab);
> >  
> > -    inc_slabs_node(s, nid, slab->objects);
> >      spin_unlock_irqrestore(&n->list_lock, flags);
> >  
> > +    inc_slabs_node(s, slab_nid(slab), slab->objects);
> 
> Ouch, I didn't catch this when it was added in v4. When slab debugging
> feature is enabled for the cache, inc_slabs_node() should be done within
> the spinlock to avoid race conditions with slab validation.
> 
> Perhaps it's worth adding a comment mentioning this :)
> 
> See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> caches and make it safe") for more details.
> 
> With this fixed, please feel free to add:
> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> 

You are right about the inc_slabs_node() placement. I missed that change when
it was introduced in v4. Thank you very much for catching it.

After reading commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
caches and make it safe"), my understanding is that inc_slabs_node() must stay
under n->list_lock for debug caches, so that validation cannot race with list
transitions and observe inconsistent state. I will fix that in the next revision.

Would a comment like the following look good? :-)

/*
 * Debug caches require nr_slabs updates under n->list_lock so validation
 * cannot race with list transitions and observe inconsistent state.
 */

Thank you again for the careful review.

-- 
Cheers,
Shengming

> -- 
> Cheers,
> Harry / Hyeonggon


Thread overview: 5+ messages
2026-04-09 12:43 hu.shengming
2026-04-13  3:45 ` Harry Yoo (Oracle)
2026-04-13  5:14   ` hu.shengming [this message]
2026-04-13  5:27     ` Harry Yoo (Oracle)
2026-04-13  6:20       ` hu.shengming
