linux-mm.kvack.org archive mirror
From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: hu.shengming@zte.com.cn
Cc: vbabka@kernel.org, akpm@linux-foundation.org, hao.li@linux.dev,
	cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	zhang.run@zte.com.cn, xu.xin16@zte.com.cn,
	yang.tao172@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
Date: Mon, 13 Apr 2026 14:27:54 +0900	[thread overview]
Message-ID: <adx-2p09xEKz5Zdb@hyeyoo>
In-Reply-To: <20260413131423382u868NVr2RkcvDe0Ii3ERj@zte.com.cn>

On Mon, Apr 13, 2026 at 01:14:23PM +0800, hu.shengming@zte.com.cn wrote:
> Harry wrote:
> > On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> > > From: Shengming Hu <hu.shengming@zte.com.cn>
> > > 
> > > Allocations from a fresh slab can consume all of its objects, and the
> > > freelist built during slab allocation is discarded immediately as a result.
> > > 
> > > Instead of special-casing the whole-slab bulk refill case, defer freelist
> > > construction until after objects are emitted from a fresh slab.
> > > new_slab() now only allocates the slab and initializes its metadata.
> > > refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
> > > emit objects directly, building a freelist only for the objects left
> > > unallocated; the same change is applied to alloc_single_from_new_slab().
> > > 
> > > To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
> > > small iterator abstraction for walking free objects in allocation order.
> > > The iterator is used both for filling the sheaf and for building the
> > > freelist of the remaining objects.
> > > 
> > > Also mark setup_object() inline. After this optimization, the compiler no
> > > longer consistently inlines this helper in the hot path, which can hurt
> > > performance. Explicitly marking it inline restores the expected code
> > > generation.
> > > 
> > > This reduces per-object overhead when allocating from a fresh slab.
> > > The most direct benefit is in the paths that allocate objects first and
> > > only build a freelist for the remainder afterward: bulk allocation from
> > > a new slab in refill_objects(), single-object allocation from a new slab
> > > in ___slab_alloc(), and the corresponding early-boot paths that now use
> > > the same deferred-freelist scheme. Since refill_objects() is also used to
> > > refill sheaves, the optimization is not limited to the small set of
> > > kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
> > > workloads may benefit as well when they refill from a fresh slab.
> > > 
> > > In slub_bulk_bench, the time per object drops by about 32% to 71% with
> > > CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
> > > CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
> > > cost removed by this change: each iteration allocates exactly
> > > slab->objects from a fresh slab. That makes it a near best-case scenario
> > > for deferred freelist construction, because the old path still built a
> > > full freelist even when no objects remained, while the new path avoids
> > > that work. Realistic workloads may see smaller end-to-end gains depending
> > > on how often allocations reach this fresh-slab refill path.
> > > 
> > > Benchmark results (slub_bulk_bench):
> > > Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
> > > Kernel: Linux 7.0.0-rc7-next-20260407
> > > Config: x86_64_defconfig
> > > Cpu: 0
> > > Rounds: 20
> > > Total: 256MB
> > 
> > [...]
> > 
> > Hi Shengming, it's been great to see how this patch has been improved
> > since v1 to where it is now. Thanks for taking the feedback and steadily
> > improving things along the way.
> > 
> 
> Hi Harry,
> 
> Thank you very much for your helpful reviews and suggestions from v1 through v5.
> I really appreciate your patience and professionalism throughout the review process,
> and I have learned a lot from your feedback.

You're welcome!

> > I think this is getting pretty close to being ready for mainline,
> > with just one little thing to fix in the code.
> > 
> > Other reviewers/maintainers may also take a look and leave comments
> > when they get a chance.
> > 
> 
> I am also looking forward to any further comments or suggestions from
> other reviewers and maintainers.
> 
> > > Link: https://github.com/HSM6236/slub_bulk_test.git
> > > Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
> > > ---
> > 
> > If you think it's appropriate, please feel free to add:
> > Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>
> > 
> 
> Sure, I will add:
> 
>   Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>

Thanks, and

> Thanks again for your continued review and guidance.

no problem!

> > In case this was assisted by AI or other tools, please disclose that
> > according to the process document:
> > 
> >   https://docs.kernel.org/process/generated-content.html
> >   https://docs.kernel.org/process/coding-assistants.html
> > 
> > Not that I think this was assisted by AI, just mentioning because
> > sometimes people using tools to develop the kernel are not aware that
> > they need to disclose the fact. It wouldn't hurt to remind people :-)
> > 
> 
> Regarding AI disclosure: I only used an AI tool to polish the English wording
> of the commit message, since I am not fully confident in my English writing. :-)

Haha, that's fine, and I'm not a native English speaker either :)

> As I understand it, the documentation says that "spelling and grammar fix ups,
> like rephrasing to imperative voice" are out of scope, so I believe an
> Assisted-by tag is not needed in this case.

You are right. In that case it's not necessary.
Thanks for clarifying!

> > > Changes in v5:
> > > - Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
> > > - Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
> > > - Refine the commit message with more precise test coverage descriptions.
> > > - Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/
> > > 
> > > ---
> > >  mm/slab.h |  10 ++
> > >  mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
> > >  2 files changed, 147 insertions(+), 142 deletions(-)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 4927407c9699..9ff8af8c2f73 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > >           * corruption in theory could cause that.
> > >           * Leak memory of allocated slab.
> > >           */
> > > -        if (!allow_spin)
> > > -            spin_unlock_irqrestore(&n->list_lock, flags);
> > >          return NULL;
> > >      }
> > >  
> > > -    if (allow_spin)
> > > +    n = get_node(s, slab_nid(slab));
> > > +    if (allow_spin) {
> > >          spin_lock_irqsave(&n->list_lock, flags);
> > > +    } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> > > +        /*
> > > +         * Unlucky, discard newly allocated slab.
> > > +         * The slab is not fully free, but it's fine as
> > > +         * objects are not allocated to users.
> > > +         */
> > > +        free_new_slab_nolock(s, slab);
> > > +        return NULL;
> > > +    }
> > >  
> > > -    if (slab->inuse == slab->objects)
> > > -        add_full(s, n, slab);
> > > -    else
> > > +    if (needs_add_partial)
> > >          add_partial(n, slab, ADD_TO_HEAD);
> > > +    else
> > > +        add_full(s, n, slab);
> > >  
> > > -    inc_slabs_node(s, nid, slab->objects);
> > >      spin_unlock_irqrestore(&n->list_lock, flags);
> > >  
> > > +    inc_slabs_node(s, slab_nid(slab), slab->objects);
> > 
> > Ouch, I didn't catch this when it was added in v4. When slab debugging
> > feature is enabled for the cache, inc_slabs_node() should be done within
> > the spinlock to avoid race conditions with slab validation.
> > 
> > Perhaps it's worth adding a comment mentioning this :)
> > 
> > See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> > caches and make it safe") for more details.
> > 
> > With this fixed, please feel free to add:
> > Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> > 
> 
> You are right about the inc_slabs_node() placement. I missed that change when
> it was introduced in v4. Thank you very much for catching it.
> 
> After reading commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> caches and make it safe"), my understanding is that inc_slabs_node() should
> remain under n->list_lock for debug caches, so that validation cannot observe
> inconsistent state during list transitions. I will fix that in the next revision
> and add a comment along these lines.
> 
> Would a comment like the following look good? :-)
> 
> /*
>  * Debug caches require nr_slabs updates under n->list_lock so validation
>  * cannot race with list transitions and observe inconsistent state.
>  */

Mostly LGTM, but perhaps it's better to say
"validation cannot race with slab (de)allocations",
because it's not about transitions between lists,
but rather about slabs being added to or removed from the node
while the validation code is iterating over the partial and full lists.

Thanks!

-- 
Cheers,
 Harry / Hyeonggon


Thread overview: 5+ messages
2026-04-09 12:43 hu.shengming
2026-04-13  3:45 ` Harry Yoo (Oracle)
2026-04-13  5:14   ` hu.shengming
2026-04-13  5:27     ` Harry Yoo (Oracle) [this message]
2026-04-13  6:20       ` hu.shengming
