Re: [PATCH v2] mm/slub: skip freelist construction for whole-slab bulk refill

From: <hu.shengming@zte.com.cn>
To: <harry@kernel.org>
Cc: <hao.li@linux.dev>, <vbabka@kernel.org>,
	<akpm@linux-foundation.org>, <cl@gentwo.org>,
	<rientjes@google.com>, <roman.gushchin@linux.dev>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<zhang.run@zte.com.cn>, <xu.xin16@zte.com.cn>,
	<yang.tao172@zte.com.cn>, <yang.yang29@zte.com.cn>
Subject: Re: [PATCH v2] mm/slub: skip freelist construction for whole-slab bulk refill
Date: Thu, 2 Apr 2026 15:03:10 +0800 (CST)
Message-ID: <20260402150310775aOAcX92pJLmjcUIRoWFER@zte.com.cn>
In-Reply-To: <ac32ZQMxSSZ2VsNY@hyeyoo>

Harry wrote:
> On Wed, Apr 01, 2026 at 02:55:23PM +0800, Hao Li wrote:
> > On Wed, Apr 01, 2026 at 12:57:25PM +0800, hu.shengming@zte.com.cn wrote:
> > > @@ -4395,6 +4458,48 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > >      return allocated;
> > >  }
> > >  
> > > +static unsigned int alloc_whole_from_new_slab(struct kmem_cache *s,
> > > +        struct slab *slab, void **p, bool allow_spin)
> > > +{
> > > +
> > > +    unsigned int allocated = 0;
> > > +    void *object, *start;
> > > +
> > > +    if (alloc_whole_from_new_slab_random(s, slab, p, allow_spin,
> > > +                         &allocated)) {
> > > +        goto done;
> > > +    }
> > > +
> > > +    start = fixup_red_left(s, slab_address(slab));
> > > +    object = setup_object(s, start);
> > > +
> > > +    while (allocated < slab->objects - 1) {
> > > +        p[allocated] = object;
> > > +        maybe_wipe_obj_freeptr(s, object);
> > > +
> > > +        allocated++;
> > > +        object += s->size;
> > > +        object = setup_object(s, object);
> > > +    }
> > 
> > Also, I feel the current patch contains some duplicated code like this loop.
> > 
> > Would it make sense to split allocate_slab() into two functions?
> > 
> > For example,
> > the first part could be called allocate_slab_meta_setup() (just an example name)
> > And, the second part could be allocate_slab_objects_setup(), with the core logic
> > being the loop over objects. Then allocate_slab_objects_setup() could support
> > two modes: one called BUILD_FREELIST, which builds the freelist, and another
> > called EMIT_OBJECTS, which skips building the freelist and directly places the
> > objects into the target array.
>
> Something similar, but with a little more thought put into unifying the
> code (**regardless of CONFIG_SLAB_FREELIST_RANDOM**) and avoiding treating
> "the whole slab->freelist fits into the sheaf" as a special case:
>
> - allocate_slab() no longer builds the freelist.
>   The freelist is built only when there are objects left after
>   allocating objects from the new slab.
>
> - new_slab() allocates a new slab AND builds the freelist
>   to keep existing behaviour.
>
> - refill_objects() allocates a slab using allocate_slab(),
>   and passes it to alloc_from_new_slab().
>
>   alloc_from_new_slab() consumes some objects in random order,
>   and then builds the freelist with the objects left (if any).
>
> We could actually abstract "iterating free objects in random order"
> into an API, and there would be two users of the API:
> - Building freelist
> - Filling objects into the sheaf (without building freelist!)
>
> Something like this...
> (names here are just examples, I'm not good at naming things!)
>
> struct freelist_iter {
>     int pos;
>     int freelist_count;
>     int page_limit;
>     void *start;
> };
>
> /* note: handling !allow_spin nicely is tricky :-) */
> alloc_from_new_slab(...) {
>     struct freelist_iter fit;
>
>     prep_freelist_iter(s, slab, &fit, allow_spin);
>     while (slab->inuse < min(count, slab->objects)) {
>         p[slab->inuse++] = next_freelist_entry(s, &fit);
>     }
>
>     if (slab->inuse < slab->objects)
>         build_freelist(s, slab, &fit);
> }
>
> build_freelist(s, slab, fit) {
>     size = slab->objects - slab->inuse;
>
>     cur = next_freelist_entry(s, fit);
>     cur = setup_object(s, cur);
>     slab->freelist = cur;
>     for (i = 1; i < size; i++) {
>         next = next_freelist_entry(s, fit);
>         next = setup_object(s, next);
>         set_freepointer(s, cur, next);
>         cur = next;
>     }
> }
>
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> prep_freelist_iter(s, slab, fit, allow_spin) {
>     fit->freelist_count = oo_objects(s->oo);
>     fit->page_limit = slab->objects * s->size;
>     fit->start = fixup_red_left(s, slab_address(slab));
>
>     if (slab->objects < 2 || !s->random_seq) {
>         fit->pos = 0;
>     } else if (allow_spin) {
>         fit->pos = get_random_u32_below(fit->freelist_count);
>     } else {
>         struct rnd_state *state;
>
>         /*
>          * An interrupt or NMI handler might interrupt and change
>          * the state in the middle, but that's safe.
>          */
>         state = &get_cpu_var(slab_rnd_state);
>         fit->pos = prandom_u32_state(state) % fit->freelist_count;
>         put_cpu_var(slab_rnd_state);
>     }
>
>     return;
> }
> next_freelist_entry(s, fit) {
>     /*
>      * If the target page allocation failed, the number of objects on the
>      * page might be smaller than the usual size defined by the cache.
>      */
>     do {
>         idx = s->random_seq[fit->pos];
>         fit->pos += 1;
>         if (fit->pos >= fit->freelist_count)
>             fit->pos = 0;
>     } while (unlikely(idx >= fit->page_limit));
>
>     return (char *)fit->start + idx;
> }
> #else
> prep_freelist_iter(s, slab, fit, allow_spin) {
>     fit->pos = 0;
>     fit->start = fixup_red_left(s, slab_address(slab));
>     return;
> }
> next_freelist_entry(s, fit) {
>     void *next = fit->start + fit->pos * s->size;
>
>     fit->pos++;
>     return next;
> }
> #endif

Hi Harry,

Thanks a lot for the detailed suggestion. This is a very good direction for
restructuring refill_objects().

I agree that abstracting the free-object iteration and making the flow uniform
regardless of CONFIG_SLAB_FREELIST_RANDOM is a cleaner approach than keeping
the “whole slab fits into the sheaf” case as a special path. Your idea of letting
alloc_from_new_slab() consume objects first and only build the freelist for the
remainder makes a lot of sense, and should also help reduce the duplicated
object-setup logic.
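
To make sure I read the flow correctly, here is a rough user-space sketch of
"emit objects first, build the freelist only for the remainder". It is only a
toy model and not kernel code: every toy_* name, OBJ_SIZE and NR_OBJS are
made up for illustration, the iterator walks objects sequentially (the
!CONFIG_SLAB_FREELIST_RANDOM case), and setup_object(), red zones and
allow_spin handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 32	/* hypothetical object size, stands in for s->size */
#define NR_OBJS  8	/* hypothetical number of objects per slab */

/* Toy stand-in for struct slab; not the kernel layout. */
struct toy_slab {
	unsigned char mem[NR_OBJS * OBJ_SIZE];
	unsigned int objects;
	unsigned int inuse;
	void *freelist;
};

/* Toy stand-in for the proposed struct freelist_iter (sequential only). */
struct toy_iter {
	unsigned int pos;
	unsigned char *start;
};

/* Return the next not-yet-visited object and advance the iterator. */
static void *toy_next_entry(struct toy_iter *it)
{
	void *obj = it->start + (size_t)it->pos * OBJ_SIZE;

	it->pos++;
	return obj;
}

/* The free pointer lives at offset 0 of a free object in this model. */
static void toy_set_freepointer(void *obj, void *next)
{
	memcpy(obj, &next, sizeof(next));
}

static void *toy_get_freepointer(const void *obj)
{
	void *next;

	memcpy(&next, obj, sizeof(next));
	return next;
}

/* Link every object the iterator has not handed out yet. */
static void toy_build_freelist(struct toy_slab *slab, struct toy_iter *it)
{
	unsigned int left = slab->objects - slab->inuse;
	void *cur = toy_next_entry(it);

	slab->freelist = cur;
	for (unsigned int i = 1; i < left; i++) {
		void *next = toy_next_entry(it);

		toy_set_freepointer(cur, next);
		cur = next;
	}
	toy_set_freepointer(cur, NULL);	/* terminate the list */
}

/* Emit up to count objects into p[], then link only what is left over. */
static unsigned int toy_alloc_from_new_slab(struct toy_slab *slab, void **p,
					    unsigned int count)
{
	struct toy_iter it = { .pos = 0, .start = slab->mem };
	unsigned int want = count < slab->objects ? count : slab->objects;

	slab->inuse = 0;
	slab->freelist = NULL;
	while (slab->inuse < want)
		p[slab->inuse++] = toy_next_entry(&it);

	if (slab->inuse < slab->objects)
		toy_build_freelist(slab, &it);

	return slab->inuse;
}
```

In this model, asking for 3 objects from an 8-object slab fills p[0..2] and
links the remaining 5 objects, while asking for 8 or more leaves
slab->freelist NULL, so the whole-slab case needs no special path.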

I’ll rework the patch along these lines, incorporating your and Hao's
suggestions, and send a v3.

Thanks again for the thoughtful review.

--
With Best Regards,
Shengming

> -- 
> Cheers,
> Harry / Hyeonggon
