linux-mm.kvack.org archive mirror
From: Hao Li <hao.li@linux.dev>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: hu.shengming@zte.com.cn, vbabka@kernel.org,
	akpm@linux-foundation.org,  cl@gentwo.org, rientjes@google.com,
	roman.gushchin@linux.dev,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, zhang.run@zte.com.cn,
	 xu.xin16@zte.com.cn, yang.tao172@zte.com.cn,
	yang.yang29@zte.com.cn
Subject: Re: [PATCH v2] mm/slub: skip freelist construction for whole-slab bulk refill
Date: Thu, 2 Apr 2026 17:00:26 +0800	[thread overview]
Message-ID: <k2cytuzrmrtzxd36faxp2apfpzrctzuthaifhifitilbddsczu@lfv4hvsien6d> (raw)
In-Reply-To: <ac4k5yhq3zok6m1u@hyeyoo>

On Thu, Apr 02, 2026 at 05:12:23PM +0900, Harry Yoo (Oracle) wrote:
> On Thu, Apr 02, 2026 at 03:03:10PM +0800, hu.shengming@zte.com.cn wrote:
> > Harry wrote:
> > > On Wed, Apr 01, 2026 at 02:55:23PM +0800, Hao Li wrote:
> > > > On Wed, Apr 01, 2026 at 12:57:25PM +0800, hu.shengming@zte.com.cn wrote:
> > > > > @@ -4395,6 +4458,48 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > > > >      return allocated;
> > > > >  }
> > > > > +static unsigned int alloc_whole_from_new_slab(struct kmem_cache *s,
> > > > > +        struct slab *slab, void **p, bool allow_spin)
> > > > > +{
> > > > > +
> > > > > +    unsigned int allocated = 0;
> > > > > +    void *object, *start;
> > > > > +
> > > > > +    if (alloc_whole_from_new_slab_random(s, slab, p, allow_spin,
> > > > > +                         &allocated)) {
> > > > > +        goto done;
> > > > > +    }
> > > > > +
> > > > > +    start = fixup_red_left(s, slab_address(slab));
> > > > > +    object = setup_object(s, start);
> > > > > +
> > > > > +    while (allocated < slab->objects - 1) {
> > > > > +        p[allocated] = object;
> > > > > +        maybe_wipe_obj_freeptr(s, object);
> > > > > +
> > > > > +        allocated++;
> > > > > +        object += s->size;
> > > > > +        object = setup_object(s, object);
> > > > > +    }
> > > > 
> > > > Also, I feel the current patch contains some duplicated code like this loop.
> > > > 
> > > > Would it make sense to split allocate_slab() into two functions?
> > > > 
> > > > For example,
> > > > the first part could be called allocate_slab_meta_setup() (just an example name)
> > > > And, the second part could be allocate_slab_objects_setup(), with the core logic
> > > > being the loop over objects. Then allocate_slab_objects_setup() could support
> > > > two modes: one called BUILD_FREELIST, which builds the freelist, and another
> > > > called EMIT_OBJECTS, which skips building the freelist and directly places the
> > > > objects into the target array.
> > 
> > > Something similar but with a little bit more thought to unify the code
> > > (**regardless of CONFIG_SLAB_FREELIST_RANDOM**) and avoid treating
> > > "the whole slab->freelist fits into the sheaf" as a special case:
> > >
> > > - allocate_slab() no longer builds the freelist.
> > >   The freelist is built only when there are objects left after
> > >   allocating objects from the new slab.
> > >
> > > - new_slab() allocates a new slab AND builds the freelist
> > >   to keep existing behaviour.
> > >
> > > - refill_objects() allocates a slab using allocate_slab(),
> > >   and passes it to alloc_from_new_slab().
> > >
> > >   alloc_from_new_slab() consumes some objects in random order,
> > >   and then builds the freelist with the objects left (if any).
> > >
> > > We could actually abstract "iterating free objects in random order"
> > > into an API, and there would be two users of the API:
> > > - Building the freelist
> > > - Filling objects into the sheaf (without building the freelist!)
> > >
> > > Something like this...
> > > (names here are just examples, I'm not good at naming things!)
> > >
> > > struct freelist_iter {
> > >     int pos;
> > >     int freelist_count;
> > >     int page_limit;
> > >     void *start;
> > > };
> > >
> > > /* note: handling !allow_spin nicely is tricky :-) */
> > > alloc_from_new_slab(...) {
> > >     struct freelist_iter fit;
> > >
> > >     prep_freelist_iter(s, slab, &fit, allow_spin);
> > >     while (slab->inuse < min(count, slab->objects)) {
> > >         p[slab->inuse++] = next_freelist_entry(s, &fit);
> > >     }
> > >
> > >     if (slab->inuse < slab->objects)
> > >         build_freelist(s, slab, &fit);
> > > }
> > >
> > > build_freelist(s, slab, fit) {
> > >     size = slab->objects - slab->inuse;
> > >
> > >     cur = next_freelist_entry(s, fit);
> > >     cur = setup_object(s, cur);
> > >     slab->freelist = cur;
> > >     for (i = 1; i < size; i++) {
> > >         next = next_freelist_entry(s, fit);
> > >         next = setup_object(s, next);
> > >         set_freepointer(s, cur, next);
> > >         cur = next;
> > >     }
> > >     set_freepointer(s, cur, NULL);
> > > }
> > >
> > > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > > prep_freelist_iter(s, slab, fit, allow_spin) {
> > >     fit->freelist_count = oo_objects(s->oo);
> > >     fit->page_limit = slab->objects * s->size;
> > >     fit->start = fixup_red_left(s, slab_address(slab));
> > >
> > >     if (slab->objects < 2 || !s->random_seq) {
> > >         fit->pos = 0;
> > >     } else if (allow_spin) {
> > >         fit->pos = get_random_u32_below(fit->freelist_count);
> > >     } else {
> > >         struct rnd_state *state;
> > >
> > >         /*
> > >          * An interrupt or NMI handler might interrupt and change
> > >          * the state in the middle, but that's safe.
> > >          */
> > >         state = &get_cpu_var(slab_rnd_state);
> > >         fit->pos = prandom_u32_state(state) % fit->freelist_count;
> > >         put_cpu_var(slab_rnd_state);
> > >     }
> > > }
> > >
> > > next_freelist_entry(s, fit) {
> > >     /*
> > >      * If the target page allocation failed, the number of objects on the
> > >      * page might be smaller than the usual size defined by the cache.
> > >      */
> > >     do {
> > >         idx = s->random_seq[fit->pos];
> > >         fit->pos += 1;
> > >         if (fit->pos >= fit->freelist_count)
> > >             fit->pos = 0;
> > >     } while (unlikely(idx >= fit->page_limit));
> > >
> > >     return (char *)fit->start + idx;
> > > }
> > > #else
> > > prep_freelist_iter(s, slab, fit, allow_spin) {
> > >     fit->pos = 0;
> > >     fit->start = fixup_red_left(s, slab_address(slab));
> > > }
> > >
> > > next_freelist_entry(s, fit) {
> > >     void *next = fit->start + fit->pos * s->size;
> > >
> > >     fit->pos++;
> > >     return next;
> > > }
> > > #endif
> > 
> > Hi Harry,
> > 
> > Thanks a lot for the detailed suggestion. This is a very good direction for
> > restructuring refill_objects().
> > 
> > I agree that abstracting the free-object iteration and making the flow uniform
> > regardless of CONFIG_SLAB_FREELIST_RANDOM is a cleaner approach than keeping
> > the “whole slab fits into the sheaf” case as a special path.
> >
> > Your idea of letting alloc_from_new_slab() consume objects first and only
> > build the freelist for the remainder makes a lot of sense,
> 

Hi Harry and Shengming,

I just finished working through the new code skeleton, and the idea looks
really promising. It unifies the handling of the shuffle and non-shuffle
cases, and it also naturally covers the scenario where a new slab is not
fully consumed. Let's see how v3 performs.

> I believe Hao is working on trying to allow consuming all objects (from new
> and partial slabs) to fill sheaves when possible,
>

Yeah, I've had some work going on in that area too, and I'm trying to keep it
from stepping on the changes in the current patch :)

> but it'd still be nice to
> do this as long as it keeps the implementation simple.

Yes, exactly.

> 
> > and should also help reduce the duplicated object-setup logic.
> 
> Yeah, less code means fewer tokens, so better for the environm... oh wait,
> April Fools' Day is over! (just joking).

Oh, I didn't even realize yesterday was April Fools' Day. I guess my clock must
be broken :P

>  
> > I’ll rework the patch along these lines, incorporating your and Hao's suggestions,
> > and send a v3.
> 
> Thanks for working on this.
> 
> > Thanks again for the thoughtful review.
> 
> No problem!
> 
> -- 
> Cheers,
> Harry / Hyeonggon

-- 
Thanks,
Hao


Thread overview: 8+ messages
2026-04-01  4:57 hu.shengming
2026-04-01  6:55 ` Hao Li
2026-04-01 13:56   ` hu.shengming
2026-04-02  9:07     ` Hao Li
2026-04-02  4:53   ` Harry Yoo (Oracle)
2026-04-02  7:03     ` hu.shengming
2026-04-02  8:12       ` Harry Yoo (Oracle)
2026-04-02  9:00         ` Hao Li [this message]
