From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: hu.shengming@zte.com.cn
Cc: vbabka@kernel.org, akpm@linux-foundation.org, hao.li@linux.dev,
cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
zhang.run@zte.com.cn, xu.xin16@zte.com.cn,
yang.tao172@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH v3] mm/slub: defer freelist construction until after bulk allocation from a new slab
Date: Wed, 8 Apr 2026 10:38:32 +0900
Message-ID: <adWxmEzU23nza_pn@hyeyoo>
In-Reply-To: <20260407210216761qrDj8RR7pN-ycbvYmA69v@zte.com.cn>
On Tue, Apr 07, 2026 at 09:02:16PM +0800, hu.shengming@zte.com.cn wrote:
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index fb2c5c57bc4e..88537e577989 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > +#ifdef CONFIG_SLAB_FREELIST_RANDOM
> > > + iter->random = (slab->objects >= 2 && s->random_seq);
> > > + if (!iter->random)
> > > + return;
> > > +
> > > + iter->freelist_count = oo_objects(s->oo);
> > > + iter->page_limit = slab->objects * s->size;
> > > +
> > > + if (allow_spin) {
> > > + iter->pos = get_random_u32_below(iter->freelist_count);
> > > + } else {
> > > + struct rnd_state *state;
> > > +
> > > + /*
> > > + * An interrupt or NMI handler might interrupt and change
> > > + * the state in the middle, but that's safe.
> > > + */
> > > + state = &get_cpu_var(slab_rnd_state);
> > > + iter->pos = prandom_u32_state(state) % iter->freelist_count;
> > > + put_cpu_var(slab_rnd_state);
> > > + }
> > > +#endif
> > > +}
> > > static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > > void **p, unsigned int count, bool allow_spin)
> >
> > There is one problem with this change: ___slab_alloc() builds the
> > freelist before calling alloc_from_new_slab(), while refill_objects()
> > does not. For consistency, shall we allocate a new slab without building
> > the freelist in ___slab_alloc(), and build the freelist in
> > alloc_single_from_new_slab() and alloc_from_new_slab()?
> >
>
> Agreed. Also, new_slab() is currently used by both early_kmem_cache_node_alloc()
> and ___slab_alloc(), so I'll rework the early allocation path as well to
> keep the new-slab flow consistent.
Sounds fine to me.
> > > - slab->inuse++;
> > > allocated++;
> > > }
> > > - slab->freelist = object;
> > > + slab->inuse = target_inuse;
> > >
> > > if (needs_add_partial) {
> > > -
> > > + build_slab_freelist(s, slab, &iter);
> >
> > When allow_spin is true, it's building the freelist while holding the
> > spinlock, and that's not great.
> >
> > Hmm, can we do better?
> >
> > Perhaps just allocate object(s) from the slab and build the freelist
> > with the objects left (if exists), but free the slab if allow_spin
> > is false AND trylock fails, and accept the fact that the slab may not be
> > fully free when it's freed due to trylock failure?
> >
> > something like:
> >
> > alloc_from_new_slab() {
> >         needs_add_partial = (slab->objects > count);
> >         target_inuse = needs_add_partial ? count : slab->objects;
> >
> >         init_slab_obj_iter(s, slab, &iter, allow_spin);
> >         while (allocated < target_inuse) {
> >                 p[allocated] = next_slab_obj(s, &iter);
> >                 allocated++;
> >         }
> >         slab->inuse = target_inuse;
> >
> >         if (needs_add_partial) {
> >                 build_slab_freelist(s, slab, &iter);
> >                 n = get_node(s, slab_nid(slab));
> >                 if (allow_spin) {
> >                         spin_lock_irqsave(&n->list_lock, flags);
> >                 } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> >                         /*
> >                          * Unlucky, discard newly allocated slab.
> >                          * The slab is not fully free, but it's fine as
> >                          * objects are not allocated to users.
> >                          */
> >                         free_new_slab_nolock(s, slab);
> >                         return 0;
> >                 }
> >                 add_partial(n, slab, ADD_TO_HEAD);
> >                 spin_unlock_irqrestore(&n->list_lock, flags);
> >         }
> >         [...]
> > }
> >
> > And do something similar in alloc_single_from_new_slab() as well.
> >
>
> Good point. I'll restructure the path so objects are emitted first, the leftover
> freelist is built only if needed, and the slab is added to partial afterwards. For
> the !allow_spin trylock failure case, I'll discard the new slab and return 0. I'll
> do the same for the single-object path as well.
Thanks. Please keep in mind that you need to enable SLUB_TINY or slab_debug
to test the alloc_single_from_new_slab() path.
> > > if (allow_spin) {
> > > n = get_node(s, slab_nid(slab));
> > > spin_lock_irqsave(&n->list_lock, flags);
> > > @@ -7244,16 +7265,15 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
> > >
> > > new_slab:
> > >
> > > - slab = new_slab(s, gfp, local_node);
> > > + slab_gfp = prepare_slab_alloc_flags(s, gfp);
> >
> > Could we do `flags = prepare_slab_alloc_flags(s, flags);`
> > within allocate_slab()? Having both gfp and slab_gfp flags is distracting.
> > The value of allow_spin should not change after
> > prepare_slab_alloc_flags() anyway.
> >
>
> Agreed. I'll move the prepare_slab_alloc_flags() handling into allocate_slab()
> so the call sites stay simpler, and keep the iterator/freelist construction
> local to alloc_single_from_new_slab() and alloc_from_new_slab().
Thanks.
> I also have a question about the allow_spin semantics in the refill_objects path.
>
> Now that init_slab_obj_iter() has been moved into alloc_from_new_slab(..., true),
> allow_spin on this path appears to be unconditionally set to true.
>
> Previously, shuffle_freelist() received allow_spin = gfpflags_allow_spinning(gfp),
> so I wanted to check whether moving init_slab_obj_iter() into alloc_from_new_slab()
> changes the intended semantics here.
>
> My current understanding is that this is still fine, because refill_objects()
> is guaranteed to run only when spinning is allowed. Is that correct?
That's correct and it's fine.
> > > + allow_spin = gfpflags_allow_spinning(slab_gfp);
> > > +
> > > + slab = allocate_slab(s, slab_gfp, local_node, allow_spin);
> > > if (!slab)
> > > goto out;
> > >
> > > stat(s, ALLOC_SLAB);
> > >
> > > - /*
> > > - * TODO: possible optimization - if we know we will consume the whole
> > > - * slab we might skip creating the freelist?
> > > - */
> > > refilled += alloc_from_new_slab(s, slab, p + refilled, max - refilled,
> > > /* allow_spin = */ true);
> > >
> > > --
> > > 2.25.1
> >
> > --
> > Cheers,
> > Harry / Hyeonggon
>
> Thanks again. I will fold these changes into the next revision.
Thanks!
> --
> With Best Regards,
> Shengming
--
Cheers,
Harry / Hyeonggon