From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: hu.shengming@zte.com.cn
Cc: vbabka@kernel.org, akpm@linux-foundation.org, hao.li@linux.dev,
cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
zhang.run@zte.com.cn, xu.xin16@zte.com.cn,
yang.tao172@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH v3] mm/slub: defer freelist construction until after bulk allocation from a new slab
Date: Wed, 8 Apr 2026 10:38:32 +0900
Message-ID: <adWxmEzU23nza_pn@hyeyoo>
In-Reply-To: <20260407210216761qrDj8RR7pN-ycbvYmA69v@zte.com.cn>
On Tue, Apr 07, 2026 at 09:02:16PM +0800, hu.shengming@zte.com.cn wrote:
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index fb2c5c57bc4e..88537e577989 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > +#ifdef CONFIG_SLAB_FREELIST_RANDOM
> > > + iter->random = (slab->objects >= 2 && s->random_seq);
> > > + if (!iter->random)
> > > + return;
> > > +
> > > + iter->freelist_count = oo_objects(s->oo);
> > > + iter->page_limit = slab->objects * s->size;
> > > +
> > > + if (allow_spin) {
> > > + iter->pos = get_random_u32_below(iter->freelist_count);
> > > + } else {
> > > + struct rnd_state *state;
> > > +
> > > + /*
> > > + * An interrupt or NMI handler might interrupt and change
> > > + * the state in the middle, but that's safe.
> > > + */
> > > + state = &get_cpu_var(slab_rnd_state);
> > > + iter->pos = prandom_u32_state(state) % iter->freelist_count;
> > > + put_cpu_var(slab_rnd_state);
> > > + }
> > > +#endif
> > > +}
> > > static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > > void **p, unsigned int count, bool allow_spin)
> >
> > There is one problem with this change: ___slab_alloc() builds the
> > freelist before calling alloc_from_new_slab(), while refill_objects()
> > does not. For consistency, shall we allocate a new slab without building
> > the freelist in ___slab_alloc(), and build the freelist in
> > alloc_single_from_new_slab() and alloc_from_new_slab()?
> >
>
> Agreed. Also, new_slab() is currently used by both early_kmem_cache_node_alloc()
> and ___slab_alloc(), so I'll rework the early allocation path as well to
> keep the new-slab flow consistent.
Sounds fine to me.
> > > - slab->inuse++;
> > > allocated++;
> > > }
> > > - slab->freelist = object;
> > > + slab->inuse = target_inuse;
> > >
> > > if (needs_add_partial) {
> > > -
> > > + build_slab_freelist(s, slab, &iter);
> >
> > When allow_spin is true, it's building the freelist while holding the
> > spinlock, and that's not great.
> >
> > Hmm, can we do better?
> >
> > Perhaps just allocate object(s) from the slab and build the freelist
> > with the objects left (if exists), but free the slab if allow_spin
> > is false AND trylock fails, and accept the fact that the slab may not be
> > fully free when it's freed due to trylock failure?
> >
> > something like:
> >
> > alloc_from_new_slab() {
> >         needs_add_partial = (slab->objects > count);
> >         target_inuse = needs_add_partial ? count : slab->objects;
> >
> >         init_slab_obj_iter(s, slab, &iter, allow_spin);
> >         while (allocated < target_inuse) {
> >                 p[allocated] = next_slab_obj(s, &iter);
> >                 allocated++;
> >         }
> >         slab->inuse = target_inuse;
> >
> >         if (needs_add_partial) {
> >                 build_slab_freelist(s, slab, &iter);
> >                 n = get_node(s, slab_nid(slab));
> >                 if (allow_spin) {
> >                         spin_lock_irqsave(&n->list_lock, flags);
> >                 } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> >                         /*
> >                          * Unlucky, discard newly allocated slab.
> >                          * The slab is not fully free, but it's fine as
> >                          * objects are not allocated to users.
> >                          */
> >                         free_new_slab_nolock(s, slab);
> >                         return 0;
> >                 }
> >                 add_partial(n, slab, ADD_TO_HEAD);
> >                 spin_unlock_irqrestore(&n->list_lock, flags);
> >         }
> >         [...]
> > }
> >
> > And do something similar in alloc_single_from_new_slab() as well.
> >
>
> Good point. I'll restructure the path so objects are emitted first, the leftover
> freelist is built only if needed, and the slab is added to partial afterwards. For
> the !allow_spin trylock failure case, I'll discard the new slab and return 0. I'll
> do the same for the single-object path as well.
Thanks. Please keep in mind that you need to enable SLUB_TINY or slab_debug
to test the alloc_single_from_new_slab() path.
> > > if (allow_spin) {
> > > n = get_node(s, slab_nid(slab));
> > > spin_lock_irqsave(&n->list_lock, flags);
> > > @@ -7244,16 +7265,15 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
> > >
> > > new_slab:
> > >
> > > - slab = new_slab(s, gfp, local_node);
> > > + slab_gfp = prepare_slab_alloc_flags(s, gfp);
> >
> > Could we do `flags = prepare_slab_alloc_flags(s, flags);`
> > within allocate_slab()? Having both gfp and slab_gfp flags is distracting.
> > The value of allow_spin should not change after
> > prepare_slab_alloc_flags() anyway.
> >
>
> Agreed. I'll move the prepare_slab_alloc_flags() handling into allocate_slab()
> so the call sites stay simpler, and keep the iterator/freelist construction
> local to alloc_single_from_new_slab() and alloc_from_new_slab().
Thanks.
> I also have a question about the allow_spin semantics in the refill_objects path.
>
> Now that init_slab_obj_iter() has been moved into alloc_from_new_slab(..., true),
> allow_spin on this path appears to be unconditionally set to true.
>
> Previously, shuffle_freelist() received allow_spin = gfpflags_allow_spinning(gfp),
> so I wanted to check whether moving init_slab_obj_iter() into alloc_from_new_slab()
> changes the intended semantics here.
>
> My current understanding is that this is still fine, because refill_objects()
> is guaranteed to run only when spinning is allowed. Is that correct?
That's correct and it's fine.
> > > + allow_spin = gfpflags_allow_spinning(slab_gfp);
> > > +
> > > + slab = allocate_slab(s, slab_gfp, local_node, allow_spin);
> > > if (!slab)
> > > goto out;
> > >
> > > stat(s, ALLOC_SLAB);
> > >
> > > - /*
> > > - * TODO: possible optimization - if we know we will consume the whole
> > > - * slab we might skip creating the freelist?
> > > - */
> > > refilled += alloc_from_new_slab(s, slab, p + refilled, max - refilled,
> > > /* allow_spin = */ true);
> > >
> > > --
> > > 2.25.1
> >
> > --
> > Cheers,
> > Harry / Hyeonggon
>
> Thanks again. I will fold these changes into the next revision.
Thanks!
> --
> With Best Regards,
> Shengming
--
Cheers,
Harry / Hyeonggon