From: Hao Li <hao.li@linux.dev>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Harry Yoo <harry@kernel.org>, Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
hu.shengming@zte.com.cn,
Vinicius Costa Gomes <vinicius.gomes@intel.com>
Subject: Re: [PATCH RFC] mm, slab: add an optimistic __slab_try_return_freelist()
Date: Wed, 22 Apr 2026 15:09:56 +0800 [thread overview]
Message-ID: <qfxtirtpoxcuzhnohm7cbo62z3ob3tiknlss6xenjlrjossmpu@pp4wwry3jad4> (raw)
In-Reply-To: <20260421-b4-refill-optimistic-return-v1-1-24f0bfc1acff@kernel.org>
On Tue, Apr 21, 2026 at 04:49:52PM +0200, Vlastimil Babka (SUSE) wrote:
> When we end up returning extraneous objects during refill to a slab
> where we just did a get_freelist_nofreeze(), it is likely no other CPU
> has freed objects to it meanwhile. We can then reattach the remainder of
> the freelist without having to walk the (potentially cache cold)
> freelist to find its tail to connect slab->freelist to it.
This approach is clever!
>
> Add a __slab_try_return_freelist() function that does that. It's a bit
> like __slab_free() but many situations from there cannot occur in this
> specific scenario so it's simpler.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> ---
> I've got this idea during the discussions on refill spilling, but not
> sure if I've mentioned it on-list. Anyway it seems like there should be
> no downsides (in theory...) so please test if it indeed improves things
> anywhere and then it could be a better baseline before trying anything
> that comes with tradeoffs?
>
> I've included SLUB_STAT items that show to me the optimistic path is
> indeed successful, but maybe they don't need to make it in final version
> due to being rather low-level.
>
> Git version here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/refill-optimistic-return
>
> It's Linus' upstream from a few days ago, plus "mm/slub: defer freelist
> construction until after bulk allocation from a new slab", which is
> heading to 7.2 and so should also be considered part of the baseline.
> ---
> mm/slub.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 67 insertions(+), 9 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 35b6cd0efc3b..176bc4936d03 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -373,6 +373,8 @@ enum stat_item {
> SHEAF_PREFILL_OVERSIZE, /* Allocation of oversize sheaf for prefill */
> SHEAF_RETURN_FAST, /* Sheaf return reattached spare sheaf */
> SHEAF_RETURN_SLOW, /* Sheaf return could not reattach spare */
> + REFILL_RETURN_FAST,
> + REFILL_RETURN_SLOW,
> NR_SLUB_STAT_ITEMS
> };
>
> @@ -4323,7 +4325,8 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
> * Assumes this is performed only for caches without debugging so we
> * don't need to worry about adding the slab to the full list.
> */
> -static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab)
> +static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab,
> + unsigned int *count)
> {
> struct freelist_counters old, new;
>
> @@ -4339,6 +4342,7 @@ static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *sla
>
> } while (!slab_update_freelist(s, slab, &old, &new, "get_freelist_nofreeze"));
>
> + *count = old.objects - old.inuse;
> return old.freelist;
> }
>
> @@ -5502,6 +5506,50 @@ static noinline void free_to_partial_list(
> }
> }
>
> +/*
> + * Try returning a (part of) freelist that we just detached from the slab.
> + * Optimistically assume the slab is still full so we don't need to know
> + * the tail of the freelist. Return to the partial list unconditionally even
> + * if it became empty.
> + *
> + * Fail if the slab isn't full anymore due to a concurrent free.
> + *
> + * Can only be used for non-debug caches.
> + */
> +static bool __slab_try_return_freelist(struct kmem_cache *s, struct slab *slab,
> + void *head, int cnt)
> +{
> + struct freelist_counters old, new;
> + struct kmem_cache_node *n = NULL;
> + unsigned long flags;
> +
> + old.freelist = slab->freelist;
> + old.counters = slab->counters;
> +
> + if (old.freelist)
> + return false;
> +
> + new.freelist = head;
> + new.counters = old.counters;
> + new.inuse -= cnt;
> +
> + n = get_node(s, slab_nid(slab));
> +
> + spin_lock_irqsave(&n->list_lock, flags);
> +
> + if (!slab_update_freelist(s, slab, &old, &new, "__slab_try_return_freelist")) {
> + spin_unlock_irqrestore(&n->list_lock, flags);
> + return false;
> + }
> +
> + add_partial(n, slab, ADD_TO_HEAD);
> + stat(s, FREE_ADD_PARTIAL);
> +
> + spin_unlock_irqrestore(&n->list_lock, flags);
I was just brainstorming a bit here: what if we only tried
slab_update_freelist() without grabbing the lock or touching the partial
list at all? Instead, we could toss the slab right back into pc.slabs and
let the downstream logic for handling leftover slabs take care of it
together with the rest. That would save us a whole lock/unlock pair.
> + stat(s, REFILL_RETURN_FAST);
> + return true;
> +}
> +
> /*
> * Slow path handling. This may still be called frequently since objects
> * have a longer lifetime than the cpu slabs in most processing loads.
> @@ -7113,34 +7161,40 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
>
> list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
>
> + unsigned int count;
> +
> list_del(&slab->slab_list);
>
> - object = get_freelist_nofreeze(s, slab);
> + object = get_freelist_nofreeze(s, slab, &count);
>
> - while (object && refilled < max) {
> + while (count && refilled < max) {
> p[refilled] = object;
> object = get_freepointer(s, object);
> maybe_wipe_obj_freeptr(s, p[refilled]);
>
> refilled++;
> + count--;
> }
>
> /*
> * Freelist had more objects than we can accommodate, we need to
> - * free them back. We can treat it like a detached freelist, just
> - * need to find the tail object.
> - * free them back. We can treat it like a detached freelist, just
> - * need to find the tail object.
> + * free them back. First we try to be optimistic and assume the
> + * slab is still full since we just detached its freelist.
> + * Otherwise we need to find the tail object.
> */
> - if (unlikely(object)) {
> + if (unlikely(count)) {
> void *head = object;
> void *tail;
> - int cnt = 0;
> +
> + if (__slab_try_return_freelist(s, slab, head, count))
> + break;
>
> do {
> tail = object;
> - cnt++;
> object = get_freepointer(s, object);
> } while (object);
> - __slab_free(s, slab, head, tail, cnt, _RET_IP_);
> + __slab_free(s, slab, head, tail, count, _RET_IP_);
> + stat(s, REFILL_RETURN_SLOW);
> }
>
> if (refilled >= max)
> @@ -9366,6 +9420,8 @@ STAT_ATTR(SHEAF_PREFILL_SLOW, sheaf_prefill_slow);
> STAT_ATTR(SHEAF_PREFILL_OVERSIZE, sheaf_prefill_oversize);
> STAT_ATTR(SHEAF_RETURN_FAST, sheaf_return_fast);
> STAT_ATTR(SHEAF_RETURN_SLOW, sheaf_return_slow);
> +STAT_ATTR(REFILL_RETURN_FAST, refill_return_fast);
> +STAT_ATTR(REFILL_RETURN_SLOW, refill_return_slow);
> #endif /* CONFIG_SLUB_STATS */
>
> #ifdef CONFIG_KFENCE
> @@ -9454,6 +9510,8 @@ static const struct attribute *const slab_attrs[] = {
> &sheaf_prefill_oversize_attr.attr,
> &sheaf_return_fast_attr.attr,
> &sheaf_return_slow_attr.attr,
> + &refill_return_fast_attr.attr,
> + &refill_return_slow_attr.attr,
> #endif
> #ifdef CONFIG_FAILSLAB
> &failslab_attr.attr,
>
> ---
> base-commit: da993c58f9bde299a5e1a4e7900125b32dccd2a6
> change-id: 20260421-b4-refill-optimistic-return-f44d3b74cc49
>
> Best regards,
> --
> Vlastimil Babka (SUSE) <vbabka@kernel.org>
>
--
Thanks,
Hao