From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Hao Li <hao.li@linux.dev>,
vbabka@kernel.org, harry@kernel.org, akpm@linux-foundation.org
Cc: cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Hao Li <hao.li@linux.dev>
Subject: Re: [RFC PATCH] slub: spill refill leftover objects into percpu sheaves
Date: Wed, 15 Apr 2026 13:55:54 -0700 [thread overview]
Message-ID: <87a4v47xk5.fsf@intel.com> (raw)
In-Reply-To: <20260410112202.142597-1-hao.li@linux.dev>
Hao Li <hao.li@linux.dev> writes:
> When performing an object refill, we optimistically assume that more
> allocation requests will follow; this is the fundamental assumption
> behind this optimization.
>
> When __refill_objects_node() isolates a partial slab and satisfies a
> bulk allocation from its freelist, the slab can still have a small tail
> of free objects left over. Today those objects are freed back to the
> slab immediately.
>
> If the leftover tail is local and small enough to fit, keep it in the
> current CPU's sheaves instead. This avoids pushing those objects back
> through the __slab_free slowpath.
>
> Add a helper to obtain both the freelist and its free-object count, and
> then spill the remaining objects into a percpu sheaf when:
> - the tail fits in a sheaf
> - the slab is local to the current CPU
> - the slab is not pfmemalloc
> - the target sheaf has enough free space
>
> Otherwise keep the existing fallback and free the tail back to the slab.
>
> Also add a SHEAF_SPILL stat so the new path can be observed in SLUB
> stats.
>
> On the mmap2 case in the will-it-scale benchmark suite, this patch
> improves performance by about 2-5%.
>
> Signed-off-by: Hao Li <hao.li@linux.dev>
> ---
>
> This patch is an exploratory attempt to address the leftover-object and
> partial-slab issues in the refill path; it is marked as RFC, and any
> feedback, suggestions, and discussion are warmly welcome!
>
I was also looking at these regressions, but approached them from a
different direction, and ended up with 3 patches:
1. The regressions showed a large increase in cache misses, which gave
   me the idea that a cache would help (and it seemed to);
2. Allowing smaller (but potentially more frequent) refills;
3. A cute (but small-impact) use of prefetch().
The numbers are here (the commentary from the bot is very hit or miss,
so don't pay too much attention to it):
https://github.com/vcgomes/linux/commit/c898c39ee8def5252942281353eda6acdd83d4ea
I am re-running the tests against a more recent tree, but if you
want to take a look:
https://github.com/vcgomes/linux/tree/mm-sheaves-regression-timerfd
Also, if you feel it's useful, I can send an RFC.
> ---
> mm/slub.c | 107 ++++++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 88 insertions(+), 19 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2b2d33cc735c..fe6351ba0e60 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -353,6 +353,7 @@ enum stat_item {
> SHEAF_REFILL, /* Objects refilled to a sheaf */
> SHEAF_ALLOC, /* Allocation of an empty sheaf */
> SHEAF_FREE, /* Freeing of an empty sheaf */
> + SHEAF_SPILL, /* Objects spilled into a percpu sheaf */
> BARN_GET, /* Got full sheaf from barn */
> BARN_GET_FAIL, /* Failed to get full sheaf from barn */
> BARN_PUT, /* Put full sheaf to barn */
> @@ -4279,7 +4280,9 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
> * Assumes this is performed only for caches without debugging so we
> * don't need to worry about adding the slab to the full list.
> */
> -static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab)
> +static inline void *__get_freelist_nofreeze(struct kmem_cache *s,
> + struct slab *slab, int *freecount,
> + const char *n)
> {
> struct freelist_counters old, new;
>
> @@ -4293,11 +4296,26 @@ static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *sla
>
> new.inuse = old.objects;
>
> - } while (!slab_update_freelist(s, slab, &old, &new, "get_freelist_nofreeze"));
> + } while (!slab_update_freelist(s, slab, &old, &new, n));
> +
> + if (freecount)
> + *freecount = old.objects - old.inuse;
>
> return old.freelist;
> }
>
> +static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab)
> +{
> + return __get_freelist_nofreeze(s, slab, NULL, "get_freelist_nofreeze");
> +}
> +
> +static inline void *get_freelist_and_freecount_nofreeze(struct kmem_cache *s,
> + struct slab *slab,
> + int *freecount)
> +{
> + return __get_freelist_nofreeze(s, slab, freecount, "get_freelist_and_freecount_nofreeze");
> +}
> +
> /*
> * If the object has been wiped upon free, make sure it's fully initialized by
> * zeroing out freelist pointer.
> @@ -7028,10 +7046,15 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
> return 0;
>
> list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
> + void *head;
> + void *tail;
> + struct slub_percpu_sheaves *pcs;
> + int freecount, local_node, i, cnt = 0;
> + struct slab_sheaf *spill;
>
> list_del(&slab->slab_list);
>
> - object = get_freelist_nofreeze(s, slab);
> + object = get_freelist_and_freecount_nofreeze(s, slab, &freecount);
>
> while (object && refilled < max) {
> p[refilled] = object;
> @@ -7039,28 +7062,72 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
> maybe_wipe_obj_freeptr(s, p[refilled]);
>
> refilled++;
> + freecount--;
> }
>
> + if (!freecount) {
> + if (refilled >= max)
> + break;
> + continue;
> + }
> /*
> - * Freelist had more objects than we can accommodate, we need to
> - * free them back. We can treat it like a detached freelist, just
> - * need to find the tail object.
> + * Freelist had more objects than we can accommodate, we first
> + * try to spill them into percpu sheaf.
> */
> - if (unlikely(object)) {
> - void *head = object;
> - void *tail;
> - int cnt = 0;
> -
> - do {
> - tail = object;
> - cnt++;
> - object = get_freepointer(s, object);
> - } while (object);
> - __slab_free(s, slab, head, tail, cnt, _RET_IP_);
> + if (freecount > s->sheaf_capacity)
> + goto skip_spill;
> + if (slab_test_pfmemalloc(slab))
> + goto skip_spill;
> +
> + if (!local_trylock(&s->cpu_sheaves->lock))
> + goto skip_spill;
> +
> + local_node = numa_mem_id();
> + if (slab_nid(slab) != local_node) {
> + local_unlock(&s->cpu_sheaves->lock);
> + goto skip_spill;
> }
>
> - if (refilled >= max)
> - break;
> + pcs = this_cpu_ptr(s->cpu_sheaves);
> + if (pcs->spare &&
> + (freecount <= (s->sheaf_capacity - pcs->spare->size)))
> + spill = pcs->spare;
> + else if (freecount <= (s->sheaf_capacity - pcs->main->size))
> + spill = pcs->main;
> + else {
> + local_unlock(&s->cpu_sheaves->lock);
> + goto skip_spill;
> + }
> +
> + if (freecount > (s->sheaf_capacity - spill->size)) {
> + local_unlock(&s->cpu_sheaves->lock);
> + goto skip_spill;
> + }
> +
> + for (i = 0; i < freecount; i++) {
> + spill->objects[spill->size] = object;
> + object = get_freepointer(s, object);
> + maybe_wipe_obj_freeptr(s, spill->objects[spill->size]);
> + spill->size++;
> + }
> +
> + local_unlock(&s->cpu_sheaves->lock);
> + stat(s, SHEAF_SPILL);
> + break;
> +skip_spill:
> + /*
> + * Freelist had more objects than we can accommodate or spill,
> + * we need to free them back. We can treat it like a detached freelist,
> + * just need to find the tail object.
> + */
> + head = object;
> + do {
> + tail = object;
> + cnt++;
> + object = get_freepointer(s, object);
> + } while (object);
> + __slab_free(s, slab, head, tail, cnt, _RET_IP_);
> + break;
> }
>
> if (unlikely(!list_empty(&pc.slabs))) {
> @@ -9247,6 +9314,7 @@ STAT_ATTR(SHEAF_FLUSH, sheaf_flush);
> STAT_ATTR(SHEAF_REFILL, sheaf_refill);
> STAT_ATTR(SHEAF_ALLOC, sheaf_alloc);
> STAT_ATTR(SHEAF_FREE, sheaf_free);
> +STAT_ATTR(SHEAF_SPILL, sheaf_spill);
> STAT_ATTR(BARN_GET, barn_get);
> STAT_ATTR(BARN_GET_FAIL, barn_get_fail);
> STAT_ATTR(BARN_PUT, barn_put);
> @@ -9335,6 +9403,7 @@ static struct attribute *slab_attrs[] = {
> &sheaf_refill_attr.attr,
> &sheaf_alloc_attr.attr,
> &sheaf_free_attr.attr,
> + &sheaf_spill_attr.attr,
> &barn_get_attr.attr,
> &barn_get_fail_attr.attr,
> &barn_put_attr.attr,
> --
> 2.50.1
>
>
--
Vinicius
Thread overview: 5+ messages
2026-04-10 11:16 Hao Li
2026-04-14 8:39 ` Harry Yoo (Oracle)
2026-04-14 9:59 ` Hao Li
2026-04-15 10:20 ` Harry Yoo (Oracle)
2026-04-15 20:55 ` Vinicius Costa Gomes [this message]