linux-mm.kvack.org archive mirror
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Hao Li <hao.li@linux.dev>,
	vbabka@kernel.org, harry@kernel.org, akpm@linux-foundation.org
Cc: cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hao Li <hao.li@linux.dev>
Subject: Re: [RFC PATCH] slub: spill refill leftover objects into percpu sheaves
Date: Wed, 15 Apr 2026 13:55:54 -0700	[thread overview]
Message-ID: <87a4v47xk5.fsf@intel.com> (raw)
In-Reply-To: <20260410112202.142597-1-hao.li@linux.dev>

Hao Li <hao.li@linux.dev> writes:

> When refilling objects, we optimistically assume that more allocation
> requests are coming next; this is the fundamental assumption behind
> this optimization.
>
> When __refill_objects_node() isolates a partial slab and satisfies a
> bulk allocation from its freelist, the slab can still have a small tail
> of free objects left over. Today those objects are freed back to the
> slab immediately.
>
> If the leftover tail is local and small enough to fit, keep it in the
> current CPU's sheaves instead. This avoids pushing those objects back
> through the __slab_free slowpath.
>
> Add a helper to obtain both the freelist and its free-object count, and
> then spill the remaining objects into a percpu sheaf when:
> - the tail fits in a sheaf
> - the slab is local to the current CPU
> - the slab is not pfmemalloc
> - the target sheaf has enough free space
>
> Otherwise keep the existing fallback and free the tail back to the slab.
>
> Also add a SHEAF_SPILL stat so the new path can be observed in SLUB
> stats.
>
> On the mmap2 case in the will-it-scale benchmark suite, this patch can
> improve performance by about 2~5%.
>
> Signed-off-by: Hao Li <hao.li@linux.dev>
> ---
>
> This patch is an exploratory attempt to address the leftover-object and
> partial-slab issues in the refill path. It is marked RFC to warmly
> welcome any feedback, suggestions, and discussion!
>

I was also looking at these regressions, but I approached them from a
different direction and ended up with 3 patches:

1. The regressions showed a large increase in cache misses, which
   gave me the idea that a cache would help (and it seemed to help);

2. Allowing smaller (but potentially more frequent) refills;

3. A cute (but small-impact) use of prefetch().

The numbers are here (the commentary from the bot is very hit or miss,
so don't pay too much attention to it):

https://github.com/vcgomes/linux/commit/c898c39ee8def5252942281353eda6acdd83d4ea

I am re-running the tests against a more recent tree, but if you
want to take a look:

https://github.com/vcgomes/linux/tree/mm-sheaves-regression-timerfd

Also, if you feel it's useful, I can send an RFC.

> ---
>  mm/slub.c | 107 ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 88 insertions(+), 19 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2b2d33cc735c..fe6351ba0e60 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -353,6 +353,7 @@ enum stat_item {
>  	SHEAF_REFILL,		/* Objects refilled to a sheaf */
>  	SHEAF_ALLOC,		/* Allocation of an empty sheaf */
>  	SHEAF_FREE,		/* Freeing of an empty sheaf */
> +	SHEAF_SPILL,		/* Refill leftovers spilled into a sheaf */
>  	BARN_GET,		/* Got full sheaf from barn */
>  	BARN_GET_FAIL,		/* Failed to get full sheaf from barn */
>  	BARN_PUT,		/* Put full sheaf to barn */
> @@ -4279,7 +4280,9 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
>   * Assumes this is performed only for caches without debugging so we
>   * don't need to worry about adding the slab to the full list.
>   */
> -static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab)
> +static inline void *__get_freelist_nofreeze(struct kmem_cache *s,
> +					    struct slab *slab, int *freecount,
> +					    const char *n)
>  {
>  	struct freelist_counters old, new;
>  
> @@ -4293,11 +4296,26 @@ static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *sla
>  
>  		new.inuse = old.objects;
>  
> -	} while (!slab_update_freelist(s, slab, &old, &new, "get_freelist_nofreeze"));
> +	} while (!slab_update_freelist(s, slab, &old, &new, n));
> +
> +	if (freecount)
> +		*freecount = old.objects - old.inuse;
>  
>  	return old.freelist;
>  }
>  
> +static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab)
> +{
> +	return __get_freelist_nofreeze(s, slab, NULL, "get_freelist_nofreeze");
> +}
> +
> +static inline void *get_freelist_and_freecount_nofreeze(struct kmem_cache *s,
> +							struct slab *slab,
> +							int *freecount)
> +{
> +	return __get_freelist_nofreeze(s, slab, freecount, "get_freelist_and_freecount_nofreeze");
> +}
> +
>  /*
>   * If the object has been wiped upon free, make sure it's fully initialized by
>   * zeroing out freelist pointer.
> @@ -7028,10 +7046,15 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
>  		return 0;
>  
>  	list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
> +		void *head;
> +		void *tail;
> +		struct slub_percpu_sheaves *pcs;
> +		int freecount, local_node, i, cnt = 0;
> +		struct slab_sheaf *spill;
>  
>  		list_del(&slab->slab_list);
>  
> -		object = get_freelist_nofreeze(s, slab);
> +		object = get_freelist_and_freecount_nofreeze(s, slab, &freecount);
>  
>  		while (object && refilled < max) {
>  			p[refilled] = object;
> @@ -7039,28 +7062,72 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
>  			maybe_wipe_obj_freeptr(s, p[refilled]);
>  
>  			refilled++;
> +			freecount--;
>  		}
>  
> +		if (!freecount) {
> +			if (refilled >= max)
> +				break;
> +			continue;
> +		}
>  		/*
> -		 * Freelist had more objects than we can accommodate, we need to
> -		 * free them back. We can treat it like a detached freelist, just
> -		 * need to find the tail object.
> +		 * Freelist had more objects than we can accommodate, we first
> +		 * try to spill them into percpu sheaf.
>  		 */
> -		if (unlikely(object)) {
> -			void *head = object;
> -			void *tail;
> -			int cnt = 0;
> -
> -			do {
> -				tail = object;
> -				cnt++;
> -				object = get_freepointer(s, object);
> -			} while (object);
> -			__slab_free(s, slab, head, tail, cnt, _RET_IP_);
> +		if (freecount > s->sheaf_capacity)
> +			goto skip_spill;
> +		if (slab_test_pfmemalloc(slab))
> +			goto skip_spill;
> +
> +		if (!local_trylock(&s->cpu_sheaves->lock))
> +			goto skip_spill;
> +
> +		local_node = numa_mem_id();
> +		if (slab_nid(slab) != local_node) {
> +			local_unlock(&s->cpu_sheaves->lock);
> +			goto skip_spill;
>  		}
>  
> -		if (refilled >= max)
> -			break;
> +		pcs = this_cpu_ptr(s->cpu_sheaves);
> +		if (pcs->spare &&
> +		    (freecount <= (s->sheaf_capacity - pcs->spare->size)))
> +			spill = pcs->spare;
> +		else if (freecount <= (s->sheaf_capacity - pcs->main->size))
> +			spill = pcs->main;
> +		else {
> +			local_unlock(&s->cpu_sheaves->lock);
> +			goto skip_spill;
> +		}
> +
> +		if (freecount > (s->sheaf_capacity - spill->size)) {
> +			local_unlock(&s->cpu_sheaves->lock);
> +			goto skip_spill;
> +		}
> +
> +		for (i = 0; i < freecount; i++) {
> +			spill->objects[spill->size] = object;
> +			object = get_freepointer(s, object);
> +			maybe_wipe_obj_freeptr(s, spill->objects[spill->size]);
> +			spill->size++;
> +		}
> +
> +		local_unlock(&s->cpu_sheaves->lock);
> +		stat(s, SHEAF_SPILL);
> +		break;
> +skip_spill:
> +		/*
> +		 * Freelist had more objects than we can accommodate or spill,
> +		 * we need to free them back. We can treat it like a detached freelist,
> +		 * just need to find the tail object.
> +		 */
> +		head = object;
> +		do {
> +			tail = object;
> +			cnt++;
> +			object = get_freepointer(s, object);
> +		} while (object);
> +		__slab_free(s, slab, head, tail, cnt, _RET_IP_);
> +		break;
>  	}
>  
>  	if (unlikely(!list_empty(&pc.slabs))) {
> @@ -9247,6 +9314,7 @@ STAT_ATTR(SHEAF_FLUSH, sheaf_flush);
>  STAT_ATTR(SHEAF_REFILL, sheaf_refill);
>  STAT_ATTR(SHEAF_ALLOC, sheaf_alloc);
>  STAT_ATTR(SHEAF_FREE, sheaf_free);
> +STAT_ATTR(SHEAF_SPILL, sheaf_spill);
>  STAT_ATTR(BARN_GET, barn_get);
>  STAT_ATTR(BARN_GET_FAIL, barn_get_fail);
>  STAT_ATTR(BARN_PUT, barn_put);
> @@ -9335,6 +9403,7 @@ static struct attribute *slab_attrs[] = {
>  	&sheaf_refill_attr.attr,
>  	&sheaf_alloc_attr.attr,
>  	&sheaf_free_attr.attr,
> +	&sheaf_spill_attr.attr,
>  	&barn_get_attr.attr,
>  	&barn_get_fail_attr.attr,
>  	&barn_put_attr.attr,
> -- 
> 2.50.1
>
>

-- 
Vinicius


Thread overview: 5+ messages
2026-04-10 11:16 Hao Li
2026-04-14  8:39 ` Harry Yoo (Oracle)
2026-04-14  9:59   ` Hao Li
2026-04-15 10:20     ` Harry Yoo (Oracle)
2026-04-15 20:55 ` Vinicius Costa Gomes [this message]
