linux-mm.kvack.org archive mirror
From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org, maple-tree@lists.infradead.org
Subject: Re: [PATCH v5 01/14] slab: add opt-in caching layer of percpu sheaves
Date: Mon, 18 Aug 2025 19:09:13 +0900
Message-ID: <aKL7yYL8coNuQpAP@harry>
In-Reply-To: <20250723-slub-percpu-caches-v5-1-b792cd830f5d@suse.cz>

On Wed, Jul 23, 2025 at 03:34:34PM +0200, Vlastimil Babka wrote:
> Specifying a non-zero value for a new struct kmem_cache_args field
> sheaf_capacity will set up a caching layer of percpu arrays called
> sheaves of given capacity for the created cache.
> 
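For reference, a minimal usage sketch of opting in via the
kmem_cache_args-based create API (the cache name, object type and
capacity value below are made up for illustration):

	/* hypothetical cache opting into percpu sheaves */
	struct kmem_cache_args args = {
		.sheaf_capacity = 32,	/* assumed value, not a recommendation */
	};
	struct kmem_cache *cache;

	cache = kmem_cache_create("my_cache", sizeof(struct my_obj),
				  &args, SLAB_HWCACHE_ALIGN);
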
> Allocations from the cache will allocate via the percpu sheaves (main or
> spare) as long as they have no NUMA node preference. Frees will also
> put the object back into one of the sheaves.
> 
> When both percpu sheaves are found empty during an allocation, an empty
> sheaf may be replaced with a full one from the per-node barn. If none
> are available and the allocation is allowed to block, an empty sheaf is
> refilled from slab(s) by an internal bulk alloc operation. When both
> percpu sheaves are full during freeing, the barn can replace a full one
> with an empty one, unless the barn is over its limit of full sheaves. In
> that case a sheaf is flushed to slab(s) by an internal bulk free
> operation. Flushing
> sheaves and barns is also wired to the existing cpu flushing and cache
> shrinking operations.
> 
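To restate the allocation side of the flow above as a simplified
sketch (the "objects" array member is my shorthand for the sheaf's
storage; the real code additionally handles locking, races and
statistics):

	/* allocation side, heavily simplified */
	if (pcs->main->size) {
		/* fast path: pop from the main sheaf */
		obj = pcs->main->objects[--pcs->main->size];
	} else if (pcs->spare && pcs->spare->size) {
		/* main empty: swap in the spare and pop from it */
		swap(pcs->main, pcs->spare);
		obj = pcs->main->objects[--pcs->main->size];
	} else {
		/* both empty: swap an empty sheaf for a full one from
		 * the barn, or refill via internal bulk alloc */
	}
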
> The sheaves do not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() (or a mempolicy
> with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE),
> the sheaves are bypassed.
> 
> The bulk operations exposed to slab users also try to utilize the
> sheaves as long as the necessary (full or empty) sheaves are available
> on the cpu or in the barn. Once depleted, they will fall back to bulk
> alloc/free to slabs directly to avoid double copying.
> 
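For illustration, a bulk caller (all names invented) that would hit
this path:

	void *objs[16];
	int allocated;

	/* served from a full sheaf when one is available on this cpu
	 * or in the barn, otherwise bulk-allocated from slabs */
	allocated = kmem_cache_alloc_bulk(cache, GFP_KERNEL,
					  ARRAY_SIZE(objs), objs);

	/* ... use objs[0..allocated-1] ... */

	/* fills an empty sheaf when available, else bulk free to slabs */
	kmem_cache_free_bulk(cache, allocated, objs);
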
> The sheaf_capacity value is exported in sysfs for observability.
> 
> Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf
> count objects allocated or freed using the sheaves (and thus not
> counting towards the other alloc/free path counters). Counters
> sheaf_refill and sheaf_flush count objects filled or flushed from or to
> slab pages, and can be used to assess how effective the caching is. The
> refill and flush operations will also count towards the usual
> alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for
> the backing slabs. For barn operations, barn_get and barn_put count how
> many full sheaves were taken from or put to the barn; the _fail variants
> count how many such requests could not be satisfied, mainly because the
> barn was either empty or full. While the barn also holds empty sheaves
> to make some operations easier, these are not critical enough to mandate
> their own counters. Finally, there are sheaf_alloc/sheaf_free counters.
> 
> Access to the percpu sheaves is protected by local_trylock() when
> potential callers include irq context, and local_lock() otherwise (such
> as when we already know the gfp flags allow blocking). The trylock
> failures should be rare and we can easily fall back. Each per-NUMA-node
> barn has a spin_lock.
> 
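So the irq-capable paths follow this pattern (condensed from the
patch):

	if (!local_trylock(&s->cpu_sheaves->lock))
		return NULL;	/* rare; caller falls back to regular paths */

	pcs = this_cpu_ptr(s->cpu_sheaves);
	/* ... operate on pcs->main / pcs->spare / pcs->barn ... */

	local_unlock(&s->cpu_sheaves->lock);
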
> When slub_debug is enabled for a cache with sheaf_capacity also
> specified, the latter is ignored so that allocations and frees reach the
> slow path where debugging hooks are processed. Similarly, we ignore it
> with CONFIG_SLUB_TINY which prefers low memory usage to performance.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/slab.h |   31 ++
>  mm/slab.h            |    2 +
>  mm/slab_common.c     |    5 +-
>  mm/slub.c            | 1101 +++++++++++++++++++++++++++++++++++++++++++++++---
>  4 files changed, 1092 insertions(+), 47 deletions(-)
> 
> @@ -4554,6 +5164,274 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  	discard_slab(s, slab);
>  }
>  
> +/*
> + * pcs is locked. We should have gotten rid of the spare sheaf and obtained an
> + * empty sheaf, while the main sheaf is full. We want to install the empty sheaf
> + * as a main sheaf, and make the current main sheaf a spare sheaf.
> + *
> + * However due to having relinquished the cpu_sheaves lock when obtaining
> + * the empty sheaf, we need to handle some unlikely but possible cases.
> + *
> + * If we put any sheaf to the barn here, it's because we were interrupted or have
> + * been migrated to a different cpu, which should be rare enough so just ignore
> + * the barn's limits to simplify the handling.
> + *
> + * An alternative scenario that gets us here is when we fail
> + * barn_replace_full_sheaf(), because there's no empty sheaf available in the
> + * barn, so we had to allocate one with alloc_empty_sheaf(). But because we saw the
> + * limit on full sheaves was not exceeded, we assume it didn't change and just
> + * put the full sheaf there.
> + */
> +static void __pcs_install_empty_sheaf(struct kmem_cache *s,
> +		struct slub_percpu_sheaves *pcs, struct slab_sheaf *empty)
> +{
> +	/* This is what we expect to find if nobody interrupted us. */
> +	if (likely(!pcs->spare)) {
> +		pcs->spare = pcs->main;
> +		pcs->main = empty;
> +		return;
> +	}
> +
> +	/*
> +	 * Unlikely because if the main sheaf had space, we would have just
> +	 * freed to it. Get rid of our empty sheaf.
> +	 */
> +	if (pcs->main->size < s->sheaf_capacity) {
> +		barn_put_empty_sheaf(pcs->barn, empty);
> +		return;
> +	}
> +
> +	/* Also unlikely for the same reason/ */

nit: unnecessary '/'

> +	if (pcs->spare->size < s->sheaf_capacity) {
> +		swap(pcs->main, pcs->spare);
> +		barn_put_empty_sheaf(pcs->barn, empty);
> +		return;
> +	}
> +
> +	/*
> +	 * We probably failed barn_replace_full_sheaf() due to no empty sheaf
> +	 * available there, but we allocated one, so finish the job.
> +	 */
> +	barn_put_full_sheaf(pcs->barn, pcs->main);
> +	stat(s, BARN_PUT);
> +	pcs->main = empty;
> +}

> +static struct slub_percpu_sheaves *
> +__pcs_handle_full(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
> +{
> +	struct slab_sheaf *empty;
> +	bool put_fail;
> +
> +restart:
> +	put_fail = false;
> +
> +	if (!pcs->spare) {
> +		empty = barn_get_empty_sheaf(pcs->barn);
> +		if (empty) {
> +			pcs->spare = pcs->main;
> +			pcs->main = empty;
> +			return pcs;
> +		}
> +		goto alloc_empty;
> +	}
> +
> +	if (pcs->spare->size < s->sheaf_capacity) {
> +		swap(pcs->main, pcs->spare);
> +		return pcs;
> +	}
> +
> +	empty = barn_replace_full_sheaf(pcs->barn, pcs->main);
> +
> +	if (!IS_ERR(empty)) {
> +		stat(s, BARN_PUT);
> +		pcs->main = empty;
> +		return pcs;
> +	}
> +
> +	if (PTR_ERR(empty) == -E2BIG) {
> +		/* Since we got here, spare exists and is full */
> +		struct slab_sheaf *to_flush = pcs->spare;
> +
> +		stat(s, BARN_PUT_FAIL);
> +
> +		pcs->spare = NULL;
> +		local_unlock(&s->cpu_sheaves->lock);
> +
> +		sheaf_flush_unused(s, to_flush);
> +		empty = to_flush;
> +		goto got_empty;
> +	}
> +
> +	/*
> +	 * We could not replace the full sheaf because the barn had no empty
> +	 * sheaves. We can still allocate one and put the full sheaf in
> +	 * __pcs_install_empty_sheaf(), but if we fail to allocate it,
> +	 * make sure to count the fail.
> +	 */
> +	put_fail = true;
> +
> +alloc_empty:
> +	local_unlock(&s->cpu_sheaves->lock);
> +
> +	empty = alloc_empty_sheaf(s, GFP_NOWAIT);
> +	if (empty)
> +		goto got_empty;
> +
> +	if (put_fail)
> +		stat(s, BARN_PUT_FAIL);
> +
> +	if (!sheaf_flush_main(s))
> +		return NULL;
> +
> +	if (!local_trylock(&s->cpu_sheaves->lock))
> +		return NULL;
> +
> +	/*
> +	 * we flushed the main sheaf so it should be empty now,
> +	 * but in case we got preempted or migrated, we need to
> +	 * check again
> +	 */
> +	if (pcs->main->size == s->sheaf_capacity)
> +		goto restart;

I think it's missing:

pcs = this_cpu_ptr(s->cpu_sheaves);

between local_trylock() and reading pcs->main->size.
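
I.e. (reusing the surrounding code):

	if (!local_trylock(&s->cpu_sheaves->lock))
		return NULL;

	pcs = this_cpu_ptr(s->cpu_sheaves);

	if (pcs->main->size == s->sheaf_capacity)
		goto restart;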

> +
> +	return pcs;
> +
> +got_empty:
> +	if (!local_trylock(&s->cpu_sheaves->lock)) {
> +		barn_put_empty_sheaf(pcs->barn, empty);
> +		return NULL;
> +	}
> +
> +	pcs = this_cpu_ptr(s->cpu_sheaves);
> +	__pcs_install_empty_sheaf(s, pcs, empty);
> +
> +	return pcs;
> +}
> +
>  #ifndef CONFIG_SLUB_TINY
>  /*
>   * Fastpath with forced inlining to produce a kfree and kmem_cache_free that
> @@ -6481,7 +7464,6 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
>  		__kmem_cache_release(s);
>  	return err;
>  }
> -

nit: unnecessary removal of a newline?

Otherwise looks good to me.

>  #ifdef SLAB_SUPPORTS_SYSFS
>  static int count_inuse(struct slab *slab)
>  {

-- 
Cheers,
Harry / Hyeonggon



Thread overview: 63+ messages
2025-07-23 13:34 [PATCH v5 00/14] SLUB " Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 01/14] slab: add opt-in caching layer of " Vlastimil Babka
2025-08-18 10:09   ` Harry Yoo [this message]
2025-08-26  8:03     ` Vlastimil Babka
2025-08-19  4:19   ` Suren Baghdasaryan
2025-08-26  8:51     ` Vlastimil Babka
2025-09-13 14:35   ` Mateusz Guzik
2025-09-13 20:32     ` Vlastimil Babka
2025-09-14  2:22   ` Hillf Danton
2025-09-14 20:24     ` Vlastimil Babka
2025-09-15  0:11       ` Hillf Danton
2025-09-15  7:21         ` Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 02/14] slab: add sheaf support for batching kfree_rcu() operations Vlastimil Babka
2025-07-23 16:39   ` Uladzislau Rezki
2025-07-24 14:30     ` Vlastimil Babka
2025-07-24 17:36       ` Uladzislau Rezki
2025-07-23 13:34 ` [PATCH v5 03/14] slab: sheaf prefilling for guaranteed allocations Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 04/14] slab: determine barn status racily outside of lock Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 05/14] tools: Add testing support for changes to rcu and slab for sheaves Vlastimil Babka
2025-08-22 16:28   ` Suren Baghdasaryan
2025-08-26  9:32     ` Vlastimil Babka
2025-08-27  0:19       ` Suren Baghdasaryan
2025-07-23 13:34 ` [PATCH v5 06/14] tools: Add sheaves support to testing infrastructure Vlastimil Babka
2025-08-22 16:56   ` Suren Baghdasaryan
2025-08-26  9:59     ` Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 07/14] maple_tree: use percpu sheaves for maple_node_cache Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 08/14] mm, vma: use percpu sheaves for vm_area_struct cache Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 09/14] mm, slub: skip percpu sheaves for remote object freeing Vlastimil Babka
2025-08-25  5:22   ` Harry Yoo
2025-08-26 10:11     ` Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 10/14] mm, slab: allow NUMA restricted allocations to use percpu sheaves Vlastimil Babka
2025-08-22 19:58   ` Suren Baghdasaryan
2025-08-25  6:52   ` Harry Yoo
2025-08-26 10:49     ` Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 11/14] testing/radix-tree/maple: Increase readers and reduce delay for faster machines Vlastimil Babka
2025-07-23 13:34 ` [PATCH v5 12/14] maple_tree: Sheaf conversion Vlastimil Babka
2025-08-22 20:18   ` Suren Baghdasaryan
2025-08-26 14:22     ` Liam R. Howlett
2025-08-27  2:07       ` Suren Baghdasaryan
2025-08-28 14:27         ` Liam R. Howlett
2025-07-23 13:34 ` [PATCH v5 13/14] maple_tree: Add single node allocation support to maple state Vlastimil Babka
2025-08-22 20:25   ` Suren Baghdasaryan
2025-08-26 15:10     ` Liam R. Howlett
2025-08-27  2:03       ` Suren Baghdasaryan
2025-07-23 13:34 ` [PATCH v5 14/14] maple_tree: Convert forking to use the sheaf interface Vlastimil Babka
2025-08-22 20:29   ` Suren Baghdasaryan
2025-08-15 22:53 ` [PATCH v5 00/14] SLUB percpu sheaves Sudarsan Mahendran
2025-08-16  8:05   ` Harry Yoo
2025-08-16 17:35     ` Sudarsan Mahendran
2025-08-16 18:31       ` Vlastimil Babka
2025-08-16 18:33         ` Vlastimil Babka
2025-08-17  4:28           ` Sudarsan Mahendran
2025-09-13  0:09 ` Benchmarking " Sudarsan Mahendran
2025-09-15  7:51   ` Jan Engelhardt
2025-09-15 12:13     ` Paul E. McKenney
2025-09-15 15:22       ` Vlastimil Babka
2025-09-16 17:09         ` Suren Baghdasaryan
2025-09-17  5:19           ` Uladzislau Rezki
2025-09-17 16:14             ` Suren Baghdasaryan
2025-09-17 23:59               ` Suren Baghdasaryan
2025-09-18 11:50                 ` Uladzislau Rezki
2025-09-18 15:29                   ` Liam R. Howlett
2025-09-19 15:07                     ` Uladzislau Rezki
