From: Uladzislau Rezki <urezki@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Suren Baghdasaryan <surenb@google.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Harry Yoo <harry.yoo@oracle.com>,
Sidhartha Kumar <sidhartha.kumar@oracle.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
rcu@vger.kernel.org, maple-tree@lists.infradead.org
Subject: Re: [PATCH v7 04/21] slab: add sheaf support for batching kfree_rcu() operations
Date: Tue, 9 Sep 2025 11:08:20 +0200
Message-ID: <aL_uhPtztx7Ef0T2@pc636>
In-Reply-To: <6f8274da-a010-4bb3-b3d6-690481b5ace0@suse.cz>
On Mon, Sep 08, 2025 at 02:45:11PM +0200, Vlastimil Babka wrote:
> On 9/8/25 13:59, Uladzislau Rezki wrote:
> > On Wed, Sep 03, 2025 at 02:59:46PM +0200, Vlastimil Babka wrote:
> >> Extend the sheaf infrastructure for more efficient kfree_rcu() handling.
> >> For caches with sheaves, on each cpu maintain a rcu_free sheaf in
> >> addition to main and spare sheaves.
> >>
> >> kfree_rcu() operations will try to put objects on this sheaf. Once full,
> >> the sheaf is detached and submitted to call_rcu() with a handler that
> >> will try to put it in the barn, or flush to slab pages using bulk free,
> >> when the barn is full. Then a new empty sheaf must be obtained to put
> >> more objects there.
> >>
> >> It's possible that no free sheaves are available to use for a new
> >> rcu_free sheaf, and the allocation in kfree_rcu() context can only use
> >> GFP_NOWAIT and thus may fail. In that case, fall back to the existing
> >> kfree_rcu() implementation.
> >>
> >> Expected advantages:
> >> - batching the kfree_rcu() operations, that could eventually replace the
> >> existing batching
> >> - sheaves can be reused for allocations via barn instead of being
> >> flushed to slabs, which is more efficient
> >> - this includes cases where only some cpus are allowed to process rcu
> >> callbacks (Android)
> >>
> >> Possible disadvantage:
> >> - objects might be waiting for more than their grace period (it is
> >> determined by the last object freed into the sheaf), increasing memory
> >> usage - but the existing batching does that too.
> >>
> >> Only implement this for CONFIG_KVFREE_RCU_BATCHED as the tiny
> >> implementation favors smaller memory footprint over performance.
> >>
> >> Add CONFIG_SLUB_STATS counters free_rcu_sheaf and free_rcu_sheaf_fail to
> >> count how many kfree_rcu() used the rcu_free sheaf successfully and how
> >> many had to fall back to the existing implementation.
> >>
> >> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> >> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >> mm/slab.h | 2 +
> >> mm/slab_common.c | 24 +++++++
> >> mm/slub.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >> 3 files changed, 216 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index 206987ce44a4d053ebe3b5e50784d2dd23822cd1..f1866f2d9b211bb0d7f24644b80ef4b50a7c3d24 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -435,6 +435,8 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
> >> 	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
> >> }
> >>
> >> +bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj);
> >> +
> >> #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
> >> SLAB_CACHE_DMA32 | SLAB_PANIC | \
> >> SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index e2b197e47866c30acdbd1fee4159f262a751c5a7..2d806e02568532a1000fd3912db6978e945dcfa8 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -1608,6 +1608,27 @@ static void kfree_rcu_work(struct work_struct *work)
> >> 	kvfree_rcu_list(head);
> >> }
> >>
> >> +static bool kfree_rcu_sheaf(void *obj)
> >> +{
> >> +	struct kmem_cache *s;
> >> +	struct folio *folio;
> >> +	struct slab *slab;
> >> +
> >> +	if (is_vmalloc_addr(obj))
> >> +		return false;
> >> +
> >> +	folio = virt_to_folio(obj);
> >> +	if (unlikely(!folio_test_slab(folio)))
> >> +		return false;
> >> +
> >> +	slab = folio_slab(folio);
> >> +	s = slab->slab_cache;
> >> +	if (s->cpu_sheaves)
> >> +		return __kfree_rcu_sheaf(s, obj);
> >> +
> >> +	return false;
> >> +}
> >> +
> >> static bool
> >> need_offload_krc(struct kfree_rcu_cpu *krcp)
> >> {
> >> @@ -1952,6 +1973,9 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> >> 	if (!head)
> >> 		might_sleep();
> >>
> >> +	if (kfree_rcu_sheaf(ptr))
> >> +		return;
> >> +
> > Uh.. I have some concerns about this.
> >
> > This patch introduces a new path that collides with the
> > existing kvfree_rcu() logic. It implements batching, which
> > we already have.
>
> Yes but for caches with sheaves it's better to recycle the whole sheaf (as
> described), which is so different from the existing batching scheme that I'm
> not sure if there's a sensible way to combine them.
>
> > - kvfree_rcu_barrier() does not know about the "sheaf" path. Am I missing
> >   something? How do you guarantee that kvfree_rcu_barrier() flushes
> >   sheaves? If it is part of kvfree_rcu(), it has to take care of this.
>
> Hm good point, thanks. I've taken care of handling flushing related to
> kfree_rcu() sheaves in kmem_cache_destroy(), but forgot that
> kvfree_rcu_barrier() can be also used outside of that - we have one user in
> codetag_unload_module() currently.
>
> > - we do not allocate in the kvfree_rcu() path because of PREEMPT_RT, i.e.
> >   kvfree_rcu() is supposed to be callable from non-sleeping contexts.
>
> Hm I could not find where that distinction is in the code, can you give a
> hint please. In __kfree_rcu_sheaf() I do only have a GFP_NOWAIT attempt.
>
On PREEMPT_RT a regular spinlock is an rt-mutex, which can sleep. We made
kvfree_rcu() callable from non-sleeping contexts, for example:

    CONFIG_PREEMPT_RT
    preempt_disable(); /* or something similar */
        kvfree_rcu();
            GFP_NOWAIT allocation: may lock an rt-mutex

If the GFP_NOWAIT path does not take any spinlocks, or takes only
raw_spin_locks, then we are safe.
> > - call_rcu() can be slow, therefore we do not use it in kvfree_rcu().
>
> If call_rcu() is called once per 32 kfree_rcu() filling up the rcu sheaf, is
> it still too slow?
>
You do not know where in the queue this callback lands: at the beginning,
at the end, etc. It is part of a generic list that is processed one by
one, and that list can contain thousands of callbacks.

If performance is not a concern, then it is not an issue. But we do not
use call_rcu() in kvfree_rcu() because we want to offload fast.
--
Uladzislau Rezki