From: "Paul E. McKenney" <paulmck@kernel.org>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@kernel.org>, Hao Li <hao.li@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Puranjay Mohan <puranjay@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Amery Hung <ameryhung@gmail.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
Dave Chinner <david@fromorbit.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Muchun Song <muchun.song@linux.dev>,
rcu@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org
Subject: Re: [RFC PATCH 0/7] k[v]free_rcu() improvements
Date: Fri, 6 Feb 2026 16:16:46 -0800
Message-ID: <3069e76d-5c7a-4c3f-9b83-43ed1700b95f@paulmck-laptop>
In-Reply-To: <20260206093410.160622-1-harry.yoo@oracle.com>

On Fri, Feb 06, 2026 at 06:34:03PM +0900, Harry Yoo wrote:
> These are a few improvements for the k[v]free_rcu() API, which were
> suggested by Alexei Starovoitov.
>
> [ To kmemleak folks: I'm going to teach delete_object_full() and
> paint_ptr() to ignore cases when the object does not exist.
> Could you please let me know if the way it's done in patch 3
> looks good? Only part 2 is relevant to you. ]

On what commit should I apply this series?  I get conflicts on top of -rcu
(no surprise there) and build errors on top of next-20260205.

							Thanx, Paul

> Although I've put some effort into providing a decent quality
> implementation, I'd like you to consider this as a proof-of-concept
> and let's discuss how best we could tackle those problems:
>
> 1) Allow an 8-byte field to be used as an alternative to
> struct rcu_head (16-byte) for 2-argument kvfree_rcu()
> 2) kmalloc_nolock() -> kfree[_rcu]() support
> 3) Add kfree_rcu_nolock() for NMI context
>
> # Part 1. Allow an 8-byte field to be used as an alternative to
> struct rcu_head for 2-argument kvfree_rcu()
>
> Technically, objects that are freed with k[v]free_rcu() need
> only one pointer to link objects, because we already know that
> the callback function is always kvfree(). For this purpose,
> struct rcu_head is unnecessarily large (16 bytes on 64-bit).
>
> Allow a smaller, 8-byte field (of struct rcu_ptr type) to be used
> with k[v]free_rcu(). Let's save one pointer per slab object.
>
> I have to admit that my naming skill isn't great; hopefully
> we'll come up with a better name than `struct rcu_ptr`.
>
> With this feature, either a struct rcu_ptr or rcu_head field
> can be used as the second argument of the k[v]free_rcu() API.
>
> Users that only use k[v]free_rcu() are highly encouraged to use
> struct rcu_ptr; otherwise you're wasting memory. However, some users,
> such as maple tree, may use call_rcu() or k[v]free_rcu() depending on
> the situation for objects of the same type. For such users,
> struct rcu_head remains the only option.
>
> Patch 1 implements this feature, and patch 2 adds a few users in mm/.
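
For a rough sense of the saving described in Part 1, the userspace sketch
below models the two layouts. The struct rcu_head shape (a link plus a
callback pointer) follows the kernel's callback_head; the single-pointer
struct rcu_ptr layout and the demo_* object types are assumptions made for
illustration, not the definitions from patch 1.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the kernel's struct rcu_head: link + callback. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Assumed layout: the callback is implied (kvfree), so only a link remains. */
struct rcu_ptr {
	struct rcu_ptr *next;
};

/* Hypothetical objects that are only ever freed with k[v]free_rcu(). */
struct demo_obj_head {
	unsigned long payload[3];
	struct rcu_head rcu;
};

struct demo_obj_ptr {
	unsigned long payload[3];
	struct rcu_ptr rcu;
};

/* Compile-time check that the smaller field really is smaller. */
static_assert(sizeof(struct rcu_ptr) < sizeof(struct rcu_head),
	      "rcu_ptr model should be smaller than rcu_head model");

/* Bytes saved per object by switching the embedded field. */
static size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_obj_head) - sizeof(struct demo_obj_ptr);
}
```

In this model demo_bytes_saved() comes out to one pointer per object, which
matches the "save one pointer per slab object" claim; whether a real slab
object shrinks also depends on its size class and alignment.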
>
> # Part 2. kmalloc_nolock() -> kfree() or kfree_rcu() path support
>
> Allow objects allocated with kmalloc_nolock() to be freed with
> kfree[_rcu](). Without this support, users are forced to call
> call_rcu() with kfree_nolock() to free objects after a grace period.
> This is not efficient and can create unnecessarily many grace periods
> by bypassing the kfree_rcu batching layer.
>
> This was not supported before because some alloc
> hooks are not called in kmalloc_nolock(), while all free hooks are
> called in kfree().
>
> Patch 3 adds support for this by teaching kmemleak to ignore cases
> when free hooks are called without prior alloc hooks. Patch 4 frees
> a bit in enum objexts_flags, since we no longer have to remember
> whether the array was allocated using kmalloc_nolock() or kmalloc().
>
> Note that the free hooks fall into these categories:
>
> - Its alloc hook is called in kmalloc_nolock(), no problem!
> (kmsan_slab_alloc(), kasan_slab_alloc(),
> memcg_slab_post_alloc_hook(), alloc_tagging_slab_alloc_hook())
>
> - Its alloc hook isn't called in kmalloc_nolock(); free hooks
> must handle asymmetric hook calls. (kfence_free(),
> kmemleak_free_recursive())
>
> - There is no matching alloc hook for the free hook; it's safe to
> call. (debug_check_no_{locks,obj}_freed, __kcsan_check_access())
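
The second category above (a free hook that may run without a matching alloc
hook) can be pictured with a small userspace sketch. The demo_* tracker is
hypothetical and only stands in for a debug facility such as kmemleak; it is
not the code from patch 3. The point is that the free side treats an unknown
object as a no-op instead of an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_TRACKED 16

/* Hypothetical tracker standing in for a debug facility such as kmemleak. */
static const void *demo_tracked[DEMO_MAX_TRACKED];

/* Alloc hook: record the object. Returns false if the table is full. */
static bool demo_track_alloc(const void *obj)
{
	assert(obj != NULL);
	for (size_t i = 0; i < DEMO_MAX_TRACKED; i++) {
		if (!demo_tracked[i]) {
			demo_tracked[i] = obj;
			return true;
		}
	}
	return false;
}

/*
 * Free hook: forget the object. An unknown object is not an error here,
 * because the allocation path may have skipped the alloc hook.
 * Returns true if a record existed, false if there was nothing to remove.
 */
static bool demo_track_free(const void *obj)
{
	if (!obj)
		return false;
	for (size_t i = 0; i < DEMO_MAX_TRACKED; i++) {
		if (demo_tracked[i] == obj) {
			demo_tracked[i] = NULL;
			return true;
		}
	}
	return false;
}
```

Calling demo_track_free() on an object that never went through
demo_track_alloc() simply returns false, which is the tolerant behaviour the
asymmetric case needs.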
>
> Note that kmalloc() -> kfree_nolock() or kfree_rcu_nolock() is still
> not supported! That's much trickier :)
>
> # Part 3. Add kfree_rcu_nolock() for NMI context
>
> Add a new 2-argument kfree_rcu_nolock() variant that is safe to be
> called in NMI context. In NMI context, calling kfree_rcu() or
> call_rcu() is not legal, and thus users are forced to implement some
> sort of deferred freeing. Let's make users' lives easier with the new
> variant.
>
> Note that 1-argument kfree_rcu_nolock() is not supported, since there
> is not much we can do when both trylock and memory allocation fail.
> (You can't call synchronize_rcu() in NMI context!)
>
> When spinning on a lock is not allowed, try to acquire the spinlock.
> If the trylock succeeds, do one of the following:
>
> 1) Use the rcu sheaf to free the object. Note that call_rcu() cannot
> be called in NMI context! If freeing the object would fill the rcu
> sheaf, the object cannot be freed to the sheaf and the code has to
> fall back.
>
> 2) Use struct rcu_ptr field to link objects. Consuming a bnode
> (of struct kvfree_rcu_bulk_data) and queueing work to maintain
> a number of cached bnodes is avoided in NMI context.
>
> Note that scheduling delayed monitor work to drain objects after
> KFREE_DRAIN_JIFFIES is done using a lazy irq_work to avoid raising
> self-IPIs. That means scheduling delayed monitor work can be delayed
> up to the length of a time slice.
>
> In rare cases where trylock fails, a non-lazy irq_work is used to
> defer calling kvfree_rcu_call().
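
The trylock-or-defer idea in Part 3 can be modelled in userspace roughly as
follows. This is a sketch under loud assumptions: a C11 atomic_flag stands in
for the spinlock, a lock-free push list stands in for the irq_work deferral,
and every demo_* name is hypothetical. It does not reproduce the sheaf or
bnode handling from patches 6 and 7.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-pointer link embedded in each object (assumed layout). */
struct demo_link {
	struct demo_link *next;
};

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;  /* stands in for the spinlock */
static struct demo_link *demo_batch;              /* protected by demo_lock */
static _Atomic(struct demo_link *) demo_deferred; /* lock-free fallback list */

static bool demo_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire);
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * Queue an object without ever spinning. Returns true if it went onto the
 * batch list directly, false if it was pushed onto the deferred list.
 */
static bool demo_free_nolock(struct demo_link *obj)
{
	struct demo_link *old;

	assert(obj != NULL);
	if (demo_trylock()) {
		obj->next = demo_batch;
		demo_batch = obj;
		demo_unlock();
		return true;
	}
	/* Lock is busy: lock-free push, to be drained from a safer context. */
	old = atomic_load_explicit(&demo_deferred, memory_order_relaxed);
	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&demo_deferred, &old, obj,
							 memory_order_release,
							 memory_order_relaxed));
	return false;
}

/* Runs where spinning is allowed; moves deferred objects onto the batch list. */
static size_t demo_drain_deferred(void)
{
	struct demo_link *n = atomic_exchange_explicit(&demo_deferred,
						       (struct demo_link *)NULL,
						       memory_order_acquire);
	size_t moved = 0;

	while (!demo_trylock())
		;	/* spinning is legal in this context */
	while (n) {
		struct demo_link *next = n->next;

		n->next = demo_batch;
		demo_batch = n;
		n = next;
		moved++;
	}
	demo_unlock();
	return moved;
}
```

The fast path never waits: demo_free_nolock() either links the object under
the lock or hands it to the deferred list, and demo_drain_deferred() plays
the role of the later, non-NMI context that finishes the job.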
>
> When certain debug features (kmemleak, debugobjects) are enabled,
> freeing in NMI context is always deferred because they use spinlocks.
>
> Patch 6 implements kfree_rcu_nolock() support, and patch 7 adds sheaves
> support for the new API.
>
> Harry Yoo (7):
> mm/slab: introduce k[v]free_rcu() with struct rcu_ptr
> mm: use rcu_ptr instead of rcu_head
> mm/slab: allow freeing kmalloc_nolock()'d objects using kfree[_rcu]()
> mm/slab: free a bit in enum objexts_flags
> mm/slab: move kfree_rcu_cpu[_work] definitions
> mm/slab: introduce kfree_rcu_nolock()
> mm/slab: make kfree_rcu_nolock() work with sheaves
>
> include/linux/list_lru.h | 2 +-
> include/linux/memcontrol.h | 3 +-
> include/linux/rcupdate.h | 68 +++++---
> include/linux/shrinker.h | 2 +-
> include/linux/types.h | 9 ++
> mm/kmemleak.c | 11 +-
> mm/slab.h | 2 +-
> mm/slab_common.c | 309 +++++++++++++++++++++++++------------
> mm/slub.c | 47 ++++--
> mm/vmalloc.c | 4 +-
> 10 files changed, 310 insertions(+), 147 deletions(-)
>
> --
> 2.43.0
>