From: Harry Yoo <harry.yoo@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@kernel.org>,
Harry Yoo <harry.yoo@oracle.com>, Hao Li <hao.li@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Puranjay Mohan <puranjay@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Amery Hung <ameryhung@gmail.com>,
Catalin Marinas <catalin.marinas@arm.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
Dave Chinner <david@fromorbit.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Muchun Song <muchun.song@linux.dev>,
rcu@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org
Subject: [RFC PATCH 0/7] k[v]free_rcu() improvements
Date: Fri, 6 Feb 2026 18:34:03 +0900
Message-ID: <20260206093410.160622-1-harry.yoo@oracle.com>
This series contains a few improvements to the k[v]free_rcu() API,
suggested by Alexei Starovoitov.
[ To kmemleak folks: I'm going to teach delete_object_full() and
paint_ptr() to ignore cases when the object does not exist.
Could you please let me know if the way it's done in patch 3
looks good? Only part 2 is relevant to you. ]
Although I've put some effort into providing a decent-quality
implementation, please consider this a proof of concept, and let's
discuss how best to tackle these problems:
1) Allow an 8-byte field to be used as an alternative to
struct rcu_head (16-byte) for 2-argument kvfree_rcu()
2) kmalloc_nolock() -> kfree[_rcu]() support
3) Add kfree_rcu_nolock() for NMI context
# Part 1. Allow an 8-byte field to be used as an alternative to
struct rcu_head for 2-argument kvfree_rcu()
Technically, objects that are freed with k[v]free_rcu() need
only one pointer to link objects, because we already know that
the callback function is always kvfree(). For this purpose,
struct rcu_head is unnecessarily large (16 bytes on 64-bit).
Allow a smaller, 8-byte field (of type struct rcu_ptr) to be used
with k[v]free_rcu(). This saves one pointer per slab object.
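To illustrate why one pointer is enough, here is a minimal userspace
sketch. struct rcu_head_like mirrors the two-pointer layout of
struct rcu_head (next + func); struct rcu_ptr is modeled as holding
only a next pointer, which is an assumption about its layout. Because
the deferred action is always a free, the pending list needs only the
link. The grace period is not modeled, and recovering the object start
with offsetof() for a single known type is a simplification, not how
the series does it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mirrors struct rcu_head: a link plus a callback (16 bytes on LP64). */
struct rcu_head_like {
	struct rcu_head_like *next;
	void (*func)(struct rcu_head_like *head);
};

/* Assumed layout of struct rcu_ptr: only the link (8 bytes on LP64). */
struct rcu_ptr {
	struct rcu_ptr *next;
};

/* Example object type that is only ever freed via the deferred path. */
struct foo {
	long data;
	struct rcu_ptr rp;
};

/* Objects waiting for a (not modeled) grace period. */
static struct rcu_ptr *pending;

/* Queue an object for deferred freeing; only the link is written. */
static void foo_free_deferred(struct foo *f)
{
	f->rp.next = pending;
	pending = &f->rp;
}

/*
 * Pretend a grace period has elapsed and free everything queued.
 * The "callback" is implicitly free(), so no function pointer is
 * stored per object. Returns the number of objects freed.
 */
static size_t drain_pending(void)
{
	size_t n = 0;

	while (pending) {
		struct rcu_ptr *rp = pending;

		pending = rp->next;
		/* container_of()-style recovery for this one known type. */
		free((char *)rp - offsetof(struct foo, rp));
		n++;
	}
	return n;
}
```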
I have to admit that my naming skill isn't great; hopefully
we'll come up with a better name than `struct rcu_ptr`.
With this feature, either a struct rcu_ptr or rcu_head field
can be used as the second argument of the k[v]free_rcu() API.
Users that only use k[v]free_rcu() are highly encouraged to use
struct rcu_ptr; otherwise you're wasting memory. However, some users,
such as maple tree, may use call_rcu() or k[v]free_rcu() depending on
the situation for objects of the same type. For such users,
struct rcu_head remains the only option.
Patch 1 implements this feature, and patch 2 adds a few users in mm/.
# Part 2. kmalloc_nolock() -> kfree() or kfree_rcu() path support
Allow objects allocated with kmalloc_nolock() to be freed with
kfree[_rcu](). Without this support, users are forced to call
call_rcu() with kfree_nolock() to free objects after a grace period.
This is inefficient and can create unnecessarily many grace periods,
because it bypasses the kfree_rcu batching layer.
It was not supported before because some alloc hooks are not called
in kmalloc_nolock(), while all free hooks are called in kfree().
Patch 3 adds support for this by teaching kmemleak to ignore cases
when free hooks are called without prior alloc hooks. Patch 4 frees
a bit in enum objexts_flags, since we no longer have to remember
whether the array was allocated using kmalloc_nolock() or kmalloc().
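The asymmetry can be pictured with a toy object tracker, a
hypothetical stand-in for kmemleak's object metadata (these are not
kmemleak's real data structures or function names). An allocation
path that skips the alloc hook never records the object, so the free
hook has to treat a missing record as expected rather than as an
error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64

/* Toy tracker standing in for kmemleak's object metadata. */
static uintptr_t tracked[MAX_TRACKED];
static size_t ntracked;

/* Alloc hook: record the object (a "nolock" allocation skips this). */
static bool track_alloc(const void *ptr)
{
	if (ntracked == MAX_TRACKED)
		return false;
	tracked[ntracked++] = (uintptr_t)ptr;
	return true;
}

/*
 * Free hook: drop the record if present and return true if the object
 * was tracked. A missing record is not reported as an error, because
 * the object may come from a path that never called track_alloc().
 */
static bool track_free(const void *ptr)
{
	for (size_t i = 0; i < ntracked; i++) {
		if (tracked[i] == (uintptr_t)ptr) {
			tracked[i] = tracked[--ntracked];
			return true;
		}
	}
	return false; /* ignored silently instead of warning */
}
```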
Note that the free hooks fall into these categories:
- Its alloc hook is called in kmalloc_nolock(); no problem!
(kmsan_slab_alloc(), kasan_slab_alloc(),
memcg_slab_post_alloc_hook(), alloc_tagging_slab_alloc_hook())
- Its alloc hook isn't called in kmalloc_nolock(); free hooks
must handle asymmetric hook calls. (kfence_free(),
kmemleak_free_recursive())
- There is no matching alloc hook for the free hook; it's safe to
call. (debug_check_no_{locks,obj}_freed(), __kcsan_check_access())
Note that kmalloc() -> kfree_nolock() or kfree_rcu_nolock() is
still not supported! That's much trickier :)
# Part 3. Add kfree_rcu_nolock() for NMI context
Add a new 2-argument kfree_rcu_nolock() variant that is safe to call
in NMI context. In NMI context, calling kfree_rcu() or
call_rcu() is not legal, and thus users are forced to implement some
sort of deferred freeing. Let's make users' lives easier with the new
variant.
Note that 1-argument kfree_rcu_nolock() is not supported, since there
is not much we can do when trylock & memory allocation fails.
(You can't call synchronize_rcu() in NMI context!)
When spinning on a lock is not allowed, try to acquire the spinlock
with a trylock. If that succeeds, do one of the following:
1) Use the rcu sheaf to free the object. Note that call_rcu() cannot
be called in NMI context! If freeing the object would make the rcu
sheaf full, the object cannot be freed to the sheaf and we have to
fall back.
2) Use the struct rcu_ptr field to link objects. In NMI context, this
avoids consuming a bnode (struct kvfree_rcu_bulk_data) and queueing
work to keep a number of bnodes cached.
Note that the delayed monitor work, which drains objects after
KFREE_DRAIN_JIFFIES, is scheduled using a lazy irq_work to avoid
raising self-IPIs. As a result, scheduling the delayed monitor work
may itself be delayed by up to the length of a time slice.
In rare cases where trylock fails, a non-lazy irq_work is used to
defer calling kvfree_rcu_call().
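The control flow above can be sketched in userspace C, with
pthread_mutex_trylock() standing in for the non-spinning lock
attempt and a C11 atomic push list standing in for the irq_work
deferral. All names here (struct nolock_free_ctx, SHEAF_CAPACITY,
free_nolock(), defer_push()) are hypothetical, and falling back from
a would-be-full sheaf to rcu_ptr linking is an assumption. The real
patches operate on per-CPU kfree_rcu state and rcu sheaves; grace
periods are not modeled, and pthread_mutex_trylock() is not
async-signal-safe, so this shows only the decision logic, not NMI
safety.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SHEAF_CAPACITY 4

/* Assumed layout of the series' struct rcu_ptr: a single link. */
struct rcu_ptr {
	struct rcu_ptr *next;
};

enum free_path { FREED_TO_SHEAF, LINKED_VIA_PTR, DEFERRED };

struct nolock_free_ctx {
	pthread_mutex_t lock;
	void *sheaf[SHEAF_CAPACITY];        /* stand-in for the rcu sheaf */
	size_t sheaf_count;
	struct rcu_ptr *ptr_list;           /* objects linked via rcu_ptr */
	_Atomic(struct rcu_ptr *) deferred; /* drained later, lock-free */
};

/* Lock-free push: usable where taking the lock is not possible. */
static void defer_push(struct nolock_free_ctx *c, struct rcu_ptr *rp)
{
	struct rcu_ptr *head = atomic_load(&c->deferred);

	do {
		rp->next = head;
	} while (!atomic_compare_exchange_weak(&c->deferred, &head, rp));
}

static enum free_path free_nolock(struct nolock_free_ctx *c, void *obj,
				  struct rcu_ptr *rp)
{
	enum free_path path;

	if (pthread_mutex_trylock(&c->lock) != 0) {
		/* Rare case: lock is taken, so defer the free. */
		defer_push(c, rp);
		return DEFERRED;
	}

	if (c->sheaf_count + 1 < SHEAF_CAPACITY) {
		/* 1) Sheaf stays non-full, so it need not be submitted. */
		c->sheaf[c->sheaf_count++] = obj;
		path = FREED_TO_SHEAF;
	} else {
		/* 2) Sheaf would become full: link via the rcu_ptr field. */
		rp->next = c->ptr_list;
		c->ptr_list = rp;
		path = LINKED_VIA_PTR;
	}

	pthread_mutex_unlock(&c->lock);
	return path;
}
```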
When certain debug features (kmemleak, debugobjects) are enabled,
freeing in NMI context is always deferred because they use spinlocks.
Patch 6 implements kfree_rcu_nolock() support, and patch 7 adds
sheaves support for the new API.
Harry Yoo (7):
mm/slab: introduce k[v]free_rcu() with struct rcu_ptr
mm: use rcu_ptr instead of rcu_head
mm/slab: allow freeing kmalloc_nolock()'d objects using kfree[_rcu]()
mm/slab: free a bit in enum objexts_flags
mm/slab: move kfree_rcu_cpu[_work] definitions
mm/slab: introduce kfree_rcu_nolock()
mm/slab: make kfree_rcu_nolock() work with sheaves
include/linux/list_lru.h | 2 +-
include/linux/memcontrol.h | 3 +-
include/linux/rcupdate.h | 68 +++++---
include/linux/shrinker.h | 2 +-
include/linux/types.h | 9 ++
mm/kmemleak.c | 11 +-
mm/slab.h | 2 +-
mm/slab_common.c | 309 +++++++++++++++++++++++++------------
mm/slub.c | 47 ++++--
mm/vmalloc.c | 4 +-
10 files changed, 310 insertions(+), 147 deletions(-)
--
2.43.0