From: Joel Fernandes <joelagnelf@nvidia.com>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@kernel.org>, Hao Li <hao.li@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Puranjay Mohan <puranjay@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Amery Hung <ameryhung@gmail.com>,
Catalin Marinas <catalin.marinas@arm.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
Dave Chinner <david@fromorbit.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Muchun Song <muchun.song@linux.dev>,
rcu@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
peterz@infradead.org
Subject: Re: [RFC PATCH 6/7] mm/slab: introduce kfree_rcu_nolock()
Date: Mon, 16 Feb 2026 16:32:54 -0500
Message-ID: <20260216213254.GA1469635@joelbox2>
In-Reply-To: <20260216210755.GA1320175@joelbox2>
CC Peter for real this time. ;-)
On Mon, Feb 16, 2026 at 04:07:55PM -0500, Joel Fernandes wrote:
> Hi Harry,
>
> On Fri, Feb 06, 2026 at 06:34:09PM +0900, Harry Yoo wrote:
> > Currently, kfree_rcu() cannot be called in an NMI context.
> > In such a context, even calling call_rcu() is not legal,
> > forcing users to implement deferred freeing.
> >
> > Make users' lives easier by introducing a kfree_rcu_nolock() variant.
> > Unlike kfree_rcu(), kfree_rcu_nolock() only supports a 2-argument
> > variant, because, in the worst case where memory allocation fails,
> > the caller cannot synchronously wait for the grace period to finish.
> >
> > Similar to the kfree_nolock() implementation, try to acquire the
> > kfree_rcu_cpu spinlock and, if that fails, insert the object into a per-cpu
> > lockless list and defer freeing via an irq_work that calls
> > kvfree_call_rcu() later. If kmemleak or debugobjects is enabled, always
> > defer freeing, as those debug features don't support NMI contexts.
> >
> > When trylock succeeds, avoid consuming bnode and run_page_cache_worker()
> > altogether. Instead, insert objects into struct kfree_rcu_cpu.head
> > without consuming additional memory.
> >
> > For now, the sheaves layer is bypassed if spinning is not allowed.
> >
> > Scheduling delayed monitor work in an NMI context is tricky; use
> > irq_work to schedule, but use lazy irq_work to avoid raising self-IPIs.
> > That means scheduling delayed monitor work can be delayed up to the
> > length of a time slice.
> >
> > Without CONFIG_KVFREE_RCU_BATCHED, all frees in the !allow_spin case are
> > delayed using irq_work.
> >
> > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> > include/linux/rcupdate.h | 23 ++++---
> > mm/slab_common.c | 140 +++++++++++++++++++++++++++++++++------
> > 2 files changed, 133 insertions(+), 30 deletions(-)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index db5053a7b0cb..18bb7378b23d 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -1092,8 +1092,9 @@ static inline void rcu_read_unlock_migrate(void)
> > * The BUILD_BUG_ON check must not involve any function calls, hence the
> > * checks are done in macros here.
> > */
> > -#define kfree_rcu(ptr, rf) kvfree_rcu_arg_2(ptr, rf)
> > -#define kvfree_rcu(ptr, rf) kvfree_rcu_arg_2(ptr, rf)
> > +#define kfree_rcu(ptr, rf) kvfree_rcu_arg_2(ptr, rf, true)
> > +#define kfree_rcu_nolock(ptr, rf) kvfree_rcu_arg_2(ptr, rf, false)
> > +#define kvfree_rcu(ptr, rf) kvfree_rcu_arg_2(ptr, rf, true)
> >
> > /**
> > * kfree_rcu_mightsleep() - kfree an object after a grace period.
> > @@ -1117,35 +1118,35 @@ static inline void rcu_read_unlock_migrate(void)
> >
> >
> > #ifdef CONFIG_KVFREE_RCU_BATCHED
> > -void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr);
> > -#define kvfree_call_rcu(head, ptr) \
> > +void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr, bool allow_spin);
> > +#define kvfree_call_rcu(head, ptr, spin) \
> > _Generic((head), \
> > struct rcu_head *: kvfree_call_rcu_ptr, \
> > struct rcu_ptr *: kvfree_call_rcu_ptr, \
> > void *: kvfree_call_rcu_ptr \
> > - )((struct rcu_ptr *)(head), (ptr))
> > + )((struct rcu_ptr *)(head), (ptr), spin)
> > #else
> > -void kvfree_call_rcu_head(struct rcu_head *head, void *ptr);
> > +void kvfree_call_rcu_head(struct rcu_head *head, void *ptr, bool allow_spin);
> > static_assert(sizeof(struct rcu_head) == sizeof(struct rcu_ptr));
> > -#define kvfree_call_rcu(head, ptr) \
> > +#define kvfree_call_rcu(head, ptr, spin) \
> > _Generic((head), \
> > struct rcu_head *: kvfree_call_rcu_head, \
> > struct rcu_ptr *: kvfree_call_rcu_head, \
> > void *: kvfree_call_rcu_head \
> > - )((struct rcu_head *)(head), (ptr))
> > + )((struct rcu_head *)(head), (ptr), spin)
> > #endif
> >
> > /*
> > * The BUILD_BUG_ON() makes sure the rcu_head offset can be handled. See the
> > * comment of kfree_rcu() for details.
> > */
> > -#define kvfree_rcu_arg_2(ptr, rf) \
> > +#define kvfree_rcu_arg_2(ptr, rf, spin) \
> > do { \
> > typeof (ptr) ___p = (ptr); \
> > \
> > if (___p) { \
> > BUILD_BUG_ON(offsetof(typeof(*(ptr)), rf) >= 4096); \
> > - kvfree_call_rcu(&((___p)->rf), (void *) (___p)); \
> > + kvfree_call_rcu(&((___p)->rf), (void *) (___p), spin); \
> > } \
> > } while (0)
> >
> > @@ -1154,7 +1155,7 @@ do { \
> > typeof(ptr) ___p = (ptr); \
> > \
> > if (___p) \
> > - kvfree_call_rcu(NULL, (void *) (___p)); \
> > + kvfree_call_rcu(NULL, (void *) (___p), true); \
> > } while (0)
> >
> > /*
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index d232b99a4b52..9d7801e5cb73 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1311,6 +1311,12 @@ struct kfree_rcu_cpu_work {
> > * the interactions with the slab allocators.
> > */
> > struct kfree_rcu_cpu {
> > + // Objects queued on a lockless linked list, not protected by the lock.
> > + // This allows freeing objects in NMI context, where trylock may fail.
> > + struct llist_head llist_head;
> > + struct irq_work irq_work;
> > + struct irq_work sched_monitor_irq_work;
>
> It would be great if irq_work_queue() could take a lazy flag, or if there were
> a new irq_work_queue_lazy() that simply skips irq_work_raise() in the lazy
> case. Then we would not need multiple struct irq_work instances doing the same
> thing. +PeterZ
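
To make the suggestion concrete, here is a minimal userspace model of what a
lazy queueing variant could look like. irq_work_queue_lazy() does not exist in
the kernel today; every model_* name, the struct layout, and the ipis_raised
counter are hypothetical. The sketch only shows that a single work item could
serve both the eager and the lazy case, and it assumes the next tick would run
lazily queued work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_irq_work {
	atomic_bool pending;	/* set while the work sits on a list */
};

struct model_cpu {
	int queued;		/* works placed on the per-CPU list */
	int ipis_raised;	/* self-IPIs the model would have sent */
};

/* Claim the work; returns false if it was already pending. */
static bool model_claim(struct model_irq_work *work)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&work->pending, &expected, true);
}

/* Eager variant: queue the work and raise a self-IPI right away. */
static bool model_irq_work_queue(struct model_cpu *cpu,
				 struct model_irq_work *work)
{
	if (!model_claim(work))
		return false;
	cpu->queued++;
	cpu->ipis_raised++;
	return true;
}

/* Lazy variant: queue only; no self-IPI is raised. */
static bool model_irq_work_queue_lazy(struct model_cpu *cpu,
				      struct model_irq_work *work)
{
	if (!model_claim(work))
		return false;
	cpu->queued++;
	return true;
}

/* Run the work: clear pending so that it can be queued again. */
static void model_irq_work_run(struct model_irq_work *work)
{
	assert(atomic_load(&work->pending));	/* must have been queued */
	atomic_store(&work->pending, false);
}
```

With something along these lines, the same work item could be queued eagerly
from one call site and lazily from another, instead of keeping two struct
irq_work members in struct kfree_rcu_cpu.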
>
> [...]
> > @@ -1979,9 +2059,15 @@ void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr)
> > }
> >
> > kasan_record_aux_stack(ptr);
> > - success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
> > +
> > + krcp = krc_this_cpu_lock(&flags, allow_spin);
> > + if (!krcp)
> > + goto defer_free;
> > +
> > + success = add_ptr_to_bulk_krc_lock(krcp, &flags, ptr, !head, allow_spin);
> > if (!success) {
> > - run_page_cache_worker(krcp);
> > + if (allow_spin)
> > + run_page_cache_worker(krcp);
> >
> > if (head == NULL)
> > // Inline if kvfree_rcu(one_arg) call.
> > @@ -2005,8 +2091,12 @@ void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr)
> > kmemleak_ignore(ptr);
> >
> > // Set timer to drain after KFREE_DRAIN_JIFFIES.
> > - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
> > - __schedule_delayed_monitor_work(krcp);
> > + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) {
> > + if (allow_spin)
> > + __schedule_delayed_monitor_work(krcp);
> > + else
> > + irq_work_queue(&krcp->sched_monitor_irq_work);
>
> Here the irq_work is queued even if delayed_work_pending() is already true?
> That adds irq_work overhead that is not needed when the delayed monitor work
> is already queued.
>
> If delayed_work_pending() is safe to call from NMI, you could also call
> that to avoid unnecessary irq_work queueing. But do double check if it is.
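
A minimal sketch of that check, written as a userspace model rather than
kernel code. The model_* names and the counters are hypothetical, and the
monitor_pending flag merely stands in for delayed_work_pending(). Whether the
real helper is safe to call from NMI context is exactly the open question; the
model simply assumes it is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-CPU kfree_rcu state. */
struct model_krc {
	bool monitor_pending;	/* stands in for delayed_work_pending() */
	int irq_works_queued;	/* counts sched_monitor irq_work queueing */
};

/*
 * !allow_spin path: queue the irq_work only if the delayed monitor work is
 * not pending yet. Returns true if an irq_work was queued.
 */
static bool model_sched_monitor_nolock(struct model_krc *krcp)
{
	if (krcp->monitor_pending)
		return false;	/* monitor already armed, nothing to do */
	krcp->irq_works_queued++;
	return true;
}

/* What the irq_work handler would do: arm the delayed monitor work. */
static void model_sched_monitor_irq_work_fn(struct model_krc *krcp)
{
	assert(krcp->irq_works_queued > 0);	/* runs only after queueing */
	krcp->monitor_pending = true;
}
```

In this model, a burst of frees after the monitor has been armed queues no
further irq_work, which is the overhead the check is meant to avoid.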
>
> Also, per [1], I gather that !allow_spin does not always imply NMI. If that is
> true, would it be better to call in_nmi() instead of relying on allow_spin?
>
> [1] https://lore.kernel.org/all/CAADnVQKk_Bgi0bc-td_3pVpHYXR3CpC3R8rg-NHwdLEDiQSeNg@mail.gmail.com/
>
> Thanks,
>
> --
> Joel Fernandes
>
>
>
> > + }
> >
> > unlock_return:
> > krc_this_cpu_unlock(krcp, flags);
> > @@ -2017,10 +2107,22 @@ void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr)
> > * CPU can pass the QS state.
> > */
> > if (!success) {
> > + VM_WARN_ON_ONCE(!allow_spin);
> > debug_rcu_head_unqueue((struct rcu_head *) ptr);
> > synchronize_rcu();
> > kvfree(ptr);
> > }
> > + return;
> > +
> > +defer_free:
> > + VM_WARN_ON_ONCE(allow_spin);
> > + guard(preempt)();
> > +
> > + krcp = this_cpu_ptr(&krc);
> > + if (llist_add((struct llist_node *)head, &krcp->llist_head))
> > + irq_work_queue(&krcp->irq_work);
> > + return;
> > +
> > }
> > EXPORT_SYMBOL_GPL(kvfree_call_rcu_ptr);
> >
> > --
> > 2.43.0
> >
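
For readers following along, the trylock-or-defer flow described in the commit
message can be modeled in userspace roughly as below. This is a sketch under
stated assumptions, not the patch's implementation: all model_* names are
hypothetical, an atomic_flag stands in for the kfree_rcu_cpu spinlock, a C11
compare-and-swap stack stands in for llist_add(), and the drain step moves
objects straight onto the locked list, whereas the patch has the irq_work call
kvfree_call_rcu() later.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct model_node {
	struct model_node *next;
};

struct model_krc {
	atomic_flag lock;			/* stands in for the spinlock */
	struct model_node *head;		/* protected by lock */
	_Atomic(struct model_node *) llist;	/* lockless deferred list */
	int irq_works_queued;			/* single-threaded counter */
};

static bool model_trylock(struct model_krc *krcp)
{
	return !atomic_flag_test_and_set_explicit(&krcp->lock,
						  memory_order_acquire);
}

static void model_unlock(struct model_krc *krcp)
{
	atomic_flag_clear_explicit(&krcp->lock, memory_order_release);
}

/* Lockless push; returns true if the list was empty before the push. */
static bool model_llist_add(struct model_node *node, struct model_krc *krcp)
{
	struct model_node *first = atomic_load(&krcp->llist);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&krcp->llist, &first, node));
	return first == NULL;
}

/* Returns true if the object was queued directly, false if deferred. */
static bool model_kfree_rcu_nolock(struct model_krc *krcp,
				   struct model_node *node)
{
	if (model_trylock(krcp)) {
		node->next = krcp->head;	/* fast path: krcp->head */
		krcp->head = node;
		model_unlock(krcp);
		return true;
	}
	if (model_llist_add(node, krcp))	/* first deferred object */
		krcp->irq_works_queued++;
	return false;
}

/* irq_work handler model: move deferred objects under the lock. */
static int model_drain_deferred(struct model_krc *krcp)
{
	struct model_node *node = atomic_exchange(&krcp->llist, NULL);
	int moved = 0;

	while (!model_trylock(krcp))
		;	/* the handler is allowed to spin */
	while (node) {
		struct model_node *next = node->next;

		node->next = krcp->head;
		krcp->head = node;
		node = next;
		moved++;
	}
	model_unlock(krcp);
	return moved;
}
```

Only the first deferred object bumps irq_works_queued, mirroring how the patch
queues the irq_work only when llist_add() reports that the list was empty.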
Thread overview: 32+ messages
2026-02-06 9:34 [RFC PATCH 0/7] k[v]free_rcu() improvements Harry Yoo
2026-02-06 9:34 ` [RFC PATCH 1/7] mm/slab: introduce k[v]free_rcu() with struct rcu_ptr Harry Yoo
2026-02-11 10:16 ` Uladzislau Rezki
2026-02-11 10:44 ` Harry Yoo
2026-02-11 10:53 ` Uladzislau Rezki
2026-02-11 11:26 ` Harry Yoo
2026-02-11 13:02 ` Uladzislau Rezki
2026-02-11 17:05 ` Alexei Starovoitov
2026-02-12 11:52 ` Vlastimil Babka
2026-02-13 5:17 ` Harry Yoo
2026-02-06 9:34 ` [RFC PATCH 2/7] mm: use rcu_ptr instead of rcu_head Harry Yoo
2026-02-09 10:41 ` Uladzislau Rezki
2026-02-09 11:22 ` Harry Yoo
2026-02-06 9:34 ` [RFC PATCH 3/7] mm/slab: allow freeing kmalloc_nolock()'d objects using kfree[_rcu]() Harry Yoo
2026-02-06 9:34 ` [RFC PATCH 4/7] mm/slab: free a bit in enum objexts_flags Harry Yoo
2026-02-06 20:09 ` Alexei Starovoitov
2026-02-09 9:38 ` Vlastimil Babka
2026-02-09 18:44 ` Alexei Starovoitov
2026-02-06 9:34 ` [RFC PATCH 5/7] mm/slab: move kfree_rcu_cpu[_work] definitions Harry Yoo
2026-02-06 9:34 ` [RFC PATCH 6/7] mm/slab: introduce kfree_rcu_nolock() Harry Yoo
2026-02-12 2:58 ` Harry Yoo
2026-02-16 21:07 ` Joel Fernandes
2026-02-16 21:32 ` Joel Fernandes [this message]
2026-02-06 9:34 ` [RFC PATCH 7/7] mm/slab: make kfree_rcu_nolock() work with sheaves Harry Yoo
2026-02-12 19:15 ` Alexei Starovoitov
2026-02-13 11:55 ` Harry Yoo
2026-02-07 0:16 ` [RFC PATCH 0/7] k[v]free_rcu() improvements Paul E. McKenney
2026-02-07 1:21 ` Harry Yoo
2026-02-07 1:33 ` Paul E. McKenney
2026-02-09 9:02 ` Harry Yoo
2026-02-09 16:40 ` Paul E. McKenney
2026-02-12 14:28 ` Vlastimil Babka