From: Harry Yoo <harry.yoo@oracle.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Michal Hocko <mhocko@kernel.org>, Hao Li <hao.li@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>,
	Puranjay Mohan <puranjay@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Amery Hung <ameryhung@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang@linux.dev>,
	Dave Chinner <david@fromorbit.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Muchun Song <muchun.song@linux.dev>,
	rcu@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
	peterz@infradead.org
Subject: Re: [RFC PATCH 6/7] mm/slab: introduce kfree_rcu_nolock()
Date: Wed, 25 Feb 2026 14:55:35 +0900
Message-ID: <aZ6O1-8SLXZxpz6g@hyeyoo>
In-Reply-To: <20260216213254.GA1469635@joelbox2>

Hi Joel, I appreciate your feedback.

On Mon, Feb 16, 2026 at 04:32:54PM -0500, Joel Fernandes wrote:
> CC Peter for real this time. ;-)
> 
> On Mon, Feb 16, 2026 at 04:07:55PM -0500, Joel Fernandes wrote:
> > Hi Harry,
> > 
> > On Fri, Feb 06, 2026 at 06:34:09PM +0900, Harry Yoo wrote:
> > > Currently, kfree_rcu() cannot be called in an NMI context.
> > > In such a context, even calling call_rcu() is not legal,
> > > forcing users to implement deferred freeing.
> > > 
> > > Make users' lives easier by introducing kfree_rcu_nolock() variant.
> > > Unlike kfree_rcu(), kfree_rcu_nolock() only supports a 2-argument
> > > variant, because, in the worst case where memory allocation fails,
> > > the caller cannot synchronously wait for the grace period to finish.
> > > 
> > > Similar to kfree_nolock() implementation, try to acquire kfree_rcu_cpu
> > > spinlock, and if that fails, insert the object to per-cpu lockless list
> > > and delay freeing using irq_work that calls kvfree_call_rcu() later.
> > > In case kmemleak or debugobjects is enabled, always defer freeing as
> > > those debug features don't support NMI contexts.
> > > 
> > > When trylock succeeds, avoid consuming bnode and run_page_cache_worker()
> > > altogether. Instead, insert objects into struct kfree_rcu_cpu.head
> > > without consuming additional memory.
> > > 
> > > For now, the sheaves layer is bypassed if spinning is not allowed.
> > > 
> > > Scheduling delayed monitor work in an NMI context is tricky; use
> > > irq_work to schedule, but use lazy irq_work to avoid raising self-IPIs.
> > > That means scheduling delayed monitor work can be delayed up to the
> > > length of a time slice.
> > > 
> > > Without CONFIG_KVFREE_RCU_BATCHED, all frees in the !allow_spin case are
> > > delayed using irq_work.
> > > 
> > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
> > >  include/linux/rcupdate.h |  23 ++++---
> > >  mm/slab_common.c         | 140 +++++++++++++++++++++++++++++++++------
> > >  2 files changed, 133 insertions(+), 30 deletions(-)
> > > 

[...]

> > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > index d232b99a4b52..9d7801e5cb73 100644
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -1311,6 +1311,12 @@ struct kfree_rcu_cpu_work {
> > >   * the interactions with the slab allocators.
> > >   */
> > >  struct kfree_rcu_cpu {
> > > +	// Objects queued on a lockless linked list, not protected by the lock.
> > > +	// This allows freeing objects in NMI context, where trylock may fail.
> > > +	struct llist_head llist_head;
> > > +	struct irq_work irq_work;
> > > +	struct irq_work sched_monitor_irq_work;
> > 
> > It would be great if irq_work_queue() could support a lazy flag, or a new
> > irq_work_queue_lazy() which then just skips the irq_work_raise() for the lazy
> > case. Then we don't need multiple struct irq_work doing the same thing. +PeterZ

That'd be nice to have, yes.

> > > @@ -1979,9 +2059,15 @@ void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr)
> > >  	}
> > >  
> > >  	kasan_record_aux_stack(ptr);
> > > -	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
> > > +
> > > +	krcp = krc_this_cpu_lock(&flags, allow_spin);
> > > +	if (!krcp)
> > > +		goto defer_free;
> > > +
> > > +	success = add_ptr_to_bulk_krc_lock(krcp, &flags, ptr, !head, allow_spin);
> > >  	if (!success) {
> > > -		run_page_cache_worker(krcp);
> > > +		if (allow_spin)
> > > +			run_page_cache_worker(krcp);
> > >  
> > >  		if (head == NULL)
> > >  			// Inline if kvfree_rcu(one_arg) call.
> > > @@ -2005,8 +2091,12 @@ void kvfree_call_rcu_ptr(struct rcu_ptr *head, void *ptr)
> > >  	kmemleak_ignore(ptr);
> > >  
> > >  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> > > -	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
> > > -		__schedule_delayed_monitor_work(krcp);
> > > +	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) {
> > > +		if (allow_spin)
> > > +			__schedule_delayed_monitor_work(krcp);
> > > +		else
> > > +			irq_work_queue(&krcp->sched_monitor_irq_work);
> > 
> > Here this irq_work will be queued even if delayed_work_pending? That might be
> > additional irq_work overhead (which was not needed) when the delayed monitor
> > was already queued?

Right.

> > If delayed_work_pending() is safe to call from NMI, you could also call
> > that to avoid unnecessary irq_work queueing. But do double check if it is.

I think test_bit(WORK_STRUCT_PENDING_BIT, ...) should be safe to use
with allow_spin == false. I'll give it a try in v2.

Actually, I'm reworking v2 so that the allow_spin == false case behaves
almost the same as the allow_spin == true case: scheduling delayed monitor
work only when needed (as you mentioned), allocating and consuming bnodes
if possible, and running the page cache worker when needed.
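
To make the "queue only when not already pending" idea concrete, here is
a rough userspace model using C11 atomics. It is only a sketch, not the
v2 code: struct model_work and the model_* helpers are hypothetical
stand-ins, and the atomic flag merely plays the role that
WORK_STRUCT_PENDING_BIT plays for a real work item.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a work item; not a kernel structure. */
struct model_work {
	atomic_bool pending;	/* models WORK_STRUCT_PENDING_BIT */
	int queue_count;	/* number of times the work was really queued */
};

/* Lock-free read of the pending flag; takes no locks in this model. */
static bool model_work_pending(struct model_work *w)
{
	assert(w);
	return atomic_load_explicit(&w->pending, memory_order_relaxed);
}

/*
 * Queue the work only if it is not already pending.
 * Returns true if this call queued it, false if queueing was skipped.
 */
static bool model_queue_if_idle(struct model_work *w)
{
	/* Cheap early exit: this is the check that avoids redundant queueing. */
	if (model_work_pending(w))
		return false;

	/* The exchange closes the window between the check and the set. */
	if (atomic_exchange(&w->pending, true))
		return false;

	w->queue_count++;
	return true;
}
```

In this model a second call made while the flag is still set returns
false without touching queue_count, which is the overhead being avoided.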

> > Also per [1], I gather allow_spin does not always imply NMI. If that is true,
> > is better to call in_nmi() instead of relying on allow_spin?

As Alexei explained [1], allow_spin == false implies that the context is
unknown. It might be in NMI, or in the middle of kfree_rcu, or something
else.

NMI context is not the only context that cannot use spinlocks; for example,
kfree_rcu_nolock() may be called in the middle of kfree_rcu().
So using in_nmi() to check the current context doesn't help much here.
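
A simplified userspace model of why the decision hinges on whether the
lock can be taken rather than on in_nmi(). This is a sketch under
assumptions, not the patch itself: struct model_krc and the model_*
functions are hypothetical, an atomic_flag stands in for the
kfree_rcu_cpu spinlock, and a CAS-based push stands in for the per-cpu
lockless list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	struct model_node *next;
};

/* Hypothetical stand-in for struct kfree_rcu_cpu. */
struct model_krc {
	atomic_flag lock;			/* models the spinlock */
	struct model_node *head;		/* protected by lock */
	_Atomic(struct model_node *) deferred;	/* lockless list */
};

static void model_krc_init(struct model_krc *k)
{
	atomic_flag_clear(&k->lock);
	k->head = NULL;
	atomic_init(&k->deferred, NULL);
}

/* Lock-free push; usable even when the lock is already held. */
static void model_llist_push(struct model_krc *k, struct model_node *n)
{
	struct model_node *old = atomic_load(&k->deferred);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&k->deferred, &old, n));
}

/* Returns true if the object was queued directly, false if deferred. */
static bool model_free_nolock(struct model_krc *k, struct model_node *n)
{
	assert(k && n);

	/* Trylock: never spin, since the context is unknown. */
	if (atomic_flag_test_and_set_explicit(&k->lock, memory_order_acquire)) {
		/* Lock is held, possibly by the very code we interrupted. */
		model_llist_push(k, n);
		return false;
	}

	n->next = k->head;
	k->head = n;
	atomic_flag_clear_explicit(&k->lock, memory_order_release);
	return true;
}
```

The fallback path fires whenever the trylock fails, regardless of what
kind of context the caller is in, so no in_nmi() check appears anywhere
in the model.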

> > [1] https://lore.kernel.org/all/CAADnVQKk_Bgi0bc-td_3pVpHYXR3CpC3R8rg-NHwdLEDiQSeNg@mail.gmail.com/
> > 
> > Thanks,

-- 
Cheers,
Harry / Hyeonggon

