linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>, Alexei Starovoitov <ast@kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun@kernel.org>, Zqiang <qiang.zhang@linux.dev>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	rcu@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock()
Date: Wed, 22 Apr 2026 12:02:03 +0900	[thread overview]
Message-ID: <aeg6K-NcvFHZ02bk@hyeyoo> (raw)
In-Reply-To: <805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop>

On Tue, Apr 21, 2026 at 04:10:41PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 21, 2026 at 03:46:30PM -0700, Alexei Starovoitov wrote:
> > On Thu Apr 16, 2026 at 2:10 AM PDT, Harry Yoo (Oracle) wrote:
> > >  struct kfree_rcu_cpu {
> > > +	// Objects queued on a lockless linked list, used to free objects
> > > +	// in unknown contexts when trylock fails.
> > > +	struct llist_head defer_head;
> > > +
> > > +	struct irq_work defer_free;
> > > +	struct irq_work sched_delayed_monitor;
> > > +	struct irq_work run_page_cache_worker;
> > > +
> > >  	// Objects queued on a linked list
> > >  	struct rcu_ptr *head;
> > >  	unsigned long head_gp_snap;
> > > @@ -1333,12 +1341,99 @@ struct kfree_rcu_cpu {
> > >  	struct llist_head bkvcache;
> > >  	int nr_bkv_objs;
> > >  };
> > > +
> > > +static void defer_kfree_rcu_irq_work_fn(struct irq_work *work);
> > > +static void sched_delayed_monitor_irq_work_fn(struct irq_work *work);
> > > +static void run_page_cache_worker_irq_work_fn(struct irq_work *work);
> > > +
> > > +static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> > > +	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> > > +	.defer_head = LLIST_HEAD_INIT(defer_head),
> > > +	.defer_free = IRQ_WORK_INIT(defer_kfree_rcu_irq_work_fn),
> > > +	.sched_delayed_monitor =
> > > +		IRQ_WORK_INIT_LAZY(sched_delayed_monitor_irq_work_fn),
> > > +	.run_page_cache_worker =
> > > +		IRQ_WORK_INIT_LAZY(run_page_cache_worker_irq_work_fn),
> > > +};
> > 
> > I think kfree_rcu_cpu doesn't need to be per-cpu.

After reading this, I was like "Oh, that's quite a drastic change?",
but looks like I misread it. I didn't create a new percpu structure,
but extended the existing one.

I guess you meant the new fields added (defer_head, and irq works)
to struct kfree_rcu_cpu, not the whole structure.

> > It can be global llist with single irq_work for them all.

It could be, but what is the benefit of separating them from
existing kfree_rcu_cpu and making them global?

> I would be quite nervous about that, but you might well be right, given
> that this is a trylock-acquisition failure path.  Give or take people
> and/or machines analyzing the code for potential denial-of-service
> attacks.  :-/

It'll probably not that bad because it's trylock-acquisiion failure path
of per-cpu lock; IIRC during my test, falling back to defer_free
happened only a few times (< 10) when the kunit test is calling
kfree_rcu() in a tight loop (100k calls) while concurrently invoking
kfree_rcu_nolock() ~10k times on the same CPU.

> > Not sure about sched_delayed_monitor/run_page_cache_worker.
> > Do they have to be per-cpu ?

Since existing sched_delayed_monitor/run_page_cache_worker works are
per-cpu, I think it's better to keep those irq_works per-cpu as well.

> > Can all 3 share single irq_work?

I thought defer_free and defer_call_rcu should be non-lazy irq work
and others should be lazy irq work. And I was thinking of having
one lazy and one non-lazy IRQ work (two instead of four).

But given that sched_delayed_monitor and run_page_cache_worker should
not triggered that frequently anyway, it'll probably be okay for all of
them to share a single non-lazy IRQ work.

> On the other hand, if all CPUs are doing kfree_rcu() in even a semi-tight
> loop, having them all unconditionally use global state is not going to
> make for a fun time on large systems.  And there already are situations
> where user code can make all CPUs to call_rcu() in a semi-tight loop,
> so even if that is not yet the case for kfree_rcu(), past experience
> indicates that it soon will be.

A tight loop for kfree_rcu() should be fine.

I think the question is "Can a malicious user can make all CPUs to
kfree_rcu() in a tight loop AND concurrently trigger kfree_rcu_nolock()
on those CPUs, so that trylock will mostly fail"

> And noted on the desirability of call_rcu_nolock(), apologies for being
> slow.

No problem. Really appreciate looking into it, Alexei and Paul!

-- 
Cheers,
Harry / Hyeonggon


  parent reply	other threads:[~2026-04-22  3:02 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-16  9:10 [RFC PATCH v2 0/8] kvfree_rcu() improvements Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 1/8] mm/slab: introduce k[v]free_rcu() with struct rcu_ptr Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 2/8] fs/dcache: use rcu_ptr instead of rcu_head for external names Harry Yoo (Oracle)
2026-04-21 20:21   ` Al Viro
2026-04-22  1:16     ` Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 3/8] mm/slab: move kfree_rcu_cpu[_work] definitions Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock() Harry Yoo (Oracle)
2026-04-21 22:46   ` Alexei Starovoitov
2026-04-21 23:10     ` Paul E. McKenney
2026-04-21 23:14       ` Alexei Starovoitov
2026-04-22  3:02       ` Harry Yoo (Oracle) [this message]
2026-04-16  9:10 ` [PATCH 5/8] mm/slab: make kfree_rcu_nolock() work with sheaves Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 6/8] mm/slab: wrap rcu sheaf handling with ifdef Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 7/8] mm/slab: introduce deferred submission of rcu sheaves Harry Yoo (Oracle)
2026-04-21 22:51   ` Alexei Starovoitov
2026-04-22  3:11     ` Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 8/8] lib/tests/slub_kunit: add a test case for kfree_rcu_nolock() Harry Yoo (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aeg6K-NcvFHZ02bk@hyeyoo \
    --to=harry@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=boqun@kernel.org \
    --cc=cl@gentwo.org \
    --cc=frederic@kernel.org \
    --cc=hao.li@linux.dev \
    --cc=jiangshanlai@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=qiang.zhang@linux.dev \
    --cc=rcu@vger.kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=urezki@gmail.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox