Re: [RFC PATCH 0/4] sched+mm: Track lazy active mm existence with hazard pointers

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	 linux-kernel@vger.kernel.org,
	Nicholas Piggin <npiggin@gmail.com>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	 Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	 Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	 Alan Stern <stern@rowland.harvard.edu>,
	John Stultz <jstultz@google.com>,
	 Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	 Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	 Uladzislau Rezki <urezki@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	 Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>,
	 Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Vlastimil Babka <vbabka@suse.cz>,
	maged.michael@gmail.com, Mateusz Guzik <mjguzik@gmail.com>,
	 Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>,
	rcu@vger.kernel.org, linux-mm@kvack.org,  lkmm@lists.linux.dev
Subject: Re: [RFC PATCH 0/4] sched+mm: Track lazy active mm existence with hazard pointers
Date: Wed, 2 Oct 2024 10:39:15 -0700	[thread overview]
Message-ID: <CAHk-=wgztWbA4z85xKob4eS9P=Nt5h4j=HnN+Pc90expskiCRA@mail.gmail.com> (raw)
In-Reply-To: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>

On Tue, 1 Oct 2024 at 18:04, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> Hazard pointers appear to be a good fit for replacing refcount based lazy
> active mm tracking.

If the mm refcount is this expensive, I suspect we really shouldn't
use it at all.

The thing is, we don't _need_ to use the mm refcount - the reason the
lazy-tlb handling uses it is because we already had that refcount and
it was easy to extend on existing logic, not because it's really
required any more.

The lazy-tlb activation is basically "I'm switching to a kernel
thread, so I'll re-use the TLB state of the previous thread".

(And yes, it also has a secondary case of "I'm exiting, so I will turn
the mm I already have into a lazy one").

But in the actual task switch case, the previous thread hasn't _lost_
that mm, so we don't actually need to take the refcount at all.

We really just need to make sure to invalidate it before it's torn
down, but we do that *anyway* as part of TLB flushing.

(The exit case is actually different: we are setting it up to be lost,
although delayed - and the lazy count is the delay).

The only thing the refcount means is that we don't actually have to be
as careful when we actually *really* get rid of the MM. We can be a
bit laissez-faire about things because even if we weren't to
invalidate the lazy mm, it does have its own refcount, so we don't
much care.

But in reality, we're actually very careful about the active_mm
_anyway_, because of a fairly fundamental issue: the TLB shootdown and
PCID handling that we need to do even when mm's aren't lazy.

So we actually keep track of things like "which CPU's have seen this
MM state" in all the TLB code.

And even the exit case doesn't actually need the special thing - it
*does* need the "this CPU is still using this MM", but we have that
too as part of the TLB code - entirely independently of 'active_mm'.

So in many ways, I'm pretty sure not just the refcount, but all of
'active_mm', is largely pointless to begin with.

And if the refcount really is this big of a deal:

> nr threads (-t)     speedup
>    192               +28%

then we should probably just strive to get rid of 'active_mm' altogether.

Look, at least on x86 we ALREADY has a better replacement: it's the
percpu 'cpu_tlbstate'.

It basically duplicates all we do with active_mm and the whole "keep
track of old mm state" (the 'loaded_mm' member is basically the true
'active' mm), except it has some additional fixes:

 - it has some extra housekeeping data that the architecture wants
(for PCID updates etc)

 - it's actually atomic wrt the low-level code in ways that
'current->active_mm' isn't

So I think the real issue is that "active_mm" is an old hack from a
bygone era when we didn't have the (much more involved) full TLB
tracking.

               Linus

next prev parent reply	other threads:[~2024-10-02 17:39 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-02  1:02 Mathieu Desnoyers
2024-10-02  1:02 ` [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency Mathieu Desnoyers
2024-10-03  0:08   ` Joel Fernandes
2024-10-03 14:19     ` Mathieu Desnoyers
2024-10-03 22:09       ` Joel Fernandes
2024-10-02  1:02 ` [RFC PATCH 2/4] Documentation: RCU: Refer to ptr_eq() Mathieu Desnoyers
2024-10-02  1:02 ` [RFC PATCH 3/4] hp: Implement Hazard Pointers Mathieu Desnoyers
2024-10-03  0:24   ` Boqun Feng
2024-10-03 13:30     ` Mathieu Desnoyers
2024-10-07 13:47       ` Boqun Feng
2024-10-07 14:52         ` Mathieu Desnoyers
2024-10-02  1:02 ` [RFC PATCH 4/4] sched+mm: Use hazard pointers to track lazy active mm existence Mathieu Desnoyers
2024-10-02 14:09 ` [RFC PATCH 0/4] sched+mm: Track lazy active mm existence with hazard pointers Paul E. McKenney
2024-10-02 15:26   ` Mathieu Desnoyers
2024-10-02 15:33     ` Matthew Wilcox
2024-10-02 15:36       ` Mathieu Desnoyers
2024-10-02 15:53         ` Mathieu Desnoyers
2024-10-02 15:58           ` Jens Axboe
2024-10-02 16:02             ` Mathieu Desnoyers
2024-10-02 16:14               ` Jens Axboe
2024-10-02 17:39 ` Linus Torvalds [this message]
2024-10-05 16:15   ` Peter Zijlstra
2024-10-05 16:56     ` Linus Torvalds
2024-10-07  7:06       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgztWbA4z85xKob4eS9P=Nt5h4j=HnN+Pc90expskiCRA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Neeraj.Upadhyay@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=jonas.oberhauser@huaweicloud.com \
    --cc=josh@joshtriplett.org \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkmm@lists.linux.dev \
    --cc=longman@redhat.com \
    --cc=maged.michael@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=mjguzik@gmail.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qiang.zhang1211@gmail.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=stern@rowland.harvard.edu \
    --cc=tglx@linutronix.de \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox