From: Kevin Brodsky <kevin.brodsky@arm.com>
To: David Hildenbrand <david@redhat.com>,
Alexander Gordeev <agordeev@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andreas Larsson <andreas@gaisler.com>,
Andrew Morton <akpm@linux-foundation.org>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Borislav Petkov <bp@alien8.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Dave Hansen <dave.hansen@linux.intel.com>,
"David S. Miller" <davem@davemloft.net>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Jann Horn <jannh@google.com>, Juergen Gross <jgross@suse.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Nicholas Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Ryan Roberts <ryan.roberts@arm.com>,
Suren Baghdasaryan <surenb@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>,
Yeoreum Yun <yeoreum.yun@arm.com>,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections
Date: Tue, 9 Sep 2025 15:49:46 +0200
Message-ID: <47ee1df7-1602-4200-af94-475f84ca8d80@arm.com>
In-Reply-To: <e521b1f4-3f2b-48cd-9568-b9a4cf4c4830@redhat.com>

On 09/09/2025 13:54, David Hildenbrand wrote:
> On 09.09.25 13:45, Alexander Gordeev wrote:
>> On Tue, Sep 09, 2025 at 12:09:48PM +0200, David Hildenbrand wrote:
>>> On 09.09.25 11:40, Alexander Gordeev wrote:
>>>> On Tue, Sep 09, 2025 at 11:07:36AM +0200, David Hildenbrand wrote:
>>>>> On 08.09.25 09:39, Kevin Brodsky wrote:
>>>>>> arch_{enter,leave}_lazy_mmu_mode() currently have a stateless API
>>>>>> (taking and returning no value). This is proving problematic in
>>>>>> situations where leave() needs to restore some context back to its
>>>>>> original state (before enter() was called). In particular, this
>>>>>> makes it difficult to support the nesting of lazy_mmu sections -
>>>>>> leave() does not know whether the matching enter() call occurred
>>>>>> while lazy_mmu was already enabled, and whether to disable it or
>>>>>> not.
>>>>>>
>>>>>> This patch gives all architectures the chance to store local state
>>>>>> while inside a lazy_mmu section by making enter() return some value,
>>>>>> storing it in a local variable, and having leave() take that value.
>>>>>> That value is typed lazy_mmu_state_t - each architecture defining
>>>>>> __HAVE_ARCH_ENTER_LAZY_MMU_MODE is free to define it as it sees fit.
>>>>>> For now we define it as int everywhere, which is sufficient to
>>>>>> support nesting.
>>>> ...
>>>>>> {
>>>>>> + lazy_mmu_state_t lazy_mmu_state;
>>>>>> ...
>>>>>> - arch_enter_lazy_mmu_mode();
>>>>>> + lazy_mmu_state = arch_enter_lazy_mmu_mode();
>>>>>> ...
>>>>>> - arch_leave_lazy_mmu_mode();
>>>>>> + arch_leave_lazy_mmu_mode(lazy_mmu_state);
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> * In a few cases (e.g. xen_flush_lazy_mmu()), a function knows that
>>>>>> lazy_mmu is already enabled, and it temporarily disables it by
>>>>>> calling leave() and then enter() again. Here we want to ensure
>>>>>> that any operation between the leave() and enter() calls is
>>>>>> completed immediately; for that reason we pass LAZY_MMU_DEFAULT to
>>>>>> leave() to fully disable lazy_mmu. enter() will then re-enable it
>>>>>> - this achieves the expected behaviour, whether nesting occurred
>>>>>> before that function was called or not.
>>>>>>
>>>>>> Note: it is difficult to provide a default definition of
>>>>>> lazy_mmu_state_t for architectures implementing lazy_mmu, because
>>>>>> that definition would need to be available in
>>>>>> arch/x86/include/asm/paravirt_types.h and adding a new generic
>>>>>> #include there is very tricky due to the existing header soup.
>>>>>
>>>>> Yeah, I was wondering about exactly that.
>>>>>
>>>>> In particular because LAZY_MMU_DEFAULT etc. resides somewhere
>>>>> completely different.
>>>>>
>>>>> Which raises the question: is using a new type really of any
>>>>> benefit here?
>>>>>
>>>>> Can't we just use an "enum lazy_mmu_state" and call it a day?
>>>>
>>>> I could envision something completely different for this type on s390,
>>>> e.g. a pointer to a per-cpu structure. So I would really ask to stick
>>>> with the current approach.

This is indeed the motivation - let every arch do whatever it sees fit.
lazy_mmu_state_t is basically an opaque type as far as generic code is
concerned, which also means that this API change is the first and last
one we need (famous last words, I know).
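
As a rough illustration of the opaque-type idea, the userspace C sketch below simulates one architecture that chooses int for lazy_mmu_state_t, with LAZY_MMU_DEFAULT/LAZY_MMU_NESTED as its private values. The function bodies, the global flag standing in for per-CPU state, and the pending/flush_count counters are hypothetical. It assumes enter() returns LAZY_MMU_NESTED when lazy_mmu is already enabled and that only the outermost leave() flushes and disables it. The generic functions never inspect the value; they only store it and hand it back.

```c
#include <assert.h>
#include <stdbool.h>

/* Arch side (simulated): this arch chooses a plain int for the state. */
typedef int lazy_mmu_state_t;
#define LAZY_MMU_DEFAULT 0
#define LAZY_MMU_NESTED  1

static bool lazy_mmu_enabled;  /* stands in for per-CPU arch state */
static int pending;            /* updates queued while lazy_mmu is on */
static int flush_count;        /* how many times the queue was flushed */

static lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
{
	if (lazy_mmu_enabled)
		return LAZY_MMU_NESTED;  /* already on: leave() must keep it on */
	lazy_mmu_enabled = true;
	return LAZY_MMU_DEFAULT;
}

static void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
{
	if (state != LAZY_MMU_DEFAULT)
		return;                  /* nested section: stay enabled */
	if (pending) {
		flush_count++;       /* outermost leave(): flush the batch */
		pending = 0;
	}
	lazy_mmu_enabled = false;
}

/* A page table update: queued when lazy, otherwise applied at once. */
static void update_pte(void)
{
	if (lazy_mmu_enabled)
		pending++;
}

/* Generic side: the state is only stored and passed back. */
static void inner_section(void)
{
	lazy_mmu_state_t state = arch_enter_lazy_mmu_mode();

	update_pte();
	arch_leave_lazy_mmu_mode(state);
}

static void outer_section(void)
{
	lazy_mmu_state_t state = arch_enter_lazy_mmu_mode();

	update_pte();
	inner_section();
	assert(lazy_mmu_enabled);    /* the nested leave() did not disable it */
	update_pte();
	arch_leave_lazy_mmu_mode(state);
}
```

Calling outer_section() queues three updates and flushes them once, in the outermost leave(); another architecture could change lazy_mmu_state_t to a struct or pointer without touching inner_section() or outer_section().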

I mentioned in the cover letter that the pkeys-based page table
protection series [1] would have an immediate use for lazy_mmu_state_t.
In that proposal, any helper writing to pgtables needs to modify the
pkey register and then restore it. To reduce the overhead, lazy_mmu is
used to set the pkey register only once in enter(), and then restore it
in leave() [2]. This currently relies on storing the original pkey
register value in thread_struct, which is suboptimal and, most
importantly, doesn't work if lazy_mmu sections nest. With this series, we
could instead store the pkey register value in lazy_mmu_state_t
(enlarging it to 64 bits or more).
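
A hypothetical sketch of why the saved pkey value belongs on the stack once sections nest. pkey_reg, PKEY_PGTABLE_RW and the enter/leave helpers are invented stand-ins, not the API of the pkeys series: approach A keeps a single save slot per thread (like a thread_struct field), while approach B returns the saved value inside lazy_mmu_state_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pkey register and an assumed value that grants
 * write access to page tables. */
static uint64_t pkey_reg = 0x1;
#define PKEY_PGTABLE_RW 0x3

/* Approach A: one save slot per thread. */
static uint64_t thread_saved_pkey;

static void enter_thread_slot(void)
{
	thread_saved_pkey = pkey_reg;  /* a nested enter() overwrites this */
	pkey_reg = PKEY_PGTABLE_RW;
}

static void leave_thread_slot(void)
{
	pkey_reg = thread_saved_pkey;
}

/* Approach B: the saved value lives in the caller's stack variable. */
typedef struct {
	uint64_t saved_pkey;
} lazy_mmu_state_t;

static lazy_mmu_state_t enter_stack_state(void)
{
	lazy_mmu_state_t state = { .saved_pkey = pkey_reg };

	pkey_reg = PKEY_PGTABLE_RW;
	return state;
}

static void leave_stack_state(lazy_mmu_state_t state)
{
	pkey_reg = state.saved_pkey;       /* each level restores its own */
}
```

With two nested sections, approach A still has pkey_reg set to PKEY_PGTABLE_RW after the outermost leave(), because the inner enter() overwrote the only save slot; approach B restores the original value.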

I also considered going further and making lazy_mmu_state_t a pointer as
Alexander suggested - more complex to manage, but also a lot more flexible.

>>> Would that integrate well with LAZY_MMU_DEFAULT etc?
>>
>> Hmm... I thought the idea is to use LAZY_MMU_* by architectures that
>> want to use it - at least that is how I read the description above.
>>
>> It is only kasan_populate|depopulate_vmalloc_pte() in generic code
>> that do not follow this pattern, and it looks like a problem to me.

This discussion also made me realise that this is problematic, as the
LAZY_MMU_{DEFAULT,NESTED} macros were meant only for architectures'
convenience, not for generic code (where lazy_mmu_state_t should ideally
be an opaque type as mentioned above). It almost feels like the kasan
case deserves a different API, because this is not how enter() and
leave() are meant to be used. This would mean quite a bit of churn,
though, so maybe we could just introduce another arch-defined value to
pass to leave() for such a situation - for instance,
arch_leave_lazy_mmu_mode(LAZY_MMU_FLUSH)?
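
One possible reading of the LAZY_MMU_FLUSH idea, sketched under assumptions: LAZY_MMU_FLUSH is a hypothetical arch-defined value meaning "complete pending updates and disable lazy_mmu regardless of nesting". A generic caller that needs its update to take effect immediately (populate_one_pte() below is an illustrative stand-in for the kasan helpers) can then force it through without naming LAZY_MMU_DEFAULT, and re-enable lazy_mmu with enter(), mirroring the xen_flush_lazy_mmu() pattern described in the patch.

```c
#include <assert.h>
#include <stdbool.h>

typedef int lazy_mmu_state_t;
#define LAZY_MMU_DEFAULT 0
#define LAZY_MMU_NESTED  1
#define LAZY_MMU_FLUSH   2  /* hypothetical value from the proposal */

static bool lazy_mmu_enabled;
static int pending;         /* updates not yet applied */

static lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
{
	if (lazy_mmu_enabled)
		return LAZY_MMU_NESTED;
	lazy_mmu_enabled = true;
	return LAZY_MMU_DEFAULT;
}

static void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
{
	/* Assumed semantics: DEFAULT and FLUSH both complete pending
	 * updates and disable lazy_mmu; NESTED keeps it enabled. */
	if (state == LAZY_MMU_NESTED)
		return;
	pending = 0;
	lazy_mmu_enabled = false;
}

static void set_pte(void)
{
	if (lazy_mmu_enabled)
		pending++;          /* batched */
}

/* Generic caller that may run inside a (possibly nested) section
 * and needs its update applied immediately. */
static void populate_one_pte(void)
{
	set_pte();
	arch_leave_lazy_mmu_mode(LAZY_MMU_FLUSH); /* no arch-private value */
	assert(pending == 0);
	(void)arch_enter_lazy_mmu_mode();         /* resume batching */
}
```

The value returned by the re-entering enter() is discarded on the assumption that the enclosing section's own state variable is what eventually disables lazy_mmu.
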
>
> Yes, that's why I am asking.
>
> What kind of information (pointer to a per-cpu structure) would you
> want to return, and would handling it similar to how
> pagefault_disable()/pagefault_enable() e.g., using a variable in
> "current" to track the nesting level avoid having s390x to do that?

The pagefault_disabled approach works fine for simple use-cases, but it
doesn't scale well. The space allocated in task_struct/thread_struct to
track that state is wasted (unused) most of the time. Worse, it does not
truly allow state to be nested: the outermost section can store some
state, but nested sections cannot allocate extra space. This is really
what the stack is for.
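
To make the comparison with pagefault_disable()/pagefault_enable() concrete, here is a hypothetical sketch of the counter-in-current approach: a per-task depth counter plus one save slot that only the outermost section can use. struct task, current_task and hw_reg are invented for illustration and do not correspond to real kernel structures.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for task_struct: this space is reserved in every task,
 * whether or not it ever enters a lazy_mmu section. */
struct task {
	int lazy_mmu_depth;       /* nesting level, like pagefault_disabled */
	uint64_t lazy_mmu_saved;  /* room for the outermost section only */
};

static struct task current_task;
static uint64_t hw_reg = 0x1; /* hypothetical register to save/restore */

static void enter_counted(uint64_t new_val)
{
	if (current_task.lazy_mmu_depth++ == 0)
		current_task.lazy_mmu_saved = hw_reg; /* outermost saves */
	hw_reg = new_val;  /* a nested section cannot save the old value */
}

static void leave_counted(void)
{
	assert(current_task.lazy_mmu_depth > 0);
	if (--current_task.lazy_mmu_depth == 0)
		hw_reg = current_task.lazy_mmu_saved;
}
```

Entering with 0x3, then nesting with 0x7, leaves hw_reg at 0x7 after the inner leave() instead of the outer section's 0x3, because the inner section had nowhere to save it; a lazy_mmu_state_t variable on each caller's stack does not have that limit.
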
- Kevin