From: Ryan Roberts <ryan.roberts@arm.com>
To: Kevin Brodsky <kevin.brodsky@arm.com>, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
Alexander Gordeev <agordeev@linux.ibm.com>,
Andreas Larsson <andreas@gaisler.com>,
Andrew Morton <akpm@linux-foundation.org>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Borislav Petkov <bp@alien8.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
David Woodhouse <dwmw2@infradead.org>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Jann Horn <jannh@google.com>, Juergen Gross <jgross@suse.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Nicholas Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Suren Baghdasaryan <surenb@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>,
Yeoreum Yun <yeoreum.yun@arm.com>,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org,
xen-devel@lists.xenproject.org, x86@kernel.org
Subject: Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
Date: Tue, 11 Nov 2025 17:03:14 +0000 [thread overview]
Message-ID: <8f70692c-25a9-4bd0-94ab-43ab435e4b1b@arm.com> (raw)
In-Reply-To: <58fd1a6e-f2c4-421c-9b95-dea4b244a515@arm.com>
On 11/11/2025 15:56, Kevin Brodsky wrote:
> On 11/11/2025 10:24, Ryan Roberts wrote:
>> [...]
>>
>>>>> + state->active = true;
>>>>> + arch_enter_lazy_mmu_mode();
>>>>> + }
>>>>> }
>>>>>
>>>>> static inline void lazy_mmu_mode_disable(void)
>>>>> {
>>>>> - arch_leave_lazy_mmu_mode();
>>>>> + struct lazy_mmu_state *state = ¤t->lazy_mmu_state;
>>>>> +
>>>>> + VM_WARN_ON_ONCE(state->nesting_level == 0);
>>>>> + VM_WARN_ON(!state->active);
>>>>> +
>>>>> + if (--state->nesting_level == 0) {
>>>>> + state->active = false;
>>>>> + arch_leave_lazy_mmu_mode();
>>>>> + } else {
>>>>> + /* Exiting a nested section */
>>>>> + arch_flush_lazy_mmu_mode();
>>>>> + }
>>>>> }
>>>>>
>>>>> static inline void lazy_mmu_mode_pause(void)
>>>>> {
>>>>> + struct lazy_mmu_state *state = ¤t->lazy_mmu_state;
>>>>> +
>>>>> + VM_WARN_ON(state->nesting_level == 0 || !state->active);
>>>> nit: do you need the first condition? I think when nesting_level==0, we expect
>>>> to be !active?
>>> I suppose this should never happen indeed - I was just being extra
>>> defensive.
>>>
>>> Either way David suggested allowing pause()/resume() to be called
>>> outside of any section so the next version will bail out on
>>> nesting_level == 0.
>> Ignoring my current opinion that we don't need pause/resume at all for now; Are
>> you suggesting that pause/resume will be completely independent of
>> enable/disable? I think that would be best. So enable/disable increment and
>> decrement the nesting_level counter regardless of whether we are paused.
>> nesting_level 0 => 1 enables if not paused. nesting_level 1 => 0 disables if not
>> paused. pause disables nesting_level >= 1, resume enables if nesting_level >= 1.
>
> This is something else. Currently the rules are:
>
> [A]
>
> // pausing forbidden
> enable()
> pause()
> // pausing/enabling forbidden
> resume()
> disable()
>
> David suggested allowing:
>
> [B]
>
> pause()
> // pausing/enabling forbidden
> resume()
>
> Your suggestion is also allowing:
>
> [C]
>
> pause()
> // pausing forbidden
> enable()
> disable()
> resume()
I think the current kasan kasan_depopulate_vmalloc_pte() path will require [C]
if CONFIG_DEBUG_PAGEALLOC is enabled on arm64. It calls __free_page() while
paused. I guess CONFIG_DEBUG_PAGEALLOC will cause __free_page() ->
debug_pagealloc_unmap_pages() ->->-> update_range_prot() -> lazy_mmu_enable().
Arguably you could move the resume() to before the __free_page(). But it just
illustrates that it's all a bit brittle at the moment...
>
>> Perhaps we also need nested pause/resume? Then you just end up with 2 counters;
>> enable_count and pause_count. Sorry if this has already been discussed.
>
> And finally:
>
> [D]
>
> pause()
> pause()
> enable()
> disable()
> resume()
> resume()
>
> I don't really mind either way, but I don't see an immediate use for [C]
> and [D] - the idea is that the paused section is short and controlled,
> not made up of arbitrary calls.
If my thinking above is correct, then I've already demonstrated that this is not
the case. So I'd be inclined to go with [D] on the basis that it is the most robust.
Keeping 2 nesting counts (enable and pause) feels pretty elegant to me and gives
the fewest opportunities for surprises.
Thanks,
Ryan
> A potential downside of allowing [C] and
> [D] is that it makes it harder to detect unintended nesting (fewer
> VM_WARN assertions). Happy to implement it if this proves useful though.
>
> OTOH the idea behind [B] is that it allows the caller of
> pause()/resume() not to care about whether lazy MMU is actually enabled
> or not - i.e. the kasan helpers would keep working even if
> apply_to_page_range() didn't use lazy MMU any more.
>
>>>>> +
>>>>> + state->active = false;
>>>>> arch_leave_lazy_mmu_mode();
>>>>> }
>>>>>
>>>>> static inline void lazy_mmu_mode_resume(void)
>>>>> {
>>>>> + struct lazy_mmu_state *state = ¤t->lazy_mmu_state;
>>>>> +
>>>>> + VM_WARN_ON(state->nesting_level == 0 || state->active);
>>>> Similar argument?
>>>>
>>>>> +
>>>>> + state->active = true;
>>>>> arch_enter_lazy_mmu_mode();
>>>>> }
>>>>> #else
>>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>>> index cbb7340c5866..11566d973f42 100644
>>>>> --- a/include/linux/sched.h
>>>>> +++ b/include/linux/sched.h
>>>>> @@ -1441,6 +1441,10 @@ struct task_struct {
>>>>>
>>>>> struct page_frag task_frag;
>>>>>
>>>>> +#ifdef CONFIG_ARCH_HAS_LAZY_MMU_MODE
>>>>> + struct lazy_mmu_state lazy_mmu_state;
>>>>> +#endif
>>>>> +
>>>>> #ifdef CONFIG_TASK_DELAY_ACCT
>>>>> struct task_delay_info *delays;
>>>>> #endif
>>>>> @@ -1724,6 +1728,18 @@ static inline char task_state_to_char(struct task_struct *tsk)
>>>>> return task_index_to_char(task_state_index(tsk));
>>>>> }
>>>>>
>>>>> +#ifdef CONFIG_ARCH_HAS_LAZY_MMU_MODE
>>>>> +static inline bool in_lazy_mmu_mode(void)
>>>>> +{
>>>>> + return current->lazy_mmu_state.active;
>>>>> +}
>>>>> +#else
>>>>> +static inline bool in_lazy_mmu_mode(void)
>>>>> +{
>>>>> + return false;
>>>> Just pointing out that this isn't really a correct implementation:
>>>>
>>>> lazy_mmu_mode_enable()
>>>> ASSERT(in_lazy_mmu_mode()) << triggers for arches without lazy mmu
>>>> lazy_mmu_mode_disable()
>>>>
>>>> Although it probably doesn't matter in practice?
>>> I'd say that the expectation is invalid - lazy MMU mode can only be
>>> enabled if the architecture supports it. In fact as you pointed out
>>> above the API may be called in interrupt context but it will have no
>>> effect, so this sequence would always fail in interrupt context.
>> Yep, but previously there was no way to query the current state so it didn't
>> matter. Now you have a query API so it might matter if/when people come along
>> and use it in unexpected ways.
>
> I suppose the best we can do is document it alongside those helpers
> (David has already suggested some documentation, see patch 11).
>
>>> Worth nothing that in_lazy_mmu_mode() is only ever called from arch code
>>> where lazy MMU is implemented. I added the fallback as a matter of
>>> principle, but it isn't actually required.
>> Yes, I agree that's the intent. I'm just wondering if it's possible to enforce
>> that only arch code uses this. Perhaps add some docs to explain that it's only
>> intended for arches that implement lazy_mmu, and don't define it for arches that
>> don't, which would catch any generic users?
>
> Yep sounds like the best option - a lot less risk of misuse if it can't
> be called from generic code :) The build would still succeed on arch's
> that implement it, but the kernel CI should catch such calls sooner or
> later.
>
> - Kevin
next prev parent reply other threads:[~2025-11-11 17:03 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-29 10:08 [PATCH v4 00/12] Nesting support for lazy MMU mode Kevin Brodsky
2025-10-29 10:08 ` [PATCH v4 01/12] powerpc/64s: Do not re-activate batched TLB flush Kevin Brodsky
2025-11-01 12:05 ` David Hildenbrand
2025-11-05 2:46 ` Ritesh Harjani
2025-11-06 10:29 ` Kevin Brodsky
2025-11-08 0:35 ` Ritesh Harjani
2025-11-10 13:18 ` Kevin Brodsky
2025-11-07 12:25 ` Ryan Roberts
2025-11-07 12:28 ` Ryan Roberts
2025-10-29 10:08 ` [PATCH v4 02/12] x86/xen: simplify flush_lazy_mmu() Kevin Brodsky
2025-11-01 12:14 ` David Hildenbrand
2025-11-03 18:06 ` Kevin Brodsky
2025-11-07 12:31 ` Ryan Roberts
2025-11-10 10:36 ` Kevin Brodsky
2025-11-11 10:08 ` Ryan Roberts
2025-11-07 15:45 ` Jürgen Groß
2025-10-29 10:09 ` [PATCH v4 03/12] powerpc/mm: implement arch_flush_lazy_mmu_mode() Kevin Brodsky
2025-11-01 12:14 ` David Hildenbrand
2025-11-05 3:15 ` Ritesh Harjani
2025-11-05 9:49 ` Ritesh Harjani
2025-11-06 10:31 ` Kevin Brodsky
2025-10-29 10:09 ` [PATCH v4 04/12] sparc/mm: " Kevin Brodsky
2025-11-01 12:14 ` David Hildenbrand
2025-10-29 10:09 ` [PATCH v4 05/12] mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE Kevin Brodsky
2025-11-01 12:16 ` David Hildenbrand
2025-11-05 4:40 ` Ritesh Harjani
2025-11-06 10:33 ` Kevin Brodsky
2025-11-07 13:56 ` Ryan Roberts
2025-11-10 10:37 ` Kevin Brodsky
2025-10-29 10:09 ` [PATCH v4 06/12] mm: introduce generic lazy_mmu helpers Kevin Brodsky
2025-11-01 12:18 ` David Hildenbrand
2025-11-07 14:26 ` Ryan Roberts
2025-11-07 14:34 ` David Hildenbrand (Red Hat)
2025-11-07 15:22 ` Ryan Roberts
2025-11-10 8:11 ` Alexander Gordeev
2025-11-10 9:19 ` Ryan Roberts
2025-11-11 8:01 ` Alexander Gordeev
2025-11-11 12:16 ` Ryan Roberts
2025-11-10 10:45 ` Kevin Brodsky
2025-11-24 12:47 ` Kevin Brodsky
2025-11-24 14:36 ` Ryan Roberts
2025-10-29 10:09 ` [PATCH v4 07/12] mm: enable lazy_mmu sections to nest Kevin Brodsky
2025-10-29 16:41 ` Alexander Gordeev
2025-10-30 10:28 ` Kevin Brodsky
2025-10-30 16:34 ` Alexander Gordeev
2025-11-01 12:22 ` David Hildenbrand
2025-11-03 18:08 ` Kevin Brodsky
2025-11-05 8:49 ` Ritesh Harjani
2025-11-05 16:12 ` Alexander Gordeev
2025-11-06 10:51 ` Kevin Brodsky
2025-11-06 15:33 ` Alexander Gordeev
2025-11-07 10:16 ` Kevin Brodsky
2025-11-06 16:32 ` Ritesh Harjani
2025-11-06 17:01 ` Ritesh Harjani
2025-11-07 11:13 ` Kevin Brodsky
2025-11-07 14:59 ` Ryan Roberts
2025-11-10 10:47 ` Kevin Brodsky
2025-11-11 10:24 ` Ryan Roberts
2025-11-11 15:56 ` Kevin Brodsky
2025-11-11 17:03 ` Ryan Roberts [this message]
2025-11-12 10:42 ` Kevin Brodsky
2025-11-12 13:57 ` David Hildenbrand (Red Hat)
2025-10-29 10:09 ` [PATCH v4 08/12] arm64: mm: replace TIF_LAZY_MMU with in_lazy_mmu_mode() Kevin Brodsky
2025-11-03 16:03 ` David Hildenbrand
2025-11-03 18:25 ` Kevin Brodsky
2025-11-07 15:28 ` Ryan Roberts
2025-10-29 10:09 ` [PATCH v4 09/12] powerpc/mm: replace batch->active " Kevin Brodsky
2025-11-03 16:05 ` David Hildenbrand
2025-11-04 11:33 ` Kevin Brodsky
2025-11-05 9:40 ` Ritesh Harjani
2025-10-29 10:09 ` [PATCH v4 10/12] sparc/mm: " Kevin Brodsky
2025-11-03 16:11 ` David Hildenbrand (Red Hat)
2025-10-29 10:09 ` [PATCH v4 11/12] x86/xen: use lazy_mmu_state when context-switching Kevin Brodsky
2025-11-03 16:15 ` David Hildenbrand (Red Hat)
2025-11-03 18:29 ` Kevin Brodsky
2025-11-03 19:23 ` David Hildenbrand (Red Hat)
2025-11-04 11:28 ` Kevin Brodsky
2025-10-29 10:09 ` [PATCH v4 12/12] mm: bail out of lazy_mmu_mode_* in interrupt context Kevin Brodsky
2025-11-07 15:42 ` Ryan Roberts
2025-11-10 10:48 ` Kevin Brodsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f70692c-25a9-4bd0-94ab-43ab435e4b1b@arm.com \
--to=ryan.roberts@arm.com \
--cc=Liam.Howlett@oracle.com \
--cc=agordeev@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=andreas@gaisler.com \
--cc=boris.ostrovsky@oracle.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=dave.hansen@linux.intel.com \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=dwmw2@infradead.org \
--cc=hpa@zytor.com \
--cc=jannh@google.com \
--cc=jgross@suse.com \
--cc=kevin.brodsky@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=maddy@linux.ibm.com \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=npiggin@gmail.com \
--cc=peterz@infradead.org \
--cc=rppt@kernel.org \
--cc=sparclinux@vger.kernel.org \
--cc=surenb@google.com \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xenproject.org \
--cc=yeoreum.yun@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox