linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Alexandre Ghiti <alexghiti@rivosinc.com>,
	Kevin Brodsky <kevin.brodsky@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
Subject: Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
Date: Mon, 12 May 2025 13:00:05 +0100	[thread overview]
Message-ID: <e37d5e61-54e7-4425-837f-25a13f5a68b5@arm.com> (raw)
In-Reply-To: <001dfd4f-27f2-407f-bd1c-21928a754342@redhat.com>

On 12/05/2025 12:07, David Hildenbrand wrote:
> On 12.05.25 12:22, Ryan Roberts wrote:
>> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
>> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
>> flags in order to defer barriers until exiting the mode. At the same
>> time, it added warnings to check that pte manipulations were never
>> performed in interrupt context, because the tracking implementation
>> could not deal with nesting.
>>
>> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
>> do manipulate ptes in softirq context, which triggered the warnings.
>>
>> So let's take the simplest and safest route and disable the batching
>> optimization in interrupt contexts. This makes these users no worse off
>> than prior to the optimization. Additionally the known offenders are
>> debug features that only manipulate a single PTE, so there is no
>> performance gain anyway.
>>
>> There may be some obscure case of encrypted/decrypted DMA with the
>> dma_free_coherent called from an interrupt context, but again, this is
>> no worse off than prior to the commit.
>>
>> Some options for supporting nesting were considered, but there is a
>> difficult to solve problem if any code manipulates ptes within interrupt
>> context but *outside of* a lazy mmu region. If this case exists, the
>> code would expect the updates to be immediate, but because the task
>> context may have already been in lazy mmu mode, the updates would be
>> deferred, which could cause incorrect behaviour. This problem is avoided
>> by always ensuring updates within interrupt context are immediate.
>>
>> Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
>> Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/linux-arm-
>> kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Hi Will,
>>
>> I've tested before and after with KFENCE enabled and it solves the issue. I've
>> also run all the mm-selftests which all continue to pass.
>>
>> Catalin suggested a Fixes patch targetting the SHA as it is in for-next/mm was
>> the preferred approach, but shout if you want something different. I'm hoping
>> that with this fix we can still make it for this cycle, subject to not finding
>> any more issues.
>>
>> Thanks,
>> Ryan
>>
>>
>>   arch/arm64/include/asm/pgtable.h | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index ab4a1b19e596..e65083ec35cb 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -64,7 +64,11 @@ static inline void queue_pte_barriers(void)
>>   {
>>       unsigned long flags;
>>
>> -    VM_WARN_ON(in_interrupt());
>> +    if (in_interrupt()) {
>> +        emit_pte_barriers();
>> +        return;
>> +    }
>> +
>>       flags = read_thread_flags();
>>
>>       if (flags & BIT(TIF_LAZY_MMU)) {
>> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
>>   #define  __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>>   static inline void arch_enter_lazy_mmu_mode(void)
>>   {
>> -    VM_WARN_ON(in_interrupt());
>> +    if (in_interrupt())
>> +        return;
>> +
>>       VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>>
>>       set_thread_flag(TIF_LAZY_MMU);
>> @@ -87,12 +93,18 @@ static inline void arch_enter_lazy_mmu_mode(void)
>>
>>   static inline void arch_flush_lazy_mmu_mode(void)
>>   {
>> +    if (in_interrupt())
>> +        return;
>> +
>>       if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
>>           emit_pte_barriers();
>>   }
>>
>>   static inline void arch_leave_lazy_mmu_mode(void)
>>   {
>> +    if (in_interrupt())
>> +        return;
>> +
>>       arch_flush_lazy_mmu_mode();
>>       clear_thread_flag(TIF_LAZY_MMU);
>>   }
> 
> I guess in all cases we could optimize out the in_interrupt() check on !debug
> configs.

I think that assumes we can easily and accurately identify all configs that
cause this? We've identified 2 but I'm not confident that it's a full list.
Also, KFENCE isn't really a debug config (despite me calling it that in the
commit log) - it's supposed to be something that can be enabled in production
builds.

> 
> Hm, maybe there is an elegant way to catch all of these "problematic" users?

I'm all ears if you have any suggestions? :)


It actaully looks like x86/XEN tries to solves this problem in a similar way:

enum xen_lazy_mode xen_get_lazy_mode(void)
{
	if (in_interrupt())
		return XEN_LAZY_NONE;

	return this_cpu_read(xen_lazy_mode);
}

Although I'm not convinced it's fully robust. It also has:

static inline void enter_lazy(enum xen_lazy_mode mode)
{
	BUG_ON(this_cpu_read(xen_lazy_mode) != XEN_LAZY_NONE);

	this_cpu_write(xen_lazy_mode, mode);
}

which is called as part of its arch_enter_lazy_mmu_mode() implementation. If a
task was already in lazy mmu mode when an interrupt comes in and causes the
nested arch_enter_lazy_mmu_mode() that we saw in this bug report, surely that
BUG_ON() should trigger?

Thanks,
Ryan



  reply	other threads:[~2025-05-12 12:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-12 10:22 Ryan Roberts
2025-05-12 11:00 ` Catalin Marinas
2025-05-12 11:03   ` Ryan Roberts
2025-05-12 11:07 ` David Hildenbrand
2025-05-12 12:00   ` Ryan Roberts [this message]
2025-05-12 12:05     ` David Hildenbrand
2025-05-12 12:33       ` Ryan Roberts
2025-05-12 13:14 ` Catalin Marinas
2025-05-12 13:53   ` Ryan Roberts
2025-05-12 14:14     ` Ryan Roberts
2025-05-13 20:46 ` Will Deacon
2025-05-14  9:29   ` Ryan Roberts
2025-05-14 15:13     ` Will Deacon
2025-05-14 15:14 ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e37d5e61-54e7-4425-837f-25a13f5a68b5@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexghiti@rivosinc.com \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=hch@infradead.org \
    --cc=kevin.brodsky@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com \
    --cc=urezki@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox