From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Yang Shi <yang@os.amperecomputing.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Dev Jain <dev.jain@arm.com>, Barry Song <baohua@kernel.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yicong Yang <yangyicong@hisilicon.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Kevin Brodsky <kevin.brodsky@arm.com>,
Yin Fengwei <fengwei_yin@linux.alibaba.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH -v2 1/2] mm: add spurious fault fixing support for huge pmd
Date: Thu, 16 Oct 2025 10:22:57 +0800
Message-ID: <87bjm7lh4u.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <d748bc6b-2aff-4cee-a233-008f4d2555fa@lucifer.local> (Lorenzo Stoakes's message of "Wed, 15 Oct 2025 12:20:30 +0100")

Lorenzo Stoakes <lorenzo.stoakes@oracle.com> writes:
> On Wed, Oct 15, 2025 at 04:43:14PM +0800, Huang, Ying wrote:
>> Hi, Lorenzo,
>>
>> Thanks for your comments!
>>
>> Lorenzo Stoakes <lorenzo.stoakes@oracle.com> writes:
>>
>> > On Mon, Oct 13, 2025 at 05:20:37PM +0800, Huang Ying wrote:
>> >> In the current kernel, there is spurious fault fixing support for pte,
>> >> but not for huge pmd because no architectures need it. But in the
>> >> next patch in the series, we will change the write protection fault
>> >> handling logic on arm64, so that some stale huge pmd entries may
>> >> remain in the TLB. These entries need to be flushed via the huge pmd
>> >> spurious fault fixing mechanism.
>> >>
>> >> Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
>> >
>> > Right now the PTE level spurious fault handling is dealt with in
>> > handle_pte_fault() when ptep_set_access_flags() returns false.
>> >
>> > Now you're updating touch_pmd() which is invoked by follow_huge_pmd() and
>> > huge_pmd_set_accessed().
>> >
>> > 1 - Why are you not adding handling to GUP?
>> >
>> > 2 - Is this the correct level of abstraction? It's really not obvious but
>> > huge_pmd_set_accessed() is invoked by __handle_mm_fault() on a non-WP,
>> > non-NUMA hint huge page fault where a page table entry already exists
>> > but we are faulting anyway (e.g. non-present or read-only writable).
>> >
>> > You don't mention any of this in the commit message, which you need to do.
>> > You really need to explain how spurious faults can arise and why you can
>> > only do this at the level of abstraction you chose (if you are unable to
>> > put it in the actual fault-handling code), and you need to add a bunch more
>> > comments to explain this.
>>
>> This patch adds the spurious PMD page fault fixing based on the spurious
>> PTE page fault fixing. So, I assumed that the spurious page fault
>> fixing has been documented already. But you are right, nothing prevents
>> us from improving it further. Let's try to do that.
>
> I wouldn't make any kind of assumption like this in the kernel :) sadly our
> documentation is often incomplete.
>
>>
>> Page faults may be spurious because of racy accesses to the page
>> table. For example, when a non-populated virtual page is accessed on 2
>> CPUs simultaneously, page faults are triggered on both CPUs. However,
>> one CPU (say CPU A) may find no reason for the page fault if the other
>> CPU (say CPU B) has already changed the page table before the PTE is
>> checked on CPU A. Most of the time, spurious page faults can be
>> ignored safely. However, if the page fault is for a write access, a
>> stale read-only TLB entry may exist on the local CPU and needs to be
>> flushed on some architectures. This is called spurious page fault
>> fixing.
>>
>> Spurious page fault fixing only makes sense during page fault
>> handling, so we don't need to do it for GUP. In fact, I plan to avoid
>> it in all GUP paths in a follow-up patch.
>
> OK this is great, let's put it all in the kdoc for the new shared spurious
> faulting function! :) and additionally add it to the commit message.

Sure. I will do that in the next version.

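To make the race described above concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: every toy_* name is hypothetical, the "TLB" is just a per-CPU cached copy of one shared page-table entry, and the assumption is that the CPU fixing the page table flushes only its own TLB, so the other CPU keeps a stale read-only entry until it takes a spurious write fault.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* One translation: either the shared page-table entry or a cached copy. */
struct toy_entry {
	bool present;
	bool writable;
};

static struct toy_entry toy_page_table;		/* shared by all CPUs */
static struct toy_entry toy_tlb[TOY_NR_CPUS];	/* per-CPU cached copy */
static bool toy_tlb_valid[TOY_NR_CPUS];

/* Hypothetical local (non-broadcast) invalidation of one CPU's entry. */
static void toy_flush_local_tlb(int cpu)
{
	toy_tlb_valid[cpu] = false;
}

/* A write succeeds only if the translation this CPU uses is writable. */
static bool toy_try_write(int cpu)
{
	if (!toy_tlb_valid[cpu]) {
		/* TLB miss: walk the page table; absent entries are not cached. */
		if (!toy_page_table.present)
			return false;
		toy_tlb[cpu] = toy_page_table;
		toy_tlb_valid[cpu] = true;
	}
	return toy_tlb[cpu].writable;
}

/*
 * Write fault handler. Returns true if the fault was spurious, that is,
 * the page table already allowed the write and only the local TLB was stale.
 */
static bool toy_handle_write_fault(int cpu)
{
	bool spurious = toy_page_table.present && toy_page_table.writable;

	if (!spurious) {
		toy_page_table.present = true;
		toy_page_table.writable = true;
	}
	/* In both cases only the faulting CPU's TLB entry is dropped. */
	toy_flush_local_tlb(cpu);
	return spurious;
}

/* Replays the race and returns the number of spurious faults observed. */
static int toy_run_scenario(void)
{
	int spurious_faults = 0;
	int cpu;

	/* Both CPUs start with the read-only entry cached. */
	toy_page_table = (struct toy_entry){ .present = true, .writable = false };
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		toy_flush_local_tlb(cpu);
		(void)toy_try_write(cpu);	/* fails, but fills the TLB */
	}

	/* CPU 1 takes the real fault and upgrades the page table. */
	if (toy_handle_write_fault(1))
		spurious_faults++;

	/* CPU 0 still holds the stale entry and faults until it is flushed. */
	while (!toy_try_write(0)) {
		if (toy_handle_write_fault(0))
			spurious_faults++;
	}
	return spurious_faults;
}
```

In this model CPU 0 takes exactly one spurious fault: the handler finds nothing to change in the page table, drops the stale local entry, and the retried write then succeeds. Without the local flush in the spurious path, the loop in toy_run_scenario() would never terminate, which is the situation the fixing mechanism is meant to prevent.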
[snip]
>> >>
>> >> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> >> index 32e8457ad535..341622ec80e4 100644
>> >> --- a/include/linux/pgtable.h
>> >> +++ b/include/linux/pgtable.h
>> >> @@ -1232,6 +1232,10 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>> >> #define flush_tlb_fix_spurious_fault(vma, address, ptep) flush_tlb_page(vma, address)
>> >> #endif
>> >>
>> >> +#ifndef flush_tlb_fix_spurious_fault_pmd
>> >> +#define flush_tlb_fix_spurious_fault_pmd(vma, address, ptep) do { } while (0)
>> >> +#endif
>> >
>> > flush_tlb_fix_spurious_fault(), when the arch doesn't declare it, defaults to
>> > flush_tlb_page() - why do we just do nothing in this case here?
>>
>> Because no architecture does anything for spurious PMD page fault
>> fixing until [2/2] of this series, where we make it necessary to
>> flush the local TLB for spurious PMD page fault fixing on the arm64
>> architecture.
>>
>> If we follow the design of flush_tlb_fix_spurious_fault(), we need to
>> change all architecture implementations to do nothing in this patch to
>> keep the current behavior. I don't think that it's a good idea. Do
>> you agree?
>
> Yeah probably we should keep the same behaviour as before; prior to this
> series, we obviously did nothing.
>
> I guess in the PTE case we _always_ want to flush the TLB, whereas in the
> PMD case we otherwise don't have any need to at the point at which the
> spurious flush is performed.
>
> But from your explanation above re: the stale TLB entry this _only_ needs
> to be done for architectures which might encounter this problem rather than
> needing a TLB flush in general.
>
> Given we're generalising the code and one case always flushes the TLB and
> the other doesn't maybe it's worth putting a comment in the generalised
> function mentioning this?

I'm not sure it's a good idea to document architecture-specific behavior
in generic code. That behavior may change on a per-architecture basis in
the future.

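For reference, the fallback pattern under discussion can be sketched in userspace as follows. This is only an illustration under stated assumptions: the toy_* names and the TOY_ARCH_NEEDS_PMD_FIX switch are hypothetical stand-ins, with toy_fix_spurious_fault() mirroring the flush-by-default PTE hook and toy_fix_spurious_fault_pmd() mirroring the no-op-by-default PMD hook from the hunk quoted above.

```c
/* Counts how many "TLB flushes" the generic code ended up issuing. */
static int toy_flush_count;

/* Hypothetical stand-in for a per-page TLB flush. */
static void toy_flush_tlb_page(unsigned long address)
{
	(void)address;
	toy_flush_count++;
}

/*
 * "Architecture" part: an architecture that needs the PMD fix defines the
 * hook before the generic fallbacks are seen. TOY_ARCH_NEEDS_PMD_FIX is a
 * made-up switch used only to show both outcomes.
 */
#ifdef TOY_ARCH_NEEDS_PMD_FIX
#define toy_fix_spurious_fault_pmd(address) toy_flush_tlb_page(address)
#endif

/* "Generic" part: the PTE hook falls back to a flush ... */
#ifndef toy_fix_spurious_fault
#define toy_fix_spurious_fault(address) toy_flush_tlb_page(address)
#endif

/* ... while the PMD hook falls back to doing nothing. */
#ifndef toy_fix_spurious_fault_pmd
#define toy_fix_spurious_fault_pmd(address) do { } while (0)
#endif

/* Issues one PTE fix and one PMD fix, and reports the flushes performed. */
static int toy_count_flushes(void)
{
	toy_flush_count = 0;
	toy_fix_spurious_fault(0x1000UL);
	toy_fix_spurious_fault_pmd(0x200000UL);
	return toy_flush_count;
}
```

Built as is, toy_count_flushes() performs a single flush, from the PTE hook only. Building with -DTOY_ARCH_NEEDS_PMD_FIX makes the PMD hook flush as well, without touching the generic fallbacks or any other "architecture", which is why the no-op default keeps the existing behavior everywhere else.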
[snip]
---
Best Regards,
Huang, Ying