* [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
@ 2025-02-24 21:14 Jane Chu
2025-02-25 6:49 ` Miaohe Lin
2025-02-26 15:49 ` David Hildenbrand
0 siblings, 2 replies; 6+ messages in thread
From: Jane Chu @ 2025-02-24 21:14 UTC (permalink / raw)
To: willy, peterx, akpm, linmiaohe, kirill.shutemov, hughd, linux-mm,
linux-kernel
When a process consumes a UE in a page, the memory failure handler
attempts to collect information for a potential SIGBUS.
If the page is an anonymous page, page_mapped_in_vma(page, vma) is
invoked in order to
1. retrieve the vaddr from the process' address space,
2. verify that the vaddr is indeed mapped to the poisoned page,
where 'page' is the precise small page with UE.
It's been observed that when poison is injected into a non-head subpage
of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
the head page produces a SIGBUS. The cause is that, although hugetlb_walk()
returns a valid pmd entry (on x86), check_pte() detects a mismatch
between the head page per the pmd and the input subpage. Thus the vaddr
is considered not mapped to the subpage and the process is not collected
for SIGBUS purposes. This is the calling stack:
collect_procs_anon
page_mapped_in_vma
page_vma_mapped_walk
hugetlb_walk
huge_pte_lock
check_pte
check_pte()'s header says that it will
"check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte",
but in practice it works only if pvmw->pfn is the head page pfn at pvmw->pte.
In hindsight, some pvmw->pte entries can point to a hugepage of some sort,
so it makes sense to make check_pte() work for hugepages.
Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
mm/page_vma_mapped.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 81839a9e74f1..367209e65830 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -84,6 +84,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
* mapped at the @pvmw->pte
* @pvmw: page_vma_mapped_walk struct, includes a pair pte and pfn range
* for checking
+ * @pte_nr: the number of small pages described by @pvmw->pte.
*
* page_vma_mapped_walk() found a place where pfn range is *potentially*
* mapped. check_pte() has to validate this.
@@ -100,7 +101,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
* Otherwise, return false.
*
*/
-static bool check_pte(struct page_vma_mapped_walk *pvmw)
+static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
{
unsigned long pfn;
pte_t ptent = ptep_get(pvmw->pte);
@@ -133,7 +134,11 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
pfn = pte_pfn(ptent);
}
- return (pfn - pvmw->pfn) < pvmw->nr_pages;
+ if ((pfn + pte_nr - 1) < pvmw->pfn)
+ return false;
+ if (pfn > (pvmw->pfn + pvmw->nr_pages - 1))
+ return false;
+ return true;
}
/* Returns true if the two ranges overlap. Careful to not overflow. */
@@ -208,7 +213,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return false;
pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
- if (!check_pte(pvmw))
+ if (!check_pte(pvmw, pages_per_huge_page(hstate)))
return not_found(pvmw);
return true;
}
@@ -291,7 +296,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
goto next_pte;
}
this_pte:
- if (check_pte(pvmw))
+ if (check_pte(pvmw, 1))
return true;
next_pte:
do {
--
2.39.3
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
2025-02-24 21:14 [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware Jane Chu
@ 2025-02-25 6:49 ` Miaohe Lin
2025-02-25 19:56 ` jane.chu
2025-02-26 15:49 ` David Hildenbrand
1 sibling, 1 reply; 6+ messages in thread
From: Miaohe Lin @ 2025-02-25 6:49 UTC (permalink / raw)
To: Jane Chu
Cc: willy, peterx, akpm, kirill.shutemov, hughd, linux-mm, linux-kernel
On 2025/2/25 5:14, Jane Chu wrote:
> When a process consumes a UE in a page, the memory failure handler
> attempts to collect information for a potential SIGBUS.
> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
> invoked in order to
> 1. retrieve the vaddr from the process' address space,
> 2. verify that the vaddr is indeed mapped to the poisoned page,
> where 'page' is the precise small page with UE.
>
> It's been observed that when poison is injected into a non-head subpage
> of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
> the head page produces a SIGBUS. The cause is that, although hugetlb_walk()
> returns a valid pmd entry (on x86), check_pte() detects a mismatch
> between the head page per the pmd and the input subpage. Thus the vaddr
> is considered not mapped to the subpage and the process is not collected
> for SIGBUS purposes. This is the calling stack:
> collect_procs_anon
> page_mapped_in_vma
> page_vma_mapped_walk
> hugetlb_walk
> huge_pte_lock
> check_pte
>
> check_pte()'s header says that it will
> "check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte",
> but in practice it works only if pvmw->pfn is the head page pfn at pvmw->pte.
> In hindsight, some pvmw->pte entries can point to a hugepage of some sort,
> so it makes sense to make check_pte() work for hugepages.
Thanks for your patch. This patch looks good to me.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
Is a Fixes tag needed?
Thanks.
.
* Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
2025-02-25 6:49 ` Miaohe Lin
@ 2025-02-25 19:56 ` jane.chu
0 siblings, 0 replies; 6+ messages in thread
From: jane.chu @ 2025-02-25 19:56 UTC (permalink / raw)
To: Miaohe Lin
Cc: willy, peterx, akpm, kirill.shutemov, hughd, linux-mm, linux-kernel
On 2/24/2025 10:49 PM, Miaohe Lin wrote:
> On 2025/2/25 5:14, Jane Chu wrote:
>> When a process consumes a UE in a page, the memory failure handler
>> attempts to collect information for a potential SIGBUS.
>> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
>> invoked in order to
>> 1. retrieve the vaddr from the process' address space,
>> 2. verify that the vaddr is indeed mapped to the poisoned page,
>> where 'page' is the precise small page with UE.
>>
>> It's been observed that when poison is injected into a non-head subpage
>> of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
>> the head page produces a SIGBUS. The cause is that, although hugetlb_walk()
>> returns a valid pmd entry (on x86), check_pte() detects a mismatch
>> between the head page per the pmd and the input subpage. Thus the vaddr
>> is considered not mapped to the subpage and the process is not collected
>> for SIGBUS purposes. This is the calling stack:
>> collect_procs_anon
>> page_mapped_in_vma
>> page_vma_mapped_walk
>> hugetlb_walk
>> huge_pte_lock
>> check_pte
>>
>> check_pte()'s header says that it will
>> "check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte",
>> but in practice it works only if pvmw->pfn is the head page pfn at pvmw->pte.
>> In hindsight, some pvmw->pte entries can point to a hugepage of some sort,
>> so it makes sense to make check_pte() work for hugepages.
> Thanks for your patch. This patch looks good to me.
>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> Is a Fixes tag needed?
I don't have a clear call, and here is the reason.
Since the introduction of check_pte() by ace71a19cec5e ("mm: introduce
page_vma_mapped_walk()"), it has carried the assumption that pvmw->page
(later changed to pvmw->pfn) points to the head of either a huge page or a
small page, and it has been used that way, so it doesn't really
check whether a given subpage range falls within a huge leaf pte range.
When 376907f3a0b34 ("mm/memory-failure: pass the folio and the page to
collect_procs()") came along, it exposed this latent issue, which
hadn't been a problem before.
Thanks!
-jane
>
> Thanks.
> .
* Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
2025-02-24 21:14 [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware Jane Chu
2025-02-25 6:49 ` Miaohe Lin
@ 2025-02-26 15:49 ` David Hildenbrand
2025-02-26 15:53 ` David Hildenbrand
2025-02-26 19:08 ` jane.chu
1 sibling, 2 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-02-26 15:49 UTC (permalink / raw)
To: Jane Chu, willy, peterx, akpm, linmiaohe, kirill.shutemov, hughd,
linux-mm, linux-kernel
On 24.02.25 22:14, Jane Chu wrote:
> When a process consumes a UE in a page, the memory failure handler
> attempts to collect information for a potential SIGBUS.
> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
> invoked in order to
> 1. retrieve the vaddr from the process' address space,
> 2. verify that the vaddr is indeed mapped to the poisoned page,
> where 'page' is the precise small page with UE.
>
> It's been observed that when poison is injected into a non-head subpage
> of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
> the head page produces a SIGBUS. The cause is that, although hugetlb_walk()
> returns a valid pmd entry (on x86), check_pte() detects a mismatch
> between the head page per the pmd and the input subpage. Thus the vaddr
> is considered not mapped to the subpage and the process is not collected
> for SIGBUS purposes. This is the calling stack:
> collect_procs_anon
> page_mapped_in_vma
> page_vma_mapped_walk
> hugetlb_walk
> huge_pte_lock
> check_pte
>
Why can't we require callers to never pass in subpages of hugetlb pages,
and sanity check that this is the case?
--
Cheers,
David / dhildenb
* Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
2025-02-26 15:49 ` David Hildenbrand
@ 2025-02-26 15:53 ` David Hildenbrand
2025-02-26 19:08 ` jane.chu
1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-02-26 15:53 UTC (permalink / raw)
To: Jane Chu, willy, peterx, akpm, linmiaohe, kirill.shutemov, hughd,
linux-mm, linux-kernel
On 26.02.25 16:49, David Hildenbrand wrote:
> On 24.02.25 22:14, Jane Chu wrote:
>> When a process consumes a UE in a page, the memory failure handler
>> attempts to collect information for a potential SIGBUS.
>> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
>> invoked in order to
>> 1. retrieve the vaddr from the process' address space,
>> 2. verify that the vaddr is indeed mapped to the poisoned page,
>> where 'page' is the precise small page with UE.
>>
>> It's been observed that when poison is injected into a non-head subpage
>> of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
>> the head page produces a SIGBUS. The cause is that, although hugetlb_walk()
>> returns a valid pmd entry (on x86), check_pte() detects a mismatch
>> between the head page per the pmd and the input subpage. Thus the vaddr
>> is considered not mapped to the subpage and the process is not collected
>> for SIGBUS purposes. This is the calling stack:
>> collect_procs_anon
>> page_mapped_in_vma
>> page_vma_mapped_walk
>> hugetlb_walk
>> huge_pte_lock
>> check_pte
>>
>
> Why can't we require callers to never pass in subpages of hugetlb pages,
> and sanity check that this is the case?
To be precise: in collect_procs_anon() pass the head page of the hugetlb
page, or in page_mapped_in_vma(), use the head page of the folio for
hugetlb folios.
hugetlb folios are always mapped in their entirety; adding logic to detect
whether sub-pages are mapped doesn't make much sense.
--
Cheers,
David / dhildenb
* Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
2025-02-26 15:49 ` David Hildenbrand
2025-02-26 15:53 ` David Hildenbrand
@ 2025-02-26 19:08 ` jane.chu
1 sibling, 0 replies; 6+ messages in thread
From: jane.chu @ 2025-02-26 19:08 UTC (permalink / raw)
To: David Hildenbrand, willy, peterx, akpm, linmiaohe,
kirill.shutemov, hughd, linux-mm, linux-kernel
On 2/26/2025 7:49 AM, David Hildenbrand wrote:
> On 24.02.25 22:14, Jane Chu wrote:
>> When a process consumes a UE in a page, the memory failure handler
>> attempts to collect information for a potential SIGBUS.
>> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
>> invoked in order to
>> 1. retrieve the vaddr from the process' address space,
>> 2. verify that the vaddr is indeed mapped to the poisoned page,
>> where 'page' is the precise small page with UE.
>>
>> It's been observed that when poison is injected into a non-head subpage
>> of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
>> the head page produces a SIGBUS. The cause is that, although
>> hugetlb_walk() returns a valid pmd entry (on x86), check_pte() detects
>> a mismatch between the head page per the pmd and the input subpage.
>> Thus the vaddr is considered not mapped to the subpage and the process
>> is not collected for SIGBUS purposes. This is the calling stack:
>> collect_procs_anon
>> page_mapped_in_vma
>> page_vma_mapped_walk
>> hugetlb_walk
>> huge_pte_lock
>> check_pte
>>
>
> Why can't we require callers to never pass in subpages of hugetlb
> pages, and sanity check that this is the case?
Because for memory-failure handling, we want to pinpoint the exact
small page even if it is part of a huge page, so that if
userspace manages to recover, it doesn't have to recover the clean
subpages. Please refer to 376907f3a0b34 ("mm/memory-failure: pass the
folio and the page to collect_procs()") for the change.
thanks,
-jane
>
>
end of thread, other threads:[~2025-02-26 19:08 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-24 21:14 [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware Jane Chu
2025-02-25 6:49 ` Miaohe Lin
2025-02-25 19:56 ` jane.chu
2025-02-26 15:49 ` David Hildenbrand
2025-02-26 15:53 ` David Hildenbrand
2025-02-26 19:08 ` jane.chu