linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
@ 2025-09-24 10:02 Lance Yang
  2025-09-24 10:10 ` David Hildenbrand
  2025-09-24 10:10 ` Kiryl Shutsemau
  0 siblings, 2 replies; 9+ messages in thread
From: Lance Yang @ 2025-09-24 10:02 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang, Lance Yang

From: Lance Yang <lance.yang@linux.dev>

The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
markers. Other special markers (e.g., GUARD, POISONED) would not be caught
early, leading to failures deeper in the swap-in logic.

hpage_collapse_scan_pmd()
 `- collapse_huge_page()
     `- __collapse_huge_page_swapin() -> fails!

As David suggested[1], this patch skips any such non-swap entries early.
If a special marker is found, the scan is aborted immediately with the
SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
work.

[1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local

Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7ab2d1a42df3..e9778e7734b5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1285,16 +1285,19 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (is_swap_pte(pteval)) {
+			swp_entry_t swp = pte_to_swp_entry(pteval);
 			++unmapped;
 			if (!cc->is_khugepaged ||
 			    unmapped <= khugepaged_max_ptes_swap) {
 				/*
-				 * Always be strict with uffd-wp
-				 * enabled swap entries.  Please see
-				 * comment below for pte_uffd_wp().
+				 * Always be strict with PTE markers, which are
+				 * special non-swap entries (e.g., for UFFD_WP,
+				 * POISONED, GUARD). We cannot collapse over
+				 * them, so just abort the scan here.
 				 */
-				if (pte_swp_uffd_wp_any(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
+				if (is_pte_marker_entry(swp) &&
+				    pte_marker_get(swp)) {
+					result = SCAN_PTE_NON_PRESENT;
 					goto out_unmap;
 				}
 				continue;
-- 
2.49.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 10:02 [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
@ 2025-09-24 10:10 ` David Hildenbrand
  2025-09-24 10:17   ` Kiryl Shutsemau
  2025-09-24 11:47   ` Lance Yang
  2025-09-24 10:10 ` Kiryl Shutsemau
  1 sibling, 2 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-09-24 10:10 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On 24.09.25 12:02, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
> markers. Other special markers (e.g., GUARD, POISONED) would not be caught
> early, leading to failures deeper in the swap-in logic.
> 
> hpage_collapse_scan_pmd()
>   `- collapse_huge_page()
>       `- __collapse_huge_page_swapin() -> fails!
> 
> As David suggested[1], this patch skips any such non-swap entries early.
> If a special marker is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work.

Note that I suggested to skip all non-present entries except swap 
entries, which includes migration entries, hwpoisoned entries etc.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 10:02 [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
  2025-09-24 10:10 ` David Hildenbrand
@ 2025-09-24 10:10 ` Kiryl Shutsemau
  1 sibling, 0 replies; 9+ messages in thread
From: Kiryl Shutsemau @ 2025-09-24 10:10 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On Wed, Sep 24, 2025 at 06:02:07PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
> markers. Other special markers (e.g., GUARD, POISONED) would not be caught
> early, leading to failures deeper in the swap-in logic.
> 
> hpage_collapse_scan_pmd()
>  `- collapse_huge_page()
>      `- __collapse_huge_page_swapin() -> fails!
> 
> As David suggested[1], this patch skips any such non-swap entries early.
> If a special marker is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work.
> 
> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>  mm/khugepaged.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7ab2d1a42df3..e9778e7734b5 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1285,16 +1285,19 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
>  		if (is_swap_pte(pteval)) {
> +			swp_entry_t swp = pte_to_swp_entry(pteval);
>  			++unmapped;
>  			if (!cc->is_khugepaged ||
>  			    unmapped <= khugepaged_max_ptes_swap) {
>  				/*
> -				 * Always be strict with uffd-wp
> -				 * enabled swap entries.  Please see
> -				 * comment below for pte_uffd_wp().
> +				 * Always be strict with PTE markers, which are
> +				 * special non-swap entries (e.g., for UFFD_WP,
> +				 * POISONED, GUARD). We cannot collapse over
> +				 * them, so just abort the scan here.
>  				 */
> -				if (pte_swp_uffd_wp_any(pteval)) {
> -					result = SCAN_PTE_UFFD_WP;
> +				if (is_pte_marker_entry(swp) &&
> +				    pte_marker_get(swp)) {

Hm. Can we have a marker that have zero pte_marker_get()?

> +					result = SCAN_PTE_NON_PRESENT;
>  					goto out_unmap;
>  				}
>  				continue;
> -- 
> 2.49.0
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 10:10 ` David Hildenbrand
@ 2025-09-24 10:17   ` Kiryl Shutsemau
  2025-09-24 10:40     ` David Hildenbrand
  2025-09-24 11:47   ` Lance Yang
  1 sibling, 1 reply; 9+ messages in thread
From: Kiryl Shutsemau @ 2025-09-24 10:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lance Yang, akpm, lorenzo.stoakes, Liam.Howlett, baohua,
	baolin.wang, dev.jain, hughd, ioworker0, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang

On Wed, Sep 24, 2025 at 12:10:47PM +0200, David Hildenbrand wrote:
> On 24.09.25 12:02, Lance Yang wrote:
> > From: Lance Yang <lance.yang@linux.dev>
> > 
> > The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
> > markers. Other special markers (e.g., GUARD, POISONED) would not be caught
> > early, leading to failures deeper in the swap-in logic.
> > 
> > hpage_collapse_scan_pmd()
> >   `- collapse_huge_page()
> >       `- __collapse_huge_page_swapin() -> fails!
> > 
> > As David suggested[1], this patch skips any such non-swap entries early.
> > If a special marker is found, the scan is aborted immediately with the
> > SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> > work.
> 
> Note that I suggested to skip all non-present entries except swap entries,
> which includes migration entries, hwpoisoned entries etc.

Hm. So swap in is fine, but wait for migration to complete is not?
Seems odd.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 10:17   ` Kiryl Shutsemau
@ 2025-09-24 10:40     ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-09-24 10:40 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Lance Yang, akpm, lorenzo.stoakes, Liam.Howlett, baohua,
	baolin.wang, dev.jain, hughd, ioworker0, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang

On 24.09.25 12:17, Kiryl Shutsemau wrote:
> On Wed, Sep 24, 2025 at 12:10:47PM +0200, David Hildenbrand wrote:
>> On 24.09.25 12:02, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
>>> markers. Other special markers (e.g., GUARD, POISONED) would not be caught
>>> early, leading to failures deeper in the swap-in logic.
>>>
>>> hpage_collapse_scan_pmd()
>>>    `- collapse_huge_page()
>>>        `- __collapse_huge_page_swapin() -> fails!
>>>
>>> As David suggested[1], this patch skips any such non-swap entries early.
>>> If a special marker is found, the scan is aborted immediately with the
>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>> work.
>>
>> Note that I suggested to skip all non-present entries except swap entries,
>> which includes migration entries, hwpoisoned entries etc.
> 
> Hm. So swap in is fine, but wait for migration to complete is not?

If so we'd have to add the logic to actually wait for migration entries, 
and not count them towards max swap entries.

But that's a different discussion and could be added on top of cleanly 
handling all non-swap entries.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 10:10 ` David Hildenbrand
  2025-09-24 10:17   ` Kiryl Shutsemau
@ 2025-09-24 11:47   ` Lance Yang
  2025-09-29 10:10     ` Lance Yang
  2025-09-29 10:29     ` David Hildenbrand
  1 sibling, 2 replies; 9+ messages in thread
From: Lance Yang @ 2025-09-24 11:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang, akpm



On 2025/9/24 18:10, David Hildenbrand wrote:
> On 24.09.25 12:02, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
>> markers. Other special markers (e.g., GUARD, POISONED) would not be 
>> caught
>> early, leading to failures deeper in the swap-in logic.
>>
>> hpage_collapse_scan_pmd()
>>   `- collapse_huge_page()
>>       `- __collapse_huge_page_swapin() -> fails!
>>
>> As David suggested[1], this patch skips any such non-swap entries early.
>> If a special marker is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work.
> 
> Note that I suggested to skip all non-present entries except swap 
> entries, which includes migration entries, hwpoisoned entries etc.

Oops, I completely misunderstood your suggestion :(

It should be to handle all special non-present entries (migration,
hwpoison, markers), not just a specific type of marker ...

How about this version, which handles all non-swap entries as you
suggested?

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7ab2d1a42df3..27f432e7f07c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct 
mm_struct *mm,
         for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
              _pte++, addr += PAGE_SIZE) {
                 pte_t pteval = ptep_get(_pte);
-               if (is_swap_pte(pteval)) {
+               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+                       ++none_or_zero;
+                       if (!userfaultfd_armed(vma) &&
+                           (!cc->is_khugepaged ||
+                            none_or_zero <= khugepaged_max_ptes_none)) {
+                               continue;
+                       } else {
+                               result = SCAN_EXCEED_NONE_PTE;
+                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+                               goto out_unmap;
+                       }
+               } else if (!pte_present(pteval)) {
+                       if (non_swap_entry(pte_to_swp_entry(pteval))) {
+                               result = SCAN_PTE_NON_PRESENT;
+                               goto out_unmap;
+                       }
+
                         ++unmapped;
                         if (!cc->is_khugepaged ||
                             unmapped <= khugepaged_max_ptes_swap) {
@@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct 
mm_struct *mm,
                                  * enabled swap entries.  Please see
                                  * comment below for pte_uffd_wp().
                                  */
-                               if (pte_swp_uffd_wp_any(pteval)) {
+                               if (pte_swp_uffd_wp(pteval)) {
                                         result = SCAN_PTE_UFFD_WP;
                                         goto out_unmap;
                                 }
@@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct 
mm_struct *mm,
                                 goto out_unmap;
                         }
                 }
-               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-                       ++none_or_zero;
-                       if (!userfaultfd_armed(vma) &&
-                           (!cc->is_khugepaged ||
-                            none_or_zero <= khugepaged_max_ptes_none)) {
-                               continue;
-                       } else {
-                               result = SCAN_EXCEED_NONE_PTE;
-                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-                               goto out_unmap;
-                       }
-               }
                 if (pte_uffd_wp(pteval)) {
                         /*
                          * Don't collapse the page if any of the small
---

Thanks,
Lance


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 11:47   ` Lance Yang
@ 2025-09-29 10:10     ` Lance Yang
  2025-09-29 10:29     ` David Hildenbrand
  1 sibling, 0 replies; 9+ messages in thread
From: Lance Yang @ 2025-09-29 10:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang, akpm



On 2025/9/24 19:47, Lance Yang wrote:
> 
> 
> On 2025/9/24 18:10, David Hildenbrand wrote:
>> On 24.09.25 12:02, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
>>> markers. Other special markers (e.g., GUARD, POISONED) would not be 
>>> caught
>>> early, leading to failures deeper in the swap-in logic.
>>>
>>> hpage_collapse_scan_pmd()
>>>   `- collapse_huge_page()
>>>       `- __collapse_huge_page_swapin() -> fails!
>>>
>>> As David suggested[1], this patch skips any such non-swap entries early.
>>> If a special marker is found, the scan is aborted immediately with the
>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>> work.
>>
>> Note that I suggested to skip all non-present entries except swap 
>> entries, which includes migration entries, hwpoisoned entries etc.
> 
> Oops, I completely misunderstood your suggestion :(
> 
> It should be to handle all special non-present entries (migration,
> hwpoison, markers), not just a specific type of marker ...
> 
> How about this version, which handles all non-swap entries as you
> suggested?

@David

Gently ping ...

> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7ab2d1a42df3..27f432e7f07c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct 
> mm_struct *mm,
>          for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>               _pte++, addr += PAGE_SIZE) {
>                  pte_t pteval = ptep_get(_pte);
> -               if (is_swap_pte(pteval)) {
> +               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +                       ++none_or_zero;
> +                       if (!userfaultfd_armed(vma) &&
> +                           (!cc->is_khugepaged ||
> +                            none_or_zero <= khugepaged_max_ptes_none)) {
> +                               continue;
> +                       } else {
> +                               result = SCAN_EXCEED_NONE_PTE;
> +                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +                               goto out_unmap;
> +                       }
> +               } else if (!pte_present(pteval)) {
> +                       if (non_swap_entry(pte_to_swp_entry(pteval))) {
> +                               result = SCAN_PTE_NON_PRESENT;
> +                               goto out_unmap;
> +                       }
> +
>                          ++unmapped;
>                          if (!cc->is_khugepaged ||
>                              unmapped <= khugepaged_max_ptes_swap) {
> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct 
> mm_struct *mm,
>                                   * enabled swap entries.  Please see
>                                   * comment below for pte_uffd_wp().
>                                   */
> -                               if (pte_swp_uffd_wp_any(pteval)) {
> +                               if (pte_swp_uffd_wp(pteval)) {
>                                          result = SCAN_PTE_UFFD_WP;
>                                          goto out_unmap;
>                                  }
> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct 
> mm_struct *mm,
>                                  goto out_unmap;
>                          }
>                  }
> -               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -                       ++none_or_zero;
> -                       if (!userfaultfd_armed(vma) &&
> -                           (!cc->is_khugepaged ||
> -                            none_or_zero <= khugepaged_max_ptes_none)) {
> -                               continue;
> -                       } else {
> -                               result = SCAN_EXCEED_NONE_PTE;
> -                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -                               goto out_unmap;
> -                       }
> -               }
>                  if (pte_uffd_wp(pteval)) {
>                          /*
>                           * Don't collapse the page if any of the small
> ---
> 
> Thanks,
> Lance



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-24 11:47   ` Lance Yang
  2025-09-29 10:10     ` Lance Yang
@ 2025-09-29 10:29     ` David Hildenbrand
  2025-09-29 10:39       ` Lance Yang
  1 sibling, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-09-29 10:29 UTC (permalink / raw)
  To: Lance Yang
  Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang, akpm

On 24.09.25 13:47, Lance Yang wrote:
> 
> 
> On 2025/9/24 18:10, David Hildenbrand wrote:
>> On 24.09.25 12:02, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
>>> markers. Other special markers (e.g., GUARD, POISONED) would not be
>>> caught
>>> early, leading to failures deeper in the swap-in logic.
>>>
>>> hpage_collapse_scan_pmd()
>>>    `- collapse_huge_page()
>>>        `- __collapse_huge_page_swapin() -> fails!
>>>
>>> As David suggested[1], this patch skips any such non-swap entries early.
>>> If a special marker is found, the scan is aborted immediately with the
>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>> work.
>>
>> Note that I suggested to skip all non-present entries except swap
>> entries, which includes migration entries, hwpoisoned entries etc.
> 
> Oops, I completely misunderstood your suggestion :(
> 
> It should be to handle all special non-present entries (migration,
> hwpoison, markers), not just a specific type of marker ...
> 
> How about this version, which handles all non-swap entries as you
> suggested?
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7ab2d1a42df3..27f432e7f07c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct
> mm_struct *mm,
>           for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>                _pte++, addr += PAGE_SIZE) {
>                   pte_t pteval = ptep_get(_pte);
> -               if (is_swap_pte(pteval)) {
> +               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +                       ++none_or_zero;
> +                       if (!userfaultfd_armed(vma) &&
> +                           (!cc->is_khugepaged ||
> +                            none_or_zero <= khugepaged_max_ptes_none)) {
> +                               continue;
> +                       } else {
> +                               result = SCAN_EXCEED_NONE_PTE;
> +                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +                               goto out_unmap;
> +                       }
> +               } else if (!pte_present(pteval)) {
> +                       if (non_swap_entry(pte_to_swp_entry(pteval))) {
> +                               result = SCAN_PTE_NON_PRESENT;
> +                               goto out_unmap;
> +                       }
> +
>                           ++unmapped;
>                           if (!cc->is_khugepaged ||
>                               unmapped <= khugepaged_max_ptes_swap) {
> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct
> mm_struct *mm,
>                                    * enabled swap entries.  Please see
>                                    * comment below for pte_uffd_wp().
>                                    */
> -                               if (pte_swp_uffd_wp_any(pteval)) {
> +                               if (pte_swp_uffd_wp(pteval)) {
>                                           result = SCAN_PTE_UFFD_WP;
>                                           goto out_unmap;
>                                   }
> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct
> mm_struct *mm,
>                                   goto out_unmap;
>                           }
>                   }
> -               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -                       ++none_or_zero;
> -                       if (!userfaultfd_armed(vma) &&
> -                           (!cc->is_khugepaged ||
> -                            none_or_zero <= khugepaged_max_ptes_none)) {
> -                               continue;
> -                       } else {
> -                               result = SCAN_EXCEED_NONE_PTE;
> -                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -                               goto out_unmap;
> -                       }
> -               }
>                   if (pte_uffd_wp(pteval)) {

 From a quick glimpse, this should work. And as raised, we might be able 
to unify later the scanning with the almost-duplicated code when we do 
the second scan.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-09-29 10:29     ` David Hildenbrand
@ 2025-09-29 10:39       ` Lance Yang
  0 siblings, 0 replies; 9+ messages in thread
From: Lance Yang @ 2025-09-29 10:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang, akpm



On 2025/9/29 18:29, David Hildenbrand wrote:
> On 24.09.25 13:47, Lance Yang wrote:
>>
>>
>> On 2025/9/24 18:10, David Hildenbrand wrote:
>>> On 24.09.25 12:02, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> The existing check in hpage_collapse_scan_pmd() is specific to uffd-wp
>>>> markers. Other special markers (e.g., GUARD, POISONED) would not be
>>>> caught
>>>> early, leading to failures deeper in the swap-in logic.
>>>>
>>>> hpage_collapse_scan_pmd()
>>>>    `- collapse_huge_page()
>>>>        `- __collapse_huge_page_swapin() -> fails!
>>>>
>>>> As David suggested[1], this patch skips any such non-swap entries 
>>>> early.
>>>> If a special marker is found, the scan is aborted immediately with the
>>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>>> work.
>>>
>>> Note that I suggested to skip all non-present entries except swap
>>> entries, which includes migration entries, hwpoisoned entries etc.
>>
>> Oops, I completely misunderstood your suggestion :(
>>
>> It should be to handle all special non-present entries (migration,
>> hwpoison, markers), not just a specific type of marker ...
>>
>> How about this version, which handles all non-swap entries as you
>> suggested?
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7ab2d1a42df3..27f432e7f07c 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>>           for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>                _pte++, addr += PAGE_SIZE) {
>>                   pte_t pteval = ptep_get(_pte);
>> -               if (is_swap_pte(pteval)) {
>> +               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> +                       ++none_or_zero;
>> +                       if (!userfaultfd_armed(vma) &&
>> +                           (!cc->is_khugepaged ||
>> +                            none_or_zero <= khugepaged_max_ptes_none)) {
>> +                               continue;
>> +                       } else {
>> +                               result = SCAN_EXCEED_NONE_PTE;
>> +                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +                               goto out_unmap;
>> +                       }
>> +               } else if (!pte_present(pteval)) {
>> +                       if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> +                               result = SCAN_PTE_NON_PRESENT;
>> +                               goto out_unmap;
>> +                       }
>> +
>>                           ++unmapped;
>>                           if (!cc->is_khugepaged ||
>>                               unmapped <= khugepaged_max_ptes_swap) {
>> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>>                                    * enabled swap entries.  Please see
>>                                    * comment below for pte_uffd_wp().
>>                                    */
>> -                               if (pte_swp_uffd_wp_any(pteval)) {
>> +                               if (pte_swp_uffd_wp(pteval)) {
>>                                           result = SCAN_PTE_UFFD_WP;
>>                                           goto out_unmap;
>>                                   }
>> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>>                                   goto out_unmap;
>>                           }
>>                   }
>> -               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> -                       ++none_or_zero;
>> -                       if (!userfaultfd_armed(vma) &&
>> -                           (!cc->is_khugepaged ||
>> -                            none_or_zero <= khugepaged_max_ptes_none)) {
>> -                               continue;
>> -                       } else {
>> -                               result = SCAN_EXCEED_NONE_PTE;
>> -                               count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> -                               goto out_unmap;
>> -                       }
>> -               }
>>                   if (pte_uffd_wp(pteval)) {
> 
>  From a quick glimpse, this should work. And as raised, we might be able 
> to unify later the scanning with the almost-duplicated code when we do 
> the second scan.

Sounds good! Let's get this one merged first, and I'll send a follow-up
patch to unify the duplicated code as you suggested ;)



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-09-29 10:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-24 10:02 [PATCH mm-new 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-09-24 10:10 ` David Hildenbrand
2025-09-24 10:17   ` Kiryl Shutsemau
2025-09-24 10:40     ` David Hildenbrand
2025-09-24 11:47   ` Lance Yang
2025-09-29 10:10     ` Lance Yang
2025-09-29 10:29     ` David Hildenbrand
2025-09-29 10:39       ` Lance Yang
2025-09-24 10:10 ` Kiryl Shutsemau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox