linux-mm.kvack.org archive mirror
* [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
@ 2025-10-08  3:26 Lance Yang
  2025-10-08  8:18 ` David Hildenbrand
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Lance Yang @ 2025-10-08  3:26 UTC (permalink / raw)
  To: akpm
  Cc: david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang, Lance Yang

From: Lance Yang <lance.yang@linux.dev>

Currently, special non-swap entries (like PTE markers) are not caught
early in hpage_collapse_scan_pmd(), leading to failures deep in the
swap-in logic.

The function __collapse_huge_page_swapin(), although documented to
"Bring missing pages in from swap", would end up handling these other
entry types as well.

As analyzed by David[1], we could have ended up with the following
entry types right before do_swap_page():

  (1) Migration entries. We would have waited.
      -> Maybe worth it to wait, maybe not. We suspect we don't stumble
         into that frequently such that we don't care. We could always
         unlock this separately later.

  (2) Device-exclusive entries. We would have converted to non-exclusive.
      -> See make_device_exclusive(), we cannot tolerate PMD entries and
         have to split them through FOLL_SPLIT_PMD. As popped up during
         a recent discussion, collapsing here is actually
         counter-productive, because the next conversion will PTE-map
         it again.
      -> Ok to not collapse.

  (3) Device-private entries. We would have migrated to RAM.
      -> Device-private still does not support THPs, so collapsing right
         now just means that the next device access would split the
         folio again.
      -> Ok to not collapse.

  (4) HWPoison entries
      -> Cannot collapse

  (5) Markers
      -> Cannot collapse

First, this patch adds an early check for these non-swap entries. If
any one is found, the scan is aborted immediately with the
SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
since we are in the swap pte branch.
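
For reference, all of these special entry types are encoded with a swap
type at or above MAX_SWAPFILES, which is what non_swap_entry() tests
for; a simplified sketch of the helper from include/linux/swapops.h:

```
static inline int non_swap_entry(swp_entry_t entry)
{
	/*
	 * Real swap entries use a type below MAX_SWAPFILES; migration,
	 * device-private/exclusive, hwpoison and marker entries use the
	 * special types at or above it.
	 */
	return swp_type(entry) >= MAX_SWAPFILES;
}
```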

Second, as Wei pointed out[3], a non-swap entry may still show up in
__collapse_huge_page_swapin(), since we drop and re-acquire the mmap
lock before calling it. To handle this, we also add a non_swap_entry()
check there.

Note that we can always relax this later for the cases we really need
(e.g., migration entries), without accounting them towards
max_swap_ptes.

[1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
[3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master

Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
v2 -> v3:
 - Collect Acked-by from David - thanks!
 - Collect Reviewed-by from Wei and Dev - thanks!
 - Add a non_swap_entry() check in __collapse_huge_page_swapin() (per Wei
   and David) - thanks!
 - Rework the changelog to incorporate David's detailed analysis of
   non-swap entry types - thanks!!!
 - https://lore.kernel.org/linux-mm/20251001032251.85888-1-lance.yang@linux.dev/

v1 -> v2:
 - Skip all non-present entries except swap entries (per David) thanks!
 - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/

 mm/khugepaged.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..bec3e268dc76 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
+		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
+			result = SCAN_PTE_NON_PRESENT;
+			goto out;
+		}
+
 		vmf.pte = pte;
 		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
@@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (is_swap_pte(pteval)) {
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			++none_or_zero;
+			if (!userfaultfd_armed(vma) &&
+			    (!cc->is_khugepaged ||
+			     none_or_zero <= khugepaged_max_ptes_none)) {
+				continue;
+			} else {
+				result = SCAN_EXCEED_NONE_PTE;
+				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+				goto out_unmap;
+			}
+		} else if (!pte_present(pteval)) {
+			if (non_swap_entry(pte_to_swp_entry(pteval))) {
+				result = SCAN_PTE_NON_PRESENT;
+				goto out_unmap;
+			}
+
 			++unmapped;
 			if (!cc->is_khugepaged ||
 			    unmapped <= khugepaged_max_ptes_swap) {
@@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				 * enabled swap entries.  Please see
 				 * comment below for pte_uffd_wp().
 				 */
-				if (pte_swp_uffd_wp_any(pteval)) {
+				if (pte_swp_uffd_wp(pteval)) {
 					result = SCAN_PTE_UFFD_WP;
 					goto out_unmap;
 				}
@@ -1301,18 +1322,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out_unmap;
-			}
-		}
 		if (pte_uffd_wp(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
-- 
2.49.0




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-08  3:26 [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
@ 2025-10-08  8:18 ` David Hildenbrand
  2025-10-10  3:44 ` Barry Song
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-10-08  8:18 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On 08.10.25 05:26, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> Currently, special non-swap entries (like PTE markers) are not caught
> early in hpage_collapse_scan_pmd(), leading to failures deep in the
> swap-in logic.
> 
> A function that is called __collapse_huge_page_swapin() and documented
> to "Bring missing pages in from swap" will handle other types as well.
> 
> As analyzed by David[1], we could have ended up with the following
> entry types right before do_swap_page():
> 
>    (1) Migration entries. We would have waited.
>        -> Maybe worth it to wait, maybe not. We suspect we don't stumble
>           into that frequently such that we don't care. We could always
>           unlock this separately later.
> 
>    (2) Device-exclusive entries. We would have converted to non-exclusive.
>        -> See make_device_exclusive(), we cannot tolerate PMD entries and
>           have to split them through FOLL_SPLIT_PMD. As popped up during
>           a recent discussion, collapsing here is actually
>           counter-productive, because the next conversion will PTE-map
>           it again.
>        -> Ok to not collapse.
> 
>    (3) Device-private entries. We would have migrated to RAM.
>        -> Device-private still does not support THPs, so collapsing right
>           now just means that the next device access would split the
>           folio again.
>        -> Ok to not collapse.
> 
>    (4) HWPoison entries
>        -> Cannot collapse
> 
>    (5) Markers
>        -> Cannot collapse
> 
> First, this patch adds an early check for these non-swap entries. If
> any one is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
> since we are in the swap pte branch.
> 
> Second, as Wei pointed out[3], we may have a chance to get a non-swap
> entry, since we will drop and re-acquire the mmap lock before
> __collapse_huge_page_swapin(). To handle this, we also add a
> non_swap_entry() check there.
> 
> Note that we can unlock later what we really need, and not account it
> towards max_swap_ptes.
> 
> [1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---

Sorry for not replying earlier to your other mail.

LGTM.

We can always handle migration entries later if this turns out to be a 
problem (this time, in a clean way ...) and not count them towards 
actual "swap" entries.

-- 
Cheers

David / dhildenb




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-08  3:26 [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
  2025-10-08  8:18 ` David Hildenbrand
@ 2025-10-10  3:44 ` Barry Song
  2025-10-10 15:21 ` Zi Yan
  2025-10-14 11:08 ` Lorenzo Stoakes
  3 siblings, 0 replies; 24+ messages in thread
From: Barry Song @ 2025-10-10  3:44 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang

On Thu, Oct 9, 2025 at 3:56 AM Lance Yang <lance.yang@linux.dev> wrote:
>
> From: Lance Yang <lance.yang@linux.dev>
>
> Currently, special non-swap entries (like PTE markers) are not caught
> early in hpage_collapse_scan_pmd(), leading to failures deep in the
> swap-in logic.
>
> A function that is called __collapse_huge_page_swapin() and documented
> to "Bring missing pages in from swap" will handle other types as well.
>
> As analyzed by David[1], we could have ended up with the following
> entry types right before do_swap_page():
>
>   (1) Migration entries. We would have waited.
>       -> Maybe worth it to wait, maybe not. We suspect we don't stumble
>          into that frequently such that we don't care. We could always
>          unlock this separately later.
>

On phones, we’ve actually observed user-visible stutters when the
migration process gets preempted and a foreground app is waiting for
do_swap_page() to complete. Anyway, that’s a separate issue we might
address later.

>   (2) Device-exclusive entries. We would have converted to non-exclusive.
>       -> See make_device_exclusive(), we cannot tolerate PMD entries and
>          have to split them through FOLL_SPLIT_PMD. As popped up during
>          a recent discussion, collapsing here is actually
>          counter-productive, because the next conversion will PTE-map
>          it again.
>       -> Ok to not collapse.
>
>   (3) Device-private entries. We would have migrated to RAM.
>       -> Device-private still does not support THPs, so collapsing right
>          now just means that the next device access would split the
>          folio again.
>       -> Ok to not collapse.
>
>   (4) HWPoison entries
>       -> Cannot collapse
>
>   (5) Markers
>       -> Cannot collapse
>
> First, this patch adds an early check for these non-swap entries. If
> any one is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
> since we are in the swap pte branch.
>
> Second, as Wei pointed out[3], we may have a chance to get a non-swap
> entry, since we will drop and re-acquire the mmap lock before
> __collapse_huge_page_swapin(). To handle this, we also add a
> non_swap_entry() check there.
>
> Note that we can unlock later what we really need, and not account it
> towards max_swap_ptes.
>
> [1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---

Reviewed-by: Barry Song <baohua@kernel.org>

Thanks
Barry



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-08  3:26 [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
  2025-10-08  8:18 ` David Hildenbrand
  2025-10-10  3:44 ` Barry Song
@ 2025-10-10 15:21 ` Zi Yan
  2025-10-10 15:34   ` Lance Yang
  2025-10-14 11:08 ` Lorenzo Stoakes
  3 siblings, 1 reply; 24+ messages in thread
From: Zi Yan @ 2025-10-10 15:21 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, richard.weiyang

On 7 Oct 2025, at 23:26, Lance Yang wrote:

> From: Lance Yang <lance.yang@linux.dev>
>
> Currently, special non-swap entries (like PTE markers) are not caught
> early in hpage_collapse_scan_pmd(), leading to failures deep in the
> swap-in logic.
>
> A function that is called __collapse_huge_page_swapin() and documented
> to "Bring missing pages in from swap" will handle other types as well.
>
> As analyzed by David[1], we could have ended up with the following
> entry types right before do_swap_page():
>
>   (1) Migration entries. We would have waited.
>       -> Maybe worth it to wait, maybe not. We suspect we don't stumble
>          into that frequently such that we don't care. We could always
>          unlock this separately later.
>
>   (2) Device-exclusive entries. We would have converted to non-exclusive.
>       -> See make_device_exclusive(), we cannot tolerate PMD entries and
>          have to split them through FOLL_SPLIT_PMD. As popped up during
>          a recent discussion, collapsing here is actually
>          counter-productive, because the next conversion will PTE-map
>          it again.
>       -> Ok to not collapse.
>
>   (3) Device-private entries. We would have migrated to RAM.
>       -> Device-private still does not support THPs, so collapsing right
>          now just means that the next device access would split the
>          folio again.
>       -> Ok to not collapse.
>
>   (4) HWPoison entries
>       -> Cannot collapse
>
>   (5) Markers
>       -> Cannot collapse
>
> First, this patch adds an early check for these non-swap entries. If
> any one is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
> since we are in the swap pte branch.
>
> Second, as Wei pointed out[3], we may have a chance to get a non-swap
> entry, since we will drop and re-acquire the mmap lock before
> __collapse_huge_page_swapin(). To handle this, we also add a
> non_swap_entry() check there.
>
> Note that we can unlock later what we really need, and not account it
> towards max_swap_ptes.
>
> [1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> v2 -> v3:
>  - Collect Acked-by from David - thanks!
>  - Collect Reviewed-by from Wei and Dev - thanks!
>  - Add a non_swap_entry() check in __collapse_huge_page_swapin() (per Wei
>    and David) - thanks!
>  - Rework the changelog to incorporate David's detailed analysis of
>    non-swap entry types - thanks!!!
>  - https://lore.kernel.org/linux-mm/20251001032251.85888-1-lance.yang@linux.dev/
>
> v1 -> v2:
>  - Skip all non-present entries except swap entries (per David) thanks!
>  - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>
>  mm/khugepaged.c | 37 +++++++++++++++++++++++--------------
>  1 file changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..bec3e268dc76 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>  		if (!is_swap_pte(vmf.orig_pte))
>  			continue;
>
> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
> +			result = SCAN_PTE_NON_PRESENT;
> +			goto out;
> +		}
> +
>  		vmf.pte = pte;
>  		vmf.ptl = ptl;
>  		ret = do_swap_page(&vmf);
> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (is_swap_pte(pteval)) {
> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +			++none_or_zero;
> +			if (!userfaultfd_armed(vma) &&
> +			    (!cc->is_khugepaged ||
> +			     none_or_zero <= khugepaged_max_ptes_none)) {
> +				continue;
> +			} else {
> +				result = SCAN_EXCEED_NONE_PTE;
> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +				goto out_unmap;
> +			}
> +		} else if (!pte_present(pteval)) {
> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
> +				result = SCAN_PTE_NON_PRESENT;
> +				goto out_unmap;
> +			}
> +
>  			++unmapped;
>  			if (!cc->is_khugepaged ||
>  			    unmapped <= khugepaged_max_ptes_swap) {
> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  				 * enabled swap entries.  Please see
>  				 * comment below for pte_uffd_wp().
>  				 */
> -				if (pte_swp_uffd_wp_any(pteval)) {
> +				if (pte_swp_uffd_wp(pteval)) {

pte_swp_uffd_wp_any() returns true for both pte_swp_uffd_wp() and
pte_marker_uffd_wp(). Why is it OK to just check pte_swp_uffd_wp() here?

>  					result = SCAN_PTE_UFFD_WP;
>  					goto out_unmap;
>  				}
> @@ -1301,18 +1322,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  				goto out_unmap;
>  			}
>  		}
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_NONE_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -				goto out_unmap;
> -			}
> -		}
>  		if (pte_uffd_wp(pteval)) {
>  			/*
>  			 * Don't collapse the page if any of the small
> -- 
> 2.49.0


--
Best Regards,
Yan, Zi



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-10 15:21 ` Zi Yan
@ 2025-10-10 15:34   ` Lance Yang
  2025-10-10 15:35     ` Zi Yan
  0 siblings, 1 reply; 24+ messages in thread
From: Lance Yang @ 2025-10-10 15:34 UTC (permalink / raw)
  To: Zi Yan
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, richard.weiyang



On 2025/10/10 23:21, Zi Yan wrote:
> On 7 Oct 2025, at 23:26, Lance Yang wrote:
> 
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> Currently, special non-swap entries (like PTE markers) are not caught
>> early in hpage_collapse_scan_pmd(), leading to failures deep in the
>> swap-in logic.
>>
>> A function that is called __collapse_huge_page_swapin() and documented
>> to "Bring missing pages in from swap" will handle other types as well.
>>
>> As analyzed by David[1], we could have ended up with the following
>> entry types right before do_swap_page():
>>
>>    (1) Migration entries. We would have waited.
>>        -> Maybe worth it to wait, maybe not. We suspect we don't stumble
>>           into that frequently such that we don't care. We could always
>>           unlock this separately later.
>>
>>    (2) Device-exclusive entries. We would have converted to non-exclusive.
>>        -> See make_device_exclusive(), we cannot tolerate PMD entries and
>>           have to split them through FOLL_SPLIT_PMD. As popped up during
>>           a recent discussion, collapsing here is actually
>>           counter-productive, because the next conversion will PTE-map
>>           it again.
>>        -> Ok to not collapse.
>>
>>    (3) Device-private entries. We would have migrated to RAM.
>>        -> Device-private still does not support THPs, so collapsing right
>>           now just means that the next device access would split the
>>           folio again.
>>        -> Ok to not collapse.
>>
>>    (4) HWPoison entries
>>        -> Cannot collapse
>>
>>    (5) Markers
>>        -> Cannot collapse
>>
>> First, this patch adds an early check for these non-swap entries. If
>> any one is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
>> since we are in the swap pte branch.
>>
>> Second, as Wei pointed out[3], we may have a chance to get a non-swap
>> entry, since we will drop and re-acquire the mmap lock before
>> __collapse_huge_page_swapin(). To handle this, we also add a
>> non_swap_entry() check there.
>>
>> Note that we can unlock later what we really need, and not account it
>> towards max_swap_ptes.
>>
>> [1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> v2 -> v3:
>>   - Collect Acked-by from David - thanks!
>>   - Collect Reviewed-by from Wei and Dev - thanks!
>>   - Add a non_swap_entry() check in __collapse_huge_page_swapin() (per Wei
>>     and David) - thanks!
>>   - Rework the changelog to incorporate David's detailed analysis of
>>     non-swap entry types - thanks!!!
>>   - https://lore.kernel.org/linux-mm/20251001032251.85888-1-lance.yang@linux.dev/
>>
>> v1 -> v2:
>>   - Skip all non-present entries except swap entries (per David) thanks!
>>   - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>>
>>   mm/khugepaged.c | 37 +++++++++++++++++++++++--------------
>>   1 file changed, 23 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index abe54f0043c7..bec3e268dc76 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>   		if (!is_swap_pte(vmf.orig_pte))
>>   			continue;
>>
>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>> +			result = SCAN_PTE_NON_PRESENT;
>> +			goto out;
>> +		}
>> +
>>   		vmf.pte = pte;
>>   		vmf.ptl = ptl;
>>   		ret = do_swap_page(&vmf);
>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>   	     _pte++, addr += PAGE_SIZE) {
>>   		pte_t pteval = ptep_get(_pte);
>> -		if (is_swap_pte(pteval)) {
>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> +			++none_or_zero;
>> +			if (!userfaultfd_armed(vma) &&
>> +			    (!cc->is_khugepaged ||
>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>> +				continue;
>> +			} else {
>> +				result = SCAN_EXCEED_NONE_PTE;
>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +				goto out_unmap;
>> +			}
>> +		} else if (!pte_present(pteval)) {
>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) { 
>> +				result = SCAN_PTE_NON_PRESENT;
>> +				goto out_unmap;
>> +			}
>> +
>>   			++unmapped;
>>   			if (!cc->is_khugepaged ||
>>   			    unmapped <= khugepaged_max_ptes_swap) {
>> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   				 * enabled swap entries.  Please see
>>   				 * comment below for pte_uffd_wp().
>>   				 */
>> -				if (pte_swp_uffd_wp_any(pteval)) {
>> +				if (pte_swp_uffd_wp(pteval)) {
> 
> pte_swp_uffd_wp_any() returns true for both pte_swp_uffd_wp() and
> pte_marker_uffd_wp(). Why is it OK to just check pte_swp_uffd_wp() here?

+		} else if (!pte_present(pteval)) {
+			if (non_swap_entry(pte_to_swp_entry(pteval))) { <--
+				result = SCAN_PTE_NON_PRESENT;
+				goto out_unmap;
+			}

IIUC, we have just handled all non-swap entries above (which would
include pte_marker_uffd_wp()), right?
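
Roughly, the patched branch then orders the checks like this (simplified
sketch of the hunk above; the unmapped/max_ptes_swap accounting between
the two checks is omitted):

```
} else if (!pte_present(pteval)) {
	/*
	 * Any non-swap entry (migration, device-private/exclusive,
	 * hwpoison, and markers, including uffd-wp PTE markers) aborts
	 * the scan here ...
	 */
	if (non_swap_entry(pte_to_swp_entry(pteval))) {
		result = SCAN_PTE_NON_PRESENT;
		goto out_unmap;
	}
	/*
	 * ... so only genuine swap entries reach this point, and
	 * pte_swp_uffd_wp() is sufficient; the pte_marker_uffd_wp()
	 * case can no longer occur here.
	 */
	if (pte_swp_uffd_wp(pteval)) {
		result = SCAN_PTE_UFFD_WP;
		goto out_unmap;
	}
```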

> 
>>   					result = SCAN_PTE_UFFD_WP;
>>   					goto out_unmap;
>>   				}
>> @@ -1301,18 +1322,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   				goto out_unmap;
>>   			}
>>   		}
>> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> -			++none_or_zero;
>> -			if (!userfaultfd_armed(vma) &&
>> -			    (!cc->is_khugepaged ||
>> -			     none_or_zero <= khugepaged_max_ptes_none)) {
>> -				continue;
>> -			} else {
>> -				result = SCAN_EXCEED_NONE_PTE;
>> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> -				goto out_unmap;
>> -			}
>> -		}
>>   		if (pte_uffd_wp(pteval)) {
>>   			/*
>>   			 * Don't collapse the page if any of the small
>> -- 
>> 2.49.0
> 
> 
> --
> Best Regards,
> Yan, Zi




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-10 15:34   ` Lance Yang
@ 2025-10-10 15:35     ` Zi Yan
  0 siblings, 0 replies; 24+ messages in thread
From: Zi Yan @ 2025-10-10 15:35 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, richard.weiyang

On 10 Oct 2025, at 11:34, Lance Yang wrote:

> On 2025/10/10 23:21, Zi Yan wrote:
>> On 7 Oct 2025, at 23:26, Lance Yang wrote:
>>
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> Currently, special non-swap entries (like PTE markers) are not caught
>>> early in hpage_collapse_scan_pmd(), leading to failures deep in the
>>> swap-in logic.
>>>
>>> A function that is called __collapse_huge_page_swapin() and documented
>>> to "Bring missing pages in from swap" will handle other types as well.
>>>
>>> As analyzed by David[1], we could have ended up with the following
>>> entry types right before do_swap_page():
>>>
>>>    (1) Migration entries. We would have waited.
>>>        -> Maybe worth it to wait, maybe not. We suspect we don't stumble
>>>           into that frequently such that we don't care. We could always
>>>           unlock this separately later.
>>>
>>>    (2) Device-exclusive entries. We would have converted to non-exclusive.
>>>        -> See make_device_exclusive(), we cannot tolerate PMD entries and
>>>           have to split them through FOLL_SPLIT_PMD. As popped up during
>>>           a recent discussion, collapsing here is actually
>>>           counter-productive, because the next conversion will PTE-map
>>>           it again.
>>>        -> Ok to not collapse.
>>>
>>>    (3) Device-private entries. We would have migrated to RAM.
>>>        -> Device-private still does not support THPs, so collapsing right
>>>           now just means that the next device access would split the
>>>           folio again.
>>>        -> Ok to not collapse.
>>>
>>>    (4) HWPoison entries
>>>        -> Cannot collapse
>>>
>>>    (5) Markers
>>>        -> Cannot collapse
>>>
>>> First, this patch adds an early check for these non-swap entries. If
>>> any one is found, the scan is aborted immediately with the
>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>> work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
>>> since we are in the swap pte branch.
>>>
>>> Second, as Wei pointed out[3], we may have a chance to get a non-swap
>>> entry, since we will drop and re-acquire the mmap lock before
>>> __collapse_huge_page_swapin(). To handle this, we also add a
>>> non_swap_entry() check there.
>>>
>>> Note that we can unlock later what we really need, and not account it
>>> towards max_swap_ptes.
>>>
>>> [1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
>>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>>> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
>>>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
>>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>> ---
>>> v2 -> v3:
>>>   - Collect Acked-by from David - thanks!
>>>   - Collect Reviewed-by from Wei and Dev - thanks!
>>>   - Add a non_swap_entry() check in __collapse_huge_page_swapin() (per Wei
>>>     and David) - thanks!
>>>   - Rework the changelog to incorporate David's detailed analysis of
>>>     non-swap entry types - thanks!!!
>>>   - https://lore.kernel.org/linux-mm/20251001032251.85888-1-lance.yang@linux.dev/
>>>
>>> v1 -> v2:
>>>   - Skip all non-present entries except swap entries (per David) thanks!
>>>   - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>>>
>>>   mm/khugepaged.c | 37 +++++++++++++++++++++++--------------
>>>   1 file changed, 23 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index abe54f0043c7..bec3e268dc76 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>>   		if (!is_swap_pte(vmf.orig_pte))
>>>   			continue;
>>>
>>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>>> +			result = SCAN_PTE_NON_PRESENT;
>>> +			goto out;
>>> +		}
>>> +
>>>   		vmf.pte = pte;
>>>   		vmf.ptl = ptl;
>>>   		ret = do_swap_page(&vmf);
>>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>   	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>   	     _pte++, addr += PAGE_SIZE) {
>>>   		pte_t pteval = ptep_get(_pte);
>>> -		if (is_swap_pte(pteval)) {
>>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>> +			++none_or_zero;
>>> +			if (!userfaultfd_armed(vma) &&
>>> +			    (!cc->is_khugepaged ||
>>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>>> +				continue;
>>> +			} else {
>>> +				result = SCAN_EXCEED_NONE_PTE;
>>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>> +				goto out_unmap;
>>> +			}
>>> +		} else if (!pte_present(pteval)) {
>>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) { <--
>>> +				result = SCAN_PTE_NON_PRESENT;
>>> +				goto out_unmap;
>>> +			}
>>> +
>>>   			++unmapped;
>>>   			if (!cc->is_khugepaged ||
>>>   			    unmapped <= khugepaged_max_ptes_swap) {
>>> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>   				 * enabled swap entries.  Please see
>>>   				 * comment below for pte_uffd_wp().
>>>   				 */
>>> -				if (pte_swp_uffd_wp_any(pteval)) {
>>> +				if (pte_swp_uffd_wp(pteval)) {
>>
>> pte_swp_uffd_wp_any() returns true for both pte_swp_uffd_wp() and
>> pte_marker_uffd_wp(). Why is it OK to just check pte_swp_uffd_wp() here?
>
> +		} else if (!pte_present(pteval)) {
> +			if (non_swap_entry(pte_to_swp_entry(pteval))) { <--
> +				result = SCAN_PTE_NON_PRESENT;
> +				goto out_unmap;
> +			}
>
> IIUC, we have just handled all non-swap entries above (which would
> include pte_marker_uffd_wp()), right?

Got it. Thanks for the explanation.

Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-08  3:26 [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
                   ` (2 preceding siblings ...)
  2025-10-10 15:21 ` Zi Yan
@ 2025-10-14 11:08 ` Lorenzo Stoakes
  2025-10-14 14:26   ` Lance Yang
  3 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 11:08 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang

On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..bec3e268dc76 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>  		if (!is_swap_pte(vmf.orig_pte))
>  			continue;
>
> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
> +			result = SCAN_PTE_NON_PRESENT;
> +			goto out;
> +		}

OK seems in line with what we were discussing before...

> +
>  		vmf.pte = pte;
>  		vmf.ptl = ptl;
>  		ret = do_swap_page(&vmf);
> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (is_swap_pte(pteval)) {
> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +			++none_or_zero;
> +			if (!userfaultfd_armed(vma) &&
> +			    (!cc->is_khugepaged ||
> +			     none_or_zero <= khugepaged_max_ptes_none)) {
> +				continue;
> +			} else {
> +				result = SCAN_EXCEED_NONE_PTE;
> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +				goto out_unmap;
> +			}
> +		} else if (!pte_present(pteval)) {
> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {

Hm but can't this be pte_protnone() at this stage (or something else)? And then
we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
fact might not be?

Couldn't we end up with false positives here?

> +				result = SCAN_PTE_NON_PRESENT;
> +				goto out_unmap;
> +			}
> +
>  			++unmapped;
>  			if (!cc->is_khugepaged ||
>  			    unmapped <= khugepaged_max_ptes_swap) {
> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  				 * enabled swap entries.  Please see
>  				 * comment below for pte_uffd_wp().
>  				 */
> -				if (pte_swp_uffd_wp_any(pteval)) {
> +				if (pte_swp_uffd_wp(pteval)) {

Again you're assuming it's a swap entry but you're not asserting this is a swap
entry in this branch?

Also an aside - I hate, hate, hate how this uffd wp stuff has infiltrated all
kinds of open-coded stuff. It's so gross (not your fault, just a general
comment...)

>  					result = SCAN_PTE_UFFD_WP;
>  					goto out_unmap;
>  				}
> @@ -1301,18 +1322,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  				goto out_unmap;
>  			}
>  		}
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_NONE_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -				goto out_unmap;
> -			}
> -		}
>  		if (pte_uffd_wp(pteval)) {
>  			/*
>  			 * Don't collapse the page if any of the small
> --
> 2.49.0
>

Cheers, Lorenzo



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 11:08 ` Lorenzo Stoakes
@ 2025-10-14 14:26   ` Lance Yang
  2025-10-14 14:32     ` David Hildenbrand
  2025-10-14 14:39     ` Lorenzo Stoakes
  0 siblings, 2 replies; 24+ messages in thread
From: Lance Yang @ 2025-10-14 14:26 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, david, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang



On 2025/10/14 19:08, Lorenzo Stoakes wrote:
> On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index abe54f0043c7..bec3e268dc76 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>   		if (!is_swap_pte(vmf.orig_pte))
>>   			continue;
>>
>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>> +			result = SCAN_PTE_NON_PRESENT;
>> +			goto out;
>> +		}
> 
> OK seems in line with what we were discussing before...

Yep. That's the idea :)

> 
>> +
>>   		vmf.pte = pte;
>>   		vmf.ptl = ptl;
>>   		ret = do_swap_page(&vmf);
>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>   	     _pte++, addr += PAGE_SIZE) {
>>   		pte_t pteval = ptep_get(_pte);
>> -		if (is_swap_pte(pteval)) {
>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> +			++none_or_zero;
>> +			if (!userfaultfd_armed(vma) &&
>> +			    (!cc->is_khugepaged ||
>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>> +				continue;
>> +			} else {
>> +				result = SCAN_EXCEED_NONE_PTE;
>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +				goto out_unmap;
>> +			}
>> +		} else if (!pte_present(pteval)) {
>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
> 

Thanks for pointing that out!

> Hm but can't this be pte_protnone() at this stage (or something else)? And then

Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.

```
static inline int pte_protnone(pte_t pte)
{
	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
		== _PAGE_PROTNONE;
}

static inline int pte_present(pte_t a)
{
	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
}
```

On x86, pte_present() returns true for a protnone pte. And I'd assume
other archs behave similarly ...

> we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
> fact might not be?
> 
> Couldn't we end up with false positives here?

Emm, I think we're good here and the code is doing the right thing.

> 
>> +				result = SCAN_PTE_NON_PRESENT;
>> +				goto out_unmap;
>> +			}
>> +
>>   			++unmapped;
>>   			if (!cc->is_khugepaged ||
>>   			    unmapped <= khugepaged_max_ptes_swap) {
>> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   				 * enabled swap entries.  Please see
>>   				 * comment below for pte_uffd_wp().
>>   				 */
>> -				if (pte_swp_uffd_wp_any(pteval)) {
>> +				if (pte_swp_uffd_wp(pteval)) {
> 
> Again you're assuming it's a swap entry but you're not asserting this is a swap
> entry in this branch?

As we discussed above, the non_swap_entry() check has already kicked out
anything that isn't a true swap entry, right?

> 
> Also an aside - I hate, hate, hate how this uffd wp stuff has infiltrated all
> kinds of open-coded stuff. It's so gross (not your fault, just a general
> comment...)

Haha, tell me about it. No argument from me there ;)

Thanks,
Lance



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 14:26   ` Lance Yang
@ 2025-10-14 14:32     ` David Hildenbrand
  2025-10-14 14:37       ` Lance Yang
  2025-10-14 14:39     ` Lorenzo Stoakes
  1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-10-14 14:32 UTC (permalink / raw)
  To: Lance Yang, Lorenzo Stoakes
  Cc: akpm, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang


> static inline int pte_protnone(pte_t pte)
> {
> 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
> 		== _PAGE_PROTNONE;
> }
> 
> static inline int pte_present(pte_t a)
> {
> 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
> }
> ```
> 
> On x86, pte_present() returns true for a protnone pte. And I'd assume
> other archs behave similarly ...

Applies to all architectures. prot-none entries must be present, 
otherwise we'd have a lot of other issues :)

-- 
Cheers

David / dhildenb




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 14:32     ` David Hildenbrand
@ 2025-10-14 14:37       ` Lance Yang
  2025-10-14 14:43         ` Lorenzo Stoakes
  0 siblings, 1 reply; 24+ messages in thread
From: Lance Yang @ 2025-10-14 14:37 UTC (permalink / raw)
  To: Lorenzo Stoakes, David Hildenbrand
  Cc: akpm, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang



On 2025/10/14 22:32, David Hildenbrand wrote:
> 
>> static inline int pte_protnone(pte_t pte)
>> {
>>     return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
>>         == _PAGE_PROTNONE;
>> }
>>
>> static inline int pte_present(pte_t a)
>> {
>>     return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>> }
>> ```
>>
>> On x86, pte_present() returns true for a protnone pte. And I'd assume
>> other archs behave similarly ...
> 
> Applies to all architecture. prot-none entries must be present, 
> otherwise we'd have a lot of other issues :)

Thanks for confirming, David! That's good to know ;p

So @Lorenzo, looks like we're good here on the protnone front. Does
that clear up your concern?




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 14:26   ` Lance Yang
  2025-10-14 14:32     ` David Hildenbrand
@ 2025-10-14 14:39     ` Lorenzo Stoakes
  2025-10-14 15:01       ` Lance Yang
  2025-10-14 15:11       ` David Hildenbrand
  1 sibling, 2 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 14:39 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang

On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
>
>
> On 2025/10/14 19:08, Lorenzo Stoakes wrote:
> > On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index abe54f0043c7..bec3e268dc76 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
> > >   		if (!is_swap_pte(vmf.orig_pte))
> > >   			continue;
> > >
> > > +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
> > > +			result = SCAN_PTE_NON_PRESENT;
> > > +			goto out;
> > > +		}
> >
> > OK seems in line with what we were discussing before...
>
> Yep. That's the idea :)
>
> >
> > > +
> > >   		vmf.pte = pte;
> > >   		vmf.ptl = ptl;
> > >   		ret = do_swap_page(&vmf);
> > > @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> > >   	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> > >   	     _pte++, addr += PAGE_SIZE) {
> > >   		pte_t pteval = ptep_get(_pte);
> > > -		if (is_swap_pte(pteval)) {
> > > +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > > +			++none_or_zero;
> > > +			if (!userfaultfd_armed(vma) &&
> > > +			    (!cc->is_khugepaged ||
> > > +			     none_or_zero <= khugepaged_max_ptes_none)) {
> > > +				continue;
> > > +			} else {
> > > +				result = SCAN_EXCEED_NONE_PTE;
> > > +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > > +				goto out_unmap;
> > > +			}
> > > +		} else if (!pte_present(pteval)) {
> > > +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
> >
>
> Thanks for pointing that out!

You've deleted what I've said here and also not indicated whether you'll do what
I asked :)

Please be clearer...

>
> > Hm but can't this be pte_protnone() at this stage (or something else)? And then
>
> Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
>
> ```
> static inline int pte_protnone(pte_t pte)
> {
> 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
> 		== _PAGE_PROTNONE;
> }
>
> static inline int pte_present(pte_t a)
> {
> 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
> }
> ```
>
> On x86, pte_present() returns true for a protnone pte. And I'd assume
> other archs behave similarly ...

This was one example, we may make changes in the future that result in entries
that are non-present but also non-swap.

I don't see the point in eliminating this check based on an implicit, open-coded
assumption that this can never be the case, this is just asking for trouble.

>
> > we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
> > fact might not be?
> >
> > Couldn't we end up with false positives here?
>
> Emm, I think we're good here and the code is doing the right thing.

I mean sorry but just - NO - to doing swap operations based on open-coded checks
that you implicitly feel must imply a swap entry.

This makes the code a lot more confusing, it opens us up to accidentally
breaking things in future and has little to no benefit, I don't see why we're
doing it.

I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
implemented, this feels like a sort of 'mathematical reduction of code ignoring
all other factors'.

>
> >
> > > +				result = SCAN_PTE_NON_PRESENT;
> > > +				goto out_unmap;
> > > +			}
> > > +
> > >   			++unmapped;
> > >   			if (!cc->is_khugepaged ||
> > >   			    unmapped <= khugepaged_max_ptes_swap) {
> > > @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> > >   				 * enabled swap entries.  Please see
> > >   				 * comment below for pte_uffd_wp().
> > >   				 */
> > > -				if (pte_swp_uffd_wp_any(pteval)) {
> > > +				if (pte_swp_uffd_wp(pteval)) {
> >
> > Again you're assuming it's a swap entry but you're not asserting this is a swap
> > entry in this branch?
>
> As we discussed above, the non_swap_entry() check has already kicked out
> anything that isn't a true swap entry, right?

This is a different function?

Actually I'm mistaken here I think - you check in the code above:

		if (is_swap_pte(pteval)) {
			...
		}

So this is fine, please ignore sorry :)

>
> >
> > Also an aside - I hate, hate, hate how this uffd wp stuff has infiltrated all
> > kinds of open-coded stuff. It's so gross (not your fault, just a general
> > comment...)
>
> Haha, tell me about it. No argument from me there ;)

:)

>
> Thanks,
> Lance

Cheers, Lorenzo



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 14:37       ` Lance Yang
@ 2025-10-14 14:43         ` Lorenzo Stoakes
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 14:43 UTC (permalink / raw)
  To: Lance Yang
  Cc: David Hildenbrand, akpm, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang

On Tue, Oct 14, 2025 at 10:37:28PM +0800, Lance Yang wrote:
>
>
> On 2025/10/14 22:32, David Hildenbrand wrote:
> >
> > > static inline int pte_protnone(pte_t pte)
> > > {
> > >     return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
> > >         == _PAGE_PROTNONE;
> > > }
> > >
> > > static inline int pte_present(pte_t a)
> > > {
> > >     return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
> > > }
> > > ```
> > >
> > > On x86, pte_present() returns true for a protnone pte. And I'd assume
> > > other archs behave similarly ...
> >
> > Applies to all architecture. prot-none entries must be present,
> > otherwise we'd have a lot of other issues :)
>
> Thanks for confirming, David! That's good to know ;p
>
> So @Lorenzo, looks like we're good here on the protnone front. Does
> that clear up your concern?
>

No absolutely not, I go into detail as to why in my reply. I don't want
that upstreamed.



* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 14:39     ` Lorenzo Stoakes
@ 2025-10-14 15:01       ` Lance Yang
  2025-10-14 15:12         ` David Hildenbrand
  2025-10-14 15:11       ` David Hildenbrand
  1 sibling, 1 reply; 24+ messages in thread
From: Lance Yang @ 2025-10-14 15:01 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, david, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang



On 2025/10/14 22:39, Lorenzo Stoakes wrote:
> On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/10/14 19:08, Lorenzo Stoakes wrote:
>>> On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index abe54f0043c7..bec3e268dc76 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>>>    		if (!is_swap_pte(vmf.orig_pte))
>>>>    			continue;
>>>>
>>>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>>>> +			result = SCAN_PTE_NON_PRESENT;
>>>> +			goto out;
>>>> +		}
>>>
>>> OK seems in line with what we were discussing before...
>>
>> Yep. That's the idea :)
>>
>>>
>>>> +
>>>>    		vmf.pte = pte;
>>>>    		vmf.ptl = ptl;
>>>>    		ret = do_swap_page(&vmf);
>>>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>>    	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>    	     _pte++, addr += PAGE_SIZE) {
>>>>    		pte_t pteval = ptep_get(_pte);
>>>> -		if (is_swap_pte(pteval)) {
>>>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>>> +			++none_or_zero;
>>>> +			if (!userfaultfd_armed(vma) &&
>>>> +			    (!cc->is_khugepaged ||
>>>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>>>> +				continue;
>>>> +			} else {
>>>> +				result = SCAN_EXCEED_NONE_PTE;
>>>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>>> +				goto out_unmap;
>>>> +			}
>>>> +		} else if (!pte_present(pteval)) {
>>>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
>>>
>>
>> Thanks for pointing that out!
> 
> You've deleted what I've said here and also not indicated whether you'll do what
> I asked :)
> 
> Please be clearer...

Oh, I didn't delete your comment at all ... It's just below ...

> 
>>>>> Hm but can't this be pte_protnone() at this stage (or something else)? And then <-- Here!
>>
>> Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
>>
>> ```
>> static inline int pte_protnone(pte_t pte)
>> {
>> 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
>> 		== _PAGE_PROTNONE;
>> }
>>
>> static inline int pte_present(pte_t a)
>> {
>> 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>> }
>> ```
>>
>> On x86, pte_present() returns true for a protnone pte. And I'd assume
>> other archs behave similarly ...
> 
> This was one example, we may make changes in the future that result in entries
> that are non-present but also non-swap.
> 
> I don't see the point in eliminating this check based on an implicit, open-coded
> assumption that this can never be the case, this is just asking for trouble.
> 
>>
>>> we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
>>> fact might not be?
>>>
>>> Couldn't we end up with false positives here?
>>
>> Emm, I think we're good here and the code is doing the right thing.
> 
> I mean sorry but just - NO - to doing swap operations based on open-coded checks
> that you implicitly feel must imply a swap entry.
> 
> This makes the code a lot more confusing, it opens us up to accidentally
> breaking things in future and has little to no benefit, I don't see why we're
> doing it.
> 
> I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
> implemented, this feels like a sort of 'mathematical reduction of code ignoring
> all other factors'.

Understood. Changing !pte_present() to is_swap_pte() will resolve all your
concerns, right?

```
if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
[...]
} else if (is_swap_pte(pteval)) { <-- Here
	if (non_swap_entry(pte_to_swp_entry(pteval))) {
		[...]
	}
	[...]
}
```

> 
>>
>>>
>>>> +				result = SCAN_PTE_NON_PRESENT;
>>>> +				goto out_unmap;
>>>> +			}
>>>> +
>>>>    			++unmapped;
>>>>    			if (!cc->is_khugepaged ||
>>>>    			    unmapped <= khugepaged_max_ptes_swap) {
>>>> @@ -1290,7 +1311,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>>    				 * enabled swap entries.  Please see
>>>>    				 * comment below for pte_uffd_wp().
>>>>    				 */
>>>> -				if (pte_swp_uffd_wp_any(pteval)) {
>>>> +				if (pte_swp_uffd_wp(pteval)) {
>>>
>>> Again you're assuming it's a swap entry but you're not asserting this is a swap
>>> entry in this branch?
>>
>> As we discussed above, the non_swap_entry() check has already kicked out
>> anything that isn't a true swap entry, right?
> 
> This is a different function?
> 
> Actually I'm mistaken here I think - you check in the code above:
> 
> 		if (is_swap_pte(pteval)) {
> 			...
> 		}
> 
> So this is fine, please ignore sorry :)

No worries at all, thanks for double-checking and clarifying!

> 
>>
>>>
>>> Also an aside - I hate, hate, hate how this uffd wp stuff has infiltrated all
>>> kinds of open-coded stuff. It's so gross (not your fault, just a general
>>> comment...)
>>
>> Haha, tell me about it. No argument from me there ;)
> 
> :)
> 
>>
>> Thanks,
>> Lance
> 
> Cheers, Lorenzo




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 14:39     ` Lorenzo Stoakes
  2025-10-14 15:01       ` Lance Yang
@ 2025-10-14 15:11       ` David Hildenbrand
  2025-10-14 15:52         ` Lorenzo Stoakes
  1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-10-14 15:11 UTC (permalink / raw)
  To: Lorenzo Stoakes, Lance Yang
  Cc: akpm, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang

On 14.10.25 16:39, Lorenzo Stoakes wrote:
> On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/10/14 19:08, Lorenzo Stoakes wrote:
>>> On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index abe54f0043c7..bec3e268dc76 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>>>    		if (!is_swap_pte(vmf.orig_pte))
>>>>    			continue;
>>>>
>>>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>>>> +			result = SCAN_PTE_NON_PRESENT;
>>>> +			goto out;
>>>> +		}
>>>
>>> OK seems in line with what we were discussing before...
>>
>> Yep. That's the idea :)
>>
>>>
>>>> +
>>>>    		vmf.pte = pte;
>>>>    		vmf.ptl = ptl;
>>>>    		ret = do_swap_page(&vmf);
>>>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>>    	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>    	     _pte++, addr += PAGE_SIZE) {
>>>>    		pte_t pteval = ptep_get(_pte);
>>>> -		if (is_swap_pte(pteval)) {
>>>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>>> +			++none_or_zero;
>>>> +			if (!userfaultfd_armed(vma) &&
>>>> +			    (!cc->is_khugepaged ||
>>>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>>>> +				continue;
>>>> +			} else {
>>>> +				result = SCAN_EXCEED_NONE_PTE;
>>>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>>> +				goto out_unmap;
>>>> +			}
>>>> +		} else if (!pte_present(pteval)) {
>>>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
>>>
>>
>> Thanks for pointing that out!
> 
> You've deleted what I've said here and also not indicated whether you'll do what
> I asked :)
> 
> Please be clearer...
> 
>>
>>> Hm but can't this be pte_protnone() at this stage (or something else)? And then
>>
>> Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
>>
>> ```
>> static inline int pte_protnone(pte_t pte)
>> {
>> 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
>> 		== _PAGE_PROTNONE;
>> }
>>
>> static inline int pte_present(pte_t a)
>> {
>> 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>> }
>> ```
>>
>> On x86, pte_present() returns true for a protnone pte. And I'd assume
>> other archs behave similarly ...
> 
> This was one example, we may make changes in the future that result in entries
> that are non-present but also non-swap.
> 
> I don't see the point in eliminating this check based on an implicit, open-coded
> assumption that this can never be the case, this is just asking for trouble.
> 
>>
>>> we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
>>> fact might not be?
>>>
>>> Couldn't we end up with false positives here?
>>
>> Emm, I think we're good here and the code is doing the right thing.
> 
> I mean sorry but just - NO - to doing swap operations based on open-coded checks
> that you implicitly feel must imply a swap entry.
> 
> This makes the code a lot more confusing, it opens us up to accidentally
> breaking things in future and has little to no benefit, I don't see why we're
> doing it.
> 
> I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
> implemented, this feels like a sort of 'mathematical reduction of code ignoring
> all other factors'.

Not sure I follow. If something is !none && !present it's what we call a 
swap PTE (that includes actual swap and non-swap PTEs).

We have the exact same code flow for example in 
copy_pte_range()->copy_nonpresent_pte() and I don't see a problem with it.
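
For illustration only, a simplified sketch of that flow (from memory, not the
exact mm/memory.c code):

```
/* copy_pte_range(), simplified: none PTEs are skipped up front, so anything
 * that is !none && !present takes the nonpresent ("swap PTE") path. */
ptent = ptep_get(src_pte);
if (pte_none(ptent))
	continue;
if (unlikely(!pte_present(ptent))) {
	/* hands the entry off to copy_nonpresent_pte() */
}

/* copy_nonpresent_pte(), simplified: only here do we distinguish a real
 * swap entry from the non-swap flavours. */
swp_entry_t entry = pte_to_swp_entry(orig_pte);

if (likely(!non_swap_entry(entry))) {
	/* genuine swap entry */
} else if (is_migration_entry(entry)) {
	/* ... */
} else if (is_device_private_entry(entry)) {
	/* ... */
}
```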

If we were to ever change what we call a "swap PTE" (I don't think so?) 
we'd have to fix stuff all over the place.

Maybe I get the concern here wrong?

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:01       ` Lance Yang
@ 2025-10-14 15:12         ` David Hildenbrand
  2025-10-14 15:33           ` David Hildenbrand
  2025-10-14 15:41           ` Lorenzo Stoakes
  0 siblings, 2 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-10-14 15:12 UTC (permalink / raw)
  To: Lance Yang, Lorenzo Stoakes
  Cc: akpm, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang

On 14.10.25 17:01, Lance Yang wrote:
> 
> 
> On 2025/10/14 22:39, Lorenzo Stoakes wrote:
>> On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
>>>
>>>
>>> On 2025/10/14 19:08, Lorenzo Stoakes wrote:
>>>> On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>> index abe54f0043c7..bec3e268dc76 100644
>>>>> --- a/mm/khugepaged.c
>>>>> +++ b/mm/khugepaged.c
>>>>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>>>>     		if (!is_swap_pte(vmf.orig_pte))
>>>>>     			continue;
>>>>>
>>>>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>>>>> +			result = SCAN_PTE_NON_PRESENT;
>>>>> +			goto out;
>>>>> +		}
>>>>
>>>> OK seems in line with what we were discussing before...
>>>
>>> Yep. That's the idea :)
>>>
>>>>
>>>>> +
>>>>>     		vmf.pte = pte;
>>>>>     		vmf.ptl = ptl;
>>>>>     		ret = do_swap_page(&vmf);
>>>>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>>>     	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>>     	     _pte++, addr += PAGE_SIZE) {
>>>>>     		pte_t pteval = ptep_get(_pte);
>>>>> -		if (is_swap_pte(pteval)) {
>>>>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>>>> +			++none_or_zero;
>>>>> +			if (!userfaultfd_armed(vma) &&
>>>>> +			    (!cc->is_khugepaged ||
>>>>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>>>>> +				continue;
>>>>> +			} else {
>>>>> +				result = SCAN_EXCEED_NONE_PTE;
>>>>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>>>> +				goto out_unmap;
>>>>> +			}
>>>>> +		} else if (!pte_present(pteval)) {
>>>>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
>>>>
>>>
>>> Thanks for pointing that out!
>>
>> You've deleted what I've said here and also not indicated whether you'll do what
>> I asked :)
>>
>> Please be clearer...
> 
> Oh, I didn't delete your comment at all ... It's just below ...
> 
>>
>>>>>> Hm but can't this be pte_protnone() at this stage (or something
> else)? And then <-- Here!
>>>
>>> Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
>>>
>>> ```
>>> static inline int pte_protnone(pte_t pte)
>>> {
>>> 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
>>> 		== _PAGE_PROTNONE;
>>> }
>>>
>>> static inline int pte_present(pte_t a)
>>> {
>>> 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>>> }
>>> ```
>>>
>>> On x86, pte_present() returns true for a protnone pte. And I'd assume
>>> other archs behave similarly ...
>>
>> This was one example, we may make changes in the future that result in entries
>> that are non-present but also non-swap.
>>
>> I don't see the point in eliminating this check based on an implicit, open-coded
>> assumption that this can never be the case, this is just asking for trouble.
>>
>>>
>>>> we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
>>>> fact might not be?
>>>>
>>>> Couldn't we end up with false positives here?
>>>
>>> Emm, I think we're good here and the code is doing the right thing.
>>
>> I mean sorry but just - NO - to doing swap operations based on open-coded checks
>> that you implicitly feel must imply a swap entry.
>>
>> This makes the code a lot more confusing, it opens us up to accidentally
>> breaking things in future and has little to no benefit, I don't see why we're
>> doing it.
>>
>> I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
>> implemented, this feels like a sort of 'mathematical reduction of code ignoring
>> all other factors'.
> 
> Understood. Changing !pte_present() to is_swap_pte() will resolve all your
> concerns, right?
> 
> ```
> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> [...]
> } else if (is_swap_pte(pteval)) { <-- Here
> 	if (non_swap_entry(pte_to_swp_entry(pteval))) {
> 		[...]
> 	}
> [...]}

Can we please take a step back and make sure we are not starting to do 
stuff differently than elsewhere in the kernel, please?

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:12         ` David Hildenbrand
@ 2025-10-14 15:33           ` David Hildenbrand
  2025-10-14 16:10             ` Lorenzo Stoakes
  2025-10-14 15:41           ` Lorenzo Stoakes
  1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-10-14 15:33 UTC (permalink / raw)
  To: Lance Yang, Lorenzo Stoakes
  Cc: akpm, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang

On 14.10.25 17:12, David Hildenbrand wrote:
> On 14.10.25 17:01, Lance Yang wrote:
>>
>>
>> On 2025/10/14 22:39, Lorenzo Stoakes wrote:
>>> On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
>>>>
>>>>
>>>> On 2025/10/14 19:08, Lorenzo Stoakes wrote:
>>>>> On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>> index abe54f0043c7..bec3e268dc76 100644
>>>>>> --- a/mm/khugepaged.c
>>>>>> +++ b/mm/khugepaged.c
>>>>>> @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>>>>>>      		if (!is_swap_pte(vmf.orig_pte))
>>>>>>      			continue;
>>>>>>
>>>>>> +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>>>>>> +			result = SCAN_PTE_NON_PRESENT;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>
>>>>> OK seems in line with what we were discussing before...
>>>>
>>>> Yep. That's the idea :)
>>>>
>>>>>
>>>>>> +
>>>>>>      		vmf.pte = pte;
>>>>>>      		vmf.ptl = ptl;
>>>>>>      		ret = do_swap_page(&vmf);
>>>>>> @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>>>>      	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>>>      	     _pte++, addr += PAGE_SIZE) {
>>>>>>      		pte_t pteval = ptep_get(_pte);
>>>>>> -		if (is_swap_pte(pteval)) {
>>>>>> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>>>>>> +			++none_or_zero;
>>>>>> +			if (!userfaultfd_armed(vma) &&
>>>>>> +			    (!cc->is_khugepaged ||
>>>>>> +			     none_or_zero <= khugepaged_max_ptes_none)) {
>>>>>> +				continue;
>>>>>> +			} else {
>>>>>> +				result = SCAN_EXCEED_NONE_PTE;
>>>>>> +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>>>>> +				goto out_unmap;
>>>>>> +			}
>>>>>> +		} else if (!pte_present(pteval)) {
>>>>>> +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
>>>>>
>>>>
>>>> Thanks for pointing that out!
>>>
>>> You've deleted what I've said here and also not indicated whether you'll do what
>>> I asked :)
>>>
>>> Please be clearer...
>>
>> Oh, I didn't delete your comment at all ... It's just below ...
>>
>>>
>>>>>>> Hm but can't this be pte_protnone() at this stage (or something
>> else)? And then <-- Here!
>>>>
>>>> Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
>>>>
>>>> ```
>>>> static inline int pte_protnone(pte_t pte)
>>>> {
>>>> 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
>>>> 		== _PAGE_PROTNONE;
>>>> }
>>>>
>>>> static inline int pte_present(pte_t a)
>>>> {
>>>> 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
>>>> }
>>>> ```
>>>>
>>>> On x86, pte_present() returns true for a protnone pte. And I'd assume
>>>> other archs behave similarly ...
>>>
>>> This was one example, we may make changes in the future that result in entries
>>> that are non-present but also non-swap.
>>>
>>> I don't see the point in eliminating this check based on an implicit, open-coded
>>> assumption that this can never be the case, this is just asking for trouble.
>>>
>>>>
>>>>> we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
>>>>> fact might not be?
>>>>>
>>>>> Couldn't we end up with false positives here?
>>>>
>>>> Emm, I think we're good here and the code is doing the right thing.
>>>
>>> I mean sorry but just - NO - to doing swap operations based on open-coded checks
>>> that you implicitly feel must imply a swap entry.
>>>
>>> This makes the code a lot more confusing, it opens us up to accidentally
>>> breaking things in future and has little to no benefit, I don't see why we're
>>> doing it.
>>>
>>> I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
>>> implemented, this feels like a sort of 'mathematical reduction of code ignoring
>>> all other factors'.
>>
>> Understood. Changing !pte_present() to is_swap_pte() will resolve all your
>> concerns, right?
>>
>> ```
>> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> [...]
>> } else if (is_swap_pte(pteval)) { <-- Here
>> 	if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> 		[...]
>> 	}
>> [...]}
> 
> Can we please take a step back and make sure we are not starting to do
> stuff differently than elsewhere in the kernel, please?
> 

For the sake of progress, I assume the compiler will optimize out the 
additional pte_none() stuff.

I absolutely, absolutely hate is_swap_pte(). To me, it makes the code 
more confusing than talking about something that is !present but also 
!none: there is something that is not an ordinary page table mapping.

The underlying problem is how we hacked non-swap into swap (and 
that's exactly where it gets confusing). Well, which this series is all 
about.

So, I don't care in the end here.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:12         ` David Hildenbrand
  2025-10-14 15:33           ` David Hildenbrand
@ 2025-10-14 15:41           ` Lorenzo Stoakes
  2025-10-14 15:48             ` David Hildenbrand
  1 sibling, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 15:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lance Yang, akpm, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On Tue, Oct 14, 2025 at 05:12:18PM +0200, David Hildenbrand wrote:
> Can we please take a step back and make sure we are not starting to do stuff
> differently than elsewhere in the kernel, please?

The code already had the is_swap_pte() check in it, so this is not changing
anything, it's just avoiding confusion.

Even if this pattern exists elsewhere, I don't think propagating 'you just have
to happen to know X, Y, Z to understand it' is a good idea.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:41           ` Lorenzo Stoakes
@ 2025-10-14 15:48             ` David Hildenbrand
  2025-10-14 15:57               ` Lorenzo Stoakes
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-10-14 15:48 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Lance Yang, akpm, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On 14.10.25 17:41, Lorenzo Stoakes wrote:
> On Tue, Oct 14, 2025 at 05:12:18PM +0200, David Hildenbrand wrote:
>> Can we please take a step back and make sure we are not starting to do stuff
>> differently than elsewhere in the kernel, please?
> 
> The code already had the is_swap_pte() check in it, so this is not changing
> anything, it's just avoiding confusion.

Well, in hpage_collapse_scan_pmd() yes; in __collapse_huge_page_isolate(), no :)

And then it's called SCAN_PTE_NON_PRESENT and it all starts being a mess 
TBH.

But as raised offline, I don't have the time+energy for that today, so I 
don't care enough.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:11       ` David Hildenbrand
@ 2025-10-14 15:52         ` Lorenzo Stoakes
  2025-10-14 16:09           ` Lance Yang
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 15:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lance Yang, akpm, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On Tue, Oct 14, 2025 at 05:11:12PM +0200, David Hildenbrand wrote:
> On 14.10.25 16:39, Lorenzo Stoakes wrote:
> > On Tue, Oct 14, 2025 at 10:26:20PM +0800, Lance Yang wrote:
> > >
> > >
> > > On 2025/10/14 19:08, Lorenzo Stoakes wrote:
> > > > On Wed, Oct 08, 2025 at 11:26:57AM +0800, Lance Yang wrote:
> > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > index abe54f0043c7..bec3e268dc76 100644
> > > > > --- a/mm/khugepaged.c
> > > > > +++ b/mm/khugepaged.c
> > > > > @@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
> > > > >    		if (!is_swap_pte(vmf.orig_pte))
> > > > >    			continue;
> > > > >
> > > > > +		if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
> > > > > +			result = SCAN_PTE_NON_PRESENT;
> > > > > +			goto out;
> > > > > +		}
> > > >
> > > > OK seems in line with what we were discussing before...
> > >
> > > Yep. That's the idea :)
> > >
> > > >
> > > > > +
> > > > >    		vmf.pte = pte;
> > > > >    		vmf.ptl = ptl;
> > > > >    		ret = do_swap_page(&vmf);
> > > > > @@ -1281,7 +1286,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> > > > >    	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> > > > >    	     _pte++, addr += PAGE_SIZE) {
> > > > >    		pte_t pteval = ptep_get(_pte);
> > > > > -		if (is_swap_pte(pteval)) {
> > > > > +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > > > > +			++none_or_zero;
> > > > > +			if (!userfaultfd_armed(vma) &&
> > > > > +			    (!cc->is_khugepaged ||
> > > > > +			     none_or_zero <= khugepaged_max_ptes_none)) {
> > > > > +				continue;
> > > > > +			} else {
> > > > > +				result = SCAN_EXCEED_NONE_PTE;
> > > > > +				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > > > > +				goto out_unmap;
> > > > > +			}
> > > > > +		} else if (!pte_present(pteval)) {
> > > > > +			if (non_swap_entry(pte_to_swp_entry(pteval))) {
> > > >
> > >
> > > Thanks for pointing that out!
> >
> > You've deleted what I've said here and also not indicated whether you'll do what
> > I asked :)
> >
> > Please be clearer...
> >
> > >
> > > > Hm but can't this be pte_protnone() at this stage (or something else)? And then
> > >
> > > Yeah. The funny thing is, a protnone pte cannot actually get here, IIUC.
> > >
> > > ```
> > > static inline int pte_protnone(pte_t pte)
> > > {
> > > 	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
> > > 		== _PAGE_PROTNONE;
> > > }
> > >
> > > static inline int pte_present(pte_t a)
> > > {
> > > 	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
> > > }
> > > ```
> > >
> > > On x86, pte_present() returns true for a protnone pte. And I'd assume
> > > other archs behave similarly ...
> >
> > This was one example, we may make changes in the future that result in entries
> > that are non-present but also non-swap.
> >
> > I don't see the point in eliminating this check based on an implicit, open-coded
> > assumption that this can never be the case, this is just asking for trouble.
> >
> > >
> > > > we're just assuming pte_to_swp_entry() is operating on a swap entry when it in
> > > > fact might not be?
> > > >
> > > > Couldn't we end up with false positives here?
> > >
> > > Emm, I think we're good here and the code is doing the right thing.
> >
> > I mean sorry but just - NO - to doing swap operations based on open-coded checks
> > that you implicitly feel must imply a swap entry.
> >
> > This makes the code a lot more confusing, it opens us up to accidentally
> > breaking things in future and has little to no benefit, I don't see why we're
> > doing it.
> >
> > I don't think every little 'aha X must imply Y so just eliminate Z' idea need be
> > implemented, this feels like a sort of 'mathematical reduction of code ignoring
> > all other factors'.
>
> Not sure I follow. If something is !none && !present it's what we call a
> swap PTE (that includes actual swap and non-swap PTEs).

You see this is the issue. You know that. I knew that, then forgot that :)

Obviously you can go read the function:

/* check whether a pte points to a swap entry */
static inline int is_swap_pte(pte_t pte)
{
	return !pte_none(pte) && !pte_present(pte);
}

And see that this is the definition.

You also have to know that we hack pte_present() so it doesn't actually tell you
whether the PTE is present but whether it's present _or protnone_, and
again, I just forgot :>)

But that's the point! You shouldn't _need_ to 'just know' these things.

We have some operations that _require_ the entry to be a swap entry (including
the very poorly named 'non swap entry'), or they'll read back/write garbage.

My key argument is:

	if (is_thing_we_want())
		do_stuff_with_thing();

Abstracts all this, and is far better than:

	if (condition_a) {
		...
	} else if (!condition_b) {
		do_stuff_with_thing(); // you should just know that !a, !b means we can.
	}
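
Concretely, for the hunk under discussion that's just the guard Lance sketched
above (illustrative only, not the final code):

	} else if (is_swap_pte(pteval)) {	/* the 'is thing we want' check */
		if (non_swap_entry(pte_to_swp_entry(pteval))) {
			result = SCAN_PTE_NON_PRESENT;
			goto out_unmap;
		}
		/* ... existing swap-entry handling ... */
	}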

I do agree that the naming is very bad here.

>
> We have the exact same code flow for example in
> copy_pte_range()->copy_nonpresent_pte() and I don't see a problem with it.

Obviously I'd make the same argument above.

>
> If we were to ever change what we call a "swap PTE" (I don't think so?) we'd
> have to fix stuff all over the place.

Sorry, it's my mistake for raising this point about some imagined new non-present
PTE that is also not-none; I'm fresh off vacation and clearly forgetting things
:)

So withdraw that point.

>
> Maybe I get the concern here wrong?
>
> --
> Cheers
>
> David / dhildenb
>

Perhaps the naming is really the issue...

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:48             ` David Hildenbrand
@ 2025-10-14 15:57               ` Lorenzo Stoakes
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 15:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lance Yang, akpm, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On Tue, Oct 14, 2025 at 05:48:04PM +0200, David Hildenbrand wrote:
> On 14.10.25 17:41, Lorenzo Stoakes wrote:
> > On Tue, Oct 14, 2025 at 05:12:18PM +0200, David Hildenbrand wrote:
> > > Can we please take a step back and make sure we are not starting to do stuff
> > > differently than elsewhere in the kernel, please?
> >
> > The code already had the is_swap_pte() check in it, so this is not changing
> > anything, it's just avoiding confusion.
>
> Well, in hpage_collapse_scan_pmd(), in __collapse_huge_page_isolate() no :)
>
> And then it's called SCAN_PTE_NON_PRESENT and it all starts being a mess
> TBH.

Yeah ugh. THP code is generally a mess :(

>
> But as raised offline, I don't have the time+energy for that today, so I
> don't care enough.

Time, energy? What are those? :P

>
> --
> Cheers
>
> David / dhildenb
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:52         ` Lorenzo Stoakes
@ 2025-10-14 16:09           ` Lance Yang
  2025-10-14 16:27             ` Lorenzo Stoakes
  0 siblings, 1 reply; 24+ messages in thread
From: Lance Yang @ 2025-10-14 16:09 UTC (permalink / raw)
  To: Lorenzo Stoakes, David Hildenbrand
  Cc: akpm, Liam.Howlett, baohua, baolin.wang, dev.jain, hughd,
	ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
	ryan.roberts, ziy, richard.weiyang

Hi David, Lorenzo,

Thanks to both of you for the deep dive on this!!!

The code we've been discussing was moved to a new helper by
another patch series[1], so let's call it a day on this
thread, leave it as-is, and just review/change the logic
at its new home :)

[1] 
https://lore.kernel.org/linux-mm/20251008043748.45554-1-lance.yang@linux.dev/

Thanks a lot!
Lance


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 15:33           ` David Hildenbrand
@ 2025-10-14 16:10             ` Lorenzo Stoakes
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 16:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lance Yang, akpm, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang

On Tue, Oct 14, 2025 at 05:33:18PM +0200, David Hildenbrand wrote:
> For the sake of progress, I assume the compiler will optimize out the
> additional pte_none() stuff.
>
> I absolutely, absolutely hate is_swap_pte(). To me, it makes the code more
> confusing than talking about something that is !present but also !none:
> there is something that is not an ordinary page table mapping.

Yeah it's nasty for sure.

I hate non-swap-entry swap entries with a passion.

We clearly need to rework this... :)

Naming is hard however...

>
> The underlying problem is how we hacked non-swap into swap (and that's
> exactly where it gets confusing). Well, which this series is all about.

Yup

>
> So, I don't care in the end here.

I do still feel a single guard condition works better than having to know
implementation details, so I would still prefer this change to be made.

>
> --
> Cheers
>
> David / dhildenb
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 16:09           ` Lance Yang
@ 2025-10-14 16:27             ` Lorenzo Stoakes
  2025-10-15  1:52               ` Lance Yang
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 16:27 UTC (permalink / raw)
  To: Lance Yang
  Cc: David Hildenbrand, akpm, Liam.Howlett, baohua, baolin.wang,
	dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
	mpenttil, npache, ryan.roberts, ziy, richard.weiyang

On Wed, Oct 15, 2025 at 12:09:42AM +0800, Lance Yang wrote:
> Hi David, Lorenzo,
>
> Thanks to both of you for the deep dive on this!!!

That's ok :)

>
> The code we've been discussing was moved to a new helper by
> another patch series[1], so let's call it a day on this
> thread and leave it as-is and just review/change the logic
> at its new home :)

Ahh, please do make that clear in future; I reviewed this on the basis that it
was a dependency of [1] :)

I separately reviewed that on that basis.

Will reply there...

>
> [1]
> https://lore.kernel.org/linux-mm/20251008043748.45554-1-lance.yang@linux.dev/
>
> Thanks a lot!
> Lance

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries
  2025-10-14 16:27             ` Lorenzo Stoakes
@ 2025-10-15  1:52               ` Lance Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Lance Yang @ 2025-10-15  1:52 UTC (permalink / raw)
  To: akpm
  Cc: David Hildenbrand, Liam.Howlett, baohua, baolin.wang, dev.jain,
	hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
	npache, ryan.roberts, ziy, richard.weiyang, Lorenzo Stoakes


@Andrew could you please drop this patch from mm-new ;)

Thanks,
Lance

On 2025/10/15 00:27, Lorenzo Stoakes wrote:
> On Wed, Oct 15, 2025 at 12:09:42AM +0800, Lance Yang wrote:
>> Hi David, Lorenzo,
>>
>> Thanks to both of you for the deep dive on this!!!
> 
> That's ok :)
> 
>>
>> The code we've been discussing was moved to a new helper by
>> another patch series[1], so let's call it a day on this
>> thread and leave it as-is and just review/change the logic
>> at its new home :)
> 
> Ahh, please do make that clear in future, I reviewed this on the basis it
> was a dependency of [1] :)
> 
> I separately reviewed that on that basis.
> 
> Will reply there...
> 
>>
>> [1]
>> https://lore.kernel.org/linux-mm/20251008043748.45554-1-lance.yang@linux.dev/
>>
>> Thanks a lot!
>> Lance
> 
> Cheers, Lorenzo



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-10-15  1:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-08  3:26 [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-10-08  8:18 ` David Hildenbrand
2025-10-10  3:44 ` Barry Song
2025-10-10 15:21 ` Zi Yan
2025-10-10 15:34   ` Lance Yang
2025-10-10 15:35     ` Zi Yan
2025-10-14 11:08 ` Lorenzo Stoakes
2025-10-14 14:26   ` Lance Yang
2025-10-14 14:32     ` David Hildenbrand
2025-10-14 14:37       ` Lance Yang
2025-10-14 14:43         ` Lorenzo Stoakes
2025-10-14 14:39     ` Lorenzo Stoakes
2025-10-14 15:01       ` Lance Yang
2025-10-14 15:12         ` David Hildenbrand
2025-10-14 15:33           ` David Hildenbrand
2025-10-14 16:10             ` Lorenzo Stoakes
2025-10-14 15:41           ` Lorenzo Stoakes
2025-10-14 15:48             ` David Hildenbrand
2025-10-14 15:57               ` Lorenzo Stoakes
2025-10-14 15:11       ` David Hildenbrand
2025-10-14 15:52         ` Lorenzo Stoakes
2025-10-14 16:09           ` Lance Yang
2025-10-14 16:27             ` Lorenzo Stoakes
2025-10-15  1:52               ` Lance Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox