* [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
@ 2025-10-01 3:22 Lance Yang
2025-10-01 8:31 ` David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Lance Yang @ 2025-10-01 3:22 UTC (permalink / raw)
To: akpm
Cc: david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy, richard.weiyang, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
Currently, special non-swap entries (like migration, hwpoison, or PTE
markers) are not caught early in hpage_collapse_scan_pmd(), leading to
failures deep in the swap-in logic.
hpage_collapse_scan_pmd()
 `- collapse_huge_page()
    `- __collapse_huge_page_swapin() -> fails!
As David suggested[1], this patch skips any such non-swap entries
early. If any one is found, the scan is aborted immediately with the
SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
work.
[1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
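The decision flow described above can be sketched as a plain userspace C model (purely illustrative: `classify()` and the `MODEL_*` names are hypothetical and do not exist in the kernel, which operates on real pte_t values):

```c
#include <assert.h>

/* Hypothetical userspace model of the rewritten scan loop in
 * hpage_collapse_scan_pmd(); enum and function names are illustrative. */
enum pte_kind { MODEL_PTE_PRESENT, MODEL_PTE_NONE, MODEL_PTE_SWAP,
		MODEL_PTE_NON_SWAP };	/* migration/hwpoison/PTE marker */

enum scan_result { SCAN_CONTINUE, SCAN_PTE_NON_PRESENT,
		   SCAN_EXCEED_NONE_PTE };

/* Mirrors the new ordering: none/zero PTEs are counted first, and any
 * other non-present entry that is not a real swap entry aborts the
 * whole scan immediately instead of failing later during swap-in. */
static enum scan_result classify(enum pte_kind kind, int *none_or_zero,
				 int *unmapped, int max_ptes_none)
{
	if (kind == MODEL_PTE_NONE) {
		if (++*none_or_zero > max_ptes_none)
			return SCAN_EXCEED_NONE_PTE;
		return SCAN_CONTINUE;
	}
	if (kind == MODEL_PTE_NON_SWAP)
		return SCAN_PTE_NON_PRESENT;	/* abort, no wasted work */
	if (kind == MODEL_PTE_SWAP)
		++*unmapped;			/* swap-in candidate */
	return SCAN_CONTINUE;
}
```

In this sketch a single MODEL_PTE_NON_SWAP entry ends the scan, matching the early-abort behaviour the patch adds.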
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
v1 -> v2:
- Skip all non-present entries except swap entries (per David) thanks!
- https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
mm/khugepaged.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7ab2d1a42df3..d0957648db19 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (is_swap_pte(pteval)) {
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			++none_or_zero;
+			if (!userfaultfd_armed(vma) &&
+			    (!cc->is_khugepaged ||
+			     none_or_zero <= khugepaged_max_ptes_none)) {
+				continue;
+			} else {
+				result = SCAN_EXCEED_NONE_PTE;
+				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+				goto out_unmap;
+			}
+		} else if (!pte_present(pteval)) {
+			if (non_swap_entry(pte_to_swp_entry(pteval))) {
+				result = SCAN_PTE_NON_PRESENT;
+				goto out_unmap;
+			}
+
 			++unmapped;
 			if (!cc->is_khugepaged ||
 			    unmapped <= khugepaged_max_ptes_swap) {
@@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				 * enabled swap entries. Please see
 				 * comment below for pte_uffd_wp().
 				 */
-				if (pte_swp_uffd_wp_any(pteval)) {
+				if (pte_swp_uffd_wp(pteval)) {
 					result = SCAN_PTE_UFFD_WP;
 					goto out_unmap;
 				}
@@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out_unmap;
-			}
-		}
 		if (pte_uffd_wp(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
--
2.49.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 3:22 [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
@ 2025-10-01 8:31 ` David Hildenbrand
2025-10-01 9:38 ` Lance Yang
2025-10-01 8:54 ` Wei Yang
2025-10-01 10:20 ` Dev Jain
2 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2025-10-01 8:31 UTC (permalink / raw)
To: Lance Yang, akpm
Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
npache, ryan.roberts, ziy, richard.weiyang
On 01.10.25 05:22, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> Currently, special non-swap entries (like migration, hwpoison, or PTE
> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
> failures deep in the swap-in logic.
>
> hpage_collapse_scan_pmd()
> `- collapse_huge_page()
> `- __collapse_huge_page_swapin() -> fails!
>
> As David suggested[1], this patch skips any such non-swap entries
> early. If any one is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work.
>
> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
Could have mentioned you adjust the flow of code to resemble what we
have in __collapse_huge_page_isolate().
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 3:22 [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-10-01 8:31 ` David Hildenbrand
@ 2025-10-01 8:54 ` Wei Yang
2025-10-01 9:15 ` David Hildenbrand
2025-10-01 10:05 ` Lance Yang
2025-10-01 10:20 ` Dev Jain
2 siblings, 2 replies; 16+ messages in thread
From: Wei Yang @ 2025-10-01 8:54 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy, richard.weiyang
On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>From: Lance Yang <lance.yang@linux.dev>
>
>Currently, special non-swap entries (like migration, hwpoison, or PTE
>markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>failures deep in the swap-in logic.
>
>hpage_collapse_scan_pmd()
> `- collapse_huge_page()
> `- __collapse_huge_page_swapin() -> fails!
>
>As David suggested[1], this patch skips any such non-swap entries
>early. If any one is found, the scan is aborted immediately with the
>SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>work.
>
>[1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>
>Suggested-by: David Hildenbrand <david@redhat.com>
>Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>Signed-off-by: Lance Yang <lance.yang@linux.dev>
>---
>v1 -> v2:
> - Skip all non-present entries except swap entries (per David) thanks!
> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>
> mm/khugepaged.c | 32 ++++++++++++++++++--------------
> 1 file changed, 18 insertions(+), 14 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 7ab2d1a42df3..d0957648db19 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, addr += PAGE_SIZE) {
> pte_t pteval = ptep_get(_pte);
>- if (is_swap_pte(pteval)) {
It looks like is_swap_pte() is misleading?
>+ if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>+ ++none_or_zero;
>+ if (!userfaultfd_armed(vma) &&
>+ (!cc->is_khugepaged ||
>+ none_or_zero <= khugepaged_max_ptes_none)) {
>+ continue;
>+ } else {
>+ result = SCAN_EXCEED_NONE_PTE;
>+ count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>+ goto out_unmap;
>+ }
>+ } else if (!pte_present(pteval)) {
>+ if (non_swap_entry(pte_to_swp_entry(pteval))) {
>+ result = SCAN_PTE_NON_PRESENT;
>+ goto out_unmap;
>+ }
>+
> ++unmapped;
> if (!cc->is_khugepaged ||
> unmapped <= khugepaged_max_ptes_swap) {
>@@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> * enabled swap entries. Please see
> * comment below for pte_uffd_wp().
> */
>- if (pte_swp_uffd_wp_any(pteval)) {
>+ if (pte_swp_uffd_wp(pteval)) {
I am not sure why we want to change this. There is no description in the
change log.
Would you mind giving some hint on this?
> result = SCAN_PTE_UFFD_WP;
> goto out_unmap;
> }
>@@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> goto out_unmap;
> }
> }
>- if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>- ++none_or_zero;
>- if (!userfaultfd_armed(vma) &&
>- (!cc->is_khugepaged ||
>- none_or_zero <= khugepaged_max_ptes_none)) {
>- continue;
>- } else {
>- result = SCAN_EXCEED_NONE_PTE;
>- count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>- goto out_unmap;
>- }
>- }
> if (pte_uffd_wp(pteval)) {
> /*
> * Don't collapse the page if any of the small
>--
>2.49.0
--
Wei Yang
Help you, Help me
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 8:54 ` Wei Yang
@ 2025-10-01 9:15 ` David Hildenbrand
2025-10-01 10:05 ` Lance Yang
1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2025-10-01 9:15 UTC (permalink / raw)
To: Wei Yang, Lance Yang
Cc: akpm, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy
On 01.10.25 10:54, Wei Yang wrote:
> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> failures deep in the swap-in logic.
>>
>> hpage_collapse_scan_pmd()
>> `- collapse_huge_page()
>> `- __collapse_huge_page_swapin() -> fails!
>>
>> As David suggested[1], this patch skips any such non-swap entries
>> early. If any one is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work.
>>
>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> v1 -> v2:
>> - Skip all non-present entries except swap entries (per David) thanks!
>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>>
>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7ab2d1a42df3..d0957648db19 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>> _pte++, addr += PAGE_SIZE) {
>> pte_t pteval = ptep_get(_pte);
>> - if (is_swap_pte(pteval)) {
>
> It looks like is_swap_pte() is misleading?
Yeah, it nowadays includes non-swap entries as well.
Maybe we should consider renaming it at some point to something better.
--
Cheers
David / dhildenb
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 8:31 ` David Hildenbrand
@ 2025-10-01 9:38 ` Lance Yang
0 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2025-10-01 9:38 UTC (permalink / raw)
To: David Hildenbrand
Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
npache, ryan.roberts, ziy, richard.weiyang, akpm
On 2025/10/1 16:31, David Hildenbrand wrote:
> On 01.10.25 05:22, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> failures deep in the swap-in logic.
>>
>> hpage_collapse_scan_pmd()
>> `- collapse_huge_page()
>> `- __collapse_huge_page_swapin() -> fails!
>>
>> As David suggested[1], this patch skips any such non-swap entries
>> early. If any one is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work.
>>
>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-
>> a7c8-1ba64fd6df69@redhat.com
>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-
>> dcd55219c8bd@lucifer.local
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>
> Could have mentioned you adjust the flow of code to resemble what we
> have in __collapse_huge_page_isolate().
Thanks for pointing that out! Let's do that separately. I'm working
on a follow-up patch that will unify the scanning with the
almost-duplicated code as you suggested ;)
>
> Acked-by: David Hildenbrand <david@redhat.com>
Cheers!
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 8:54 ` Wei Yang
2025-10-01 9:15 ` David Hildenbrand
@ 2025-10-01 10:05 ` Lance Yang
2025-10-01 13:52 ` Wei Yang
2025-10-05 1:05 ` Wei Yang
1 sibling, 2 replies; 16+ messages in thread
From: Lance Yang @ 2025-10-01 10:05 UTC (permalink / raw)
To: Wei Yang
Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy
On 2025/10/1 16:54, Wei Yang wrote:
> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> failures deep in the swap-in logic.
>>
>> hpage_collapse_scan_pmd()
>> `- collapse_huge_page()
>> `- __collapse_huge_page_swapin() -> fails!
>>
>> As David suggested[1], this patch skips any such non-swap entries
>> early. If any one is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work.
>>
>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> v1 -> v2:
>> - Skip all non-present entries except swap entries (per David) thanks!
>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>>
>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7ab2d1a42df3..d0957648db19 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>> _pte++, addr += PAGE_SIZE) {
>> pte_t pteval = ptep_get(_pte);
>> - if (is_swap_pte(pteval)) {
>
> It looks like is_swap_pte() is misleading?
Hmm.. not to me, IMO. is_swap_pte() just means:
!pte_none(pte) && !pte_present(pte)
>
>> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> + ++none_or_zero;
>> + if (!userfaultfd_armed(vma) &&
>> + (!cc->is_khugepaged ||
>> + none_or_zero <= khugepaged_max_ptes_none)) {
>> + continue;
>> + } else {
>> + result = SCAN_EXCEED_NONE_PTE;
>> + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> + goto out_unmap;
>> + }
>> + } else if (!pte_present(pteval)) {
>> + if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> + result = SCAN_PTE_NON_PRESENT;
>> + goto out_unmap;
>> + }
>> +
>> ++unmapped;
>> if (!cc->is_khugepaged ||
>> unmapped <= khugepaged_max_ptes_swap) {
>> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> * enabled swap entries. Please see
>> * comment below for pte_uffd_wp().
>> */
>> - if (pte_swp_uffd_wp_any(pteval)) {
>> + if (pte_swp_uffd_wp(pteval)) {
>
> I am not sure why we want to change this. There is no description in the
> change log.
>
> Would you mind giving some hint on this?
The reason is that pte_swp_uffd_wp_any(pte) is broader than what
we need :)
static inline bool pte_swp_uffd_wp_any(pte_t pte)
{
#ifdef CONFIG_PTE_MARKER_UFFD_WP
	if (!is_swap_pte(pte))
		return false;

	if (pte_swp_uffd_wp(pte))
		return true;

	if (pte_marker_uffd_wp(pte))
		return true;
#endif
	return false;
}
In the context of hpage_collapse_scan_pmd(), we are already inside
an is_swap_pte() block, and we have just handled all non-swap entries
(which would include pte_marker_uffd_wp()).
So we only need to check if the swap entry itself is write-protected
for userfaultfd ;)
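For illustration, the distinction can be modeled in plain userspace C (a hypothetical model: the struct fields and model_* helpers are illustrative only, not the kernel's PTE bit layout):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for a PTE; real checks operate on pte_t bits. */
struct model_pte {
	bool present;
	bool none;
	bool swp_uffd_wp;	/* uffd-wp bit carried in a swap entry */
	bool marker_uffd_wp;	/* uffd-wp PTE marker (a non-swap entry) */
};

static bool model_is_swap_pte(struct model_pte p)
{
	return !p.none && !p.present;
}

/* Broader check: also fires for uffd-wp PTE markers. */
static bool model_pte_swp_uffd_wp_any(struct model_pte p)
{
	return model_is_swap_pte(p) &&
	       (p.swp_uffd_wp || p.marker_uffd_wp);
}
```

Once non-swap entries (including markers) have been rejected earlier in the branch, only the swp_uffd_wp bit can still be set, so the narrower pte_swp_uffd_wp()-style check suffices.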
Hope that explains it. I skipped it in the changelog as it's a tiny
cleanup ...
Thanks,
Lance
>
>> result = SCAN_PTE_UFFD_WP;
>> goto out_unmap;
>> }
>> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> goto out_unmap;
>> }
>> }
>> - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> - ++none_or_zero;
>> - if (!userfaultfd_armed(vma) &&
>> - (!cc->is_khugepaged ||
>> - none_or_zero <= khugepaged_max_ptes_none)) {
>> - continue;
>> - } else {
>> - result = SCAN_EXCEED_NONE_PTE;
>> - count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> - goto out_unmap;
>> - }
>> - }
>> if (pte_uffd_wp(pteval)) {
>> /*
>> * Don't collapse the page if any of the small
>> --
>> 2.49.0
>
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 3:22 [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-10-01 8:31 ` David Hildenbrand
2025-10-01 8:54 ` Wei Yang
@ 2025-10-01 10:20 ` Dev Jain
2025-10-01 10:48 ` Lance Yang
2 siblings, 1 reply; 16+ messages in thread
From: Dev Jain @ 2025-10-01 10:20 UTC (permalink / raw)
To: Lance Yang, akpm
Cc: david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, hughd,
ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
ryan.roberts, ziy, richard.weiyang
On 01/10/25 8:52 am, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> Currently, special non-swap entries (like migration, hwpoison, or PTE
> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
> failures deep in the swap-in logic.
>
> hpage_collapse_scan_pmd()
> `- collapse_huge_page()
> `- __collapse_huge_page_swapin() -> fails!
>
> As David suggested[1], this patch skips any such non-swap entries
> early. If any one is found, the scan is aborted immediately with the
> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
> work.
>
> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> v1 -> v2:
> - Skip all non-present entries except swap entries (per David) thanks!
> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>
> mm/khugepaged.c | 32 ++++++++++++++++++--------------
> 1 file changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7ab2d1a42df3..d0957648db19 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, addr += PAGE_SIZE) {
> pte_t pteval = ptep_get(_pte);
> - if (is_swap_pte(pteval)) {
> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> + ++none_or_zero;
> + if (!userfaultfd_armed(vma) &&
> + (!cc->is_khugepaged ||
> + none_or_zero <= khugepaged_max_ptes_none)) {
> + continue;
> + } else {
> + result = SCAN_EXCEED_NONE_PTE;
> + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> + goto out_unmap;
> + }
> + } else if (!pte_present(pteval)) {
If you are trying to merge this with the _isolate() conditions, we can do
a micro-optimization here - is_swap_pte, (pte_none && is_zero_pfn), and pte_uffd_wp
are disjoint conditions, so we can use if-else-if-else-if to write them.
> + if (non_swap_entry(pte_to_swp_entry(pteval))) {
> + result = SCAN_PTE_NON_PRESENT;
> + goto out_unmap;
> + }
> +
> ++unmapped;
> if (!cc->is_khugepaged ||
> unmapped <= khugepaged_max_ptes_swap) {
> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> * enabled swap entries. Please see
> * comment below for pte_uffd_wp().
> */
> - if (pte_swp_uffd_wp_any(pteval)) {
> + if (pte_swp_uffd_wp(pteval)) {
> result = SCAN_PTE_UFFD_WP;
Could have mentioned in the changelog "while at it, convert pte_swp_uffd_wp_any to
pte_swp_uffd_wp since we are in the swap pte branch".
> goto out_unmap;
> }
> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> goto out_unmap;
> }
> }
> - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> - ++none_or_zero;
> - if (!userfaultfd_armed(vma) &&
> - (!cc->is_khugepaged ||
> - none_or_zero <= khugepaged_max_ptes_none)) {
> - continue;
> - } else {
> - result = SCAN_EXCEED_NONE_PTE;
> - count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> - goto out_unmap;
> - }
> - }
> if (pte_uffd_wp(pteval)) {
> /*
> * Don't collapse the page if any of the small
Otherwise LGTM
Reviewed-by: Dev Jain <dev.jain@arm.com>
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 10:20 ` Dev Jain
@ 2025-10-01 10:48 ` Lance Yang
0 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2025-10-01 10:48 UTC (permalink / raw)
To: Dev Jain
Cc: david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, hughd,
ioworker0, kirill, linux-kernel, linux-mm, mpenttil, npache,
ryan.roberts, ziy, richard.weiyang, akpm
On 2025/10/1 18:20, Dev Jain wrote:
>
> On 01/10/25 8:52 am, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> failures deep in the swap-in logic.
>>
>> hpage_collapse_scan_pmd()
>> `- collapse_huge_page()
>> `- __collapse_huge_page_swapin() -> fails!
>>
>> As David suggested[1], this patch skips any such non-swap entries
>> early. If any one is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work.
>>
>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-
>> a7c8-1ba64fd6df69@redhat.com
>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-
>> dcd55219c8bd@lucifer.local
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> v1 -> v2:
>> - Skip all non-present entries except swap entries (per David) thanks!
>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-
>> lance.yang@linux.dev/
>>
>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7ab2d1a42df3..d0957648db19 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>> _pte++, addr += PAGE_SIZE) {
>> pte_t pteval = ptep_get(_pte);
>> - if (is_swap_pte(pteval)) {
>> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> + ++none_or_zero;
>> + if (!userfaultfd_armed(vma) &&
>> + (!cc->is_khugepaged ||
>> + none_or_zero <= khugepaged_max_ptes_none)) {
>> + continue;
>> + } else {
>> + result = SCAN_EXCEED_NONE_PTE;
>> + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> + goto out_unmap;
>> + }
>> + } else if (!pte_present(pteval)) {
>
> If you are trying to merge this with the _isolate() conditions, we can do
> a micro-optimization here - is_swap_pte, (pte_none && is_zero_pfn), and
> pte_uffd_wp
> are disjoint conditions, so we can use if-else-if-else-if to write them.
Ah, indeed, thanks!
I think it would fit better into the follow-up patch that unifies the
scanning logic, and I'll make sure to include it there ;p
>
>> + if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> + result = SCAN_PTE_NON_PRESENT;
>> + goto out_unmap;
>> + }
>> +
>> ++unmapped;
>> if (!cc->is_khugepaged ||
>> unmapped <= khugepaged_max_ptes_swap) {
>> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>> * enabled swap entries. Please see
>> * comment below for pte_uffd_wp().
>> */
>> - if (pte_swp_uffd_wp_any(pteval)) {
>> + if (pte_swp_uffd_wp(pteval)) {
>> result = SCAN_PTE_UFFD_WP;
>
> Could have mentioned in the changelog "while at it, convert
> pte_swp_uffd_wp_any to
> pte_swp_uffd_wp since we are in the swap pte branch".
Right, that would have been clearer.
I'll add that if a next version is needed :)
>
>> goto out_unmap;
>> }
>> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>> goto out_unmap;
>> }
>> }
>> - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> - ++none_or_zero;
>> - if (!userfaultfd_armed(vma) &&
>> - (!cc->is_khugepaged ||
>> - none_or_zero <= khugepaged_max_ptes_none)) {
>> - continue;
>> - } else {
>> - result = SCAN_EXCEED_NONE_PTE;
>> - count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> - goto out_unmap;
>> - }
>> - }
>> if (pte_uffd_wp(pteval)) {
>> /*
>> * Don't collapse the page if any of the small
>
> Otherwise LGTM
>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
Cheers!
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 10:05 ` Lance Yang
@ 2025-10-01 13:52 ` Wei Yang
2025-10-05 1:05 ` Wei Yang
1 sibling, 0 replies; 16+ messages in thread
From: Wei Yang @ 2025-10-01 13:52 UTC (permalink / raw)
To: Lance Yang
Cc: Wei Yang, akpm, david, lorenzo.stoakes, Liam.Howlett, baohua,
baolin.wang, dev.jain, hughd, ioworker0, kirill, linux-kernel,
linux-mm, mpenttil, npache, ryan.roberts, ziy
On Wed, Oct 01, 2025 at 06:05:57PM +0800, Lance Yang wrote:
>
>
>On 2025/10/1 16:54, Wei Yang wrote:
>> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>> > From: Lance Yang <lance.yang@linux.dev>
>> >
>> > Currently, special non-swap entries (like migration, hwpoison, or PTE
>> > markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> > failures deep in the swap-in logic.
>> >
>> > hpage_collapse_scan_pmd()
>> > `- collapse_huge_page()
>> > `- __collapse_huge_page_swapin() -> fails!
>> >
>> > As David suggested[1], this patch skips any such non-swap entries
>> > early. If any one is found, the scan is aborted immediately with the
>> > SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> > work.
>> >
>> > [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>> > [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>> >
>> > Suggested-by: David Hildenbrand <david@redhat.com>
>> > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> > ---
>> > v1 -> v2:
>> > - Skip all non-present entries except swap entries (per David) thanks!
>> > - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>> >
>> > mm/khugepaged.c | 32 ++++++++++++++++++--------------
>> > 1 file changed, 18 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> > index 7ab2d1a42df3..d0957648db19 100644
>> > --- a/mm/khugepaged.c
>> > +++ b/mm/khugepaged.c
>> > @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> > for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>> > _pte++, addr += PAGE_SIZE) {
>> > pte_t pteval = ptep_get(_pte);
>> > - if (is_swap_pte(pteval)) {
>>
>> It looks like is_swap_pte() is misleading?
>
>Hmm.. not to me, IMO. is_swap_pte() just means:
>
>!pte_none(pte) && !pte_present(pte)
>
>>
>> > + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> > + ++none_or_zero;
>> > + if (!userfaultfd_armed(vma) &&
>> > + (!cc->is_khugepaged ||
>> > + none_or_zero <= khugepaged_max_ptes_none)) {
>> > + continue;
>> > + } else {
>> > + result = SCAN_EXCEED_NONE_PTE;
>> > + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> > + goto out_unmap;
>> > + }
>> > + } else if (!pte_present(pteval)) {
>> > + if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> > + result = SCAN_PTE_NON_PRESENT;
>> > + goto out_unmap;
>> > + }
>> > +
>> > ++unmapped;
>> > if (!cc->is_khugepaged ||
>> > unmapped <= khugepaged_max_ptes_swap) {
>> > @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> > * enabled swap entries. Please see
>> > * comment below for pte_uffd_wp().
>> > */
>> > - if (pte_swp_uffd_wp_any(pteval)) {
>> > + if (pte_swp_uffd_wp(pteval)) {
>>
>> I am not sure why we want to change this. There is no description in the
>> change log.
>>
>> Would you mind giving some hint on this?
>
>The reason is that pte_swp_uffd_wp_any(pte) is broader than what
>we need :)
>
>static inline bool pte_swp_uffd_wp_any(pte_t pte)
>{
>#ifdef CONFIG_PTE_MARKER_UFFD_WP
> if (!is_swap_pte(pte))
> return false;
>
> if (pte_swp_uffd_wp(pte))
> return true;
>
> if (pte_marker_uffd_wp(pte))
> return true;
>#endif
> return false;
>}
>
>In the context within hpage_collapse_scan_pmd(), we are already inside
>an is_swap_pte() block, and we have just handled all non-swap entries
>(which would include pte_marker_uffd_wp()).
>
>So we only need to check if the swap entry itself is write-protected
>for userfaultfd ;)
>
>Hope that explains it. I skipped it in the changelog as it's a tiny
>cleanup ...
Thanks, I got it.
Generally, looks good to me.
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-01 10:05 ` Lance Yang
2025-10-01 13:52 ` Wei Yang
@ 2025-10-05 1:05 ` Wei Yang
2025-10-05 2:12 ` Lance Yang
1 sibling, 1 reply; 16+ messages in thread
From: Wei Yang @ 2025-10-05 1:05 UTC (permalink / raw)
To: Lance Yang
Cc: Wei Yang, akpm, david, lorenzo.stoakes, Liam.Howlett, baohua,
baolin.wang, dev.jain, hughd, ioworker0, kirill, linux-kernel,
linux-mm, mpenttil, npache, ryan.roberts, ziy
On Wed, Oct 01, 2025 at 06:05:57PM +0800, Lance Yang wrote:
>
>
>On 2025/10/1 16:54, Wei Yang wrote:
>> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>> > From: Lance Yang <lance.yang@linux.dev>
>> >
>> > Currently, special non-swap entries (like migration, hwpoison, or PTE
>> > markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> > failures deep in the swap-in logic.
>> >
>> > hpage_collapse_scan_pmd()
>> > `- collapse_huge_page()
>> > `- __collapse_huge_page_swapin() -> fails!
>> >
>> > As David suggested[1], this patch skips any such non-swap entries
>> > early. If any one is found, the scan is aborted immediately with the
>> > SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> > work.
>> >
>> > [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>> > [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>> >
>> > Suggested-by: David Hildenbrand <david@redhat.com>
>> > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> > ---
>> > v1 -> v2:
>> > - Skip all non-present entries except swap entries (per David) thanks!
>> > - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>> >
>> > mm/khugepaged.c | 32 ++++++++++++++++++--------------
>> > 1 file changed, 18 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> > index 7ab2d1a42df3..d0957648db19 100644
>> > --- a/mm/khugepaged.c
>> > +++ b/mm/khugepaged.c
>> > @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> > for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>> > _pte++, addr += PAGE_SIZE) {
>> > pte_t pteval = ptep_get(_pte);
>> > - if (is_swap_pte(pteval)) {
>>
>> It looks is_swap_pte() is mis-leading?
>
>Hmm.. not to me, IMO. is_swap_pte() just means:
>
>!pte_none(pte) && !pte_present(pte)
>
Perhaps there is a reason for it.
I took another look at __collapse_huge_page_swapin(), which only checks
is_swap_pte() before calling do_swap_page().
We have filtered out non-swap entries in hpage_collapse_scan_pmd(), but we
drop the mmap lock before isolation, so there is a window in which a
non-swap entry could still appear.
Do you think it is reasonable to add a non_swap_entry() check before
do_swap_page()?
--
Wei Yang
Help you, Help me
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-05 1:05 ` Wei Yang
@ 2025-10-05 2:12 ` Lance Yang
2025-10-06 14:18 ` David Hildenbrand
0 siblings, 1 reply; 16+ messages in thread
From: Lance Yang @ 2025-10-05 2:12 UTC (permalink / raw)
To: Wei Yang
Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy
On 2025/10/5 09:05, Wei Yang wrote:
> On Wed, Oct 01, 2025 at 06:05:57PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/10/1 16:54, Wei Yang wrote:
>>> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>>>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>>>> failures deep in the swap-in logic.
>>>>
>>>> hpage_collapse_scan_pmd()
>>>> `- collapse_huge_page()
>>>> `- __collapse_huge_page_swapin() -> fails!
>>>>
>>>> As David suggested[1], this patch skips any such non-swap entries
>>>> early. If any one is found, the scan is aborted immediately with the
>>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>>> work.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>>>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>>>>
>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>> ---
>>>> v1 -> v2:
>>>> - Skip all non-present entries except swap entries (per David) thanks!
>>>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>>>>
>>>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>>>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 7ab2d1a42df3..d0957648db19 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>> _pte++, addr += PAGE_SIZE) {
>>>> pte_t pteval = ptep_get(_pte);
>>>> - if (is_swap_pte(pteval)) {
>>>
>>> It looks is_swap_pte() is mis-leading?
>>
>> Hmm.. not to me, IMO. is_swap_pte() just means:
>>
>> !pte_none(pte) && !pte_present(pte)
>>
>
> Maybe it has some reason.
>
> I took another look into __collapse_huge_page_swapin(), which just check
> is_swap_pte() before do_swap_page().
>
> We have filtered non-swap entries in hpage_collapse_scan_pmd(), but we drop
> mmap lock before isolation. This looks we may have a chance to get non-swap
> entry.
Thanks for pointing that out!
Yep, there is a theoretical window between dropping the mmap lock
after the initial scan and re-acquiring it for isolation.
>
> Do you think it is reasonable to add a non_swap_entry() check before
> do_swap_page()?
However, that seems unlikely in practice. IMHO, the early check in
hpage_collapse_scan_pmd() is sufficient for now, so I'd prefer to
keep it as-is :)
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-05 2:12 ` Lance Yang
@ 2025-10-06 14:18 ` David Hildenbrand
2025-10-06 15:02 ` Lance Yang
2025-10-07 10:25 ` Lance Yang
0 siblings, 2 replies; 16+ messages in thread
From: David Hildenbrand @ 2025-10-06 14:18 UTC (permalink / raw)
To: Lance Yang, Wei Yang
Cc: akpm, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy
On 05.10.25 04:12, Lance Yang wrote:
>
>
> On 2025/10/5 09:05, Wei Yang wrote:
>> On Wed, Oct 01, 2025 at 06:05:57PM +0800, Lance Yang wrote:
>>>
>>>
>>> On 2025/10/1 16:54, Wei Yang wrote:
>>>> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>
>>>>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>>>>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>>>>> failures deep in the swap-in logic.
>>>>>
>>>>> hpage_collapse_scan_pmd()
>>>>> `- collapse_huge_page()
>>>>> `- __collapse_huge_page_swapin() -> fails!
>>>>>
>>>>> As David suggested[1], this patch skips any such non-swap entries
>>>>> early. If any one is found, the scan is aborted immediately with the
>>>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>>>> work.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-a7c8-1ba64fd6df69@redhat.com
>>>>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>>>>>
>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>>> ---
>>>>> v1 -> v2:
>>>>> - Skip all non-present entries except swap entries (per David) thanks!
>>>>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-lance.yang@linux.dev/
>>>>>
>>>>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>>>>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>> index 7ab2d1a42df3..d0957648db19 100644
>>>>> --- a/mm/khugepaged.c
>>>>> +++ b/mm/khugepaged.c
>>>>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>>>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>> _pte++, addr += PAGE_SIZE) {
>>>>> pte_t pteval = ptep_get(_pte);
>>>>> - if (is_swap_pte(pteval)) {
>>>>
>>>> It looks is_swap_pte() is mis-leading?
>>>
>>> Hmm.. not to me, IMO. is_swap_pte() just means:
>>>
>>> !pte_none(pte) && !pte_present(pte)
>>>
>>
>> Maybe it has some reason.
>>
>> I took another look into __collapse_huge_page_swapin(), which just check
>> is_swap_pte() before do_swap_page().
Thanks for pointing that out.
A function that is called __collapse_huge_page_swapin() and documented
to "Bring missing pages in from swap" will handle other types as well.
Unbelievably horrible.
So let's think this through so we can document it in the changelog properly.
We could have currently ended up in do_swap_page() with
(1) Migration entries. We would have waited.
-> Maybe worth it to wait, maybe not. I suspect we don't stumble into
this frequently enough to care. We could always unlock this
separately later.
(2) Device-exclusive entries. We would have converted to non-exclusive.
-> See make_device_exclusive(), we cannot tolerate PMD entries and have
to split them through FOLL_SPLIT_PMD. As popped up during a recent
discussion, collapsing here is actually counter-productive, because
the next conversion will PTE-map it again. (until recently, it would
not have worked with large folios at all IIRC).
-> Ok to not collapse.
(3) Device-private entries. We would have migrated to RAM.
-> Device-private still does not support THPs, so collapsing right now
just means that the next device access would split the folio again.
-> Ok to not collapse.
(4) HWPoison entries
-> Cannot collapse
(5) Markers
-> Cannot collapse
I suggest we add that in some form to the patch description, stating
that we can unlock later what we really need, and not account it towards
max_swap_ptes.
>>
>> We have filtered non-swap entries in hpage_collapse_scan_pmd(), but we drop
>> mmap lock before isolation. This looks we may have a chance to get non-swap
>> entry.
>
> Thanks for pointing that out!
>
> Yep, there is a theoretical window between dropping the mmap lock
> after the initial scan and re-acquiring it for isolation.
>
>>
>> Do you think it is reasonable to add a non_swap_entry() check before
>> do_swap_page()?
>
> However, that seems unlikely in practice. IMHO, the early check in
> hpage_collapse_scan_pmd() is sufficient for now, so I'd prefer to
> keep it as-is :)
I think we really should add that check, as per reasoning above.
I was looking into some possible races with uffd-wp being set before we
enter do_swap_page(), but I think it might be okay (although very
confusing).
--
Cheers
David / dhildenb
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-06 14:18 ` David Hildenbrand
@ 2025-10-06 15:02 ` Lance Yang
2025-10-07 10:25 ` Lance Yang
1 sibling, 0 replies; 16+ messages in thread
From: Lance Yang @ 2025-10-06 15:02 UTC (permalink / raw)
To: David Hildenbrand, Wei Yang
Cc: lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang, dev.jain,
hughd, ioworker0, kirill, linux-kernel, linux-mm, mpenttil,
npache, ryan.roberts, ziy, akpm
On 2025/10/6 22:18, David Hildenbrand wrote:
> On 05.10.25 04:12, Lance Yang wrote:
>>
>>
>> On 2025/10/5 09:05, Wei Yang wrote:
>>> On Wed, Oct 01, 2025 at 06:05:57PM +0800, Lance Yang wrote:
>>>>
>>>>
>>>> On 2025/10/1 16:54, Wei Yang wrote:
>>>>> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>>
>>>>>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>>>>>> markers) are not caught early in hpage_collapse_scan_pmd(),
>>>>>> leading to
>>>>>> failures deep in the swap-in logic.
>>>>>>
>>>>>> hpage_collapse_scan_pmd()
>>>>>> `- collapse_huge_page()
>>>>>> `- __collapse_huge_page_swapin() -> fails!
>>>>>>
>>>>>> As David suggested[1], this patch skips any such non-swap entries
>>>>>> early. If any one is found, the scan is aborted immediately with the
>>>>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>>>>> work.
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-
>>>>>> a7c8-1ba64fd6df69@redhat.com
>>>>>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-
>>>>>> dcd55219c8bd@lucifer.local
>>>>>>
>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>>>> ---
>>>>>> v1 -> v2:
>>>>>> - Skip all non-present entries except swap entries (per David)
>>>>>> thanks!
>>>>>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-
>>>>>> lance.yang@linux.dev/
>>>>>>
>>>>>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>>>>>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>> index 7ab2d1a42df3..d0957648db19 100644
>>>>>> --- a/mm/khugepaged.c
>>>>>> +++ b/mm/khugepaged.c
>>>>>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct
>>>>>> mm_struct *mm,
>>>>>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>>> _pte++, addr += PAGE_SIZE) {
>>>>>> pte_t pteval = ptep_get(_pte);
>>>>>> - if (is_swap_pte(pteval)) {
>>>>>
>>>>> It looks is_swap_pte() is mis-leading?
>>>>
>>>> Hmm.. not to me, IMO. is_swap_pte() just means:
>>>>
>>>> !pte_none(pte) && !pte_present(pte)
>>>>
>>>
>>> Maybe it has some reason.
>>>
>>> I took another look into __collapse_huge_page_swapin(), which just check
>>> is_swap_pte() before do_swap_page().
>
> Thanks for pointing that out.
>
> A function that is called __collapse_huge_page_swapin() and documented
> to "Bring missing pages in from swap" will handle other types as well.
>
> Unbelievable horrible.
>
> So let's think this through so we can document it in the changelog
> properly.
>
> We could have currently ended up in do_swap_page() with
>
> (1) Migration entries. We would have waited.
>
> -> Maybe worth it to wait, maybe not. I suspect we don't stumble into
> that frequently such that we don't care. We could always unlock this
> separately later.
>
>
> (2) Device-exclusive entries. We would have converted to non-exclusive.
>
> -> See make_device_exclusive(), we cannot tolerate PMD entries and have
> to split them through FOLL_SPLIT_PMD. As popped up during a recent
> discussion, collapsing here is actually counter-productive, because
> the next conversion will PTE-map it again. (until recently, it would
> not have worked with large folios at all IIRC).
>
> -> Ok to not collapse.
>
> (3) Device-private entries. We would have migrated to RAM.
>
> -> Device-private still does not support THPs, so collapsing right now
> just means that the next device access would split the folio again.
>
> -> Ok to not collapse.
>
> (4) HWPoison entries
>
> -> Cannot collapse
>
> (5) Markers
>
> -> Cannot collapse
>
>
> I suggest we add that in some form to the patch description, stating
> that we can unlock later what we really need, and not account it towards
> max_swap_ptes.
Cool!
I'll take a closer look and adjust the patch description accordingly ;)
Thanks a lot for the lesson!
>
>>>
>>> We have filtered non-swap entries in hpage_collapse_scan_pmd(), but
>>> we drop
>>> mmap lock before isolation. This looks we may have a chance to get
>>> non-swap
>>> entry.
>>
>> Thanks for pointing that out!
>>
>> Yep, there is a theoretical window between dropping the mmap lock
>> after the initial scan and re-acquiring it for isolation.
>>
>>>
>>> Do you think it is reasonable to add a non_swap_entry() check before
>>> do_swap_page()?
>>
>> However, that seems unlikely in practice. IMHO, the early check in
>> hpage_collapse_scan_pmd() is sufficient for now, so I'd prefer to
>> keep it as-is :)
>
> I think we really should add that check, as per reasoning above.
>
> I was looking into some possible races with uffd-wp being set before we
> enter do_swap_page(), but I think it might be okay (although very
> confusing).
Ah, I see ;p
@Wei could you send a patch to add the non_swap_entry() check there?
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-06 14:18 ` David Hildenbrand
2025-10-06 15:02 ` Lance Yang
@ 2025-10-07 10:25 ` Lance Yang
2025-10-08 1:37 ` Wei Yang
2025-10-08 9:00 ` David Hildenbrand
1 sibling, 2 replies; 16+ messages in thread
From: Lance Yang @ 2025-10-07 10:25 UTC (permalink / raw)
To: David Hildenbrand, Wei Yang
Cc: akpm, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy
On 2025/10/6 22:18, David Hildenbrand wrote:
> On 05.10.25 04:12, Lance Yang wrote:
>>
>>
>> On 2025/10/5 09:05, Wei Yang wrote:
>>> On Wed, Oct 01, 2025 at 06:05:57PM +0800, Lance Yang wrote:
>>>>
>>>>
>>>> On 2025/10/1 16:54, Wei Yang wrote:
>>>>> On Wed, Oct 01, 2025 at 11:22:51AM +0800, Lance Yang wrote:
>>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>>
>>>>>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>>>>>> markers) are not caught early in hpage_collapse_scan_pmd(),
>>>>>> leading to
>>>>>> failures deep in the swap-in logic.
>>>>>>
>>>>>> hpage_collapse_scan_pmd()
>>>>>> `- collapse_huge_page()
>>>>>> `- __collapse_huge_page_swapin() -> fails!
>>>>>>
>>>>>> As David suggested[1], this patch skips any such non-swap entries
>>>>>> early. If any one is found, the scan is aborted immediately with the
>>>>>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>>>>>> work.
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb-
>>>>>> a7c8-1ba64fd6df69@redhat.com
>>>>>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-
>>>>>> dcd55219c8bd@lucifer.local
>>>>>>
>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>>>> ---
>>>>>> v1 -> v2:
>>>>>> - Skip all non-present entries except swap entries (per David)
>>>>>> thanks!
>>>>>> - https://lore.kernel.org/linux-mm/20250924100207.28332-1-
>>>>>> lance.yang@linux.dev/
>>>>>>
>>>>>> mm/khugepaged.c | 32 ++++++++++++++++++--------------
>>>>>> 1 file changed, 18 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>> index 7ab2d1a42df3..d0957648db19 100644
>>>>>> --- a/mm/khugepaged.c
>>>>>> +++ b/mm/khugepaged.c
>>>>>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct
>>>>>> mm_struct *mm,
>>>>>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>>>>> _pte++, addr += PAGE_SIZE) {
>>>>>> pte_t pteval = ptep_get(_pte);
>>>>>> - if (is_swap_pte(pteval)) {
>>>>>
>>>>> It looks is_swap_pte() is mis-leading?
>>>>
>>>> Hmm.. not to me, IMO. is_swap_pte() just means:
>>>>
>>>> !pte_none(pte) && !pte_present(pte)
>>>>
>>>
>>> Maybe it has some reason.
>>>
>>> I took another look into __collapse_huge_page_swapin(), which just check
>>> is_swap_pte() before do_swap_page().
>
> Thanks for pointing that out.
>
> A function that is called __collapse_huge_page_swapin() and documented
> to "Bring missing pages in from swap" will handle other types as well.
>
> Unbelievable horrible.
>
> So let's think this through so we can document it in the changelog
> properly.
>
> We could have currently ended up in do_swap_page() with
>
> (1) Migration entries. We would have waited.
>
> -> Maybe worth it to wait, maybe not. I suspect we don't stumble into
> that frequently such that we don't care. We could always unlock this
> separately later.
>
>
> (2) Device-exclusive entries. We would have converted to non-exclusive.
>
> -> See make_device_exclusive(), we cannot tolerate PMD entries and have
> to split them through FOLL_SPLIT_PMD. As popped up during a recent
> discussion, collapsing here is actually counter-productive, because
> the next conversion will PTE-map it again. (until recently, it would
> not have worked with large folios at all IIRC).
>
> -> Ok to not collapse.
>
> (3) Device-private entries. We would have migrated to RAM.
>
> -> Device-private still does not support THPs, so collapsing right now
> just means that the next device access would split the folio again.
>
> -> Ok to not collapse.
>
> (4) HWPoison entries
>
> -> Cannot collapse
>
> (5) Markers
>
> -> Cannot collapse
>
>
> I suggest we add that in some form to the patch description, stating
> that we can unlock later what we really need, and not account it towards
> max_swap_ptes.
>
>>>
>>> We have filtered non-swap entries in hpage_collapse_scan_pmd(), but
>>> we drop
>>> mmap lock before isolation. This looks we may have a chance to get
>>> non-swap
>>> entry.
>>
>> Thanks for pointing that out!
>>
>> Yep, there is a theoretical window between dropping the mmap lock
>> after the initial scan and re-acquiring it for isolation.
>>
>>>
>>> Do you think it is reasonable to add a non_swap_entry() check before
>>> do_swap_page()?
>>
>> However, that seems unlikely in practice. IMHO, the early check in
>> hpage_collapse_scan_pmd() is sufficient for now, so I'd prefer to
>> keep it as-is :)
>
> I think we really should add that check, as per reasoning above.
>
> I was looking into some possible races with uffd-wp being set before we
> enter do_swap_page(), but I think it might be okay (although very
> confusing).
How about the version below?
```
Currently, special non-swap entries (like PTE markers) are not caught
early in hpage_collapse_scan_pmd(), leading to failures deep in the
swap-in logic.
However, __collapse_huge_page_swapin(), which is documented to "Bring
missing pages in from swap", currently handles other entry types as well.
As analyzed by David[1], we could have ended up with the following
entry types right before do_swap_page():
(1) Migration entries. We would have waited.
 -> Maybe worth it to wait, maybe not. We suspect we don't stumble
 into this frequently enough to care. We could always
unlock this separately later.
(2) Device-exclusive entries. We would have converted to non-exclusive.
-> See make_device_exclusive(), we cannot tolerate PMD entries and
have to split them through FOLL_SPLIT_PMD. As popped up during
a recent discussion, collapsing here is actually
counter-productive, because the next conversion will PTE-map
it again.
-> Ok to not collapse.
(3) Device-private entries. We would have migrated to RAM.
-> Device-private still does not support THPs, so collapsing right
now just means that the next device access would split the
folio again.
-> Ok to not collapse.
(4) HWPoison entries
-> Cannot collapse
(5) Markers
-> Cannot collapse
First, this patch adds an early check for these non-swap entries. If
any one is found, the scan is aborted immediately with the
SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
work.
Second, as Wei pointed out[3], a non-swap entry may still appear, since
we drop and re-acquire the mmap lock before
__collapse_huge_page_swapin(). To handle this, we also add a
non_swap_entry() check there.
Note that we can unlock later what we really need, and not account it
towards max_swap_ptes.
[1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
[3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
```
I also think it makes sense to fold the change that adds the
non_swap_entry() check in __collapse_huge_page_swapin() into
this patch, rather than creating a new patch just for that :)
Hmmm... one thing I'm not sure about: regarding the uffd-wp
race you mentioned, is the pte_swp_uffd_wp() check needed
after non_swap_entry()? It seems like it might not be ...
```
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f4f57ba69d72..bec3e268dc76 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
if (!is_swap_pte(vmf.orig_pte))
continue;
+ if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
+ result = SCAN_PTE_NON_PRESENT;
+ goto out;
+ }
+
vmf.pte = pte;
vmf.ptl = ptl;
ret = do_swap_page(&vmf);
```
@David does that sound good to you?
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-07 10:25 ` Lance Yang
@ 2025-10-08 1:37 ` Wei Yang
2025-10-08 9:00 ` David Hildenbrand
1 sibling, 0 replies; 16+ messages in thread
From: Wei Yang @ 2025-10-08 1:37 UTC (permalink / raw)
To: Lance Yang
Cc: David Hildenbrand, Wei Yang, akpm, lorenzo.stoakes, Liam.Howlett,
baohua, baolin.wang, dev.jain, hughd, ioworker0, kirill,
linux-kernel, linux-mm, mpenttil, npache, ryan.roberts, ziy
On Tue, Oct 07, 2025 at 06:25:13PM +0800, Lance Yang wrote:
>
>
>On 2025/10/6 22:18, David Hildenbrand wrote:
>> On 05.10.25 04:12, Lance Yang wrote:
[...]
>>
>> I was looking into some possible races with uffd-wp being set before we
>> enter do_swap_page(), but I think it might be okay (although very
>> confusing).
>
>How about the version below?
>
>```
>Currently, special non-swap entries (like PTE markers) are not caught
>early in hpage_collapse_scan_pmd(), leading to failures deep in the
>swap-in logic.
>
>A function that is called __collapse_huge_page_swapin() and documented
>to "Bring missing pages in from swap" will handle other types as well.
>
>As analyzed by David[1], we could have ended up with the following
>entry types right before do_swap_page():
>
> (1) Migration entries. We would have waited.
> -> Maybe worth it to wait, maybe not. We suspect we don't stumble
> into that frequently such that we don't care. We could always
> unlock this separately later.
>
> (2) Device-exclusive entries. We would have converted to non-exclusive.
> -> See make_device_exclusive(), we cannot tolerate PMD entries and
> have to split them through FOLL_SPLIT_PMD. As popped up during
> a recent discussion, collapsing here is actually
> counter-productive, because the next conversion will PTE-map
> it again.
> -> Ok to not collapse.
>
> (3) Device-private entries. We would have migrated to RAM.
> -> Device-private still does not support THPs, so collapsing right
> now just means that the next device access would split the
> folio again.
> -> Ok to not collapse.
>
> (4) HWPoison entries
> -> Cannot collapse
>
> (5) Markers
> -> Cannot collapse
>
>First, this patch adds an early check for these non-swap entries. If
>any one is found, the scan is aborted immediately with the
>SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>work.
>
>Second, as Wei pointed out[3], we may have a chance to get a non-swap
>entry, since we will drop and re-acquire the mmap lock before
>__collapse_huge_page_swapin(). To handle this, we also add a
>non_swap_entry() check there.
>
>Note that we can unlock later what we really need, and not account it
>towards max_swap_ptes.
>
>[1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
>[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
>[3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
>```
>
>I also think it makes sense to fold the change that adds the
>non_swap_entry() check in __collapse_huge_page_swapin() into
>this patch, rather than creating a new patch just for that :)
>
Agree.
>Hmmm... one thing I'm not sure about: regarding the uffd-wp
>race you mentioned, is the pte_swp_uffd_wp() check needed
>after non_swap_entry()? It seems like it might not be ...
>
>```
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index f4f57ba69d72..bec3e268dc76 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1020,6 +1020,11 @@ static int __collapse_huge_page_swapin(struct
>mm_struct *mm,
> if (!is_swap_pte(vmf.orig_pte))
> continue;
>
>+ if (non_swap_entry(pte_to_swp_entry(vmf.orig_pte))) {
>+ result = SCAN_PTE_NON_PRESENT;
>+ goto out;
>+ }
>+
> vmf.pte = pte;
> vmf.ptl = ptl;
> ret = do_swap_page(&vmf);
>```
>
>@David does that sound good to you?
--
Wei Yang
Help you, Help me
* Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
2025-10-07 10:25 ` Lance Yang
2025-10-08 1:37 ` Wei Yang
@ 2025-10-08 9:00 ` David Hildenbrand
1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2025-10-08 9:00 UTC (permalink / raw)
To: Lance Yang, Wei Yang
Cc: akpm, lorenzo.stoakes, Liam.Howlett, baohua, baolin.wang,
dev.jain, hughd, ioworker0, kirill, linux-kernel, linux-mm,
mpenttil, npache, ryan.roberts, ziy
> Second, as Wei pointed out[3], we may have a chance to get a non-swap
> entry, since we will drop and re-acquire the mmap lock before
> __collapse_huge_page_swapin(). To handle this, we also add a
> non_swap_entry() check there.
>
> Note that we can unlock later what we really need, and not account it
> towards max_swap_ptes.
>
> [1]
> https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@redhat.com
> [2]
> https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
> [3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master
> ```
>
I replied to the new version.
> I also think it makes sense to fold the change that adds the
> non_swap_entry() check in __collapse_huge_page_swapin() into
> this patch, rather than creating a new patch just for that :)
>
> Hmmm... one thing I'm not sure about: regarding the uffd-wp
> race you mentioned, is the pte_swp_uffd_wp() check needed
> after non_swap_entry()? It seems like it might not be ...
>
If we check non_swap_entry(), there is no need to check for other
markers, agreed.
--
Cheers
David / dhildenb
end of thread, other threads:[~2025-10-08 9:00 UTC | newest]
Thread overview: 16+ messages
2025-10-01 3:22 [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-10-01 8:31 ` David Hildenbrand
2025-10-01 9:38 ` Lance Yang
2025-10-01 8:54 ` Wei Yang
2025-10-01 9:15 ` David Hildenbrand
2025-10-01 10:05 ` Lance Yang
2025-10-01 13:52 ` Wei Yang
2025-10-05 1:05 ` Wei Yang
2025-10-05 2:12 ` Lance Yang
2025-10-06 14:18 ` David Hildenbrand
2025-10-06 15:02 ` Lance Yang
2025-10-07 10:25 ` Lance Yang
2025-10-08 1:37 ` Wei Yang
2025-10-08 9:00 ` David Hildenbrand
2025-10-01 10:20 ` Dev Jain
2025-10-01 10:48 ` Lance Yang