* [PATCH mm-new 0/2] mm/khugepaged: refactor and merge PTE scanning logic
@ 2025-10-02 7:32 Lance Yang
2025-10-02 7:32 ` [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
2025-10-02 7:32 ` [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
0 siblings, 2 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-02 7:32 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm
Hi all,
This series cleans up the almost-duplicated PTE scanning logic in the
collapse path.
The first one is a preparatory step that refactors both loops to use
a single if-else-if-else-if chain for checking disjoint PTEs.
The second one then extracts the common logic into a shared helper.
Thanks,
Lance
Lance Yang (2):
mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
mm/khugepaged: merge PTE scanning logic into a new helper
mm/khugepaged.c | 171 +++++++++++++++++++++++++++++-------------------
1 file changed, 104 insertions(+), 67 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
2025-10-02 7:32 [PATCH mm-new 0/2] mm/khugepaged: refactor and merge PTE scanning logic Lance Yang
@ 2025-10-02 7:32 ` Lance Yang
2025-10-03 13:21 ` Wei Yang
` (2 more replies)
2025-10-02 7:32 ` [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
1 sibling, 3 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-02 7:32 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
Lance Yang
From: Lance Yang <lance.yang@linux.dev>
As pointed out by Dev, the PTE checks for disjoint conditions in the
scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
and pte_uffd_wp are mutually exclusive.
This patch refactors the loops in both __collapse_huge_page_isolate() and
hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
instead of separate if blocks.
Also, this is a preparatory step to make it easier to merge the
almost-duplicated scanning logic in these two functions, as suggested
by David.
Suggested-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
mm/khugepaged.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f4f57ba69d72..808523f92c7b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -548,8 +548,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, addr += PAGE_SIZE) {
pte_t pteval = ptep_get(_pte);
- if (pte_none(pteval) || (pte_present(pteval) &&
- is_zero_pfn(pte_pfn(pteval)))) {
+ if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
++none_or_zero;
if (!userfaultfd_armed(vma) &&
(!cc->is_khugepaged ||
@@ -560,12 +559,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
goto out;
}
- }
- if (!pte_present(pteval)) {
+ } else if (!pte_present(pteval)) {
result = SCAN_PTE_NON_PRESENT;
goto out;
- }
- if (pte_uffd_wp(pteval)) {
+ } else if (pte_uffd_wp(pteval)) {
result = SCAN_PTE_UFFD_WP;
goto out;
}
@@ -1316,8 +1313,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
goto out_unmap;
}
- }
- if (pte_uffd_wp(pteval)) {
+ } else if (pte_uffd_wp(pteval)) {
/*
* Don't collapse the page if any of the small
* PTEs are armed with uffd write protection.
--
2.49.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-02 7:32 [PATCH mm-new 0/2] mm/khugepaged: refactor and merge PTE scanning logic Lance Yang
2025-10-02 7:32 ` [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
@ 2025-10-02 7:32 ` Lance Yang
2025-10-03 17:05 ` Dev Jain
2025-10-03 17:11 ` Zi Yan
1 sibling, 2 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-02 7:32 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
Lance Yang
From: Lance Yang <lance.yang@linux.dev>
As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
and __collapse_huge_page_isolate() was almost duplicated.
This patch cleans things up by moving all the common PTE checking logic
into a new shared helper, thp_collapse_check_pte().
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
mm/khugepaged.c | 167 ++++++++++++++++++++++++++++++------------------
1 file changed, 104 insertions(+), 63 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 808523f92c7b..2a897cfb1d03 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -61,6 +61,12 @@ enum scan_result {
SCAN_PAGE_FILLED,
};
+enum pte_check_result {
+ PTE_CHECK_SUCCEED,
+ PTE_CHECK_CONTINUE,
+ PTE_CHECK_FAIL,
+};
+
#define CREATE_TRACE_POINTS
#include <trace/events/huge_memory.h>
@@ -533,6 +539,87 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
}
}
+/*
+ * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
+ * @pte: PTE to check
+ * @vma: VMA the PTE belongs to
+ * @cc: Collapse control settings
+ * @scan_swap_pte: Allow scanning of swap PTEs if true
+ * @none_or_zero: Counter for none/zero PTEs (must be non-NULL)
+ * @unmapped: Counter for swap PTEs (must be non-NULL if scan_swap_pte
+ * is true)
+ * @scan_result: Used to return the failure reason (SCAN_*) on a
+ * PTE_CHECK_FAIL return. Must be non-NULL
+ *
+ * Returns:
+ * PTE_CHECK_SUCCEED - Valid PTE, proceed with collapse
+ * PTE_CHECK_CONTINUE - Skip this none/zero PTE but continue scanning
+ * PTE_CHECK_FAIL - Abort collapse scan
+ */
+static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
+ struct collapse_control *cc, bool scan_swap_pte,
+ int *none_or_zero, int *unmapped, int *scan_result)
+{
+ VM_BUG_ON(!none_or_zero || !scan_result);
+ VM_BUG_ON(scan_swap_pte && !unmapped);
+
+ if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
+ (*none_or_zero)++;
+ if (!userfaultfd_armed(vma) &&
+ (!cc->is_khugepaged ||
+ *none_or_zero <= khugepaged_max_ptes_none)) {
+ return PTE_CHECK_CONTINUE;
+ } else {
+ *scan_result = SCAN_EXCEED_NONE_PTE;
+ count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+ return PTE_CHECK_FAIL;
+ }
+ } else if (!pte_present(pte)) {
+ if (!scan_swap_pte) {
+ *scan_result = SCAN_PTE_NON_PRESENT;
+ return PTE_CHECK_FAIL;
+ }
+
+ if (non_swap_entry(pte_to_swp_entry(pte))) {
+ *scan_result = SCAN_PTE_NON_PRESENT;
+ return PTE_CHECK_FAIL;
+ }
+
+ (*unmapped)++;
+ if (!cc->is_khugepaged ||
+ *unmapped <= khugepaged_max_ptes_swap) {
+ /*
+ * Always be strict with uffd-wp
+ * enabled swap entries. Please see
+ * comment below for pte_uffd_wp().
+ */
+ if (pte_swp_uffd_wp(pte)) {
+ *scan_result = SCAN_PTE_UFFD_WP;
+ return PTE_CHECK_FAIL;
+ }
+ return PTE_CHECK_CONTINUE;
+ } else {
+ *scan_result = SCAN_EXCEED_SWAP_PTE;
+ count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
+ return PTE_CHECK_FAIL;
+ }
+ } else if (pte_uffd_wp(pte)) {
+ /*
+ * Don't collapse the page if any of the small
+ * PTEs are armed with uffd write protection.
+ * Here we can also mark the new huge pmd as
+ * write protected if any of the small ones is
+ * marked but that could bring unknown
+ * userfault messages that falls outside of
+ * the registered range. So, just be simple.
+ */
+ *scan_result = SCAN_PTE_UFFD_WP;
+ return PTE_CHECK_FAIL;
+ }
+
+ return PTE_CHECK_SUCCEED;
+}
+
static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
unsigned long start_addr,
pte_t *pte,
@@ -544,28 +631,20 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
unsigned long addr = start_addr;
pte_t *_pte;
int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
+ int pte_check_res;
for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, addr += PAGE_SIZE) {
pte_t pteval = ptep_get(_pte);
- if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
- ++none_or_zero;
- if (!userfaultfd_armed(vma) &&
- (!cc->is_khugepaged ||
- none_or_zero <= khugepaged_max_ptes_none)) {
- continue;
- } else {
- result = SCAN_EXCEED_NONE_PTE;
- count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
- goto out;
- }
- } else if (!pte_present(pteval)) {
- result = SCAN_PTE_NON_PRESENT;
- goto out;
- } else if (pte_uffd_wp(pteval)) {
- result = SCAN_PTE_UFFD_WP;
+ pte_check_res = thp_collapse_check_pte(
+ pteval, vma, cc, false, /* scan_swap_pte = false */
+ &none_or_zero, NULL, &result);
+
+ if (pte_check_res == PTE_CHECK_CONTINUE)
+ continue;
+ else if (pte_check_res == PTE_CHECK_FAIL)
goto out;
- }
+
page = vm_normal_page(vma, addr, pteval);
if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
result = SCAN_PAGE_NULL;
@@ -1260,6 +1339,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
unsigned long addr;
spinlock_t *ptl;
int node = NUMA_NO_NODE, unmapped = 0;
+ int pte_check_res;
VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
@@ -1278,54 +1358,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, addr += PAGE_SIZE) {
pte_t pteval = ptep_get(_pte);
- if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
- ++none_or_zero;
- if (!userfaultfd_armed(vma) &&
- (!cc->is_khugepaged ||
- none_or_zero <= khugepaged_max_ptes_none)) {
- continue;
- } else {
- result = SCAN_EXCEED_NONE_PTE;
- count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
- goto out_unmap;
- }
- } else if (!pte_present(pteval)) {
- if (non_swap_entry(pte_to_swp_entry(pteval))) {
- result = SCAN_PTE_NON_PRESENT;
- goto out_unmap;
- }
- ++unmapped;
- if (!cc->is_khugepaged ||
- unmapped <= khugepaged_max_ptes_swap) {
- /*
- * Always be strict with uffd-wp
- * enabled swap entries. Please see
- * comment below for pte_uffd_wp().
- */
- if (pte_swp_uffd_wp(pteval)) {
- result = SCAN_PTE_UFFD_WP;
- goto out_unmap;
- }
- continue;
- } else {
- result = SCAN_EXCEED_SWAP_PTE;
- count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
- goto out_unmap;
- }
- } else if (pte_uffd_wp(pteval)) {
- /*
- * Don't collapse the page if any of the small
- * PTEs are armed with uffd write protection.
- * Here we can also mark the new huge pmd as
- * write protected if any of the small ones is
- * marked but that could bring unknown
- * userfault messages that falls outside of
- * the registered range. So, just be simple.
- */
- result = SCAN_PTE_UFFD_WP;
+ pte_check_res = thp_collapse_check_pte(
+ pteval, vma, cc, true, /* scan_swap_pte = true */
+ &none_or_zero, &unmapped, &result);
+
+ if (pte_check_res == PTE_CHECK_CONTINUE)
+ continue;
+ else if (pte_check_res == PTE_CHECK_FAIL)
goto out_unmap;
- }
page = vm_normal_page(vma, addr, pteval);
if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
--
2.49.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
2025-10-02 7:32 ` [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
@ 2025-10-03 13:21 ` Wei Yang
2025-10-03 16:33 ` Dev Jain
2025-10-03 17:04 ` Zi Yan
2 siblings, 0 replies; 18+ messages in thread
From: Wei Yang @ 2025-10-03 13:21 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, ioworker0,
richard.weiyang, linux-kernel, linux-mm
On Thu, Oct 02, 2025 at 03:32:54PM +0800, Lance Yang wrote:
>From: Lance Yang <lance.yang@linux.dev>
>
>As pointed out by Dev, the PTE checks for disjoint conditions in the
>scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
>and pte_uffd_wp are mutually exclusive.
>
>This patch refactors the loops in both __collapse_huge_page_isolate() and
>hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
>instead of separate if blocks.
>
>Also, this is a preparatory step to make it easier to merge the
>almost-duplicated scanning logic in these two functions, as suggested
>by David.
>
>Suggested-by: Dev Jain <dev.jain@arm.com>
>Suggested-by: David Hildenbrand <david@redhat.com>
>Signed-off-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
2025-10-02 7:32 ` [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
2025-10-03 13:21 ` Wei Yang
@ 2025-10-03 16:33 ` Dev Jain
2025-10-04 3:08 ` Lance Yang
2025-10-03 17:04 ` Zi Yan
2 siblings, 1 reply; 18+ messages in thread
From: Dev Jain @ 2025-10-03 16:33 UTC (permalink / raw)
To: Lance Yang, akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
ioworker0, richard.weiyang, linux-kernel, linux-mm
On 02/10/25 1:02 pm, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by Dev, the PTE checks for disjoint conditions in the
> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
> and pte_uffd_wp are mutually exclusive.
>
> This patch refactors the loops in both __collapse_huge_page_isolate() and
> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
> instead of separate if blocks.
>
> Also, this is a preparatory step to make it easier to merge the
> almost-duplicated scanning logic in these two functions, as suggested
> by David.
>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> mm/khugepaged.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index f4f57ba69d72..808523f92c7b 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -548,8 +548,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, addr += PAGE_SIZE) {
> pte_t pteval = ptep_get(_pte);
> - if (pte_none(pteval) || (pte_present(pteval) &&
> - is_zero_pfn(pte_pfn(pteval)))) {
> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
Should have mentioned in the description that pte_present() is not required
here, so removing it.
Reviewed-by: Dev Jain <dev.jain@arm.com>
> ++none_or_zero;
> if (!userfaultfd_armed(vma) &&
> (!cc->is_khugepaged ||
> @@ -560,12 +559,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> goto out;
> }
> - }
> - if (!pte_present(pteval)) {
> + } else if (!pte_present(pteval)) {
> result = SCAN_PTE_NON_PRESENT;
> goto out;
> - }
> - if (pte_uffd_wp(pteval)) {
> + } else if (pte_uffd_wp(pteval)) {
> result = SCAN_PTE_UFFD_WP;
> goto out;
> }
> @@ -1316,8 +1313,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> goto out_unmap;
> }
> - }
> - if (pte_uffd_wp(pteval)) {
> + } else if (pte_uffd_wp(pteval)) {
> /*
> * Don't collapse the page if any of the small
> * PTEs are armed with uffd write protection.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
2025-10-02 7:32 ` [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
2025-10-03 13:21 ` Wei Yang
2025-10-03 16:33 ` Dev Jain
@ 2025-10-03 17:04 ` Zi Yan
2 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2025-10-03 17:04 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
linux-kernel, linux-mm
On 2 Oct 2025, at 3:32, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by Dev, the PTE checks for disjoint conditions in the
> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
> and pte_uffd_wp are mutually exclusive.
>
> This patch refactors the loops in both __collapse_huge_page_isolate() and
> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
> instead of separate if blocks.
>
> Also, this is a preparatory step to make it easier to merge the
> almost-duplicated scanning logic in these two functions, as suggested
> by David.
>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> mm/khugepaged.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
>
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-02 7:32 ` [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
@ 2025-10-03 17:05 ` Dev Jain
2025-10-04 3:03 ` Lance Yang
2025-10-04 9:42 ` Wei Yang
2025-10-03 17:11 ` Zi Yan
1 sibling, 2 replies; 18+ messages in thread
From: Dev Jain @ 2025-10-03 17:05 UTC (permalink / raw)
To: Lance Yang, akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
ioworker0, richard.weiyang, linux-kernel, linux-mm
On 02/10/25 1:02 pm, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
> and __collapse_huge_page_isolate() was almost duplicated.
>
> This patch cleans things up by moving all the common PTE checking logic
> into a new shared helper, thp_collapse_check_pte().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>
In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
an anonymous vma, is it even possible to hit if (!folio_test_anon(folio))?
In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
till the folio_maybe_mapped_shared() block?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-02 7:32 ` [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
2025-10-03 17:05 ` Dev Jain
@ 2025-10-03 17:11 ` Zi Yan
2025-10-04 3:06 ` Lance Yang
1 sibling, 1 reply; 18+ messages in thread
From: Zi Yan @ 2025-10-03 17:11 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
linux-kernel, linux-mm
On 2 Oct 2025, at 3:32, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
> and __collapse_huge_page_isolate() was almost duplicated.
>
> This patch cleans things up by moving all the common PTE checking logic
> into a new shared helper, thp_collapse_check_pte().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> mm/khugepaged.c | 167 ++++++++++++++++++++++++++++++------------------
> 1 file changed, 104 insertions(+), 63 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 808523f92c7b..2a897cfb1d03 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -61,6 +61,12 @@ enum scan_result {
> SCAN_PAGE_FILLED,
> };
>
> +enum pte_check_result {
> + PTE_CHECK_SUCCEED,
> + PTE_CHECK_CONTINUE,
> + PTE_CHECK_FAIL,
> +};
> +
> #define CREATE_TRACE_POINTS
> #include <trace/events/huge_memory.h>
>
> @@ -533,6 +539,87 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
> }
> }
>
> +/*
> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
> + * @pte: PTE to check
> + * @vma: VMA the PTE belongs to
> + * @cc: Collapse control settings
> + * @scan_swap_pte: Allow scanning of swap PTEs if true
> + * @none_or_zero: Counter for none/zero PTEs (must be non-NULL)
> + * @unmapped: Counter for swap PTEs (must be non-NULL if scan_swap_pte
> + * is true)
> + * @scan_result: Used to return the failure reason (SCAN_*) on a
> + * PTE_CHECK_FAIL return. Must be non-NULL
> + *
> + * Returns:
> + * PTE_CHECK_SUCCEED - Valid PTE, proceed with collapse
> + * PTE_CHECK_CONTINUE - Skip this none/zero PTE but continue scanning
> + * PTE_CHECK_FAIL - Abort collapse scan
> + */
> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
> + struct collapse_control *cc, bool scan_swap_pte,
> + int *none_or_zero, int *unmapped, int *scan_result)
> +{
> + VM_BUG_ON(!none_or_zero || !scan_result);
> + VM_BUG_ON(scan_swap_pte && !unmapped);
Please use VM_WARN_ONCE_ON and just return. Or put all stats in a struct,
pass and check the struct pointer instead.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-03 17:05 ` Dev Jain
@ 2025-10-04 3:03 ` Lance Yang
2025-10-04 4:42 ` Dev Jain
2025-10-04 9:42 ` Wei Yang
1 sibling, 1 reply; 18+ messages in thread
From: Lance Yang @ 2025-10-04 3:03 UTC (permalink / raw)
To: Dev Jain
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
ioworker0, david, richard.weiyang, linux-kernel, lorenzo.stoakes,
linux-mm, akpm
On 2025/10/4 01:05, Dev Jain wrote:
>
> On 02/10/25 1:02 pm, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>> and __collapse_huge_page_isolate() was almost duplicated.
>>
>> This patch cleans things up by moving all the common PTE checking logic
>> into a new shared helper, thp_collapse_check_pte().
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>
> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
> an anonymous vma, is it even possible to hit if (!folio_test_anon(folio))?
Ah, indeed :)
> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
> till the folio_maybe_mapped_shared() block?
So you meant something like this:
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2a897cfb1d03..ef87d7fe3d50 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1374,11 +1374,7 @@ static int hpage_collapse_scan_pmd(struct
mm_struct *mm,
goto out_unmap;
}
folio = page_folio(page);
-
- if (!folio_test_anon(folio)) {
- result = SCAN_PAGE_ANON;
- goto out_unmap;
- }
+ VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
/*
* We treat a single page as shared if any part of the THP
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-03 17:11 ` Zi Yan
@ 2025-10-04 3:06 ` Lance Yang
0 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-04 3:06 UTC (permalink / raw)
To: Zi Yan
Cc: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
linux-kernel, linux-mm
On 2025/10/4 01:11, Zi Yan wrote:
> On 2 Oct 2025, at 3:32, Lance Yang wrote:
>
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>> and __collapse_huge_page_isolate() was almost duplicated.
>>
>> This patch cleans things up by moving all the common PTE checking logic
>> into a new shared helper, thp_collapse_check_pte().
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> mm/khugepaged.c | 167 ++++++++++++++++++++++++++++++------------------
>> 1 file changed, 104 insertions(+), 63 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 808523f92c7b..2a897cfb1d03 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -61,6 +61,12 @@ enum scan_result {
>> SCAN_PAGE_FILLED,
>> };
>>
>> +enum pte_check_result {
>> + PTE_CHECK_SUCCEED,
>> + PTE_CHECK_CONTINUE,
>> + PTE_CHECK_FAIL,
>> +};
>> +
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/huge_memory.h>
>>
>> @@ -533,6 +539,87 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>> }
>> }
>>
>> +/*
>> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
>> + * @pte: PTE to check
>> + * @vma: VMA the PTE belongs to
>> + * @cc: Collapse control settings
>> + * @scan_swap_pte: Allow scanning of swap PTEs if true
>> + * @none_or_zero: Counter for none/zero PTEs (must be non-NULL)
>> + * @unmapped: Counter for swap PTEs (must be non-NULL if scan_swap_pte
>> + * is true)
>> + * @scan_result: Used to return the failure reason (SCAN_*) on a
>> + * PTE_CHECK_FAIL return. Must be non-NULL
>> + *
>> + * Returns:
>> + * PTE_CHECK_SUCCEED - Valid PTE, proceed with collapse
>> + * PTE_CHECK_CONTINUE - Skip this none/zero PTE but continue scanning
>> + * PTE_CHECK_FAIL - Abort collapse scan
>> + */
>> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>> + struct collapse_control *cc, bool scan_swap_pte,
>> + int *none_or_zero, int *unmapped, int *scan_result)
>> +{
>> + VM_BUG_ON(!none_or_zero || !scan_result);
>> + VM_BUG_ON(scan_swap_pte && !unmapped);
>
> Please use VM_WARN_ONCE_ON and just return. Or put all stats in a struct,
> pass and check the struct pointer instead.
Yes, will use VM_WARN_ONCE_ON instead of VM_BUG_ON, and just return
PTE_CHECK_FAIL.
Thanks,
Lance
>
>
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
2025-10-03 16:33 ` Dev Jain
@ 2025-10-04 3:08 ` Lance Yang
0 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-04 3:08 UTC (permalink / raw)
To: Dev Jain
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
lorenzo.stoakes, david, ioworker0, richard.weiyang, linux-kernel,
linux-mm, akpm
On 2025/10/4 00:33, Dev Jain wrote:
>
> On 02/10/25 1:02 pm, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As pointed out by Dev, the PTE checks for disjoint conditions in the
>> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
>> and pte_uffd_wp are mutually exclusive.
>>
>> This patch refactors the loops in both __collapse_huge_page_isolate() and
>> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
>> instead of separate if blocks.
>>
>> Also, this is a preparatory step to make it easier to merge the
>> almost-duplicated scanning logic in these two functions, as suggested
>> by David.
>>
>> Suggested-by: Dev Jain <dev.jain@arm.com>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> mm/khugepaged.c | 12 ++++--------
>> 1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index f4f57ba69d72..808523f92c7b 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -548,8 +548,7 @@ static int __collapse_huge_page_isolate(struct
>> vm_area_struct *vma,
>> for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>> _pte++, addr += PAGE_SIZE) {
>> pte_t pteval = ptep_get(_pte);
>> - if (pte_none(pteval) || (pte_present(pteval) &&
>> - is_zero_pfn(pte_pfn(pteval)))) {
>> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>
> Should have mentioned in the description that pte_present() is not required
> here, so removing it.
Yep, got it.
>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
Cheers,
Lance
>
>> ++none_or_zero;
>> if (!userfaultfd_armed(vma) &&
>> (!cc->is_khugepaged ||
>> @@ -560,12 +559,10 @@ static int __collapse_huge_page_isolate(struct
>> vm_area_struct *vma,
>> count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> goto out;
>> }
>> - }
>> - if (!pte_present(pteval)) {
>> + } else if (!pte_present(pteval)) {
>> result = SCAN_PTE_NON_PRESENT;
>> goto out;
>> - }
>> - if (pte_uffd_wp(pteval)) {
>> + } else if (pte_uffd_wp(pteval)) {
>> result = SCAN_PTE_UFFD_WP;
>> goto out;
>> }
>> @@ -1316,8 +1313,7 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>> count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> goto out_unmap;
>> }
>> - }
>> - if (pte_uffd_wp(pteval)) {
>> + } else if (pte_uffd_wp(pteval)) {
>> /*
>> * Don't collapse the page if any of the small
>> * PTEs are armed with uffd write protection.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-04 3:03 ` Lance Yang
@ 2025-10-04 4:42 ` Dev Jain
2025-10-04 5:24 ` Lance Yang
0 siblings, 1 reply; 18+ messages in thread
From: Dev Jain @ 2025-10-04 4:42 UTC (permalink / raw)
To: Lance Yang
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
ioworker0, david, richard.weiyang, linux-kernel, lorenzo.stoakes,
linux-mm, akpm
On 04/10/25 8:33 am, Lance Yang wrote:
>
>
> On 2025/10/4 01:05, Dev Jain wrote:
>>
>> On 02/10/25 1:02 pm, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>>> and __collapse_huge_page_isolate() was almost duplicated.
>>>
>>> This patch cleans things up by moving all the common PTE checking logic
>>> into a new shared helper, thp_collapse_check_pte().
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>> ---
>>
>> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
>> an anonymous vma, is it even possible to hit if
>> (!folio_test_anon(folio))?
>
> Ah, indeed :)
>
>> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>> till the folio_maybe_mapped_shared() block?
>
> So you meant something like this:
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 2a897cfb1d03..ef87d7fe3d50 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1374,11 +1374,7 @@ static int hpage_collapse_scan_pmd(struct
> mm_struct *mm,
> goto out_unmap;
> }
> folio = page_folio(page);
> -
> - if (!folio_test_anon(folio)) {
> - result = SCAN_PAGE_ANON;
> - goto out_unmap;
> - }
> + VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
Yes, I would suggest a different patch making this change, then in the
last patch you can abstract away till the shared statistics line since that
much code will become common between scan_pmd and _isolate.
>
> /*
> * We treat a single page as shared if any part of the
> THP
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-04 4:42 ` Dev Jain
@ 2025-10-04 5:24 ` Lance Yang
0 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-04 5:24 UTC (permalink / raw)
To: Dev Jain
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
ioworker0, david, richard.weiyang, linux-kernel, lorenzo.stoakes,
linux-mm, akpm
On 2025/10/4 12:42, Dev Jain wrote:
>
> On 04/10/25 8:33 am, Lance Yang wrote:
>>
>>
>> On 2025/10/4 01:05, Dev Jain wrote:
>>>
>>> On 02/10/25 1:02 pm, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>>>> and __collapse_huge_page_isolate() was almost duplicated.
>>>>
>>>> This patch cleans things up by moving all the common PTE checking logic
>>>> into a new shared helper, thp_collapse_check_pte().
>>>>
>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>> ---
>>>
>>> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
>>> an anonymous vma, is it even possible to hit if (!
>>> folio_test_anon(folio))?
>>
>> Ah, indeed :)
>>
>>> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>>> till the folio_maybe_mapped_shared() block?
>>
>> So you meant something like this:
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 2a897cfb1d03..ef87d7fe3d50 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1374,11 +1374,7 @@ static int hpage_collapse_scan_pmd(struct
>> mm_struct *mm,
>> goto out_unmap;
>> }
>> folio = page_folio(page);
>> -
>> - if (!folio_test_anon(folio)) {
>> - result = SCAN_PAGE_ANON;
>> - goto out_unmap;
>> - }
>> + VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
>
> Yes, I would suggest a different patch making this change, then in the
>
> last patch you can abstract away till the shared statistics line since that
>
> much code will become common between scan_pmd and _isolate.
Nice. Makes sense to me!
I'll rework the series into three patches then:
1) The if-else-if-else-if optimization
2) Convert the !folio_test_anon() check to a VM_BUG_ON_FOLIO()
3) The final merge of the scanning logic into a new helper
Thanks,
Lance
>
>
>>
>> /*
>> * We treat a single page as shared if any part of the
>> THP
>>
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-03 17:05 ` Dev Jain
2025-10-04 3:03 ` Lance Yang
@ 2025-10-04 9:42 ` Wei Yang
2025-10-04 13:11 ` Dev Jain
1 sibling, 1 reply; 18+ messages in thread
From: Wei Yang @ 2025-10-04 9:42 UTC (permalink / raw)
To: Dev Jain
Cc: Lance Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
Liam.Howlett, npache, ryan.roberts, baohua, ioworker0,
richard.weiyang, linux-kernel, linux-mm
On Fri, Oct 03, 2025 at 10:35:12PM +0530, Dev Jain wrote:
>
>On 02/10/25 1:02 pm, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>> and __collapse_huge_page_isolate() was almost duplicated.
>>
>> This patch cleans things up by moving all the common PTE checking logic
>> into a new shared helper, thp_collapse_check_pte().
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>
>In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
This is true for the first loop, but we will unlock/lock mmap and revalidate
vma before isolation.
>an anonymous vma, is it even possible to hit if (!folio_test_anon(folio))?
>In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>till the folio_maybe_mapped_shared() block?
But it looks still valid, since hugepage_vma_revalidate() will check the vma
is still anonymous vma after grab the mmap lock again.
My concern is would VM_BUG_ON_FOLIO() be too heavy? How about warn on and
return?
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-04 9:42 ` Wei Yang
@ 2025-10-04 13:11 ` Dev Jain
2025-10-05 2:35 ` Lance Yang
0 siblings, 1 reply; 18+ messages in thread
From: Dev Jain @ 2025-10-04 13:11 UTC (permalink / raw)
To: Wei Yang
Cc: Lance Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
Liam.Howlett, npache, ryan.roberts, baohua, ioworker0,
linux-kernel, linux-mm
On 04/10/25 3:12 pm, Wei Yang wrote:
> On Fri, Oct 03, 2025 at 10:35:12PM +0530, Dev Jain wrote:
>> On 02/10/25 1:02 pm, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>>> and __collapse_huge_page_isolate() was almost duplicated.
>>>
>>> This patch cleans things up by moving all the common PTE checking logic
>>> into a new shared helper, thp_collapse_check_pte().
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>> ---
>> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
> This is true for the first loop, but we will unlock/lock mmap and revalidate
> vma before isolation.
>
>> an anonymous vma, is it even possible to hit if (!folio_test_anon(folio))?
>> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>> till the folio_maybe_mapped_shared() block?
> But it looks still valid, since hugepage_vma_revalidate() will check the vma
> is still anonymous vma after grab the mmap lock again.
>
> My concern is would VM_BUG_ON_FOLIO() be too heavy? How about warn on and
> return?
Frankly I do not have much opinion on the BUG_ON/WARN_ON debate since I haven't
properly understood that, but this BUG_ON is under CONFIG_DEBUG_VM anways. But
if you want to change this to WARN then you can do it at both places.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-04 13:11 ` Dev Jain
@ 2025-10-05 2:35 ` Lance Yang
2025-10-05 2:38 ` Zi Yan
0 siblings, 1 reply; 18+ messages in thread
From: Lance Yang @ 2025-10-05 2:35 UTC (permalink / raw)
To: Wei Yang, Dev Jain
Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
npache, ryan.roberts, baohua, ioworker0, linux-kernel, linux-mm
On 2025/10/4 21:11, Dev Jain wrote:
>
> On 04/10/25 3:12 pm, Wei Yang wrote:
>> On Fri, Oct 03, 2025 at 10:35:12PM +0530, Dev Jain wrote:
>>> On 02/10/25 1:02 pm, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>>>> and __collapse_huge_page_isolate() was almost duplicated.
>>>>
>>>> This patch cleans things up by moving all the common PTE checking logic
>>>> into a new shared helper, thp_collapse_check_pte().
>>>>
>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>> ---
>>> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
>> This is true for the first loop, but we will unlock/lock mmap and
>> revalidate
>> vma before isolation.
>>
>>> an anonymous vma, is it even possible to hit if (!
>>> folio_test_anon(folio))?
>>> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>>> till the folio_maybe_mapped_shared() block?
>> But it looks still valid, since hugepage_vma_revalidate() will check
>> the vma
>> is still anonymous vma after grab the mmap lock again.
>>
>> My concern is would VM_BUG_ON_FOLIO() be too heavy? How about warn on and
>> return?
>
> Frankly I do not have much opinion on the BUG_ON/WARN_ON debate since I
> haven't
> properly understood that, but this BUG_ON is under CONFIG_DEBUG_VM
> anways. But
Yeah, VM_BUG_ON_FOLIO() is under CONFIG_DEBUG_VM, so it won't affect
production kernels.
> if you want to change this to WARN then you can do it at both places.
It should flag such an impossible condition there during development.
So, I'd prefer to stick with VM_BUG_ON_FOLIO().
@Wei please let me know if you feel strongly otherwise :)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-05 2:35 ` Lance Yang
@ 2025-10-05 2:38 ` Zi Yan
2025-10-05 2:44 ` Lance Yang
0 siblings, 1 reply; 18+ messages in thread
From: Zi Yan @ 2025-10-05 2:38 UTC (permalink / raw)
To: Lance Yang
Cc: Wei Yang, Dev Jain, akpm, david, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, baohua, ioworker0,
linux-kernel, linux-mm
On 4 Oct 2025, at 22:35, Lance Yang wrote:
> On 2025/10/4 21:11, Dev Jain wrote:
>>
>> On 04/10/25 3:12 pm, Wei Yang wrote:
>>> On Fri, Oct 03, 2025 at 10:35:12PM +0530, Dev Jain wrote:
>>>> On 02/10/25 1:02 pm, Lance Yang wrote:
>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>
>>>>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>>>>> and __collapse_huge_page_isolate() was almost duplicated.
>>>>>
>>>>> This patch cleans things up by moving all the common PTE checking logic
>>>>> into a new shared helper, thp_collapse_check_pte().
>>>>>
>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>>> ---
>>>> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
>>> This is true for the first loop, but we will unlock/lock mmap and revalidate
>>> vma before isolation.
>>>
>>>> an anonymous vma, is it even possible to hit if (! folio_test_anon(folio))?
>>>> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>>>> till the folio_maybe_mapped_shared() block?
>>> But it looks still valid, since hugepage_vma_revalidate() will check the vma
>>> is still anonymous vma after grab the mmap lock again.
>>>
>>> My concern is would VM_BUG_ON_FOLIO() be too heavy? How about warn on and
>>> return?
>>
>> Frankly I do not have much opinion on the BUG_ON/WARN_ON debate since I haven't
>> properly understood that, but this BUG_ON is under CONFIG_DEBUG_VM anways. But
>
> Yeah, VM_BUG_ON_FOLIO() is under CONFIG_DEBUG_VM, so it won't affect
> production kernels.
Many distros enable it by default. For mm, we are moving away from
using BUG_ON or VM_BUG_ON. No need to crash the system if it is possible
to handle it gracefully.
>
>> if you want to change this to WARN then you can do it at both places.
>
> It should flag such an impossible condition there during development.
> So, I'd prefer to stick with VM_BUG_ON_FOLIO().
>
> @Wei please let me know if you feel strongly otherwise :)
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper
2025-10-05 2:38 ` Zi Yan
@ 2025-10-05 2:44 ` Lance Yang
0 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-05 2:44 UTC (permalink / raw)
To: Zi Yan
Cc: Wei Yang, Dev Jain, akpm, david, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, baohua, ioworker0,
linux-kernel, linux-mm
On 2025/10/5 10:38, Zi Yan wrote:
> On 4 Oct 2025, at 22:35, Lance Yang wrote:
>
>> On 2025/10/4 21:11, Dev Jain wrote:
>>>
>>> On 04/10/25 3:12 pm, Wei Yang wrote:
>>>> On Fri, Oct 03, 2025 at 10:35:12PM +0530, Dev Jain wrote:
>>>>> On 02/10/25 1:02 pm, Lance Yang wrote:
>>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>>
>>>>>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>>>>>> and __collapse_huge_page_isolate() was almost duplicated.
>>>>>>
>>>>>> This patch cleans things up by moving all the common PTE checking logic
>>>>>> into a new shared helper, thp_collapse_check_pte().
>>>>>>
>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>>>>> ---
>>>>> In hpage_collapse_scan_pmd(), we enter with mmap lock held, so for
>>>> This is true for the first loop, but we will unlock/lock mmap and revalidate
>>>> vma before isolation.
>>>>
>>>>> an anonymous vma, is it even possible to hit if (! folio_test_anon(folio))?
>>>>> In which case we can replace this with VM_BUG_ON_FOLIO and abstract away
>>>>> till the folio_maybe_mapped_shared() block?
>>>> But it looks still valid, since hugepage_vma_revalidate() will check the vma
>>>> is still anonymous vma after grab the mmap lock again.
>>>>
>>>> My concern is would VM_BUG_ON_FOLIO() be too heavy? How about warn on and
>>>> return?
>>>
>>> Frankly I do not have much opinion on the BUG_ON/WARN_ON debate since I haven't
>>> properly understood that, but this BUG_ON is under CONFIG_DEBUG_VM anways. But
>>
>> Yeah, VM_BUG_ON_FOLIO() is under CONFIG_DEBUG_VM, so it won't affect
>> production kernels.
>
> Many distros enable it by default. For mm, we are moving away from
> using BUG_ON or VM_BUG_ON. No need to crash the system if it is possible
> to handle it gracefully.
Ah, good to know that, thanks!
>
>>
>>> if you want to change this to WARN then you can do it at both places.
>>
>> It should flag such an impossible condition there during development.
>> So, I'd prefer to stick with VM_BUG_ON_FOLIO().
>>
>> @Wei please let me know if you feel strongly otherwise :)
>
>
> --
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-10-05 2:44 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-02 7:32 [PATCH mm-new 0/2] mm/khugepaged: refactor and merge PTE scanning logic Lance Yang
2025-10-02 7:32 ` [PATCH mm-new 1/2] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
2025-10-03 13:21 ` Wei Yang
2025-10-03 16:33 ` Dev Jain
2025-10-04 3:08 ` Lance Yang
2025-10-03 17:04 ` Zi Yan
2025-10-02 7:32 ` [PATCH mm-new 2/2] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
2025-10-03 17:05 ` Dev Jain
2025-10-04 3:03 ` Lance Yang
2025-10-04 4:42 ` Dev Jain
2025-10-04 5:24 ` Lance Yang
2025-10-04 9:42 ` Wei Yang
2025-10-04 13:11 ` Dev Jain
2025-10-05 2:35 ` Lance Yang
2025-10-05 2:38 ` Zi Yan
2025-10-05 2:44 ` Lance Yang
2025-10-03 17:11 ` Zi Yan
2025-10-04 3:06 ` Lance Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox