* [PATCH mm-new v2 0/3] mm/khugepaged: refactor and merge PTE scanning logic
@ 2025-10-06 14:43 Lance Yang
  2025-10-06 14:43 ` [PATCH mm-new v2 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Lance Yang @ 2025-10-06 14:43 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm

Hi all,

This series cleans up the almost-duplicated PTE scanning logic in the
collapse path.

The first patch is a preparatory step that refactors both scanning
loops to use a single if-else-if-else-if chain for the mutually
exclusive PTE checks.

The second patch replaces VM_BUG_ON_FOLIO() with the more graceful
VM_WARN_ON_FOLIO() when a non-anonymous folio is encountered in an
anonymous VMA.

The last patch then extracts the common logic into a shared helper,
thp_collapse_check_pte().
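
Schematically, the disjoint checks in both scanning loops go from
separate if blocks:

	if (none_or_zero(pteval)) { ... }
	if (!pte_present(pteval)) { ... }
	if (pte_uffd_wp(pteval)) { ... }

to a single chain:

	if (none_or_zero(pteval)) { ... }
	else if (!pte_present(pteval)) { ... }
	else if (pte_uffd_wp(pteval)) { ... }

(none_or_zero() is just shorthand for the pte_none()/is_zero_pfn()
check here, not a real helper.)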

Thanks,
Lance

---
v1 -> v2:
- #01 Update the changelog (per Dev)
- #01 Collect Reviewed-by from Wei, Dev and Zi - thanks!
- #03 Make more of the scanning logic common between scan_pmd() and
      _isolate() (per Dev)
- https://lore.kernel.org/linux-mm/20251002073255.14867-1-lance.yang@linux.dev

Lance Yang (3):
  mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for
    non-anon folios
  mm/khugepaged: merge PTE scanning logic into a new helper

 mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
 1 file changed, 131 insertions(+), 112 deletions(-)

-- 
2.49.0




* [PATCH mm-new v2 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-06 14:43 [PATCH mm-new v2 0/3] mm/khugepaged: refactor and merge PTE scanning logic Lance Yang
@ 2025-10-06 14:43 ` Lance Yang
  2025-10-06 14:43 ` [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
  2025-10-06 14:43 ` [PATCH mm-new v2 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
  2 siblings, 0 replies; 8+ messages in thread
From: Lance Yang @ 2025-10-06 14:43 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
	Lance Yang

From: Lance Yang <lance.yang@linux.dev>

As pointed out by Dev, the PTE checks for disjoint conditions in the
scanning loops can be optimized: is_swap_pte(), (pte_none() ||
is_zero_pfn()), and pte_uffd_wp() are mutually exclusive.

This patch refactors the loops in both __collapse_huge_page_isolate() and
hpage_collapse_scan_pmd() to use a single if-else-if-else-if chain
instead of separate if blocks. While at it, the redundant pte_present()
check before is_zero_pfn() is also removed.

Also, this is a preparatory step to make it easier to merge the
almost-duplicated scanning logic in these two functions, as suggested
by David.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f4f57ba69d72..808523f92c7b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -548,8 +548,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || (pte_present(pteval) &&
-				is_zero_pfn(pte_pfn(pteval)))) {
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
@@ -560,12 +559,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 				goto out;
 			}
-		}
-		if (!pte_present(pteval)) {
+		} else if (!pte_present(pteval)) {
 			result = SCAN_PTE_NON_PRESENT;
 			goto out;
-		}
-		if (pte_uffd_wp(pteval)) {
+		} else if (pte_uffd_wp(pteval)) {
 			result = SCAN_PTE_UFFD_WP;
 			goto out;
 		}
@@ -1316,8 +1313,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				goto out_unmap;
 			}
-		}
-		if (pte_uffd_wp(pteval)) {
+		} else if (pte_uffd_wp(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
 			 * PTEs are armed with uffd write protection.
-- 
2.49.0




* [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios
  2025-10-06 14:43 [PATCH mm-new v2 0/3] mm/khugepaged: refactor and merge PTE scanning logic Lance Yang
  2025-10-06 14:43 ` [PATCH mm-new v2 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
@ 2025-10-06 14:43 ` Lance Yang
  2025-10-07  0:35   ` Wei Yang
  2025-10-07  4:39   ` Dev Jain
  2025-10-06 14:43 ` [PATCH mm-new v2 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
  2 siblings, 2 replies; 8+ messages in thread
From: Lance Yang @ 2025-10-06 14:43 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
	Lance Yang

From: Lance Yang <lance.yang@linux.dev>

As Zi pointed out, we should avoid crashing the kernel for conditions
that can be handled gracefully. Encountering a non-anonymous folio in an
anonymous VMA is a bug, but a warning is sufficient.

This patch changes the VM_BUG_ON_FOLIO(!folio_test_anon(folio)) to a
VM_WARN_ON_FOLIO() in both __collapse_huge_page_isolate() and
hpage_collapse_scan_pmd(), and then aborts the scan with SCAN_PAGE_ANON.

This also makes more of the scanning logic common between
hpage_collapse_scan_pmd() and __collapse_huge_page_isolate(), as
suggested by Dev.

Suggested-by: Dev Jain <dev.jain@arm.com>
Suggested-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 808523f92c7b..87a8df90b3a6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -573,7 +573,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		}
 
 		folio = page_folio(page);
-		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
+		if (!folio_test_anon(folio)) {
+			VM_WARN_ON_FOLIO(true, folio);
+			result = SCAN_PAGE_ANON;
+			goto out;
+		}
 
 		/* See hpage_collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
@@ -1335,6 +1339,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		folio = page_folio(page);
 
 		if (!folio_test_anon(folio)) {
+			VM_WARN_ON_FOLIO(true, folio);
 			result = SCAN_PAGE_ANON;
 			goto out_unmap;
 		}
-- 
2.49.0




* [PATCH mm-new v2 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-06 14:43 [PATCH mm-new v2 0/3] mm/khugepaged: refactor and merge PTE scanning logic Lance Yang
  2025-10-06 14:43 ` [PATCH mm-new v2 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
  2025-10-06 14:43 ` [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
@ 2025-10-06 14:43 ` Lance Yang
  2025-10-07  6:28   ` Dev Jain
  2 siblings, 1 reply; 8+ messages in thread
From: Lance Yang @ 2025-10-06 14:43 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
	Lance Yang

From: Lance Yang <lance.yang@linux.dev>

The PTE scanning logic in hpage_collapse_scan_pmd() and
__collapse_huge_page_isolate() is almost entirely duplicated.

As suggested by David, this patch cleans that up by moving the common
PTE checking logic into a new shared helper, thp_collapse_check_pte().

Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 244 ++++++++++++++++++++++++++----------------------
 1 file changed, 131 insertions(+), 113 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 87a8df90b3a6..96ea8d1b9fed 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -61,6 +61,12 @@ enum scan_result {
 	SCAN_PAGE_FILLED,
 };
 
+enum pte_check_result {
+	PTE_CHECK_SUCCEED,
+	PTE_CHECK_CONTINUE,
+	PTE_CHECK_FAIL,
+};
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/huge_memory.h>
 
@@ -533,62 +539,140 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
 	}
 }
 
+/*
+ * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
+ * @pte:           The PTE to check
+ * @vma:           The VMA the PTE belongs to
+ * @addr:          The virtual address corresponding to this PTE
+ * @cc:            Collapse control settings
+ * @foliop:        On success, used to return a pointer to the folio
+ *                 Must be non-NULL
+ * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
+ * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
+ * @shared:        Counter for shared pages. Must be non-NULL
+ * @scan_result:   Used to return the failure reason (SCAN_*) on a
+ *                 PTE_CHECK_FAIL return. Must be non-NULL
+ *
+ * Returns:
+ *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
+ *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
+ *   PTE_CHECK_FAIL     - Abort collapse scan
+ */
+static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
+		unsigned long addr, struct collapse_control *cc,
+		struct folio **foliop, int *none_or_zero, int *unmapped,
+		int *shared, int *scan_result)
+{
+	struct folio *folio = NULL;
+	struct page *page = NULL;
+
+	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
+		(*none_or_zero)++;
+		if (!userfaultfd_armed(vma) &&
+		    (!cc->is_khugepaged ||
+		     *none_or_zero <= khugepaged_max_ptes_none)) {
+			return PTE_CHECK_CONTINUE;
+		} else {
+			*scan_result = SCAN_EXCEED_NONE_PTE;
+			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			return PTE_CHECK_FAIL;
+		}
+	} else if (!pte_present(pte)) {
+		if (!unmapped) {
+			*scan_result = SCAN_PTE_NON_PRESENT;
+			return PTE_CHECK_FAIL;
+		}
+
+		if (non_swap_entry(pte_to_swp_entry(pte))) {
+			*scan_result = SCAN_PTE_NON_PRESENT;
+			return PTE_CHECK_FAIL;
+		}
+
+		(*unmapped)++;
+		if (!cc->is_khugepaged ||
+		    *unmapped <= khugepaged_max_ptes_swap) {
+			/*
+			 * Always be strict with uffd-wp enabled swap
+			 * entries. Please see comment below for
+			 * pte_uffd_wp().
+			 */
+			if (pte_swp_uffd_wp(pte)) {
+				*scan_result = SCAN_PTE_UFFD_WP;
+				return PTE_CHECK_FAIL;
+			}
+			return PTE_CHECK_CONTINUE;
+		} else {
+			*scan_result = SCAN_EXCEED_SWAP_PTE;
+			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
+			return PTE_CHECK_FAIL;
+		}
+	} else if (pte_uffd_wp(pte)) {
+		/*
+		 * Don't collapse the page if any of the small PTEs are
+		 * armed with uffd write protection. Here we can also mark
+		 * the new huge pmd as write protected if any of the small
+		 * ones is marked but that could bring unknown userfault
+		 * messages that falls outside of the registered range.
+		 * So, just be simple.
+		 */
+		*scan_result = SCAN_PTE_UFFD_WP;
+		return PTE_CHECK_FAIL;
+	}
+
+	page = vm_normal_page(vma, addr, pte);
+	if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
+		*scan_result = SCAN_PAGE_NULL;
+		return PTE_CHECK_FAIL;
+	}
+
+	folio = page_folio(page);
+	if (!folio_test_anon(folio)) {
+		VM_WARN_ON_FOLIO(true, folio);
+		*scan_result = SCAN_PAGE_ANON;
+		return PTE_CHECK_FAIL;
+	}
+
+	/*
+	 * We treat a single page as shared if any part of the THP
+	 * is shared.
+	 */
+	if (folio_maybe_mapped_shared(folio)) {
+		(*shared)++;
+		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
+			*scan_result = SCAN_EXCEED_SHARED_PTE;
+			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+			return PTE_CHECK_FAIL;
+		}
+	}
+
+	*foliop = folio;
+
+	return PTE_CHECK_SUCCEED;
+}
+
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long start_addr,
 					pte_t *pte,
 					struct collapse_control *cc,
 					struct list_head *compound_pagelist)
 {
-	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr = start_addr;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
+	int pte_check_res;
 
 	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out;
-			}
-		} else if (!pte_present(pteval)) {
-			result = SCAN_PTE_NON_PRESENT;
-			goto out;
-		} else if (pte_uffd_wp(pteval)) {
-			result = SCAN_PTE_UFFD_WP;
-			goto out;
-		}
-		page = vm_normal_page(vma, addr, pteval);
-		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
-			result = SCAN_PAGE_NULL;
-			goto out;
-		}
 
-		folio = page_folio(page);
-		if (!folio_test_anon(folio)) {
-			VM_WARN_ON_FOLIO(true, folio);
-			result = SCAN_PAGE_ANON;
-			goto out;
-		}
+		pte_check_res = thp_collapse_check_pte(pteval, vma, addr, cc,
+					&folio, &none_or_zero, NULL, &shared, &result);
 
-		/* See hpage_collapse_scan_pmd(). */
-		if (folio_maybe_mapped_shared(folio)) {
-			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
-				result = SCAN_EXCEED_SHARED_PTE;
-				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
-				goto out;
-			}
-		}
+		if (pte_check_res == PTE_CHECK_CONTINUE)
+			continue;
+		else if (pte_check_res == PTE_CHECK_FAIL)
+			goto out;
 
 		if (folio_test_large(folio)) {
 			struct folio *f;
@@ -1259,11 +1343,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	pte_t *pte, *_pte;
 	int result = SCAN_FAIL, referenced = 0;
 	int none_or_zero = 0, shared = 0;
-	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE, unmapped = 0;
+	int pte_check_res;
 
 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
 
@@ -1282,81 +1366,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out_unmap;
-			}
-		} else if (!pte_present(pteval)) {
-			if (non_swap_entry(pte_to_swp_entry(pteval))) {
-				result = SCAN_PTE_NON_PRESENT;
-				goto out_unmap;
-			}
-
-			++unmapped;
-			if (!cc->is_khugepaged ||
-			    unmapped <= khugepaged_max_ptes_swap) {
-				/*
-				 * Always be strict with uffd-wp
-				 * enabled swap entries.  Please see
-				 * comment below for pte_uffd_wp().
-				 */
-				if (pte_swp_uffd_wp(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
-					goto out_unmap;
-				}
-				continue;
-			} else {
-				result = SCAN_EXCEED_SWAP_PTE;
-				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
-				goto out_unmap;
-			}
-		} else if (pte_uffd_wp(pteval)) {
-			/*
-			 * Don't collapse the page if any of the small
-			 * PTEs are armed with uffd write protection.
-			 * Here we can also mark the new huge pmd as
-			 * write protected if any of the small ones is
-			 * marked but that could bring unknown
-			 * userfault messages that falls outside of
-			 * the registered range.  So, just be simple.
-			 */
-			result = SCAN_PTE_UFFD_WP;
-			goto out_unmap;
-		}
 
-		page = vm_normal_page(vma, addr, pteval);
-		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
-			result = SCAN_PAGE_NULL;
-			goto out_unmap;
-		}
-		folio = page_folio(page);
+		pte_check_res = thp_collapse_check_pte(pteval, vma, addr, cc,
+					&folio, &none_or_zero, &unmapped,
+					&shared, &result);
 
-		if (!folio_test_anon(folio)) {
-			VM_WARN_ON_FOLIO(true, folio);
-			result = SCAN_PAGE_ANON;
+		if (pte_check_res == PTE_CHECK_CONTINUE)
+			continue;
+		else if (pte_check_res == PTE_CHECK_FAIL)
 			goto out_unmap;
-		}
-
-		/*
-		 * We treat a single page as shared if any part of the THP
-		 * is shared.
-		 */
-		if (folio_maybe_mapped_shared(folio)) {
-			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
-				result = SCAN_EXCEED_SHARED_PTE;
-				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
-				goto out_unmap;
-			}
-		}
 
 		/*
 		 * Record which node the original page is from and save this
-- 
2.49.0




* Re: [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios
  2025-10-06 14:43 ` [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
@ 2025-10-07  0:35   ` Wei Yang
  2025-10-07  4:39   ` Dev Jain
  1 sibling, 0 replies; 8+ messages in thread
From: Wei Yang @ 2025-10-07  0:35 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, ioworker0,
	richard.weiyang, linux-kernel, linux-mm

On Mon, Oct 06, 2025 at 10:43:37PM +0800, Lance Yang wrote:
>From: Lance Yang <lance.yang@linux.dev>
>
>As Zi pointed out, we should avoid crashing the kernel for conditions
>that can be handled gracefully. Encountering a non-anonymous folio in an
>anonymous VMA is a bug, but a warning is sufficient.
>
>This patch changes the VM_BUG_ON_FOLIO(!folio_test_anon(folio)) to a
>VM_WARN_ON_FOLIO() in both __collapse_huge_page_isolate() and
>hpage_collapse_scan_pmd(), and then aborts the scan with SCAN_PAGE_ANON.
>
>This also makes more of the scanning logic common between
>hpage_collapse_scan_pmd() and __collapse_huge_page_isolate(), as
>suggested by Dev.
>
>Suggested-by: Dev Jain <dev.jain@arm.com>
>Suggested-by: Zi Yan <ziy@nvidia.com>
>Signed-off-by: Lance Yang <lance.yang@linux.dev>

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>

-- 
Wei Yang
Help you, Help me



* Re: [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios
  2025-10-06 14:43 ` [PATCH mm-new v2 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
  2025-10-07  0:35   ` Wei Yang
@ 2025-10-07  4:39   ` Dev Jain
  1 sibling, 0 replies; 8+ messages in thread
From: Dev Jain @ 2025-10-07  4:39 UTC (permalink / raw)
  To: Lance Yang, akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	ioworker0, richard.weiyang, linux-kernel, linux-mm


On 06/10/25 8:13 pm, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As Zi pointed out, we should avoid crashing the kernel for conditions
> that can be handled gracefully. Encountering a non-anonymous folio in an
> anonymous VMA is a bug, but a warning is sufficient.
>
> This patch changes the VM_BUG_ON_FOLIO(!folio_test_anon(folio)) to a
> VM_WARN_ON_FOLIO() in both __collapse_huge_page_isolate() and
> hpage_collapse_scan_pmd(), and then aborts the scan with SCAN_PAGE_ANON.
>
> This also makes more of the scanning logic common between
> hpage_collapse_scan_pmd() and __collapse_huge_page_isolate(), as
> suggested by Dev.
>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>   

Reviewed-by: Dev Jain <dev.jain@arm.com>




* Re: [PATCH mm-new v2 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-06 14:43 ` [PATCH mm-new v2 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
@ 2025-10-07  6:28   ` Dev Jain
  2025-10-07  8:32     ` Lance Yang
  0 siblings, 1 reply; 8+ messages in thread
From: Dev Jain @ 2025-10-07  6:28 UTC (permalink / raw)
  To: Lance Yang, akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	ioworker0, richard.weiyang, linux-kernel, linux-mm


On 06/10/25 8:13 pm, Lance Yang wrote:
> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
> +		unsigned long addr, struct collapse_control *cc,
> +		struct folio **foliop, int *none_or_zero, int *unmapped,
> +		int *shared, int *scan_result)

Nit: I'd prefer the cc parameter to go last.
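
Something like this (sketch only; the same parameters, just reordered):

	static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
			unsigned long addr, struct folio **foliop,
			int *none_or_zero, int *unmapped, int *shared,
			int *scan_result, struct collapse_control *cc)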

> +{
> +	struct folio *folio = NULL;
> +	struct page *page = NULL;
> +
> +	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> +		(*none_or_zero)++;
> +		if (!userfaultfd_armed(vma) &&
> +		    (!cc->is_khugepaged ||
> +		     *none_or_zero <= khugepaged_max_ptes_none)) {
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_NONE_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	} else if (!pte_present(pte)) {
> +		if (!unmapped) {
> +			*scan_result = SCAN_PTE_NON_PRESENT;
> +			return PTE_CHECK_FAIL;
> +		}
> +
> +		if (non_swap_entry(pte_to_swp_entry(pte))) {
> +			*scan_result = SCAN_PTE_NON_PRESENT;
> +			return PTE_CHECK_FAIL;
> +		}
> +
> +		(*unmapped)++;
> +		if (!cc->is_khugepaged ||
> +		    *unmapped <= khugepaged_max_ptes_swap) {
> +			/*
> +			 * Always be strict with uffd-wp enabled swap
> +			 * entries. Please see comment below for
> +			 * pte_uffd_wp().
> +			 */
> +			if (pte_swp_uffd_wp(pte)) {
> +				*scan_result = SCAN_PTE_UFFD_WP;
> +				return PTE_CHECK_FAIL;
> +			}
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	} else if (pte_uffd_wp(pte)) {
> +		/*
> +		 * Don't collapse the page if any of the small PTEs are
> +		 * armed with uffd write protection. Here we can also mark
> +		 * the new huge pmd as write protected if any of the small
> +		 * ones is marked but that could bring unknown userfault
> +		 * messages that falls outside of the registered range.
> +		 * So, just be simple.
> +		 */
> +		*scan_result = SCAN_PTE_UFFD_WP;
> +		return PTE_CHECK_FAIL;
> +	}
> +
> +	page = vm_normal_page(vma, addr, pte);

You should use vm_normal_folio() here and drop struct page altogether;
this was also noted during the review of the mTHP collapse patchset.
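
Untested, but roughly (assuming vm_normal_folio() and
folio_is_zone_device() are the right replacements here):

	folio = vm_normal_folio(vma, addr, pte);
	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
		*scan_result = SCAN_PAGE_NULL;
		return PTE_CHECK_FAIL;
	}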




* Re: [PATCH mm-new v2 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-07  6:28   ` Dev Jain
@ 2025-10-07  8:32     ` Lance Yang
  0 siblings, 0 replies; 8+ messages in thread
From: Lance Yang @ 2025-10-07  8:32 UTC (permalink / raw)
  To: Dev Jain
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	ioworker0, richard.weiyang, linux-kernel, linux-mm, akpm,
	lorenzo.stoakes, david



On 2025/10/7 14:28, Dev Jain wrote:
> 
> On 06/10/25 8:13 pm, Lance Yang wrote:
>> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>> +        unsigned long addr, struct collapse_control *cc,
>> +        struct folio **foliop, int *none_or_zero, int *unmapped,
>> +        int *shared, int *scan_result)
> 
> Nit: I'd prefer the cc parameter to go last.

Yep, got it.

> 
>> +{
>> +    struct folio *folio = NULL;
>> +    struct page *page = NULL;
>> +
>> +    if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>> +        (*none_or_zero)++;
>> +        if (!userfaultfd_armed(vma) &&
>> +            (!cc->is_khugepaged ||
>> +             *none_or_zero <= khugepaged_max_ptes_none)) {
>> +            return PTE_CHECK_CONTINUE;
>> +        } else {
>> +            *scan_result = SCAN_EXCEED_NONE_PTE;
>> +            count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +            return PTE_CHECK_FAIL;
>> +        }
>> +    } else if (!pte_present(pte)) {
>> +        if (!unmapped) {
>> +            *scan_result = SCAN_PTE_NON_PRESENT;
>> +            return PTE_CHECK_FAIL;
>> +        }
>> +
>> +        if (non_swap_entry(pte_to_swp_entry(pte))) {
>> +            *scan_result = SCAN_PTE_NON_PRESENT;
>> +            return PTE_CHECK_FAIL;
>> +        }
>> +
>> +        (*unmapped)++;
>> +        if (!cc->is_khugepaged ||
>> +            *unmapped <= khugepaged_max_ptes_swap) {
>> +            /*
>> +             * Always be strict with uffd-wp enabled swap
>> +             * entries. Please see comment below for
>> +             * pte_uffd_wp().
>> +             */
>> +            if (pte_swp_uffd_wp(pte)) {
>> +                *scan_result = SCAN_PTE_UFFD_WP;
>> +                return PTE_CHECK_FAIL;
>> +            }
>> +            return PTE_CHECK_CONTINUE;
>> +        } else {
>> +            *scan_result = SCAN_EXCEED_SWAP_PTE;
>> +            count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> +            return PTE_CHECK_FAIL;
>> +        }
>> +    } else if (pte_uffd_wp(pte)) {
>> +        /*
>> +         * Don't collapse the page if any of the small PTEs are
>> +         * armed with uffd write protection. Here we can also mark
>> +         * the new huge pmd as write protected if any of the small
>> +         * ones is marked but that could bring unknown userfault
>> +         * messages that falls outside of the registered range.
>> +         * So, just be simple.
>> +         */
>> +        *scan_result = SCAN_PTE_UFFD_WP;
>> +        return PTE_CHECK_FAIL;
>> +    }
>> +
>> +    page = vm_normal_page(vma, addr, pte);
> 
> You should use vm_normal_folio() here and drop struct page altogether;
> this was also noted during the review of the mTHP collapse patchset.

Right, I missed that vm_normal_folio() was the way to go here :)

Thanks for the pointer!
Lance


