linux-mm.kvack.org archive mirror
* [PATCH V2 0/5] mm/khugepaged: cleanups and scan limit fix
@ 2025-12-24 11:13 Shivank Garg
  2025-12-24 11:13 ` [PATCH V2 1/5] mm/khugepaged: remove unnecessary goto 'skip' label Shivank Garg
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Shivank Garg @ 2025-12-24 11:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes
  Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	shivankg

This series contains several cleanups for mm/khugepaged.c to improve code
readability and type safety, and one functional fix to ensure that
khugepaged_scan_mm_slot() correctly counts small VMAs towards the scan
limit.

Thanks,

v2:
- Added a fix for small VMAs not being counted in the scan limit (Wei)
- Updated 'progress' to 'unsigned int' to match the related types
- Updated return types of internal functions to use enum scan_result (Zi)
- Added a void collapse_pte_mapped_thp() wrapper for external callers to
  avoid exposing the internal enum (David)

v1:
https://lore.kernel.org/linux-mm/20251216111139.95438-2-shivankg@amd.com

Shivank Garg (5):
  mm/khugepaged: remove unnecessary goto 'skip' label
  mm/khugepaged: count small VMAs towards scan limit
  mm/khugepaged: change collapse_pte_mapped_thp() to return void
  mm/khugepaged: use enum scan_result for result variables and return
    types
  mm/khugepaged: make khugepaged_collapse_control static

 include/linux/khugepaged.h |   9 +--
 mm/khugepaged.c            | 158 ++++++++++++++++++++-----------------
 2 files changed, 88 insertions(+), 79 deletions(-)


base-commit: cd119c65a615bd7bfe8cda715a77132c8e3da067
-- 
2.43.0




* [PATCH V2 1/5] mm/khugepaged: remove unnecessary goto 'skip' label
  2025-12-24 11:13 [PATCH V2 0/5] mm/khugepaged: cleanups and scan limit fix Shivank Garg
@ 2025-12-24 11:13 ` Shivank Garg
  2025-12-24 11:34   ` Lance Yang
  2025-12-24 11:13 ` [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit Shivank Garg
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Shivank Garg @ 2025-12-24 11:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes
  Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	shivankg

Replace 'goto skip' with equivalent inline logic for better code readability.

No functional change.

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6c8c35d3e0c9..107146f012b1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2442,14 +2442,15 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 			break;
 		}
 		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
-skip:
 			progress++;
 			continue;
 		}
 		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
-		if (khugepaged_scan.address > hend)
-			goto skip;
+		if (khugepaged_scan.address > hend) {
+			progress++;
+			continue;
+		}
 		if (khugepaged_scan.address < hstart)
 			khugepaged_scan.address = hstart;
 		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
-- 
2.43.0




* [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit
  2025-12-24 11:13 [PATCH V2 0/5] mm/khugepaged: cleanups and scan limit fix Shivank Garg
  2025-12-24 11:13 ` [PATCH V2 1/5] mm/khugepaged: remove unnecessary goto 'skip' label Shivank Garg
@ 2025-12-24 11:13 ` Shivank Garg
  2025-12-24 11:51   ` Lance Yang
  2025-12-24 11:13 ` [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void Shivank Garg
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Shivank Garg @ 2025-12-24 11:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes
  Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	shivankg, Wei Yang

khugepaged_scan_mm_slot() uses a 'progress' counter to limit the amount
of work performed; the counter is incremented in three cases:
1. Transitioning to a new mm (+1).
2. Skipping an unsuitable VMA (+1).
3. Scanning a PMD-sized range (+HPAGE_PMD_NR).

Consider a 1MB VMA sitting between two 2MB alignment boundaries:

     vma1       vma2   vma3
    +----------+------+----------+
    |2M        |1M    |2M        |
    +----------+------+----------+
               ^      ^
               start  end
               ^
          hstart,hend

In this case, for vma2:
  hstart = round_up(start, HPAGE_PMD_SIZE)  -> Next 2MB alignment
  hend   = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment

Currently, since `hend <= hstart`, VMAs that are too small or unaligned
to contain a hugepage are skipped without incrementing 'progress'.
A process containing a large number of such small VMAs will unfairly
consume more CPU cycles before yielding compared to a process with
fewer, larger, or aligned VMAs.

Fix this by incrementing progress when the `hend <= hstart` condition
is met.

Additionally, change 'progress' type to `unsigned int` to match both
the 'pages' type and the function return value.
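
For illustration only, a minimal userspace sketch of the rounding (the
boundary values below are assumed examples, not taken from a real VMA):

/* Illustrative sketch: a 1MB VMA whose start sits on a 2MB boundary. */
#include <stdio.h>

#define HPAGE_PMD_SIZE	(2UL << 20)	/* assume 2MB PMD-sized huge pages */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	unsigned long vm_start = 2UL << 20;			/* 2MB, already aligned */
	unsigned long vm_end = vm_start + (1UL << 20);		/* 3MB */
	unsigned long hstart = ALIGN_UP(vm_start, HPAGE_PMD_SIZE);	/* 2MB */
	unsigned long hend = ALIGN_DOWN(vm_end, HPAGE_PMD_SIZE);	/* 2MB */

	/* hend <= hstart: no PMD-sized range fits, so the VMA is skipped. */
	printf("hstart=%#lx hend=%#lx skip=%d\n", hstart, hend, hend <= hstart);
	return 0;
}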

Suggested-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 107146f012b1..0b549c3250f9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 	struct mm_slot *slot;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
-	int progress = 0;
+	unsigned int progress = 0;
 
 	VM_BUG_ON(!pages);
 	lockdep_assert_held(&khugepaged_mm_lock);
@@ -2447,7 +2447,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		}
 		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
-		if (khugepaged_scan.address > hend) {
+		if (khugepaged_scan.address > hend || hend <= hstart) {
 			progress++;
 			continue;
 		}
-- 
2.43.0




* [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void
  2025-12-24 11:13 [PATCH V2 0/5] mm/khugepaged: cleanups and scan limit fix Shivank Garg
  2025-12-24 11:13 ` [PATCH V2 1/5] mm/khugepaged: remove unnecessary goto 'skip' label Shivank Garg
  2025-12-24 11:13 ` [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit Shivank Garg
@ 2025-12-24 11:13 ` Shivank Garg
  2025-12-24 12:21   ` Lance Yang
  2025-12-29 16:40   ` Zi Yan
  2025-12-24 11:13 ` [PATCH V2 4/5] mm/khugepaged: use enum scan_result for result variables and return types Shivank Garg
  2025-12-24 11:13 ` [PATCH V2 5/5] mm/khugepaged: make khugepaged_collapse_control static Shivank Garg
  4 siblings, 2 replies; 13+ messages in thread
From: Shivank Garg @ 2025-12-24 11:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes
  Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	shivankg

The only external caller of collapse_pte_mapped_thp() is uprobe, which
ignores the return value. Change the external API to return void to
simplify the interface.

Introduce try_collapse_pte_mapped_thp() for internal use, which preserves
the return value. This prepares for a future patch that will convert
the return type to use enum scan_result.
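
As a generic illustration of the wrapper pattern (all names below are
hypothetical, not the kernel's):

/* Illustrative sketch only: void external API over an internal try_*() helper. */
enum demo_result { DEMO_FAIL, DEMO_SUCCEED };

static enum demo_result try_demo_collapse(unsigned long addr)
{
	/* internal callers still get a status to act on */
	return (addr & 0x1) ? DEMO_FAIL : DEMO_SUCCEED;
}

void demo_collapse(unsigned long addr)
{
	/* external callers ignore the status, so expose a void interface */
	try_demo_collapse(addr);
}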

Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/khugepaged.h |  9 ++++-----
 mm/khugepaged.c            | 40 ++++++++++++++++++++++----------------
 2 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index eb1946a70cff..37b992b22bba 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -17,8 +17,8 @@ extern void khugepaged_enter_vma(struct vm_area_struct *vma,
 				 vm_flags_t vm_flags);
 extern void khugepaged_min_free_kbytes_update(void);
 extern bool current_is_khugepaged(void);
-extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
-				   bool install_pmd);
+extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
+				    bool install_pmd);
 
 static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
@@ -42,10 +42,9 @@ static inline void khugepaged_enter_vma(struct vm_area_struct *vma,
 					vm_flags_t vm_flags)
 {
 }
-static inline int collapse_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr, bool install_pmd)
+static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
+					   unsigned long addr, bool install_pmd)
 {
-	return 0;
 }
 
 static inline void khugepaged_min_free_kbytes_update(void)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0b549c3250f9..04ff0730c9a1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1477,20 +1477,8 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	return SCAN_SUCCEED;
 }
 
-/**
- * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at
- * address haddr.
- *
- * @mm: process address space where collapse happens
- * @addr: THP collapse address
- * @install_pmd: If a huge PMD should be installed
- *
- * This function checks whether all the PTEs in the PMD are pointing to the
- * right THP. If so, retract the page table so the THP can refault in with
- * as pmd-mapped. Possibly install a huge PMD mapping the THP.
- */
-int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
-			    bool install_pmd)
+static int try_collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
+				       bool install_pmd)
 {
 	int nr_mapped_ptes = 0, result = SCAN_FAIL;
 	unsigned int nr_batch_ptes;
@@ -1711,6 +1699,24 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
+/**
+ * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at
+ * address haddr.
+ *
+ * @mm: process address space where collapse happens
+ * @addr: THP collapse address
+ * @install_pmd: If a huge PMD should be installed
+ *
+ * This function checks whether all the PTEs in the PMD are pointing to the
+ * right THP. If so, retract the page table so the THP can refault in with
+ * as pmd-mapped. Possibly install a huge PMD mapping the THP.
+ */
+void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
+			     bool install_pmd)
+{
+	try_collapse_pte_mapped_thp(mm, addr, install_pmd);
+}
+
 /* Can we retract page tables for this file-backed VMA? */
 static bool file_backed_vma_is_retractable(struct vm_area_struct *vma)
 {
@@ -2227,7 +2233,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
-	 * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
+	 * If MADV_COLLAPSE, adjust result to call try_collapse_pte_mapped_thp().
 	 */
 	retract_page_tables(mapping, start);
 	if (cc && !cc->is_khugepaged)
@@ -2479,7 +2485,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 					mmap_read_lock(mm);
 					if (hpage_collapse_test_exit_or_disable(mm))
 						goto breakouterloop;
-					*result = collapse_pte_mapped_thp(mm,
+					*result = try_collapse_pte_mapped_thp(mm,
 						khugepaged_scan.address, false);
 					if (*result == SCAN_PMD_MAPPED)
 						*result = SCAN_SUCCEED;
@@ -2869,7 +2875,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 		case SCAN_PTE_MAPPED_HUGEPAGE:
 			BUG_ON(mmap_locked);
 			mmap_read_lock(mm);
-			result = collapse_pte_mapped_thp(mm, addr, true);
+			result = try_collapse_pte_mapped_thp(mm, addr, true);
 			mmap_read_unlock(mm);
 			goto handle_result;
 		/* Whitelisted set of results where continuing OK */
-- 
2.43.0




* [PATCH V2 4/5] mm/khugepaged: use enum scan_result for result variables and return types
  2025-12-24 11:13 [PATCH V2 0/5] mm/khugepaged: cleanups and scan limit fix Shivank Garg
                   ` (2 preceding siblings ...)
  2025-12-24 11:13 ` [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void Shivank Garg
@ 2025-12-24 11:13 ` Shivank Garg
  2025-12-29 16:41   ` Zi Yan
  2025-12-24 11:13 ` [PATCH V2 5/5] mm/khugepaged: make khugepaged_collapse_control static Shivank Garg
  4 siblings, 1 reply; 13+ messages in thread
From: Shivank Garg @ 2025-12-24 11:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes
  Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	shivankg

Convert result variables and return types from int to enum scan_result
throughout khugepaged code. This improves type safety and code clarity
by making the intent explicit.

No functional change.
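
As a generic illustration of the benefit (names are hypothetical, not the
kernel's actual enum scan_result values): with an enum return type, the
compiler's -Wswitch warning can flag call sites that forget to handle a
status, which a plain int return cannot do.

/* Illustrative sketch only; not the kernel's enum scan_result. */
#include <stdio.h>

enum demo_result { DEMO_FAIL, DEMO_SUCCEED, DEMO_PMD_MAPPED };

static enum demo_result demo_scan(int ok)
{
	return ok ? DEMO_SUCCEED : DEMO_FAIL;
}

int main(void)
{
	switch (demo_scan(1)) {	/* -Wswitch flags any missing enum case */
	case DEMO_FAIL:
		printf("fail\n");
		break;
	case DEMO_SUCCEED:
		printf("succeed\n");
		break;
	case DEMO_PMD_MAPPED:
		printf("already pmd-mapped\n");
		break;
	}
	return 0;
}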

Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 111 +++++++++++++++++++++++++-----------------------
 1 file changed, 57 insertions(+), 54 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 04ff0730c9a1..6892b23d6fc4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -537,17 +537,18 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
 	}
 }
 
-static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
-					unsigned long start_addr,
-					pte_t *pte,
-					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
+						     unsigned long start_addr,
+						     pte_t *pte,
+						     struct collapse_control *cc,
+						     struct list_head *compound_pagelist)
 {
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr = start_addr;
 	pte_t *_pte;
-	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
+	int none_or_zero = 0, shared = 0, referenced = 0;
+	enum scan_result result = SCAN_FAIL;
 
 	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
@@ -780,13 +781,13 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
  * @ptl: lock on raw pages' PTEs
  * @compound_pagelist: list that stores compound pages
  */
-static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
+static enum scan_result __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
 		unsigned long address, spinlock_t *ptl,
 		struct list_head *compound_pagelist)
 {
 	unsigned int i;
-	int result = SCAN_SUCCEED;
+	enum scan_result result = SCAN_SUCCEED;
 
 	/*
 	 * Copying pages' contents is subject to memory poison at any iteration.
@@ -898,10 +899,9 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
  * Returns enum scan_result value.
  */
 
-static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
-				   bool expect_anon,
-				   struct vm_area_struct **vmap,
-				   struct collapse_control *cc)
+static enum scan_result hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
+						bool expect_anon, struct vm_area_struct **vmap,
+						struct collapse_control *cc)
 {
 	struct vm_area_struct *vma;
 	enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
@@ -930,7 +930,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return SCAN_SUCCEED;
 }
 
-static inline int check_pmd_state(pmd_t *pmd)
+static inline enum scan_result check_pmd_state(pmd_t *pmd)
 {
 	pmd_t pmde = pmdp_get_lockless(pmd);
 
@@ -953,9 +953,9 @@ static inline int check_pmd_state(pmd_t *pmd)
 	return SCAN_SUCCEED;
 }
 
-static int find_pmd_or_thp_or_none(struct mm_struct *mm,
-				   unsigned long address,
-				   pmd_t **pmd)
+static enum scan_result find_pmd_or_thp_or_none(struct mm_struct *mm,
+						unsigned long address,
+						pmd_t **pmd)
 {
 	*pmd = mm_find_pmd(mm, address);
 	if (!*pmd)
@@ -964,12 +964,12 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 	return check_pmd_state(*pmd);
 }
 
-static int check_pmd_still_valid(struct mm_struct *mm,
-				 unsigned long address,
-				 pmd_t *pmd)
+static enum scan_result check_pmd_still_valid(struct mm_struct *mm,
+					      unsigned long address,
+					      pmd_t *pmd)
 {
 	pmd_t *new_pmd;
-	int result = find_pmd_or_thp_or_none(mm, address, &new_pmd);
+	enum scan_result result = find_pmd_or_thp_or_none(mm, address, &new_pmd);
 
 	if (result != SCAN_SUCCEED)
 		return result;
@@ -985,15 +985,15 @@ static int check_pmd_still_valid(struct mm_struct *mm,
  * Called and returns without pte mapped or spinlocks held.
  * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
  */
-static int __collapse_huge_page_swapin(struct mm_struct *mm,
-				       struct vm_area_struct *vma,
-				       unsigned long start_addr, pmd_t *pmd,
-				       int referenced)
+static enum scan_result __collapse_huge_page_swapin(struct mm_struct *mm,
+						    struct vm_area_struct *vma,
+						    unsigned long start_addr, pmd_t *pmd,
+						    int referenced)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
 	unsigned long addr, end = start_addr + (HPAGE_PMD_NR * PAGE_SIZE);
-	int result;
+	enum scan_result result;
 	pte_t *pte = NULL;
 	spinlock_t *ptl;
 
@@ -1062,8 +1062,8 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 	return result;
 }
 
-static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
-			      struct collapse_control *cc)
+static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
+					   struct collapse_control *cc)
 {
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
 		     GFP_TRANSHUGE);
@@ -1090,9 +1090,9 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
-static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			      int referenced, int unmapped,
-			      struct collapse_control *cc)
+static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long address,
+					   int referenced, int unmapped,
+					   struct collapse_control *cc)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1100,7 +1100,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pgtable_t pgtable;
 	struct folio *folio;
 	spinlock_t *pmd_ptl, *pte_ptl;
-	int result = SCAN_FAIL;
+	enum scan_result result = SCAN_FAIL;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
 
@@ -1246,15 +1246,15 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	return result;
 }
 
-static int hpage_collapse_scan_pmd(struct mm_struct *mm,
-				   struct vm_area_struct *vma,
-				   unsigned long start_addr, bool *mmap_locked,
-				   struct collapse_control *cc)
+static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
+						struct vm_area_struct *vma,
+						unsigned long start_addr, bool *mmap_locked,
+						struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int result = SCAN_FAIL, referenced = 0;
-	int none_or_zero = 0, shared = 0;
+	int none_or_zero = 0, shared = 0, referenced = 0;
+	enum scan_result result = SCAN_FAIL;
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr;
@@ -1441,8 +1441,8 @@ static void collect_mm_slot(struct mm_slot *slot)
 }
 
 /* folio must be locked, and mmap_lock must be held */
-static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
-			pmd_t *pmdp, struct folio *folio, struct page *page)
+static enum scan_result set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
+				     pmd_t *pmdp, struct folio *folio, struct page *page)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_fault vmf = {
@@ -1477,10 +1477,11 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	return SCAN_SUCCEED;
 }
 
-static int try_collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
-				       bool install_pmd)
+static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
+						    bool install_pmd)
 {
-	int nr_mapped_ptes = 0, result = SCAN_FAIL;
+	enum scan_result result = SCAN_FAIL;
+	int nr_mapped_ptes = 0;
 	unsigned int nr_batch_ptes;
 	struct mmu_notifier_range range;
 	bool notified = false;
@@ -1862,9 +1863,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *    + unlock old pages
  *    + unlock and free huge page;
  */
-static int collapse_file(struct mm_struct *mm, unsigned long addr,
-			 struct file *file, pgoff_t start,
-			 struct collapse_control *cc)
+static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
+				      struct file *file, pgoff_t start,
+				      struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *dst;
@@ -1872,7 +1873,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
-	int nr_none = 0, result = SCAN_SUCCEED;
+	enum scan_result result = SCAN_SUCCEED;
+	int nr_none = 0;
 	bool is_shmem = shmem_file(file);
 
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
@@ -2293,16 +2295,16 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
-				    struct file *file, pgoff_t start,
-				    struct collapse_control *cc)
+static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+						 struct file *file, pgoff_t start,
+						 struct collapse_control *cc)
 {
 	struct folio *folio = NULL;
 	struct address_space *mapping = file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, start);
 	int present, swap;
 	int node = NUMA_NO_NODE;
-	int result = SCAN_SUCCEED;
+	enum scan_result result = SCAN_SUCCEED;
 
 	present = 0;
 	swap = 0;
@@ -2400,7 +2402,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
+static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result *result,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2561,7 +2563,7 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
 	bool wait = true;
-	int result = SCAN_SUCCEED;
+	enum scan_result result = SCAN_SUCCEED;
 
 	lru_add_drain_all();
 
@@ -2774,7 +2776,8 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	struct collapse_control *cc;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long hstart, hend, addr;
-	int thps = 0, last_fail = SCAN_FAIL;
+	enum scan_result last_fail = SCAN_FAIL;
+	int thps = 0;
 	bool mmap_locked = true;
 
 	BUG_ON(vma->vm_start > start);
@@ -2796,7 +2799,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		bool retried = false;
-		int result = SCAN_FAIL;
+		enum scan_result result = SCAN_FAIL;
 
 		if (!mmap_locked) {
 retry:
-- 
2.43.0




* [PATCH V2 5/5] mm/khugepaged: make khugepaged_collapse_control static
  2025-12-24 11:13 [PATCH V2 0/5] mm/khugepaged: cleanups and scan limit fix Shivank Garg
                   ` (3 preceding siblings ...)
  2025-12-24 11:13 ` [PATCH V2 4/5] mm/khugepaged: use enum scan_result for result variables and return types Shivank Garg
@ 2025-12-24 11:13 ` Shivank Garg
  4 siblings, 0 replies; 13+ messages in thread
From: Shivank Garg @ 2025-12-24 11:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes
  Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	shivankg, Wei Yang

The global variable 'khugepaged_collapse_control' is not used outside of
mm/khugepaged.c. Make it static to limit its scope.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6892b23d6fc4..4df480a87a74 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -829,7 +829,7 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
-struct collapse_control khugepaged_collapse_control = {
+static struct collapse_control khugepaged_collapse_control = {
 	.is_khugepaged = true,
 };
 
-- 
2.43.0




* Re: [PATCH V2 1/5] mm/khugepaged: remove unnecessary goto 'skip' label
  2025-12-24 11:13 ` [PATCH V2 1/5] mm/khugepaged: remove unnecessary goto 'skip' label Shivank Garg
@ 2025-12-24 11:34   ` Lance Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Lance Yang @ 2025-12-24 11:34 UTC (permalink / raw)
  To: Shivank Garg
  Cc: Zi Yan, Lorenzo Stoakes, Baolin Wang, Andrew Morton,
	Liam R . Howlett, Nico Pache, Ryan Roberts, David Hildenbrand,
	Dev Jain, Barry Song, linux-mm, linux-kernel



On 2025/12/24 19:13, Shivank Garg wrote:
> Replace 'goto skip' with equivalent inline logic for better code readability.
> 
> No functional change.
> 
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---

LGTM.

Reviewed-by: Lance Yang <lance.yang@linux.dev>

>   mm/khugepaged.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6c8c35d3e0c9..107146f012b1 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2442,14 +2442,15 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>   			break;
>   		}
>   		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
> -skip:
>   			progress++;
>   			continue;
>   		}
>   		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
>   		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
> -		if (khugepaged_scan.address > hend)
> -			goto skip;
> +		if (khugepaged_scan.address > hend) {
> +			progress++;
> +			continue;
> +		}
>   		if (khugepaged_scan.address < hstart)
>   			khugepaged_scan.address = hstart;
>   		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);




* Re: [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit
  2025-12-24 11:13 ` [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit Shivank Garg
@ 2025-12-24 11:51   ` Lance Yang
  2025-12-24 14:49     ` Wei Yang
  0 siblings, 1 reply; 13+ messages in thread
From: Lance Yang @ 2025-12-24 11:51 UTC (permalink / raw)
  To: Shivank Garg
  Cc: Zi Yan, Andrew Morton, Baolin Wang, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Lorenzo Stoakes, David Hildenbrand,
	Barry Song, linux-mm, linux-kernel, Wei Yang



On 2025/12/24 19:13, Shivank Garg wrote:
> khugepaged_scan_mm_slot() uses a 'progress' counter to limit the amount
> of work performed; the counter is incremented in three cases:
> 1. Transitioning to a new mm (+1).
> 2. Skipping an unsuitable VMA (+1).
> 3. Scanning a PMD-sized range (+HPAGE_PMD_NR).
> 
> Consider a 1MB VMA sitting between two 2MB alignment boundaries:
> 
>       vma1       vma2   vma3
>      +----------+------+----------+
>      |2M        |1M    |2M        |
>      +----------+------+----------+
>                 ^      ^
>                 start  end
>                 ^
>            hstart,hend
> 
> In this case, for vma2:
>    hstart = round_up(start, HPAGE_PMD_SIZE)  -> Next 2MB alignment
>    hend   = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment
> 
> Currently, since `hend <= hstart`, VMAs that are too small or unaligned
> to contain a hugepage are skipped without incrementing 'progress'.
> A process containing a large number of such small VMAs will unfairly
> consume more CPU cycles before yielding compared to a process with
> fewer, larger, or aligned VMAs.
> 
> Fix this by incrementing progress when the `hend <= hstart` condition
> is met.
> 
> Additionally, change 'progress' type to `unsigned int` to match both
> the 'pages' type and the function return value.
> 
> Suggested-by: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>   mm/khugepaged.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 107146f012b1..0b549c3250f9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>   	struct mm_slot *slot;
>   	struct mm_struct *mm;
>   	struct vm_area_struct *vma;
> -	int progress = 0;
> +	unsigned int progress = 0;
>   
>   	VM_BUG_ON(!pages);
>   	lockdep_assert_held(&khugepaged_mm_lock);
> @@ -2447,7 +2447,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>   		}
>   		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
>   		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
> -		if (khugepaged_scan.address > hend) {

Maybe add a short comment explaining why we increment progress for small 
VMAs ;)

Something like this:

		/* Count small VMAs that can't hold a hugepage towards scan limit */
> +		if (khugepaged_scan.address > hend || hend <= hstart) {
>   			progress++;
>   			continue;
>   		}

Otherwise, looks good to me.

Reviewed-by: Lance Yang <lance.yang@linux.dev>





* Re: [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void
  2025-12-24 11:13 ` [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void Shivank Garg
@ 2025-12-24 12:21   ` Lance Yang
  2025-12-29 16:40   ` Zi Yan
  1 sibling, 0 replies; 13+ messages in thread
From: Lance Yang @ 2025-12-24 12:21 UTC (permalink / raw)
  To: Shivank Garg
  Cc: Zi Yan, David Hildenbrand, Baolin Wang, Liam R . Howlett,
	Nico Pache, Andrew Morton, Ryan Roberts, Dev Jain, Barry Song,
	linux-mm, linux-kernel, Lorenzo Stoakes



On 2025/12/24 19:13, Shivank Garg wrote:
> The only external caller of collapse_pte_mapped_thp() is uprobe, which
> ignores the return value. Change the external API to return void to
> simplify the interface.
> 
> Introduce try_collapse_pte_mapped_thp() for internal use, which preserves
> the return value. This prepares for a future patch that will convert
> the return type to use enum scan_result.
> 
> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---

Looks good overall, thanks!

Acked-by: Lance Yang <lance.yang@linux.dev>



* Re: [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit
  2025-12-24 11:51   ` Lance Yang
@ 2025-12-24 14:49     ` Wei Yang
  2025-12-28 17:58       ` Garg, Shivank
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Yang @ 2025-12-24 14:49 UTC (permalink / raw)
  To: Lance Yang
  Cc: Shivank Garg, Zi Yan, Andrew Morton, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Lorenzo Stoakes, David Hildenbrand, Barry Song, linux-mm,
	linux-kernel, Wei Yang

On Wed, Dec 24, 2025 at 07:51:36PM +0800, Lance Yang wrote:
>
>
>On 2025/12/24 19:13, Shivank Garg wrote:
>> khugepaged_scan_mm_slot() uses a 'progress' counter to limit the amount
>> of work performed; the counter is incremented in three cases:
>> 1. Transitioning to a new mm (+1).

Hmm... maybe it is not only transitioning to a new mm, but also resuming the
scan from the last mm.

Since the default khugepaged_pages_to_scan is 8 PMDs, it looks very possible.

>> 2. Skipping an unsuitable VMA (+1).
>> 3. Scanning a PMD-sized range (+HPAGE_PMD_NR).
>> 
>> Consider a 1MB VMA sitting between two 2MB alignment boundaries:
>> 
>>       vma1       vma2   vma3
>>      +----------+------+----------+
>>      |2M        |1M    |2M        |
>>      +----------+------+----------+
>>                 ^      ^
>>                 start  end
>>                 ^
>>            hstart,hend
>> 
>> In this case, for vma2:
>>    hstart = round_up(start, HPAGE_PMD_SIZE)  -> Next 2MB alignment
>>    hend   = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment
>> 
>> Currently, since `hend <= hstart`, VMAs that are too small or unaligned
>> to contain a hugepage are skipped without incrementing 'progress'.
>> A process containing a large number of such small VMAs will unfairly
>> consume more CPU cycles before yielding compared to a process with
>> fewer, larger, or aligned VMAs.
>> 
>> Fix this by incrementing progress when the `hend <= hstart` condition
>> is met.
>> 
>> Additionally, change 'progress' type to `unsigned int` to match both
>> the 'pages' type and the function return value.
>> 
>> Suggested-by: Wei Yang <richard.weiyang@gmail.com>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> ---
>>   mm/khugepaged.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 107146f012b1..0b549c3250f9 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>   	struct mm_slot *slot;
>>   	struct mm_struct *mm;
>>   	struct vm_area_struct *vma;
>> -	int progress = 0;
>> +	unsigned int progress = 0;
>>   	VM_BUG_ON(!pages);
>>   	lockdep_assert_held(&khugepaged_mm_lock);
>> @@ -2447,7 +2447,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>   		}
>>   		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
>>   		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
>> -		if (khugepaged_scan.address > hend) {
>
>Maybe add a short comment explaining why we increment progress for small VMAs
>;)
>
>Something like this:
>
>		/* Count small VMAs that can't hold a hugepage towards scan limit */
>> +		if (khugepaged_scan.address > hend || hend <= hstart) {
>>   			progress++;
>>   			continue;
>>   		}
>
>Otherwise, looks good to me.
>
>Reviewed-by: Lance Yang <lance.yang@linux.dev>
>

The code change LGTM.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>

-- 
Wei Yang
Help you, Help me



* Re: [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit
  2025-12-24 14:49     ` Wei Yang
@ 2025-12-28 17:58       ` Garg, Shivank
  0 siblings, 0 replies; 13+ messages in thread
From: Garg, Shivank @ 2025-12-28 17:58 UTC (permalink / raw)
  To: Wei Yang, Lance Yang
  Cc: Zi Yan, Andrew Morton, Baolin Wang, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Lorenzo Stoakes, David Hildenbrand,
	Barry Song, linux-mm, linux-kernel




On 12/24/2025 8:19 PM, Wei Yang wrote:
> On Wed, Dec 24, 2025 at 07:51:36PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/12/24 19:13, Shivank Garg wrote:
>>> khugepaged_scan_mm_slot() uses a 'progress' counter to limit the amount
>>> of work performed; the counter is incremented in three cases:
>>> 1. Transitioning to a new mm (+1).
> 
> Hmm... maybe it is not only transitioning to a new mm, but also resuming the
> scan from the last mm.
> 
> Since the default khugepaged_pages_to_scan is 8 PMDs, it looks very possible.
> 
It makes sense, will correct this.

>>> 2. Skipping an unsuitable VMA (+1).
>>> 3. Scanning a PMD-sized range (+HPAGE_PMD_NR).
>>>
>>> Consider a 1MB VMA sitting between two 2MB alignment boundaries:
>>>
>>>       vma1       vma2   vma3
>>>      +----------+------+----------+
>>>      |2M        |1M    |2M        |
>>>      +----------+------+----------+
>>>                 ^      ^
>>>                 start  end
>>>                 ^
>>>            hstart,hend
>>>
>>> In this case, for vma2:
>>>    hstart = round_up(start, HPAGE_PMD_SIZE)  -> Next 2MB alignment
>>>    hend   = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment
>>>
>>> Currently, since `hend <= hstart`, VMAs that are too small or unaligned
>>> to contain a hugepage are skipped without incrementing 'progress'.
>>> A process containing a large number of such small VMAs will unfairly
>>> consume more CPU cycles before yielding compared to a process with
>>> fewer, larger, or aligned VMAs.
>>>
>>> Fix this by incrementing progress when the `hend <= hstart` condition
>>> is met.
>>>
>>> Additionally, change 'progress' type to `unsigned int` to match both
>>> the 'pages' type and the function return value.
>>>
>>> Suggested-by: Wei Yang <richard.weiyang@gmail.com>
>>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>>> ---
>>>   mm/khugepaged.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 107146f012b1..0b549c3250f9 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>>   	struct mm_slot *slot;
>>>   	struct mm_struct *mm;
>>>   	struct vm_area_struct *vma;
>>> -	int progress = 0;
>>> +	unsigned int progress = 0;
>>>   	VM_BUG_ON(!pages);
>>>   	lockdep_assert_held(&khugepaged_mm_lock);
>>> @@ -2447,7 +2447,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>>   		}
>>>   		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
>>>   		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
>>> -		if (khugepaged_scan.address > hend) {
>>
>> Maybe add a short comment explaining why we increment progress for small VMAs
>> ;)
>>
>> Something like this:
>>
>> 		/* Count small VMAs that can't hold a hugepage towards scan limit */

I'll add an explanation.

>>> +		if (khugepaged_scan.address > hend || hend <= hstart) {
>>>   			progress++;
>>>   			continue;
>>>   		}
>>
>> Otherwise, looks good to me.
>>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>
> 
> The code change LGTM.
> 
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> 

Thanks Lance and Wei. I have made the suggested changes.


[-- Attachment #2: 0002-mm-khugepaged-count-small-VMAs-towards-scan-limit.patch --]
[-- Type: text/plain, Size: 2550 bytes --]

From d464604c09cef70f0f2aa0f9607a977b4bcd7081 Mon Sep 17 00:00:00 2001
From: Shivank Garg <shivankg@amd.com>
Date: Wed, 17 Dec 2025 19:43:26 +0000
Subject: [PATCH V2 2/5] mm/khugepaged: count small VMAs towards scan limit

khugepaged_scan_mm_slot() uses a 'progress' counter to limit the amount
of work performed; the counter is incremented in three cases:
1. Starting/resuming scan of mm slot (+1).
2. Skipping an unsuitable VMA (+1).
3. Scanning a PMD-sized range (+HPAGE_PMD_NR).

Consider a 1MB VMA sitting between two 2MB alignment boundaries:

     vma1       vma2   vma3
    +----------+------+----------+
    |2M        |1M    |2M        |
    +----------+------+----------+
               ^      ^
               start  end
               ^
          hstart,hend

In this case, for vma2:
  hstart = round_up(start, HPAGE_PMD_SIZE)  -> Next 2MB alignment
  hend   = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment

Currently, since `hend <= hstart`, VMAs that are too small or unaligned
to contain a hugepage are skipped without incrementing 'progress'.
A process containing a large number of such small VMAs will unfairly
consume more CPU cycles before yielding compared to a process with
fewer, larger, or aligned VMAs.

Fix this by incrementing progress when the `hend <= hstart` condition
is met.

Additionally, change 'progress' to `unsigned int`. This matches both
the 'pages' type and the function return value.

Suggested-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 107146f012b1..155281c49169 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 	struct mm_slot *slot;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
-	int progress = 0;
+	unsigned int progress = 0;
 
 	VM_BUG_ON(!pages);
 	lockdep_assert_held(&khugepaged_mm_lock);
@@ -2447,7 +2447,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		}
 		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
-		if (khugepaged_scan.address > hend) {
+		if (khugepaged_scan.address > hend || hend <= hstart) {
+			/* VMA already scanned or too small/unaligned for hugepage. */
 			progress++;
 			continue;
 		}
-- 
2.43.0



* Re: [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void
  2025-12-24 11:13 ` [PATCH V2 3/5] mm/khugepaged: change collapse_pte_mapped_thp() to return void Shivank Garg
  2025-12-24 12:21   ` Lance Yang
@ 2025-12-29 16:40   ` Zi Yan
  1 sibling, 0 replies; 13+ messages in thread
From: Zi Yan @ 2025-12-29 16:40 UTC (permalink / raw)
  To: Shivank Garg
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel

On 24 Dec 2025, at 6:13, Shivank Garg wrote:

> The only external caller of collapse_pte_mapped_thp() is uprobe, which
> ignores the return value. Change the external API to return void to
> simplify the interface.
>
> Introduce try_collapse_pte_mapped_thp() for internal use, which preserves
> the return value. This prepares for a future patch that will convert
> the return type to use enum scan_result.
>
> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  include/linux/khugepaged.h |  9 ++++-----
>  mm/khugepaged.c            | 40 ++++++++++++++++++++++----------------
>  2 files changed, 27 insertions(+), 22 deletions(-)
>
LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi



* Re: [PATCH V2 4/5] mm/khugepaged: use enum scan_result for result variables and return types
  2025-12-24 11:13 ` [PATCH V2 4/5] mm/khugepaged: use enum scan_result for result variables and return types Shivank Garg
@ 2025-12-29 16:41   ` Zi Yan
  0 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2025-12-29 16:41 UTC (permalink / raw)
  To: Shivank Garg
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel

On 24 Dec 2025, at 6:13, Shivank Garg wrote:

> Convert result variables and return types from int to enum scan_result
> throughout khugepaged code. This improves type safety and code clarity
> by making the intent explicit.
>
> No functional change.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>  mm/khugepaged.c | 111 +++++++++++++++++++++++++-----------------------
>  1 file changed, 57 insertions(+), 54 deletions(-)
>
Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi


