* [PATCH v3 0/6] support batched checking of the young flag for MGLRU
@ 2026-03-06  6:43 Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 1/6] mm: use inline helper functions instead of ugly macros Baolin Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

This is a follow-up to the previous work [1], to support batched checking
of the young flag for MGLRU.

Similarly, batched checking of the young flag for large folios can improve
performance during large folio reclamation when MGLRU is enabled. I
observed noticeable performance improvements (see patch 6) on an Arm64
machine that supports contiguous PTEs. All mm selftests pass.

Patches 1-3: cleanup patches.
Patch 4: add a new generic batched PTE helper: test_and_clear_young_ptes().
Patch 5: support batched young flag checking for MGLRU.
Patch 6: implement the Arm64 arch-specific test_and_clear_young_ptes().

[1] https://lore.kernel.org/all/cover.1770645603.git.baolin.wang@linux.alibaba.com/

Changes from v2:
v2: https://lore.kernel.org/all/cover.1772185080.git.baolin.wang@linux.alibaba.com/
 - Update the commit message of patch 5 (per David).
 - Fix some coding style issues (per David).
 - Remove 'cur_pte' variable in lru_gen_look_around() (per David).
 - Move 'ptes += nr;' to the suitable place in folio_referenced_one() (per David).
 - Add acked tag from David. Thanks.

Changes from v1:
v1: https://lore.kernel.org/all/cover.1771897150.git.baolin.wang@linux.alibaba.com/
 - Update some commit messages (per David).
 - Fix some coding style issues (per David).
 - Use VM_WARN_ON_ONCE_FOLIO() instead of VM_WARN_ON_FOLIO() (per Rik).
 - Define a generic ptep_test_and_clear_young_notify() (per David).
 - Remove redundant 'nr > 1' check in folio_referenced_one() (per David).
 - Rename some variables to make the code clearer (per David).
 - Add a new patch to rename some functions (per David).
 - Update the young counters with the batched count for MGLRU.
 - Collect reviewed tags from Rik, Barry, Alistair and David. Thanks.

Baolin Wang (6):
  mm: use inline helper functions instead of ugly macros
  mm: rename ptep/pmdp_clear_young_notify() to
    ptep/pmdp_test_and_clear_young_notify()
  mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced()
  mm: add a batched helper to clear the young flag for large folios
  mm: support batched checking of the young flag for MGLRU
  arm64: mm: implement the architecture-specific
    test_and_clear_young_ptes()

 arch/arm64/include/asm/pgtable.h | 18 +++++++----
 include/linux/mmu_notifier.h     | 54 --------------------------------
 include/linux/mmzone.h           |  5 +--
 include/linux/pgtable.h          | 37 ++++++++++++++++++++++
 mm/internal.h                    | 52 ++++++++++++++++++++++++++++++
 mm/rmap.c                        | 29 ++++++++---------
 mm/vmscan.c                      | 45 +++++++++++++++++++-------
 7 files changed, 152 insertions(+), 88 deletions(-)

-- 
2.47.3




* [PATCH v3 1/6] mm: use inline helper functions instead of ugly macros
  2026-03-06  6:43 [PATCH v3 0/6] support batched checking of the young flag for MGLRU Baolin Wang
@ 2026-03-06  6:43 ` Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 2/6] mm: rename ptep/pmdp_clear_young_notify() to ptep/pmdp_test_and_clear_young_notify() Baolin Wang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

People have already complained that the *_clear_young_notify() family of
macros is very ugly, so convert them to inline helpers to make them more
readable.

These inline helpers cannot be implemented in mmu_notifier.h, because some
arch-specific headers (e.g., arch/arm64/include/asm/tlbflush.h) include
mmu_notifier.h, which introduces header dependencies and causes build
errors. Moreover, since these functions are only used within mm,
implementing them in the mm/internal.h header seems reasonable.

Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/mmu_notifier.h | 54 ------------------------------------
 mm/internal.h                | 52 ++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 54 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 8450e18a87c2..3705d350c863 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -516,55 +516,6 @@ static inline void mmu_notifier_range_init_owner(
 	range->owner = owner;
 }
 
-#define clear_flush_young_ptes_notify(__vma, __address, __ptep, __nr)	\
-({									\
-	int __young;							\
-	struct vm_area_struct *___vma = __vma;				\
-	unsigned long ___address = __address;				\
-	unsigned int ___nr = __nr;					\
-	__young = clear_flush_young_ptes(___vma, ___address, __ptep, ___nr);	\
-	__young |= mmu_notifier_clear_flush_young(___vma->vm_mm,	\
-						  ___address,		\
-						  ___address +		\
-						  ___nr * PAGE_SIZE);	\
-	__young;							\
-})
-
-#define pmdp_clear_flush_young_notify(__vma, __address, __pmdp)		\
-({									\
-	int __young;							\
-	struct vm_area_struct *___vma = __vma;				\
-	unsigned long ___address = __address;				\
-	__young = pmdp_clear_flush_young(___vma, ___address, __pmdp);	\
-	__young |= mmu_notifier_clear_flush_young(___vma->vm_mm,	\
-						  ___address,		\
-						  ___address +		\
-							PMD_SIZE);	\
-	__young;							\
-})
-
-#define ptep_clear_young_notify(__vma, __address, __ptep)		\
-({									\
-	int __young;							\
-	struct vm_area_struct *___vma = __vma;				\
-	unsigned long ___address = __address;				\
-	__young = ptep_test_and_clear_young(___vma, ___address, __ptep);\
-	__young |= mmu_notifier_clear_young(___vma->vm_mm, ___address,	\
-					    ___address + PAGE_SIZE);	\
-	__young;							\
-})
-
-#define pmdp_clear_young_notify(__vma, __address, __pmdp)		\
-({									\
-	int __young;							\
-	struct vm_area_struct *___vma = __vma;				\
-	unsigned long ___address = __address;				\
-	__young = pmdp_test_and_clear_young(___vma, ___address, __pmdp);\
-	__young |= mmu_notifier_clear_young(___vma->vm_mm, ___address,	\
-					    ___address + PMD_SIZE);	\
-	__young;							\
-})
-
 #else /* CONFIG_MMU_NOTIFIER */
 
 struct mmu_notifier_range {
@@ -652,11 +603,6 @@ static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
 
 #define mmu_notifier_range_update_to_read_only(r) false
 
-#define clear_flush_young_ptes_notify clear_flush_young_ptes
-#define pmdp_clear_flush_young_notify pmdp_clear_flush_young
-#define ptep_clear_young_notify ptep_test_and_clear_young
-#define pmdp_clear_young_notify pmdp_test_and_clear_young
-
 static inline void mmu_notifier_synchronize(void)
 {
 }
diff --git a/mm/internal.h b/mm/internal.h
index 0d5208101762..05eb0303f277 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -11,6 +11,7 @@
 #include <linux/khugepaged.h>
 #include <linux/mm.h>
 #include <linux/mm_inline.h>
+#include <linux/mmu_notifier.h>
 #include <linux/pagemap.h>
 #include <linux/pagewalk.h>
 #include <linux/rmap.h>
@@ -1796,4 +1797,55 @@ static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
 	return remap_pfn_range_complete(vma, addr, pfn, size, prot);
 }
 
+#ifdef CONFIG_MMU_NOTIFIER
+static inline int clear_flush_young_ptes_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+	int young;
+
+	young = clear_flush_young_ptes(vma, addr, ptep, nr);
+	young |= mmu_notifier_clear_flush_young(vma->vm_mm, addr,
+						addr + nr * PAGE_SIZE);
+	return young;
+}
+
+static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+	int young;
+
+	young = pmdp_clear_flush_young(vma, addr, pmdp);
+	young |= mmu_notifier_clear_flush_young(vma->vm_mm, addr, addr + PMD_SIZE);
+	return young;
+}
+
+static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
+{
+	int young;
+
+	young = ptep_test_and_clear_young(vma, addr, ptep);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+	return young;
+}
+
+static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+	int young;
+
+	young = pmdp_test_and_clear_young(vma, addr, pmdp);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PMD_SIZE);
+	return young;
+}
+
+#else /* CONFIG_MMU_NOTIFIER */
+
+#define clear_flush_young_ptes_notify	clear_flush_young_ptes
+#define pmdp_clear_flush_young_notify	pmdp_clear_flush_young
+#define ptep_clear_young_notify	ptep_test_and_clear_young
+#define pmdp_clear_young_notify	pmdp_test_and_clear_young
+
+#endif /* CONFIG_MMU_NOTIFIER */
+
 #endif	/* __MM_INTERNAL_H */
-- 
2.47.3




* [PATCH v3 2/6] mm: rename ptep/pmdp_clear_young_notify() to ptep/pmdp_test_and_clear_young_notify()
  2026-03-06  6:43 [PATCH v3 0/6] support batched checking of the young flag for MGLRU Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 1/6] mm: use inline helper functions instead of ugly macros Baolin Wang
@ 2026-03-06  6:43 ` Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 3/6] mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced() Baolin Wang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

Rename ptep/pmdp_clear_young_notify() to ptep/pmdp_test_and_clear_young_notify()
to make the function names consistent.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/internal.h | 8 ++++----
 mm/vmscan.c   | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 05eb0303f277..f45f97df0d28 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1819,7 +1819,7 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
 	return young;
 }
 
-static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
+static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *ptep)
 {
 	int young;
@@ -1829,7 +1829,7 @@ static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
 	return young;
 }
 
-static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
+static inline int pmdp_test_and_clear_young_notify(struct vm_area_struct *vma,
 		unsigned long addr, pmd_t *pmdp)
 {
 	int young;
@@ -1843,8 +1843,8 @@ static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
 
 #define clear_flush_young_ptes_notify	clear_flush_young_ptes
 #define pmdp_clear_flush_young_notify	pmdp_clear_flush_young
-#define ptep_clear_young_notify	ptep_test_and_clear_young
-#define pmdp_clear_young_notify	pmdp_test_and_clear_young
+#define ptep_test_and_clear_young_notify	ptep_test_and_clear_young
+#define pmdp_test_and_clear_young_notify	pmdp_test_and_clear_young
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 52adb37d1b01..e3425b4db755 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3504,7 +3504,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (!folio)
 			continue;
 
-		if (!ptep_clear_young_notify(args->vma, addr, pte + i))
+		if (!ptep_test_and_clear_young_notify(args->vma, addr, pte + i))
 			continue;
 
 		if (last != folio) {
@@ -3595,7 +3595,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		if (!folio)
 			goto next;
 
-		if (!pmdp_clear_young_notify(vma, addr, pmd + i))
+		if (!pmdp_test_and_clear_young_notify(vma, addr, pmd + i))
 			goto next;
 
 		if (last != folio) {
@@ -4185,7 +4185,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
 
-	if (!ptep_clear_young_notify(vma, addr, pte))
+	if (!ptep_test_and_clear_young_notify(vma, addr, pte))
 		return false;
 
 	if (spin_is_contended(pvmw->ptl))
@@ -4237,7 +4237,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		if (!folio)
 			continue;
 
-		if (!ptep_clear_young_notify(vma, addr, pte + i))
+		if (!ptep_test_and_clear_young_notify(vma, addr, pte + i))
 			continue;
 
 		if (last != folio) {
-- 
2.47.3




* [PATCH v3 3/6] mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced()
  2026-03-06  6:43 [PATCH v3 0/6] support batched checking of the young flag for MGLRU Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 1/6] mm: use inline helper functions instead of ugly macros Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 2/6] mm: rename ptep/pmdp_clear_young_notify() to ptep/pmdp_test_and_clear_young_notify() Baolin Wang
@ 2026-03-06  6:43 ` Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 4/6] mm: add a batched helper to clear the young flag for large folios Baolin Wang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

folio_referenced() is used to test whether a folio was referenced during
reclaim. ZONE_DEVICE folios are controlled by their device driver, have a
lifetime tied to that driver, and are never placed on the LRU list. That
means we should never try to reclaim ZONE_DEVICE folios, so add a warning
in folio_referenced() to catch this unexpected behavior and avoid confusion,
as discussed in the previous thread [1].

[1] https://lore.kernel.org/all/16fb7985-ec0f-4b56-91e7-404c5114f899@kernel.org/
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/rmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index 603186ff4ba5..2d94b3ba52da 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1065,6 +1065,7 @@ int folio_referenced(struct folio *folio, int is_locked,
 		.invalid_vma = invalid_folio_referenced_vma,
 	};
 
+	VM_WARN_ON_ONCE_FOLIO(folio_is_zone_device(folio), folio);
 	*vm_flags = 0;
 	if (!pra.mapcount)
 		return 0;
-- 
2.47.3




* [PATCH v3 4/6] mm: add a batched helper to clear the young flag for large folios
  2026-03-06  6:43 [PATCH v3 0/6] support batched checking of the young flag for MGLRU Baolin Wang
                   ` (2 preceding siblings ...)
  2026-03-06  6:43 ` [PATCH v3 3/6] mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced() Baolin Wang
@ 2026-03-06  6:43 ` Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 5/6] mm: support batched checking of the young flag for MGLRU Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 6/6] arm64: mm: implement the architecture-specific test_and_clear_young_ptes() Baolin Wang
  5 siblings, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

Currently, MGLRU calls ptep_test_and_clear_young_notify() to check and
clear the young flag for each PTE sequentially, which is inefficient for
large folio reclamation.

Moreover, on Arm64, which supports contiguous PTEs, the Arm64-specific
ptep_test_and_clear_young() already implements an optimization to clear
the young flags for PTEs within a contiguous range. However, this is not
sufficient. Similar to the Arm64-specific clear_flush_young_ptes(), we can
extend this to perform batched operations for the entire large folio (which
might exceed the contiguous range: CONT_PTE_SIZE).

Thus, introduce a new batched helper, test_and_clear_young_ptes(), and its
wrapper, test_and_clear_young_ptes_notify(), which are consistent with the
existing functions, to perform batched checking of the young flags for large
folios. This can help improve performance during large folio reclamation when
MGLRU is enabled. The generic helper will be overridden in a following patch
by an architecture that implements a more efficient batch operation.
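
As a rough illustration of the generic fallback described above, the
following user-space model clears the young bit in a run of mock PTEs and
reports whether any of them was young. The mock_pte_t type, the
MOCK_PTE_YOUNG bit position, and the mock_*() function names are hypothetical
stand-ins chosen for this sketch, not kernel interfaces; the real helper works
on pte_t with the page table lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for pte_t and its accessed ("young") bit. */
typedef uint64_t mock_pte_t;
#define MOCK_PTE_YOUNG ((mock_pte_t)1 << 10)

/* Model of ptep_test_and_clear_young(): report and clear one young bit. */
static inline bool mock_ptep_test_and_clear_young(mock_pte_t *ptep)
{
	bool young = (*ptep & MOCK_PTE_YOUNG) != 0;

	*ptep &= ~MOCK_PTE_YOUNG;
	return young;
}

/*
 * Model of the generic test_and_clear_young_ptes() fallback: clear the
 * young bit in nr consecutive entries and report whether any was set.
 * Like the loop in the patch, this requires nr >= 1.
 */
static inline bool mock_test_and_clear_young_ptes(mock_pte_t *ptep,
						  unsigned int nr)
{
	bool young = false;

	assert(nr >= 1);
	for (;;) {
		young |= mock_ptep_test_and_clear_young(ptep);
		if (--nr == 0)
			break;
		ptep++;
	}
	return young;
}
```

The single-entry case is simply a call with nr == 1, which is how this patch
keeps ptep_test_and_clear_young_notify() as a thin wrapper.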

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/pgtable.h | 37 +++++++++++++++++++++++++++++++++++++
 mm/internal.h           | 16 +++++++++++-----
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d2767a4c027b..17d961c612fc 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1103,6 +1103,43 @@ static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
 }
 #endif
 
+#ifndef test_and_clear_young_ptes
+/**
+ * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
+ *			       folio as old
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear access bit.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_test_and_clear_young().
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock.  The PTEs map consecutive
+ * pages that belong to the same folio.  The PTEs are all in the same PMD.
+ *
+ * Returns: whether any PTE was young.
+ */
+static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+	int young = 0;
+
+	for (;;) {
+		young |= ptep_test_and_clear_young(vma, addr, ptep);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+	}
+
+	return young;
+}
+#endif
+
 /*
  * On some architectures hardware does not set page access bit when accessing
  * memory page, it is responsibility of software setting this bit. It brings
diff --git a/mm/internal.h b/mm/internal.h
index f45f97df0d28..8cdd5d8e43fb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1819,13 +1819,13 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
 	return young;
 }
 
-static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *ptep)
+static inline int test_and_clear_young_ptes_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	int young;
 
-	young = ptep_test_and_clear_young(vma, addr, ptep);
-	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);
 	return young;
 }
 
@@ -1843,9 +1843,15 @@ static inline int pmdp_test_and_clear_young_notify(struct vm_area_struct *vma,
 
 #define clear_flush_young_ptes_notify	clear_flush_young_ptes
 #define pmdp_clear_flush_young_notify	pmdp_clear_flush_young
-#define ptep_test_and_clear_young_notify	ptep_test_and_clear_young
+#define test_and_clear_young_ptes_notify	test_and_clear_young_ptes
 #define pmdp_test_and_clear_young_notify	pmdp_test_and_clear_young
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
+static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
+{
+	return test_and_clear_young_ptes_notify(vma, addr, ptep, 1);
+}
+
 #endif	/* __MM_INTERNAL_H */
-- 
2.47.3




* [PATCH v3 5/6] mm: support batched checking of the young flag for MGLRU
  2026-03-06  6:43 [PATCH v3 0/6] support batched checking of the young flag for MGLRU Baolin Wang
                   ` (3 preceding siblings ...)
  2026-03-06  6:43 ` [PATCH v3 4/6] mm: add a batched helper to clear the young flag for large folios Baolin Wang
@ 2026-03-06  6:43 ` Baolin Wang
  2026-03-06  6:43 ` [PATCH v3 6/6] arm64: mm: implement the architecture-specific test_and_clear_young_ptes() Baolin Wang
  5 siblings, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

Use the batched helper test_and_clear_young_ptes_notify() to check and clear
the young flag to improve the performance during large folio reclamation when
MGLRU is enabled.

Meanwhile, we can also support batched checking of the young and dirty flags
when MGLRU walks the mm's page tables to update the folios' generation
counter. Since MGLRU also checks the PTE dirty bit, use folio_pte_batch_flags()
with FPB_MERGE_YOUNG_DIRTY set to detect batches of PTEs for a large folio.
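
A batch must not run past the current PMD's page table or the end of the VMA,
which is why folio_referenced_one() derives max_nr from pmd_addr_end(). The
user-space sketch below models only that clamp. It assumes 4 KiB pages and
2 MiB PMDs, which depend on the kernel configuration, and the MODEL_* macros
and model_*() names are hypothetical, not kernel interfaces.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB pages and 2 MiB PMDs (configuration dependent). */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_SIZE   (1UL << 21)
#define MODEL_PMD_MASK   (~(MODEL_PMD_SIZE - 1))

/* Model of pmd_addr_end(): the next PMD boundary, capped at end. */
static inline unsigned long model_pmd_addr_end(unsigned long addr,
					       unsigned long end)
{
	unsigned long boundary = (addr + MODEL_PMD_SIZE) & MODEL_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Upper bound on the number of PTEs one batch starting at addr may cover. */
static inline unsigned int model_max_nr(unsigned long addr,
					unsigned long vma_end)
{
	assert(addr < vma_end);
	return (model_pmd_addr_end(addr, vma_end) - addr) >> MODEL_PAGE_SHIFT;
}
```

Under these assumptions, an address one page below a PMD boundary yields a
bound of 1, so the batch never spans two page tables.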

Since ptep_test_and_clear_young_notify() no longer has any users, remove
it.

Note that we also update the 'young' counter and the 'mm_stats[MM_LEAF_YOUNG]'
counter with the batched count in lru_gen_look_around() and walk_pte_range().
However, the batched operations may inflate these two counters, because not
all PTEs in a large folio may have been accessed. (Additionally, tracking
how many PTEs have been accessed within a large folio is not very meaningful,
since the mm core actually tracks access/dirty on a per-folio basis, not per
page.) The impact analysis is as follows:

1. The 'mm_stats[MM_LEAF_YOUNG]' counter has no functional impact and is
mainly for debugging.

2. The 'young' counter is used to decide whether to place the current PMD
entry into the bloom filters by suitable_to_scan() (so that next time we
can check whether it has been accessed again), which may set the hash bit
in the bloom filters for a PMD entry that hasn’t seen much access. However,
bloom filters inherently allow some error, so this effect appears negligible.
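
The inflation described above can be seen in a small user-space model that
compares per-PTE accounting with batched accounting over the same set of
young bits. The function names and the bool-array representation are
hypothetical and chosen only for this sketch; the kernel code counts while
walking real page tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-PTE accounting: count each entry whose young bit was set. */
static inline unsigned int count_young_per_pte(const bool *young_bits,
					       unsigned int nr)
{
	unsigned int young = 0;

	for (unsigned int i = 0; i < nr; i++)
		if (young_bits[i])
			young++;
	return young;
}

/* Batched accounting: if any entry in the batch was young, add nr. */
static inline unsigned int count_young_batched(const bool *young_bits,
					       unsigned int nr)
{
	assert(nr >= 1);
	for (unsigned int i = 0; i < nr; i++)
		if (young_bits[i])
			return nr;
	return 0;
}
```

For a 16-PTE batch in which a single entry was accessed, the per-PTE count is
1 while the batched count is 16; a batch with no young entries contributes 0
either way.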

Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/mmzone.h |  5 +++--
 mm/internal.h          |  6 ------
 mm/rmap.c              | 28 +++++++++++++--------------
 mm/vmscan.c            | 43 +++++++++++++++++++++++++++++++-----------
 4 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b53b95abe287..7bd0134c241c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -684,7 +684,7 @@ struct lru_gen_memcg {
 
 void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
-bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw, unsigned int nr);
 
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
@@ -706,7 +706,8 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
 
-static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw,
+		unsigned int nr)
 {
 	return false;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 8cdd5d8e43fb..95b583e7e4f7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1848,10 +1848,4 @@ static inline int pmdp_test_and_clear_young_notify(struct vm_area_struct *vma,
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
-static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *ptep)
-{
-	return test_and_clear_young_ptes_notify(vma, addr, ptep, 1);
-}
-
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/rmap.c b/mm/rmap.c
index 2d94b3ba52da..6398d7eef393 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -958,25 +958,20 @@ static bool folio_referenced_one(struct folio *folio,
 			return false;
 		}
 
+		if (pvmw.pte && folio_test_large(folio)) {
+			const unsigned long end_addr = pmd_addr_end(address, vma->vm_end);
+			const unsigned int max_nr = (end_addr - address) >> PAGE_SHIFT;
+			pte_t pteval = ptep_get(pvmw.pte);
+
+			nr = folio_pte_batch(folio, pvmw.pte, pteval, max_nr);
+		}
+
 		if (lru_gen_enabled() && pvmw.pte) {
-			if (lru_gen_look_around(&pvmw))
+			if (lru_gen_look_around(&pvmw, nr))
 				referenced++;
 		} else if (pvmw.pte) {
-			if (folio_test_large(folio)) {
-				unsigned long end_addr = pmd_addr_end(address, vma->vm_end);
-				unsigned int max_nr = (end_addr - address) >> PAGE_SHIFT;
-				pte_t pteval = ptep_get(pvmw.pte);
-
-				nr = folio_pte_batch(folio, pvmw.pte,
-						     pteval, max_nr);
-			}
-
-			ptes += nr;
 			if (clear_flush_young_ptes_notify(vma, address, pvmw.pte, nr))
 				referenced++;
-			/* Skip the batched PTEs */
-			pvmw.pte += nr - 1;
-			pvmw.address += (nr - 1) * PAGE_SIZE;
 		} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 			if (pmdp_clear_flush_young_notify(vma, address,
 						pvmw.pmd))
@@ -986,6 +981,7 @@ static bool folio_referenced_one(struct folio *folio,
 			WARN_ON_ONCE(1);
 		}
 
+		ptes += nr;
 		pra->mapcount -= nr;
 		/*
 		 * If we are sure that we batched the entire folio,
@@ -995,6 +991,10 @@ static bool folio_referenced_one(struct folio *folio,
 			page_vma_mapped_walk_done(&pvmw);
 			break;
 		}
+
+		/* Skip the batched PTEs */
+		pvmw.pte += nr - 1;
+		pvmw.address += (nr - 1) * PAGE_SIZE;
 	}
 
 	if (referenced)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e3425b4db755..33287ba4a500 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3470,6 +3470,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int gen = lru_gen_from_seq(max_seq);
+	unsigned int nr;
 	pmd_t pmdval;
 
 	pte = pte_offset_map_rw_nolock(args->mm, pmd, start & PMD_MASK, &pmdval, &ptl);
@@ -3488,11 +3489,13 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 
 	lazy_mmu_mode_enable();
 restart:
-	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
+	for (i = pte_index(start), addr = start; addr != end; i += nr, addr += nr * PAGE_SIZE) {
 		unsigned long pfn;
 		struct folio *folio;
-		pte_t ptent = ptep_get(pte + i);
+		pte_t *cur_pte = pte + i;
+		pte_t ptent = ptep_get(cur_pte);
 
+		nr = 1;
 		total++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;
 
@@ -3504,7 +3507,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young_notify(args->vma, addr, pte + i))
+		if (folio_test_large(folio)) {
+			const unsigned int max_nr = (end - addr) >> PAGE_SHIFT;
+
+			nr = folio_pte_batch_flags(folio, NULL, cur_pte, &ptent,
+						   max_nr, FPB_MERGE_YOUNG_DIRTY);
+			total += nr - 1;
+			walk->mm_stats[MM_LEAF_TOTAL] += nr - 1;
+		}
+
+		if (!test_and_clear_young_ptes_notify(args->vma, addr, cur_pte, nr))
 			continue;
 
 		if (last != folio) {
@@ -3517,8 +3529,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (pte_dirty(ptent))
 			dirty = true;
 
-		young++;
-		walk->mm_stats[MM_LEAF_YOUNG]++;
+		young += nr;
+		walk->mm_stats[MM_LEAF_YOUNG] += nr;
 	}
 
 	walk_update_folio(walk, last, gen, dirty);
@@ -4162,7 +4174,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  * the PTE table to the Bloom filter. This forms a feedback loop between the
  * eviction and the aging.
  */
-bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw, unsigned int nr)
 {
 	int i;
 	bool dirty;
@@ -4185,7 +4197,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
 
-	if (!ptep_test_and_clear_young_notify(vma, addr, pte))
+	if (!test_and_clear_young_ptes_notify(vma, addr, pte, nr))
 		return false;
 
 	if (spin_is_contended(pvmw->ptl))
@@ -4225,10 +4237,12 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	pte -= (addr - start) / PAGE_SIZE;
 
-	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
+	for (i = 0, addr = start; addr != end;
+	     i += nr, pte += nr, addr += nr * PAGE_SIZE) {
 		unsigned long pfn;
-		pte_t ptent = ptep_get(pte + i);
+		pte_t ptent = ptep_get(pte);
 
+		nr = 1;
 		pfn = get_pte_pfn(ptent, vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
@@ -4237,7 +4251,14 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young_notify(vma, addr, pte + i))
+		if (folio_test_large(folio)) {
+			const unsigned int max_nr = (end - addr) >> PAGE_SHIFT;
+
+			nr = folio_pte_batch_flags(folio, NULL, pte, &ptent,
+						   max_nr, FPB_MERGE_YOUNG_DIRTY);
+		}
+
+		if (!test_and_clear_young_ptes_notify(vma, addr, pte, nr))
 			continue;
 
 		if (last != folio) {
@@ -4250,7 +4271,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		if (pte_dirty(ptent))
 			dirty = true;
 
-		young++;
+		young += nr;
 	}
 
 	walk_update_folio(walk, last, gen, dirty);
-- 
2.47.3




* [PATCH v3 6/6] arm64: mm: implement the architecture-specific test_and_clear_young_ptes()
  2026-03-06  6:43 [PATCH v3 0/6] support batched checking of the young flag for MGLRU Baolin Wang
                   ` (4 preceding siblings ...)
  2026-03-06  6:43 ` [PATCH v3 5/6] mm: support batched checking of the young flag for MGLRU Baolin Wang
@ 2026-03-06  6:43 ` Baolin Wang
  5 siblings, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-03-06  6:43 UTC (permalink / raw)
  To: akpm, david
  Cc: catalin.marinas, will, lorenzo.stoakes, ryan.roberts,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, riel, harry.yoo,
	jannh, willy, baohua, dev.jain, axelrasmussen, yuanchu, weixugc,
	hannes, zhengqi.arch, shakeel.butt, baolin.wang, linux-mm,
	linux-arm-kernel, linux-kernel

Implement the Arm64 architecture-specific test_and_clear_young_ptes() to enable
batched checking of young flags, improving performance during large folio
reclamation when MGLRU is enabled.

While we're at it, simplify ptep_test_and_clear_young() by calling
test_and_clear_young_ptes(). Since callers guarantee that PTEs are present
before calling these functions, we can use pte_cont() to check the CONT_PTE
flag instead of pte_valid_cont().

Performance testing:
Enable MGLRU, then allocate 10G of clean file-backed folios via mmap() in a
memory cgroup, and try to reclaim 8G of file-backed folios via the
memory.reclaim interface. I observed a 60%+ performance improvement on my
Arm64 32-core server (and about a 15% improvement on my x86 machine).

W/o patchset:
real	0m0.470s
user	0m0.000s
sys	0m0.470s

W/ patchset:
real	0m0.180s
user	0m0.001s
sys	0m0.179s

Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/include/asm/pgtable.h | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index aa4b13da6371..ab451d20e4c5 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1812,16 +1812,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 	return __ptep_get_and_clear(mm, addr, ptep);
 }
 
+#define test_and_clear_young_ptes test_and_clear_young_ptes
+static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
+					    unsigned long addr, pte_t *ptep,
+					    unsigned int nr)
+{
+	if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
+		return __ptep_test_and_clear_young(vma, addr, ptep);
+
+	return contpte_test_and_clear_young_ptes(vma, addr, ptep, nr);
+}
+
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 				unsigned long addr, pte_t *ptep)
 {
-	pte_t orig_pte = __ptep_get(ptep);
-
-	if (likely(!pte_valid_cont(orig_pte)))
-		return __ptep_test_and_clear_young(vma, addr, ptep);
-
-	return contpte_test_and_clear_young_ptes(vma, addr, ptep, 1);
+	return test_and_clear_young_ptes(vma, addr, ptep, 1);
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-- 
2.47.3



