From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, david@kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org,
lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
baohua@kernel.org, dev.jain@arm.com, axelrasmussen@google.com,
yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: [PATCH 3/5] mm: add a batched helper to clear the young flag for large folios
Date: Tue, 24 Feb 2026 09:56:06 +0800
Message-ID: <bfbe28e381b02452b455498e7ea82662e83a3865.1771897150.git.baolin.wang@linux.alibaba.com>
In-Reply-To: <cover.1771897150.git.baolin.wang@linux.alibaba.com>
Currently, MGLRU calls ptep_clear_young_notify() to check and clear the
young flag for each PTE sequentially, which is inefficient when reclaiming
large folios.
Moreover, on arm64, which supports contiguous PTEs, the arm64-specific
ptep_test_and_clear_young() already implements an optimization that clears
the young flags for all PTEs within a contiguous range. However, this is not
sufficient: similar to the arm64-specific clear_flush_young_ptes(), we can
extend the batching to cover the entire large folio, which might span more
than one contiguous range (CONT_PTE_SIZE).
Thus, introduce a new batched helper, test_and_clear_young_ptes(), and its
wrapper clear_young_ptes_notify(), to check and clear the young flags of
large folios in batches, which improves performance during large folio
reclamation when MGLRU is enabled. Architectures that implement a more
efficient batched operation will override this helper in the following
patches.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/pgtable.h | 36 ++++++++++++++++++++++++++++++++++++
 mm/internal.h           | 23 ++++++++++++++++++-----
 2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 776993d4567b..0bcd3be524d3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1103,6 +1103,42 @@ static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
 }
 #endif

+#ifndef test_and_clear_young_ptes
+/**
+ * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
+ *			       folio as old
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries for which to clear the access bit.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_test_and_clear_young().
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
+					    unsigned long addr, pte_t *ptep,
+					    unsigned int nr)
+{
+	int young = 0;
+
+	for (;;) {
+		young |= ptep_test_and_clear_young(vma, addr, ptep);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+	}
+
+	return young;
+}
+#endif
+
/*
* On some architectures hardware does not set page access bit when accessing
* memory page, it is responsibility of software setting this bit. It brings
diff --git a/mm/internal.h b/mm/internal.h
index 1ba175b8d4f1..1b59be99dc3f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1813,16 +1813,23 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
 	return young;
 }

-static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
-					  unsigned long addr, pte_t *ptep)
+static inline int clear_young_ptes_notify(struct vm_area_struct *vma,
+					  unsigned long addr, pte_t *ptep,
+					  unsigned int nr)
 {
 	int young;

-	young = ptep_test_and_clear_young(vma, addr, ptep);
-	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);

 	return young;
 }

+static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
+					  unsigned long addr, pte_t *ptep)
+{
+	return clear_young_ptes_notify(vma, addr, ptep, 1);
+}
+
+
static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
@@ -1837,9 +1844,15 @@ static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
 #define clear_flush_young_ptes_notify clear_flush_young_ptes
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
-#define ptep_clear_young_notify ptep_test_and_clear_young
+#define clear_young_ptes_notify test_and_clear_young_ptes
 #define pmdp_clear_young_notify pmdp_test_and_clear_young

+static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
+					  unsigned long addr, pte_t *ptep)
+{
+	return test_and_clear_young_ptes(vma, addr, ptep, 1);
+}
+
+
#endif /* CONFIG_MMU_NOTIFIER */
#endif /* __MM_INTERNAL_H */
--
2.47.3
Thread overview:
2026-02-24 1:56 [PATCH 0/5] support batched checking of the young flag for MGLRU Baolin Wang
2026-02-24 1:56 ` [PATCH 1/5] mm: use inline helper functions instead of ugly macros Baolin Wang
2026-02-24 2:36 ` Rik van Riel
2026-02-24 1:56 ` [PATCH 2/5] mm: rmap: add a ZONE_DEVICE folio warning in folio_referenced() Baolin Wang
2026-02-24 2:38 ` Rik van Riel
2026-02-24 1:56 ` Baolin Wang [this message]
2026-02-24 1:56 ` [PATCH 4/5] mm: support batched checking of the young flag for MGLRU Baolin Wang
2026-02-24 1:56 ` [PATCH 5/5] arm64: mm: implement the architecture-specific test_and_clear_young_ptes() Baolin Wang