* [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites
@ 2026-02-12 2:18 Nico Pache
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
` (5 more replies)
0 siblings, 6 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:18 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
The following series contains cleanups and prerequisites for my work on
khugepaged mTHP support [1]. These have been separated out to ease review.
The first patch in the series refactors the page-fault folio-to-PTE mapping
path, following a convention similar to the one defined by
map_anon_folio_pmd_(no)pf().
This not only cleans up the current implementation of do_anonymous_page(),
but will allow for reuse later in the khugepaged mTHP implementation.
The second patch adds a small is_pmd_order() helper to check if an order is
the PMD order. This check is open-coded in a number of places. This patch
aims to clean this up and will be used more in the khugepaged mTHP work.
The third patch also adds a small DEFINE for (HPAGE_PMD_NR - 1) which is
used often across the khugepaged code.
The fourth and fifth patches come from the khugepaged mTHP patchset [1].
These two patches include the rename of function prefixes, and the
unification of khugepaged and madvise_collapse via a new
collapse_single_pmd function.
Patch 1: refactor do_anonymous_page into map_anon_folio_pte_(no)pf
Patch 2: add is_pmd_order helper
Patch 3: Add define for (HPAGE_PMD_NR - 1)
Patch 4: rename hpage_collapse to collapse_
Patch 5: Refactoring to combine madvise_collapse and khugepaged
---------
Testing
---------
- Built for x86_64, aarch64, ppc64le, and s390x
- Ran the test suites provided by the kernel-tests project on all arches
- Ran the mm selftests
V1 Changes (for patches coming from [1]):
- Refactor do_anonymous_page() and add helpers for use in mthp series
- moved is_pmd_order patch to this series [2]
- added a define for HPAGE_PMD_NR - 1
- moved rename to this series [3]
- Dropped acks/review-by on PATCH 5 given [4]
- moved unification patch to this series [4]. I also had to make some
modifications from my previous version which include moving the new
madvise_collapse writeback retry logic into the collapse_single_pmd
function. This prevents a potential UAF bug I introduced in my v14 when
handling the conflict. [5][6]
A big thanks to everyone who has reviewed, tested, and participated in
the development process. It's been a great experience working with all of
you on this endeavour.
[1] - https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com/
[2] - https://lore.kernel.org/all/20260122192841.128719-2-npache@redhat.com/
[3] - https://lore.kernel.org/all/20260122192841.128719-3-npache@redhat.com/
[4] - https://lore.kernel.org/all/20260122192841.128719-4-npache@redhat.com/
[5] - https://lore.kernel.org/all/65dcf7ab-1299-411f-9cbc-438ae72ff757@linux.dev/
[6] - https://lore.kernel.org/all/b824f131-3e51-422c-9e98-044b0a2928a6@redhat.com/
Nico Pache (5):
mm: consolidate anonymous folio PTE mapping into helpers
mm: introduce is_pmd_order helper
mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
mm/khugepaged: rename hpage_collapse_* to collapse_*
mm/khugepaged: unify khugepaged and madv_collapse with
collapse_single_pmd()
include/linux/huge_mm.h | 5 ++
include/linux/mm.h | 4 +
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 194 +++++++++++++++++++++-------------------
mm/memory.c | 56 ++++++++----
mm/mempolicy.c | 2 +-
mm/mremap.c | 2 +-
mm/page_alloc.c | 2 +-
8 files changed, 152 insertions(+), 115 deletions(-)
--
2.53.0
* [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
@ 2026-02-12 2:18 ` Nico Pache
2026-02-12 14:38 ` Pedro Falcato
` (2 more replies)
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
` (4 subsequent siblings)
5 siblings, 3 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:18 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
The anonymous page fault handler in do_anonymous_page() open-codes the
sequence to map a newly allocated anonymous folio at the PTE level:
- construct the PTE entry
- add rmap
- add to LRU
- set the PTEs
- update the MMU cache.
Introduce two helpers to consolidate this duplicated logic, mirroring the
existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
references, adds the anon rmap and LRU entries, sets the PTEs, and updates
the MMU cache. It also handles the uffd_wp case that can occur in the pf
variant.
map_anon_folio_pte_pf(): extends the nopf variant to handle the
MM_ANONPAGES counter update and the mTHP fault allocation statistics for
the page fault path.
The zero-page read path in do_anonymous_page() is also untangled from the
shared setpte label, since it does not allocate a folio and should not
share the same mapping sequence as the write path. Pass a literal 1 rather
than relying on the nr_pages variable, which makes it clearer that this
path operates on the zero page only.
This refactoring will also help reduce code duplication between mm/memory.c
and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
mapping that can be reused by future callers.
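As an illustration of the intended split (a hypothetical sketch, not part
of this patch): a non-page-fault caller, such as the planned khugepaged
mTHP path, does its own counter and statistics accounting, so it only
needs the nopf variant:
  /*
   * Hypothetical caller: the PTEs covering the folio are known to be
   * pte_none() and the caller already holds the PTE lock.
   */
  static void example_map_prepared_folio(struct folio *folio,
          struct vm_area_struct *vma, unsigned long addr, pte_t *pte)
  {
          /* Build the entry, take refs, add rmap/LRU, set the PTEs. */
          map_anon_folio_pte_nopf(folio, pte, vma, addr,
                                  /* uffd_wp = */ false);
          /* Accounting that map_anon_folio_pte_pf() would do instead. */
          add_mm_counter(vma->vm_mm, MM_ANONPAGES, folio_nr_pages(folio));
  }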
Signed-off-by: Nico Pache <npache@redhat.com>
---
include/linux/mm.h | 4 ++++
mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
2 files changed, 41 insertions(+), 19 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8a8fd47399c..c3aa1f51e020 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
void snapshot_page(struct page_snapshot *ps, const struct page *page);
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+ struct vm_area_struct *vma, unsigned long addr,
+ bool uffd_wp);
+
#endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 8c19af97f0a0..61c2277c9d9f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
}
+
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+ struct vm_area_struct *vma, unsigned long addr,
+ bool uffd_wp)
+{
+ pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
+ unsigned int nr_pages = folio_nr_pages(folio);
+
+ entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (uffd_wp)
+ entry = pte_mkuffd_wp(entry);
+
+ folio_ref_add(folio, nr_pages - 1);
+ folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
+ folio_add_lru_vma(folio, vma);
+ set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
+ update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
+}
+
+static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
+ struct vm_area_struct *vma, unsigned long addr,
+ unsigned int nr_pages, bool uffd_wp)
+{
+ map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
+ add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
+}
+
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
- goto setpte;
+ if (vmf_orig_pte_uffd_wp(vmf))
+ entry = pte_mkuffd_wp(entry);
+ set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+
+ /* No need to invalidate - it was non-present before */
+ update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
+ goto unlock;
}
/* Allocate our own private page. */
@@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
*/
__folio_mark_uptodate(folio);
- entry = folio_mk_pte(folio, vma->vm_page_prot);
- entry = pte_sw_mkyoung(entry);
- if (vma->vm_flags & VM_WRITE)
- entry = pte_mkwrite(pte_mkdirty(entry), vma);
-
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
if (!vmf->pte)
goto release;
@@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
folio_put(folio);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
-
- folio_ref_add(folio, nr_pages - 1);
- add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
- count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
- folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
- folio_add_lru_vma(folio, vma);
-setpte:
- if (vmf_orig_pte_uffd_wp(vmf))
- entry = pte_mkuffd_wp(entry);
- set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
-
- /* No need to invalidate - it was non-present before */
- update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
+ map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
unlock:
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
--
2.53.0
* [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
@ 2026-02-12 2:18 ` Nico Pache
2026-02-12 14:40 ` Pedro Falcato
` (5 more replies)
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
` (3 subsequent siblings)
5 siblings, 6 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:18 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
In order to add mTHP support to khugepaged, we will often be checking if a
given order is (or is not) a PMD order. Some places in the kernel already
use this check, so let's create a simple helper function to keep the code
clean and readable.
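As a concrete example (assuming the common x86_64 configuration with 4K
base pages, where HPAGE_PMD_ORDER is 9):
  is_pmd_order(9);   /* true  - a 2M, PMD-sized folio */
  is_pmd_order(4);   /* false - a 64K mTHP folio      */
  is_pmd_order(0);   /* false - a single base page    */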
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Nico Pache <npache@redhat.com>
arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
---
include/linux/huge_mm.h | 5 +++++
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 4 ++--
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 2 +-
5 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a4d9f964dfde..bd7f0e1d8094 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -771,6 +771,11 @@ static inline bool pmd_is_huge(pmd_t pmd)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static inline bool is_pmd_order(unsigned int order)
+{
+ return order == HPAGE_PMD_ORDER;
+}
+
static inline int split_folio_to_list_to_order(struct folio *folio,
struct list_head *list, int new_order)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 44ff8a648afd..5eae85818635 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4097,7 +4097,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
i_mmap_unlock_read(mapping);
out:
xas_destroy(&xas);
- if (old_order == HPAGE_PMD_ORDER)
+ if (is_pmd_order(old_order))
count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
count_mthp_stat(old_order, !ret ? MTHP_STAT_SPLIT : MTHP_STAT_SPLIT_FAILED);
return ret;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa1e57fd2c46..c362b3b2e08a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2000,7 +2000,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
* we locked the first folio, then a THP might be there already.
* This will be discovered on the first iteration.
*/
- if (folio_order(folio) == HPAGE_PMD_ORDER &&
+ if (is_pmd_order(folio_order(folio)) &&
folio->index == start) {
/* Maybe PMD-mapped */
result = SCAN_PTE_MAPPED_HUGEPAGE;
@@ -2329,7 +2329,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
continue;
}
- if (folio_order(folio) == HPAGE_PMD_ORDER &&
+ if (is_pmd_order(folio_order(folio)) &&
folio->index == start) {
/* Maybe PMD-mapped */
result = SCAN_PTE_MAPPED_HUGEPAGE;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index dbd48502ac24..3802e52b01fc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2450,7 +2450,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
/* filter "hugepage" allocation, unless from alloc_pages() */
- order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) {
+ is_pmd_order(order) && ilx != NO_INTERLEAVE_INDEX) {
/*
* For hugepage allocation and non-interleave policy which
* allows the current node (or other explicitly preferred
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c2e96ac35636..2acf22f97ae5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -719,7 +719,7 @@ static inline bool pcp_allowed_order(unsigned int order)
if (order <= PAGE_ALLOC_COSTLY_ORDER)
return true;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (order == HPAGE_PMD_ORDER)
+ if (is_pmd_order(order))
return true;
#endif
return false;
--
2.53.0
* [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
@ 2026-02-12 2:18 ` Nico Pache
2026-02-12 6:56 ` Vernon Yang
` (3 more replies)
2026-02-12 2:23 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
` (2 subsequent siblings)
5 siblings, 4 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:18 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
The value (HPAGE_PMD_NR - 1) is used often in the khugepaged code to
signify the limit of the max_ptes_* values. Add a define for this to
increase code readability and reuse.
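For example, with the common 4K base page size and 2M PMD-sized huge
pages, HPAGE_PMD_NR is 512, so COLLAPSE_MAX_PTES_LIMIT is 511: each
max_ptes_* tunable can cover at most all but one of the 512 PTE entries
backing a PMD.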
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c362b3b2e08a..3dcce6018f20 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,6 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
*
* Note that these are only respected if collapse was initiated by khugepaged.
*/
+#define COLLAPSE_MAX_PTES_LIMIT (HPAGE_PMD_NR - 1)
unsigned int khugepaged_max_ptes_none __read_mostly;
static unsigned int khugepaged_max_ptes_swap __read_mostly;
static unsigned int khugepaged_max_ptes_shared __read_mostly;
@@ -252,7 +253,7 @@ static ssize_t max_ptes_none_store(struct kobject *kobj,
unsigned long max_ptes_none;
err = kstrtoul(buf, 10, &max_ptes_none);
- if (err || max_ptes_none > HPAGE_PMD_NR - 1)
+ if (err || max_ptes_none > COLLAPSE_MAX_PTES_LIMIT)
return -EINVAL;
khugepaged_max_ptes_none = max_ptes_none;
@@ -277,7 +278,7 @@ static ssize_t max_ptes_swap_store(struct kobject *kobj,
unsigned long max_ptes_swap;
err = kstrtoul(buf, 10, &max_ptes_swap);
- if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
+ if (err || max_ptes_swap > COLLAPSE_MAX_PTES_LIMIT)
return -EINVAL;
khugepaged_max_ptes_swap = max_ptes_swap;
@@ -303,7 +304,7 @@ static ssize_t max_ptes_shared_store(struct kobject *kobj,
unsigned long max_ptes_shared;
err = kstrtoul(buf, 10, &max_ptes_shared);
- if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
+ if (err || max_ptes_shared > COLLAPSE_MAX_PTES_LIMIT)
return -EINVAL;
khugepaged_max_ptes_shared = max_ptes_shared;
@@ -384,7 +385,7 @@ int __init khugepaged_init(void)
return -ENOMEM;
khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
- khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
+ khugepaged_max_ptes_none = COLLAPSE_MAX_PTES_LIMIT;
khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
@@ -1869,7 +1870,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
bool is_shmem = shmem_file(file);
VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
- VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+ VM_BUG_ON(start & (COLLAPSE_MAX_PTES_LIMIT));
result = alloc_charge_folio(&new_folio, mm, cc);
if (result != SCAN_SUCCEED)
@@ -2209,7 +2210,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
* unwritten page.
*/
folio_mark_uptodate(new_folio);
- folio_ref_add(new_folio, HPAGE_PMD_NR - 1);
+ folio_ref_add(new_folio, COLLAPSE_MAX_PTES_LIMIT);
if (is_shmem)
folio_mark_dirty(new_folio);
@@ -2303,7 +2304,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
rcu_read_lock();
- xas_for_each(&xas, folio, start + HPAGE_PMD_NR - 1) {
+ xas_for_each(&xas, folio, start + COLLAPSE_MAX_PTES_LIMIT) {
if (xas_retry(&xas, folio))
continue;
--
2.53.0
* [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_*
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
` (2 preceding siblings ...)
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
@ 2026-02-12 2:23 ` Nico Pache
2026-02-12 2:23 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-02-12 19:52 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* David Hildenbrand (Arm)
2026-02-12 2:25 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-02-12 2:26 ` [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
5 siblings, 2 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:23 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
The hpage_collapse_* functions are used by both madvise_collapse and
khugepaged. Remove the unnecessary hpage prefix to shorten the function
names.
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 62 ++++++++++++++++++++++++-------------------------
mm/mremap.c | 2 +-
2 files changed, 31 insertions(+), 33 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3dcce6018f20..fa41480f6948 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -397,14 +397,14 @@ void __init khugepaged_destroy(void)
kmem_cache_destroy(mm_slot_cache);
}
-static inline int hpage_collapse_test_exit(struct mm_struct *mm)
+static inline int collapse_test_exit(struct mm_struct *mm)
{
return atomic_read(&mm->mm_users) == 0;
}
-static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
+static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
{
- return hpage_collapse_test_exit(mm) ||
+ return collapse_test_exit(mm) ||
mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
}
@@ -438,7 +438,7 @@ void __khugepaged_enter(struct mm_struct *mm)
int wakeup;
/* __khugepaged_exit() must not run from under us */
- VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm);
+ VM_BUG_ON_MM(collapse_test_exit(mm), mm);
if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
return;
@@ -492,7 +492,7 @@ void __khugepaged_exit(struct mm_struct *mm)
} else if (slot) {
/*
* This is required to serialize against
- * hpage_collapse_test_exit() (which is guaranteed to run
+ * collapse_test_exit() (which is guaranteed to run
* under mmap sem read mode). Stop here (after we return all
* pagetables will be destroyed) until khugepaged has finished
* working on the pagetables under the mmap_lock.
@@ -581,7 +581,7 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
folio = page_folio(page);
VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
- /* See hpage_collapse_scan_pmd(). */
+ /* See collapse_scan_pmd(). */
if (folio_maybe_mapped_shared(folio)) {
++shared;
if (cc->is_khugepaged &&
@@ -832,7 +832,7 @@ static struct collapse_control khugepaged_collapse_control = {
.is_khugepaged = true,
};
-static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc)
+static bool collapse_scan_abort(int nid, struct collapse_control *cc)
{
int i;
@@ -867,7 +867,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
}
#ifdef CONFIG_NUMA
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int collapse_find_target_node(struct collapse_control *cc)
{
int nid, target_node = 0, max_value = 0;
@@ -886,7 +886,7 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
return target_node;
}
#else
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int collapse_find_target_node(struct collapse_control *cc)
{
return 0;
}
@@ -905,7 +905,7 @@ static enum scan_result hugepage_vma_revalidate(struct mm_struct *mm, unsigned l
enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
TVA_FORCED_COLLAPSE;
- if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+ if (unlikely(collapse_test_exit_or_disable(mm)))
return SCAN_ANY_PROCESS;
*vmap = vma = find_vma(mm, address);
@@ -976,7 +976,7 @@ static enum scan_result check_pmd_still_valid(struct mm_struct *mm,
/*
* Bring missing pages in from swap, to complete THP collapse.
- * Only done if hpage_collapse_scan_pmd believes it is worthwhile.
+ * Only done if collapse_scan_pmd believes it is worthwhile.
*
* Called and returns without pte mapped or spinlocks held.
* Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
@@ -1062,7 +1062,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
{
gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
GFP_TRANSHUGE);
- int node = hpage_collapse_find_target_node(cc);
+ int node = collapse_find_target_node(cc);
struct folio *folio;
folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
@@ -1240,7 +1240,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
return result;
}
-static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
+static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
struct collapse_control *cc)
{
@@ -1350,7 +1350,7 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
* hit record.
*/
node = folio_nid(folio);
- if (hpage_collapse_scan_abort(node, cc)) {
+ if (collapse_scan_abort(node, cc)) {
result = SCAN_SCAN_ABORT;
goto out_unmap;
}
@@ -1416,7 +1416,7 @@ static void collect_mm_slot(struct mm_slot *slot)
lockdep_assert_held(&khugepaged_mm_lock);
- if (hpage_collapse_test_exit(mm)) {
+ if (collapse_test_exit(mm)) {
/* free mm_slot */
hash_del(&slot->hash);
list_del(&slot->mm_node);
@@ -1771,7 +1771,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
continue;
- if (hpage_collapse_test_exit(mm))
+ if (collapse_test_exit(mm))
continue;
if (!file_backed_vma_is_retractable(vma))
@@ -2289,7 +2289,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
return result;
}
-static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+static enum scan_result collapse_scan_file(struct mm_struct *mm, unsigned long addr,
struct file *file, pgoff_t start, struct collapse_control *cc)
{
struct folio *folio = NULL;
@@ -2345,7 +2345,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
}
node = folio_nid(folio);
- if (hpage_collapse_scan_abort(node, cc)) {
+ if (collapse_scan_abort(node, cc)) {
result = SCAN_SCAN_ABORT;
folio_put(folio);
break;
@@ -2395,7 +2395,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
return result;
}
-static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result *result,
+static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *result,
struct collapse_control *cc)
__releases(&khugepaged_mm_lock)
__acquires(&khugepaged_mm_lock)
@@ -2430,7 +2430,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
goto breakouterloop_mmap_lock;
progress++;
- if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+ if (unlikely(collapse_test_exit_or_disable(mm)))
goto breakouterloop;
vma_iter_init(&vmi, mm, khugepaged_scan.address);
@@ -2438,7 +2438,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
unsigned long hstart, hend;
cond_resched();
- if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
+ if (unlikely(collapse_test_exit_or_disable(mm))) {
progress++;
break;
}
@@ -2460,7 +2460,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
bool mmap_locked = true;
cond_resched();
- if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+ if (unlikely(collapse_test_exit_or_disable(mm)))
goto breakouterloop;
VM_BUG_ON(khugepaged_scan.address < hstart ||
@@ -2473,12 +2473,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
mmap_read_unlock(mm);
mmap_locked = false;
- *result = hpage_collapse_scan_file(mm,
+ *result = collapse_scan_file(mm,
khugepaged_scan.address, file, pgoff, cc);
fput(file);
if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
mmap_read_lock(mm);
- if (hpage_collapse_test_exit_or_disable(mm))
+ if (collapse_test_exit_or_disable(mm))
goto breakouterloop;
*result = try_collapse_pte_mapped_thp(mm,
khugepaged_scan.address, false);
@@ -2487,7 +2487,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
mmap_read_unlock(mm);
}
} else {
- *result = hpage_collapse_scan_pmd(mm, vma,
+ *result = collapse_scan_pmd(mm, vma,
khugepaged_scan.address, &mmap_locked, cc);
}
@@ -2520,7 +2520,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
* Release the current mm_slot if this mm is about to die, or
* if we scanned all vmas of this mm.
*/
- if (hpage_collapse_test_exit(mm) || !vma) {
+ if (collapse_test_exit(mm) || !vma) {
/*
* Make sure that if mm_users is reaching zero while
* khugepaged runs here, khugepaged_exit will find
@@ -2571,8 +2571,8 @@ static void khugepaged_do_scan(struct collapse_control *cc)
pass_through_head++;
if (khugepaged_has_work() &&
pass_through_head < 2)
- progress += khugepaged_scan_mm_slot(pages - progress,
- &result, cc);
+ progress += collapse_scan_mm_slot(pages - progress,
+ &result, cc);
else
progress = pages;
spin_unlock(&khugepaged_mm_lock);
@@ -2816,8 +2816,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
mmap_read_unlock(mm);
mmap_locked = false;
*lock_dropped = true;
- result = hpage_collapse_scan_file(mm, addr, file, pgoff,
- cc);
+ result = collapse_scan_file(mm, addr, file, pgoff, cc);
if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
mapping_can_writeback(file->f_mapping)) {
@@ -2831,8 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
}
fput(file);
} else {
- result = hpage_collapse_scan_pmd(mm, vma, addr,
- &mmap_locked, cc);
+ result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
}
if (!mmap_locked)
*lock_dropped = true;
diff --git a/mm/mremap.c b/mm/mremap.c
index 2be876a70cc0..eb222af91c2d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -244,7 +244,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
goto out;
}
/*
- * Now new_pte is none, so hpage_collapse_scan_file() path can not find
+ * Now new_pte is none, so collapse_scan_file() path can not find
* this by traversing file->f_mapping, so there is no concurrency with
* retract_page_tables(). In addition, we already hold the exclusive
* mmap_lock, so this new_pte page is stable, so there is no need to get
--
2.53.0
* [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 2:23 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
@ 2026-02-12 2:23 ` Nico Pache
2026-02-12 19:52 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* David Hildenbrand (Arm)
1 sibling, 0 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:23 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
The khugepaged daemon and madvise_collapse have two different
implementations that do almost the same thing.
Create collapse_single_pmd() to increase code reuse and provide a single
entry point for these two users.
Refactor madvise_collapse and collapse_scan_mm_slot to use the new
collapse_single_pmd function. This introduces a minor behavioral change
that most likely fixes an undiscovered bug: the current implementation of
khugepaged tests collapse_test_exit_or_disable before calling
try_collapse_pte_mapped_thp, but we weren't doing that in the
madvise_collapse case. By unifying these two callers, madvise_collapse now
also performs this check. We also modify the return value to be
SCAN_ANY_PROCESS, which properly indicates that this process is no longer
valid to operate on.
We also guard the khugepaged_pages_collapsed counter to ensure it is only
incremented for khugepaged.
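For reviewers, the locking contract of the new helper (summarizing the
code below, not new behavior) is roughly:
  /* Caller enters with mmap_lock held for read, *mmap_locked == true. */
  result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
  /*
   * On return, *mmap_locked reflects whether the read lock is still held:
   * the file-backed path always returns with it dropped, while the
   * anonymous path may return either way (as collapse_scan_pmd() does).
   */
  if (!mmap_locked)
          *lock_dropped = true;   /* as done in madvise_collapse() */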
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 121 ++++++++++++++++++++++++++----------------------
1 file changed, 66 insertions(+), 55 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa41480f6948..0839a781bedd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2395,6 +2395,62 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm, unsigned long a
return result;
}
+/*
+ * Try to collapse a single PMD starting at a PMD aligned addr, and return
+ * the results.
+ */
+static enum scan_result collapse_single_pmd(unsigned long addr,
+ struct vm_area_struct *vma, bool *mmap_locked,
+ struct collapse_control *cc)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ enum scan_result result;
+ struct file *file;
+ pgoff_t pgoff;
+
+ if (vma_is_anonymous(vma)) {
+ result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
+ goto end;
+ }
+
+ file = get_file(vma->vm_file);
+ pgoff = linear_page_index(vma, addr);
+
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+ result = collapse_scan_file(mm, addr, file, pgoff, cc);
+
+ if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
+ mapping_can_writeback(file->f_mapping)) {
+ const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
+ const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+
+ filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+ }
+ fput(file);
+
+ if (result != SCAN_PTE_MAPPED_HUGEPAGE)
+ goto end;
+
+ mmap_read_lock(mm);
+ *mmap_locked = true;
+ if (collapse_test_exit_or_disable(mm)) {
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+ return SCAN_ANY_PROCESS;
+ }
+ result = try_collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
+ if (result == SCAN_PMD_MAPPED)
+ result = SCAN_SUCCEED;
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+
+end:
+ if (cc->is_khugepaged && result == SCAN_SUCCEED)
+ ++khugepaged_pages_collapsed;
+ return result;
+}
+
static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *result,
struct collapse_control *cc)
__releases(&khugepaged_mm_lock)
@@ -2466,34 +2522,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (!vma_is_anonymous(vma)) {
- struct file *file = get_file(vma->vm_file);
- pgoff_t pgoff = linear_page_index(vma,
- khugepaged_scan.address);
-
- mmap_read_unlock(mm);
- mmap_locked = false;
- *result = collapse_scan_file(mm,
- khugepaged_scan.address, file, pgoff, cc);
- fput(file);
- if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
- mmap_read_lock(mm);
- if (collapse_test_exit_or_disable(mm))
- goto breakouterloop;
- *result = try_collapse_pte_mapped_thp(mm,
- khugepaged_scan.address, false);
- if (*result == SCAN_PMD_MAPPED)
- *result = SCAN_SUCCEED;
- mmap_read_unlock(mm);
- }
- } else {
- *result = collapse_scan_pmd(mm, vma,
- khugepaged_scan.address, &mmap_locked, cc);
- }
-
- if (*result == SCAN_SUCCEED)
- ++khugepaged_pages_collapsed;
+ *result = collapse_single_pmd(khugepaged_scan.address,
+ vma, &mmap_locked, cc);
/* move to next address */
khugepaged_scan.address += HPAGE_PMD_SIZE;
progress += HPAGE_PMD_NR;
@@ -2799,6 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
cond_resched();
mmap_read_lock(mm);
mmap_locked = true;
+ *lock_dropped = true;
result = hugepage_vma_revalidate(mm, addr, false, &vma,
cc);
if (result != SCAN_SUCCEED) {
@@ -2809,46 +2841,25 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
mmap_assert_locked(mm);
- if (!vma_is_anonymous(vma)) {
- struct file *file = get_file(vma->vm_file);
- pgoff_t pgoff = linear_page_index(vma, addr);
- mmap_read_unlock(mm);
- mmap_locked = false;
- *lock_dropped = true;
- result = collapse_scan_file(mm, addr, file, pgoff, cc);
-
- if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
- mapping_can_writeback(file->f_mapping)) {
- loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
- loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+ result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
- filemap_write_and_wait_range(file->f_mapping, lstart, lend);
- triggered_wb = true;
- fput(file);
- goto retry;
- }
- fput(file);
- } else {
- result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
- }
if (!mmap_locked)
*lock_dropped = true;
-handle_result:
+ if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb) {
+ triggered_wb = true;
+ goto retry;
+ }
+
switch (result) {
case SCAN_SUCCEED:
case SCAN_PMD_MAPPED:
++thps;
break;
- case SCAN_PTE_MAPPED_HUGEPAGE:
- BUG_ON(mmap_locked);
- mmap_read_lock(mm);
- result = try_collapse_pte_mapped_thp(mm, addr, true);
- mmap_read_unlock(mm);
- goto handle_result;
/* Whitelisted set of results where continuing OK */
case SCAN_NO_PTE_TABLE:
+ case SCAN_PTE_MAPPED_HUGEPAGE:
case SCAN_PTE_NON_PRESENT:
case SCAN_PTE_UFFD_WP:
case SCAN_LACK_REFERENCED_PAGE:
--
2.53.0
* [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
` (3 preceding siblings ...)
2026-02-12 2:23 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
@ 2026-02-12 2:25 ` Nico Pache
2026-02-12 15:33 ` Pedro Falcato
` (2 more replies)
2026-02-12 2:26 ` [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
5 siblings, 3 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:25 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang,
ziy, zokeefe
The khugepaged daemon and madvise_collapse have two different
implementations that do almost the same thing.
Create collapse_single_pmd() to increase code reuse and provide a single
entry point for these two users.
Refactor madvise_collapse and collapse_scan_mm_slot to use the new
collapse_single_pmd function. This introduces a minor behavioral change
that most likely fixes an undiscovered bug: the current implementation of
khugepaged tests collapse_test_exit_or_disable before calling
try_collapse_pte_mapped_thp, but we weren't doing that in the
madvise_collapse case. By unifying these two callers, madvise_collapse now
also performs this check. We also modify the return value to be
SCAN_ANY_PROCESS, which properly indicates that this process is no longer
valid to operate on.
We also guard the khugepaged_pages_collapsed counter to ensure it is only
incremented for khugepaged.
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 121 ++++++++++++++++++++++++++----------------------
1 file changed, 66 insertions(+), 55 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa41480f6948..0839a781bedd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2395,6 +2395,62 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm, unsigned long a
return result;
}
+/*
+ * Try to collapse a single PMD starting at a PMD aligned addr, and return
+ * the results.
+ */
+static enum scan_result collapse_single_pmd(unsigned long addr,
+ struct vm_area_struct *vma, bool *mmap_locked,
+ struct collapse_control *cc)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ enum scan_result result;
+ struct file *file;
+ pgoff_t pgoff;
+
+ if (vma_is_anonymous(vma)) {
+ result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
+ goto end;
+ }
+
+ file = get_file(vma->vm_file);
+ pgoff = linear_page_index(vma, addr);
+
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+ result = collapse_scan_file(mm, addr, file, pgoff, cc);
+
+ if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
+ mapping_can_writeback(file->f_mapping)) {
+ const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
+ const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+
+ filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+ }
+ fput(file);
+
+ if (result != SCAN_PTE_MAPPED_HUGEPAGE)
+ goto end;
+
+ mmap_read_lock(mm);
+ *mmap_locked = true;
+ if (collapse_test_exit_or_disable(mm)) {
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+ return SCAN_ANY_PROCESS;
+ }
+ result = try_collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
+ if (result == SCAN_PMD_MAPPED)
+ result = SCAN_SUCCEED;
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+
+end:
+ if (cc->is_khugepaged && result == SCAN_SUCCEED)
+ ++khugepaged_pages_collapsed;
+ return result;
+}
+
static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *result,
struct collapse_control *cc)
__releases(&khugepaged_mm_lock)
@@ -2466,34 +2522,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (!vma_is_anonymous(vma)) {
- struct file *file = get_file(vma->vm_file);
- pgoff_t pgoff = linear_page_index(vma,
- khugepaged_scan.address);
-
- mmap_read_unlock(mm);
- mmap_locked = false;
- *result = collapse_scan_file(mm,
- khugepaged_scan.address, file, pgoff, cc);
- fput(file);
- if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
- mmap_read_lock(mm);
- if (collapse_test_exit_or_disable(mm))
- goto breakouterloop;
- *result = try_collapse_pte_mapped_thp(mm,
- khugepaged_scan.address, false);
- if (*result == SCAN_PMD_MAPPED)
- *result = SCAN_SUCCEED;
- mmap_read_unlock(mm);
- }
- } else {
- *result = collapse_scan_pmd(mm, vma,
- khugepaged_scan.address, &mmap_locked, cc);
- }
-
- if (*result == SCAN_SUCCEED)
- ++khugepaged_pages_collapsed;
+ *result = collapse_single_pmd(khugepaged_scan.address,
+ vma, &mmap_locked, cc);
/* move to next address */
khugepaged_scan.address += HPAGE_PMD_SIZE;
progress += HPAGE_PMD_NR;
@@ -2799,6 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
cond_resched();
mmap_read_lock(mm);
mmap_locked = true;
+ *lock_dropped = true;
result = hugepage_vma_revalidate(mm, addr, false, &vma,
cc);
if (result != SCAN_SUCCEED) {
@@ -2809,46 +2841,25 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
mmap_assert_locked(mm);
- if (!vma_is_anonymous(vma)) {
- struct file *file = get_file(vma->vm_file);
- pgoff_t pgoff = linear_page_index(vma, addr);
- mmap_read_unlock(mm);
- mmap_locked = false;
- *lock_dropped = true;
- result = collapse_scan_file(mm, addr, file, pgoff, cc);
-
- if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
- mapping_can_writeback(file->f_mapping)) {
- loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
- loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+ result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
- filemap_write_and_wait_range(file->f_mapping, lstart, lend);
- triggered_wb = true;
- fput(file);
- goto retry;
- }
- fput(file);
- } else {
- result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
- }
if (!mmap_locked)
*lock_dropped = true;
-handle_result:
+ if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb) {
+ triggered_wb = true;
+ goto retry;
+ }
+
switch (result) {
case SCAN_SUCCEED:
case SCAN_PMD_MAPPED:
++thps;
break;
- case SCAN_PTE_MAPPED_HUGEPAGE:
- BUG_ON(mmap_locked);
- mmap_read_lock(mm);
- result = try_collapse_pte_mapped_thp(mm, addr, true);
- mmap_read_unlock(mm);
- goto handle_result;
/* Whitelisted set of results where continuing OK */
case SCAN_NO_PTE_TABLE:
+ case SCAN_PTE_MAPPED_HUGEPAGE:
case SCAN_PTE_NON_PRESENT:
case SCAN_PTE_UFFD_WP:
case SCAN_LACK_REFERENCED_PAGE:
--
2.53.0
* Re: [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
` (4 preceding siblings ...)
2026-02-12 2:25 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
@ 2026-02-12 2:26 ` Nico Pache
5 siblings, 0 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 2:26 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
Sorry for the noise, patches 4 and 5 got borked when I sent them. I
believe I fixed it, but patch 5 may have been sent twice.
On Wed, Feb 11, 2026 at 7:18 PM Nico Pache <npache@redhat.com> wrote:
>
> The following series contains cleanups and prerequisites for my work on
> khugepaged mTHP support [1]. These have been separated out to ease review.
>
> The first patch in the series refactors the page-fault folio-to-PTE mapping
> path, following a convention similar to the one defined by
> map_anon_folio_pmd_(no)pf().
> This not only cleans up the current implementation of do_anonymous_page(),
> but will allow for reuse later in the khugepaged mTHP implementation.
>
> The second patch adds a small is_pmd_order() helper to check if an order is
> the PMD order. This check is open-coded in a number of places. This patch
> aims to clean this up and will be used more in the khugepaged mTHP work.
> The third patch also adds a small DEFINE for (HPAGE_PMD_NR - 1) which is
> used often across the khugepaged code.
>
> The fourth and fifth patches come from the khugepaged mTHP patchset [1].
> These two patches include the rename of function prefixes, and the
> unification of khugepaged and madvise_collapse via a new
> collapse_single_pmd function.
>
> Patch 1: refactor do_anonymous_page into map_anon_folio_pte_(no)pf
> Patch 2: add is_pmd_order helper
> Patch 3: Add define for (HPAGE_PMD_NR - 1)
> Patch 4: rename hpage_collapse to collapse_
> Patch 5: Refactoring to combine madvise_collapse and khugepaged
>
> ---------
> Testing
> ---------
> - Built for x86_64, aarch64, ppc64le, and s390x
> - Ran the test suites provided by the kernel-tests project on all arches
> - Ran the mm selftests
>
> V1 Changes (for patches coming from [1]):
> - Refactor do_anonymous_page() and add helpers for use in mthp series
> - moved is_pmd_order patch to this series [2]
> - added a define for HPAGE_PMD_NR - 1
> - moved rename to this series [3]
> - Dropped acks/review-by on PATCH 5 given [4]
> - moved unification patch to this series [4]. I also had to make some
> modifications from my previous version which include moving the new
> madvise_collapse writeback retry logic into the collapse_single_pmd
> function. This prevents a potential UAF bug I introduced in my v14 when
> handling the conflict. [5][6]
>
> A big thanks to everyone who has reviewed, tested, and participated in
> the development process. It's been a great experience working with all of
> you on this endeavour.
>
> [1] - https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com/
> [2] - https://lore.kernel.org/all/20260122192841.128719-2-npache@redhat.com/
> [3] - https://lore.kernel.org/all/20260122192841.128719-3-npache@redhat.com/
> [4] - https://lore.kernel.org/all/20260122192841.128719-4-npache@redhat.com/
> [5] - https://lore.kernel.org/all/65dcf7ab-1299-411f-9cbc-438ae72ff757@linux.dev/
> [6] - https://lore.kernel.org/all/b824f131-3e51-422c-9e98-044b0a2928a6@redhat.com/
>
> Nico Pache (5):
> mm: consolidate anonymous folio PTE mapping into helpers
> mm: introduce is_pmd_order helper
> mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
> mm/khugepaged: rename hpage_collapse_* to collapse_*
> mm/khugepaged: unify khugepaged and madv_collapse with
> collapse_single_pmd()
>
> include/linux/huge_mm.h | 5 ++
> include/linux/mm.h | 4 +
> mm/huge_memory.c | 2 +-
> mm/khugepaged.c | 194 +++++++++++++++++++++-------------------
> mm/memory.c | 56 ++++++++----
> mm/mempolicy.c | 2 +-
> mm/mremap.c | 2 +-
> mm/page_alloc.c | 2 +-
> 8 files changed, 152 insertions(+), 115 deletions(-)
>
> --
> 2.53.0
>
* Re: [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
@ 2026-02-12 6:56 ` Vernon Yang
2026-02-12 14:45 ` Pedro Falcato
` (2 subsequent siblings)
3 siblings, 0 replies; 32+ messages in thread
From: Vernon Yang @ 2026-02-12 6:56 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Wed, Feb 11, 2026 at 07:18:33PM -0700, Nico Pache wrote:
> The value (HPAGE_PMD_NR - 1) is used often in the khugepaged code to
> signify the limit of the max_ptes_* values. Add a define for this to
> increase code readability and reuse.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index c362b3b2e08a..3dcce6018f20 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -85,6 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
> *
> * Note that these are only respected if collapse was initiated by khugepaged.
> */
> +#define COLLAPSE_MAX_PTES_LIMIT (HPAGE_PMD_NR - 1)
> unsigned int khugepaged_max_ptes_none __read_mostly;
> static unsigned int khugepaged_max_ptes_swap __read_mostly;
> static unsigned int khugepaged_max_ptes_shared __read_mostly;
> @@ -252,7 +253,7 @@ static ssize_t max_ptes_none_store(struct kobject *kobj,
> unsigned long max_ptes_none;
>
> err = kstrtoul(buf, 10, &max_ptes_none);
> - if (err || max_ptes_none > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_none > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_none = max_ptes_none;
> @@ -277,7 +278,7 @@ static ssize_t max_ptes_swap_store(struct kobject *kobj,
> unsigned long max_ptes_swap;
>
> err = kstrtoul(buf, 10, &max_ptes_swap);
> - if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_swap > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -303,7 +304,7 @@ static ssize_t max_ptes_shared_store(struct kobject *kobj,
> unsigned long max_ptes_shared;
>
> err = kstrtoul(buf, 10, &max_ptes_shared);
> - if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_shared > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -384,7 +385,7 @@ int __init khugepaged_init(void)
> return -ENOMEM;
>
> khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
> - khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
> + khugepaged_max_ptes_none = COLLAPSE_MAX_PTES_LIMIT;
> khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
> khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
>
> @@ -1869,7 +1870,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> bool is_shmem = shmem_file(file);
>
> VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> + VM_BUG_ON(start & (COLLAPSE_MAX_PTES_LIMIT));
>
> result = alloc_charge_folio(&new_folio, mm, cc);
> if (result != SCAN_SUCCEED)
> @@ -2209,7 +2210,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> * unwritten page.
> */
> folio_mark_uptodate(new_folio);
> - folio_ref_add(new_folio, HPAGE_PMD_NR - 1);
> + folio_ref_add(new_folio, COLLAPSE_MAX_PTES_LIMIT);
Hi Nico,
Thank you for the cleanup.
The readability is better in other places, but here it confuses me.
Why is it LIMIT? I had to look at the implementation code.
Would COLLAPSE_MAX_PTES be better?
> if (is_shmem)
> folio_mark_dirty(new_folio);
> @@ -2303,7 +2304,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
> memset(cc->node_load, 0, sizeof(cc->node_load));
> nodes_clear(cc->alloc_nmask);
> rcu_read_lock();
> - xas_for_each(&xas, folio, start + HPAGE_PMD_NR - 1) {
> + xas_for_each(&xas, folio, start + COLLAPSE_MAX_PTES_LIMIT) {
> if (xas_retry(&xas, folio))
> continue;
>
> --
> 2.53.0
>
>
--
Cheers,
Vernon
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
@ 2026-02-12 14:38 ` Pedro Falcato
2026-02-12 15:55 ` Joshua Hahn
2026-02-12 16:09 ` Zi Yan
2 siblings, 0 replies; 32+ messages in thread
From: Pedro Falcato @ 2026-02-12 14:38 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Wed, Feb 11, 2026 at 07:18:31PM -0700, Nico Pache wrote:
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> - construct the PTE entry
> - add rmap
> - add to LRU
> - set the PTEs
> - update the MMU cache.
>
> Introduce a two helpers to consolidate this duplicated logic, mirroring the
> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> references, adds anon rmap and LRU. This function also handles the
> uffd_wp that can occur in the pf variant.
>
> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> counter updates, and mTHP fault allocation statistics for the page fault
> path.
>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Looks a little nicer, thanks :)
--
Pedro
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
@ 2026-02-12 14:40 ` Pedro Falcato
2026-02-12 16:11 ` Zi Yan
` (4 subsequent siblings)
5 siblings, 0 replies; 32+ messages in thread
From: Pedro Falcato @ 2026-02-12 14:40 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Wed, Feb 11, 2026 at 07:18:32PM -0700, Nico Pache wrote:
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
> use this check, so lets create a simple helper function to keep the code
> clean and readable.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
>
> arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
^^Oops?
--
Pedro
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
2026-02-12 6:56 ` Vernon Yang
@ 2026-02-12 14:45 ` Pedro Falcato
2026-02-12 16:21 ` Zi Yan
2026-02-12 19:51 ` David Hildenbrand (Arm)
3 siblings, 0 replies; 32+ messages in thread
From: Pedro Falcato @ 2026-02-12 14:45 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Wed, Feb 11, 2026 at 07:18:33PM -0700, Nico Pache wrote:
> The value (HPAGE_PMD_NR - 1) is used often in the khugepaged code to
> signify the limit of the max_ptes_* values. Add a define for this to
> increase code readability and reuse.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
> ---
> mm/khugepaged.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index c362b3b2e08a..3dcce6018f20 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -85,6 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
> *
> * Note that these are only respected if collapse was initiated by khugepaged.
> */
> +#define COLLAPSE_MAX_PTES_LIMIT (HPAGE_PMD_NR - 1)
> unsigned int khugepaged_max_ptes_none __read_mostly;
> static unsigned int khugepaged_max_ptes_swap __read_mostly;
> static unsigned int khugepaged_max_ptes_shared __read_mostly;
> @@ -252,7 +253,7 @@ static ssize_t max_ptes_none_store(struct kobject *kobj,
> unsigned long max_ptes_none;
>
> err = kstrtoul(buf, 10, &max_ptes_none);
> - if (err || max_ptes_none > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_none > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_none = max_ptes_none;
> @@ -277,7 +278,7 @@ static ssize_t max_ptes_swap_store(struct kobject *kobj,
> unsigned long max_ptes_swap;
>
> err = kstrtoul(buf, 10, &max_ptes_swap);
> - if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_swap > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -303,7 +304,7 @@ static ssize_t max_ptes_shared_store(struct kobject *kobj,
> unsigned long max_ptes_shared;
>
> err = kstrtoul(buf, 10, &max_ptes_shared);
> - if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_shared > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -384,7 +385,7 @@ int __init khugepaged_init(void)
> return -ENOMEM;
>
> khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
> - khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
> + khugepaged_max_ptes_none = COLLAPSE_MAX_PTES_LIMIT;
> khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
> khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
>
> @@ -1869,7 +1870,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> bool is_shmem = shmem_file(file);
>
> VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> + VM_BUG_ON(start & (COLLAPSE_MAX_PTES_LIMIT));
nit: no need for the () around COLLAPSE_MAX_PTES_LIMIT here
--
Pedro
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 2:25 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
@ 2026-02-12 15:33 ` Pedro Falcato
2026-02-12 17:34 ` Zi Yan
2026-02-12 20:03 ` David Hildenbrand (Arm)
2 siblings, 0 replies; 32+ messages in thread
From: Pedro Falcato @ 2026-02-12 15:33 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Wed, Feb 11, 2026 at 07:25:12PM -0700, Nico Pache wrote:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create collapse_single_pmd to increase code reuse and create an entry
> point to these two users.
>
> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> collapse_single_pmd function. This introduces a minor behavioral change
> that is most likely an undiscovered bug. The current implementation of
> khugepaged tests collapse_test_exit_or_disable before calling
> collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
> case. By unifying these two callers madvise_collapse now also performs
> this check. We also modify the return value to be SCAN_ANY_PROCESS which
> properly indicates that this process is no longer valid to operate on.
>
> We also guard the khugepaged_pages_collapsed variable to ensure its only
> incremented for khugepaged.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
--
Pedro
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
2026-02-12 14:38 ` Pedro Falcato
@ 2026-02-12 15:55 ` Joshua Hahn
2026-02-12 19:33 ` Nico Pache
2026-02-12 16:09 ` Zi Yan
2 siblings, 1 reply; 32+ messages in thread
From: Joshua Hahn @ 2026-02-12 15:55 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, akpm, anshuman.khandual, apopple, baohua,
baolin.wang, byungchul, catalin.marinas, cl, corbet, dave.hansen,
david, dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh,
jglisse, joshua.hahnjy, kas, lance.yang, Liam.Howlett,
lorenzo.stoakes, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Wed, 11 Feb 2026 19:18:31 -0700 Nico Pache <npache@redhat.com> wrote:
Hello Nico,
Thank you for the patch! I hope you are having a good day.
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> - construct the PTE entry
> - add rmap
> - add to LRU
> - set the PTEs
> - update the MMU cache.
>
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> references, adds anon rmap and LRU. This function also handles the
> uffd_wp that can occur in the pf variant.
>
> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> counter updates, and mTHP fault allocation statistics for the page fault
> path.
>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.
It seems that based on this description, there should be no functional change
in the code below. Is that correct?
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> include/linux/mm.h | 4 ++++
> mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> 2 files changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8a8fd47399c..c3aa1f51e020 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>
> void snapshot_page(struct page_snapshot *ps, const struct page *page);
>
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp);
> +
> #endif /* _LINUX_MM_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index 8c19af97f0a0..61c2277c9d9f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> }
>
> +
^^^ extra newline?
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp)
> +{
> + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> + unsigned int nr_pages = folio_nr_pages(folio);
Just reading through the code below, comparing what was deleted and what was
added: maybe we are missing a pte_sw_mkyoung(entry) here? It seems like this
would matter for MIPS systems, but I couldn't find this change in the changelog.
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + if (uffd_wp)
> + entry = pte_mkuffd_wp(entry);
The ordering here was also changed, but it wasn't immediately obvious to me
why it was changed.
> +
> + folio_ref_add(folio, nr_pages - 1);
> + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> + folio_add_lru_vma(folio, vma);
> + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
> +}
> +
> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + unsigned int nr_pages, bool uffd_wp)
> +{
> + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> +}
> +
> +
^^^ extra newline?
> /*
> * We enter with non-exclusive mmap_lock (to exclude vma changes,
> * but allow concurrent faults), and pte mapped but not yet locked.
> @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> - goto setpte;
> + if (vmf_orig_pte_uffd_wp(vmf))
> + entry = pte_mkuffd_wp(entry);
> + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> +
> + /* No need to invalidate - it was non-present before */
> + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
NIT: Should we try to keep the line under 80 columns? ;-)
> + goto unlock;
> }
>
> /* Allocate our own private page. */
> @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> */
> __folio_mark_uptodate(folio);
>
> - entry = folio_mk_pte(folio, vma->vm_page_prot);
> - entry = pte_sw_mkyoung(entry);
> - if (vma->vm_flags & VM_WRITE)
> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
> -
> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> if (!vmf->pte)
> goto release;
> @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> folio_put(folio);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> -
> - folio_ref_add(folio, nr_pages - 1);
> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> - folio_add_lru_vma(folio, vma);
> -setpte:
> - if (vmf_orig_pte_uffd_wp(vmf))
> - entry = pte_mkuffd_wp(entry);
> - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> -
> - /* No need to invalidate - it was non-present before */
> - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
NIT: Maybe here as well?
> unlock:
> if (vmf->pte)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> --
> 2.53.0
Thank you for the patch again. I hope you have a great day!
Joshua
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
2026-02-12 14:38 ` Pedro Falcato
2026-02-12 15:55 ` Joshua Hahn
@ 2026-02-12 16:09 ` Zi Yan
2026-02-12 19:45 ` Nico Pache
2 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-02-12 16:09 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
On 11 Feb 2026, at 21:18, Nico Pache wrote:
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> - construct the PTE entry
> - add rmap
> - add to LRU
> - set the PTEs
> - update the MMU cache.
>
> Introduce two helpers to consolidate this duplicated logic, mirroring the
> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> references, adds anon rmap and LRU. This function also handles the
> uffd_wp that can occur in the pf variant.
>
> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> counter updates, and mTHP fault allocation statistics for the page fault
> path.
>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> include/linux/mm.h | 4 ++++
> mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> 2 files changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8a8fd47399c..c3aa1f51e020 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>
> void snapshot_page(struct page_snapshot *ps, const struct page *page);
>
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp);
> +
> #endif /* _LINUX_MM_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index 8c19af97f0a0..61c2277c9d9f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> }
>
> +
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp)
> +{
> + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> + unsigned int nr_pages = folio_nr_pages(folio);
> +
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + if (uffd_wp)
> + entry = pte_mkuffd_wp(entry);
> +
> + folio_ref_add(folio, nr_pages - 1);
> + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> + folio_add_lru_vma(folio, vma);
> + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
Copy the comment
/* No need to invalidate - it was non-present before */
above it please.
> +}
> +
> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + unsigned int nr_pages, bool uffd_wp)
> +{
> + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> +}
> +
> +
> /*
> * We enter with non-exclusive mmap_lock (to exclude vma changes,
> * but allow concurrent faults), and pte mapped but not yet locked.
> @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> - goto setpte;
> + if (vmf_orig_pte_uffd_wp(vmf))
> + entry = pte_mkuffd_wp(entry);
> + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
entry is only used in this if statement, you can move its declaration inside.
> +
> + /* No need to invalidate - it was non-present before */
> + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
> + goto unlock;
> }
>
> /* Allocate our own private page. */
> @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> */
> __folio_mark_uptodate(folio);
>
> - entry = folio_mk_pte(folio, vma->vm_page_prot);
> - entry = pte_sw_mkyoung(entry);
It is removed, can you explain why?
> - if (vma->vm_flags & VM_WRITE)
> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
> -
The above code is moved into map_anon_folio_pte_nopf(), thus executed
later than before the change. folio, vma->vm_flags, and vma->vm_page_prot
are not changed between, so there should be no functional change.
But it is better to explain it in the commit message to make review easier.
> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> if (!vmf->pte)
> goto release;
> @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> folio_put(folio);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> -
> - folio_ref_add(folio, nr_pages - 1);
> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
These counter updates are moved after folio_add_new_anon_rmap(),
mirroring map_anon_folio_pmd_pf()’s order. Looks good to me.
> - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> - folio_add_lru_vma(folio, vma);
> -setpte:
> - if (vmf_orig_pte_uffd_wp(vmf))
> - entry = pte_mkuffd_wp(entry);
This is moved above folio_ref_add() in map_anon_folio_pte_nopf(), but
no functional change is expected.
> - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> -
> - /* No need to invalidate - it was non-present before */
> - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
> unlock:
> if (vmf->pte)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> --
> 2.53.0
3 things:
1. Copy the comment for update_mmu_cache_range() in map_anon_folio_pte_nopf().
2. Make pte_t entry local in zero-page handling.
3. Explain why entry = pte_sw_mkyoung(entry) is removed.
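Putting those together, the helper would read roughly as below -- just a
sketch of the feedback above, not a replacement hunk (point 2 applies to
the zero-page branch in do_anonymous_page() itself):

void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
{
	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
	unsigned int nr_pages = folio_nr_pages(folio);

	entry = pte_sw_mkyoung(entry);	/* point 3: keep it, or justify dropping it */
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
	if (uffd_wp)
		entry = pte_mkuffd_wp(entry);

	folio_ref_add(folio, nr_pages - 1);
	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
	folio_add_lru_vma(folio, vma);
	set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
	/* point 1: No need to invalidate - it was non-present before */
	update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
}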
Thanks.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
2026-02-12 14:40 ` Pedro Falcato
@ 2026-02-12 16:11 ` Zi Yan
2026-02-12 19:45 ` David Hildenbrand (Arm)
` (3 subsequent siblings)
5 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-02-12 16:11 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
On 11 Feb 2026, at 21:18, Nico Pache wrote:
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
> use this check, so lets create a simple helper function to keep the code
> clean and readable.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
>
> arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
> ---
> include/linux/huge_mm.h | 5 +++++
> mm/huge_memory.c | 2 +-
> mm/khugepaged.c | 4 ++--
> mm/mempolicy.c | 2 +-
> mm/page_alloc.c | 2 +-
> 5 files changed, 10 insertions(+), 5 deletions(-)
>
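(The helper body is not quoted in this reply; presumably it amounts to a
single comparison along these lines, so the open-coded
order == HPAGE_PMD_ORDER checks in the touched files become
is_pmd_order(order):)

static inline bool is_pmd_order(unsigned int order)
{
	return order == HPAGE_PMD_ORDER;
}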
LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
2026-02-12 6:56 ` Vernon Yang
2026-02-12 14:45 ` Pedro Falcato
@ 2026-02-12 16:21 ` Zi Yan
2026-02-12 19:51 ` David Hildenbrand (Arm)
3 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-02-12 16:21 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
On 11 Feb 2026, at 21:18, Nico Pache wrote:
> The value (HPAGE_PMD_NR - 1) is used often in the khugepaged code to
> signify the limit of the max_ptes_* values. Add a define for this to
> increase code readability and reuse.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 2:25 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-02-12 15:33 ` Pedro Falcato
@ 2026-02-12 17:34 ` Zi Yan
2026-02-12 20:03 ` David Hildenbrand (Arm)
2 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-02-12 17:34 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
On 11 Feb 2026, at 21:25, Nico Pache wrote:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create collapse_single_pmd to increase code reuse and create an entry
> point to these two users.
>
> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> collapse_single_pmd function. This introduces a minor behavioral change
> that is most likely an undiscovered bug. The current implementation of
> khugepaged tests collapse_test_exit_or_disable before calling
> collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
> case. By unifying these two callers madvise_collapse now also performs
> this check. We also modify the return value to be SCAN_ANY_PROCESS which
> properly indicates that this process is no longer valid to operate on.
>
> We also guard the khugepaged_pages_collapsed variable to ensure its only
> incremented for khugepaged.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 121 ++++++++++++++++++++++++++----------------------
> 1 file changed, 66 insertions(+), 55 deletions(-)
>
<snip>
> @@ -2799,6 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> cond_resched();
> mmap_read_lock(mm);
> mmap_locked = true;
> + *lock_dropped = true;
Is this needed?
1. There is one above handle_result;
2. mmap_locked is true when entering madvise_collapse(), so *lock_dropped would
change only after one iteration and the one below should take care of it;
3. goto retry is moved below “*lock_dropped = true”.
Let me know if I miss anything.
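For reference, the flow under discussion, condensed from the hunks quoted
in this mail (not the exact code):

	if (!mmap_locked) {
		cond_resched();
		mmap_read_lock(mm);
		mmap_locked = true;
		*lock_dropped = true;		/* the addition in question */
		result = hugepage_vma_revalidate(mm, addr, false, &vma, cc);
		/* ... bail out if revalidation failed ... */
	}
	result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
	if (!mmap_locked)
		*lock_dropped = true;		/* already records any lock drop */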
> result = hugepage_vma_revalidate(mm, addr, false, &vma,
> cc);
> if (result != SCAN_SUCCEED) {
> @@ -2809,46 +2841,25 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
> }
> mmap_assert_locked(mm);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma, addr);
>
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - *lock_dropped = true;
> - result = collapse_scan_file(mm, addr, file, pgoff, cc);
> -
> - if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
> - mapping_can_writeback(file->f_mapping)) {
> - loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> - loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> + result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
>
> - filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> - triggered_wb = true;
> - fput(file);
> - goto retry;
> - }
> - fput(file);
> - } else {
> - result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
> - }
> if (!mmap_locked)
> *lock_dropped = true;
>
> -handle_result:
> + if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb) {
> + triggered_wb = true;
> + goto retry;
> + }
> +
> switch (result) {
> case SCAN_SUCCEED:
> case SCAN_PMD_MAPPED:
> ++thps;
> break;
> - case SCAN_PTE_MAPPED_HUGEPAGE:
> - BUG_ON(mmap_locked);
> - mmap_read_lock(mm);
> - result = try_collapse_pte_mapped_thp(mm, addr, true);
> - mmap_read_unlock(mm);
> - goto handle_result;
> /* Whitelisted set of results where continuing OK */
> case SCAN_NO_PTE_TABLE:
> + case SCAN_PTE_MAPPED_HUGEPAGE:
> case SCAN_PTE_NON_PRESENT:
> case SCAN_PTE_UFFD_WP:
> case SCAN_LACK_REFERENCED_PAGE:
> --
> 2.53.0
Otherwise, LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 15:55 ` Joshua Hahn
@ 2026-02-12 19:33 ` Nico Pache
0 siblings, 0 replies; 32+ messages in thread
From: Nico Pache @ 2026-02-12 19:33 UTC (permalink / raw)
To: Joshua Hahn
Cc: linux-kernel, linux-mm, akpm, anshuman.khandual, apopple, baohua,
baolin.wang, byungchul, catalin.marinas, cl, corbet, dave.hansen,
david, dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh,
jglisse, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On Thu, Feb 12, 2026 at 8:55 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> On Wed, 11 Feb 2026 19:18:31 -0700 Nico Pache <npache@redhat.com> wrote:
>
> Hello Nico,
>
> Thank you for the patch! I hope you are having a good day.
Hey Joshua!
Thank you for reviewing! I hope you have a good day too :)
>
> > The anonymous page fault handler in do_anonymous_page() open-codes the
> > sequence to map a newly allocated anonymous folio at the PTE level:
> > - construct the PTE entry
> > - add rmap
> > - add to LRU
> > - set the PTEs
> > - update the MMU cache.
> >
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> > existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
> >
> > map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> > references, adds anon rmap and LRU. This function also handles the
> > uffd_wp that can occur in the pf variant.
> >
> > map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> > counter updates, and mTHP fault allocation statistics for the page fault
> > path.
> >
> > The zero-page read path in do_anonymous_page() is also untangled from the
> > shared setpte label, since it does not allocate a folio and should not
> > share the same mapping sequence as the write path. Make nr_pages = 1
> > rather than relying on the variable. This makes it more clear that we
> > are operating on the zero page only.
> >
> > This refactoring will also help reduce code duplication between mm/memory.c
> > and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> > mapping that can be reused by future callers.
>
> It seems that based on this description, there should be no functional change
> in the code below. Is that correct?
Correct, but as you and others have pointed out I believe I missed a
pte_sw_mkyoung.
On closer inspection I also may have inadvertently changed some
behavior around pte_mkdirty().
In the previous implementation we only called pte_mkdirty() if VM_WRITE was
set. When switching over to maybe_mkwrite(), pte_mkdirty() is no longer
called conditionally, while pte_mkwrite() still is.
Nothing showed up in my testing, but some of these things can be
tricky to trigger. Other callers also make this "mistake" (if it even
is one), but I'm aiming for no functional change so I appreciate the
thoroughness here! I will clean up both of these issues.
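For reference, the two variants being compared, condensed from the hunks
quoted in this thread (not new code):

	/* before: open-coded in do_anonymous_page() */
	entry = folio_mk_pte(folio, vma->vm_page_prot);
	entry = pte_sw_mkyoung(entry);
	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry), vma);	/* dirty only if writable */

	/* after: in map_anon_folio_pte_nopf() as posted */
	entry = folio_mk_pte(folio, vma->vm_page_prot);
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);		/* dirty unconditionally, mkyoung gone */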
>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > include/linux/mm.h | 4 ++++
> > mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> > 2 files changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index f8a8fd47399c..c3aa1f51e020 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
> >
> > void snapshot_page(struct page_snapshot *ps, const struct page *page);
> >
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp);
> > +
> > #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8c19af97f0a0..61c2277c9d9f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> > return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> > }
> >
> > +
>
> ^^^ extra newline?
oops yes! thanks
>
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp)
> > +{
> > + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> > + unsigned int nr_pages = folio_nr_pages(folio);
>
> Just reading through the code below, comparing what was deleted and what was
> added: maybe we are missing a pte_sw_mkyoung(entry) here? It seems like this
> would matter for MIPS systems, but I couldn't find this change in the changelog.
I think you are correct. In my khugepaged implementation this was not
present. I will add it back and run some tests! Thank you.
>
> > + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > + if (uffd_wp)
> > + entry = pte_mkuffd_wp(entry);
>
> The ordering here was also changed, but it wasn't immediately obvious to me
> why it was changed.
I don't see any data dependencies between the folio rmap/LRU/ref changes
and the pte changes. The order was most likely due to the setpte label,
which this also cleans up. I believe we can reorder this, but if you see
an issue with it please let me know!
>
> > +
> > + folio_ref_add(folio, nr_pages - 1);
> > + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > + folio_add_lru_vma(folio, vma);
> > + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> > + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
> > +}
> > +
> > +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + unsigned int nr_pages, bool uffd_wp)
> > +{
> > + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> > + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > +}
> > +
> > +
>
> ^^^ extra newline?
whoops yes thank you!
>
> > /*
> > * We enter with non-exclusive mmap_lock (to exclude vma changes,
> > * but allow concurrent faults), and pte mapped but not yet locked.
> > @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > - goto setpte;
> > + if (vmf_orig_pte_uffd_wp(vmf))
> > + entry = pte_mkuffd_wp(entry);
> > + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> > +
> > + /* No need to invalidate - it was non-present before */
> > + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
>
> NIT: Should we try to keep the line under 80 columns? ;-)
Ah yes, I still don't fully understand this rule, as it's broken in a lot
of places. I'll move nr_pages to a new line.
>
> > + goto unlock;
> > }
> >
> > /* Allocate our own private page. */
> > @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > */
> > __folio_mark_uptodate(folio);
> >
> > - entry = folio_mk_pte(folio, vma->vm_page_prot);
> > - entry = pte_sw_mkyoung(entry);
> > - if (vma->vm_flags & VM_WRITE)
> > - entry = pte_mkwrite(pte_mkdirty(entry), vma);
> > -
> > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> > if (!vmf->pte)
> > goto release;
> > @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > folio_put(folio);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > -
> > - folio_ref_add(folio, nr_pages - 1);
> > - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > - folio_add_lru_vma(folio, vma);
> > -setpte:
> > - if (vmf_orig_pte_uffd_wp(vmf))
> > - entry = pte_mkuffd_wp(entry);
> > - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> > -
> > - /* No need to invalidate - it was non-present before */
> > - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> > + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
>
> NIT: Maybe here as well?
ack
>
> > unlock:
> > if (vmf->pte)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > --
> > 2.53.0
>
> Thank you for the patch again. I hope you have a great day!
> Joshua
Thank you! you as well
-- Nico
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 16:09 ` Zi Yan
@ 2026-02-12 19:45 ` Nico Pache
2026-02-12 20:06 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Nico Pache @ 2026-02-12 19:45 UTC (permalink / raw)
To: Zi Yan
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
On Thu, Feb 12, 2026 at 9:09 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 11 Feb 2026, at 21:18, Nico Pache wrote:
>
> > The anonymous page fault handler in do_anonymous_page() open-codes the
> > sequence to map a newly allocated anonymous folio at the PTE level:
> > - construct the PTE entry
> > - add rmap
> > - add to LRU
> > - set the PTEs
> > - update the MMU cache.
> >
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> > existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
> >
> > map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> > references, adds anon rmap and LRU. This function also handles the
> > uffd_wp that can occur in the pf variant.
> >
> > map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> > counter updates, and mTHP fault allocation statistics for the page fault
> > path.
> >
> > The zero-page read path in do_anonymous_page() is also untangled from the
> > shared setpte label, since it does not allocate a folio and should not
> > share the same mapping sequence as the write path. Make nr_pages = 1
> > rather than relying on the variable. This makes it more clear that we
> > are operating on the zero page only.
> >
> > This refactoring will also help reduce code duplication between mm/memory.c
> > and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> > mapping that can be reused by future callers.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > include/linux/mm.h | 4 ++++
> > mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> > 2 files changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index f8a8fd47399c..c3aa1f51e020 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
> >
> > void snapshot_page(struct page_snapshot *ps, const struct page *page);
> >
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp);
> > +
> > #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8c19af97f0a0..61c2277c9d9f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> > return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> > }
> >
> > +
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp)
> > +{
> > + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> > + unsigned int nr_pages = folio_nr_pages(folio);
> > +
> > + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > + if (uffd_wp)
> > + entry = pte_mkuffd_wp(entry);
> > +
> > + folio_ref_add(folio, nr_pages - 1);
> > + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > + folio_add_lru_vma(folio, vma);
> > + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> > + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
>
> Copy the comment
> /* No need to invalidate - it was non-present before */
> above it please.
Good call thank you!
>
> > +}
> > +
> > +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + unsigned int nr_pages, bool uffd_wp)
> > +{
> > + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> > + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > +}
> > +
> > +
> > /*
> > * We enter with non-exclusive mmap_lock (to exclude vma changes,
> > * but allow concurrent faults), and pte mapped but not yet locked.
> > @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > - goto setpte;
> > + if (vmf_orig_pte_uffd_wp(vmf))
> > + entry = pte_mkuffd_wp(entry);
> > + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>
> entry is only used in this if statement, you can move its declaration inside.
Ack!
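(A sketch of that suggestion; the surrounding zero-page setup is
paraphrased rather than taken from the patch:)

	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
	    !mm_forbids_zeropage(vma->vm_mm)) {
		/* entry is no longer needed outside this branch */
		pte_t entry = pte_mkspecial(pfn_pte(my_zero_pfn(addr),
						    vma->vm_page_prot));
		/* ... map and lock the PTE, handle userfaultfd ... */
		if (vmf_orig_pte_uffd_wp(vmf))
			entry = pte_mkuffd_wp(entry);
		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
		/* No need to invalidate - it was non-present before */
		update_mmu_cache_range(vmf, vma, addr, vmf->pte, 1);
		goto unlock;
	}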
>
> > +
> > + /* No need to invalidate - it was non-present before */
> > + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
> > + goto unlock;
> > }
> >
> > /* Allocate our own private page. */
> > @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > */
> > __folio_mark_uptodate(folio);
> >
> > - entry = folio_mk_pte(folio, vma->vm_page_prot);
> > - entry = pte_sw_mkyoung(entry);
>
> It is removed, can you explain why?
Thanks for catching that (as others have too); I will add it back and
run my testing again to make sure everything is still OK. As Joshua
pointed out, it may only affect MIPS, hence no issues showed up in my testing.
>
> > - if (vma->vm_flags & VM_WRITE)
> > - entry = pte_mkwrite(pte_mkdirty(entry), vma);
>
> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
Yes, upon further investigation this does seem to slightly change the behavior:
pte_mkdirty() is now called unconditionally rather than only when VM_WRITE
is set. I noticed other callers in the kernel doing this too.
Is it OK to leave the unconditional pte_mkdirty(), or should I go back to using
pte_mkwrite() with the conditional guarding both mkwrite and mkdirty?
>
> > -
>
> The above code is moved into map_anon_folio_pte_nopf(), thus executed
> later than before the change. folio, vma->vm_flags, and vma->vm_page_prot
> are not changed between, so there should be no functional change.
> But it is better to explain it in the commit message to make review easier.
Will do! Thank you for confirming :) I am pretty sure we can make this
move without any functional change.
>
> > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> > if (!vmf->pte)
> > goto release;
> > @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > folio_put(folio);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > -
> > - folio_ref_add(folio, nr_pages - 1);
>
> > - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>
> These counter updates are moved after folio_add_new_anon_rmap(),
> mirroring map_anon_folio_pmd_pf()’s order. Looks good to me.
>
> > - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > - folio_add_lru_vma(folio, vma);
> > -setpte:
>
> > - if (vmf_orig_pte_uffd_wp(vmf))
> > - entry = pte_mkuffd_wp(entry);
>
> This is moved above folio_ref_add() in map_anon_folio_pte_nopf(), but
> no functional change is expected.
>
> > - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> > -
> > - /* No need to invalidate - it was non-present before */
> > - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> > + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
> > unlock:
> > if (vmf->pte)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > --
> > 2.53.0
>
> 3 things:
> 1. Copy the comment for update_mmu_cache_range() in map_anon_folio_pte_nopf().
> 2. Make pte_t entry local in zero-page handling.
> 3. Explain why entry = pte_sw_mkyoung(entry) is removed.
>
> Thanks.
Thanks for the review :) I'll fix the issues stated above!
-- Nico
>
>
> Best Regards,
> Yan, Zi
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
2026-02-12 14:40 ` Pedro Falcato
2026-02-12 16:11 ` Zi Yan
@ 2026-02-12 19:45 ` David Hildenbrand (Arm)
2026-02-13 3:51 ` Barry Song
` (2 subsequent siblings)
5 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12 19:45 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On 2/12/26 03:18, Nico Pache wrote:
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
> use this check, so lets create a simple helper function to keep the code
> clean and readable.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
>
> arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
` (2 preceding siblings ...)
2026-02-12 16:21 ` Zi Yan
@ 2026-02-12 19:51 ` David Hildenbrand (Arm)
3 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12 19:51 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On 2/12/26 03:18, Nico Pache wrote:
> The value (HPAGE_PMD_NR - 1) is used often in the khugepaged code to
> signify the limit of the max_ptes_* values. Add a define for this to
> increase code readability and reuse.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index c362b3b2e08a..3dcce6018f20 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -85,6 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
> *
> * Note that these are only respected if collapse was initiated by khugepaged.
> */
> +#define COLLAPSE_MAX_PTES_LIMIT (HPAGE_PMD_NR - 1)
> unsigned int khugepaged_max_ptes_none __read_mostly;
> static unsigned int khugepaged_max_ptes_swap __read_mostly;
> static unsigned int khugepaged_max_ptes_shared __read_mostly;
> @@ -252,7 +253,7 @@ static ssize_t max_ptes_none_store(struct kobject *kobj,
> unsigned long max_ptes_none;
>
> err = kstrtoul(buf, 10, &max_ptes_none);
> - if (err || max_ptes_none > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_none > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_none = max_ptes_none;
> @@ -277,7 +278,7 @@ static ssize_t max_ptes_swap_store(struct kobject *kobj,
> unsigned long max_ptes_swap;
>
> err = kstrtoul(buf, 10, &max_ptes_swap);
> - if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_swap > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -303,7 +304,7 @@ static ssize_t max_ptes_shared_store(struct kobject *kobj,
> unsigned long max_ptes_shared;
>
> err = kstrtoul(buf, 10, &max_ptes_shared);
> - if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
> + if (err || max_ptes_shared > COLLAPSE_MAX_PTES_LIMIT)
> return -EINVAL;
>
> khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -384,7 +385,7 @@ int __init khugepaged_init(void)
> return -ENOMEM;
>
> khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
> - khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
> + khugepaged_max_ptes_none = COLLAPSE_MAX_PTES_LIMIT;
> khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
> khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
Changing these is ok. I don't like the name either.
It should be clear that they all belong to the "khugepaged_max_ptes_*"
values. (see below)
>
> @@ -1869,7 +1870,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> bool is_shmem = shmem_file(file);
>
> VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> + VM_BUG_ON(start & (COLLAPSE_MAX_PTES_LIMIT));
>
> result = alloc_charge_folio(&new_folio, mm, cc);
> if (result != SCAN_SUCCEED)
> @@ -2209,7 +2210,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> * unwritten page.
> */
> folio_mark_uptodate(new_folio);
> - folio_ref_add(new_folio, HPAGE_PMD_NR - 1);
> + folio_ref_add(new_folio, COLLAPSE_MAX_PTES_LIMIT);
>
> if (is_shmem)
> folio_mark_dirty(new_folio);
> @@ -2303,7 +2304,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
> memset(cc->node_load, 0, sizeof(cc->node_load));
> nodes_clear(cc->alloc_nmask);
> rcu_read_lock();
> - xas_for_each(&xas, folio, start + HPAGE_PMD_NR - 1) {
> + xas_for_each(&xas, folio, start + COLLAPSE_MAX_PTES_LIMIT) {
> if (xas_retry(&xas, folio))
> continue;
>
Changing these is not. They are semantically something different.
E.g., folio_ref_add() adds "HPAGE_PMD_NR - 1" because we already
obtained a reference, and we need one per page -- HPAGE_PMD_NR.
So it's something different than when messing with the magical
khugepaged_max_ptes_none values.
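To illustrate the distinction against the hunks above: the numeric value is
the same, but only the first use is about the tunables.

/* tunable bound: the sysfs max_ptes_* knobs can be at most HPAGE_PMD_NR - 1 */
#define COLLAPSE_MAX_PTES_LIMIT	(HPAGE_PMD_NR - 1)

	if (err || max_ptes_none > COLLAPSE_MAX_PTES_LIMIT)
		return -EINVAL;

	/* refcounting: one reference per subpage, and one is already held */
	folio_ref_add(new_folio, HPAGE_PMD_NR - 1);

	/* alignment: a PMD-aligned page-cache index has its low bits clear */
	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));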
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_*
2026-02-12 2:23 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
2026-02-12 2:23 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
@ 2026-02-12 19:52 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12 19:52 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On 2/12/26 03:23, Nico Pache wrote:
> The hpage_collapse functions describe functions used by madvise_collapse
> and khugepaged. remove the unnecessary hpage prefix to shorten the
> function name.
>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 2:25 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-02-12 15:33 ` Pedro Falcato
2026-02-12 17:34 ` Zi Yan
@ 2026-02-12 20:03 ` David Hildenbrand (Arm)
2026-02-12 20:26 ` Nico Pache
2 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12 20:03 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On 2/12/26 03:25, Nico Pache wrote:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create collapse_single_pmd to increase code reuse and create an entry
> point to these two users.
>
> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> collapse_single_pmd function. This introduces a minor behavioral change
> that is most likely an undiscovered bug. The current implementation of
> khugepaged tests collapse_test_exit_or_disable before calling
> collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
> case. By unifying these two callers madvise_collapse now also performs
> this check. We also modify the return value to be SCAN_ANY_PROCESS which
> properly indicates that this process is no longer valid to operate on.
>
> We also guard the khugepaged_pages_collapsed variable to ensure it's only
> incremented for khugepaged.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 121 ++++++++++++++++++++++++++----------------------
> 1 file changed, 66 insertions(+), 55 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index fa41480f6948..0839a781bedd 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2395,6 +2395,62 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm, unsigned long a
> return result;
> }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static enum scan_result collapse_single_pmd(unsigned long addr,
> + struct vm_area_struct *vma, bool *mmap_locked,
> + struct collapse_control *cc)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + enum scan_result result;
> + struct file *file;
> + pgoff_t pgoff;
> +
> + if (vma_is_anonymous(vma)) {
> + result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
> + goto end;
> + }
> +
> + file = get_file(vma->vm_file);
> + pgoff = linear_page_index(vma, addr);
> +
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> + result = collapse_scan_file(mm, addr, file, pgoff, cc);
> +
> + if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
> + mapping_can_writeback(file->f_mapping)) {
> + const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> + const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> +
> + filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> + }
> + fput(file);
> +
> + if (result != SCAN_PTE_MAPPED_HUGEPAGE)
> + goto end;
> +
> + mmap_read_lock(mm);
> + *mmap_locked = true;
> + if (collapse_test_exit_or_disable(mm)) {
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> + return SCAN_ANY_PROCESS;
> + }
> + result = try_collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
> + if (result == SCAN_PMD_MAPPED)
> + result = SCAN_SUCCEED;
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> +
> +end:
> + if (cc->is_khugepaged && result == SCAN_SUCCEED)
> + ++khugepaged_pages_collapsed;
> + return result;
> +}
> +
> static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *result,
> struct collapse_control *cc)
> __releases(&khugepaged_mm_lock)
> @@ -2466,34 +2522,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *
> VM_BUG_ON(khugepaged_scan.address < hstart ||
> khugepaged_scan.address + HPAGE_PMD_SIZE >
> hend);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma,
> - khugepaged_scan.address);
> -
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - *result = collapse_scan_file(mm,
> - khugepaged_scan.address, file, pgoff, cc);
> - fput(file);
> - if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> - mmap_read_lock(mm);
> - if (collapse_test_exit_or_disable(mm))
> - goto breakouterloop;
> - *result = try_collapse_pte_mapped_thp(mm,
> - khugepaged_scan.address, false);
> - if (*result == SCAN_PMD_MAPPED)
> - *result = SCAN_SUCCEED;
> - mmap_read_unlock(mm);
> - }
> - } else {
> - *result = collapse_scan_pmd(mm, vma,
> - khugepaged_scan.address, &mmap_locked, cc);
> - }
> -
> - if (*result == SCAN_SUCCEED)
> - ++khugepaged_pages_collapsed;
>
> + *result = collapse_single_pmd(khugepaged_scan.address,
> + vma, &mmap_locked, cc);
> /* move to next address */
> khugepaged_scan.address += HPAGE_PMD_SIZE;
> progress += HPAGE_PMD_NR;
> @@ -2799,6 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> cond_resched();
> mmap_read_lock(mm);
> mmap_locked = true;
> + *lock_dropped = true;
> result = hugepage_vma_revalidate(mm, addr, false, &vma,
> cc);
> if (result != SCAN_SUCCEED) {
> @@ -2809,46 +2841,25 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
> }
> mmap_assert_locked(mm);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma, addr);
>
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - *lock_dropped = true;
> - result = collapse_scan_file(mm, addr, file, pgoff, cc);
> -
> - if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
> - mapping_can_writeback(file->f_mapping)) {
> - loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> - loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> + result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
>
> - filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> - triggered_wb = true;
> - fput(file);
> - goto retry;
> - }
> - fput(file);
> - } else {
> - result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
> - }
> if (!mmap_locked)
> *lock_dropped = true;
>
> -handle_result:
> + if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb) {
> + triggered_wb = true;
> + goto retry;
> + }
Having triggered_wb set where writeback is not actually triggered is
suboptimal.
And you can tell that by realizing that you would now retry once even
though the mapping does not support writeback
(mapping_can_writeback(file->f_mapping)) and no writeback actually happened.
Further, we would also try to call filemap_write_and_wait_range() now
twice instead of only during the first round.
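For contrast, the pre-refactor madvise_collapse hunk removed above only set
triggered_wb inside the branch that actually issued writeback, roughly
(trimmed from the quoted diff purely for illustration):

        if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
            mapping_can_writeback(file->f_mapping)) {
                loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
                loff_t lend = lstart + HPAGE_PMD_SIZE - 1;

                filemap_write_and_wait_range(file->f_mapping, lstart, lend);
                triggered_wb = true;
                fput(file);
                goto retry;
        }

so the retry happened at most once, and only after writeback was actually issued.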
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 19:45 ` Nico Pache
@ 2026-02-12 20:06 ` Zi Yan
2026-02-12 20:13 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-02-12 20:06 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
<snip>
>>
>>> - if (vma->vm_flags & VM_WRITE)
>>> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
>>
>> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
>
> Yes, upon further investigation this does seem to slightly change the behavior.
I did not notice it when I was reviewing it. ;)
>
> pte_mkdirty() is now being called unconditionally, regardless of the VM_WRITE
> flag. I noticed other callers in the kernel doing this too.
>
> Is it ok to leave the pte_mkdirty() or should I go back to using
> pte_mkwrite with the conditional guarding both mkwrite and mkdirty?
IMHO, it is better to use the conditional guarding way.
We reach here when userspace reads an address (VM_WRITE is not set)
and no zero page is used. Using maybe_mkwrite(pte_mkdirty(entry), vma)
means we will get a dirty PTE pointing to the allocated page even though
the user only reads from it.
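For clarity, the two shapes being compared are (both taken from the context
quoted above, reproduced here only to make the difference explicit):

        /* old: only a writable VMA gets a writable and dirty PTE */
        if (vma->vm_flags & VM_WRITE)
                entry = pte_mkwrite(pte_mkdirty(entry), vma);

        /* helper version: PTE is always dirtied, made writable only if VM_WRITE is set */
        entry = maybe_mkwrite(pte_mkdirty(entry), vma);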
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
2026-02-12 20:06 ` Zi Yan
@ 2026-02-12 20:13 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-12 20:13 UTC (permalink / raw)
To: Zi Yan, Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, dev.jain, gourry, hannes, hughd, jackmanb,
jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, zokeefe
On 2/12/26 21:06, Zi Yan wrote:
> <snip>
>
>>>
>>>
>>> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
>>
>> Yes, upon further investigation this does seem to slightly change the behavior.
>
> I did not notice it when I was reviewing it. ;)
>
>>
>> pte_mkdirty() is now being called unconditionally, regardless of the VM_WRITE
>> flag. I noticed other callers in the kernel doing this too.
>>
>> Is it ok to leave the pte_mkdirty() or should I go back to using
>> pte_mkwrite with the conditional guarding both mkwrite and mkdirty?
>
>
> IMHO, it is better to use the conditional guarding way.
> We reach here when userspace reads an address (VM_WRITE is not set)
> and no zero page is used. Using maybe_mkwrite(pte_mkdirty(entry), vma)
> means we will get a dirty PTE pointing to the allocated page even though
> the user only reads from it.
In general, it's best to not perform any such changes as part of a
bigger patch. :)
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 20:03 ` David Hildenbrand (Arm)
@ 2026-02-12 20:26 ` Nico Pache
2026-02-14 8:24 ` Lance Yang
0 siblings, 1 reply; 32+ messages in thread
From: Nico Pache @ 2026-02-12 20:26 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, dev.jain, gourry, hannes, hughd, jackmanb,
jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Thu, Feb 12, 2026 at 1:04 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/12/26 03:25, Nico Pache wrote:
> > The khugepaged daemon and madvise_collapse have two different
> > implementations that do almost the same thing.
> >
> > Create collapse_single_pmd to increase code reuse and create an entry
> > point to these two users.
> >
> > Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> > collapse_single_pmd function. This introduces a minor behavioral change
> > that is most likely an undiscovered bug. The current implementation of
> > khugepaged tests collapse_test_exit_or_disable before calling
> > collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
> > case. By unifying these two callers madvise_collapse now also performs
> > this check. We also modify the return value to be SCAN_ANY_PROCESS which
> > properly indicates that this process is no longer valid to operate on.
> >
> > We also guard the khugepaged_pages_collapsed variable to ensure it's only
> > incremented for khugepaged.
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > mm/khugepaged.c | 121 ++++++++++++++++++++++++++----------------------
> > 1 file changed, 66 insertions(+), 55 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index fa41480f6948..0839a781bedd 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -2395,6 +2395,62 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm, unsigned long a
> > return result;
> > }
> >
> > +/*
> > + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> > + * the results.
> > + */
> > +static enum scan_result collapse_single_pmd(unsigned long addr,
> > + struct vm_area_struct *vma, bool *mmap_locked,
> > + struct collapse_control *cc)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + enum scan_result result;
> > + struct file *file;
> > + pgoff_t pgoff;
> > +
> > + if (vma_is_anonymous(vma)) {
> > + result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
> > + goto end;
> > + }
> > +
> > + file = get_file(vma->vm_file);
> > + pgoff = linear_page_index(vma, addr);
> > +
> > + mmap_read_unlock(mm);
> > + *mmap_locked = false;
> > + result = collapse_scan_file(mm, addr, file, pgoff, cc);
> > +
> > + if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
> > + mapping_can_writeback(file->f_mapping)) {
> > + const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> > + const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> > +
> > + filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> > + }
> > + fput(file);
> > +
> > + if (result != SCAN_PTE_MAPPED_HUGEPAGE)
> > + goto end;
> > +
> > + mmap_read_lock(mm);
> > + *mmap_locked = true;
> > + if (collapse_test_exit_or_disable(mm)) {
> > + mmap_read_unlock(mm);
> > + *mmap_locked = false;
> > + return SCAN_ANY_PROCESS;
> > + }
> > + result = try_collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
> > + if (result == SCAN_PMD_MAPPED)
> > + result = SCAN_SUCCEED;
> > + mmap_read_unlock(mm);
> > + *mmap_locked = false;
> > +
> > +end:
> > + if (cc->is_khugepaged && result == SCAN_SUCCEED)
> > + ++khugepaged_pages_collapsed;
> > + return result;
> > +}
> > +
> > static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *result,
> > struct collapse_control *cc)
> > __releases(&khugepaged_mm_lock)
> > @@ -2466,34 +2522,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *
> > VM_BUG_ON(khugepaged_scan.address < hstart ||
> > khugepaged_scan.address + HPAGE_PMD_SIZE >
> > hend);
> > - if (!vma_is_anonymous(vma)) {
> > - struct file *file = get_file(vma->vm_file);
> > - pgoff_t pgoff = linear_page_index(vma,
> > - khugepaged_scan.address);
> > -
> > - mmap_read_unlock(mm);
> > - mmap_locked = false;
> > - *result = collapse_scan_file(mm,
> > - khugepaged_scan.address, file, pgoff, cc);
> > - fput(file);
> > - if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> > - mmap_read_lock(mm);
> > - if (collapse_test_exit_or_disable(mm))
> > - goto breakouterloop;
> > - *result = try_collapse_pte_mapped_thp(mm,
> > - khugepaged_scan.address, false);
> > - if (*result == SCAN_PMD_MAPPED)
> > - *result = SCAN_SUCCEED;
> > - mmap_read_unlock(mm);
> > - }
> > - } else {
> > - *result = collapse_scan_pmd(mm, vma,
> > - khugepaged_scan.address, &mmap_locked, cc);
> > - }
> > -
> > - if (*result == SCAN_SUCCEED)
> > - ++khugepaged_pages_collapsed;
> >
> > + *result = collapse_single_pmd(khugepaged_scan.address,
> > + vma, &mmap_locked, cc);
> > /* move to next address */
> > khugepaged_scan.address += HPAGE_PMD_SIZE;
> > progress += HPAGE_PMD_NR;
> > @@ -2799,6 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > cond_resched();
> > mmap_read_lock(mm);
> > mmap_locked = true;
> > + *lock_dropped = true;
> > result = hugepage_vma_revalidate(mm, addr, false, &vma,
> > cc);
> > if (result != SCAN_SUCCEED) {
> > @@ -2809,46 +2841,25 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
> > }
> > mmap_assert_locked(mm);
> > - if (!vma_is_anonymous(vma)) {
> > - struct file *file = get_file(vma->vm_file);
> > - pgoff_t pgoff = linear_page_index(vma, addr);
> >
> > - mmap_read_unlock(mm);
> > - mmap_locked = false;
> > - *lock_dropped = true;
> > - result = collapse_scan_file(mm, addr, file, pgoff, cc);
> > -
> > - if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
> > - mapping_can_writeback(file->f_mapping)) {
> > - loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> > - loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> > + result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
> >
> > - filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> > - triggered_wb = true;
> > - fput(file);
> > - goto retry;
> > - }
> > - fput(file);
> > - } else {
> > - result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
> > - }
> > if (!mmap_locked)
> > *lock_dropped = true;
> >
> > -handle_result:
> > + if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb) {
> > + triggered_wb = true;
> > + goto retry;
> > + }
>
> Having triggered_wb set where writeback is not actually triggered is
> suboptimal.
It took me a second to figure out what you were referring to, but I
see it now: if we return SCAN_PAGE_DIRTY_OR_WRITEBACK but
mapping_can_writeback() fails, it still retries.
An appropriate solution might be to modify the return value when
mapping_can_writeback() fails, i.e.:
if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK) {
        if (mapping_can_writeback(file->f_mapping)) {
                const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
                const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;

                filemap_write_and_wait_range(file->f_mapping, lstart, lend);
        } else {
                result = SCAN_(SOMETHING?);
        }
}
fput(file);
We don't have an enum value that fits this description, but we want one
that lets the caller continue.
Cheers!
-- Nico
>
> And you can tell that by realizing that you would now retry once even
> though the mapping does not support writeback (
> mapping_can_writeback(file->f_mapping)) and no writeback actually happened.
>
> Further, we would also try to call filemap_write_and_wait_range() now
> twice instead of only during the first round.
>
> --
> Cheers,
>
> David
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
` (2 preceding siblings ...)
2026-02-12 19:45 ` David Hildenbrand (Arm)
@ 2026-02-13 3:51 ` Barry Song
2026-02-14 7:24 ` Lance Yang
2026-02-20 10:38 ` Dev Jain
5 siblings, 0 replies; 32+ messages in thread
From: Barry Song @ 2026-02-13 3:51 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baolin.wang, byungchul, catalin.marinas, cl, corbet,
dave.hansen, david, dev.jain, gourry, hannes, hughd, jackmanb,
jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On Thu, Feb 12, 2026 at 10:19 AM Nico Pache <npache@redhat.com> wrote:
>
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
> use this check, so let's create a simple helper function to keep the code
> clean and readable.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
LGTM,
Reviewed-by: Barry Song <baohua@kernel.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
` (3 preceding siblings ...)
2026-02-13 3:51 ` Barry Song
@ 2026-02-14 7:24 ` Lance Yang
2026-02-20 10:38 ` Dev Jain
5 siblings, 0 replies; 32+ messages in thread
From: Lance Yang @ 2026-02-14 7:24 UTC (permalink / raw)
To: Nico Pache
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, linux-kernel,
baolin.wang, byungchul, catalin.marinas, cl, corbet, dave.hansen,
david, dev.jain, gourry, linux-mm, hannes, hughd, jackmanb, jack,
jannh, jglisse, joshua.hahnjy, kas, Liam.Howlett,
lorenzo.stoakes, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On 2026/2/12 10:18, Nico Pache wrote:
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
> use this check, so let's create a simple helper function to keep the code
> clean and readable.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
>
> arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
> ---
Looks good to me.
Reviewed-by: Lance Yang <lance.yang@linux.dev>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
2026-02-12 20:26 ` Nico Pache
@ 2026-02-14 8:24 ` Lance Yang
0 siblings, 0 replies; 32+ messages in thread
From: Lance Yang @ 2026-02-14 8:24 UTC (permalink / raw)
To: Nico Pache, David Hildenbrand (Arm)
Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
corbet, dave.hansen, dev.jain, gourry, hannes, hughd, jackmanb,
jack, jannh, jglisse, joshua.hahnjy, kas, Liam.Howlett,
lorenzo.stoakes, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642,
vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
ying.huang, ziy, zokeefe
On 2026/2/13 04:26, Nico Pache wrote:
> On Thu, Feb 12, 2026 at 1:04 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/12/26 03:25, Nico Pache wrote:
>>> The khugepaged daemon and madvise_collapse have two different
>>> implementations that do almost the same thing.
>>>
>>> Create collapse_single_pmd to increase code reuse and create an entry
>>> point to these two users.
>>>
>>> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
>>> collapse_single_pmd function. This introduces a minor behavioral change
>>> that is most likely an undiscovered bug. The current implementation of
>>> khugepaged tests collapse_test_exit_or_disable before calling
>>> collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
>>> case. By unifying these two callers madvise_collapse now also performs
>>> this check. We also modify the return value to be SCAN_ANY_PROCESS which
>>> properly indicates that this process is no longer valid to operate on.
>>>
>>> We also guard the khugepaged_pages_collapsed variable to ensure it's only
>>> incremented for khugepaged.
>>>
>>> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> Signed-off-by: Nico Pache <npache@redhat.com>
>>> ---
>>> mm/khugepaged.c | 121 ++++++++++++++++++++++++++----------------------
>>> 1 file changed, 66 insertions(+), 55 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index fa41480f6948..0839a781bedd 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2395,6 +2395,62 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm, unsigned long a
>>> return result;
>>> }
>>>
>>> +/*
>>> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
>>> + * the results.
>>> + */
>>> +static enum scan_result collapse_single_pmd(unsigned long addr,
>>> + struct vm_area_struct *vma, bool *mmap_locked,
>>> + struct collapse_control *cc)
>>> +{
>>> + struct mm_struct *mm = vma->vm_mm;
>>> + enum scan_result result;
>>> + struct file *file;
>>> + pgoff_t pgoff;
>>> +
>>> + if (vma_is_anonymous(vma)) {
>>> + result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
>>> + goto end;
>>> + }
>>> +
>>> + file = get_file(vma->vm_file);
>>> + pgoff = linear_page_index(vma, addr);
>>> +
>>> + mmap_read_unlock(mm);
>>> + *mmap_locked = false;
>>> + result = collapse_scan_file(mm, addr, file, pgoff, cc);
>>> +
>>> + if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
>>> + mapping_can_writeback(file->f_mapping)) {
>>> + const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
>>> + const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
>>> +
>>> + filemap_write_and_wait_range(file->f_mapping, lstart, lend);
>>> + }
>>> + fput(file);
>>> +
>>> + if (result != SCAN_PTE_MAPPED_HUGEPAGE)
>>> + goto end;
>>> +
>>> + mmap_read_lock(mm);
>>> + *mmap_locked = true;
>>> + if (collapse_test_exit_or_disable(mm)) {
>>> + mmap_read_unlock(mm);
>>> + *mmap_locked = false;
>>> + return SCAN_ANY_PROCESS;
>>> + }
>>> + result = try_collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
>>> + if (result == SCAN_PMD_MAPPED)
>>> + result = SCAN_SUCCEED;
>>> + mmap_read_unlock(mm);
>>> + *mmap_locked = false;
>>> +
>>> +end:
>>> + if (cc->is_khugepaged && result == SCAN_SUCCEED)
>>> + ++khugepaged_pages_collapsed;
>>> + return result;
>>> +}
>>> +
>>> static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *result,
>>> struct collapse_control *cc)
>>> __releases(&khugepaged_mm_lock)
>>> @@ -2466,34 +2522,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, enum scan_result *
>>> VM_BUG_ON(khugepaged_scan.address < hstart ||
>>> khugepaged_scan.address + HPAGE_PMD_SIZE >
>>> hend);
>>> - if (!vma_is_anonymous(vma)) {
>>> - struct file *file = get_file(vma->vm_file);
>>> - pgoff_t pgoff = linear_page_index(vma,
>>> - khugepaged_scan.address);
>>> -
>>> - mmap_read_unlock(mm);
>>> - mmap_locked = false;
>>> - *result = collapse_scan_file(mm,
>>> - khugepaged_scan.address, file, pgoff, cc);
>>> - fput(file);
>>> - if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
>>> - mmap_read_lock(mm);
>>> - if (collapse_test_exit_or_disable(mm))
>>> - goto breakouterloop;
>>> - *result = try_collapse_pte_mapped_thp(mm,
>>> - khugepaged_scan.address, false);
>>> - if (*result == SCAN_PMD_MAPPED)
>>> - *result = SCAN_SUCCEED;
>>> - mmap_read_unlock(mm);
>>> - }
>>> - } else {
>>> - *result = collapse_scan_pmd(mm, vma,
>>> - khugepaged_scan.address, &mmap_locked, cc);
>>> - }
>>> -
>>> - if (*result == SCAN_SUCCEED)
>>> - ++khugepaged_pages_collapsed;
>>>
>>> + *result = collapse_single_pmd(khugepaged_scan.address,
>>> + vma, &mmap_locked, cc);
>>> /* move to next address */
>>> khugepaged_scan.address += HPAGE_PMD_SIZE;
>>> progress += HPAGE_PMD_NR;
>>> @@ -2799,6 +2830,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>> cond_resched();
>>> mmap_read_lock(mm);
>>> mmap_locked = true;
>>> + *lock_dropped = true;
>>> result = hugepage_vma_revalidate(mm, addr, false, &vma,
>>> cc);
>>> if (result != SCAN_SUCCEED) {
>>> @@ -2809,46 +2841,25 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>> hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>>> }
>>> mmap_assert_locked(mm);
>>> - if (!vma_is_anonymous(vma)) {
>>> - struct file *file = get_file(vma->vm_file);
>>> - pgoff_t pgoff = linear_page_index(vma, addr);
>>>
>>> - mmap_read_unlock(mm);
>>> - mmap_locked = false;
>>> - *lock_dropped = true;
>>> - result = collapse_scan_file(mm, addr, file, pgoff, cc);
>>> -
>>> - if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
>>> - mapping_can_writeback(file->f_mapping)) {
>>> - loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
>>> - loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
>>> + result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
>>>
>>> - filemap_write_and_wait_range(file->f_mapping, lstart, lend);
>>> - triggered_wb = true;
>>> - fput(file);
>>> - goto retry;
>>> - }
>>> - fput(file);
>>> - } else {
>>> - result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
>>> - }
>>> if (!mmap_locked)
>>> *lock_dropped = true;
>>>
>>> -handle_result:
>>> + if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb) {
>>> + triggered_wb = true;
>>> + goto retry;
>>> + }
>>
>> Having triggered_wb set where writeback is not actually triggered is
>> suboptimal.
Good catch!
>
> It took me a second to figure out what you were referring to, but I
> see it now: if we return SCAN_PAGE_DIRTY_OR_WRITEBACK but
> mapping_can_writeback() fails, it still retries.
>
> An appropriate solution might be to modify the return value when
> mapping_can_writeback() fails, i.e.:
Yep, we're on the right track. IIRC, David's concern has two parts:
1) Avoid retry when writeback wasn't actually triggered
(mapping_can_writeback() fails)
2) Avoid calling filemap_write_and_wait_range() twice on retry
The proposed approach below addresses #1, but we still need to tackle #2.
The issue is that on the retry, collapse_single_pmd() doesn't know that
writeback was already performed in the previous round, so it could call
filemap_write_and_wait_range() again if the page is still dirty.
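One minimal way to pass that information down (purely illustrative; the extra
wb_done flag and the changed collapse_single_pmd() signature are hypothetical,
not part of the posted patch) would be to hand madvise_collapse's triggered_wb
state to the helper:

        /* hypothetical: collapse_single_pmd(addr, vma, mmap_locked, cc, wb_done) */
        if (!cc->is_khugepaged && !wb_done &&
            result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
            mapping_can_writeback(file->f_mapping)) {
                const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
                const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;

                filemap_write_and_wait_range(file->f_mapping, lstart, lend);
        }

so filemap_write_and_wait_range() would only run during the first round, as in
the old code.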
>
> if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK) {
>         if (mapping_can_writeback(file->f_mapping)) {
>                 const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
>                 const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
>
>                 filemap_write_and_wait_range(file->f_mapping, lstart, lend);
>         } else {
>                 result = SCAN_(SOMETHING?);
>         }
> }
> fput(file);
>
> We don't have an enum value that fits this description, but we want one
> that lets the caller continue.
>
> Cheers!
> -- Nico
>
>>
>> And you can tell that by realizing that you would now retry once even
>> though the mapping does not support writeback (
>> mapping_can_writeback(file->f_mapping)) and no writeback actually happened.
>>
>> Further, we would also try to call filemap_write_and_wait_range() now
>> twice instead of only during the first round.
Right. Let's avoid calling it twice.
Cheers,
Lance
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
` (4 preceding siblings ...)
2026-02-14 7:24 ` Lance Yang
@ 2026-02-20 10:38 ` Dev Jain
2026-02-20 10:42 ` David Hildenbrand (Arm)
5 siblings, 1 reply; 32+ messages in thread
From: Dev Jain @ 2026-02-20 10:38 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On 12/02/26 7:48 am, Nico Pache wrote:
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
>> use this check, so let's create a simple helper function to keep the code
> clean and readable.
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
>
> arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
> ---
Thanks, this is useful.
In drivers/dax/device.c and fs/dax.c, we have an order == PMD_ORDER check,
not HPAGE_PMD_ORDER. Therefore,
1) Do we have a bug here, in that these codepaths are fault
handlers and therefore should be using HPAGE_PMD_ORDER, since the
definition of this macro zeroes out on !CONFIG_PGTABLE_HAS_HUGE_LEAVES?
2) If the distinction between HPAGE_PMD_ORDER and PMD_ORDER is real,
then shouldn't the helper be named is_hpage_pmd_order?
> include/linux/huge_mm.h | 5 +++++
> mm/huge_memory.c | 2 +-
> mm/khugepaged.c | 4 ++--
> mm/mempolicy.c | 2 +-
> mm/page_alloc.c | 2 +-
> 5 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a4d9f964dfde..bd7f0e1d8094 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -771,6 +771,11 @@ static inline bool pmd_is_huge(pmd_t pmd)
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> +static inline bool is_pmd_order(unsigned int order)
> +{
> + return order == HPAGE_PMD_ORDER;
> +}
> +
> static inline int split_folio_to_list_to_order(struct folio *folio,
> struct list_head *list, int new_order)
> {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 44ff8a648afd..5eae85818635 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4097,7 +4097,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> i_mmap_unlock_read(mapping);
> out:
> xas_destroy(&xas);
> - if (old_order == HPAGE_PMD_ORDER)
> + if (is_pmd_order(old_order))
> count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> count_mthp_stat(old_order, !ret ? MTHP_STAT_SPLIT : MTHP_STAT_SPLIT_FAILED);
> return ret;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index fa1e57fd2c46..c362b3b2e08a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2000,7 +2000,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> * we locked the first folio, then a THP might be there already.
> * This will be discovered on the first iteration.
> */
> - if (folio_order(folio) == HPAGE_PMD_ORDER &&
> + if (is_pmd_order(folio_order(folio)) &&
> folio->index == start) {
> /* Maybe PMD-mapped */
> result = SCAN_PTE_MAPPED_HUGEPAGE;
> @@ -2329,7 +2329,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
> continue;
> }
>
> - if (folio_order(folio) == HPAGE_PMD_ORDER &&
> + if (is_pmd_order(folio_order(folio)) &&
> folio->index == start) {
> /* Maybe PMD-mapped */
> result = SCAN_PTE_MAPPED_HUGEPAGE;
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index dbd48502ac24..3802e52b01fc 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2450,7 +2450,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
>
> if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> /* filter "hugepage" allocation, unless from alloc_pages() */
> - order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) {
> + is_pmd_order(order) && ilx != NO_INTERLEAVE_INDEX) {
> /*
> * For hugepage allocation and non-interleave policy which
> * allows the current node (or other explicitly preferred
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c2e96ac35636..2acf22f97ae5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -719,7 +719,7 @@ static inline bool pcp_allowed_order(unsigned int order)
> if (order <= PAGE_ALLOC_COSTLY_ORDER)
> return true;
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (order == HPAGE_PMD_ORDER)
> + if (is_pmd_order(order))
> return true;
> #endif
> return false;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper
2026-02-20 10:38 ` Dev Jain
@ 2026-02-20 10:42 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-20 10:42 UTC (permalink / raw)
To: Dev Jain, Nico Pache, linux-kernel, linux-mm
Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
byungchul, catalin.marinas, cl, corbet, dave.hansen, gourry,
hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy,
kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe
On 2/20/26 11:38, Dev Jain wrote:
>
> On 12/02/26 7:48 am, Nico Pache wrote:
>> In order to add mTHP support to khugepaged, we will often be checking if a
>> given order is (or is not) a PMD order. Some places in the kernel already
>> use this check, so let's create a simple helper function to keep the code
>> clean and readable.
>>
>> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Nico Pache <npache@redhat.com>
>>
>> arches# pick b3ace8be204f # mm/khugepaged: rename hpage_collapse_* to collapse_*
>> ---
>
> Thanks, this is useful.
>
> In drivers/dax/device.c and fs/dax.c, we have an order == PMD_ORDER check,
> not HPAGE_PMD_ORDER. Therefore,
>
> 1) Do we have a bug here, in that these codepaths are fault
> handlers and therefore should be using HPAGE_PMD_ORDER, since the
> definition of this macro zeroes out on !CONFIG_PGTABLE_HAS_HUGE_LEAVES?
>
> 2) If the distinction between HPAGE_PMD_ORDER and PMD_ORDER is real,
> then shouldn't the helper be named is_hpage_pmd_order?
HPAGE_PMD_ORDER is a weird, stupid legacy leftover IIUC.
The only thing it checks is that you are not using HPAGE_PMD_ORDER in
code that might be compiled without THP support:
#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
#define HPAGE_PMD_SHIFT PMD_SHIFT
...
#else
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
...
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2026-02-20 10:43 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-12 2:18 [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
2026-02-12 2:18 ` [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
2026-02-12 14:38 ` Pedro Falcato
2026-02-12 15:55 ` Joshua Hahn
2026-02-12 19:33 ` Nico Pache
2026-02-12 16:09 ` Zi Yan
2026-02-12 19:45 ` Nico Pache
2026-02-12 20:06 ` Zi Yan
2026-02-12 20:13 ` David Hildenbrand (Arm)
2026-02-12 2:18 ` [PATCH mm-unstable v1 2/5] mm: introduce is_pmd_order helper Nico Pache
2026-02-12 14:40 ` Pedro Falcato
2026-02-12 16:11 ` Zi Yan
2026-02-12 19:45 ` David Hildenbrand (Arm)
2026-02-13 3:51 ` Barry Song
2026-02-14 7:24 ` Lance Yang
2026-02-20 10:38 ` Dev Jain
2026-02-20 10:42 ` David Hildenbrand (Arm)
2026-02-12 2:18 ` [PATCH mm-unstable v1 3/5] mm/khugepaged: define COLLAPSE_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
2026-02-12 6:56 ` Vernon Yang
2026-02-12 14:45 ` Pedro Falcato
2026-02-12 16:21 ` Zi Yan
2026-02-12 19:51 ` David Hildenbrand (Arm)
2026-02-12 2:23 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
2026-02-12 2:23 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-02-12 19:52 ` [PATCH mm-unstable v1 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* David Hildenbrand (Arm)
2026-02-12 2:25 ` [PATCH mm-unstable v1 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-02-12 15:33 ` Pedro Falcato
2026-02-12 17:34 ` Zi Yan
2026-02-12 20:03 ` David Hildenbrand (Arm)
2026-02-12 20:26 ` Nico Pache
2026-02-14 8:24 ` Lance Yang
2026-02-12 2:26 ` [PATCH mm-unstable v1 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache