linux-mm.kvack.org archive mirror
* [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization
@ 2026-04-15 11:14 Muchun Song
  2026-04-15 11:14 ` [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

This patch series fixes a number of issues related to vmemmap optimization
for compound pages (e.g., DAX), including incorrect page accounting and
missing architecture-specific initialization steps.

The series addresses these issues through the following steps:
- Patch 1: Fixes a vmemmap accounting underflow in error paths.
- Patches 2-3: Fix DAX vmemmap accounting by plumbing the pgmap
  argument through memory deactivation paths.
- Patches 4-5: Fix missing architecture-specific page table syncs by
  plumbing the pgmap argument through the vmemmap_populate() APIs.
- Patch 6: Fixes pageblock migratetype initialization for large
  compound ZONE_DEVICE pages.

Changelog:
v1 -> v2:
- Moved vmemmap accounting to populate_section_memmap() /
  depopulate_section_memmap() to fix accounting underflow,
  suggested by Mike.
- Replaced VM_BUG_ON with VM_WARN_ON_ONCE as requested by David.
- Reduced frequency of calling cond_resched() in pageblock_migratetype_init_range()
  based on feedback from David and Mike.
- Extracted all bugfix patches from a larger patchset
  (https://lore.kernel.org/linux-mm/20260405125240.2558577-1-songmuchun@bytedance.com/)
  into this separate patchset, suggested by David.

Muchun Song (6):
  mm/sparse-vmemmap: Fix vmemmap accounting underflow
  mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
  mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate()
  mm/sparse-vmemmap: Fix missing architecture-specific page table sync
  mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages

 arch/arm64/mm/mmu.c                        | 11 +--
 arch/loongarch/mm/init.c                   | 12 +--
 arch/powerpc/include/asm/book3s/64/radix.h |  9 +--
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 18 +++--
 arch/powerpc/mm/init_64.c                  |  4 +-
 arch/powerpc/mm/mem.c                      |  5 +-
 arch/riscv/mm/init.c                       |  9 ++-
 arch/s390/mm/init.c                        |  5 +-
 arch/s390/mm/vmem.c                        |  2 +-
 arch/sparc/mm/init_64.c                    |  5 +-
 arch/x86/mm/init_64.c                      | 13 +--
 include/linux/memory_hotplug.h             |  8 +-
 include/linux/mm.h                         |  8 +-
 mm/hugetlb_vmemmap.c                       |  4 +-
 mm/memory_hotplug.c                        | 12 +--
 mm/memremap.c                              |  4 +-
 mm/mm_init.c                               | 42 ++++++----
 mm/sparse-vmemmap.c                        | 93 ++++++++++++++--------
 18 files changed, 160 insertions(+), 104 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow
  2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
@ 2026-04-15 11:14 ` Muchun Song
  2026-04-15 11:26   ` Muchun Song
  2026-04-15 15:53   ` Mike Rapoport
  2026-04-15 11:14 ` [PATCH v2 2/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

In section_activate(), if populate_section_memmap() fails, the error
handling path calls section_deactivate() to roll back the state. This
causes a vmemmap accounting imbalance.

Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
memmap pages are accounted for only after populate_section_memmap()
succeeds. However, the failure path unconditionally calls
section_deactivate(), which decreases the vmemmap count. Consequently,
a failure in populate_section_memmap() leads to an accounting underflow,
incorrectly reducing the system's tracked vmemmap usage.

Fix this more thoroughly by moving all accounting calls into the
lower-level functions that actually perform the vmemmap allocation and freeing:

  - populate_section_memmap() accounts for newly allocated vmemmap pages
  - depopulate_section_memmap() unaccounts when vmemmap is freed
  - free_map_bootmem() handles early bootmem section accounting

This ensures proper accounting in all code paths, including error
handling and early section cases.

Fixes: c3576889d87b ("mm: fix accounting of memmap pages")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/sparse-vmemmap.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 6eadb9d116e4..a7b11248b989 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -656,7 +656,12 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
 {
-	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
+						      pgmap);
+
+	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+
+	return page;
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
@@ -665,13 +670,17 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
+	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
 	vmemmap_free(start, end, altmap);
 }
+
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
+	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
+						  PAGE_SIZE)));
 	vmemmap_free(start, end, NULL);
 }
 
@@ -774,14 +783,10 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * The memmap of early sections is always fully populated. See
 	 * section_activate() and pfn_valid() .
 	 */
-	if (!section_is_early) {
-		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+	if (!section_is_early)
 		depopulate_section_memmap(pfn, nr_pages, altmap);
-	} else if (memmap) {
-		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
-							  PAGE_SIZE)));
+	else if (memmap)
 		free_map_bootmem(memmap);
-	}
 
 	if (empty)
 		ms->section_mem_map = (unsigned long)NULL;
@@ -826,7 +831,6 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 		section_deactivate(pfn, nr_pages, altmap);
 		return ERR_PTR(-ENOMEM);
 	}
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
 
 	return memmap;
 }
-- 
2.20.1




* [PATCH v2 2/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
  2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
  2026-04-15 11:14 ` [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
@ 2026-04-15 11:14 ` Muchun Song
  2026-04-15 15:55   ` Mike Rapoport
  2026-04-15 11:14 ` [PATCH v2 3/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

Currently, the memory hot-remove call chain -- arch_remove_memory(),
__remove_pages(), sparse_remove_section() and section_deactivate() --
does not carry the struct dev_pagemap pointer. This prevents the lower
levels from knowing whether the section was originally populated with
vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).

Without this information, we cannot call vmemmap_can_optimize() to
determine if the vmemmap pages were optimized. As a result, the vmemmap
page accounting during teardown will mistakenly assume a non-optimized
allocation, leading to incorrect memmap statistics.

To lay the groundwork for fixing the vmemmap page accounting, we need
to pass the @pgmap pointer down to the deactivation paths. Plumb the
@pgmap argument through the APIs of arch_remove_memory(),
__remove_pages() and sparse_remove_section(), mirroring the
corresponding *_activate() paths.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 arch/arm64/mm/mmu.c            |  5 +++--
 arch/loongarch/mm/init.c       |  5 +++--
 arch/powerpc/mm/mem.c          |  5 +++--
 arch/riscv/mm/init.c           |  5 +++--
 arch/s390/mm/init.c            |  5 +++--
 arch/x86/mm/init_64.c          |  5 +++--
 include/linux/memory_hotplug.h |  8 +++++---
 mm/memory_hotplug.c            | 12 ++++++------
 mm/memremap.c                  |  4 ++--
 mm/sparse-vmemmap.c            | 17 +++++++++--------
 10 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dd85e093ffdb..e5a42b7a0160 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -2024,12 +2024,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
 
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index 00f3822b6e47..c9c57f08fa2c 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -86,7 +86,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -95,7 +96,7 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	/* With altmap the first mapped page is offset from @start */
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 }
 #endif
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 648d0c5602ec..4c1afab91996 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -158,12 +158,13 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	arch_remove_linear_mapping(start, size);
 }
 #endif
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index decd7df40fa4..b0092fb842a3 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1717,9 +1717,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *param
 	return ret;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
-	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, pgmap);
 	remove_linear_mapping(start, size);
 	flush_tlb_all();
 }
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 1f72efc2a579..11a689423440 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -276,12 +276,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	vmem_remove_mapping(start, size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..77b889b71cf3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1288,12 +1288,13 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	remove_pagetable(start, end, true, NULL);
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      struct dev_pagemap *pgmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 815e908c4135..7c9d66729c60 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -135,9 +135,10 @@ static inline bool movable_node_is_enabled(void)
 	return movable_node_enabled;
 }
 
-extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
+extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap);
 extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap);
+			   struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -307,7 +308,8 @@ extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-				  struct vmem_altmap *altmap);
+				  struct vmem_altmap *altmap,
+				  struct dev_pagemap *pgmap);
 extern struct zone *zone_for_pfn_range(enum mmop online_type,
 		int nid, struct memory_group *group, unsigned long start_pfn,
 		unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2a943ec57c85..6a9e2dc751d2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -583,7 +583,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
  * calling offline_pages().
  */
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
-		    struct vmem_altmap *altmap)
+		    struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	const unsigned long end_pfn = pfn + nr_pages;
 	unsigned long cur_nr_pages;
@@ -598,7 +598,7 @@ void __remove_pages(unsigned long pfn, unsigned long nr_pages,
 		/* Select all remaining pages up to the next section boundary */
 		cur_nr_pages = min(end_pfn - pfn,
 				   SECTION_ALIGN_UP(pfn + 1) - pfn);
-		sparse_remove_section(pfn, cur_nr_pages, altmap);
+		sparse_remove_section(pfn, cur_nr_pages, altmap, pgmap);
 	}
 }
 
@@ -1425,7 +1425,7 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
 
 		remove_memory_block_devices(cur_start, memblock_size);
 
-		arch_remove_memory(cur_start, memblock_size, altmap);
+		arch_remove_memory(cur_start, memblock_size, altmap, NULL);
 
 		/* Verify that all vmemmap pages have actually been freed. */
 		WARN(altmap->alloc, "Altmap not fully unmapped");
@@ -1468,7 +1468,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		ret = create_memory_block_devices(cur_start, memblock_size, nid,
 						  params.altmap, group);
 		if (ret) {
-			arch_remove_memory(cur_start, memblock_size, NULL);
+			arch_remove_memory(cur_start, memblock_size, NULL, NULL);
 			kfree(params.altmap);
 			goto out;
 		}
@@ -1554,7 +1554,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		/* create memory block devices after memory was added */
 		ret = create_memory_block_devices(start, size, nid, NULL, group);
 		if (ret) {
-			arch_remove_memory(start, size, params.altmap);
+			arch_remove_memory(start, size, params.altmap, NULL);
 			goto error;
 		}
 	}
@@ -2266,7 +2266,7 @@ static int try_remove_memory(u64 start, u64 size)
 		 * No altmaps present, do the removal directly
 		 */
 		remove_memory_block_devices(start, size);
-		arch_remove_memory(start, size, NULL);
+		arch_remove_memory(start, size, NULL, NULL);
 	} else {
 		/* all memblocks in the range have altmaps */
 		remove_memory_blocks_and_altmaps(start, size);
diff --git a/mm/memremap.c b/mm/memremap.c
index ac7be07e3361..c45b90f334ea 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -97,10 +97,10 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
 				   PHYS_PFN(range_len(range)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		__remove_pages(PHYS_PFN(range->start),
-			       PHYS_PFN(range_len(range)), NULL);
+			       PHYS_PFN(range_len(range)), NULL, pgmap);
 	} else {
 		arch_remove_memory(range->start, range_len(range),
-				pgmap_altmap(pgmap));
+				pgmap_altmap(pgmap), pgmap);
 		kasan_remove_zero_shadow(__va(range->start), range_len(range));
 	}
 	mem_hotplug_done();
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a7b11248b989..40290fbc1db4 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -665,7 +665,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
@@ -674,7 +674,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	vmemmap_free(start, end, altmap);
 }
 
-static void free_map_bootmem(struct page *memmap)
+static void free_map_bootmem(struct page *memmap, struct vmem_altmap *altmap,
+		struct dev_pagemap *pgmap)
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
@@ -746,7 +747,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
  * usage map, but still need to free the vmemmap range.
  */
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 	bool section_is_early = early_section(ms);
@@ -784,9 +785,9 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 	 * section_activate() and pfn_valid() .
 	 */
 	if (!section_is_early)
-		depopulate_section_memmap(pfn, nr_pages, altmap);
+		depopulate_section_memmap(pfn, nr_pages, altmap, pgmap);
 	else if (memmap)
-		free_map_bootmem(memmap);
+		free_map_bootmem(memmap, altmap, pgmap);
 
 	if (empty)
 		ms->section_mem_map = (unsigned long)NULL;
@@ -828,7 +829,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 
 	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
 	if (!memmap) {
-		section_deactivate(pfn, nr_pages, altmap);
+		section_deactivate(pfn, nr_pages, altmap, pgmap);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -889,13 +890,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 }
 
 void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap)
+			   struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	section_deactivate(pfn, nr_pages, altmap);
+	section_deactivate(pfn, nr_pages, altmap, pgmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1




* [PATCH v2 3/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
  2026-04-15 11:14 ` [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
  2026-04-15 11:14 ` [PATCH v2 2/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
@ 2026-04-15 11:14 ` Muchun Song
  2026-04-15 15:58   ` Mike Rapoport
  2026-04-15 11:14 ` [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate() Muchun Song
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

When vmemmap optimization is enabled for DAX, the nr_memmap_pages
counter in /proc/vmstat is incorrect. The current code always accounts
for the full, non-optimized vmemmap size, but vmemmap optimization
reduces the actual number of vmemmap pages by reusing tail pages, so
the system overcounts its vmemmap usage.

Fix this by introducing section_vmemmap_pages(), which returns the exact
vmemmap page count for a given pfn range based on whether optimization
is in effect.

Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/sparse-vmemmap.c | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 40290fbc1db4..05e3e2b94e32 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -652,6 +652,29 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }
 
+static int __meminit section_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
+					   struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
+	unsigned long pages_per_compound = 1L << order;
+
+	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, min(pages_per_compound,
+							PAGES_PER_SECTION)));
+	VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
+
+	if (!vmemmap_can_optimize(altmap, pgmap))
+		return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
+
+	if (order < PFN_SECTION_SHIFT)
+		return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
+
+	if (IS_ALIGNED(pfn, pages_per_compound))
+		return VMEMMAP_RESERVE_NR;
+
+	return 0;
+}
+
 static struct page * __meminit populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
@@ -659,7 +682,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
 	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
 						      pgmap);
 
-	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+	memmap_pages_add(section_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
 
 	return page;
 }
@@ -670,7 +693,7 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
-	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+	memmap_pages_add(-section_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
 	vmemmap_free(start, end, altmap);
 }
 
@@ -679,9 +702,10 @@ static void free_map_bootmem(struct page *memmap, struct vmem_altmap *altmap,
 {
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+	unsigned long pfn = page_to_pfn(memmap);
 
-	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
-						  PAGE_SIZE)));
+	memmap_boot_pages_add(-section_vmemmap_pages(pfn, PAGES_PER_SECTION,
+						     altmap, pgmap));
 	vmemmap_free(start, end, NULL);
 }
 
-- 
2.20.1




* [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate()
  2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
                   ` (2 preceding siblings ...)
  2026-04-15 11:14 ` [PATCH v2 3/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
@ 2026-04-15 11:14 ` Muchun Song
  2026-04-15 12:13   ` Joao Martins
  2026-04-15 11:14 ` [PATCH v2 5/6] mm/sparse-vmemmap: Fix missing architecture-specific page table sync Muchun Song
  2026-04-15 11:14 ` [PATCH v2 6/6] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
  5 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

Add the struct dev_pagemap pointer as a parameter to the
architecture-specific vmemmap_populate(), vmemmap_populate_hugepages()
and vmemmap_populate_basepages() functions.

Currently, the vmemmap optimization for DAX is handled mostly in an
architecture-agnostic way via vmemmap_populate_compound_pages().
However, this approach skips crucial architecture-specific initialization
steps. For example, the x86 path must call sync_global_pgds() after
populating the vmemmap, which is currently being bypassed.

To lay the groundwork for fixing the vmemmap optimization at the arch
level, we need to pass the @pgmap pointer down to the arch-specific
vmemmap_populate() implementations. Plumb the @pgmap argument through
the APIs of vmemmap_populate(), vmemmap_populate_hugepages() and
vmemmap_populate_basepages().

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 arch/arm64/mm/mmu.c                        |  6 +++---
 arch/loongarch/mm/init.c                   |  7 ++++---
 arch/powerpc/include/asm/book3s/64/radix.h |  3 ++-
 arch/powerpc/mm/book3s64/radix_pgtable.c   |  2 +-
 arch/powerpc/mm/init_64.c                  |  4 ++--
 arch/riscv/mm/init.c                       |  4 ++--
 arch/s390/mm/vmem.c                        |  2 +-
 arch/sparc/mm/init_64.c                    |  5 +++--
 arch/x86/mm/init_64.c                      |  8 ++++----
 include/linux/mm.h                         |  8 +++++---
 mm/hugetlb_vmemmap.c                       |  4 ++--
 mm/sparse-vmemmap.c                        | 10 ++++++----
 12 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e5a42b7a0160..11227e104c48 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1790,7 +1790,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
 }
 
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 	/* [start, end] should be within one section */
@@ -1798,9 +1798,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 
 	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
 	    (end - start < PAGES_PER_SECTION * sizeof(struct page)))
-		return vmemmap_populate_basepages(start, end, node, altmap);
+		return vmemmap_populate_basepages(start, end, node, altmap, pgmap);
 	else
-		return vmemmap_populate_hugepages(start, end, node, altmap);
+		return vmemmap_populate_hugepages(start, end, node, altmap, pgmap);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index c9c57f08fa2c..d61c2e09caae 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -123,12 +123,13 @@ int __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
 }
 
 int __meminit vmemmap_populate(unsigned long start, unsigned long end,
-			       int node, struct vmem_altmap *altmap)
+			       int node, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap)
 {
 #if CONFIG_PGTABLE_LEVELS == 2
-	return vmemmap_populate_basepages(start, end, node, NULL);
+	return vmemmap_populate_basepages(start, end, node, NULL, pgmap);
 #else
-	return vmemmap_populate_hugepages(start, end, node, NULL);
+	return vmemmap_populate_hugepages(start, end, node, NULL, pgmap);
 #endif
 }
 
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index da954e779744..bde07c6f900f 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -321,7 +321,8 @@ extern int __meminit radix__vmemmap_create_mapping(unsigned long start,
 					     unsigned long page_size,
 					     unsigned long phys);
 int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end,
-				      int node, struct vmem_altmap *altmap);
+				      int node, struct vmem_altmap *altmap,
+				      struct dev_pagemap *pgmap);
 void __ref radix__vmemmap_free(unsigned long start, unsigned long end,
 			       struct vmem_altmap *altmap);
 extern void radix__vmemmap_remove_mapping(unsigned long start,
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 10aced261cff..568500343e5f 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1112,7 +1112,7 @@ static inline pte_t *vmemmap_pte_alloc(pmd_t *pmdp, int node,
 
 
 int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, int node,
-				      struct vmem_altmap *altmap)
+				      struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	unsigned long addr;
 	unsigned long next;
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index b6f3ae03ca9e..8f4aa5b32186 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -275,12 +275,12 @@ static int __meminit __vmemmap_populate(unsigned long start, unsigned long end,
 }
 
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-			       struct vmem_altmap *altmap)
+			       struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	if (radix_enabled())
-		return radix__vmemmap_populate(start, end, node, altmap);
+		return radix__vmemmap_populate(start, end, node, altmap, pgmap);
 #endif
 
 	return __vmemmap_populate(start, end, node, altmap);
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b0092fb842a3..a04ae9727cbe 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1348,7 +1348,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
 }
 
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-			       struct vmem_altmap *altmap)
+			       struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
@@ -1358,7 +1358,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	 * memory hotplug, we are not able to update all the page tables with
 	 * the new PMDs.
 	 */
-	return vmemmap_populate_hugepages(start, end, node, altmap);
+	return vmemmap_populate_hugepages(start, end, node, altmap, pgmap);
 }
 #endif
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index eeadff45e0e1..a7bf8d3d5601 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -506,7 +506,7 @@ static void vmem_remove_range(unsigned long start, unsigned long size)
  * Add a backed mem_map array to the virtual mem_map array.
  */
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-			       struct vmem_altmap *altmap)
+			       struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	int ret;
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 367c269305e5..f870ca330f9e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2591,9 +2591,10 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
 }
 
 int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
-			       int node, struct vmem_altmap *altmap)
+			       int node, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap)
 {
-	return vmemmap_populate_hugepages(vstart, vend, node, NULL);
+	return vmemmap_populate_hugepages(vstart, vend, node, NULL, pgmap);
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 77b889b71cf3..e18cc81a30b4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1557,7 +1557,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
 }
 
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 {
 	int err;
 
@@ -1565,15 +1565,15 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	VM_BUG_ON(!PAGE_ALIGNED(end));
 
 	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
-		err = vmemmap_populate_basepages(start, end, node, NULL);
+		err = vmemmap_populate_basepages(start, end, node, NULL, pgmap);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
-		err = vmemmap_populate_hugepages(start, end, node, altmap);
+		err = vmemmap_populate_hugepages(start, end, node, altmap, pgmap);
 	else if (altmap) {
 		pr_err_once("%s: no cpu support for altmap allocations\n",
 				__func__);
 		err = -ENOMEM;
 	} else
-		err = vmemmap_populate_basepages(start, end, node, NULL);
+		err = vmemmap_populate_basepages(start, end, node, NULL, pgmap);
 	if (!err)
 		sync_global_pgds(start, end - 1);
 	return err;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0b776907152e..bebc5f892f81 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4877,11 +4877,13 @@ void vmemmap_set_pmd(pmd_t *pmd, void *p, int node,
 int vmemmap_check_pmd(pmd_t *pmd, int node,
 		      unsigned long addr, unsigned long next);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
-			       int node, struct vmem_altmap *altmap);
+			       int node, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap);
 int vmemmap_populate_hugepages(unsigned long start, unsigned long end,
-			       int node, struct vmem_altmap *altmap);
+			       int node, struct vmem_altmap *altmap,
+			       struct dev_pagemap *pgmap);
 int vmemmap_populate(unsigned long start, unsigned long end, int node,
-		struct vmem_altmap *altmap);
+		struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
 int vmemmap_populate_hvo(unsigned long start, unsigned long end,
 			 unsigned int order, struct zone *zone,
 			 unsigned long headsize);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 4a077d231d3a..50b7123f3bdd 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -829,7 +829,7 @@ void __init hugetlb_vmemmap_init_late(int nid)
 			 */
 			list_del(&m->list);
 
-			vmemmap_populate(start, end, nid, NULL);
+			vmemmap_populate(start, end, nid, NULL, NULL);
 			nr_mmap = end - start;
 			memmap_boot_pages_add(DIV_ROUND_UP(nr_mmap, PAGE_SIZE));
 
@@ -845,7 +845,7 @@ void __init hugetlb_vmemmap_init_late(int nid)
 		if (vmemmap_populate_hvo(start, end, huge_page_order(h), zone,
 					 HUGETLB_VMEMMAP_RESERVE_SIZE) < 0) {
 			/* Fallback if HVO population fails */
-			vmemmap_populate(start, end, nid, NULL);
+			vmemmap_populate(start, end, nid, NULL, NULL);
 			nr_mmap = end - start;
 		} else {
 			m->flags |= HUGE_BOOTMEM_ZONES_VALID;
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 05e3e2b94e32..f5245647afee 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -297,7 +297,8 @@ static int __meminit vmemmap_populate_range(unsigned long start,
 }
 
 int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
-					 int node, struct vmem_altmap *altmap)
+					 int node, struct vmem_altmap *altmap,
+					 struct dev_pagemap *pgmap)
 {
 	return vmemmap_populate_range(start, end, node, altmap, -1, 0);
 }
@@ -400,7 +401,8 @@ int __weak __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
 }
 
 int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
-					 int node, struct vmem_altmap *altmap)
+					 int node, struct vmem_altmap *altmap,
+					 struct dev_pagemap *pgmap)
 {
 	unsigned long addr;
 	unsigned long next;
@@ -445,7 +447,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
 			}
 		} else if (vmemmap_check_pmd(pmd, node, addr, next))
 			continue;
-		if (vmemmap_populate_basepages(addr, next, node, altmap))
+		if (vmemmap_populate_basepages(addr, next, node, altmap, pgmap))
 			return -ENOMEM;
 	}
 	return 0;
@@ -559,7 +561,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 	if (vmemmap_can_optimize(altmap, pgmap))
 		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
 	else
-		r = vmemmap_populate(start, end, nid, altmap);
+		r = vmemmap_populate(start, end, nid, altmap, pgmap);
 
 	if (r < 0)
 		return NULL;
-- 
2.20.1




* [PATCH v2 5/6] mm/sparse-vmemmap: Fix missing architecture-specific page table sync
  2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
                   ` (3 preceding siblings ...)
  2026-04-15 11:14 ` [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate() Muchun Song
@ 2026-04-15 11:14 ` Muchun Song
  2026-04-15 11:14 ` [PATCH v2 6/6] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
  5 siblings, 0 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

On x86-64, vmemmap_populate() normally calls sync_global_pgds() to
keep the page tables in sync. However, when vmemmap optimization for
compound devmaps is enabled, vmemmap_populate_compound_pages() is called
directly from __populate_section_memmap(), bypassing the architecture-
specific vmemmap_populate() entirely. This skips the sync on x86-64
and can later trigger vmemmap-access faults.

Fix this by moving the vmemmap_can_optimize() dispatch from
__populate_section_memmap() into the generic helpers --
vmemmap_populate_basepages() and vmemmap_populate_hugepages(). This way,
the architecture vmemmap_populate() is always invoked first, ensuring
any arch-specific post-population steps (e.g. sync_global_pgds()) are
executed before returning.

Architectures that override vmemmap_populate() (e.g. powerpc) handle
the optimization dispatch in their own implementation instead.

Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h |  6 ------
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 16 ++++++++++-----
 mm/sparse-vmemmap.c                        | 24 +++++++++++-----------
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index bde07c6f900f..2600defa2dc2 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -357,11 +357,5 @@ int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #define vmemmap_can_optimize vmemmap_can_optimize
 bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
 #endif
-
-#define vmemmap_populate_compound_pages vmemmap_populate_compound_pages
-int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
-					      unsigned long start,
-					      unsigned long end, int node,
-					      struct dev_pagemap *pgmap);
 #endif /* __ASSEMBLER__ */
 #endif
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 568500343e5f..21fece355fbb 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1109,7 +1109,10 @@ static inline pte_t *vmemmap_pte_alloc(pmd_t *pmdp, int node,
 	return pte_offset_kernel(pmdp, address);
 }
 
-
+static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
+						    unsigned long start,
+						    unsigned long end, int node,
+						    struct dev_pagemap *pgmap);
 
 int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, int node,
 				      struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
@@ -1122,6 +1125,9 @@ int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, in
 	pmd_t *pmd;
 	pte_t *pte;
 
+	if (vmemmap_can_optimize(altmap, pgmap))
+		return vmemmap_populate_compound_pages(page_to_pfn((struct page *)start),
+						       start, end, node, pgmap);
 	/*
 	 * If altmap is present, Make sure we align the start vmemmap addr
 	 * to PAGE_SIZE so that we calculate the correct start_pfn in
@@ -1303,10 +1309,10 @@ static pte_t * __meminit vmemmap_compound_tail_page(unsigned long addr,
 	return pte;
 }
 
-int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
-					      unsigned long start,
-					      unsigned long end, int node,
-					      struct dev_pagemap *pgmap)
+static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
+						     unsigned long start,
+						     unsigned long end, int node,
+						     struct dev_pagemap *pgmap)
 {
 	/*
 	 * we want to map things as base page size mapping so that
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index f5245647afee..7f684ed3479e 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -296,10 +296,16 @@ static int __meminit vmemmap_populate_range(unsigned long start,
 	return 0;
 }
 
+static int __meminit vmemmap_populate_compound_pages(unsigned long start,
+						     unsigned long end, int node,
+						     struct dev_pagemap *pgmap);
+
 int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
 					 int node, struct vmem_altmap *altmap,
 					 struct dev_pagemap *pgmap)
 {
+	if (vmemmap_can_optimize(altmap, pgmap))
+		return vmemmap_populate_compound_pages(start, end, node, pgmap);
 	return vmemmap_populate_range(start, end, node, altmap, -1, 0);
 }
 
@@ -411,6 +417,9 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
 	pud_t *pud;
 	pmd_t *pmd;
 
+	if (vmemmap_can_optimize(altmap, pgmap))
+		return vmemmap_populate_compound_pages(start, end, node, pgmap);
+
 	for (addr = start; addr < end; addr = next) {
 		next = pmd_addr_end(addr, end);
 
@@ -453,7 +462,6 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
 	return 0;
 }
 
-#ifndef vmemmap_populate_compound_pages
 /*
  * For compound pages bigger than section size (e.g. x86 1G compound
  * pages with 2M subsection size) fill the rest of sections as tail
@@ -491,14 +499,14 @@ static pte_t * __meminit compound_section_tail_page(unsigned long addr)
 	return pte;
 }
 
-static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
-						     unsigned long start,
+static int __meminit vmemmap_populate_compound_pages(unsigned long start,
 						     unsigned long end, int node,
 						     struct dev_pagemap *pgmap)
 {
 	unsigned long size, addr;
 	pte_t *pte;
 	int rc;
+	unsigned long start_pfn = page_to_pfn((struct page *)start);
 
 	if (reuse_compound_section(start_pfn, pgmap)) {
 		pte = compound_section_tail_page(start);
@@ -544,26 +552,18 @@ static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
 	return 0;
 }
 
-#endif
-
 struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
-	int r;
 
 	if (WARN_ON_ONCE(!IS_ALIGNED(pfn, PAGES_PER_SUBSECTION) ||
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;
 
-	if (vmemmap_can_optimize(altmap, pgmap))
-		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
-	else
-		r = vmemmap_populate(start, end, nid, altmap, pgmap);
-
-	if (r < 0)
+	if (vmemmap_populate(start, end, nid, altmap, pgmap))
 		return NULL;
 
 	return pfn_to_page(pfn);
-- 
2.20.1




* [PATCH v2 6/6] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
  2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
                   ` (4 preceding siblings ...)
  2026-04-15 11:14 ` [PATCH v2 5/6] mm/sparse-vmemmap: Fix missing architecture-specific page table sync Muchun Song
@ 2026-04-15 11:14 ` Muchun Song
  5 siblings, 0 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:14 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel, Muchun Song

The memmap_init_zone_device() function only initializes the migratetype
of the first pageblock of a compound page. If the compound page spans
more than one pageblock (e.g., 1GB hugepages with 2MB pageblocks), the
subsequent pageblocks of the compound page remain uninitialized.

Move the migratetype initialization out of __init_zone_device_page()
and into a separate pageblock_migratetype_init_range() helper, which
iterates over the entire PFN range in pageblock-sized steps, ensuring
that every pageblock is initialized.

Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/mm_init.c | 42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..30528c4206c1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -674,6 +674,19 @@ static inline void fixup_hashdist(void)
 static inline void fixup_hashdist(void) {}
 #endif /* CONFIG_NUMA */
 
+static __meminit void pageblock_migratetype_init_range(unsigned long pfn,
+						       unsigned long nr_pages,
+						       int migratetype)
+{
+	unsigned long end = pfn + nr_pages;
+
+	for (pfn = pageblock_align(pfn); pfn < end; pfn += pageblock_nr_pages) {
+		init_pageblock_migratetype(pfn_to_page(pfn), migratetype, false);
+		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+			cond_resched();
+	}
+}
+
 /*
  * Initialize a reserved page unconditionally, finding its zone first.
  */
@@ -1011,21 +1024,6 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
 
-	/*
-	 * Mark the block movable so that blocks are reserved for
-	 * movable at startup. This will force kernel allocations
-	 * to reserve their blocks rather than leaking throughout
-	 * the address space during boot when many long-lived
-	 * kernel allocations are made.
-	 *
-	 * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
-	 * because this is done early in section_activate()
-	 */
-	if (pageblock_aligned(pfn)) {
-		init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
-		cond_resched();
-	}
-
 	/*
 	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
 	 * directly to the driver page allocator which will set the page count
@@ -1122,6 +1120,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 
 		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
 
+		cond_resched();
+
 		if (pfns_per_compound == 1)
 			continue;
 
@@ -1129,6 +1129,18 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				     compound_nr_pages(altmap, pgmap));
 	}
 
+	/*
+	 * Mark the block movable so that blocks are reserved for
+	 * movable at startup. This will force kernel allocations
+	 * to reserve their blocks rather than leaking throughout
+	 * the address space during boot when many long-lived
+	 * kernel allocations are made.
+	 *
+	 * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
+	 * because this is done early in section_activate()
+	 */
+	pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
+
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
 		nr_pages, jiffies_to_msecs(jiffies - start));
 }
-- 
2.20.1




* Re: [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow
  2026-04-15 11:14 ` [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
@ 2026-04-15 11:26   ` Muchun Song
  2026-04-15 15:53   ` Mike Rapoport
  1 sibling, 0 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-15 11:26 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
	linuxppc-dev, linux-kernel



> On Apr 15, 2026, at 19:14, Muchun Song <songmuchun@bytedance.com> wrote:
> 
> In section_activate(), if populate_section_memmap() fails, the error
> handling path calls section_deactivate() to roll back the state. This
> causes a vmemmap accounting imbalance.
> 
> Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
> memmap pages are accounted for only after populate_section_memmap()
> succeeds. However, the failure path unconditionally calls
> section_deactivate(), which decreases the vmemmap count. Consequently,
> a failure in populate_section_memmap() leads to an accounting underflow,
> incorrectly reducing the system's tracked vmemmap usage.
> 
> Fix this more thoroughly by moving all accounting calls into the lower
> level functions that actually perform the vmemmap allocation and freeing:
> 
>  - populate_section_memmap() accounts for newly allocated vmemmap pages
>  - depopulate_section_memmap() unaccounts when vmemmap is freed
>  - free_map_bootmem() handles early bootmem section accounting

Sorry, I forgot to delete this line; the subsequent modification of
free_map_bootmem() is a separate cleanup.




* Re: [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate()
  2026-04-15 11:14 ` [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate() Muchun Song
@ 2026-04-15 12:13   ` Joao Martins
  2026-04-15 12:21     ` Muchun Song
  0 siblings, 1 reply; 13+ messages in thread
From: Joao Martins @ 2026-04-15 12:13 UTC (permalink / raw)
  To: Muchun Song
  Cc: Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, linux-mm, linuxppc-dev,
	linux-kernel, harry.yoo, Andrew Morton, David Hildenbrand,
	Muchun Song, Oscar Salvador, Michael Ellerman,
	Madhavan Srinivasan

On 15/04/2026 12:14, Muchun Song wrote:
> Add the struct dev_pagemap pointer as a parameter to the architecture
> specific vmemmap_populate(), vmemmap_populate_hugepages() and
> vmemmap_populate_basepages() functions.
> 
> Currently, the vmemmap optimization for DAX is handled mostly in an
> architecture-agnostic way via vmemmap_populate_compound_pages().
> However, this approach skips crucial architecture-specific initialization
> steps. For example, the x86 path must call sync_global_pgds() after
> populating the vmemmap, which is currently being bypassed.
> 

Harry's series fixed this in a different way (for x86):

https://lore.kernel.org/linux-mm/20250818020206.4517-1-harry.yoo@oracle.com/#t

> To lay the groundwork for fixing the vmemmap optimization in the arch
> level, we need to pass the @pgmap pointer down to the arch specific
> vmemmap_populate() location. Plumb the @pgmap argument through the APIs
> of vmemmap_populate(), vmemmap_populate_hugepages() and
> vmemmap_populate_basepages().
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  arch/arm64/mm/mmu.c                        |  6 +++---
>  arch/loongarch/mm/init.c                   |  7 ++++---
>  arch/powerpc/include/asm/book3s/64/radix.h |  3 ++-
>  arch/powerpc/mm/book3s64/radix_pgtable.c   |  2 +-
>  arch/powerpc/mm/init_64.c                  |  4 ++--
>  arch/riscv/mm/init.c                       |  4 ++--
>  arch/s390/mm/vmem.c                        |  2 +-
>  arch/sparc/mm/init_64.c                    |  5 +++--
>  arch/x86/mm/init_64.c                      |  8 ++++----
>  include/linux/mm.h                         |  8 +++++---
>  mm/hugetlb_vmemmap.c                       |  4 ++--
>  mm/sparse-vmemmap.c                        | 10 ++++++----
>  12 files changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e5a42b7a0160..11227e104c48 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1790,7 +1790,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
>  }
>  
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -		struct vmem_altmap *altmap)
> +		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>  	/* [start, end] should be within one section */
> @@ -1798,9 +1798,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  
>  	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>  	    (end - start < PAGES_PER_SECTION * sizeof(struct page)))
> -		return vmemmap_populate_basepages(start, end, node, altmap);
> +		return vmemmap_populate_basepages(start, end, node, altmap, pgmap);
>  	else
> -		return vmemmap_populate_hugepages(start, end, node, altmap);
> +		return vmemmap_populate_hugepages(start, end, node, altmap, pgmap);
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> index c9c57f08fa2c..d61c2e09caae 100644
> --- a/arch/loongarch/mm/init.c
> +++ b/arch/loongarch/mm/init.c
> @@ -123,12 +123,13 @@ int __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
>  }
>  
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end,
> -			       int node, struct vmem_altmap *altmap)
> +			       int node, struct vmem_altmap *altmap,
> +			       struct dev_pagemap *pgmap)
>  {
>  #if CONFIG_PGTABLE_LEVELS == 2
> -	return vmemmap_populate_basepages(start, end, node, NULL);
> +	return vmemmap_populate_basepages(start, end, node, NULL, pgmap);
>  #else
> -	return vmemmap_populate_hugepages(start, end, node, NULL);
> +	return vmemmap_populate_hugepages(start, end, node, NULL, pgmap);
>  #endif
>  }
>  
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index da954e779744..bde07c6f900f 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -321,7 +321,8 @@ extern int __meminit radix__vmemmap_create_mapping(unsigned long start,
>  					     unsigned long page_size,
>  					     unsigned long phys);
>  int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end,
> -				      int node, struct vmem_altmap *altmap);
> +				      int node, struct vmem_altmap *altmap,
> +				      struct dev_pagemap *pgmap);
>  void __ref radix__vmemmap_free(unsigned long start, unsigned long end,
>  			       struct vmem_altmap *altmap);
>  extern void radix__vmemmap_remove_mapping(unsigned long start,
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 10aced261cff..568500343e5f 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1112,7 +1112,7 @@ static inline pte_t *vmemmap_pte_alloc(pmd_t *pmdp, int node,
>  
>  
>  int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, int node,
> -				      struct vmem_altmap *altmap)
> +				      struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	unsigned long addr;
>  	unsigned long next;
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index b6f3ae03ca9e..8f4aa5b32186 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -275,12 +275,12 @@ static int __meminit __vmemmap_populate(unsigned long start, unsigned long end,
>  }
>  
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -			       struct vmem_altmap *altmap)
> +			       struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
>  	if (radix_enabled())
> -		return radix__vmemmap_populate(start, end, node, altmap);
> +		return radix__vmemmap_populate(start, end, node, altmap, pgmap);
>  #endif
>  
>  	return __vmemmap_populate(start, end, node, altmap);
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index b0092fb842a3..a04ae9727cbe 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1348,7 +1348,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
>  }
>  
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -			       struct vmem_altmap *altmap)
> +			       struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>  
> @@ -1358,7 +1358,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	 * memory hotplug, we are not able to update all the page tables with
>  	 * the new PMDs.
>  	 */
> -	return vmemmap_populate_hugepages(start, end, node, altmap);
> +	return vmemmap_populate_hugepages(start, end, node, altmap, pgmap);
>  }
>  #endif
>  
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index eeadff45e0e1..a7bf8d3d5601 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -506,7 +506,7 @@ static void vmem_remove_range(unsigned long start, unsigned long size)
>   * Add a backed mem_map array to the virtual mem_map array.
>   */
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -			       struct vmem_altmap *altmap)
> +			       struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	int ret;
>  
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 367c269305e5..f870ca330f9e 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2591,9 +2591,10 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
>  }
>  
>  int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
> -			       int node, struct vmem_altmap *altmap)
> +			       int node, struct vmem_altmap *altmap,
> +			       struct dev_pagemap *pgmap)
>  {
> -	return vmemmap_populate_hugepages(vstart, vend, node, NULL);
> +	return vmemmap_populate_hugepages(vstart, vend, node, NULL, pgmap);
>  }
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 77b889b71cf3..e18cc81a30b4 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1557,7 +1557,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
>  }
>  
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> -		struct vmem_altmap *altmap)
> +		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	int err;
>  
> @@ -1565,15 +1565,15 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	VM_BUG_ON(!PAGE_ALIGNED(end));
>  
>  	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> -		err = vmemmap_populate_basepages(start, end, node, NULL);
> +		err = vmemmap_populate_basepages(start, end, node, NULL, pgmap);
>  	else if (boot_cpu_has(X86_FEATURE_PSE))
> -		err = vmemmap_populate_hugepages(start, end, node, altmap);
> +		err = vmemmap_populate_hugepages(start, end, node, altmap, pgmap);
>  	else if (altmap) {
>  		pr_err_once("%s: no cpu support for altmap allocations\n",
>  				__func__);
>  		err = -ENOMEM;
>  	} else
> -		err = vmemmap_populate_basepages(start, end, node, NULL);
> +		err = vmemmap_populate_basepages(start, end, node, NULL, pgmap);
>  	if (!err)
>  		sync_global_pgds(start, end - 1);
>  	return err;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0b776907152e..bebc5f892f81 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4877,11 +4877,13 @@ void vmemmap_set_pmd(pmd_t *pmd, void *p, int node,
>  int vmemmap_check_pmd(pmd_t *pmd, int node,
>  		      unsigned long addr, unsigned long next);
>  int vmemmap_populate_basepages(unsigned long start, unsigned long end,
> -			       int node, struct vmem_altmap *altmap);
> +			       int node, struct vmem_altmap *altmap,
> +			       struct dev_pagemap *pgmap);
>  int vmemmap_populate_hugepages(unsigned long start, unsigned long end,
> -			       int node, struct vmem_altmap *altmap);
> +			       int node, struct vmem_altmap *altmap,
> +			       struct dev_pagemap *pgmap);
>  int vmemmap_populate(unsigned long start, unsigned long end, int node,
> -		struct vmem_altmap *altmap);
> +		struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
>  int vmemmap_populate_hvo(unsigned long start, unsigned long end,
>  			 unsigned int order, struct zone *zone,
>  			 unsigned long headsize);
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4a077d231d3a..50b7123f3bdd 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -829,7 +829,7 @@ void __init hugetlb_vmemmap_init_late(int nid)
>  			 */
>  			list_del(&m->list);
>  
> -			vmemmap_populate(start, end, nid, NULL);
> +			vmemmap_populate(start, end, nid, NULL, NULL);
>  			nr_mmap = end - start;
>  			memmap_boot_pages_add(DIV_ROUND_UP(nr_mmap, PAGE_SIZE));
>  
> @@ -845,7 +845,7 @@ void __init hugetlb_vmemmap_init_late(int nid)
>  		if (vmemmap_populate_hvo(start, end, huge_page_order(h), zone,
>  					 HUGETLB_VMEMMAP_RESERVE_SIZE) < 0) {
>  			/* Fallback if HVO population fails */
> -			vmemmap_populate(start, end, nid, NULL);
> +			vmemmap_populate(start, end, nid, NULL, NULL);
>  			nr_mmap = end - start;
>  		} else {
>  			m->flags |= HUGE_BOOTMEM_ZONES_VALID;
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 05e3e2b94e32..f5245647afee 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -297,7 +297,8 @@ static int __meminit vmemmap_populate_range(unsigned long start,
>  }
>  
>  int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end,
> -					 int node, struct vmem_altmap *altmap)
> +					 int node, struct vmem_altmap *altmap,
> +					 struct dev_pagemap *pgmap)
>  {
>  	return vmemmap_populate_range(start, end, node, altmap, -1, 0);
>  }
> @@ -400,7 +401,8 @@ int __weak __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
>  }
>  
>  int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
> -					 int node, struct vmem_altmap *altmap)
> +					 int node, struct vmem_altmap *altmap,
> +					 struct dev_pagemap *pgmap)
>  {
>  	unsigned long addr;
>  	unsigned long next;
> @@ -445,7 +447,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>  			}
>  		} else if (vmemmap_check_pmd(pmd, node, addr, next))
>  			continue;
> -		if (vmemmap_populate_basepages(addr, next, node, altmap))
> +		if (vmemmap_populate_basepages(addr, next, node, altmap, pgmap))
>  			return -ENOMEM;
>  	}
>  	return 0;
> @@ -559,7 +561,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
>  	if (vmemmap_can_optimize(altmap, pgmap))
>  		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
>  	else
> -		r = vmemmap_populate(start, end, nid, altmap);
> +		r = vmemmap_populate(start, end, nid, altmap, pgmap);
>  
>  	if (r < 0)
>  		return NULL;



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate()
  2026-04-15 12:13   ` Joao Martins
@ 2026-04-15 12:21     ` Muchun Song
  0 siblings, 0 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-15 12:21 UTC (permalink / raw)
  To: Joao Martins
  Cc: Muchun Song, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
	Christophe Leroy, aneesh.kumar, linux-mm, linuxppc-dev,
	linux-kernel, harry.yoo, Andrew Morton, David Hildenbrand,
	Oscar Salvador, Michael Ellerman, Madhavan Srinivasan



> On Apr 15, 2026, at 20:13, Joao Martins <joao.m.martins@oracle.com> wrote:
> 
> On 15/04/2026 12:14, Muchun Song wrote:
>> Add the struct dev_pagemap pointer as a parameter to the architecture
>> specific vmemmap_populate(), vmemmap_populate_hugepages() and
>> vmemmap_populate_basepages() functions.
>> 
>> Currently, the vmemmap optimization for DAX is handled mostly in an
>> architecture-agnostic way via vmemmap_populate_compound_pages().
>> However, this approach skips crucial architecture-specific initialization
>> steps. For example, the x86 path must call sync_global_pgds() after
>> populating the vmemmap, which is currently being bypassed.
>> 
> 
> Harry's series fixed in a different way (for x86):
> 
> https://lore.kernel.org/linux-mm/20250818020206.4517-1-harry.yoo@oracle.com/#t

Thanks for the information. It indeed fixes the page table sync issue.

I'll drop this one in the next version.

Thanks,
Muchun


* Re: [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow
  2026-04-15 11:14 ` [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
  2026-04-15 11:26   ` Muchun Song
@ 2026-04-15 15:53   ` Mike Rapoport
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Rapoport @ 2026-04-15 15:53 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel

On Wed, Apr 15, 2026 at 07:14:07PM +0800, Muchun Song wrote:
> In section_activate(), if populate_section_memmap() fails, the error
> handling path calls section_deactivate() to roll back the state. This
> causes a vmemmap accounting imbalance.
> 
> Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
> memmap pages are accounted for only after populate_section_memmap()
> succeeds. However, the failure path unconditionally calls
> section_deactivate(), which decreases the vmemmap count. Consequently,
> a failure in populate_section_memmap() leads to an accounting underflow,
> incorrectly reducing the system's tracked vmemmap usage.
> 
> Fix this more thoroughly by moving all accounting calls into the lower
> level functions that actually perform the vmemmap allocation and freeing:
> 
>   - populate_section_memmap() accounts for newly allocated vmemmap pages
>   - depopulate_section_memmap() unaccounts when vmemmap is freed
>   - free_map_bootmem() handles early bootmem section accounting
> 
> This ensures proper accounting in all code paths, including error
> handling and early section cases.
> 
> Fixes: c3576889d87b ("mm: fix accounting of memmap pages")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse-vmemmap.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 6eadb9d116e4..a7b11248b989 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -656,7 +656,12 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>  		struct dev_pagemap *pgmap)
>  {
> -	return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
> +	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
> +						      pgmap);
> +
> +	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
> +
> +	return page;
>  }
>  
>  static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> @@ -665,13 +670,17 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
>  	unsigned long start = (unsigned long) pfn_to_page(pfn);
>  	unsigned long end = start + nr_pages * sizeof(struct page);
>  
> +	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
>  	vmemmap_free(start, end, altmap);
>  }
> +
>  static void free_map_bootmem(struct page *memmap)
>  {
>  	unsigned long start = (unsigned long)memmap;
>  	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
>  
> +	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
> +						  PAGE_SIZE)));
>  	vmemmap_free(start, end, NULL);
>  }
>  
> @@ -774,14 +783,10 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	 * The memmap of early sections is always fully populated. See
>  	 * section_activate() and pfn_valid() .
>  	 */
> -	if (!section_is_early) {
> -		memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
> +	if (!section_is_early)
>  		depopulate_section_memmap(pfn, nr_pages, altmap);
> -	} else if (memmap) {
> -		memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
> -							  PAGE_SIZE)));
> +	else if (memmap)
>  		free_map_bootmem(memmap);
> -	}
>  
>  	if (empty)
>  		ms->section_mem_map = (unsigned long)NULL;
> @@ -826,7 +831,6 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>  		section_deactivate(pfn, nr_pages, altmap);
>  		return ERR_PTR(-ENOMEM);
>  	}
> -	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
>  
>  	return memmap;
>  }
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 2/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
  2026-04-15 11:14 ` [PATCH v2 2/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
@ 2026-04-15 15:55   ` Mike Rapoport
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Rapoport @ 2026-04-15 15:55 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel

On Wed, Apr 15, 2026 at 07:14:08PM +0800, Muchun Song wrote:
> Currently, the memory hot-remove call chain -- arch_remove_memory(),
> __remove_pages(), sparse_remove_section() and section_deactivate() --
> does not carry the struct dev_pagemap pointer. This prevents the lower
> levels from knowing whether the section was originally populated with
> vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).
> 
> Without this information, we cannot call vmemmap_can_optimize() to
> determine if the vmemmap pages were optimized. As a result, the vmemmap
> page accounting during teardown will mistakenly assume a non-optimized
> allocation, leading to incorrect memmap statistics.
> 
> To lay the groundwork for fixing the vmemmap page accounting, we need
> to pass the @pgmap pointer down to the deactivation location. Plumb the
> @pgmap argument through the APIs of arch_remove_memory(), __remove_pages()
> and sparse_remove_section(), mirroring the corresponding *_activate()
> paths.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  arch/arm64/mm/mmu.c            |  5 +++--
>  arch/loongarch/mm/init.c       |  5 +++--
>  arch/powerpc/mm/mem.c          |  5 +++--
>  arch/riscv/mm/init.c           |  5 +++--
>  arch/s390/mm/init.c            |  5 +++--
>  arch/x86/mm/init_64.c          |  5 +++--
>  include/linux/memory_hotplug.h |  8 +++++---
>  mm/memory_hotplug.c            | 12 ++++++------
>  mm/memremap.c                  |  4 ++--
>  mm/sparse-vmemmap.c            | 17 +++++++++--------
>  10 files changed, 40 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dd85e093ffdb..e5a42b7a0160 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -2024,12 +2024,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return ret;
>  }
>  
> -void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			struct dev_pagemap *pgmap)
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  
> -	__remove_pages(start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
>  	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
>  }
>  
> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> index 00f3822b6e47..c9c57f08fa2c 100644
> --- a/arch/loongarch/mm/init.c
> +++ b/arch/loongarch/mm/init.c
> @@ -86,7 +86,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
>  	return ret;
>  }
>  
> -void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			struct dev_pagemap *pgmap)
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> @@ -95,7 +96,7 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
>  	/* With altmap the first mapped page is offset from @start */
>  	if (altmap)
>  		page += vmem_altmap_offset(altmap);
> -	__remove_pages(start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
>  }
>  #endif
>  
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 648d0c5602ec..4c1afab91996 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -158,12 +158,13 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>  	return rc;
>  }
>  
> -void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			      struct dev_pagemap *pgmap)
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  
> -	__remove_pages(start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
>  	arch_remove_linear_mapping(start, size);
>  }
>  #endif
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index decd7df40fa4..b0092fb842a3 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1717,9 +1717,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *param
>  	return ret;
>  }
>  
> -void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			      struct dev_pagemap *pgmap)
>  {
> -	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
> +	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, pgmap);
>  	remove_linear_mapping(start, size);
>  	flush_tlb_all();
>  }
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 1f72efc2a579..11a689423440 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -276,12 +276,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return rc;
>  }
>  
> -void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			struct dev_pagemap *pgmap)
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  
> -	__remove_pages(start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
>  	vmem_remove_mapping(start, size);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index df2261fa4f98..77b889b71cf3 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1288,12 +1288,13 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
>  	remove_pagetable(start, end, true, NULL);
>  }
>  
> -void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			      struct dev_pagemap *pgmap)
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  
> -	__remove_pages(start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap, pgmap);
>  	kernel_physical_mapping_remove(start, start + size);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 815e908c4135..7c9d66729c60 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -135,9 +135,10 @@ static inline bool movable_node_is_enabled(void)
>  	return movable_node_enabled;
>  }
>  
> -extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
> +extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
> +			       struct dev_pagemap *pgmap);
>  extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
> -			   struct vmem_altmap *altmap);
> +			   struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
>  
>  /* reasonably generic interface to expand the physical pages */
>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> @@ -307,7 +308,8 @@ extern int sparse_add_section(int nid, unsigned long pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap,
>  		struct dev_pagemap *pgmap);
>  extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
> -				  struct vmem_altmap *altmap);
> +				  struct vmem_altmap *altmap,
> +				  struct dev_pagemap *pgmap);
>  extern struct zone *zone_for_pfn_range(enum mmop online_type,
>  		int nid, struct memory_group *group, unsigned long start_pfn,
>  		unsigned long nr_pages);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 2a943ec57c85..6a9e2dc751d2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -583,7 +583,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
>   * calling offline_pages().
>   */
>  void __remove_pages(unsigned long pfn, unsigned long nr_pages,
> -		    struct vmem_altmap *altmap)
> +		    struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	const unsigned long end_pfn = pfn + nr_pages;
>  	unsigned long cur_nr_pages;
> @@ -598,7 +598,7 @@ void __remove_pages(unsigned long pfn, unsigned long nr_pages,
>  		/* Select all remaining pages up to the next section boundary */
>  		cur_nr_pages = min(end_pfn - pfn,
>  				   SECTION_ALIGN_UP(pfn + 1) - pfn);
> -		sparse_remove_section(pfn, cur_nr_pages, altmap);
> +		sparse_remove_section(pfn, cur_nr_pages, altmap, pgmap);
>  	}
>  }
>  
> @@ -1425,7 +1425,7 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
>  
>  		remove_memory_block_devices(cur_start, memblock_size);
>  
> -		arch_remove_memory(cur_start, memblock_size, altmap);
> +		arch_remove_memory(cur_start, memblock_size, altmap, NULL);
>  
>  		/* Verify that all vmemmap pages have actually been freed. */
>  		WARN(altmap->alloc, "Altmap not fully unmapped");
> @@ -1468,7 +1468,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
>  		ret = create_memory_block_devices(cur_start, memblock_size, nid,
>  						  params.altmap, group);
>  		if (ret) {
> -			arch_remove_memory(cur_start, memblock_size, NULL);
> +			arch_remove_memory(cur_start, memblock_size, NULL, NULL);
>  			kfree(params.altmap);
>  			goto out;
>  		}
> @@ -1554,7 +1554,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  		/* create memory block devices after memory was added */
>  		ret = create_memory_block_devices(start, size, nid, NULL, group);
>  		if (ret) {
> -			arch_remove_memory(start, size, params.altmap);
> +			arch_remove_memory(start, size, params.altmap, NULL);
>  			goto error;
>  		}
>  	}
> @@ -2266,7 +2266,7 @@ static int try_remove_memory(u64 start, u64 size)
>  		 * No altmaps present, do the removal directly
>  		 */
>  		remove_memory_block_devices(start, size);
> -		arch_remove_memory(start, size, NULL);
> +		arch_remove_memory(start, size, NULL, NULL);
>  	} else {
>  		/* all memblocks in the range have altmaps */
>  		remove_memory_blocks_and_altmaps(start, size);
> diff --git a/mm/memremap.c b/mm/memremap.c
> index ac7be07e3361..c45b90f334ea 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -97,10 +97,10 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
>  				   PHYS_PFN(range_len(range)));
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>  		__remove_pages(PHYS_PFN(range->start),
> -			       PHYS_PFN(range_len(range)), NULL);
> +			       PHYS_PFN(range_len(range)), NULL, pgmap);
>  	} else {
>  		arch_remove_memory(range->start, range_len(range),
> -				pgmap_altmap(pgmap));
> +				pgmap_altmap(pgmap), pgmap);
>  		kasan_remove_zero_shadow(__va(range->start), range_len(range));
>  	}
>  	mem_hotplug_done();
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index a7b11248b989..40290fbc1db4 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -665,7 +665,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
>  }
>  
>  static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
> -		struct vmem_altmap *altmap)
> +		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	unsigned long start = (unsigned long) pfn_to_page(pfn);
>  	unsigned long end = start + nr_pages * sizeof(struct page);
> @@ -674,7 +674,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
>  	vmemmap_free(start, end, altmap);
>  }
>  
> -static void free_map_bootmem(struct page *memmap)
> +static void free_map_bootmem(struct page *memmap, struct vmem_altmap *altmap,
> +		struct dev_pagemap *pgmap)
>  {
>  	unsigned long start = (unsigned long)memmap;
>  	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
> @@ -746,7 +747,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
>   * usage map, but still need to free the vmemmap range.
>   */
>  static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> -		struct vmem_altmap *altmap)
> +		struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	struct mem_section *ms = __pfn_to_section(pfn);
>  	bool section_is_early = early_section(ms);
> @@ -784,9 +785,9 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  	 * section_activate() and pfn_valid() .
>  	 */
>  	if (!section_is_early)
> -		depopulate_section_memmap(pfn, nr_pages, altmap);
> +		depopulate_section_memmap(pfn, nr_pages, altmap, pgmap);
>  	else if (memmap)
> -		free_map_bootmem(memmap);
> +		free_map_bootmem(memmap, altmap, pgmap);
>  
>  	if (empty)
>  		ms->section_mem_map = (unsigned long)NULL;
> @@ -828,7 +829,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>  
>  	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
>  	if (!memmap) {
> -		section_deactivate(pfn, nr_pages, altmap);
> +		section_deactivate(pfn, nr_pages, altmap, pgmap);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> @@ -889,13 +890,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  }
>  
>  void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
> -			   struct vmem_altmap *altmap)
> +			   struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>  {
>  	struct mem_section *ms = __pfn_to_section(pfn);
>  
>  	if (WARN_ON_ONCE(!valid_section(ms)))
>  		return;
>  
> -	section_deactivate(pfn, nr_pages, altmap);
> +	section_deactivate(pfn, nr_pages, altmap, pgmap);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH v2 3/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
  2026-04-15 11:14 ` [PATCH v2 3/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
@ 2026-04-15 15:58   ` Mike Rapoport
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Rapoport @ 2026-04-15 15:58 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
	Michael Ellerman, Madhavan Srinivasan, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
	Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
	joao.m.martins, linux-mm, linuxppc-dev, linux-kernel

On Wed, Apr 15, 2026 at 07:14:09PM +0800, Muchun Song wrote:
> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
> counter in /proc/vmstat is incorrect. The current code always accounts
> for the full, non-optimized vmemmap size, but vmemmap optimization
> reduces the actual number of vmemmap pages by reusing tail pages. This
> causes the system to overcount vmemmap usage, leading to inaccurate
> page statistics in /proc/vmstat.
> 
> Fix this by introducing section_vmemmap_pages(), which returns the exact
> vmemmap page count for a given pfn range based on whether optimization
> is in effect.
> 
> Fixes: 15995a352474 ("mm: report per-page metadata information")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/sparse-vmemmap.c | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 40290fbc1db4..05e3e2b94e32 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -652,6 +652,29 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  	}
>  }
>  
> +static int __meminit section_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
> +					   struct vmem_altmap *altmap,
> +					   struct dev_pagemap *pgmap)
> +{
> +	unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
> +	unsigned long pages_per_compound = 1L << order;
> +
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, min(pages_per_compound,
> +							PAGES_PER_SECTION)));
> +	VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
> +
> +	if (!vmemmap_can_optimize(altmap, pgmap))
> +		return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
> +
> +	if (order < PFN_SECTION_SHIFT)
> +		return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
> +
> +	if (IS_ALIGNED(pfn, pages_per_compound))
> +		return VMEMMAP_RESERVE_NR;
> +
> +	return 0;
> +}
> +
>  static struct page * __meminit populate_section_memmap(unsigned long pfn,
>  		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>  		struct dev_pagemap *pgmap)
> @@ -659,7 +682,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
>  	struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
>  						      pgmap);
>  
> -	memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
> +	memmap_pages_add(section_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
>  
>  	return page;
>  }
> @@ -670,7 +693,7 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
>  	unsigned long start = (unsigned long) pfn_to_page(pfn);
>  	unsigned long end = start + nr_pages * sizeof(struct page);
>  
> -	memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
> +	memmap_pages_add(-section_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
>  	vmemmap_free(start, end, altmap);
>  }
>  
> @@ -679,9 +702,10 @@ static void free_map_bootmem(struct page *memmap, struct vmem_altmap *altmap,
>  {
>  	unsigned long start = (unsigned long)memmap;
>  	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
> +	unsigned long pfn = page_to_pfn(memmap);
>  
> -	memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
> -						  PAGE_SIZE)));
> +	memmap_boot_pages_add(-section_vmemmap_pages(pfn, PAGES_PER_SECTION,
> +						     altmap, pgmap));
>  	vmemmap_free(start, end, NULL);
>  }
>  
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.



end of thread, other threads:[~2026-04-15 15:59 UTC | newest]

Thread overview: 13+ messages
2026-04-15 11:14 [PATCH v2 0/6] mm: Fix vmemmap optimization accounting and initialization Muchun Song
2026-04-15 11:14 ` [PATCH v2 1/6] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
2026-04-15 11:26   ` Muchun Song
2026-04-15 15:53   ` Mike Rapoport
2026-04-15 11:14 ` [PATCH v2 2/6] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
2026-04-15 15:55   ` Mike Rapoport
2026-04-15 11:14 ` [PATCH v2 3/6] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
2026-04-15 15:58   ` Mike Rapoport
2026-04-15 11:14 ` [PATCH v2 4/6] mm/sparse-vmemmap: Pass @pgmap argument to arch vmemmap_populate() Muchun Song
2026-04-15 12:13   ` Joao Martins
2026-04-15 12:21     ` Muchun Song
2026-04-15 11:14 ` [PATCH v2 5/6] mm/sparse-vmemmap: Fix missing architecture-specific page table sync Muchun Song
2026-04-15 11:14 ` [PATCH v2 6/6] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
