* [PATCH v3 0/4] mm: Fix vmemmap optimization accounting and initialization
@ 2026-04-21 2:20 Muchun Song
2026-04-21 2:20 ` [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-21 2:20 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
Michael Ellerman, Madhavan Srinivasan
Cc: Muchun Song, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
linuxppc-dev, linux-kernel
This series fixes several bugs in vmemmap optimization, primarily incorrect
page accounting when vmemmap optimization is enabled on the DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization for
ZONE_DEVICE compound pages.
v2 -> v3:
- Drop patch 4 and patch 5 from v2 since the page table sync issue has
already been fixed by Harry's series.
- [Patch 1]: Remove an unintentionally left line.
- [Patch 4 (previously Patch 6)]: Call cond_resched() every
PAGES_PER_SECTION pages instead of once per compound page, as suggested
by Mike Rapoport.
- Collect Acked-by and Reviewed-by tags from Mike Rapoport.
Muchun Song (4):
mm/sparse-vmemmap: Fix vmemmap accounting underflow
mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
arch/arm64/mm/mmu.c | 5 +--
arch/loongarch/mm/init.c | 5 +--
arch/powerpc/mm/mem.c | 5 +--
arch/riscv/mm/init.c | 5 +--
arch/s390/mm/init.c | 5 +--
arch/x86/mm/init_64.c | 5 +--
include/linux/memory_hotplug.h | 8 +++--
mm/memory_hotplug.c | 12 +++----
mm/memremap.c | 4 +--
mm/mm_init.c | 43 +++++++++++++++---------
mm/sparse-vmemmap.c | 61 +++++++++++++++++++++++++---------
11 files changed, 104 insertions(+), 54 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow
2026-04-21 2:20 [PATCH v3 0/4] mm: Fix vmemmap optimization accounting and initialization Muchun Song
@ 2026-04-21 2:20 ` Muchun Song
2026-04-21 3:45 ` Oscar Salvador
2026-04-21 2:20 ` [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-21 2:20 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
Michael Ellerman, Madhavan Srinivasan
Cc: Muchun Song, Mike Rapoport, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
linux-mm, linuxppc-dev, linux-kernel
In section_activate(), if populate_section_memmap() fails, the error
handling path calls section_deactivate() to roll back the state. This
causes a vmemmap accounting imbalance.
Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
memmap pages are accounted for only after populate_section_memmap()
succeeds. However, the failure path unconditionally calls
section_deactivate(), which decreases the vmemmap count. Consequently,
a failure in populate_section_memmap() leads to an accounting underflow,
incorrectly reducing the system's tracked vmemmap usage.
Fix this more thoroughly by moving all accounting calls into the lower-level
functions that actually perform the vmemmap allocation and freeing:
- populate_section_memmap() accounts for newly allocated vmemmap pages
- depopulate_section_memmap() subtracts the pages when the vmemmap is freed
This ensures correct accounting on all code paths, including error
handling and early-section cases.
Fixes: c3576889d87b ("mm: fix accounting of memmap pages")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/sparse-vmemmap.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 6eadb9d116e4..a7b11248b989 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -656,7 +656,12 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
struct dev_pagemap *pgmap)
{
- return __populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
+ struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
+ pgmap);
+
+ memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+
+ return page;
}
static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
@@ -665,13 +670,17 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
unsigned long start = (unsigned long) pfn_to_page(pfn);
unsigned long end = start + nr_pages * sizeof(struct page);
+ memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
vmemmap_free(start, end, altmap);
}
+
static void free_map_bootmem(struct page *memmap)
{
unsigned long start = (unsigned long)memmap;
unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+ memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
+ PAGE_SIZE)));
vmemmap_free(start, end, NULL);
}
@@ -774,14 +783,10 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
* The memmap of early sections is always fully populated. See
* section_activate() and pfn_valid() .
*/
- if (!section_is_early) {
- memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+ if (!section_is_early)
depopulate_section_memmap(pfn, nr_pages, altmap);
- } else if (memmap) {
- memmap_boot_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page),
- PAGE_SIZE)));
+ else if (memmap)
free_map_bootmem(memmap);
- }
if (empty)
ms->section_mem_map = (unsigned long)NULL;
@@ -826,7 +831,6 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
section_deactivate(pfn, nr_pages, altmap);
return ERR_PTR(-ENOMEM);
}
- memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
return memmap;
}
--
2.20.1
* [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
2026-04-21 2:20 [PATCH v3 0/4] mm: Fix vmemmap optimization accounting and initialization Muchun Song
2026-04-21 2:20 ` [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
@ 2026-04-21 2:20 ` Muchun Song
2026-04-21 3:55 ` Oscar Salvador
2026-04-21 2:20 ` [PATCH v3 3/4] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
2026-04-21 2:20 ` [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
3 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-21 2:20 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
Michael Ellerman, Madhavan Srinivasan
Cc: Muchun Song, Mike Rapoport, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
linux-mm, linuxppc-dev, linux-kernel
Currently, the memory hot-remove call chain -- arch_remove_memory(),
__remove_pages(), sparse_remove_section() and section_deactivate() --
does not carry the struct dev_pagemap pointer. This prevents the lower
levels from knowing whether the section was originally populated with
vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).
Without this information, we cannot call vmemmap_can_optimize() to
determine if the vmemmap pages were optimized. As a result, the vmemmap
page accounting during teardown will mistakenly assume a non-optimized
allocation, leading to incorrect memmap statistics.
To lay the groundwork for fixing the vmemmap page accounting, pass the
@pgmap pointer down to the deactivation paths. Plumb the @pgmap argument
through arch_remove_memory(), __remove_pages() and sparse_remove_section(),
mirroring the corresponding *_activate() paths.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/arm64/mm/mmu.c | 5 +++--
arch/loongarch/mm/init.c | 5 +++--
arch/powerpc/mm/mem.c | 5 +++--
arch/riscv/mm/init.c | 5 +++--
arch/s390/mm/init.c | 5 +++--
arch/x86/mm/init_64.c | 5 +++--
include/linux/memory_hotplug.h | 8 +++++---
mm/memory_hotplug.c | 12 ++++++------
mm/memremap.c | 4 ++--
mm/sparse-vmemmap.c | 17 +++++++++--------
10 files changed, 40 insertions(+), 31 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dd85e093ffdb..e5a42b7a0160 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -2024,12 +2024,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
}
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, pgmap);
__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
}
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index 00f3822b6e47..c9c57f08fa2c 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -86,7 +86,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
return ret;
}
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -95,7 +96,7 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
/* With altmap the first mapped page is offset from @start */
if (altmap)
page += vmem_altmap_offset(altmap);
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, pgmap);
}
#endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 648d0c5602ec..4c1afab91996 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -158,12 +158,13 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return rc;
}
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, pgmap);
arch_remove_linear_mapping(start, size);
}
#endif
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index decd7df40fa4..b0092fb842a3 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1717,9 +1717,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *param
return ret;
}
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
- __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+ __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, pgmap);
remove_linear_mapping(start, size);
flush_tlb_all();
}
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 1f72efc2a579..11a689423440 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -276,12 +276,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
return rc;
}
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, pgmap);
vmem_remove_mapping(start, size);
}
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..77b889b71cf3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1288,12 +1288,13 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
remove_pagetable(start, end, true, NULL);
}
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, pgmap);
kernel_physical_mapping_remove(start, start + size);
}
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 815e908c4135..7c9d66729c60 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -135,9 +135,10 @@ static inline bool movable_node_is_enabled(void)
return movable_node_enabled;
}
-extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
+extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap);
extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap);
+ struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
/* reasonably generic interface to expand the physical pages */
extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -307,7 +308,8 @@ extern int sparse_add_section(int nid, unsigned long pfn,
unsigned long nr_pages, struct vmem_altmap *altmap,
struct dev_pagemap *pgmap);
extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap);
+ struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap);
extern struct zone *zone_for_pfn_range(enum mmop online_type,
int nid, struct memory_group *group, unsigned long start_pfn,
unsigned long nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2a943ec57c85..6a9e2dc751d2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -583,7 +583,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
* calling offline_pages().
*/
void __remove_pages(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
{
const unsigned long end_pfn = pfn + nr_pages;
unsigned long cur_nr_pages;
@@ -598,7 +598,7 @@ void __remove_pages(unsigned long pfn, unsigned long nr_pages,
/* Select all remaining pages up to the next section boundary */
cur_nr_pages = min(end_pfn - pfn,
SECTION_ALIGN_UP(pfn + 1) - pfn);
- sparse_remove_section(pfn, cur_nr_pages, altmap);
+ sparse_remove_section(pfn, cur_nr_pages, altmap, pgmap);
}
}
@@ -1425,7 +1425,7 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
remove_memory_block_devices(cur_start, memblock_size);
- arch_remove_memory(cur_start, memblock_size, altmap);
+ arch_remove_memory(cur_start, memblock_size, altmap, NULL);
/* Verify that all vmemmap pages have actually been freed. */
WARN(altmap->alloc, "Altmap not fully unmapped");
@@ -1468,7 +1468,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
ret = create_memory_block_devices(cur_start, memblock_size, nid,
params.altmap, group);
if (ret) {
- arch_remove_memory(cur_start, memblock_size, NULL);
+ arch_remove_memory(cur_start, memblock_size, NULL, NULL);
kfree(params.altmap);
goto out;
}
@@ -1554,7 +1554,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
/* create memory block devices after memory was added */
ret = create_memory_block_devices(start, size, nid, NULL, group);
if (ret) {
- arch_remove_memory(start, size, params.altmap);
+ arch_remove_memory(start, size, params.altmap, NULL);
goto error;
}
}
@@ -2266,7 +2266,7 @@ static int try_remove_memory(u64 start, u64 size)
* No altmaps present, do the removal directly
*/
remove_memory_block_devices(start, size);
- arch_remove_memory(start, size, NULL);
+ arch_remove_memory(start, size, NULL, NULL);
} else {
/* all memblocks in the range have altmaps */
remove_memory_blocks_and_altmaps(start, size);
diff --git a/mm/memremap.c b/mm/memremap.c
index 053842d45cb1..81766d822400 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -97,10 +97,10 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
PHYS_PFN(range_len(range)));
if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
__remove_pages(PHYS_PFN(range->start),
- PHYS_PFN(range_len(range)), NULL);
+ PHYS_PFN(range_len(range)), NULL, pgmap);
} else {
arch_remove_memory(range->start, range_len(range),
- pgmap_altmap(pgmap));
+ pgmap_altmap(pgmap), pgmap);
kasan_remove_zero_shadow(__va(range->start), range_len(range));
}
mem_hotplug_done();
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a7b11248b989..40290fbc1db4 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -665,7 +665,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
}
static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
{
unsigned long start = (unsigned long) pfn_to_page(pfn);
unsigned long end = start + nr_pages * sizeof(struct page);
@@ -674,7 +674,8 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
vmemmap_free(start, end, altmap);
}
-static void free_map_bootmem(struct page *memmap)
+static void free_map_bootmem(struct page *memmap, struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
{
unsigned long start = (unsigned long)memmap;
unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
@@ -746,7 +747,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
* usage map, but still need to free the vmemmap range.
*/
static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
{
struct mem_section *ms = __pfn_to_section(pfn);
bool section_is_early = early_section(ms);
@@ -784,9 +785,9 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
* section_activate() and pfn_valid() .
*/
if (!section_is_early)
- depopulate_section_memmap(pfn, nr_pages, altmap);
+ depopulate_section_memmap(pfn, nr_pages, altmap, pgmap);
else if (memmap)
- free_map_bootmem(memmap);
+ free_map_bootmem(memmap, altmap, pgmap);
if (empty)
ms->section_mem_map = (unsigned long)NULL;
@@ -828,7 +829,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
if (!memmap) {
- section_deactivate(pfn, nr_pages, altmap);
+ section_deactivate(pfn, nr_pages, altmap, pgmap);
return ERR_PTR(-ENOMEM);
}
@@ -889,13 +890,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
}
void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
{
struct mem_section *ms = __pfn_to_section(pfn);
if (WARN_ON_ONCE(!valid_section(ms)))
return;
- section_deactivate(pfn, nr_pages, altmap);
+ section_deactivate(pfn, nr_pages, altmap, pgmap);
}
#endif /* CONFIG_MEMORY_HOTPLUG */
--
2.20.1
* [PATCH v3 3/4] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
2026-04-21 2:20 [PATCH v3 0/4] mm: Fix vmemmap optimization accounting and initialization Muchun Song
2026-04-21 2:20 ` [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
2026-04-21 2:20 ` [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
@ 2026-04-21 2:20 ` Muchun Song
2026-04-21 4:00 ` Oscar Salvador
2026-04-21 2:20 ` [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
3 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-21 2:20 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
Michael Ellerman, Madhavan Srinivasan
Cc: Muchun Song, Mike Rapoport, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
linux-mm, linuxppc-dev, linux-kernel
When vmemmap optimization is enabled for DAX, the nr_memmap_pages
counter in /proc/vmstat is incorrect. The current code always accounts
for the full, non-optimized vmemmap size, but vmemmap optimization
reduces the actual number of vmemmap pages by reusing tail pages, so
the system overcounts vmemmap usage and reports inaccurate per-page
metadata statistics.
Fix this by introducing section_vmemmap_pages(), which returns the exact
vmemmap page count for a given pfn range based on whether optimization
is in effect.
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/sparse-vmemmap.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 40290fbc1db4..05e3e2b94e32 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -652,6 +652,29 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
}
}
+static int __meminit section_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
+ struct vmem_altmap *altmap,
+ struct dev_pagemap *pgmap)
+{
+ unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
+ unsigned long pages_per_compound = 1L << order;
+
+ VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | nr_pages, min(pages_per_compound,
+ PAGES_PER_SECTION)));
+ VM_WARN_ON_ONCE(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
+
+ if (!vmemmap_can_optimize(altmap, pgmap))
+ return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
+
+ if (order < PFN_SECTION_SHIFT)
+ return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
+
+ if (IS_ALIGNED(pfn, pages_per_compound))
+ return VMEMMAP_RESERVE_NR;
+
+ return 0;
+}
+
static struct page * __meminit populate_section_memmap(unsigned long pfn,
unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
struct dev_pagemap *pgmap)
@@ -659,7 +682,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
struct page *page = __populate_section_memmap(pfn, nr_pages, nid, altmap,
pgmap);
- memmap_pages_add(DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE));
+ memmap_pages_add(section_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
return page;
}
@@ -670,7 +693,7 @@ static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
unsigned long start = (unsigned long) pfn_to_page(pfn);
unsigned long end = start + nr_pages * sizeof(struct page);
- memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));
+ memmap_pages_add(-section_vmemmap_pages(pfn, nr_pages, altmap, pgmap));
vmemmap_free(start, end, altmap);
}
@@ -679,9 +702,10 @@ static void free_map_bootmem(struct page *memmap, struct vmem_altmap *altmap,
{
unsigned long start = (unsigned long)memmap;
unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
+ unsigned long pfn = page_to_pfn(memmap);
- memmap_boot_pages_add(-1L * (DIV_ROUND_UP(PAGES_PER_SECTION * sizeof(struct page),
- PAGE_SIZE)));
+ memmap_boot_pages_add(-section_vmemmap_pages(pfn, PAGES_PER_SECTION,
+ altmap, pgmap));
vmemmap_free(start, end, NULL);
}
--
2.20.1
* [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
2026-04-21 2:20 [PATCH v3 0/4] mm: Fix vmemmap optimization accounting and initialization Muchun Song
` (2 preceding siblings ...)
2026-04-21 2:20 ` [PATCH v3 3/4] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
@ 2026-04-21 2:20 ` Muchun Song
2026-04-21 4:15 ` Oscar Salvador
2026-04-21 9:31 ` Muchun Song
3 siblings, 2 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-21 2:20 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Muchun Song, Oscar Salvador,
Michael Ellerman, Madhavan Srinivasan
Cc: Muchun Song, Mike Rapoport, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Nicholas Piggin, Christophe Leroy, aneesh.kumar, joao.m.martins,
linux-mm, linuxppc-dev, linux-kernel
The memmap_init_zone_device() function only initializes the migratetype
of the first pageblock of a compound page. If the compound page size
exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
subsequent pageblocks in the compound page remain uninitialized.
Move the migratetype initialization out of __init_zone_device_page()
and into a new pageblock_migratetype_init_range() helper that iterates
over the entire PFN range being initialized, ensuring that every
pageblock is set up correctly.
Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/mm_init.c | 43 ++++++++++++++++++++++++++++---------------
1 file changed, 28 insertions(+), 15 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..e2d8eae23aa3 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -674,6 +674,19 @@ static inline void fixup_hashdist(void)
static inline void fixup_hashdist(void) {}
#endif /* CONFIG_NUMA */
+static __meminit void pageblock_migratetype_init_range(unsigned long pfn,
+ unsigned long nr_pages,
+ int migratetype)
+{
+ unsigned long end = pfn + nr_pages;
+
+ for (pfn = pageblock_align(pfn); pfn < end; pfn += pageblock_nr_pages) {
+ init_pageblock_migratetype(pfn_to_page(pfn), migratetype, false);
+ if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+ cond_resched();
+ }
+}
+
/*
* Initialize a reserved page unconditionally, finding its zone first.
*/
@@ -1011,21 +1024,6 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
page_folio(page)->pgmap = pgmap;
page->zone_device_data = NULL;
- /*
- * Mark the block movable so that blocks are reserved for
- * movable at startup. This will force kernel allocations
- * to reserve their blocks rather than leaking throughout
- * the address space during boot when many long-lived
- * kernel allocations are made.
- *
- * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
- * because this is done early in section_activate()
- */
- if (pageblock_aligned(pfn)) {
- init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
- cond_resched();
- }
-
/*
* ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
* directly to the driver page allocator which will set the page count
@@ -1122,6 +1120,9 @@ void __ref memmap_init_zone_device(struct zone *zone,
__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+ if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+ cond_resched();
+
if (pfns_per_compound == 1)
continue;
@@ -1129,6 +1130,18 @@ void __ref memmap_init_zone_device(struct zone *zone,
compound_nr_pages(altmap, pgmap));
}
+ /*
+ * Mark the block movable so that blocks are reserved for
+ * movable at startup. This will force kernel allocations
+ * to reserve their blocks rather than leaking throughout
+ * the address space during boot when many long-lived
+ * kernel allocations are made.
+ *
+ * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
+ * because this is done early in section_activate()
+ */
+ pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
+
pr_debug("%s initialised %lu pages in %ums\n", __func__,
nr_pages, jiffies_to_msecs(jiffies - start));
}
--
2.20.1
* Re: [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow
2026-04-21 2:20 ` [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
@ 2026-04-21 3:45 ` Oscar Salvador
0 siblings, 0 replies; 13+ messages in thread
From: Oscar Salvador @ 2026-04-21 3:45 UTC (permalink / raw)
To: Muchun Song
Cc: Andrew Morton, David Hildenbrand, Muchun Song, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
On Tue, Apr 21, 2026 at 10:20:41AM +0800, Muchun Song wrote:
> In section_activate(), if populate_section_memmap() fails, the error
> handling path calls section_deactivate() to roll back the state. This
> causes a vmemmap accounting imbalance.
>
> Since commit c3576889d87b ("mm: fix accounting of memmap pages"),
> memmap pages are accounted for only after populate_section_memmap()
> succeeds. However, the failure path unconditionally calls
> section_deactivate(), which decreases the vmemmap count. Consequently,
> a failure in populate_section_memmap() leads to an accounting underflow,
> incorrectly reducing the system's tracked vmemmap usage.
>
> Fix this more thoroughly by moving all accounting calls into the lower
> level functions that actually perform the vmemmap allocation and freeing:
>
> - populate_section_memmap() accounts for newly allocated vmemmap pages
> - depopulate_section_memmap() unaccounts when vmemmap is freed
>
> This ensures proper accounting in all code paths, including error
> handling and early section cases.
>
> Fixes: c3576889d87b ("mm: fix accounting of memmap pages")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
2026-04-21 2:20 ` [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
@ 2026-04-21 3:55 ` Oscar Salvador
2026-04-21 4:01 ` Muchun Song
0 siblings, 1 reply; 13+ messages in thread
From: Oscar Salvador @ 2026-04-21 3:55 UTC (permalink / raw)
To: Muchun Song
Cc: Andrew Morton, David Hildenbrand, Muchun Song, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
On Tue, Apr 21, 2026 at 10:20:42AM +0800, Muchun Song wrote:
> Currently, the memory hot-remove call chain -- arch_remove_memory(),
> __remove_pages(), sparse_remove_section() and section_deactivate() --
> does not carry the struct dev_pagemap pointer. This prevents the lower
> levels from knowing whether the section was originally populated with
> vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).
>
> Without this information, we cannot call vmemmap_can_optimize() to
> determine if the vmemmap pages were optimized. As a result, the vmemmap
> page accounting during teardown will mistakenly assume a non-optimized
> allocation, leading to incorrect memmap statistics.
>
> To lay the groundwork for fixing the vmemmap page accounting, we need
> to pass the @pgmap pointer down to the deactivation location. Plumb the
> @pgmap argument through the APIs of arch_remove_memory(), __remove_pages()
> and sparse_remove_section(), mirroring the corresponding *_activate()
> paths.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
The change looks good to me, but I was wondering whether we should instead
pass an mhp struct to low-level functions like arch_remove_memory() and
__remove_pages(), and have __remove_pages() pass the right pieces further
down the road.
That way it would more closely mimic what we do in the hot-add path.
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v3 3/4] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization
2026-04-21 2:20 ` [PATCH v3 3/4] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
@ 2026-04-21 4:00 ` Oscar Salvador
0 siblings, 0 replies; 13+ messages in thread
From: Oscar Salvador @ 2026-04-21 4:00 UTC (permalink / raw)
To: Muchun Song
Cc: Andrew Morton, David Hildenbrand, Muchun Song, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
On Tue, Apr 21, 2026 at 10:20:43AM +0800, Muchun Song wrote:
> When vmemmap optimization is enabled for DAX, the nr_memmap_pages
> counter in /proc/vmstat is incorrect. The current code always accounts
> for the full, non-optimized vmemmap size, but vmemmap optimization
> reduces the actual number of vmemmap pages by reusing tail pages. This
> causes the system to overcount vmemmap usage, leading to inaccurate
> page statistics in /proc/vmstat.
>
> Fix this by introducing section_vmemmap_pages(), which returns the exact
> vmemmap page count for a given pfn range based on whether optimization
> is in effect.
>
> Fixes: 15995a352474 ("mm: report per-page metadata information")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths
2026-04-21 3:55 ` Oscar Salvador
@ 2026-04-21 4:01 ` Muchun Song
0 siblings, 0 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-21 4:01 UTC (permalink / raw)
To: Oscar Salvador
Cc: Muchun Song, Andrew Morton, David Hildenbrand, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
> On Apr 21, 2026, at 11:55, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Apr 21, 2026 at 10:20:42AM +0800, Muchun Song wrote:
>> Currently, the memory hot-remove call chain -- arch_remove_memory(),
>> __remove_pages(), sparse_remove_section() and section_deactivate() --
>> does not carry the struct dev_pagemap pointer. This prevents the lower
>> levels from knowing whether the section was originally populated with
>> vmemmap optimizations (e.g., DAX with vmemmap optimization enabled).
>>
>> Without this information, we cannot call vmemmap_can_optimize() to
>> determine if the vmemmap pages were optimized. As a result, the vmemmap
>> page accounting during teardown will mistakenly assume a non-optimized
>> allocation, leading to incorrect memmap statistics.
>>
>> To lay the groundwork for fixing the vmemmap page accounting, we need
>> to pass the @pgmap pointer down to the deactivation location. Plumb the
>> @pgmap argument through the APIs of arch_remove_memory(), __remove_pages()
>> and sparse_remove_section(), mirroring the corresponding *_activate()
>> paths.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
Thanks.
>
> The change looks good to me, but I was wondering whether we should instead
> pass an mhp struct to low-level functions like arch_remove_memory() and
> __remove_pages(), and have __remove_pages() pass the right pieces further
> down the road.
> That way it would more closely mimic what we do in the hot-add path.
Passing the pgmap parameter is a temporary fix, as I have another
patchset coming up to remove pgmap entirely [1].
[1] https://lore.kernel.org/linux-mm/20260405125240.2558577-46-songmuchun@bytedance.com/
Thanks,
Muchun.
>
>
> --
> Oscar Salvador
> SUSE Labs
* Re: [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
2026-04-21 2:20 ` [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
@ 2026-04-21 4:15 ` Oscar Salvador
2026-04-21 6:54 ` Muchun Song
2026-04-21 9:31 ` Muchun Song
1 sibling, 1 reply; 13+ messages in thread
From: Oscar Salvador @ 2026-04-21 4:15 UTC (permalink / raw)
To: Muchun Song
Cc: Andrew Morton, David Hildenbrand, Muchun Song, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
On Tue, Apr 21, 2026 at 10:20:44AM +0800, Muchun Song wrote:
> The memmap_init_zone_device() function only initializes the migratetype
> of the first pageblock of a compound page. If the compound page size
> exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
> subsequent pageblocks in the compound page remain uninitialized.
>
> Move the migratetype initialization out of __init_zone_device_page()
> and into a separate pageblock_migratetype_init_range() function. This
> iterates over the entire PFN range of the memory, ensuring that all
> pageblocks are correctly initialized.
>
> Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Would not the call to __init_zone_device_page() from
memmap_init_compound() take care of the subsequent pageblocks?
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
2026-04-21 4:15 ` Oscar Salvador
@ 2026-04-21 6:54 ` Muchun Song
2026-04-21 7:29 ` Oscar Salvador
0 siblings, 1 reply; 13+ messages in thread
From: Muchun Song @ 2026-04-21 6:54 UTC (permalink / raw)
To: Oscar Salvador
Cc: Muchun Song, Andrew Morton, David Hildenbrand, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
> On Apr 21, 2026, at 12:15, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Apr 21, 2026 at 10:20:44AM +0800, Muchun Song wrote:
>> The memmap_init_zone_device() function only initializes the migratetype
>> of the first pageblock of a compound page. If the compound page size
>> exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
>> subsequent pageblocks in the compound page remain uninitialized.
>>
>> Move the migratetype initialization out of __init_zone_device_page()
>> and into a separate pageblock_migratetype_init_range() function. This
>> iterates over the entire PFN range of the memory, ensuring that all
>> pageblocks are correctly initialized.
>>
>> Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> Would not the call to __init_zone_device_page() from
> memmap_init_compound() take care of the subsequent pageblocks?
>
No, it won't handle them automatically, as the page count from
compound_nr_pages doesn't cover the following pageblocks.
Thanks.
Muchun
>
> --
> Oscar Salvador
> SUSE Labs
* Re: [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
2026-04-21 6:54 ` Muchun Song
@ 2026-04-21 7:29 ` Oscar Salvador
0 siblings, 0 replies; 13+ messages in thread
From: Oscar Salvador @ 2026-04-21 7:29 UTC (permalink / raw)
To: Muchun Song
Cc: Muchun Song, Andrew Morton, David Hildenbrand, Michael Ellerman,
Madhavan Srinivasan, Mike Rapoport, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Suren Baghdasaryan,
Michal Hocko, Nicholas Piggin, Christophe Leroy, aneesh.kumar,
joao.m.martins, linux-mm, linuxppc-dev, linux-kernel
On Tue, Apr 21, 2026 at 02:54:35PM +0800, Muchun Song wrote:
>
>
> > On Apr 21, 2026, at 12:15, Oscar Salvador <osalvador@suse.de> wrote:
> >
> > On Tue, Apr 21, 2026 at 10:20:44AM +0800, Muchun Song wrote:
> >> The memmap_init_zone_device() function only initializes the migratetype
> >> of the first pageblock of a compound page. If the compound page size
> >> exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
> >> subsequent pageblocks in the compound page remain uninitialized.
> >>
> >> Move the migratetype initialization out of __init_zone_device_page()
> >> and into a separate pageblock_migratetype_init_range() function. This
> >> iterates over the entire PFN range of the memory, ensuring that all
> >> pageblocks are correctly initialized.
> >>
> >> Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
> > Would not the call to __init_zone_device_page() from
> > memmap_init_compound() take care of the subsequent pageblocks?
> >
>
> No, it won't handle them automatically, as the page count from
> compound_nr_pages doesn't cover the following pageblocks.
Ok, I see.
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages
2026-04-21 2:20 ` [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
2026-04-21 4:15 ` Oscar Salvador
@ 2026-04-21 9:31 ` Muchun Song
1 sibling, 0 replies; 13+ messages in thread
From: Muchun Song @ 2026-04-21 9:31 UTC (permalink / raw)
To: Muchun Song
Cc: Andrew Morton, David Hildenbrand, Oscar Salvador,
Michael Ellerman, Madhavan Srinivasan, Mike Rapoport,
Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Nicholas Piggin,
Christophe Leroy, aneesh.kumar, joao.m.martins, linux-mm,
linuxppc-dev, linux-kernel
> On Apr 21, 2026, at 10:20, Muchun Song <songmuchun@bytedance.com> wrote:
>
> The memmap_init_zone_device() function only initializes the migratetype
> of the first pageblock of a compound page. If the compound page size
> exceeds pageblock_nr_pages (e.g., 1GB hugepages with 2MB pageblocks),
> subsequent pageblocks in the compound page remain uninitialized.
>
> Move the migratetype initialization out of __init_zone_device_page()
> and into a separate pageblock_migratetype_init_range() function. This
> iterates over the entire PFN range of the memory, ensuring that all
> pageblocks are correctly initialized.
>
> Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> mm/mm_init.c | 43 ++++++++++++++++++++++++++++---------------
> 1 file changed, 28 insertions(+), 15 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index f9f8e1af921c..e2d8eae23aa3 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -674,6 +674,19 @@ static inline void fixup_hashdist(void)
> static inline void fixup_hashdist(void) {}
> #endif /* CONFIG_NUMA */
>
> +static __meminit void pageblock_migratetype_init_range(unsigned long pfn,
> + unsigned long nr_pages,
> + int migratetype)
> +{
> + unsigned long end = pfn + nr_pages;
> +
> + for (pfn = pageblock_align(pfn); pfn < end; pfn += pageblock_nr_pages) {
> + init_pageblock_migratetype(pfn_to_page(pfn), migratetype, false);
> + if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
> + cond_resched();
> + }
> +}
I found a valid comment from an AI review:
This triggers a -Wunused-function warning when CONFIG_ZONE_DEVICE is
disabled.
I'll fix it in the next version.
Thanks.
end of thread [~2026-04-21 9:32 UTC]
Thread overview: 13+ messages
2026-04-21 2:20 [PATCH v3 0/4] mm: Fix vmemmap optimization accounting and initialization Muchun Song
2026-04-21 2:20 ` [PATCH v3 1/4] mm/sparse-vmemmap: Fix vmemmap accounting underflow Muchun Song
2026-04-21 3:45 ` Oscar Salvador
2026-04-21 2:20 ` [PATCH v3 2/4] mm/sparse-vmemmap: Pass @pgmap argument to memory deactivation paths Muchun Song
2026-04-21 3:55 ` Oscar Salvador
2026-04-21 4:01 ` Muchun Song
2026-04-21 2:20 ` [PATCH v3 3/4] mm/sparse-vmemmap: Fix DAX vmemmap accounting with optimization Muchun Song
2026-04-21 4:00 ` Oscar Salvador
2026-04-21 2:20 ` [PATCH v3 4/4] mm/mm_init: Fix pageblock migratetype for ZONE_DEVICE compound pages Muchun Song
2026-04-21 4:15 ` Oscar Salvador
2026-04-21 6:54 ` Muchun Song
2026-04-21 7:29 ` Oscar Salvador
2026-04-21 9:31 ` Muchun Song