* [PATCH v9 0/2] Optimize zone->contiguous update
@ 2026-01-30 16:37 Tianyou Li
2026-01-30 16:37 ` [PATCH v9 1/2] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages() Tianyou Li
2026-01-30 16:37 ` [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
0 siblings, 2 replies; 13+ messages in thread
From: Tianyou Li @ 2026-01-30 16:37 UTC (permalink / raw)
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko
Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo,
Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
This series contains 2 patches. The first one encapsulates
mhp_init_memmap_on_memory() and online_pages() into
online_memory_block_pages(), and mhp_deinit_memmap_on_memory() and
offline_pages() into offline_memory_block_pages(). It then moves most
of memory_block_online() into the new function
mhp_block_online(struct memory_block *block) and, correspondingly,
memory_block_offline() into mhp_block_offline(struct memory_block *block).
The second one adds a fast path for updating zone->contiguous.
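
In short, the growing-side fast path classifies the newly added pfn
range before deciding whether a full pageblock scan is still needed.
A simplified, standalone sketch of that classification (it mirrors
zone_contig_state_after_growing() in patch 2/2; the struct below models
only the zone fields the check reads and is not the kernel's struct zone):

#include <stdbool.h>

enum zone_contig_state {
	ZONE_CONTIG_YES,	/* known contiguous, no scan needed     */
	ZONE_CONTIG_NO,		/* known not contiguous, no scan needed */
	ZONE_CONTIG_MAYBE,	/* fall back to the full pageblock scan */
};

struct zone_span {
	unsigned long start_pfn;	/* zone_start_pfn */
	unsigned long spanned_pages;
	unsigned long present_pages;
	bool contiguous;		/* state before the range is added */
};

static enum zone_contig_state
contig_state_after_growing(const struct zone_span *zone,
			   unsigned long start_pfn, unsigned long nr_pages)
{
	unsigned long zone_end = zone->start_pfn + zone->spanned_pages;
	unsigned long end_pfn = start_pfn + nr_pages;

	if (!zone->spanned_pages)		/* previously empty zone */
		return ZONE_CONTIG_YES;

	/* New range does not touch the old span: a hole is created. */
	if (end_pfn < zone->start_pfn || start_pfn > zone_end)
		return ZONE_CONTIG_NO;

	/* Range is appended/prepended: the old state is preserved. */
	if (end_pfn == zone->start_pfn || start_pfn == zone_end)
		return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO;

	/* Too few pages to possibly fill the existing holes. */
	if (nr_pages < zone->spanned_pages - zone->present_pages)
		return ZONE_CONTIG_NO;

	return ZONE_CONTIG_MAYBE;		/* only here do we rescan */
}

Only the ZONE_CONTIG_MAYBE case still walks the zone's pageblocks, which
is what keeps plugging large amounts of memory cheap in the common
append case.
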
Changes History
===============
v9 changes:
1. Separate the bug fix and optimization into two patches.
2. This patchset depends on https://lore.kernel.org/linux-mm/20260130160938.2671462-1-tianyou.li@intel.com/.
3. Refactor the code to move most of memory_block_online() to
mhp_block_online() and correspondingly memory_block_offline() to
mhp_block_offline().
v8 changes:
1. Rebased to 6.19-rc6
2. Add online_memory_block_pages() and offline_memory_block_pages()
v7 changes:
1. Rebased to 6.19-rc1
2. Reorder the patches so that the fix will be the first in the series.
v6 changes:
1. Improve coding style.
2. Add comments.
v5 changes:
1. Improve coding style.
1. Fix an issue in which zone->contiguous was always false when adding
   new memory with the fast path optimization in place.
v4 changes:
1. Improve coding style.
2. Add a fast path for the zone contiguity check in memory unplug cases,
   and update test results.
3. Refactor set_zone_contiguous: the new set_zone_contiguous updates
zone contiguity based on the fast path results.
v3 changes:
Add zone contiguity check for empty zones.
v2 changes:
Add a check_zone_contiguous_fast() function to check zone contiguity for
new memory PFN ranges.
Tianyou Li (2):
mm/memory hotplug/unplug: Add online_memory_block_pages() and
offline_memory_block_pages()
mm/memory hotplug/unplug: Optimize zone->contiguous update when
changes pfn range
drivers/base/memory.c | 115 +-----------------
include/linux/memory_hotplug.h | 13 +--
include/linux/mm.h | 6 +
mm/internal.h | 8 +-
mm/memory_hotplug.c | 208 ++++++++++++++++++++++++++++++---
mm/mm_init.c | 15 ++-
6 files changed, 228 insertions(+), 137 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH v9 1/2] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages() 2026-01-30 16:37 [PATCH v9 0/2] Optimize zone->contiguous update Tianyou Li @ 2026-01-30 16:37 ` Tianyou Li 2026-01-30 16:37 ` [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li 1 sibling, 0 replies; 13+ messages in thread From: Tianyou Li @ 2026-01-30 16:37 UTC (permalink / raw) To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel Encapsulate the mhp_init_memmap_on_memory() and online_pages() into online_memory_block_pages(). Thus we can further optimize the set_zone_contiguous() to check the whole memory block range, instead of check the zone contiguous in separate range. Correspondingly, encapsulate the mhp_deinit_memmap_on_memory() and offline_pages() into offline_memory_block_pages(). Furthermore, move most of memory_block_online() to the new function mhp_block_online(struct memory_block *block) and correspondingly memory_block_offline() to mhp_block_offline(struct memory_block *block). Tested-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Yuan Liu <yuan1.liu@intel.com> Signed-off-by: Tianyou Li <tianyou.li@intel.com> --- drivers/base/memory.c | 115 +--------------------------- include/linux/memory_hotplug.h | 13 +--- include/linux/mm.h | 6 ++ mm/memory_hotplug.c | 132 ++++++++++++++++++++++++++++++++- 4 files changed, 141 insertions(+), 125 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 751f248ca4a8..40f014c5dbb1 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -209,115 +209,6 @@ int memory_notify(enum memory_block_state state, void *v) return blocking_notifier_call_chain(&memory_chain, state, v); } -#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG) -static unsigned long memblk_nr_poison(struct memory_block *mem); -#else -static inline unsigned long memblk_nr_poison(struct memory_block *mem) -{ - return 0; -} -#endif - -/* - * Must acquire mem_hotplug_lock in write mode. - */ -static int memory_block_online(struct memory_block *mem) -{ - unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); - unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; - unsigned long nr_vmemmap_pages = 0; - struct zone *zone; - int ret; - - if (memblk_nr_poison(mem)) - return -EHWPOISON; - - zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group, - start_pfn, nr_pages); - - /* - * Although vmemmap pages have a different lifecycle than the pages - * they describe (they remain until the memory is unplugged), doing - * their initialization and accounting at memory onlining/offlining - * stage helps to keep accounting easier to follow - e.g vmemmaps - * belong to the same zone as the memory they backed. - */ - if (mem->altmap) - nr_vmemmap_pages = mem->altmap->free; - - mem_hotplug_begin(); - if (nr_vmemmap_pages) { - ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone); - if (ret) - goto out; - } - - ret = online_pages(start_pfn + nr_vmemmap_pages, - nr_pages - nr_vmemmap_pages, zone, mem->group); - if (ret) { - if (nr_vmemmap_pages) - mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); - goto out; - } - - /* - * Account once onlining succeeded. If the zone was unpopulated, it is - * now already properly populated. 
- */ - if (nr_vmemmap_pages) - adjust_present_page_count(pfn_to_page(start_pfn), mem->group, - nr_vmemmap_pages); - - mem->zone = zone; -out: - mem_hotplug_done(); - return ret; -} - -/* - * Must acquire mem_hotplug_lock in write mode. - */ -static int memory_block_offline(struct memory_block *mem) -{ - unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); - unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; - unsigned long nr_vmemmap_pages = 0; - int ret; - - if (!mem->zone) - return -EINVAL; - - /* - * Unaccount before offlining, such that unpopulated zone and kthreads - * can properly be torn down in offline_pages(). - */ - if (mem->altmap) - nr_vmemmap_pages = mem->altmap->free; - - mem_hotplug_begin(); - if (nr_vmemmap_pages) - adjust_present_page_count(pfn_to_page(start_pfn), mem->group, - -nr_vmemmap_pages); - - ret = offline_pages(start_pfn + nr_vmemmap_pages, - nr_pages - nr_vmemmap_pages, mem->zone, mem->group); - if (ret) { - /* offline_pages() failed. Account back. */ - if (nr_vmemmap_pages) - adjust_present_page_count(pfn_to_page(start_pfn), - mem->group, nr_vmemmap_pages); - goto out; - } - - if (nr_vmemmap_pages) - mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); - - mem->zone = NULL; -out: - mem_hotplug_done(); - return ret; -} - /* * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is * OK to have direct references to sparsemem variables in here. @@ -329,10 +220,10 @@ memory_block_action(struct memory_block *mem, unsigned long action) switch (action) { case MEM_ONLINE: - ret = memory_block_online(mem); + ret = mhp_block_online(mem); break; case MEM_OFFLINE: - ret = memory_block_offline(mem); + ret = mhp_block_offline(mem); break; default: WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: " @@ -1243,7 +1134,7 @@ void memblk_nr_poison_sub(unsigned long pfn, long i) atomic_long_sub(i, &mem->nr_hwpoison); } -static unsigned long memblk_nr_poison(struct memory_block *mem) +unsigned long memblk_nr_poison(struct memory_block *mem) { return atomic_long_read(&mem->nr_hwpoison); } diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index f2f16cdd73ee..8783a11da464 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -12,6 +12,7 @@ struct zone; struct pglist_data; struct mem_section; struct memory_group; +struct memory_block; struct resource; struct vmem_altmap; struct dev_pagemap; @@ -106,11 +107,7 @@ extern void adjust_present_page_count(struct page *page, struct memory_group *group, long nr_pages); /* VM interface that may be used by firmware interface */ -extern int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, - struct zone *zone); -extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages); -extern int online_pages(unsigned long pfn, unsigned long nr_pages, - struct zone *zone, struct memory_group *group); +extern int mhp_block_online(struct memory_block *block); extern unsigned long __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn); @@ -261,8 +258,7 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {} #ifdef CONFIG_MEMORY_HOTREMOVE extern void try_offline_node(int nid); -extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct zone *zone, struct memory_group *group); +extern int mhp_block_offline(struct memory_block *block); extern int remove_memory(u64 start, u64 size); extern void __remove_memory(u64 start, u64 size); extern int offline_and_remove_memory(u64 
start, u64 size); @@ -270,8 +266,7 @@ extern int offline_and_remove_memory(u64 start, u64 size); #else static inline void try_offline_node(int nid) {} -static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct zone *zone, struct memory_group *group) +static inline int mhp_block_offline(struct memory_block *block) { return -EINVAL; } diff --git a/include/linux/mm.h b/include/linux/mm.h index 6f959d8ca4b4..967605d95131 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4377,6 +4377,7 @@ static inline void num_poisoned_pages_sub(unsigned long pfn, long i) #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG) extern void memblk_nr_poison_inc(unsigned long pfn); extern void memblk_nr_poison_sub(unsigned long pfn, long i); +extern unsigned long memblk_nr_poison(struct memory_block *mem); #else static inline void memblk_nr_poison_inc(unsigned long pfn) { @@ -4385,6 +4386,11 @@ static inline void memblk_nr_poison_inc(unsigned long pfn) static inline void memblk_nr_poison_sub(unsigned long pfn, long i) { } + +static inline unsigned long memblk_nr_poison(struct memory_block *mem) +{ + return 0; +} #endif #ifndef arch_memory_failure diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index c8f492b5daf0..62d6bc8ea2dd 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1085,7 +1085,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, group->present_kernel_pages += nr_pages; } -int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, +static int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, struct zone *zone) { unsigned long end_pfn = pfn + nr_pages; @@ -1116,7 +1116,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, return ret; } -void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages) +static void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages) { unsigned long end_pfn = pfn + nr_pages; @@ -1139,7 +1139,7 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages) /* * Must be called with mem_hotplug_lock in write mode. */ -int online_pages(unsigned long pfn, unsigned long nr_pages, +static int online_pages(unsigned long pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group) { struct memory_notify mem_arg = { @@ -1254,6 +1254,74 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, return ret; } +static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_pages, + unsigned long nr_vmemmap_pages, struct zone *zone, + struct memory_group *group) +{ + int ret; + + if (nr_vmemmap_pages) { + ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone); + if (ret) + return ret; + } + + ret = online_pages(start_pfn + nr_vmemmap_pages, + nr_pages - nr_vmemmap_pages, zone, group); + if (ret) { + if (nr_vmemmap_pages) + mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); + return ret; + } + + /* + * Account once onlining succeeded. If the zone was unpopulated, it is + * now already properly populated. + */ + if (nr_vmemmap_pages) + adjust_present_page_count(pfn_to_page(start_pfn), group, + nr_vmemmap_pages); + + return ret; +} + +/* + * Must acquire mem_hotplug_lock in write mode. 
+ */ +int mhp_block_online(struct memory_block *mem) +{ + unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); + unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; + unsigned long nr_vmemmap_pages = 0; + struct zone *zone; + int ret; + + if (memblk_nr_poison(mem)) + return -EHWPOISON; + + zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group, + start_pfn, nr_pages); + + /* + * Although vmemmap pages have a different lifecycle than the pages + * they describe (they remain until the memory is unplugged), doing + * their initialization and accounting at memory onlining/offlining + * stage helps to keep accounting easier to follow - e.g vmemmaps + * belong to the same zone as the memory they backed. + */ + if (mem->altmap) + nr_vmemmap_pages = mem->altmap->free; + + mem_hotplug_begin(); + ret = online_memory_block_pages(start_pfn, nr_pages, nr_vmemmap_pages, + zone, mem->group); + if (!ret) + mem->zone = zone; + mem_hotplug_done(); + + return ret; +} + /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ static pg_data_t *hotadd_init_pgdat(int nid) { @@ -1896,7 +1964,7 @@ static int count_system_ram_pages_cb(unsigned long start_pfn, /* * Must be called with mem_hotplug_lock in write mode. */ -int offline_pages(unsigned long start_pfn, unsigned long nr_pages, +static int offline_pages(unsigned long start_pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group) { unsigned long pfn, managed_pages, system_ram_pages = 0; @@ -2101,6 +2169,62 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, return ret; } +static int offline_memory_block_pages(unsigned long start_pfn, + unsigned long nr_pages, unsigned long nr_vmemmap_pages, + struct zone *zone, struct memory_group *group) +{ + int ret; + + if (nr_vmemmap_pages) + adjust_present_page_count(pfn_to_page(start_pfn), group, + -nr_vmemmap_pages); + + ret = offline_pages(start_pfn + nr_vmemmap_pages, + nr_pages - nr_vmemmap_pages, zone, group); + if (ret) { + /* offline_pages() failed. Account back. */ + if (nr_vmemmap_pages) + adjust_present_page_count(pfn_to_page(start_pfn), + group, nr_vmemmap_pages); + return ret; + } + + if (nr_vmemmap_pages) + mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); + + return ret; +} + +/* + * Must acquire mem_hotplug_lock in write mode. + */ +int mhp_block_offline(struct memory_block *mem) +{ + unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); + unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; + unsigned long nr_vmemmap_pages = 0; + int ret; + + if (!mem->zone) + return -EINVAL; + + /* + * Unaccount before offlining, such that unpopulated zone and kthreads + * can properly be torn down in offline_pages(). + */ + if (mem->altmap) + nr_vmemmap_pages = mem->altmap->free; + + mem_hotplug_begin(); + ret = offline_memory_block_pages(start_pfn, nr_pages, nr_vmemmap_pages, + mem->zone, mem->group); + if (!ret) + mem->zone = NULL; + mem_hotplug_done(); + + return ret; +} + static int check_memblock_offlined_cb(struct memory_block *mem, void *arg) { int *nid = arg; -- 2.47.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
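
For readers who do not want to parse the whole diff above, the
error-handling order that the new online_memory_block_pages() helper
factors out of memory_block_online() reduces to the following stubbed,
userspace-compilable outline (the *_stub functions are placeholders for
the kernel calls, not real APIs):

#include <stdio.h>

static int  mhp_init_memmap_stub(unsigned long pfn, unsigned long n)   { return 0; }
static void mhp_deinit_memmap_stub(unsigned long pfn, unsigned long n) { }
static int  online_pages_stub(unsigned long pfn, unsigned long n)      { return 0; }
static void account_present_stub(unsigned long pfn, long n)            { }

static int online_memory_block_pages(unsigned long start_pfn,
				     unsigned long nr_pages,
				     unsigned long nr_vmemmap_pages)
{
	int ret;

	/* 1) Initialize the vmemmap pages first, if the block carries any. */
	if (nr_vmemmap_pages) {
		ret = mhp_init_memmap_stub(start_pfn, nr_vmemmap_pages);
		if (ret)
			return ret;
	}

	/* 2) Online the remaining pages of the block. */
	ret = online_pages_stub(start_pfn + nr_vmemmap_pages,
				nr_pages - nr_vmemmap_pages);
	if (ret) {
		/* 3) Roll back the vmemmap initialization on failure. */
		if (nr_vmemmap_pages)
			mhp_deinit_memmap_stub(start_pfn, nr_vmemmap_pages);
		return ret;
	}

	/* 4) Account the vmemmap pages only once onlining succeeded. */
	if (nr_vmemmap_pages)
		account_present_stub(start_pfn, nr_vmemmap_pages);

	return 0;
}

int main(void)
{
	/* One 128 MiB block (32768 4-KiB pages), e.g. 512 of them vmemmap. */
	printf("ret=%d\n", online_memory_block_pages(0x100000, 32768, 512));
	return 0;
}

Patch 2/2 then hooks the contiguity bookkeeping into exactly this
wrapper, so the whole memory block is evaluated once instead of per
sub-range.
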
* [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-01-30 16:37 [PATCH v9 0/2] Optimize zone->contiguous update Tianyou Li 2026-01-30 16:37 ` [PATCH v9 1/2] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages() Tianyou Li @ 2026-01-30 16:37 ` Tianyou Li 2026-02-07 11:00 ` David Hildenbrand (Arm) 1 sibling, 1 reply; 13+ messages in thread From: Tianyou Li @ 2026-01-30 16:37 UTC (permalink / raw) To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will update the zone->contiguous by checking the new zone's pfn range from the beginning to the end, regardless the previous state of the old zone. When the zone's pfn range is large, the cost of traversing the pfn range to update the zone->contiguous could be significant. Add fast paths to quickly detect cases where zone is definitely not contiguous without scanning the new zone. The cases are: when the new range did not overlap with previous range, the contiguous should be false; if the new range adjacent with the previous range, just need to check the new range; if the new added pages could not fill the hole of previous zone, the contiguous should be false. The following test cases of memory hotplug for a VM [1], tested in the environment [2], show that this optimization can significantly reduce the memory hotplug time [3]. +----------------+------+---------------+--------------+----------------+ | | Size | Time (before) | Time (after) | Time Reduction | | +------+---------------+--------------+----------------+ | Plug Memory | 256G | 10s | 2s | 80% | | +------+---------------+--------------+----------------+ | | 512G | 33s | 6s | 81% | +----------------+------+---------------+--------------+----------------+ +----------------+------+---------------+--------------+----------------+ | | Size | Time (before) | Time (after) | Time Reduction | | +------+---------------+--------------+----------------+ | Unplug Memory | 256G | 10s | 2s | 80% | | +------+---------------+--------------+----------------+ | | 512G | 34s | 6s | 82% | +----------------+------+---------------+--------------+----------------+ [1] Qemu commands to hotplug 256G/512G memory for a VM: object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 qom-set vmem1 requested-size 256G/512G (Plug Memory) qom-set vmem1 requested-size 0G (Unplug Memory) [2] Hardware : Intel Icelake server Guest Kernel : v6.18-rc2 Qemu : v9.0.0 Launch VM : qemu-system-x86_64 -accel kvm -cpu host \ -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ -drive file=./seed.img,format=raw,if=virtio \ -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ -m 2G,slots=10,maxmem=2052472M \ -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ -nographic -machine q35 \ -nic user,hostfwd=tcp::3000-:22 Guest kernel auto-onlines newly added memory blocks: echo online > /sys/devices/system/memory/auto_online_blocks [3] The time from typing the QEMU commands in [1] to when the output of 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged memory is recognized. 
Reported-by: Nanhai Zou <nanhai.zou@intel.com> Reported-by: Chen Zhang <zhangchen.kidd@jd.com> Tested-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Yu C Chen <yu.c.chen@intel.com> Reviewed-by: Pan Deng <pan.deng@intel.com> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> Reviewed-by: Yuan Liu <yuan1.liu@intel.com> Signed-off-by: Tianyou Li <tianyou.li@intel.com> --- mm/internal.h | 8 ++++- mm/memory_hotplug.c | 80 ++++++++++++++++++++++++++++++++++++++------- mm/mm_init.c | 15 +++++++-- 3 files changed, 89 insertions(+), 14 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index e430da900430..828aed5c2fef 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); } -void set_zone_contiguous(struct zone *zone); +enum zone_contig_state { + ZONE_CONTIG_YES, + ZONE_CONTIG_NO, + ZONE_CONTIG_MAYBE, +}; + +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state); bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, unsigned long nr_pages); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 62d6bc8ea2dd..e7a97c9c35be 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -544,6 +544,25 @@ static void update_pgdat_span(struct pglist_data *pgdat) pgdat->node_spanned_pages = node_end_pfn - node_start_pfn; } +static enum zone_contig_state zone_contig_state_after_shrinking(struct zone *zone, + unsigned long start_pfn, unsigned long nr_pages) +{ + const unsigned long end_pfn = start_pfn + nr_pages; + + /* + * If we cut a hole into the zone span, then the zone is + * certainly not contiguous. + */ + if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone)) + return ZONE_CONTIG_NO; + + /* Removing from the start/end of the zone will not change anything. */ + if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone)) + return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE; + + return ZONE_CONTIG_MAYBE; +} + void remove_pfn_range_from_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) @@ -551,6 +570,7 @@ void remove_pfn_range_from_zone(struct zone *zone, const unsigned long end_pfn = start_pfn + nr_pages; struct pglist_data *pgdat = zone->zone_pgdat; unsigned long pfn, cur_nr_pages; + enum zone_contig_state new_contiguous_state; /* Poison struct pages because they are now uninitialized again. */ for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { @@ -571,12 +591,14 @@ void remove_pfn_range_from_zone(struct zone *zone, if (zone_is_zone_device(zone)) return; + new_contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn, + nr_pages); clear_zone_contiguous(zone); shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); update_pgdat_span(pgdat); - set_zone_contiguous(zone); + set_zone_contiguous(zone, new_contiguous_state); } /** @@ -736,6 +758,32 @@ static inline void section_taint_zone_device(unsigned long pfn) } #endif +static enum zone_contig_state zone_contig_state_after_growing(struct zone *zone, + unsigned long start_pfn, unsigned long nr_pages) +{ + const unsigned long end_pfn = start_pfn + nr_pages; + + if (zone_is_empty(zone)) + return ZONE_CONTIG_YES; + + /* + * If the moved pfn range does not intersect with the original zone span + * the zone is surely not contiguous. 
+ */ + if (end_pfn < zone->zone_start_pfn || start_pfn > zone_end_pfn(zone)) + return ZONE_CONTIG_NO; + + /* Adding to the start/end of the zone will not change anything. */ + if (end_pfn == zone->zone_start_pfn || start_pfn == zone_end_pfn(zone)) + return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO; + + /* If we cannot fill the hole, the zone stays not contiguous. */ + if (nr_pages < (zone->spanned_pages - zone->present_pages)) + return ZONE_CONTIG_NO; + + return ZONE_CONTIG_MAYBE; +} + /* * Associate the pfn range with the given zone, initializing the memmaps * and resizing the pgdat/zone data to span the added pages. After this @@ -1165,7 +1213,6 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages, !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION))) return -EINVAL; - /* associate pfn range with the zone */ move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_MOVABLE, true); @@ -1203,13 +1250,6 @@ static int online_pages(unsigned long pfn, unsigned long nr_pages, } online_pages_range(pfn, nr_pages); - - /* - * Now that the ranges are indicated as online, check whether the whole - * zone is contiguous. - */ - set_zone_contiguous(zone); - adjust_present_page_count(pfn_to_page(pfn), group, nr_pages); if (node_arg.nid >= 0) @@ -1258,12 +1298,21 @@ static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_p unsigned long nr_vmemmap_pages, struct zone *zone, struct memory_group *group) { + const bool contiguous = zone->contiguous; + enum zone_contig_state new_contiguous_state; int ret; + /* + * Calculate the new zone contig state before move_pfn_range_to_zone() + * sets the zone temporarily to non-contiguous. + */ + new_contiguous_state = zone_contig_state_after_growing(zone, start_pfn, + nr_pages); + if (nr_vmemmap_pages) { ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone); if (ret) - return ret; + goto restore_zone_contig; } ret = online_pages(start_pfn + nr_vmemmap_pages, @@ -1271,7 +1320,7 @@ static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_p if (ret) { if (nr_vmemmap_pages) mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); - return ret; + goto restore_zone_contig; } /* @@ -1282,6 +1331,15 @@ static int online_memory_block_pages(unsigned long start_pfn, unsigned long nr_p adjust_present_page_count(pfn_to_page(start_pfn), group, nr_vmemmap_pages); + /* + * Now that the ranges are indicated as online, check whether the whole + * zone is contiguous. + */ + set_zone_contiguous(zone, new_contiguous_state); + return 0; + +restore_zone_contig: + zone->contiguous = contiguous; return ret; } diff --git a/mm/mm_init.c b/mm/mm_init.c index fc2a6f1e518f..5ed3fbd5c643 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2263,11 +2263,22 @@ void __init init_cma_pageblock(struct page *page) } #endif -void set_zone_contiguous(struct zone *zone) +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state) { unsigned long block_start_pfn = zone->zone_start_pfn; unsigned long block_end_pfn; + /* We expect an earlier call to clear_zone_contiguous(). 
*/ + VM_WARN_ON_ONCE(zone->contiguous); + + if (state == ZONE_CONTIG_YES) { + zone->contiguous = true; + return; + } + + if (state == ZONE_CONTIG_NO) + return; + block_end_pfn = pageblock_end_pfn(block_start_pfn); for (; block_start_pfn < zone_end_pfn(zone); block_start_pfn = block_end_pfn, @@ -2348,7 +2359,7 @@ void __init page_alloc_init_late(void) shuffle_free_memory(NODE_DATA(nid)); for_each_populated_zone(zone) - set_zone_contiguous(zone); + set_zone_contiguous(zone, ZONE_CONTIG_MAYBE); /* Initialize page ext after all struct pages are initialized. */ if (deferred_struct_pages) -- 2.47.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
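
The unplug side of patch 2/2 applies the analogous classification; a
standalone sketch mirroring zone_contig_state_after_shrinking() from the
diff above (plain scalars stand in for the zone fields the check reads):

#include <stdbool.h>

enum zone_contig_state { ZONE_CONTIG_YES, ZONE_CONTIG_NO, ZONE_CONTIG_MAYBE };

static enum zone_contig_state
contig_state_after_shrinking(unsigned long zone_start, unsigned long zone_end,
			     bool was_contiguous,
			     unsigned long start_pfn, unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;

	/* Removing from the middle cuts a hole: certainly not contiguous. */
	if (start_pfn > zone_start && end_pfn < zone_end)
		return ZONE_CONTIG_NO;

	/*
	 * Trimming at either end: a contiguous zone stays contiguous; a
	 * non-contiguous one may just have lost its hole, so rescan.
	 */
	if (start_pfn == zone_start || end_pfn == zone_end)
		return was_contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE;

	return ZONE_CONTIG_MAYBE;	/* otherwise fall back to the scan */
}
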
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-01-30 16:37 ` [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li @ 2026-02-07 11:00 ` David Hildenbrand (Arm) 2026-02-08 19:39 ` Mike Rapoport 0 siblings, 1 reply; 13+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-07 11:00 UTC (permalink / raw) To: Tianyou Li, Oscar Salvador, Mike Rapoport, Wei Yang, Michal Hocko Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 1/30/26 17:37, Tianyou Li wrote: > When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will > update the zone->contiguous by checking the new zone's pfn range from the > beginning to the end, regardless the previous state of the old zone. When > the zone's pfn range is large, the cost of traversing the pfn range to > update the zone->contiguous could be significant. > > Add fast paths to quickly detect cases where zone is definitely not > contiguous without scanning the new zone. The cases are: when the new range > did not overlap with previous range, the contiguous should be false; if the > new range adjacent with the previous range, just need to check the new > range; if the new added pages could not fill the hole of previous zone, the > contiguous should be false. > > The following test cases of memory hotplug for a VM [1], tested in the > environment [2], show that this optimization can significantly reduce the > memory hotplug time [3]. > > +----------------+------+---------------+--------------+----------------+ > | | Size | Time (before) | Time (after) | Time Reduction | > | +------+---------------+--------------+----------------+ > | Plug Memory | 256G | 10s | 2s | 80% | > | +------+---------------+--------------+----------------+ > | | 512G | 33s | 6s | 81% | > +----------------+------+---------------+--------------+----------------+ > > +----------------+------+---------------+--------------+----------------+ > | | Size | Time (before) | Time (after) | Time Reduction | > | +------+---------------+--------------+----------------+ > | Unplug Memory | 256G | 10s | 2s | 80% | > | +------+---------------+--------------+----------------+ > | | 512G | 34s | 6s | 82% | > +----------------+------+---------------+--------------+----------------+ > > [1] Qemu commands to hotplug 256G/512G memory for a VM: > object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on > device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 > qom-set vmem1 requested-size 256G/512G (Plug Memory) > qom-set vmem1 requested-size 0G (Unplug Memory) > > [2] Hardware : Intel Icelake server > Guest Kernel : v6.18-rc2 > Qemu : v9.0.0 > > Launch VM : > qemu-system-x86_64 -accel kvm -cpu host \ > -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ > -drive file=./seed.img,format=raw,if=virtio \ > -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ > -m 2G,slots=10,maxmem=2052472M \ > -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ > -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ > -nographic -machine q35 \ > -nic user,hostfwd=tcp::3000-:22 > > Guest kernel auto-onlines newly added memory blocks: > echo online > /sys/devices/system/memory/auto_online_blocks > > [3] The time from typing the QEMU commands in [1] to when the output of > 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged > memory is recognized. 
> > Reported-by: Nanhai Zou <nanhai.zou@intel.com> > Reported-by: Chen Zhang <zhangchen.kidd@jd.com> > Tested-by: Yuan Liu <yuan1.liu@intel.com> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > Reviewed-by: Yu C Chen <yu.c.chen@intel.com> > Reviewed-by: Pan Deng <pan.deng@intel.com> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> > Reviewed-by: Yuan Liu <yuan1.liu@intel.com> > Signed-off-by: Tianyou Li <tianyou.li@intel.com> > --- Thanks for all your work on this and sorry for being slower with review the last month. While I was in the shower I was thinking about how much I hate zone->contiguous + the pageblock walking, and how we could just get rid of it. You know, just what you do while having a relaxing shower. And I was wondering: (a) in which case would we have zone_spanned_pages == zone_present_pages and the zone *not* being contiguous? I assume this just cannot happen, otherwise BUG. (b) in which case would we have zone_spanned_pages != zone_present_pages and the zone *being* contiguous? I assume in some cases where we have small holes within a pageblock? Reading the doc of __pageblock_pfn_to_page(), there are some weird scenarios with holes in pageblocks. I.e., on my notebook I have $ cat /proc/zoneinfo | grep -E "Node|spanned|present" Node 0, zone DMA spanned 4095 present 3999 Node 0, zone DMA32 spanned 1044480 present 439600 Node 0, zone Normal spanned 7798784 present 7798784 Node 0, zone Movable spanned 0 present 0 Node 0, zone Device spanned 0 present 0 For the most important zone regarding compaction, ZONE_NORMAL, it would be good enough. We certainly don't care about detecting contigous for the DMA zone. For DMA32, I would suspect that it is not detected as contigous either way, because the holes are just way too large? So we could maybe do (completely untested): From 69093e5811b532812fde52b55a42dcb24d6e09dd Mon Sep 17 00:00:00 2001 From: "David Hildenbrand (Arm)" <david@kernel.org> Date: Sat, 7 Feb 2026 11:45:21 +0100 Subject: [PATCH] tmp Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> --- include/linux/mmzone.h | 25 +++++++++++++++++++++++-- mm/internal.h | 8 +------- mm/memory_hotplug.c | 11 +---------- mm/mm_init.c | 25 ------------------------- 4 files changed, 25 insertions(+), 44 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fc5d6c88d2f0..7c80df343cfd 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1051,8 +1051,6 @@ struct zone { bool compact_blockskip_flush; #endif - bool contiguous; - CACHELINE_PADDING(_pad3_); /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; @@ -1124,6 +1122,29 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn) return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); } +/** + * zone_is_contiguous - test whether a zone is contiguous + * @zone: the zone to test. + * + * In a contigous zone, it is valid to call pfn_to_page() on any pfn in the + * spanned zone without requiting pfn_valid() or pfn_to_online_page() checks. + * + * Returns: true if contiguous, otherwise false. + */ +static inline bool zone_is_contiguous(const struct zone *zone) +{ + /* + * TODO: do we care about weird races? We could protect using a + * seqcount or sth. like that (zone_span_seqbegin etc). + * + * Concurrent hotplug is not an issue. But likely the caller must + * protect against concurrent hotunplug already? 
We should definitely + * read these values through READ_ONCE and update them through + * WRITE_ONCE(). + */ + return zone->spanned_pages == zone->present_pages; +} + static inline bool zone_is_initialized(const struct zone *zone) { return zone->initialized; diff --git a/mm/internal.h b/mm/internal.h index f35dbcf99a86..6062f9b8ee62 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn, static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, unsigned long end_pfn, struct zone *zone) { - if (zone->contiguous) + if (zone_is_contiguous(zone)) return pfn_to_page(start_pfn); return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); } -void set_zone_contiguous(struct zone *zone); bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, unsigned long nr_pages); -static inline void clear_zone_contiguous(struct zone *zone) -{ - zone->contiguous = false; -} - extern int __isolate_free_page(struct page *page, unsigned int order); extern void __putback_isolated_page(struct page *page, unsigned int order, int mt); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a63ec679d861..790a8839b5d8 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone, /* * Zone shrinking code cannot properly deal with ZONE_DEVICE. So - * we will not try to shrink the zones - which is okay as - * set_zone_contiguous() cannot deal with ZONE_DEVICE either way. + * we will not try to shrink the zones. */ if (zone_is_zone_device(zone)) return; - clear_zone_contiguous(zone); - shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); update_pgdat_span(pgdat); - - set_zone_contiguous(zone); } /** @@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, struct pglist_data *pgdat = zone->zone_pgdat; int nid = pgdat->node_id; - clear_zone_contiguous(zone); - if (zone_is_empty(zone)) init_currently_empty_zone(zone, start_pfn, nr_pages); resize_zone_range(zone, start_pfn, nr_pages); @@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, MEMINIT_HOTPLUG, altmap, migratetype, isolate_pageblock); - - set_zone_contiguous(zone); } struct auto_movable_stats { diff --git a/mm/mm_init.c b/mm/mm_init.c index 2a809cd8e7fa..78115fb5808b 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2263,28 +2263,6 @@ void __init init_cma_pageblock(struct page *page) } #endif -void set_zone_contiguous(struct zone *zone) -{ - unsigned long block_start_pfn = zone->zone_start_pfn; - unsigned long block_end_pfn; - - block_end_pfn = pageblock_end_pfn(block_start_pfn); - for (; block_start_pfn < zone_end_pfn(zone); - block_start_pfn = block_end_pfn, - block_end_pfn += pageblock_nr_pages) { - - block_end_pfn = min(block_end_pfn, zone_end_pfn(zone)); - - if (!__pageblock_pfn_to_page(block_start_pfn, - block_end_pfn, zone)) - return; - cond_resched(); - } - - /* We confirm that there is no hole */ - zone->contiguous = true; -} - /* * Check if a PFN range intersects multiple zones on one or more * NUMA nodes. Specify the @nid argument if it is known that this @@ -2347,9 +2325,6 @@ void __init page_alloc_init_late(void) for_each_node_state(nid, N_MEMORY) shuffle_free_memory(NODE_DATA(nid)); - for_each_populated_zone(zone) - set_zone_contiguous(zone); - /* Initialize page ext after all struct pages are initialized. 
*/ if (deferred_struct_pages) page_ext_init(); -- 2.43.0 If we would want to cover the cases with "holes in zone, but there is a struct page and it's assigned to the zone", all we would have to do is manually track them (during boot only, cannot happen during memory hotplug) in zone->absent pages. That value would never change. Then we would have instead: static inline bool zone_is_contiguous(const struct zone *zone) { return zone->spanned_pages == zone->present_pages + zone->absent_pages; } I don't think we could just use "absent" as calculated in calculate_node_totalpages, because I assume it could include "too many" things, not just these holes in pageblocks. At least reading zone_absent_pages_in_node(), likely the value could return * Pages that will not have a struct page in case of larger holes * mirrored_kernelcore oddities We'd need a reliably "absent pages that have a struct page that belongs to this zone". Maybe Mike knows how to easily obtain that there to just set zone->absent_pages. If we really need that optimization for these cases. -- Cheers, David ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-07 11:00 ` David Hildenbrand (Arm) @ 2026-02-08 19:39 ` Mike Rapoport 2026-02-09 10:52 ` David Hildenbrand (Arm) 2026-02-09 11:38 ` David Hildenbrand (Arm) 0 siblings, 2 replies; 13+ messages in thread From: Mike Rapoport @ 2026-02-08 19:39 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote: > On 1/30/26 17:37, Tianyou Li wrote: > > When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will > > update the zone->contiguous by checking the new zone's pfn range from the > > beginning to the end, regardless the previous state of the old zone. When > > the zone's pfn range is large, the cost of traversing the pfn range to > > update the zone->contiguous could be significant. > > > > Add fast paths to quickly detect cases where zone is definitely not > > contiguous without scanning the new zone. The cases are: when the new range > > did not overlap with previous range, the contiguous should be false; if the > > new range adjacent with the previous range, just need to check the new > > range; if the new added pages could not fill the hole of previous zone, the > > contiguous should be false. > > > > The following test cases of memory hotplug for a VM [1], tested in the > > environment [2], show that this optimization can significantly reduce the > > memory hotplug time [3]. > > > > +----------------+------+---------------+--------------+----------------+ > > | | Size | Time (before) | Time (after) | Time Reduction | > > | +------+---------------+--------------+----------------+ > > | Plug Memory | 256G | 10s | 2s | 80% | > > | +------+---------------+--------------+----------------+ > > | | 512G | 33s | 6s | 81% | > > +----------------+------+---------------+--------------+----------------+ > > > > +----------------+------+---------------+--------------+----------------+ > > | | Size | Time (before) | Time (after) | Time Reduction | > > | +------+---------------+--------------+----------------+ > > | Unplug Memory | 256G | 10s | 2s | 80% | > > | +------+---------------+--------------+----------------+ > > | | 512G | 34s | 6s | 82% | > > +----------------+------+---------------+--------------+----------------+ > > > > [1] Qemu commands to hotplug 256G/512G memory for a VM: > > object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on > > device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 > > qom-set vmem1 requested-size 256G/512G (Plug Memory) > > qom-set vmem1 requested-size 0G (Unplug Memory) > > > > [2] Hardware : Intel Icelake server > > Guest Kernel : v6.18-rc2 > > Qemu : v9.0.0 > > > > Launch VM : > > qemu-system-x86_64 -accel kvm -cpu host \ > > -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ > > -drive file=./seed.img,format=raw,if=virtio \ > > -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ > > -m 2G,slots=10,maxmem=2052472M \ > > -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ > > -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ > > -nographic -machine q35 \ > > -nic user,hostfwd=tcp::3000-:22 > > > > Guest kernel auto-onlines newly added memory blocks: > > echo online > /sys/devices/system/memory/auto_online_blocks > > > > [3] The time from typing the QEMU commands in [1] to when 
the output of > > 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged > > memory is recognized. > > > > Reported-by: Nanhai Zou <nanhai.zou@intel.com> > > Reported-by: Chen Zhang <zhangchen.kidd@jd.com> > > Tested-by: Yuan Liu <yuan1.liu@intel.com> > > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> > > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > > Reviewed-by: Yu C Chen <yu.c.chen@intel.com> > > Reviewed-by: Pan Deng <pan.deng@intel.com> > > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> > > Reviewed-by: Yuan Liu <yuan1.liu@intel.com> > > Signed-off-by: Tianyou Li <tianyou.li@intel.com> > > --- > > Thanks for all your work on this and sorry for being slower with > review the last month. > > While I was in the shower I was thinking about how much I hate > zone->contiguous + the pageblock walking, and how we could just get > rid of it. > > You know, just what you do while having a relaxing shower. > > > And I was wondering: > > (a) in which case would we have zone_spanned_pages == zone_present_pages > and the zone *not* being contiguous? I assume this just cannot happen, > otherwise BUG. > > (b) in which case would we have zone_spanned_pages != zone_present_pages > and the zone *being* contiguous? I assume in some cases where we have small > holes within a pageblock? > > Reading the doc of __pageblock_pfn_to_page(), there are some weird > scenarios with holes in pageblocks. It seems that "zone->contigous" is really bad name for what this thing represents. tl;dr I don't think zone_spanned_pages == zone_present_pages is related to zone->contigous at all :) If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the check for zone->contigous should guarantee that the entire pageblock has a valid memory map and that the entire pageblock fits a zone and does not cross zone/node boundaries. For coldplug memory the memory map is valid for every section that has present memory, i.e. even it there is a hole in a section, it's memory map will be populated and will have struct pages. When zone->contigous is false, the slow path in __pageblock_pfn_to_page() essentially checks if the first page in a pageblock is online and if first and last pages are in the zone being compacted. AFAIU, in the hotplug case an entire pageblock is always onlined to the same zone, so zone->contigous won't change after the hotplug is complete. We might set it to false in the beginning of the hotplug to avoid scanning offline pages, although I'm not sure if it's possible. But in the end of hotplug we can simply restore the old value and move on. For the coldplug case I'm also not sure it's worth the hassle, we could just let compaction scan a few more pfns for those rare weird pageblocks and bail out on wrong page conditions. > I.e., on my notebook I have > > $ cat /proc/zoneinfo | grep -E "Node|spanned|present" > Node 0, zone DMA > spanned 4095 > present 3999 > Node 0, zone DMA32 > spanned 1044480 > present 439600 I suspect this one is contigous ;-) > Node 0, zone Normal > spanned 7798784 > present 7798784 > Node 0, zone Movable > spanned 0 > present 0 > Node 0, zone Device > spanned 0 > present 0 > > > For the most important zone regarding compaction, ZONE_NORMAL, it would be good enough. > > We certainly don't care about detecting contigous for the DMA zone. For DMA32, I would suspect > that it is not detected as contigous either way, because the holes are just way too large? > > -- > Cheers, > > David -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-08 19:39 ` Mike Rapoport @ 2026-02-09 10:52 ` David Hildenbrand (Arm) 2026-02-09 12:44 ` David Hildenbrand (Arm) 2026-02-09 11:38 ` David Hildenbrand (Arm) 1 sibling, 1 reply; 13+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 10:52 UTC (permalink / raw) To: Mike Rapoport Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 2/8/26 20:39, Mike Rapoport wrote: > On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote: >> On 1/30/26 17:37, Tianyou Li wrote: >>> When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will >>> update the zone->contiguous by checking the new zone's pfn range from the >>> beginning to the end, regardless the previous state of the old zone. When >>> the zone's pfn range is large, the cost of traversing the pfn range to >>> update the zone->contiguous could be significant. >>> >>> Add fast paths to quickly detect cases where zone is definitely not >>> contiguous without scanning the new zone. The cases are: when the new range >>> did not overlap with previous range, the contiguous should be false; if the >>> new range adjacent with the previous range, just need to check the new >>> range; if the new added pages could not fill the hole of previous zone, the >>> contiguous should be false. >>> >>> The following test cases of memory hotplug for a VM [1], tested in the >>> environment [2], show that this optimization can significantly reduce the >>> memory hotplug time [3]. >>> >>> +----------------+------+---------------+--------------+----------------+ >>> | | Size | Time (before) | Time (after) | Time Reduction | >>> | +------+---------------+--------------+----------------+ >>> | Plug Memory | 256G | 10s | 2s | 80% | >>> | +------+---------------+--------------+----------------+ >>> | | 512G | 33s | 6s | 81% | >>> +----------------+------+---------------+--------------+----------------+ >>> >>> +----------------+------+---------------+--------------+----------------+ >>> | | Size | Time (before) | Time (after) | Time Reduction | >>> | +------+---------------+--------------+----------------+ >>> | Unplug Memory | 256G | 10s | 2s | 80% | >>> | +------+---------------+--------------+----------------+ >>> | | 512G | 34s | 6s | 82% | >>> +----------------+------+---------------+--------------+----------------+ >>> >>> [1] Qemu commands to hotplug 256G/512G memory for a VM: >>> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on >>> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 >>> qom-set vmem1 requested-size 256G/512G (Plug Memory) >>> qom-set vmem1 requested-size 0G (Unplug Memory) >>> >>> [2] Hardware : Intel Icelake server >>> Guest Kernel : v6.18-rc2 >>> Qemu : v9.0.0 >>> >>> Launch VM : >>> qemu-system-x86_64 -accel kvm -cpu host \ >>> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ >>> -drive file=./seed.img,format=raw,if=virtio \ >>> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ >>> -m 2G,slots=10,maxmem=2052472M \ >>> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ >>> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ >>> -nographic -machine q35 \ >>> -nic user,hostfwd=tcp::3000-:22 >>> >>> Guest kernel auto-onlines newly added memory blocks: >>> echo online > /sys/devices/system/memory/auto_online_blocks >>> >>> [3] The time from 
typing the QEMU commands in [1] to when the output of >>> 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged >>> memory is recognized. >>> >>> Reported-by: Nanhai Zou <nanhai.zou@intel.com> >>> Reported-by: Chen Zhang <zhangchen.kidd@jd.com> >>> Tested-by: Yuan Liu <yuan1.liu@intel.com> >>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> >>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> >>> Reviewed-by: Yu C Chen <yu.c.chen@intel.com> >>> Reviewed-by: Pan Deng <pan.deng@intel.com> >>> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> >>> Reviewed-by: Yuan Liu <yuan1.liu@intel.com> >>> Signed-off-by: Tianyou Li <tianyou.li@intel.com> >>> --- >> >> Thanks for all your work on this and sorry for being slower with >> review the last month. >> >> While I was in the shower I was thinking about how much I hate >> zone->contiguous + the pageblock walking, and how we could just get >> rid of it. >> >> You know, just what you do while having a relaxing shower. >> >> >> And I was wondering: >> >> (a) in which case would we have zone_spanned_pages == zone_present_pages >> and the zone *not* being contiguous? I assume this just cannot happen, >> otherwise BUG. >> >> (b) in which case would we have zone_spanned_pages != zone_present_pages >> and the zone *being* contiguous? I assume in some cases where we have small >> holes within a pageblock? >> >> Reading the doc of __pageblock_pfn_to_page(), there are some weird >> scenarios with holes in pageblocks. > > It seems that "zone->contigous" is really bad name for what this thing > represents. > > tl;dr I don't think zone_spanned_pages == zone_present_pages is related to > zone->contigous at all :) My point in (a) was that with "zone_spanned_pages == zone_present_pages" there are no holes so -> contiguous. (b), and what I said further below, is exactly about memory holes where we have a memmap, but it's not present memory. > > If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the > check for zone->contigous should guarantee that the entire pageblock has a > valid memory map and that the entire pageblock fits a zone and does not > cross zone/node boundaries. Right. But that must hold for each and ever pageblock in the spanned zone range for it to be contiguous. zone->contigous tells you "pfn_to_page()" is valid on the complete zone range" That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on ech and ever pageblock. > > For coldplug memory the memory map is valid for every section that has > present memory, i.e. even it there is a hole in a section, it's memory map > will be populated and will have struct pages. There is this sub-section thing, and holes larger than a section might not have a memmap (unless reserved I guess). > > When zone->contigous is false, the slow path in __pageblock_pfn_to_page() > essentially checks if the first page in a pageblock is online and if first > and last pages are in the zone being compacted. > > AFAIU, in the hotplug case an entire pageblock is always onlined to the > same zone, so zone->contigous won't change after the hotplug is complete. I think you are missing a point: hotp(un)plug might create holes in the zone span. Then, pfn_to_page() is no longer valid to be called on arbitrary pageblocks within the zone. > > We might set it to false in the beginning of the hotplug to avoid scanning > offline pages, although I'm not sure if it's possible. > > But in the end of hotplug we can simply restore the old value and move on. No, you might create holes. 
> > For the coldplug case I'm also not sure it's worth the hassle, we could > just let compaction scan a few more pfns for those rare weird pageblocks > and bail out on wrong page conditions. To recap: My idea is that "zone_spanned_pages == zone_present_pages" tells you that the zone is contiguous because there are no holes. To handle "non-memory with a struct page", you'd have to check "zone_spanned_pages == zone_present_pages + zone_non_present_memmap_pages" Or shorter "zone_spanned_pages == zone_pages_with_memmap" Then, pfn_to_page() is valid within the complete zone. The question is how to best calculate the "zone_pages_with_memmap" during boot. During hot(un)plug we only add/remove zone_present_pages. The zone_non_present_memmap_pages will not change due to hot(un)plug later. -- Cheers, David ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-09 10:52 ` David Hildenbrand (Arm) @ 2026-02-09 12:44 ` David Hildenbrand (Arm) 2026-02-10 11:44 ` Mike Rapoport 0 siblings, 1 reply; 13+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 12:44 UTC (permalink / raw) To: Mike Rapoport Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 2/9/26 11:52, David Hildenbrand (Arm) wrote: > On 2/8/26 20:39, Mike Rapoport wrote: >> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote: >>> >>> Thanks for all your work on this and sorry for being slower with >>> review the last month. >>> >>> While I was in the shower I was thinking about how much I hate >>> zone->contiguous + the pageblock walking, and how we could just get >>> rid of it. >>> >>> You know, just what you do while having a relaxing shower. >>> >>> >>> And I was wondering: >>> >>> (a) in which case would we have zone_spanned_pages == zone_present_pages >>> and the zone *not* being contiguous? I assume this just cannot happen, >>> otherwise BUG. >>> >>> (b) in which case would we have zone_spanned_pages != zone_present_pages >>> and the zone *being* contiguous? I assume in some cases where we have >>> small >>> holes within a pageblock? >>> >>> Reading the doc of __pageblock_pfn_to_page(), there are some weird >>> scenarios with holes in pageblocks. >> It seems that "zone->contigous" is really bad name for what this thing >> represents. >> >> tl;dr I don't think zone_spanned_pages == zone_present_pages is >> related to >> zone->contigous at all :) > > My point in (a) was that with "zone_spanned_pages == zone_present_pages" > there are no holes so -> contiguous. > > (b), and what I said further below, is exactly about memory holes where > we have a memmap, but it's not present memory. > >> >> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the >> check for zone->contigous should guarantee that the entire pageblock >> has a >> valid memory map and that the entire pageblock fits a zone and does not >> cross zone/node boundaries. > > Right. But that must hold for each and ever pageblock in the spanned > zone range for it to be contiguous. > > zone->contigous tells you "pfn_to_page()" is valid on the complete zone > range" > > That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on ech > and ever pageblock. > >> >> For coldplug memory the memory map is valid for every section that has >> present memory, i.e. even it there is a hole in a section, it's memory >> map >> will be populated and will have struct pages. > > There is this sub-section thing, and holes larger than a section might > not have a memmap (unless reserved I guess). > >> >> When zone->contigous is false, the slow path in __pageblock_pfn_to_page() >> essentially checks if the first page in a pageblock is online and if >> first >> and last pages are in the zone being compacted. >> AFAIU, in the hotplug case an entire pageblock is always onlined to the >> same zone, so zone->contigous won't change after the hotplug is complete. > > I think you are missing a point: hotp(un)plug might create holes in the > zone span. Then, pfn_to_page() is no longer valid to be called on > arbitrary pageblocks within the zone. 
> >> >> We might set it to false in the beginning of the hotplug to avoid >> scanning >> offline pages, although I'm not sure if it's possible. >> >> But in the end of hotplug we can simply restore the old value and move >> on. > > No, you might create holes. > >> >> For the coldplug case I'm also not sure it's worth the hassle, we could >> just let compaction scan a few more pfns for those rare weird pageblocks >> and bail out on wrong page conditions. > > To recap: > > My idea is that "zone_spanned_pages == zone_present_pages" tells you > that the zone is contiguous because there are no holes. > > To handle "non-memory with a struct page", you'd have to check > > "zone_spanned_pages == zone_present_pages + > zone_non_present_memmap_pages" > > Or shorter > > "zone_spanned_pages == zone_pages_with_memmap" > > Then, pfn_to_page() is valid within the complete zone. > > The question is how to best calculate the "zone_pages_with_memmap" > during boot. > > During hot(un)plug we only add/remove zone_present_pages. The > zone_non_present_memmap_pages will not change due to hot(un)plug later. > The following hack does the trick. But (a) I wish we could get rid of the pageblock walking in calc_online_pages(). (b) "online_pages" has weird semantics due to the pageblock handling. "online_pageblock_pages"? not sure. (c) Calculating "online_pages" when we know there is a hole does not make sense, as we could just keep it 0 if there are holes and simply set it to zone->online_pageblock_pages->zone->spanned_pages in case all are online. From d4cb825e91a6363afc68fb994c5d9b29c38c5f42 Mon Sep 17 00:00:00 2001 From: "David Hildenbrand (Arm)" <david@kernel.org> Date: Mon, 9 Feb 2026 13:40:24 +0100 Subject: [PATCH] tmp Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> --- include/linux/mmzone.h | 25 +++++++++++++++++++++++-- mm/internal.h | 8 +------- mm/memory_hotplug.c | 20 ++++++-------------- mm/mm_init.c | 12 ++++++------ 4 files changed, 36 insertions(+), 29 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fc5d6c88d2f0..3f7d8d88c597 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -943,6 +943,11 @@ struct zone { * cma pages is present pages that are assigned for CMA use * (MIGRATE_CMA). * + * online_pages is pages within the zone that have an online memmap. + * online_pages include present pages and memory holes that have a + * memmap. When spanned_pages == online_pages, pfn_to_page() can be + * performed without further checks on any pfn within the zone span. + * * So present_pages may be used by memory hotplug or memory power * management logic to figure out unmanaged pages by checking * (present_pages - managed_pages). And managed_pages should be used @@ -967,6 +972,7 @@ struct zone { atomic_long_t managed_pages; unsigned long spanned_pages; unsigned long present_pages; + unsigned long online_pages; #if defined(CONFIG_MEMORY_HOTPLUG) unsigned long present_early_pages; #endif @@ -1051,8 +1057,6 @@ struct zone { bool compact_blockskip_flush; #endif - bool contiguous; - CACHELINE_PADDING(_pad3_); /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; @@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn) return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); } +/** + * zone_is_contiguous - test whether a zone is contiguous + * @zone: the zone to test. 
+ * + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the + * spanned zone without requiting pfn_valid() or pfn_to_online_page() checks. + * + * Returns: true if contiguous, otherwise false. + */ +static inline bool zone_is_contiguous(const struct zone *zone) +{ + return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages); +} + static inline bool zone_is_initialized(const struct zone *zone) { return zone->initialized; diff --git a/mm/internal.h b/mm/internal.h index f35dbcf99a86..6062f9b8ee62 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn, static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, unsigned long end_pfn, struct zone *zone) { - if (zone->contiguous) + if (zone_is_contiguous(zone)) return pfn_to_page(start_pfn); return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); } -void set_zone_contiguous(struct zone *zone); bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, unsigned long nr_pages); -static inline void clear_zone_contiguous(struct zone *zone) -{ - zone->contiguous = false; -} - extern int __isolate_free_page(struct page *page, unsigned int order); extern void __putback_isolated_page(struct page *page, unsigned int order, int mt); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a63ec679d861..76496c1039a9 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, pfn = find_smallest_section_pfn(nid, zone, end_pfn, zone_end_pfn(zone)); if (pfn) { - zone->spanned_pages = zone_end_pfn(zone) - pfn; + WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn); zone->zone_start_pfn = pfn; } else { zone->zone_start_pfn = 0; - zone->spanned_pages = 0; + WRITE_ONCE(zone->spanned_pages, 0); } } else if (zone_end_pfn(zone) == end_pfn) { /* @@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn, start_pfn); if (pfn) - zone->spanned_pages = pfn - zone->zone_start_pfn + 1; + WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1); else { zone->zone_start_pfn = 0; - zone->spanned_pages = 0; + WRITE_ONCE(zone->spanned_pages, 0); } } } @@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone, /* * Zone shrinking code cannot properly deal with ZONE_DEVICE. So - * we will not try to shrink the zones - which is okay as - * set_zone_contiguous() cannot deal with ZONE_DEVICE either way. + * we will not try to shrink the zones. 
*/ if (zone_is_zone_device(zone)) return; - clear_zone_contiguous(zone); - shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); update_pgdat_span(pgdat); - - set_zone_contiguous(zone); } /** @@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, struct pglist_data *pgdat = zone->zone_pgdat; int nid = pgdat->node_id; - clear_zone_contiguous(zone); - if (zone_is_empty(zone)) init_currently_empty_zone(zone, start_pfn, nr_pages); resize_zone_range(zone, start_pfn, nr_pages); @@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, MEMINIT_HOTPLUG, altmap, migratetype, isolate_pageblock); - - set_zone_contiguous(zone); } struct auto_movable_stats { @@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, if (early_section(__pfn_to_section(page_to_pfn(page)))) zone->present_early_pages += nr_pages; zone->present_pages += nr_pages; + WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages); zone->zone_pgdat->node_present_pages += nr_pages; if (group && movable) diff --git a/mm/mm_init.c b/mm/mm_init.c index 2a809cd8e7fa..e33caa6fb6fc 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page) } #endif -void set_zone_contiguous(struct zone *zone) +static void calc_online_pages(struct zone *zone) { unsigned long block_start_pfn = zone->zone_start_pfn; + unsigned long online_pages = 0; unsigned long block_end_pfn; block_end_pfn = pageblock_end_pfn(block_start_pfn); @@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone) if (!__pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone)) - return; + continue; cond_resched(); + online_pages += block_end_pfn - block_start_pfn; } - - /* We confirm that there is no hole */ - zone->contiguous = true; + zone->online_pages = online_pages; } /* @@ -2348,7 +2348,7 @@ void __init page_alloc_init_late(void) shuffle_free_memory(NODE_DATA(nid)); for_each_populated_zone(zone) - set_zone_contiguous(zone); + calc_online_pages(zone); /* Initialize page ext after all struct pages are initialized. */ if (deferred_struct_pages) -- 2.43.0 -- Cheers, David ^ permalink raw reply [flat|nested] 13+ messages in thread
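One way to read the hack above: contiguity turns into a derived predicate, so hot(un)plug only has to adjust counters and never re-walks the zone. A rough self-contained sketch of that effect, with invented toy_* names standing in for the zone counters (this is not the patch above, only a model of its behaviour):

#include <assert.h>
#include <stdbool.h>

/* Toy counters standing in for zone->spanned_pages / zone->online_pages. */
struct toy_zone {
	unsigned long spanned_pages;
	unsigned long online_pages;
};

static bool toy_zone_is_contiguous(const struct toy_zone *z)
{
	return z->spanned_pages == z->online_pages;
}

/* Online a block that extends the span: both counters grow together. */
static void toy_online_block_at_end(struct toy_zone *z, unsigned long nr)
{
	z->spanned_pages += nr;
	z->online_pages += nr;
}

/* Offline a block in the middle: the span stays the same, so a hole appears. */
static void toy_offline_block_in_middle(struct toy_zone *z, unsigned long nr)
{
	z->online_pages -= nr;
}

int main(void)
{
	struct toy_zone z = { .spanned_pages = 4096, .online_pages = 4096 };

	toy_online_block_at_end(&z, 512);
	assert(toy_zone_is_contiguous(&z));    /* still no holes */

	toy_offline_block_in_middle(&z, 512);
	assert(!toy_zone_is_contiguous(&z));   /* hole in the span -> slow path */
	return 0;
}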
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-09 12:44 ` David Hildenbrand (Arm) @ 2026-02-10 11:44 ` Mike Rapoport 2026-02-10 15:28 ` Li, Tianyou 2026-02-11 12:19 ` David Hildenbrand (Arm) 0 siblings, 2 replies; 13+ messages in thread From: Mike Rapoport @ 2026-02-10 11:44 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On Mon, Feb 09, 2026 at 01:44:45PM +0100, David Hildenbrand (Arm) wrote: > On 2/9/26 11:52, David Hildenbrand (Arm) wrote: > > On 2/8/26 20:39, Mike Rapoport wrote: > > > On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote: > > > > > > > > Thanks for all your work on this and sorry for being slower with > > > > review the last month. > > > > > > > > While I was in the shower I was thinking about how much I hate > > > > zone->contiguous + the pageblock walking, and how we could just get > > > > rid of it. > > > > > > > > You know, just what you do while having a relaxing shower. > > > > > > > > > > > > And I was wondering: > > > > > > > > (a) in which case would we have zone_spanned_pages == zone_present_pages > > > > and the zone *not* being contiguous? I assume this just cannot happen, > > > > otherwise BUG. > > > > > > > > (b) in which case would we have zone_spanned_pages != zone_present_pages > > > > and the zone *being* contiguous? I assume in some cases where we > > > > have small > > > > holes within a pageblock? > > > > > > > > Reading the doc of __pageblock_pfn_to_page(), there are some weird > > > > scenarios with holes in pageblocks. > > > It seems that "zone->contigous" is really bad name for what this thing > > > represents. > > > > > > tl;dr I don't think zone_spanned_pages == zone_present_pages is > > > related to > > > zone->contigous at all :) > > > > My point in (a) was that with "zone_spanned_pages == zone_present_pages" > > there are no holes so -> contiguous. > > > > (b), and what I said further below, is exactly about memory holes where > > we have a memmap, but it's not present memory. > > > > > > > > If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the > > > check for zone->contigous should guarantee that the entire pageblock > > > has a > > > valid memory map and that the entire pageblock fits a zone and does not > > > cross zone/node boundaries. > > > > Right. But that must hold for each and ever pageblock in the spanned > > zone range for it to be contiguous. > > > > zone->contigous tells you "pfn_to_page()" is valid on the complete zone > > range" > > > > That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on ech > > and ever pageblock. > > > > > > > > For coldplug memory the memory map is valid for every section that has > > > present memory, i.e. even it there is a hole in a section, it's > > > memory map > > > will be populated and will have struct pages. > > > > There is this sub-section thing, and holes larger than a section might > > not have a memmap (unless reserved I guess). > > > > > > > > When zone->contigous is false, the slow path in __pageblock_pfn_to_page() > > > essentially checks if the first page in a pageblock is online and if > > > first > > > and last pages are in the zone being compacted. 
> > > AFAIU, in the hotplug case an entire pageblock is always onlined to the > > > same zone, so zone->contigous won't change after the hotplug is complete. > > > > I think you are missing a point: hotp(un)plug might create holes in the > > zone span. Then, pfn_to_page() is no longer valid to be called on > > arbitrary pageblocks within the zone. > > > > > > > > We might set it to false in the beginning of the hotplug to avoid > > > scanning > > > offline pages, although I'm not sure if it's possible. > > > > > > But in the end of hotplug we can simply restore the old value and > > > move on. > > > > No, you might create holes. > > > > > > > > For the coldplug case I'm also not sure it's worth the hassle, we could > > > just let compaction scan a few more pfns for those rare weird pageblocks > > > and bail out on wrong page conditions. > > > > To recap: > > > > My idea is that "zone_spanned_pages == zone_present_pages" tells you > > that the zone is contiguous because there are no holes. > > > > To handle "non-memory with a struct page", you'd have to check > > > > "zone_spanned_pages == zone_present_pages + > > zone_non_present_memmap_pages" > > > > Or shorter > > > > "zone_spanned_pages == zone_pages_with_memmap" > > > > Then, pfn_to_page() is valid within the complete zone. > > > > The question is how to best calculate the "zone_pages_with_memmap" > > during boot. > > > > During hot(un)plug we only add/remove zone_present_pages. The > > zone_non_present_memmap_pages will not change due to hot(un)plug later. > > > > The following hack does the trick. But > > (a) I wish we could get rid of the pageblock walking in calc_online_pages(). > (b) "online_pages" has weird semantics due to the pageblock handling. > "online_pageblock_pages"? not sure. > (c) Calculating "online_pages" when we know there is a hole does not make sense, > as we could just keep it 0 if there are holes and simply set it to > zone->online_pageblock_pages->zone->spanned_pages in case all are online. > > > From d4cb825e91a6363afc68fb994c5d9b29c38c5f42 Mon Sep 17 00:00:00 2001 > From: "David Hildenbrand (Arm)" <david@kernel.org> > Date: Mon, 9 Feb 2026 13:40:24 +0100 > Subject: [PATCH] tmp > > Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> > --- > include/linux/mmzone.h | 25 +++++++++++++++++++++++-- > mm/internal.h | 8 +------- > mm/memory_hotplug.c | 20 ++++++-------------- > mm/mm_init.c | 12 ++++++------ > 4 files changed, 36 insertions(+), 29 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index fc5d6c88d2f0..3f7d8d88c597 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -943,6 +943,11 @@ struct zone { > * cma pages is present pages that are assigned for CMA use > * (MIGRATE_CMA). > * > + * online_pages is pages within the zone that have an online memmap. > + * online_pages include present pages and memory holes that have a > + * memmap. When spanned_pages == online_pages, pfn_to_page() can be > + * performed without further checks on any pfn within the zone span. Maybe pages_with_memmap? It would stand off from managed, spanned and present, but it's clearer than online IMHO. > + * > * So present_pages may be used by memory hotplug or memory power > * management logic to figure out unmanaged pages by checking > * (present_pages - managed_pages). 
And managed_pages should be used > @@ -967,6 +972,7 @@ struct zone { > atomic_long_t managed_pages; > unsigned long spanned_pages; > unsigned long present_pages; > + unsigned long online_pages; > #if defined(CONFIG_MEMORY_HOTPLUG) > unsigned long present_early_pages; > #endif > @@ -1051,8 +1057,6 @@ struct zone { > bool compact_blockskip_flush; > #endif > - bool contiguous; > - > CACHELINE_PADDING(_pad3_); > /* Zone statistics */ > atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; > @@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn) > return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); > } > +/** > + * zone_is_contiguous - test whether a zone is contiguous > + * @zone: the zone to test. > + * > + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the > + * spanned zone without requiting pfn_valid() or pfn_to_online_page() checks. > + * > + * Returns: true if contiguous, otherwise false. > + */ > +static inline bool zone_is_contiguous(const struct zone *zone) > +{ > + return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages); > +} > + > static inline bool zone_is_initialized(const struct zone *zone) > { > return zone->initialized; > diff --git a/mm/internal.h b/mm/internal.h > index f35dbcf99a86..6062f9b8ee62 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn, > static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, > unsigned long end_pfn, struct zone *zone) > { > - if (zone->contiguous) > + if (zone_is_contiguous(zone)) > return pfn_to_page(start_pfn); > return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); > } > -void set_zone_contiguous(struct zone *zone); > bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, > unsigned long nr_pages); > -static inline void clear_zone_contiguous(struct zone *zone) > -{ > - zone->contiguous = false; > -} > - > extern int __isolate_free_page(struct page *page, unsigned int order); > extern void __putback_isolated_page(struct page *page, unsigned int order, > int mt); > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index a63ec679d861..76496c1039a9 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, > pfn = find_smallest_section_pfn(nid, zone, end_pfn, > zone_end_pfn(zone)); > if (pfn) { > - zone->spanned_pages = zone_end_pfn(zone) - pfn; > + WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn); > zone->zone_start_pfn = pfn; > } else { > zone->zone_start_pfn = 0; > - zone->spanned_pages = 0; > + WRITE_ONCE(zone->spanned_pages, 0); > } > } else if (zone_end_pfn(zone) == end_pfn) { > /* > @@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, > pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn, > start_pfn); > if (pfn) > - zone->spanned_pages = pfn - zone->zone_start_pfn + 1; > + WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1); > else { > zone->zone_start_pfn = 0; > - zone->spanned_pages = 0; > + WRITE_ONCE(zone->spanned_pages, 0); > } > } > } > @@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone, > /* > * Zone shrinking code cannot properly deal with ZONE_DEVICE. So > - * we will not try to shrink the zones - which is okay as > - * set_zone_contiguous() cannot deal with ZONE_DEVICE either way. 
> + * we will not try to shrink the zones. > */ > if (zone_is_zone_device(zone)) > return; > - clear_zone_contiguous(zone); > - > shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); > update_pgdat_span(pgdat); > - > - set_zone_contiguous(zone); > } > /** > @@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > struct pglist_data *pgdat = zone->zone_pgdat; > int nid = pgdat->node_id; > - clear_zone_contiguous(zone); > - > if (zone_is_empty(zone)) > init_currently_empty_zone(zone, start_pfn, nr_pages); > resize_zone_range(zone, start_pfn, nr_pages); > @@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, > MEMINIT_HOTPLUG, altmap, migratetype, > isolate_pageblock); > - > - set_zone_contiguous(zone); > } > struct auto_movable_stats { > @@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, > if (early_section(__pfn_to_section(page_to_pfn(page)))) > zone->present_early_pages += nr_pages; > zone->present_pages += nr_pages; > + WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages); > zone->zone_pgdat->node_present_pages += nr_pages; > if (group && movable) > diff --git a/mm/mm_init.c b/mm/mm_init.c > index 2a809cd8e7fa..e33caa6fb6fc 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page) > } > #endif > -void set_zone_contiguous(struct zone *zone) > +static void calc_online_pages(struct zone *zone) > { > unsigned long block_start_pfn = zone->zone_start_pfn; > + unsigned long online_pages = 0; > unsigned long block_end_pfn; > block_end_pfn = pageblock_end_pfn(block_start_pfn); > @@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone) > if (!__pageblock_pfn_to_page(block_start_pfn, > block_end_pfn, zone)) > - return; > + continue; > cond_resched(); > + online_pages += block_end_pfn - block_start_pfn; I think we can completely get rid of this with something like this untested patch to calculate zone->online_pages for coldplug: diff --git a/mm/mm_init.c b/mm/mm_init.c index e33caa6fb6fc..ff2f75e7b49f 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -845,9 +845,9 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) * zone/node above the hole except for the trailing pages in the last * section that will be appended to the zone/node below. */ -static void __init init_unavailable_range(unsigned long spfn, - unsigned long epfn, - int zone, int node) +static u64 __init init_unavailable_range(unsigned long spfn, + unsigned long epfn, + int zone, int node) { unsigned long pfn; u64 pgcnt = 0; @@ -861,6 +861,8 @@ static void __init init_unavailable_range(unsigned long spfn, if (pgcnt) pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n", node, zone_names[zone], pgcnt); + + return pgcnt; } /* @@ -959,9 +961,10 @@ static void __init memmap_init_zone_range(struct zone *zone, memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE, false); + zone->online_pages += (end_pfn - start_pfn); if (*hole_pfn < start_pfn) - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); + zone->online_pages += init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); *hole_pfn = end_pfn; } -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
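If the untested hunk above works out, the coldplug accounting reduces to summing, per zone, the present ranges plus any in-zone holes that still receive a memmap. A self-contained sketch of that accumulation over made-up pfn ranges (it mirrors the *hole_pfn bookkeeping only loosely and is not the kernel code):

#include <assert.h>
#include <stdio.h>

struct range { unsigned long start, end; };   /* [start, end) in pfns */

int main(void)
{
	/* Toy zone spanning pfn 0..1000 with two present ranges and a hole. */
	const struct range present[] = { { 0, 600 }, { 700, 1000 } };
	unsigned long spanned = 1000, online = 0, hole_pfn = 0;

	for (unsigned int i = 0; i < 2; i++) {
		/* init_unavailable_range() part: the hole below this range,
		 * if any, also receives a (reserved) memmap and is counted. */
		if (hole_pfn < present[i].start)
			online += present[i].start - hole_pfn;
		/* memmap_init_range() part: present pfns get an online memmap. */
		online += present[i].end - present[i].start;
		hole_pfn = present[i].end;
	}

	printf("spanned=%lu online=%lu\n", spanned, online);
	assert(online == spanned);   /* every pfn in the span has a memmap */
	return 0;
}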
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-10 11:44 ` Mike Rapoport @ 2026-02-10 15:28 ` Li, Tianyou 2026-02-11 12:19 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 13+ messages in thread From: Li, Tianyou @ 2026-02-10 15:28 UTC (permalink / raw) To: Mike Rapoport, David Hildenbrand (Arm) Cc: Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 2/10/2026 7:44 PM, Mike Rapoport wrote: > On Mon, Feb 09, 2026 at 01:44:45PM +0100, David Hildenbrand (Arm) wrote: >> On 2/9/26 11:52, David Hildenbrand (Arm) wrote: >>> On 2/8/26 20:39, Mike Rapoport wrote: >>>> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote: >>>>> Thanks for all your work on this and sorry for being slower with >>>>> review the last month. >>>>> >>>>> While I was in the shower I was thinking about how much I hate >>>>> zone->contiguous + the pageblock walking, and how we could just get >>>>> rid of it. >>>>> >>>>> You know, just what you do while having a relaxing shower. >>>>> >>>>> >>>>> And I was wondering: >>>>> >>>>> (a) in which case would we have zone_spanned_pages == zone_present_pages >>>>> and the zone *not* being contiguous? I assume this just cannot happen, >>>>> otherwise BUG. >>>>> >>>>> (b) in which case would we have zone_spanned_pages != zone_present_pages >>>>> and the zone *being* contiguous? I assume in some cases where we >>>>> have small >>>>> holes within a pageblock? >>>>> >>>>> Reading the doc of __pageblock_pfn_to_page(), there are some weird >>>>> scenarios with holes in pageblocks. >>>> It seems that "zone->contigous" is really bad name for what this thing >>>> represents. >>>> >>>> tl;dr I don't think zone_spanned_pages == zone_present_pages is >>>> related to >>>> zone->contigous at all :) >>> My point in (a) was that with "zone_spanned_pages == zone_present_pages" >>> there are no holes so -> contiguous. >>> >>> (b), and what I said further below, is exactly about memory holes where >>> we have a memmap, but it's not present memory. >>> >>>> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the >>>> check for zone->contigous should guarantee that the entire pageblock >>>> has a >>>> valid memory map and that the entire pageblock fits a zone and does not >>>> cross zone/node boundaries. >>> Right. But that must hold for each and ever pageblock in the spanned >>> zone range for it to be contiguous. >>> >>> zone->contigous tells you "pfn_to_page()" is valid on the complete zone >>> range" >>> >>> That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on ech >>> and ever pageblock. >>>> For coldplug memory the memory map is valid for every section that has >>>> present memory, i.e. even it there is a hole in a section, it's >>>> memory map >>>> will be populated and will have struct pages. >>> There is this sub-section thing, and holes larger than a section might >>> not have a memmap (unless reserved I guess). >>> >>>> When zone->contigous is false, the slow path in __pageblock_pfn_to_page() >>>> essentially checks if the first page in a pageblock is online and if >>>> first >>>> and last pages are in the zone being compacted. >>>> AFAIU, in the hotplug case an entire pageblock is always onlined to the >>>> same zone, so zone->contigous won't change after the hotplug is complete. >>> I think you are missing a point: hotp(un)plug might create holes in the >>> zone span. 
Then, pfn_to_page() is no longer valid to be called on >>> arbitrary pageblocks within the zone. >>> >>>> We might set it to false in the beginning of the hotplug to avoid >>>> scanning >>>> offline pages, although I'm not sure if it's possible. >>>> >>>> But in the end of hotplug we can simply restore the old value and >>>> move on. >>> No, you might create holes. >>> >>>> For the coldplug case I'm also not sure it's worth the hassle, we could >>>> just let compaction scan a few more pfns for those rare weird pageblocks >>>> and bail out on wrong page conditions. >>> To recap: >>> >>> My idea is that "zone_spanned_pages == zone_present_pages" tells you >>> that the zone is contiguous because there are no holes. >>> >>> To handle "non-memory with a struct page", you'd have to check >>> >>> "zone_spanned_pages == zone_present_pages + >>> zone_non_present_memmap_pages" >>> >>> Or shorter >>> >>> "zone_spanned_pages == zone_pages_with_memmap" >>> >>> Then, pfn_to_page() is valid within the complete zone. >>> >>> The question is how to best calculate the "zone_pages_with_memmap" >>> during boot. >>> >>> During hot(un)plug we only add/remove zone_present_pages. The >>> zone_non_present_memmap_pages will not change due to hot(un)plug later. >>> >> The following hack does the trick. But >> >> (a) I wish we could get rid of the pageblock walking in calc_online_pages(). >> (b) "online_pages" has weird semantics due to the pageblock handling. >> "online_pageblock_pages"? not sure. >> (c) Calculating "online_pages" when we know there is a hole does not make sense, >> as we could just keep it 0 if there are holes and simply set it to >> zone->online_pageblock_pages->zone->spanned_pages in case all are online. >> >> >> From d4cb825e91a6363afc68fb994c5d9b29c38c5f42 Mon Sep 17 00:00:00 2001 >> From: "David Hildenbrand (Arm)" <david@kernel.org> >> Date: Mon, 9 Feb 2026 13:40:24 +0100 >> Subject: [PATCH] tmp >> >> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> >> --- >> include/linux/mmzone.h | 25 +++++++++++++++++++++++-- >> mm/internal.h | 8 +------- >> mm/memory_hotplug.c | 20 ++++++-------------- >> mm/mm_init.c | 12 ++++++------ >> 4 files changed, 36 insertions(+), 29 deletions(-) >> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >> index fc5d6c88d2f0..3f7d8d88c597 100644 >> --- a/include/linux/mmzone.h >> +++ b/include/linux/mmzone.h >> @@ -943,6 +943,11 @@ struct zone { >> * cma pages is present pages that are assigned for CMA use >> * (MIGRATE_CMA). >> * >> + * online_pages is pages within the zone that have an online memmap. >> + * online_pages include present pages and memory holes that have a >> + * memmap. When spanned_pages == online_pages, pfn_to_page() can be >> + * performed without further checks on any pfn within the zone span. > Maybe pages_with_memmap? It would stand off from managed, spanned and > present, but it's clearer than online IMHO. > >> + * >> * So present_pages may be used by memory hotplug or memory power >> * management logic to figure out unmanaged pages by checking >> * (present_pages - managed_pages). 
And managed_pages should be used >> @@ -967,6 +972,7 @@ struct zone { >> atomic_long_t managed_pages; >> unsigned long spanned_pages; >> unsigned long present_pages; >> + unsigned long online_pages; >> #if defined(CONFIG_MEMORY_HOTPLUG) >> unsigned long present_early_pages; >> #endif >> @@ -1051,8 +1057,6 @@ struct zone { >> bool compact_blockskip_flush; >> #endif >> - bool contiguous; >> - >> CACHELINE_PADDING(_pad3_); >> /* Zone statistics */ >> atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; >> @@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn) >> return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); >> } >> +/** >> + * zone_is_contiguous - test whether a zone is contiguous >> + * @zone: the zone to test. >> + * >> + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the >> + * spanned zone without requiting pfn_valid() or pfn_to_online_page() checks. >> + * >> + * Returns: true if contiguous, otherwise false. >> + */ >> +static inline bool zone_is_contiguous(const struct zone *zone) >> +{ >> + return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages); >> +} >> + >> static inline bool zone_is_initialized(const struct zone *zone) >> { >> return zone->initialized; >> diff --git a/mm/internal.h b/mm/internal.h >> index f35dbcf99a86..6062f9b8ee62 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn, >> static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, >> unsigned long end_pfn, struct zone *zone) >> { >> - if (zone->contiguous) >> + if (zone_is_contiguous(zone)) >> return pfn_to_page(start_pfn); >> return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); >> } >> -void set_zone_contiguous(struct zone *zone); >> bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, >> unsigned long nr_pages); >> -static inline void clear_zone_contiguous(struct zone *zone) >> -{ >> - zone->contiguous = false; >> -} >> - >> extern int __isolate_free_page(struct page *page, unsigned int order); >> extern void __putback_isolated_page(struct page *page, unsigned int order, >> int mt); >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index a63ec679d861..76496c1039a9 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, >> pfn = find_smallest_section_pfn(nid, zone, end_pfn, >> zone_end_pfn(zone)); >> if (pfn) { >> - zone->spanned_pages = zone_end_pfn(zone) - pfn; >> + WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn); >> zone->zone_start_pfn = pfn; >> } else { >> zone->zone_start_pfn = 0; >> - zone->spanned_pages = 0; >> + WRITE_ONCE(zone->spanned_pages, 0); >> } >> } else if (zone_end_pfn(zone) == end_pfn) { >> /* >> @@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, >> pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn, >> start_pfn); >> if (pfn) >> - zone->spanned_pages = pfn - zone->zone_start_pfn + 1; >> + WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1); >> else { >> zone->zone_start_pfn = 0; >> - zone->spanned_pages = 0; >> + WRITE_ONCE(zone->spanned_pages, 0); >> } >> } >> } >> @@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone, >> /* >> * Zone shrinking code cannot properly deal with ZONE_DEVICE. 
So >> - * we will not try to shrink the zones - which is okay as >> - * set_zone_contiguous() cannot deal with ZONE_DEVICE either way. >> + * we will not try to shrink the zones. >> */ >> if (zone_is_zone_device(zone)) >> return; >> - clear_zone_contiguous(zone); >> - >> shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); >> update_pgdat_span(pgdat); >> - >> - set_zone_contiguous(zone); >> } >> /** >> @@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> struct pglist_data *pgdat = zone->zone_pgdat; >> int nid = pgdat->node_id; >> - clear_zone_contiguous(zone); >> - >> if (zone_is_empty(zone)) >> init_currently_empty_zone(zone, start_pfn, nr_pages); >> resize_zone_range(zone, start_pfn, nr_pages); >> @@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, >> MEMINIT_HOTPLUG, altmap, migratetype, >> isolate_pageblock); >> - >> - set_zone_contiguous(zone); >> } >> struct auto_movable_stats { >> @@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, >> if (early_section(__pfn_to_section(page_to_pfn(page)))) >> zone->present_early_pages += nr_pages; >> zone->present_pages += nr_pages; >> + WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages); >> zone->zone_pgdat->node_present_pages += nr_pages; >> if (group && movable) >> diff --git a/mm/mm_init.c b/mm/mm_init.c >> index 2a809cd8e7fa..e33caa6fb6fc 100644 >> --- a/mm/mm_init.c >> +++ b/mm/mm_init.c >> @@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page) >> } >> #endif >> -void set_zone_contiguous(struct zone *zone) >> +static void calc_online_pages(struct zone *zone) >> { >> unsigned long block_start_pfn = zone->zone_start_pfn; >> + unsigned long online_pages = 0; >> unsigned long block_end_pfn; >> block_end_pfn = pageblock_end_pfn(block_start_pfn); >> @@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone) >> if (!__pageblock_pfn_to_page(block_start_pfn, >> block_end_pfn, zone)) >> - return; >> + continue; >> cond_resched(); >> + online_pages += block_end_pfn - block_start_pfn; > I think we can completely get rid of this with something like this untested > patch to calculate zone->online_pages for coldplug: > > diff --git a/mm/mm_init.c b/mm/mm_init.c > index e33caa6fb6fc..ff2f75e7b49f 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -845,9 +845,9 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) > * zone/node above the hole except for the trailing pages in the last > * section that will be appended to the zone/node below. 
> */ > -static void __init init_unavailable_range(unsigned long spfn, > - unsigned long epfn, > - int zone, int node) > +static u64 __init init_unavailable_range(unsigned long spfn, > + unsigned long epfn, > + int zone, int node) > { > unsigned long pfn; > u64 pgcnt = 0; > @@ -861,6 +861,8 @@ static void __init init_unavailable_range(unsigned long spfn, > if (pgcnt) > pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n", > node, zone_names[zone], pgcnt); > + > + return pgcnt; > } > > /* > @@ -959,9 +961,10 @@ static void __init memmap_init_zone_range(struct zone *zone, > memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, > zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE, > false); > + zone->online_pages += (end_pfn - start_pfn); > > if (*hole_pfn < start_pfn) > - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); > + zone->online_pages += init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); > > *hole_pfn = end_pfn; > } Sorry for the late response, I am trying to catch up with the discussion :) Per my understanding, zone->contiguous combines two semantics: one is that pages completely fill the zone span, the other is that those pages can be accessed because they have been onlined. The check zone_spanned_pages == zone_online_pages guarantees both. Either resize_zone_span() or shrink_zone_span() will change zone_spanned_pages, so they need to use WRITE_ONCE to guarantee the ordering; the same applies to adjust_present_page_count(), where zone_online_pages gets updated. Regards, Tianyou ^ permalink raw reply [flat|nested] 13+ messages in thread
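On the marked-access point: the idea would be that every writer of the two counters uses WRITE_ONCE() and the lockless contiguity check reads both with READ_ONCE(), so the accesses cannot be torn or fused by the compiler; whether relaxed ordering is enough beyond that depends on the surrounding hotplug locking, which the thread leaves open here. A user-space approximation of the pattern, using C11 relaxed atomics in place of the kernel macros (toy_* names are invented):

#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for zone->spanned_pages and zone->online_pages; relaxed C11
 * atomics play the role of WRITE_ONCE()/READ_ONCE() in this sketch. */
struct toy_zone {
	atomic_ulong spanned_pages;
	atomic_ulong online_pages;
};

/* Writer side (hotplug path, normally under the hotplug lock). */
static void toy_account_online(struct toy_zone *z, unsigned long nr)
{
	unsigned long cur = atomic_load_explicit(&z->online_pages,
						 memory_order_relaxed);
	atomic_store_explicit(&z->online_pages, cur + nr,
			      memory_order_relaxed);
}

/* Reader side (e.g. a compaction-like fast path, possibly lockless). */
static bool toy_zone_is_contiguous(struct toy_zone *z)
{
	unsigned long spanned = atomic_load_explicit(&z->spanned_pages,
						     memory_order_relaxed);
	unsigned long online = atomic_load_explicit(&z->online_pages,
						    memory_order_relaxed);
	return spanned == online;
}

int main(void)
{
	struct toy_zone z;

	atomic_init(&z.spanned_pages, 1024);
	atomic_init(&z.online_pages, 0);
	toy_account_online(&z, 1024);
	return toy_zone_is_contiguous(&z) ? 0 : 1;
}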
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-10 11:44 ` Mike Rapoport 2026-02-10 15:28 ` Li, Tianyou @ 2026-02-11 12:19 ` David Hildenbrand (Arm) 2026-02-12 8:32 ` Mike Rapoport 1 sibling, 1 reply; 13+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-11 12:19 UTC (permalink / raw) To: Mike Rapoport Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel >> * >> + * online_pages is pages within the zone that have an online memmap. >> + * online_pages include present pages and memory holes that have a >> + * memmap. When spanned_pages == online_pages, pfn_to_page() can be >> + * performed without further checks on any pfn within the zone span. > > Maybe pages_with_memmap? It would stand off from managed, spanned and > present, but it's clearer than online IMHO. offline pages also have a memmap, but that should not be touched as it might contain garbage. So it's a bit more tricky :) > >> + * >> * So present_pages may be used by memory hotplug or memory power >> * management logic to figure out unmanaged pages by checking >> * (present_pages - managed_pages). And managed_pages should be used >> @@ -967,6 +972,7 @@ struct zone { >> atomic_long_t managed_pages; >> unsigned long spanned_pages; >> unsigned long present_pages; >> + unsigned long online_pages; >> #if defined(CONFIG_MEMORY_HOTPLUG) >> unsigned long present_early_pages; >> #endif >> @@ -1051,8 +1057,6 @@ struct zone { >> bool compact_blockskip_flush; >> #endif >> - bool contiguous; >> - >> CACHELINE_PADDING(_pad3_); >> /* Zone statistics */ >> atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; >> @@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn) >> return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); >> } >> +/** >> + * zone_is_contiguous - test whether a zone is contiguous >> + * @zone: the zone to test. >> + * >> + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the >> + * spanned zone without requiting pfn_valid() or pfn_to_online_page() checks. >> + * >> + * Returns: true if contiguous, otherwise false. 
>> + */ >> +static inline bool zone_is_contiguous(const struct zone *zone) >> +{ >> + return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages); >> +} >> + >> static inline bool zone_is_initialized(const struct zone *zone) >> { >> return zone->initialized; >> diff --git a/mm/internal.h b/mm/internal.h >> index f35dbcf99a86..6062f9b8ee62 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn, >> static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, >> unsigned long end_pfn, struct zone *zone) >> { >> - if (zone->contiguous) >> + if (zone_is_contiguous(zone)) >> return pfn_to_page(start_pfn); >> return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); >> } >> -void set_zone_contiguous(struct zone *zone); >> bool pfn_range_intersects_zones(int nid, unsigned long start_pfn, >> unsigned long nr_pages); >> -static inline void clear_zone_contiguous(struct zone *zone) >> -{ >> - zone->contiguous = false; >> -} >> - >> extern int __isolate_free_page(struct page *page, unsigned int order); >> extern void __putback_isolated_page(struct page *page, unsigned int order, >> int mt); >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index a63ec679d861..76496c1039a9 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, >> pfn = find_smallest_section_pfn(nid, zone, end_pfn, >> zone_end_pfn(zone)); >> if (pfn) { >> - zone->spanned_pages = zone_end_pfn(zone) - pfn; >> + WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn); >> zone->zone_start_pfn = pfn; >> } else { >> zone->zone_start_pfn = 0; >> - zone->spanned_pages = 0; >> + WRITE_ONCE(zone->spanned_pages, 0); >> } >> } else if (zone_end_pfn(zone) == end_pfn) { >> /* >> @@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, >> pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn, >> start_pfn); >> if (pfn) >> - zone->spanned_pages = pfn - zone->zone_start_pfn + 1; >> + WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1); >> else { >> zone->zone_start_pfn = 0; >> - zone->spanned_pages = 0; >> + WRITE_ONCE(zone->spanned_pages, 0); >> } >> } >> } >> @@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone, >> /* >> * Zone shrinking code cannot properly deal with ZONE_DEVICE. So >> - * we will not try to shrink the zones - which is okay as >> - * set_zone_contiguous() cannot deal with ZONE_DEVICE either way. >> + * we will not try to shrink the zones. 
>> */ >> if (zone_is_zone_device(zone)) >> return; >> - clear_zone_contiguous(zone); >> - >> shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); >> update_pgdat_span(pgdat); >> - >> - set_zone_contiguous(zone); >> } >> /** >> @@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> struct pglist_data *pgdat = zone->zone_pgdat; >> int nid = pgdat->node_id; >> - clear_zone_contiguous(zone); >> - >> if (zone_is_empty(zone)) >> init_currently_empty_zone(zone, start_pfn, nr_pages); >> resize_zone_range(zone, start_pfn, nr_pages); >> @@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, >> memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, >> MEMINIT_HOTPLUG, altmap, migratetype, >> isolate_pageblock); >> - >> - set_zone_contiguous(zone); >> } >> struct auto_movable_stats { >> @@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group, >> if (early_section(__pfn_to_section(page_to_pfn(page)))) >> zone->present_early_pages += nr_pages; >> zone->present_pages += nr_pages; >> + WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages); >> zone->zone_pgdat->node_present_pages += nr_pages; >> if (group && movable) >> diff --git a/mm/mm_init.c b/mm/mm_init.c >> index 2a809cd8e7fa..e33caa6fb6fc 100644 >> --- a/mm/mm_init.c >> +++ b/mm/mm_init.c >> @@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page) >> } >> #endif >> -void set_zone_contiguous(struct zone *zone) >> +static void calc_online_pages(struct zone *zone) >> { >> unsigned long block_start_pfn = zone->zone_start_pfn; >> + unsigned long online_pages = 0; >> unsigned long block_end_pfn; >> block_end_pfn = pageblock_end_pfn(block_start_pfn); >> @@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone) >> if (!__pageblock_pfn_to_page(block_start_pfn, >> block_end_pfn, zone)) >> - return; >> + continue; >> cond_resched(); >> + online_pages += block_end_pfn - block_start_pfn; > > I think we can completely get rid of this with something like this untested > patch to calculate zone->online_pages for coldplug: > > diff --git a/mm/mm_init.c b/mm/mm_init.c > index e33caa6fb6fc..ff2f75e7b49f 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -845,9 +845,9 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) > * zone/node above the hole except for the trailing pages in the last > * section that will be appended to the zone/node below. 
> */ > -static void __init init_unavailable_range(unsigned long spfn, > - unsigned long epfn, > - int zone, int node) > +static u64 __init init_unavailable_range(unsigned long spfn, > + unsigned long epfn, > + int zone, int node) > { > unsigned long pfn; > u64 pgcnt = 0; > @@ -861,6 +861,8 @@ static void __init init_unavailable_range(unsigned long spfn, > if (pgcnt) > pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n", > node, zone_names[zone], pgcnt); > + > + return pgcnt; > } > > /* > @@ -959,9 +961,10 @@ static void __init memmap_init_zone_range(struct zone *zone, > memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, > zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE, > false); > + zone->online_pages += (end_pfn - start_pfn); > > if (*hole_pfn < start_pfn) > - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); > + zone->online_pages += init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); > > *hole_pfn = end_pfn; > } > Looking at set_zone_contiguous(), __pageblock_pfn_to_page() takes care of a weird case where the end of a zone falls into the middle of a pageblock. I am not even sure if that is possible, but we could handle that easily in pageblock_pfn_to_page() by checking the requested range against the zone spanned range. Then the semantics "zone->online_pages" would be less weird and more closely resemble "pages with online memmap". init_unavailable_range() might indeed do the trick! @Tianyou, can you explore that direction? I know, your PTO is coming up. -- Cheers, David ^ permalink raw reply [flat|nested] 13+ messages in thread
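The range check suggested above could look roughly like the following self-contained sketch — not the kernel's pageblock_pfn_to_page(), and the toy_* names are invented. The point is only that the pfn_to_page() fast path is taken when the zone is contiguous and the requested block lies entirely within the zone span:

#include <assert.h>
#include <stdbool.h>

/* Toy zone: a span of pfns plus the derived contiguity predicate. */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
	unsigned long online_pages;
};

static unsigned long toy_zone_end_pfn(const struct toy_zone *z)
{
	return z->start_pfn + z->spanned_pages;
}

static bool toy_zone_is_contiguous(const struct toy_zone *z)
{
	return z->spanned_pages == z->online_pages;
}

/*
 * The fast path is only legal when the whole requested range falls inside
 * the zone span; a pageblock hanging over the zone end takes the slow path.
 */
static bool toy_can_use_fast_path(const struct toy_zone *z,
				  unsigned long start_pfn, unsigned long end_pfn)
{
	return toy_zone_is_contiguous(z) &&
	       start_pfn >= z->start_pfn && end_pfn <= toy_zone_end_pfn(z);
}

int main(void)
{
	struct toy_zone z = { .start_pfn = 0, .spanned_pages = 1000,
			      .online_pages = 1000 };

	assert(toy_can_use_fast_path(&z, 512, 1000));   /* fully inside */
	assert(!toy_can_use_fast_path(&z, 512, 1024));  /* crosses zone end */
	return 0;
}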
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-11 12:19 ` David Hildenbrand (Arm) @ 2026-02-12 8:32 ` Mike Rapoport 2026-02-12 8:45 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 13+ messages in thread From: Mike Rapoport @ 2026-02-12 8:32 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On Wed, Feb 11, 2026 at 01:19:56PM +0100, David Hildenbrand (Arm) wrote: > > > * > > > + * online_pages is pages within the zone that have an online memmap. > > > + * online_pages include present pages and memory holes that have a > > > + * memmap. When spanned_pages == online_pages, pfn_to_page() can be > > > + * performed without further checks on any pfn within the zone span. > > > > Maybe pages_with_memmap? It would stand off from managed, spanned and > > present, but it's clearer than online IMHO. > > offline pages also have a memmap, but that should not be touched as it might > contain garbage. So it's a bit more tricky :) Naming is hard :) But I still think mentioning memmap there is useful :) > Looking at set_zone_contiguous(), __pageblock_pfn_to_page() takes care of a > weird case where the end of a zone falls into the middle of a pageblock. > > I am not even sure if that is possible It's possible if a pageblock crosses node boundary. We also might add VM_BUG_ON(pageblock_crosses_nodes(), "FIX YOUR FIRMWARE!") there ;-) > but we could handle that easily in pageblock_pfn_to_page() by checking > the requested range against the zone spanned range. Agree. > Then the semantics "zone->online_pages" would be less weird and more closely > resemble "pages with online memmap". > > init_unavailable_range() might indeed do the trick! > > @Tianyou, can you explore that direction? I know, your PTO is coming up. > > -- > Cheers, > > David -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-12 8:32 ` Mike Rapoport @ 2026-02-12 8:45 ` David Hildenbrand (Arm) 0 siblings, 0 replies; 13+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-12 8:45 UTC (permalink / raw) To: Mike Rapoport Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 2/12/26 09:32, Mike Rapoport wrote: > On Wed, Feb 11, 2026 at 01:19:56PM +0100, David Hildenbrand (Arm) wrote: >>> >>> Maybe pages_with_memmap? It would stand off from managed, spanned and >>> present, but it's clearer than online IMHO. >> >> offline pages also have a memmap, but that should not be touched as it might >> contain garbage. So it's a bit more tricky :) > > Naming is hard :) > But I still think mentioning memmap there is useful :) pages_with_online_memmap Is a bit mouthful, but that shouldn't really be a problem here. -- Cheers, David ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range 2026-02-08 19:39 ` Mike Rapoport 2026-02-09 10:52 ` David Hildenbrand (Arm) @ 2026-02-09 11:38 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 13+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 11:38 UTC (permalink / raw) To: Mike Rapoport Cc: Tianyou Li, Oscar Salvador, Wei Yang, Michal Hocko, linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Chen Zhang, linux-kernel On 2/8/26 20:39, Mike Rapoport wrote: > On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote: >> On 1/30/26 17:37, Tianyou Li wrote: >>> When invoke move_pfn_range_to_zone or remove_pfn_range_from_zone, it will >>> update the zone->contiguous by checking the new zone's pfn range from the >>> beginning to the end, regardless the previous state of the old zone. When >>> the zone's pfn range is large, the cost of traversing the pfn range to >>> update the zone->contiguous could be significant. >>> >>> Add fast paths to quickly detect cases where zone is definitely not >>> contiguous without scanning the new zone. The cases are: when the new range >>> did not overlap with previous range, the contiguous should be false; if the >>> new range adjacent with the previous range, just need to check the new >>> range; if the new added pages could not fill the hole of previous zone, the >>> contiguous should be false. >>> >>> The following test cases of memory hotplug for a VM [1], tested in the >>> environment [2], show that this optimization can significantly reduce the >>> memory hotplug time [3]. >>> >>> +----------------+------+---------------+--------------+----------------+ >>> | | Size | Time (before) | Time (after) | Time Reduction | >>> | +------+---------------+--------------+----------------+ >>> | Plug Memory | 256G | 10s | 2s | 80% | >>> | +------+---------------+--------------+----------------+ >>> | | 512G | 33s | 6s | 81% | >>> +----------------+------+---------------+--------------+----------------+ >>> >>> +----------------+------+---------------+--------------+----------------+ >>> | | Size | Time (before) | Time (after) | Time Reduction | >>> | +------+---------------+--------------+----------------+ >>> | Unplug Memory | 256G | 10s | 2s | 80% | >>> | +------+---------------+--------------+----------------+ >>> | | 512G | 34s | 6s | 82% | >>> +----------------+------+---------------+--------------+----------------+ >>> >>> [1] Qemu commands to hotplug 256G/512G memory for a VM: >>> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on >>> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1 >>> qom-set vmem1 requested-size 256G/512G (Plug Memory) >>> qom-set vmem1 requested-size 0G (Unplug Memory) >>> >>> [2] Hardware : Intel Icelake server >>> Guest Kernel : v6.18-rc2 >>> Qemu : v9.0.0 >>> >>> Launch VM : >>> qemu-system-x86_64 -accel kvm -cpu host \ >>> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \ >>> -drive file=./seed.img,format=raw,if=virtio \ >>> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \ >>> -m 2G,slots=10,maxmem=2052472M \ >>> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \ >>> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \ >>> -nographic -machine q35 \ >>> -nic user,hostfwd=tcp::3000-:22 >>> >>> Guest kernel auto-onlines newly added memory blocks: >>> echo online > /sys/devices/system/memory/auto_online_blocks >>> >>> [3] The time from typing the QEMU commands in [1] to when the 
output of >>> 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged >>> memory is recognized. >>> >>> Reported-by: Nanhai Zou <nanhai.zou@intel.com> >>> Reported-by: Chen Zhang <zhangchen.kidd@jd.com> >>> Tested-by: Yuan Liu <yuan1.liu@intel.com> >>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> >>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> >>> Reviewed-by: Yu C Chen <yu.c.chen@intel.com> >>> Reviewed-by: Pan Deng <pan.deng@intel.com> >>> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> >>> Reviewed-by: Yuan Liu <yuan1.liu@intel.com> >>> Signed-off-by: Tianyou Li <tianyou.li@intel.com> >>> --- >> >> Thanks for all your work on this and sorry for being slower with >> review the last month. >> >> While I was in the shower I was thinking about how much I hate >> zone->contiguous + the pageblock walking, and how we could just get >> rid of it. >> >> You know, just what you do while having a relaxing shower. >> >> >> And I was wondering: >> >> (a) in which case would we have zone_spanned_pages == zone_present_pages >> and the zone *not* being contiguous? I assume this just cannot happen, >> otherwise BUG. >> >> (b) in which case would we have zone_spanned_pages != zone_present_pages >> and the zone *being* contiguous? I assume in some cases where we have small >> holes within a pageblock? >> >> Reading the doc of __pageblock_pfn_to_page(), there are some weird >> scenarios with holes in pageblocks. > > It seems that "zone->contigous" is really bad name for what this thing > represents. > > tl;dr I don't think zone_spanned_pages == zone_present_pages is related to > zone->contigous at all :) > > If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the > check for zone->contigous should guarantee that the entire pageblock has a > valid memory map and that the entire pageblock fits a zone and does not > cross zone/node boundaries. > > For coldplug memory the memory map is valid for every section that has > present memory, i.e. even it there is a hole in a section, it's memory map > will be populated and will have struct pages. > > When zone->contigous is false, the slow path in __pageblock_pfn_to_page() > essentially checks if the first page in a pageblock is online and if first > and last pages are in the zone being compacted. > > AFAIU, in the hotplug case an entire pageblock is always onlined to the > same zone, so zone->contigous won't change after the hotplug is complete. > > We might set it to false in the beginning of the hotplug to avoid scanning > offline pages, although I'm not sure if it's possible. > > But in the end of hotplug we can simply restore the old value and move on. > > For the coldplug case I'm also not sure it's worth the hassle, we could > just let compaction scan a few more pfns for those rare weird pageblocks > and bail out on wrong page conditions. > >> I.e., on my notebook I have >> >> $ cat /proc/zoneinfo | grep -E "Node|spanned|present" >> Node 0, zone DMA >> spanned 4095 >> present 3999 >> Node 0, zone DMA32 >> spanned 1044480 >> present 439600 > > I suspect this one is contigous ;-) Just checked. It's not. Probably because there are some holes that are entirely without a memmap. (PCI hole) -- Cheers, David ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-02-12 8:46 UTC | newest] Thread overview: 13+ messages -- 2026-01-30 16:37 [PATCH v9 0/2] Optimize zone->contiguous update Tianyou Li 2026-01-30 16:37 ` [PATCH v9 1/2] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages() Tianyou Li 2026-01-30 16:37 ` [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li 2026-02-07 11:00 ` David Hildenbrand (Arm) 2026-02-08 19:39 ` Mike Rapoport 2026-02-09 10:52 ` David Hildenbrand (Arm) 2026-02-09 12:44 ` David Hildenbrand (Arm) 2026-02-10 11:44 ` Mike Rapoport 2026-02-10 15:28 ` Li, Tianyou 2026-02-11 12:19 ` David Hildenbrand (Arm) 2026-02-12 8:32 ` Mike Rapoport 2026-02-12 8:45 ` David Hildenbrand (Arm) 2026-02-09 11:38 ` David Hildenbrand (Arm)