* [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
@ 2026-04-08 3:16 Yuan Liu
2026-04-08 7:36 ` David Hildenbrand (Arm)
2026-04-13 13:06 ` Wei Yang
0 siblings, 2 replies; 9+ messages in thread
From: Yuan Liu @ 2026-04-08 3:16 UTC (permalink / raw)
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo,
Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
to rebuild zone->contiguous. For large zones this is a significant cost
during memory hotplug and hot-unplug.
Add a new zone member pages_with_online_memmap that tracks the number of
pages within the zone span that have an online memory map (including present
pages and memory holes whose memory map has been initialized). When
spanned_pages == pages_with_online_memmap the zone is contiguous and
pfn_to_page() can be called on any PFN in the zone span without further
pfn_valid() checks.
Only pages that fall within the current zone span are accounted towards
pages_with_online_memmap. A "too small" value is safe; it merely prevents
detecting a contiguous zone.
The following test cases of memory hotplug for a VM [1], tested in the
environment [2], show that this optimization can significantly reduce the
memory hotplug time [3].
+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Plug Memory    | 256G | 10s           | 3s           | 70%            |
|                +------+---------------+--------------+----------------+
|                | 512G | 36s           | 7s           | 81%            |
+----------------+------+---------------+--------------+----------------+

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Unplug Memory  | 256G | 11s           | 4s           | 64%            |
|                +------+---------------+--------------+----------------+
|                | 512G | 36s           | 9s           | 75%            |
+----------------+------+---------------+--------------+----------------+
[1] QEMU commands to hotplug 256G/512G memory for a VM:
object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
qom-set vmem1 requested-size 256G/512G (Plug Memory)
qom-set vmem1 requested-size 0G (Unplug Memory)
[2] Hardware : Intel Icelake server
Guest Kernel : v7.0-rc4
Qemu : v9.0.0
Launch VM :
qemu-system-x86_64 -accel kvm -cpu host \
-drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
-drive file=./seed.img,format=raw,if=virtio \
-smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
-m 2G,slots=10,maxmem=2052472M \
-device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
-device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
-nographic -machine q35 \
-nic user,hostfwd=tcp::3000-:22
Guest kernel auto-onlines newly added memory blocks:
echo online > /sys/devices/system/memory/auto_online_blocks
[3] The time from typing the QEMU commands in [1] until the output of
'grep MemTotal /proc/meminfo' in the guest shows that all hotplugged
memory is recognized.
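The elapsed time in [3] can be captured with a small polling helper such as
the one below (hypothetical, not part of the patch; the polled grep pattern
and timeout are illustrative and depend on the hotplugged size):

```shell
#!/bin/sh
# wait_until <timeout_s> <cmd...>: poll once a second until <cmd> succeeds,
# then print the elapsed whole seconds; fail after <timeout_s> seconds.
wait_until() {
	timeout_s=$1; shift
	start=$(date +%s)
	while ! "$@" >/dev/null 2>&1; do
		[ $(( $(date +%s) - start )) -ge "$timeout_s" ] && return 1
		sleep 1
	done
	echo $(( $(date +%s) - start ))
}

# Illustrative use in the guest, right after 'qom-set vmem1 requested-size 256G':
# wait_until 600 grep -q 'MemTotal: *27[0-9]\{7\} kB' /proc/meminfo
```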
Reported-by: Nanhai Zou <nanhai.zou@intel.com>
Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
Tested-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Co-developed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
Documentation/mm/physical_memory.rst | 13 ++++++++
drivers/base/memory.c | 6 ++++
include/linux/mmzone.h | 47 ++++++++++++++++++++++++++++
mm/internal.h | 8 +----
mm/memory_hotplug.c | 12 ++-----
mm/mm_init.c | 42 ++++++++++---------------
6 files changed, 86 insertions(+), 42 deletions(-)
diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
index b76183545e5b..0aa65e6b5499 100644
--- a/Documentation/mm/physical_memory.rst
+++ b/Documentation/mm/physical_memory.rst
@@ -483,6 +483,19 @@ General
``present_pages`` should use ``get_online_mems()`` to get a stable value. It
is initialized by ``calculate_node_totalpages()``.
+``pages_with_online_memmap``
+ Tracks pages within the zone that have an online memory map (present pages
+ and memory holes whose memory map has been initialized). When
+ ``spanned_pages`` == ``pages_with_online_memmap``, ``pfn_to_page()`` can be
+ performed without further checks on any PFN within the zone span.
+
+ Note: this counter may temporarily undercount when pages with an online
+ memory map exist outside the current zone span. This can only happen during
+ boot, when initializing the memory map of pages that do not fall into any
+ zone span. Growing the zone to cover such pages and later shrinking it back
+ may result in a "too small" value. This is safe: it merely prevents
+ detecting a contiguous zone.
+
``present_early_pages``
The present pages existing within the zone located on memory available since
early boot, excluding hotplugged memory. Defined only when
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index a3091924918b..2b6b4e5508af 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,6 +246,7 @@ static int memory_block_online(struct memory_block *mem)
nr_vmemmap_pages = mem->altmap->free;
mem_hotplug_begin();
+ clear_zone_contiguous(zone);
if (nr_vmemmap_pages) {
ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
if (ret)
@@ -270,6 +271,7 @@ static int memory_block_online(struct memory_block *mem)
mem->zone = zone;
out:
+ set_zone_contiguous(zone);
mem_hotplug_done();
return ret;
}
@@ -282,6 +284,7 @@ static int memory_block_offline(struct memory_block *mem)
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
unsigned long nr_vmemmap_pages = 0;
+ struct zone *zone;
int ret;
if (!mem->zone)
@@ -294,7 +297,9 @@ static int memory_block_offline(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;
+ zone = mem->zone;
mem_hotplug_begin();
+ clear_zone_contiguous(zone);
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-nr_vmemmap_pages);
@@ -314,6 +319,7 @@ static int memory_block_offline(struct memory_block *mem)
mem->zone = NULL;
out:
+ set_zone_contiguous(zone);
mem_hotplug_done();
return ret;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..d4dd37a7222a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -943,6 +943,20 @@ struct zone {
* cma pages is present pages that are assigned for CMA use
* (MIGRATE_CMA).
*
+ * pages_with_online_memmap tracks pages within the zone that have
+ * an online memory map (present pages and memory holes whose memory
+ * map has been initialized). When spanned_pages ==
+ * pages_with_online_memmap, pfn_to_page() can be performed without
+ * further checks on any PFN within the zone span.
+ *
+ * Note: this counter may temporarily undercount when pages with an
+ * online memory map exist outside the current zone span. This can
+ * only happen during boot, when initializing the memory map of
+ * pages that do not fall into any zone span. Growing the zone to
+ * cover such pages and later shrinking it back may result in a
+ * "too small" value. This is safe: it merely prevents detecting a
+ * contiguous zone.
+ *
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
@@ -967,6 +981,7 @@ struct zone {
atomic_long_t managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
+ unsigned long pages_with_online_memmap;
#if defined(CONFIG_MEMORY_HOTPLUG)
unsigned long present_early_pages;
#endif
@@ -1601,6 +1616,38 @@ static inline bool zone_is_zone_device(const struct zone *zone)
}
#endif
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any PFN in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Note that missing synchronization with memory offlining makes any PFN
+ * traversal prone to races.
+ *
+ * ZONE_DEVICE zones are always marked non-contiguous.
+ *
+ * Return: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+ return zone->contiguous;
+}
+
+static inline void set_zone_contiguous(struct zone *zone)
+{
+ if (zone_is_zone_device(zone))
+ return;
+ if (zone->spanned_pages == zone->pages_with_online_memmap)
+ zone->contiguous = true;
+}
+
+static inline void clear_zone_contiguous(struct zone *zone)
+{
+ zone->contiguous = false;
+}
+
/*
* Returns true if a zone has pages managed by the buddy allocator.
* All the reclaim decisions have to use this function rather than
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..92fee035c3f2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -793,21 +793,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
unsigned long end_pfn, struct zone *zone)
{
- if (zone->contiguous)
+ if (zone_is_contiguous(zone))
return pfn_to_page(start_pfn);
return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}
-void set_zone_contiguous(struct zone *zone);
bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
unsigned long nr_pages);
-static inline void clear_zone_contiguous(struct zone *zone)
-{
- zone->contiguous = false;
-}
-
extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bc805029da51..3f73fcb042cf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
/*
* Zone shrinking code cannot properly deal with ZONE_DEVICE. So
- * we will not try to shrink the zones - which is okay as
- * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+ * we will not try to shrink it.
*/
if (zone_is_zone_device(zone))
return;
- clear_zone_contiguous(zone);
-
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
-
- set_zone_contiguous(zone);
}
/**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
- clear_zone_contiguous(zone);
-
if (zone_is_empty(zone))
init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype,
isolate_pageblock);
-
- set_zone_contiguous(zone);
}
struct auto_movable_stats {
@@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
if (early_section(__pfn_to_section(page_to_pfn(page))))
zone->present_early_pages += nr_pages;
zone->present_pages += nr_pages;
+ zone->pages_with_online_memmap += nr_pages;
zone->zone_pgdat->node_present_pages += nr_pages;
if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index df34797691bd..d88ba739ab3d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
* zone/node above the hole except for the trailing pages in the last
* section that will be appended to the zone/node below.
*/
-static void __init init_unavailable_range(unsigned long spfn,
+static unsigned long __init init_unavailable_range(unsigned long spfn,
unsigned long epfn,
int zone, int node)
{
@@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
if (pgcnt)
pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
node, zone_names[zone], pgcnt);
+ return pgcnt;
}
/*
@@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
false);
+ zone->pages_with_online_memmap += end_pfn - start_pfn;
- if (*hole_pfn < start_pfn)
- init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+ if (*hole_pfn < start_pfn) {
+ unsigned long pgcnt;
+
+ if (*hole_pfn < zone_start_pfn) {
+ init_unavailable_range(*hole_pfn, zone_start_pfn,
+ zone_id, nid);
+ pgcnt = init_unavailable_range(zone_start_pfn,
+ start_pfn, zone_id, nid);
+ } else {
+ pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
+ zone_id, nid);
+ }
+ zone->pages_with_online_memmap += pgcnt;
+ }
*hole_pfn = end_pfn;
}
@@ -2261,28 +2275,6 @@ void __init init_cma_pageblock(struct page *page)
}
#endif
-void set_zone_contiguous(struct zone *zone)
-{
- unsigned long block_start_pfn = zone->zone_start_pfn;
- unsigned long block_end_pfn;
-
- block_end_pfn = pageblock_end_pfn(block_start_pfn);
- for (; block_start_pfn < zone_end_pfn(zone);
- block_start_pfn = block_end_pfn,
- block_end_pfn += pageblock_nr_pages) {
-
- block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
- if (!__pageblock_pfn_to_page(block_start_pfn,
- block_end_pfn, zone))
- return;
- cond_resched();
- }
-
- /* We confirm that there is no hole */
- zone->contiguous = true;
-}
-
/*
* Check if a PFN range intersects multiple zones on one or more
* NUMA nodes. Specify the @nid argument if it is known that this
--
2.47.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-08 3:16 [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
@ 2026-04-08 7:36 ` David Hildenbrand (Arm)
2026-04-08 12:29 ` Liu, Yuan1
2026-04-09 14:40 ` Mike Rapoport
2026-04-13 13:06 ` Wei Yang
1 sibling, 2 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-08 7:36 UTC (permalink / raw)
To: Yuan Liu, Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Yong Hu, Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen,
Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
On 4/8/26 05:16, Yuan Liu wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> to rebuild zone->contiguous. For large zones this is a significant cost
> during memory hotplug and hot-unplug.
>
> Add a new zone member pages_with_online_memmap that tracks the number of
> pages within the zone span that have an online memory map (including present
> pages and memory holes whose memory map has been initialized). When
> spanned_pages == pages_with_online_memmap the zone is contiguous and
> pfn_to_page() can be called on any PFN in the zone span without further
> pfn_valid() checks.
>
> Only pages that fall within the current zone span are accounted towards
> pages_with_online_memmap. A "too small" value is safe, it merely prevents
> detecting a contiguous zone.
>
> The following test cases of memory hotplug for a VM [1], tested in the
> environment [2], show that this optimization can significantly reduce the
> memory hotplug time [3].
>
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Plug Memory    | 256G | 10s           | 3s           | 70%            |
> |                +------+---------------+--------------+----------------+
> |                | 512G | 36s           | 7s           | 81%            |
> +----------------+------+---------------+--------------+----------------+
>
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Unplug Memory  | 256G | 11s           | 4s           | 64%            |
> |                +------+---------------+--------------+----------------+
> |                | 512G | 36s           | 9s           | 75%            |
> +----------------+------+---------------+--------------+----------------+
>
> [1] Qemu commands to hotplug 256G/512G memory for a VM:
> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> qom-set vmem1 requested-size 256G/512G (Plug Memory)
> qom-set vmem1 requested-size 0G (Unplug Memory)
>
> [2] Hardware : Intel Icelake server
> Guest Kernel : v7.0-rc4
> Qemu : v9.0.0
>
> Launch VM :
> qemu-system-x86_64 -accel kvm -cpu host \
> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> -drive file=./seed.img,format=raw,if=virtio \
> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> -m 2G,slots=10,maxmem=2052472M \
> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> -nographic -machine q35 \
> -nic user,hostfwd=tcp::3000-:22
>
> Guest kernel auto-onlines newly added memory blocks:
> echo online > /sys/devices/system/memory/auto_online_blocks
>
> [3] The time from typing the QEMU commands in [1] to when the output of
> 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
> memory is recognized.
>
> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
> Tested-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
> Reviewed-by: Pan Deng <pan.deng@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> Co-developed-by: Tianyou Li <tianyou.li@intel.com>
> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
[...]
> @@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> * zone/node above the hole except for the trailing pages in the last
> * section that will be appended to the zone/node below.
> */
> -static void __init init_unavailable_range(unsigned long spfn,
> +static unsigned long __init init_unavailable_range(unsigned long spfn,
> unsigned long epfn,
> int zone, int node)
> {
> @@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> if (pgcnt)
> pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
> node, zone_names[zone], pgcnt);
> + return pgcnt;
> }
>
> /*
> @@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
> memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> false);
> + zone->pages_with_online_memmap += end_pfn - start_pfn;
>
> - if (*hole_pfn < start_pfn)
> - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> + if (*hole_pfn < start_pfn) {
> + unsigned long pgcnt;
> +
> + if (*hole_pfn < zone_start_pfn) {
> + init_unavailable_range(*hole_pfn, zone_start_pfn,
> + zone_id, nid);
> + pgcnt = init_unavailable_range(zone_start_pfn,
> + start_pfn, zone_id, nid);
Indentation of parameters.
> + } else {
> + pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
> + zone_id, nid);
Same here.
> + }
> + zone->pages_with_online_memmap += pgcnt;
> + }
Maybe something like the following could make it nicer to read, just a
thought.
unsigned long hole_start_pfn = *hole_pfn;
if (hole_start_pfn < zone_start_pfn) {
init_unavailable_range(hole_start_pfn, zone_start_pfn,
zone_id, nid);
hole_start_pfn = zone_start_pfn;
}
pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
zone_id, nid);
LGTM, thanks!
--
Cheers,
David
* RE: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-08 7:36 ` David Hildenbrand (Arm)
@ 2026-04-08 12:29 ` Liu, Yuan1
2026-04-08 12:31 ` David Hildenbrand (Arm)
2026-04-09 14:40 ` Mike Rapoport
1 sibling, 1 reply; 9+ messages in thread
From: Liu, Yuan1 @ 2026-04-08 12:29 UTC (permalink / raw)
To: David Hildenbrand (Arm), Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Chen,
Yu C, Deng, Pan, Li, Tianyou, Chen Zhang, linux-kernel
> -----Original Message-----
> From: David Hildenbrand (Arm) <david@kernel.org>
> Sent: Wednesday, April 8, 2026 3:36 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>; Oscar Salvador <osalvador@suse.de>;
> Mike Rapoport <rppt@kernel.org>; Wei Yang <richard.weiyang@gmail.com>
> Cc: linux-mm@kvack.org; Hu, Yong <yong.hu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>; Tim Chen <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu
> <qiuxu.zhuo@intel.com>; Chen, Yu C <yu.c.chen@intel.com>; Deng, Pan
> <pan.deng@intel.com>; Li, Tianyou <tianyou.li@intel.com>; Chen Zhang
> <zhangchen.kidd@jd.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous
> check when changing pfn range
>
> On 4/8/26 05:16, Yuan Liu wrote:
> > When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> > zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> > to rebuild zone->contiguous. For large zones this is a significant cost
> > during memory hotplug and hot-unplug.
> >
> > Add a new zone member pages_with_online_memmap that tracks the number of
> > pages within the zone span that have an online memory map (including present
> > pages and memory holes whose memory map has been initialized). When
> > spanned_pages == pages_with_online_memmap the zone is contiguous and
> > pfn_to_page() can be called on any PFN in the zone span without further
> > pfn_valid() checks.
> >
> > Only pages that fall within the current zone span are accounted towards
> > pages_with_online_memmap. A "too small" value is safe, it merely prevents
> > detecting a contiguous zone.
> >
> > The following test cases of memory hotplug for a VM [1], tested in the
> > environment [2], show that this optimization can significantly reduce the
> > memory hotplug time [3].
> >
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Plug Memory    | 256G | 10s           | 3s           | 70%            |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G | 36s           | 7s           | 81%            |
> > +----------------+------+---------------+--------------+----------------+
> >
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Unplug Memory  | 256G | 11s           | 4s           | 64%            |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G | 36s           | 9s           | 75%            |
> > +----------------+------+---------------+--------------+----------------+
> >
> > [1] Qemu commands to hotplug 256G/512G memory for a VM:
> > object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> > device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> > qom-set vmem1 requested-size 256G/512G (Plug Memory)
> > qom-set vmem1 requested-size 0G (Unplug Memory)
> >
> > [2] Hardware : Intel Icelake server
> > Guest Kernel : v7.0-rc4
> > Qemu : v9.0.0
> >
> > Launch VM :
> > qemu-system-x86_64 -accel kvm -cpu host \
> > -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> > -drive file=./seed.img,format=raw,if=virtio \
> > -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> > -m 2G,slots=10,maxmem=2052472M \
> > -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> > -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> > -nographic -machine q35 \
> > -nic user,hostfwd=tcp::3000-:22
> >
> > Guest kernel auto-onlines newly added memory blocks:
> > echo online > /sys/devices/system/memory/auto_online_blocks
> >
> > [3] The time from typing the QEMU commands in [1] to when the output of
> > 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
> > memory is recognized.
> >
> > Reported-by: Nanhai Zou <nanhai.zou@intel.com>
> > Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
> > Tested-by: Yuan Liu <yuan1.liu@intel.com>
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
> > Reviewed-by: Pan Deng <pan.deng@intel.com>
> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> > Co-developed-by: Tianyou Li <tianyou.li@intel.com>
> > Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> > Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > ---
>
> [...]
>
> > @@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > * zone/node above the hole except for the trailing pages in the last
> > * section that will be appended to the zone/node below.
> > */
> > -static void __init init_unavailable_range(unsigned long spfn,
> > +static unsigned long __init init_unavailable_range(unsigned long spfn,
> > unsigned long epfn,
> > int zone, int node)
> > {
> > @@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> > if (pgcnt)
> > 	if (pgcnt)
> > 		pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
> > node, zone_names[zone], pgcnt);
> > + return pgcnt;
> > }
> >
> > /*
> > @@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
> > memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> > zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> > false);
> > + zone->pages_with_online_memmap += end_pfn - start_pfn;
> >
> > - if (*hole_pfn < start_pfn)
> > - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > + if (*hole_pfn < start_pfn) {
> > + unsigned long pgcnt;
> > +
> > + if (*hole_pfn < zone_start_pfn) {
> > + init_unavailable_range(*hole_pfn, zone_start_pfn,
> > + zone_id, nid);
> > + pgcnt = init_unavailable_range(zone_start_pfn,
> > + start_pfn, zone_id, nid);
>
> Indentation of parameters.
Got it, I'll fix the indentation.
>
> > + } else {
> > + pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
> > + zone_id, nid);
>
>
> Same here.
Sure
> > + }
> > + zone->pages_with_online_memmap += pgcnt;
> > + }
>
>
> Maybe something like the following could make it nicer to read, just a
> thought.
>
> unsigned long hole_start_pfn = *hole_pfn;
>
> if (hole_start_pfn < zone_start_pfn) {
> init_unavailable_range(hole_start_pfn, zone_start_pfn,
> zone_id, nid);
> hole_start_pfn = zone_start_pfn;
> }
> pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
> zone_id, nid);
Yes, this looks better. I'll apply your suggestion
>
> LGTM, thanks!
Thanks for the feedback; I'll include these changes in the next version.
> --
> Cheers,
>
> David
* Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-08 12:29 ` Liu, Yuan1
@ 2026-04-08 12:31 ` David Hildenbrand (Arm)
2026-04-08 12:37 ` Liu, Yuan1
0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-08 12:31 UTC (permalink / raw)
To: Liu, Yuan1, Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Chen,
Yu C, Deng, Pan, Li, Tianyou, Chen Zhang, linux-kernel
>>
>> Maybe something like the following could make it nicer to read, just a
>> thought.
>>
>> unsigned long hole_start_pfn = *hole_pfn;
>>
>> if (hole_start_pfn < zone_start_pfn) {
>> init_unavailable_range(hole_start_pfn, zone_start_pfn,
>> zone_id, nid);
>> hole_start_pfn = zone_start_pfn;
>> }
>> pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
>> zone_id, nid);
>
> Yes, this looks better. I'll apply your suggestion
Best to wait for Mike's comments first! :)
--
Cheers,
David
* RE: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-08 12:31 ` David Hildenbrand (Arm)
@ 2026-04-08 12:37 ` Liu, Yuan1
0 siblings, 0 replies; 9+ messages in thread
From: Liu, Yuan1 @ 2026-04-08 12:37 UTC (permalink / raw)
To: David Hildenbrand (Arm), Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Chen,
Yu C, Deng, Pan, Li, Tianyou, Chen Zhang, linux-kernel
> -----Original Message-----
> From: David Hildenbrand (Arm) <david@kernel.org>
> Sent: Wednesday, April 8, 2026 8:31 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>; Oscar Salvador <osalvador@suse.de>;
> Mike Rapoport <rppt@kernel.org>; Wei Yang <richard.weiyang@gmail.com>
> Cc: linux-mm@kvack.org; Hu, Yong <yong.hu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>; Tim Chen <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu
> <qiuxu.zhuo@intel.com>; Chen, Yu C <yu.c.chen@intel.com>; Deng, Pan
> <pan.deng@intel.com>; Li, Tianyou <tianyou.li@intel.com>; Chen Zhang
> <zhangchen.kidd@jd.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous
> check when changing pfn range
>
>
> >>
> >> Maybe something like the following could make it nicer to read, just a
> >> thought.
> >>
> >> unsigned long hole_start_pfn = *hole_pfn;
> >>
> >> if (hole_start_pfn < zone_start_pfn) {
> >> init_unavailable_range(hole_start_pfn, zone_start_pfn,
> >> zone_id, nid);
> >> hole_start_pfn = zone_start_pfn;
> >> }
> >> pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
> >> zone_id, nid);
> >
> > Yes, this looks better. I'll apply your suggestion
>
> Best to wait for Mike's comments first! :)
Sure, I'll wait for Mike's comments before sending the next version.
> --
> Cheers,
>
> David
* Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-08 7:36 ` David Hildenbrand (Arm)
2026-04-08 12:29 ` Liu, Yuan1
@ 2026-04-09 14:40 ` Mike Rapoport
2026-04-09 15:08 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2026-04-09 14:40 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Yuan Liu, Oscar Salvador, Wei Yang, linux-mm, Yong Hu,
Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng,
Tianyou Li, Chen Zhang, linux-kernel
On Wed, Apr 08, 2026 at 09:36:14AM +0200, David Hildenbrand (Arm) wrote:
> On 4/8/26 05:16, Yuan Liu wrote:
> > When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> > zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> > to rebuild zone->contiguous. For large zones this is a significant cost
> > during memory hotplug and hot-unplug.
> >
> > Add a new zone member pages_with_online_memmap that tracks the number of
> > pages within the zone span that have an online memory map (including present
> > pages and memory holes whose memory map has been initialized). When
> > spanned_pages == pages_with_online_memmap the zone is contiguous and
> > pfn_to_page() can be called on any PFN in the zone span without further
> > pfn_valid() checks.
> >
> > Only pages that fall within the current zone span are accounted towards
> > pages_with_online_memmap. A "too small" value is safe, it merely prevents
> > detecting a contiguous zone.
> >
> > The following test cases of memory hotplug for a VM [1], tested in the
> > environment [2], show that this optimization can significantly reduce the
> > memory hotplug time [3].
> >
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Plug Memory    | 256G | 10s           | 3s           | 70%            |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G | 36s           | 7s           | 81%            |
> > +----------------+------+---------------+--------------+----------------+
> >
> > +----------------+------+---------------+--------------+----------------+
> > |                | Size | Time (before) | Time (after) | Time Reduction |
> > |                +------+---------------+--------------+----------------+
> > | Unplug Memory  | 256G | 11s           | 4s           | 64%            |
> > |                +------+---------------+--------------+----------------+
> > |                | 512G | 36s           | 9s           | 75%            |
> > +----------------+------+---------------+--------------+----------------+
> >
> > [1] Qemu commands to hotplug 256G/512G memory for a VM:
> > object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> > device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> > qom-set vmem1 requested-size 256G/512G (Plug Memory)
> > qom-set vmem1 requested-size 0G (Unplug Memory)
> >
> > [2] Hardware : Intel Icelake server
> > Guest Kernel : v7.0-rc4
> > Qemu : v9.0.0
> >
> > Launch VM :
> > qemu-system-x86_64 -accel kvm -cpu host \
> > -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> > -drive file=./seed.img,format=raw,if=virtio \
> > -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> > -m 2G,slots=10,maxmem=2052472M \
> > -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> > -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> > -nographic -machine q35 \
> > -nic user,hostfwd=tcp::3000-:22
> >
> > Guest kernel auto-onlines newly added memory blocks:
> > echo online > /sys/devices/system/memory/auto_online_blocks
> >
> > [3] The time from typing the QEMU commands in [1] to when the output of
> > 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
> > memory is recognized.
> >
> > Reported-by: Nanhai Zou <nanhai.zou@intel.com>
> > Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
> > Tested-by: Yuan Liu <yuan1.liu@intel.com>
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
> > Reviewed-by: Pan Deng <pan.deng@intel.com>
> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> > Co-developed-by: Tianyou Li <tianyou.li@intel.com>
> > Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> > Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > ---
>
> [...]
>
> > @@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > * zone/node above the hole except for the trailing pages in the last
> > * section that will be appended to the zone/node below.
> > */
> > -static void __init init_unavailable_range(unsigned long spfn,
> > +static unsigned long __init init_unavailable_range(unsigned long spfn,
> > unsigned long epfn,
> > int zone, int node)
> > {
> > @@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> > if (pgcnt)
> > pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
> > node, zone_names[zone], pgcnt);
> > + return pgcnt;
> > }
> >
> > /*
> > @@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
> > memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> > zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> > false);
> > + zone->pages_with_online_memmap += end_pfn - start_pfn;
> >
> > - if (*hole_pfn < start_pfn)
> > - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > + if (*hole_pfn < start_pfn) {
> > + unsigned long pgcnt;
> > +
> > + if (*hole_pfn < zone_start_pfn) {
> > + init_unavailable_range(*hole_pfn, zone_start_pfn,
> > + zone_id, nid);
> > + pgcnt = init_unavailable_range(zone_start_pfn,
> > + start_pfn, zone_id, nid);
>
> Indentation of parameters.
>
> > + } else {
> > + pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
> > + zone_id, nid);
>
>
> Same here.
>
> > + }
> > + zone->pages_with_online_memmap += pgcnt;
> > + }
>
>
> Maybe something like the following could make it nicer to read, just a
> thought.
>
>
> unsigned long hole_start_pfn = *hole_pfn;
>
> if (hole_start_pfn < zone_start_pfn) {
> init_unavailable_range(hole_start_pfn, zone_start_pfn,
> zone_id, nid);
> hole_start_pfn = zone_start_pfn;
> }
> pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
> zone_id, nid);
>
Yeah, this looks better :)
sashiko had several comments
https://sashiko.dev/#/patchset/20260408031615.1831922-1-yuan1.liu%40intel.com
I skipped the ones related to hotplug, but in the mm_init part the comment
about zones that can have overlapping physical spans when mirrored
kernelcore is enabled seems valid.
> --
> Cheers,
> David
--
Sincerely yours,
Mike.
* Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-09 14:40 ` Mike Rapoport
@ 2026-04-09 15:08 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-09 15:08 UTC (permalink / raw)
To: Mike Rapoport
Cc: Yuan Liu, Oscar Salvador, Wei Yang, linux-mm, Yong Hu,
Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng,
Tianyou Li, Chen Zhang, linux-kernel
On 4/9/26 16:40, Mike Rapoport wrote:
> On Wed, Apr 08, 2026 at 09:36:14AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/8/26 05:16, Yuan Liu wrote:
>>> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
>>> zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
>>> to rebuild zone->contiguous. For large zones this is a significant cost
>>> during memory hotplug and hot-unplug.
>>>
>>> Add a new zone member pages_with_online_memmap that tracks the number of
>>> pages within the zone span that have an online memory map (including present
>>> pages and memory holes whose memory map has been initialized). When
>>> spanned_pages == pages_with_online_memmap the zone is contiguous and
>>> pfn_to_page() can be called on any PFN in the zone span without further
>>> pfn_valid() checks.
>>>
>>> Only pages that fall within the current zone span are accounted towards
>>> pages_with_online_memmap. A "too small" value is safe, it merely prevents
>>> detecting a contiguous zone.
>>>
>>> The following test cases of memory hotplug for a VM [1], tested in the
>>> environment [2], show that this optimization can significantly reduce the
>>> memory hotplug time [3].
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> | | Size | Time (before) | Time (after) | Time Reduction |
>>> | +------+---------------+--------------+----------------+
>>> | Plug Memory | 256G | 10s | 3s | 70% |
>>> | +------+---------------+--------------+----------------+
>>> | | 512G | 36s | 7s | 81% |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> | | Size | Time (before) | Time (after) | Time Reduction |
>>> | +------+---------------+--------------+----------------+
>>> | Unplug Memory | 256G | 11s | 4s | 64% |
>>> | +------+---------------+--------------+----------------+
>>> | | 512G | 36s | 9s | 75% |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>>> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>>> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>>> qom-set vmem1 requested-size 256G/512G (Plug Memory)
>>> qom-set vmem1 requested-size 0G (Unplug Memory)
>>>
>>> [2] Hardware : Intel Icelake server
>>> Guest Kernel : v7.0-rc4
>>> Qemu : v9.0.0
>>>
>>> Launch VM :
>>> qemu-system-x86_64 -accel kvm -cpu host \
>>> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>>> -drive file=./seed.img,format=raw,if=virtio \
>>> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>>> -m 2G,slots=10,maxmem=2052472M \
>>> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>>> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>>> -nographic -machine q35 \
>>> -nic user,hostfwd=tcp::3000-:22
>>>
>>> Guest kernel auto-onlines newly added memory blocks:
>>> echo online > /sys/devices/system/memory/auto_online_blocks
>>>
>>> [3] The time from typing the QEMU commands in [1] to when the output of
>>> 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>>> memory is recognized.
>>>
>>> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
>>> Tested-by: Yuan Liu <yuan1.liu@intel.com>
>>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
>>> Reviewed-by: Pan Deng <pan.deng@intel.com>
>>> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Co-developed-by: Tianyou Li <tianyou.li@intel.com>
>>> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
>>> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
>>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>>> ---
>>
>> [...]
>>
>>> @@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>>> * zone/node above the hole except for the trailing pages in the last
>>> * section that will be appended to the zone/node below.
>>> */
>>> -static void __init init_unavailable_range(unsigned long spfn,
>>> +static unsigned long __init init_unavailable_range(unsigned long spfn,
>>> unsigned long epfn,
>>> int zone, int node)
>>> {
>>> @@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>>> if (pgcnt)
>>> pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
>>> node, zone_names[zone], pgcnt);
>>> + return pgcnt;
>>> }
>>>
>>> /*
>>> @@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
>>> memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
>>> zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>>> false);
>>> + zone->pages_with_online_memmap += end_pfn - start_pfn;
>>>
>>> - if (*hole_pfn < start_pfn)
>>> - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
>>> + if (*hole_pfn < start_pfn) {
>>> + unsigned long pgcnt;
>>> +
>>> + if (*hole_pfn < zone_start_pfn) {
>>> + init_unavailable_range(*hole_pfn, zone_start_pfn,
>>> + zone_id, nid);
>>> + pgcnt = init_unavailable_range(zone_start_pfn,
>>> + start_pfn, zone_id, nid);
>>
>> Indentation of parameters.
>>
>>> + } else {
>>> + pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
>>> + zone_id, nid);
>>
>>
>> Same here.
>>
>>> + }
>>> + zone->pages_with_online_memmap += pgcnt;
>>> + }
>>
>>
>> Maybe something like the following could make it nicer to read, just a
>> thought.
>>
>>
>> unsigned long hole_start_pfn = *hole_pfn;
>>
>> if (hole_start_pfn < zone_start_pfn) {
>> init_unavailable_range(hole_start_pfn, zone_start_pfn,
>> zone_id, nid);
>> hole_start_pfn = zone_start_pfn;
>> }
>> pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
>> zone_id, nid);
>>
>
> Yeah, this looks better :)
>
> sashiko had several comments
> https://sashiko.dev/#/patchset/20260408031615.1831922-1-yuan1.liu%40intel.com
>
> I skipped the ones related to hotplug, but in the mm_init part the comment
> about zones that can have overlapping physical spans when mirrored
> kernelcore is enabled seems valid.
The set_zone_contiguous()/clear_zone_contiguous() ones can be ignored, I think.
The comment about shrink_zone_span() is likely not realistic.
shrink_zone_span() would not shrink over boot holes.
Well, unless we have an odd case where the hole+memory starts in the
middle of a "PAGES_PER_SUBSECTION". That would already be problematic if
memory starts/ends in the middle of a PAGES_PER_SUBSECTION chunk. I
don't think such a case exists.
We could improve shrink_zone_span() to let
find_smallest_section_pfn/find_biggest_section_pfn test the pfn_to_nid()
and page_zone() not only on the smallest/highest pfn, but also on the
highest/smallest PFN in a PAGES_PER_SUBSECTION chunk.
No need to test pfn_to_online_page() twice, as that is the same result
for all pages in a PAGES_PER_SUBSECTION chunk.
--
Cheers,
David
* Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-08 3:16 [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
2026-04-08 7:36 ` David Hildenbrand (Arm)
@ 2026-04-13 13:06 ` Wei Yang
2026-04-13 18:24 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 9+ messages in thread
From: Wei Yang @ 2026-04-13 13:06 UTC (permalink / raw)
To: Yuan Liu
Cc: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang,
linux-mm, Yong Hu, Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen,
Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
On Tue, Apr 07, 2026 at 11:16:15PM -0400, Yuan Liu wrote:
[...]
>
>-void set_zone_contiguous(struct zone *zone)
>-{
>- unsigned long block_start_pfn = zone->zone_start_pfn;
>- unsigned long block_end_pfn;
>-
>- block_end_pfn = pageblock_end_pfn(block_start_pfn);
>- for (; block_start_pfn < zone_end_pfn(zone);
>- block_start_pfn = block_end_pfn,
>- block_end_pfn += pageblock_nr_pages) {
>-
>- block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>-
>- if (!__pageblock_pfn_to_page(block_start_pfn,
>- block_end_pfn, zone))
>- return;
>- cond_resched();
>- }
>-
>- /* We confirm that there is no hole */
>- zone->contiguous = true;
>-}
>-
Hi,
I think I see a behavioral change after this patch:
* An originally non-contiguous zone would be detected as contiguous after this patch.
My test setup:
I tested in a QEMU VM with 6G memory, with memblock_debug enabled,
and adjusted /proc/zoneinfo to display the zone->contiguous field.
Originally, memblock_dump shows:
MEMBLOCK configuration:
memory size = 0x000000017ff7dc00 reserved size = 0x0000000005a9d9c2
memory.cnt = 0x3
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
+- memory[0x2] [0x0000000100000000-0x00000001bfffffff], 0x00000000c0000000 bytes on node 1 flags: 0x0
And zone range shows:
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x00000001bfffffff] <--- entire last memblock region
The last memblock region fits entirely in Node 1 Zone Normal.
Then I punch a hole in this region with 2M(subsection) size with following
change, to mimic there is a hole in memory range:
@@ -1372,5 +1372,8 @@ __init void e820__memblock_setup(void)
/* Throw away partial pages: */
memblock_trim_memory(PAGE_SIZE);
+ memblock_remove(0x140000000, 0x200000);
+
memblock_dump_all();
}
Then the memblock dump shows:
MEMBLOCK configuration:
memory size = 0x000000017fd7dc00 reserved size = 0x0000000005a979c2
memory.cnt = 0x4
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
+- memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 1 flags: 0x0
+- memory[0x3] [0x0000000140200000-0x00000001bfffffff], 0x000000007fe00000 bytes on node 1 flags: 0x0
We can see the original one memblock region is divided into two, with a hole
of 2M in the middle.
Not sure this is a reasonable way to mimic a memory hole. I also tried
punching a larger hole, e.g. 10M, and still see the behavioral change.
The /proc/zoneinfo result:
w/o patch
Node 1, zone Normal
pages free 469271
boost 0
min 8567
low 10708
high 12849
promo 14990
spanned 786432
present 785920
contigu 0 <--- zone is non-contiguous
managed 766024
cma 0
with patch
Node 1, zone Normal
pages free 121098
boost 0
min 8665
low 10831
high 12997
promo 15163
spanned 786432
present 785920
contigu 1 <--- zone is contiguous
managed 773041
cma 0
This shows we treat Node 1 Zone Normal as non-contiguous before, but treat
it as a contiguous zone after this patch.
Reason:
set_zone_contiguous()
__pageblock_pfn_to_page()
pfn_to_online_page()
pfn_section_valid() <--- check subsection
When SPARSEMEM_VMEMMAP is set, pfn_section_valid() checks the subsection bit
to decide if a pfn is valid. For a hole, the corresponding bit is not set,
so the zone is non-contiguous before the patch.
After this patch, the memory map in this hole also contributes to
pages_with_online_memmap, so it is treated as contiguous.
Some question:
I suspect that with !SPARSEMEM_VMEMMAP we always treat Zone Normal as
contiguous, because we don't set up the subsection map. So the behavior
looks different from SPARSEMEM_VMEMMAP. But I didn't manage to build a
kernel with !SPARSEMEM_VMEMMAP to verify.
I saw the discussion on defining zone->contiguous as meaning it is safe to
use pfn_to_page() for the whole zone. For this purpose, the current change
looks good to me, since we do allocate and init the memory map for holes.
But pageblock_pfn_to_page() is used by compaction and others. A pfn with a
memory map but no actual memory is not guaranteed to be a usable page. So
is the correct usage that, after pageblock_pfn_to_page() returns a page, we
should validate each page in the range before using it? I am a little lost
here.
> /*
> * Check if a PFN range intersects multiple zones on one or more
> * NUMA nodes. Specify the @nid argument if it is known that this
>--
>2.47.3
--
Wei Yang
Help you, Help me
* Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-13 13:06 ` Wei Yang
@ 2026-04-13 18:24 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-13 18:24 UTC (permalink / raw)
To: Wei Yang, Yuan Liu
Cc: Oscar Salvador, Mike Rapoport, linux-mm, Yong Hu, Nanhai Zou,
Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li,
Chen Zhang, linux-kernel
> The last memblock region fits entirely in Node 1 Zone Normal.
>
> Then I punch a hole in this region with 2M(subsection) size with following
> change, to mimic there is a hole in memory range:
>
> @@ -1372,5 +1372,8 @@ __init void e820__memblock_setup(void)
> /* Throw away partial pages: */
> memblock_trim_memory(PAGE_SIZE);
>
> + memblock_remove(0x140000000, 0x200000);
> +
> memblock_dump_all();
> }
>
> Then the memblock dump shows:
>
> MEMBLOCK configuration:
> memory size = 0x000000017fd7dc00 reserved size = 0x0000000005a979c2
> memory.cnt = 0x4
> memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
> memory[0x1] [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
> +- memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 1 flags: 0x0
> +- memory[0x3] [0x0000000140200000-0x00000001bfffffff], 0x000000007fe00000 bytes on node 1 flags: 0x0
>
> We can see the original one memblock region is divided into two, with a hole
> of 2M in the middle.
Yes, that makes sense.
>
> Not sure this is a reasonable way to mimic a memory hole. I also tried
> punching a larger hole, e.g. 10M, and still see the behavioral change.
>
> The /proc/zoneinfo result:
>
> w/o patch
>
> Node 1, zone Normal
> pages free 469271
> boost 0
> min 8567
> low 10708
> high 12849
> promo 14990
> spanned 786432
> present 785920
> contigu 0 <--- zone is non-contiguous
> managed 766024
> cma 0
>
> with patch
>
> Node 1, zone Normal
> pages free 121098
> boost 0
> min 8665
> low 10831
> high 12997
> promo 15163
> spanned 786432
> present 785920
> contigu 1 <--- zone is contiguous
> managed 773041
> cma 0
>
> This shows we treat Node 1 Zone Normal as non-contiguous before, but treat
> it as a contiguous zone after this patch.
>
> Reason:
>
> set_zone_contiguous()
> __pageblock_pfn_to_page()
> pfn_to_online_page()
> pfn_section_valid() <--- check subsection
>
> When SPARSEMEM_VMEMMAP is set, pfn_section_valid() checks the subsection bit
> to decide if a pfn is valid. For a hole, the corresponding bit is not set,
> so the zone is non-contiguous before the patch.
>
> After this patch, the memory map in this hole also contributes to
> pages_with_online_memmap, so it is treated as contiguous.
That means that mm init code actually initialized a memmap, so there is
a memmap there that is properly initialized?
So init_unavailable_range()->for_each_valid_pfn() processed these
sub-section holes I guess.
subsection_map_init() takes care of initializing the subsections. That
happens before memmap_init() in free_area_init().
Is there a problem in for_each_valid_pfn()?
And I think there is, in first_valid_pfn():
if (valid_section(ms) &&
(early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
rcu_read_unlock_sched();
return pfn;
}
The PFN is valid, but we actually care about whether it will be online.
So likely, we should skip over sub-sections here also for early sections
(even though the memmap exists, nobody should be looking at it, just like
for an offline memory section).
>
> Some question:
>
> I suspect that with !SPARSEMEM_VMEMMAP we always treat Zone Normal as
> contiguous, because we don't set up the subsection map. So the behavior
> looks different from SPARSEMEM_VMEMMAP. But I didn't manage to build a
> kernel with !SPARSEMEM_VMEMMAP to verify.
>
> I saw the discussion on defining zone->contiguous as meaning it is safe to
> use pfn_to_page() for the whole zone. For this purpose, the current change
> looks good to me, since we do allocate and init the memory map for holes.
Right.
>
> But pageblock_pfn_to_page() is used by compaction and others. A pfn with a
> memory map but no actual memory is not guaranteed to be a usable page. So
> is the correct usage that, after pageblock_pfn_to_page() returns a page, we
> should validate each page in the range before using it? I am a little lost
> here.
These non-existent pages (holes) are no different than allocated
un-movable memory. So compaction code must deal with them. Just like
smaller memory holes that don't cover a full memory section.
--
Cheers,
David
end of thread, other threads:[~2026-04-13 18:24 UTC | newest]
Thread overview: 9+ messages
2026-04-08 3:16 [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
2026-04-08 7:36 ` David Hildenbrand (Arm)
2026-04-08 12:29 ` Liu, Yuan1
2026-04-08 12:31 ` David Hildenbrand (Arm)
2026-04-08 12:37 ` Liu, Yuan1
2026-04-09 14:40 ` Mike Rapoport
2026-04-09 15:08 ` David Hildenbrand (Arm)
2026-04-13 13:06 ` Wei Yang
2026-04-13 18:24 ` David Hildenbrand (Arm)