From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
Cc: Tianyou Li <tianyou.li@intel.com>,
Oscar Salvador <osalvador@suse.de>,
Wei Yang <richard.weiyang@gmail.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
Nanhai Zou <nanhai.zou@intel.com>, Yuan Liu <yuan1.liu@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
Chen Zhang <zhangchen.kidd@jd.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Date: Mon, 9 Feb 2026 12:38:20 +0100
Message-ID: <79272ca0-a326-4c0d-9e18-eea8e2d68160@kernel.org>
In-Reply-To: <aYjmcZ4hg9bNbmiY@kernel.org>
On 2/8/26 20:39, Mike Rapoport wrote:
> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
>> On 1/30/26 17:37, Tianyou Li wrote:
>>> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked, it
>>> updates zone->contiguous by checking the new zone's pfn range from
>>> beginning to end, regardless of the previous state of the old zone. When
>>> the zone's pfn range is large, the cost of traversing the pfn range to
>>> update zone->contiguous can be significant.
>>>
>>> Add fast paths to quickly detect cases where the zone is definitely not
>>> contiguous without scanning the new zone. The cases are: when the new range
>>> does not overlap with the previous range, contiguous should be false; if the
>>> new range is adjacent to the previous range, only the new range needs to be
>>> checked; if the newly added pages cannot fill the hole in the previous zone,
>>> contiguous should be false.
>>>
>>> The following test cases of memory hotplug for a VM [1], tested in the
>>> environment [2], show that this optimization can significantly reduce the
>>> memory hotplug time [3].
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Plug Memory    | 256G |      10s      |      2s      |       80%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      33s      |      6s      |       81%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      34s      |      6s      |       82%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>>> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>>> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>>> qom-set vmem1 requested-size 256G/512G (Plug Memory)
>>> qom-set vmem1 requested-size 0G (Unplug Memory)
>>>
>>> [2] Hardware : Intel Icelake server
>>> Guest Kernel : v6.18-rc2
>>> Qemu : v9.0.0
>>>
>>> Launch VM :
>>> qemu-system-x86_64 -accel kvm -cpu host \
>>> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>>> -drive file=./seed.img,format=raw,if=virtio \
>>> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>>> -m 2G,slots=10,maxmem=2052472M \
>>> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>>> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>>> -nographic -machine q35 \
>>> -nic user,hostfwd=tcp::3000-:22
>>>
>>> Guest kernel auto-onlines newly added memory blocks:
>>> echo online > /sys/devices/system/memory/auto_online_blocks
>>>
>>> [3] The time from typing the QEMU commands in [1] to when the output of
>>> 'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>>> memory is recognized.
>>>
>>> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
>>> Tested-by: Yuan Liu <yuan1.liu@intel.com>
>>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
>>> Reviewed-by: Pan Deng <pan.deng@intel.com>
>>> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
>>> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
>>> ---
>>
>> Thanks for all your work on this and sorry for being slower with
>> review the last month.
>>
>> While I was in the shower I was thinking about how much I hate
>> zone->contiguous + the pageblock walking, and how we could just get
>> rid of it.
>>
>> You know, just what you do while having a relaxing shower.
>>
>>
>> And I was wondering:
>>
>> (a) in which case would we have zone_spanned_pages == zone_present_pages
>> and the zone *not* being contiguous? I assume this just cannot happen,
>> otherwise BUG.
>>
>> (b) in which case would we have zone_spanned_pages != zone_present_pages
>> and the zone *being* contiguous? I assume in some cases where we have small
>> holes within a pageblock?
>>
>> Reading the doc of __pageblock_pfn_to_page(), there are some weird
>> scenarios with holes in pageblocks.
>
> It seems that "zone->contiguous" is a really bad name for what this thing
> represents.
>
> tl;dr I don't think zone_spanned_pages == zone_present_pages is related to
> zone->contiguous at all :)
>
> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
> check for zone->contiguous should guarantee that the entire pageblock has a
> valid memory map and that the entire pageblock fits a zone and does not
> cross zone/node boundaries.
>
> For coldplug memory the memory map is valid for every section that has
> present memory, i.e. even if there is a hole in a section, its memory map
> will be populated and will have struct pages.
>
> When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
> essentially checks if the first page in a pageblock is online and if the
> first and last pages are in the zone being compacted.
>
> AFAIU, in the hotplug case an entire pageblock is always onlined to the
> same zone, so zone->contiguous won't change after the hotplug is complete.
>
> We might set it to false in the beginning of the hotplug to avoid scanning
> offline pages, although I'm not sure if it's possible.
>
> But in the end of hotplug we can simply restore the old value and move on.
>
> For the coldplug case I'm also not sure it's worth the hassle, we could
> just let compaction scan a few more pfns for those rare weird pageblocks
> and bail out on wrong page conditions.
>
>> I.e., on my notebook I have
>>
>> $ cat /proc/zoneinfo | grep -E "Node|spanned|present"
>> Node 0, zone DMA
>> spanned 4095
>> present 3999
>> Node 0, zone DMA32
>> spanned 1044480
>> present 439600
>
> I suspect this one is contiguous ;-)

Just checked. It's not. Probably because there are some holes that are
entirely without a memmap (PCI hole).
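
To put numbers on the spanned/present gap in the zoneinfo output quoted above, here is a minimal sketch that parses that kind of text and reports spanned - present per zone. It assumes the field layout shown in the quoted output (a "Node N, zone NAME" header followed by "spanned" and then "present" lines); the function name zone_holes and the SAMPLE string are hypothetical and only for illustration. As discussed above, a non-zero difference by itself does not say whether zone->contiguous is set.

```python
import re


def zone_holes(zoneinfo_text):
    """Return {(node, zone): spanned - present} for each zone in the text.

    Assumes each zone header is followed by a 'spanned' line and then a
    'present' line, as in the quoted /proc/zoneinfo output.
    """
    holes = {}
    current = None   # (node id, zone name) of the zone being parsed
    spanned = None   # last 'spanned' value seen for the current zone
    for line in zoneinfo_text.splitlines():
        header = re.match(r"\s*Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            current = (int(header.group(1)), header.group(2))
            spanned = None
            continue
        field = re.match(r"\s*(spanned|present)\s+(\d+)\s*$", line)
        if field is None or current is None:
            continue  # some other per-zone line, not needed here
        name, value = field.group(1), int(field.group(2))
        if name == "spanned":
            spanned = value
        elif spanned is not None:
            holes[current] = spanned - value
    return holes


# The values quoted in this thread, used as stand-in input.
SAMPLE = """\
Node 0, zone DMA
  spanned 4095
  present 3999
Node 0, zone DMA32
  spanned 1044480
  present 439600
"""

if __name__ == "__main__":
    for (node, zone), missing in zone_holes(SAMPLE).items():
        print(f"node {node} zone {zone}: {missing} spanned pages not present")
```

On a live system the same function could be fed the contents of /proc/zoneinfo instead of SAMPLE.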
--
Cheers,
David