linux-mm.kvack.org archive mirror
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
Cc: Tianyou Li <tianyou.li@intel.com>,
	Oscar Salvador <osalvador@suse.de>,
	Wei Yang <richard.weiyang@gmail.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
	Nanhai Zou <nanhai.zou@intel.com>, Yuan Liu <yuan1.liu@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
	Chen Zhang <zhangchen.kidd@jd.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Date: Mon, 9 Feb 2026 12:38:20 +0100	[thread overview]
Message-ID: <79272ca0-a326-4c0d-9e18-eea8e2d68160@kernel.org> (raw)
In-Reply-To: <aYjmcZ4hg9bNbmiY@kernel.org>

On 2/8/26 20:39, Mike Rapoport wrote:
> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
>> On 1/30/26 17:37, Tianyou Li wrote:
>>> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is
>>> invoked, it updates zone->contiguous by checking the new zone's pfn
>>> range from beginning to end, regardless of the previous state of the
>>> old zone. When the zone's pfn range is large, the cost of traversing
>>> it to update zone->contiguous can be significant.
>>>
>>> Add fast paths to quickly detect cases where the zone is definitely
>>> not contiguous, without scanning the new zone. The cases are: if the
>>> new range does not overlap the previous range, contiguous must be
>>> false; if the new range is adjacent to the previous range, only the
>>> new range needs to be checked; if the newly added pages cannot fill
>>> the holes of the previous zone, contiguous must be false.
>>>
>>> The following test cases of memory hotplug for a VM [1], tested in the
>>> environment [2], show that this optimization can significantly reduce the
>>> memory hotplug time [3].
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Plug Memory    | 256G |      10s      |      2s      |       80%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      33s      |      6s      |       81%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      34s      |      6s      |       82%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>>>       object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>>>       device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>>>       qom-set vmem1 requested-size 256G/512G (Plug Memory)
>>>       qom-set vmem1 requested-size 0G (Unplug Memory)
>>>
>>> [2] Hardware     : Intel Icelake server
>>>       Guest Kernel : v6.18-rc2
>>>       Qemu         : v9.0.0
>>>
>>>       Launch VM    :
>>>       qemu-system-x86_64 -accel kvm -cpu host \
>>>       -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>>>       -drive file=./seed.img,format=raw,if=virtio \
>>>       -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>>>       -m 2G,slots=10,maxmem=2052472M \
>>>       -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>>>       -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>>>       -nographic -machine q35 \
>>>       -nic user,hostfwd=tcp::3000-:22
>>>
>>>       Guest kernel auto-onlines newly added memory blocks:
>>>       echo online > /sys/devices/system/memory/auto_online_blocks
>>>
>>> [3] The time from typing the QEMU commands in [1] to when the output of
>>>       'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>>>       memory is recognized.
>>>
>>> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
>>> Tested-by: Yuan Liu <yuan1.liu@intel.com>
>>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
>>> Reviewed-by: Pan Deng <pan.deng@intel.com>
>>> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
>>> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
>>> ---
>>
>> Thanks for all your work on this, and sorry for being slow with
>> review this last month.
>>
>> While I was in the shower I was thinking about how much I hate
>> zone->contiguous + the pageblock walking, and how we could just get
>> rid of it.
>>
>> You know, just what you do while having a relaxing shower.
>>
>>
>> And I was wondering:
>>
>> (a) in which case would we have zone_spanned_pages == zone_present_pages
>> and the zone *not* being contiguous? I assume this just cannot happen,
>> otherwise BUG.
>>
>> (b) in which case would we have zone_spanned_pages != zone_present_pages
>> and the zone *being* contiguous? I assume in some cases where we have small
>> holes within a pageblock?
>>
>> Reading the doc of __pageblock_pfn_to_page(), there are some weird
>> scenarios with holes in pageblocks.
>   
> It seems that "zone->contiguous" is a really bad name for what this
> thing represents.
> 
> tl;dr I don't think zone_spanned_pages == zone_present_pages is related
> to zone->contiguous at all :)
> 
> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
> check for zone->contiguous should guarantee that the entire pageblock has
> a valid memory map and that the entire pageblock fits in a zone and does
> not cross zone/node boundaries.
> 
> For coldplug memory the memory map is valid for every section that has
> present memory, i.e. even if there is a hole in a section, its memory
> map will be populated and will have struct pages.
> 
> When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
> essentially checks whether the first page in a pageblock is online and
> whether the first and last pages are in the zone being compacted.
>   
> AFAIU, in the hotplug case an entire pageblock is always onlined to the
> same zone, so zone->contiguous won't change after the hotplug is complete.
> 
> We might set it to false at the beginning of the hotplug to avoid
> scanning offline pages, although I'm not sure if that's possible.
> 
> But at the end of the hotplug we can simply restore the old value and
> move on.
> 
> For the coldplug case I'm also not sure it's worth the hassle; we could
> just let compaction scan a few more pfns for those rare weird pageblocks
> and bail out on wrong page conditions.
> 
>> I.e., on my notebook I have
>>
>> $ cat /proc/zoneinfo  | grep -E "Node|spanned|present"
>> Node 0, zone      DMA
>>          spanned  4095
>>          present  3999
>> Node 0, zone    DMA32
>>          spanned  1044480
>>          present  439600
> 
> I suspect this one is contiguous ;-)

Just checked. It's not, probably because there are some holes that are
entirely without a memmap (a PCI hole).

-- 
Cheers,

David



Thread overview: 13+ messages
2026-01-30 16:37 [PATCH v9 0/2] Optimize zone->contiguous update Tianyou Li
2026-01-30 16:37 ` [PATCH v9 1/2] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages() Tianyou Li
2026-01-30 16:37 ` [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
2026-02-07 11:00   ` David Hildenbrand (Arm)
2026-02-08 19:39     ` Mike Rapoport
2026-02-09 10:52       ` David Hildenbrand (Arm)
2026-02-09 12:44         ` David Hildenbrand (Arm)
2026-02-10 11:44           ` Mike Rapoport
2026-02-10 15:28             ` Li, Tianyou
2026-02-11 12:19             ` David Hildenbrand (Arm)
2026-02-12  8:32               ` Mike Rapoport
2026-02-12  8:45                 ` David Hildenbrand (Arm)
2026-02-09 11:38       ` David Hildenbrand (Arm) [this message]
