From: "Li, Tianyou" <tianyou.li@intel.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>,
Oscar Salvador <osalvador@suse.de>,
Mike Rapoport <rppt@kernel.org>,
Wei Yang <richard.weiyang@gmail.com>
Cc: <linux-mm@kvack.org>, Yong Hu <yong.hu@intel.com>,
Nanhai Zou <nanhai.zou@intel.com>, Yuan Liu <yuan1.liu@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
Chen Zhang <zhangchen.kidd@jd.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Date: Fri, 16 Jan 2026 19:58:27 +0800 [thread overview]
Message-ID: <da65f26d-d2e0-42ae-a682-9652182d37b3@intel.com> (raw)
In-Reply-To: <4d47cd01-034d-4886-80c4-3861d2181e18@kernel.org>
On 1/16/2026 1:00 AM, David Hildenbrand (Red Hat) wrote:
> On 1/15/26 16:09, Li, Tianyou wrote:
>> Sorry for my delayed response. Appreciated for all your suggestions
>> David. My thoughts inlined for your kind consideration.
>>
>> On 1/14/2026 7:13 PM, David Hildenbrand (Red Hat) wrote:
>>>>>
>>>>> This is nasty. I would wish we could just leave that code path alone.
>>>>>
>>>>> In particular: I am 99% sure that we never ever run into this case in
>>>>> practice.
>>>>>
>>>>> E.g., on x86, we can have up to 2 GiB memory blocks. But the
>>>>> memmap of
>>>>> that is 64/4096*2GiB == 32 MB ... and a memory section is 128 MiB.
>>>>>
>>>>>
>>>>> As commented on patch #1, we should drop the set_zone_contiguous() in
>>>>> this function either way and let online_pages() deal with it.
>>>>>
>>>>> We just have to make sure that we don't create some
>>>>> inconsistencies by
>>>>> doing that.
>>>>>
>>>>> Can you double-check?
>>>
>>> I thought about this some more, and it's all a bit nasty. We have to
>>> get this right.
>>>
>>> Losing the optimization for memmap_on_memory users indicates that we
>>> are doing the wrong thing.
>>>
>>> You could introduce the set_zone_contiguous() in this patch here. But
>>> then, I think instead of
>>>
>>> + /*
>>> + * If the allocated memmap pages are not in a full section,
>>> keep the
>>> + * contiguous state as ZONE_CONTIG_NO.
>>> + */
>>> + if (IS_ALIGNED(end_pfn, PAGES_PER_SECTION))
>>> + new_contiguous_state = zone_contig_state_after_growing(zone,
>>> + pfn, nr_pages);
>>> +
>>>
>>> We'd actually unconditionally have to do that, no?
>>>
>> That's a great idea! Probably we need to move
>> the adjust_present_page_count right after mhp_init_memmap_on_memory to
>> keep the present_pages in sync.
>>
>> When the zone is contiguous previously, there probably 2 situations:
>>
>> 1. If new added range is at the start of the zone, then
>> after mhp_init_memmap_on_memory, the zone contiguous will be false at
>> fast path. It should be expected as long as the
>> adjust_present_page_count was called before the online_pages,
>> so zone_contig_state_after_growing in online_pages will return
>> ZONE_CONTIG_MAYBE, which is expected. Then the set_zone_contiguous in
>> online_pages will get correct result.
>>
>> 2. If new added range is at the end of the zone, then
>> the zone_contig_state_after_growing will return ZONE_CONTIG_YES or
>> ZONE_CONTIG_NO, regardless of the memory section online or not. When the
>> contiguous check comes into the online_pages, it will follow the fast
>> path and get the correct contiguous state.
>>
>> When the zone is not contiguous previously, in any case the new zone
>> contiguous state will be false in mhp_init_memmap_on_memory, because
>> either the nr_vmemmap_pages can not fill the hole or the memory section
>> is not online. If we update the present_pages correctly, then in
>> online_pages, the zone_contig_state_after_growing could have the chance
>> to return ZONE_CONTIG_MAYBE, and since all memory sections are onlined,
>> the set_zone_contiguous will get the correct result.
>>
>>
>> I am not sure if you'd like to consider another option: could we
>> encapsulate the mhp_init_memmap_on_memory and online_pages into one
>> function eg. online_memory_block_pages, and offline_memory_block_pages
>> correspondingly as well, in the memory_hotplug.c. So we can check the
>> zone contiguous state as the whole for the new added range.
>>
>> int online_memory_block_pages(unsigned long start_pfn, unsigned long
>> nr_pages,
>> unsigned long nr_vmemmap_pages, struct zone *zone,
>> struct memory_group *group)
>> {
>> bool contiguous = zone->contiguous;
>> enum zone_contig_state new_contiguous_state;
>> int ret;
>>
>> /*
>> * Calculate the new zone contig state before
>> move_pfn_range_to_zone()
>> * sets the zone temporarily to non-contiguous.
>> */
>> new_contiguous_state = zone_contig_state_after_growing(zone,
>> start_pfn,
>> nr_pages);
>
> Should we clear the flag here?
>
Yuan and I have quite a few debate here, the contiguous flag should be
clear once the zone changed, so it make sense that
move_pfn_range_to_zone and remove_pfn_range_from_zone coupled with
clear_zone_contiguous. While the set_zone_contiguous has the
dependencies with page online, so it should be carefully invoked after
page online or before page offline.
>>
>> if (nr_vmemmap_pages) {
>> ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
>> zone);
>> if (ret)
>> goto restore_zone_contig;
>> }
>>
>> ret = online_pages(start_pfn + nr_vmemmap_pages,
>> nr_pages - nr_vmemmap_pages, zone, group);
>> if (ret) {
>> if (nr_vmemmap_pages)
>> mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
>> goto restore_zone_contig;
>> }
>>
>> /*
>> * Account once onlining succeeded. If the zone was
>> unpopulated, it is
>> * now already properly populated.
>> */
>> if (nr_vmemmap_pages)
>> adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
>> nr_vmemmap_pages);
>>
>> /*
>> * Now that the ranges are indicated as online, check whether
>> the whole
>> * zone is contiguous.
>> */
>> set_zone_contiguous(zone, new_contiguous_state);
>> return 0;
>>
>> restore_zone_contig:
>> zone->contiguous = contiguous;
>> return ret;
>> }
>
> That is even better, although it sucks to have to handle it on that
> level, and that it's so far away from actual zone resizing code.
>
Agreed. Thanks for the confirmation.
> Should we do the same on the offlining path?
>
Yes, definitely. I will work on this change for patch v8. I will post
the patch v8 once tested and confirmed by Yuan.
Regards,
Tianyou
next prev parent reply other threads:[~2026-01-16 11:58 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-22 14:58 [PATCH v7 0/2] Optimize zone->contiguous update and issue fix Tianyou Li
2025-12-22 14:58 ` [PATCH v7 1/2] mm/memory hotplug: fix zone->contiguous always false when hotplug Tianyou Li
2026-01-06 20:03 ` David Hildenbrand (Red Hat)
2026-01-08 7:35 ` Li, Tianyou
2025-12-22 14:58 ` [PATCH v7 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
2026-01-06 20:18 ` David Hildenbrand (Red Hat)
2026-01-08 8:13 ` Li, Tianyou
2026-01-08 8:23 ` Li, Tianyou
2026-01-14 11:13 ` David Hildenbrand (Red Hat)
2026-01-15 15:09 ` Li, Tianyou
2026-01-15 17:00 ` David Hildenbrand (Red Hat)
2026-01-16 11:58 ` Li, Tianyou [this message]
2026-01-06 20:25 ` David Hildenbrand (Red Hat)
2026-01-08 8:31 ` Li, Tianyou
2026-01-05 12:21 ` [PATCH v7 0/2] Optimize zone->contiguous update and issue fix Li, Tianyou
2026-01-05 16:27 ` David Hildenbrand (Red Hat)
2026-01-06 3:48 ` Li, Tianyou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=da65f26d-d2e0-42ae-a682-9652182d37b3@intel.com \
--to=tianyou.li@intel.com \
--cc=david@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nanhai.zou@intel.com \
--cc=osalvador@suse.de \
--cc=pan.deng@intel.com \
--cc=qiuxu.zhuo@intel.com \
--cc=richard.weiyang@gmail.com \
--cc=rppt@kernel.org \
--cc=tim.c.chen@linux.intel.com \
--cc=yong.hu@intel.com \
--cc=yu.c.chen@intel.com \
--cc=yuan1.liu@intel.com \
--cc=zhangchen.kidd@jd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox