linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Li, Tianyou" <tianyou.li@intel.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Mike Rapoport <rppt@kernel.org>,
	Wei Yang <richard.weiyang@gmail.com>
Cc: <linux-mm@kvack.org>, Yong Hu <yong.hu@intel.com>,
	Nanhai Zou <nanhai.zou@intel.com>, Yuan Liu <yuan1.liu@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
	Chen Zhang <zhangchen.kidd@jd.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Date: Fri, 16 Jan 2026 19:58:27 +0800	[thread overview]
Message-ID: <da65f26d-d2e0-42ae-a682-9652182d37b3@intel.com> (raw)
In-Reply-To: <4d47cd01-034d-4886-80c4-3861d2181e18@kernel.org>


On 1/16/2026 1:00 AM, David Hildenbrand (Red Hat) wrote:
> On 1/15/26 16:09, Li, Tianyou wrote:
>> Sorry for my delayed response. Appreciated for all your suggestions
>> David. My thoughts inlined for your kind consideration.
>>
>> On 1/14/2026 7:13 PM, David Hildenbrand (Red Hat) wrote:
>>>>>
>>>>> This is nasty. I would wish we could just leave that code path alone.
>>>>>
>>>>> In particular: I am 99% sure that we never ever run into this case in
>>>>> practice.
>>>>>
>>>>> E.g., on x86, we can have up to 2 GiB memory blocks. But the 
>>>>> memmap of
>>>>> that is 64/4096*2GiB == 32 MB ... and a memory section is 128 MiB.
>>>>>
>>>>>
>>>>> As commented on patch #1, we should drop the set_zone_contiguous() in
>>>>> this function either way and let online_pages() deal with it.
>>>>>
>>>>> We just have to make sure that we don't create some 
>>>>> inconsistencies by
>>>>> doing that.
>>>>>
>>>>> Can you double-check?
>>>
>>> I thought about this some more, and it's all a bit nasty. We have to
>>> get this right.
>>>
>>> Losing the optimization for memmap_on_memory users indicates that we
>>> are doing the wrong thing.
>>>
>>> You could introduce the set_zone_contiguous() in this patch here. But
>>> then, I think instead of
>>>
>>> +    /*
>>> +     * If the allocated memmap pages are not in a full section, 
>>> keep the
>>> +     * contiguous state as ZONE_CONTIG_NO.
>>> +     */
>>> +    if (IS_ALIGNED(end_pfn, PAGES_PER_SECTION))
>>> +        new_contiguous_state = zone_contig_state_after_growing(zone,
>>> +                                pfn, nr_pages);
>>> +
>>>
>>> We'd actually unconditionally have to do that, no?
>>>
>> That's a great idea! Probably we need to move
>> the adjust_present_page_count right after mhp_init_memmap_on_memory to
>> keep the present_pages in sync.
>>
>> When the zone is contiguous previously, there probably 2 situations:
>>
>> 1. If new added range is at the start of the zone, then
>> after mhp_init_memmap_on_memory, the zone contiguous will be false at
>> fast path. It should be expected as long as the
>> adjust_present_page_count was called before the online_pages,
>> so zone_contig_state_after_growing in online_pages will return
>> ZONE_CONTIG_MAYBE, which is expected. Then the set_zone_contiguous in
>> online_pages will get correct result.
>>
>> 2. If new added range is at the end of the zone, then
>> the zone_contig_state_after_growing will return ZONE_CONTIG_YES or
>> ZONE_CONTIG_NO, regardless of the memory section online or not. When the
>> contiguous check comes into the online_pages, it will follow the fast
>> path and get the correct contiguous state.
>>
>> When the zone is not contiguous previously, in any case the new zone
>> contiguous state will be false in mhp_init_memmap_on_memory, because
>> either the nr_vmemmap_pages can not fill the hole or the memory section
>> is not online. If we update the present_pages correctly, then in
>> online_pages, the zone_contig_state_after_growing could have the chance
>> to return ZONE_CONTIG_MAYBE, and since all memory sections are onlined,
>> the set_zone_contiguous will get the correct result.
>>
>>
>> I am not sure if you'd like to consider another option: could we
>> encapsulate the mhp_init_memmap_on_memory and online_pages into one
>> function eg. online_memory_block_pages, and offline_memory_block_pages
>> correspondingly as well, in the memory_hotplug.c. So we can check the
>> zone contiguous state as the whole for the new added range.
>>
>> int online_memory_block_pages(unsigned long start_pfn, unsigned long
>> nr_pages,
>>               unsigned long nr_vmemmap_pages, struct zone *zone,
>>               struct memory_group *group)
>> {
>>       bool contiguous = zone->contiguous;
>>       enum zone_contig_state new_contiguous_state;
>>       int ret;
>>
>>       /*
>>        * Calculate the new zone contig state before 
>> move_pfn_range_to_zone()
>>        * sets the zone temporarily to non-contiguous.
>>        */
>>       new_contiguous_state = zone_contig_state_after_growing(zone, 
>> start_pfn,
>>                                   nr_pages);
>
> Should we clear the flag here?
>

Yuan and I have quite a few debate here, the contiguous flag should be 
clear once the zone changed, so it make sense that 
move_pfn_range_to_zone and remove_pfn_range_from_zone coupled with 
clear_zone_contiguous. While the set_zone_contiguous has the 
dependencies with page online, so it should be carefully invoked after 
page online or before page offline.


>>
>>       if (nr_vmemmap_pages) {
>>           ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
>>                           zone);
>>           if (ret)
>>               goto restore_zone_contig;
>>       }
>>
>>       ret = online_pages(start_pfn + nr_vmemmap_pages,
>>                  nr_pages - nr_vmemmap_pages, zone, group);
>>       if (ret) {
>>           if (nr_vmemmap_pages)
>>               mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
>>           goto restore_zone_contig;
>>       }
>>
>>       /*
>>        * Account once onlining succeeded. If the zone was 
>> unpopulated, it is
>>        * now already properly populated.
>>        */
>>       if (nr_vmemmap_pages)
>>           adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
>>                         nr_vmemmap_pages);
>>
>>       /*
>>        * Now that the ranges are indicated as online, check whether 
>> the whole
>>        * zone is contiguous.
>>        */
>>       set_zone_contiguous(zone, new_contiguous_state);
>>       return 0;
>>
>> restore_zone_contig:
>>       zone->contiguous = contiguous;
>>       return ret;
>> }
>
> That is even better, although it sucks to have to handle it on that 
> level, and that it's so far away from actual zone resizing code.
>

Agreed. Thanks for the confirmation.


> Should we do the same on the offlining path?
>

Yes, definitely. I will work on this change for patch v8. I will post 
the patch v8 once tested and confirmed by Yuan.


Regards,

Tianyou



  reply	other threads:[~2026-01-16 11:58 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-22 14:58 [PATCH v7 0/2] Optimize zone->contiguous update and issue fix Tianyou Li
2025-12-22 14:58 ` [PATCH v7 1/2] mm/memory hotplug: fix zone->contiguous always false when hotplug Tianyou Li
2026-01-06 20:03   ` David Hildenbrand (Red Hat)
2026-01-08  7:35     ` Li, Tianyou
2025-12-22 14:58 ` [PATCH v7 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
2026-01-06 20:18   ` David Hildenbrand (Red Hat)
2026-01-08  8:13     ` Li, Tianyou
2026-01-08  8:23     ` Li, Tianyou
2026-01-14 11:13       ` David Hildenbrand (Red Hat)
2026-01-15 15:09         ` Li, Tianyou
2026-01-15 17:00           ` David Hildenbrand (Red Hat)
2026-01-16 11:58             ` Li, Tianyou [this message]
2026-01-06 20:25   ` David Hildenbrand (Red Hat)
2026-01-08  8:31     ` Li, Tianyou
2026-01-05 12:21 ` [PATCH v7 0/2] Optimize zone->contiguous update and issue fix Li, Tianyou
2026-01-05 16:27   ` David Hildenbrand (Red Hat)
2026-01-06  3:48     ` Li, Tianyou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da65f26d-d2e0-42ae-a682-9652182d37b3@intel.com \
    --to=tianyou.li@intel.com \
    --cc=david@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nanhai.zou@intel.com \
    --cc=osalvador@suse.de \
    --cc=pan.deng@intel.com \
    --cc=qiuxu.zhuo@intel.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rppt@kernel.org \
    --cc=tim.c.chen@linux.intel.com \
    --cc=yong.hu@intel.com \
    --cc=yu.c.chen@intel.com \
    --cc=yuan1.liu@intel.com \
    --cc=zhangchen.kidd@jd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox