From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Yuan Liu <yuan1.liu@intel.com>,
Oscar Salvador <osalvador@suse.de>,
Mike Rapoport <rppt@kernel.org>,
linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
Nanhai Zou <nanhai.zou@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
Tianyou Li <tianyou.li@intel.com>,
Chen Zhang <zhangchen.kidd@jd.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Tue, 14 Apr 2026 11:32:13 +0200 [thread overview]
Message-ID: <fd59736d-a02a-4693-9e70-23c593c9dac7@kernel.org> (raw)
In-Reply-To: <20260414021219.wayysugpfbzirzh6@master>
On 4/14/26 04:12, Wei Yang wrote:
> On Mon, Apr 13, 2026 at 08:24:05PM +0200, David Hildenbrand (Arm) wrote:
>>> With the last memblock region fits in Node 1 Zone Normal.
>>>
>>> Then I punch a hole in this region with 2M(subsection) size with following
>>> change, to mimic there is a hole in memory range:
>>>
>>> @@ -1372,5 +1372,8 @@ __init void e820__memblock_setup(void)
>>> /* Throw away partial pages: */
>>> memblock_trim_memory(PAGE_SIZE);
>>>
>>> + memblock_remove(0x140000000, 0x200000);
>>> +
>>> memblock_dump_all();
>>> }
>>>
>>> Then the memblock dump shows:
>>>
>>> MEMBLOCK configuration:
>>> memory size = 0x000000017fd7dc00 reserved size = 0x0000000005a97 9c2
>>> memory.cnt = 0x4
>>> memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
>>> memory[0x1] [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
>>> +- memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 1 flags: 0x0
>>> +- memory[0x3] [0x0000000140200000-0x00000001bfffffff], 0x000000007fe00000 bytes on node 1 flags: 0x0
>>>
>>> We can see the original one memblock region is divided into two, with a hole
>>> of 2M in the middle.
>>
>> Yes, that makes sense.
>>
>>>
>>> Not sure this is a reasonable mimic of memory hole. Also I tried to
>>> punch a larger hole, e.g. 10M, still see the behavioral change.
>>>
>>> The /proc/zoneinfo result:
>>>
>>> w/o patch
>>>
>>> Node 1, zone Normal
>>> pages free 469271
>>> boost 0
>>> min 8567
>>> low 10708
>>> high 12849
>>> promo 14990
>>> spanned 786432
>>> present 785920
>>> contigu 0 <--- zone is non-contiguous
>>> managed 766024
>>> cma 0
>>>
>>> with patch
>>>
>>> Node 1, zone Normal
>>> pages free 121098
>>> boost 0
>>> min 8665
>>> low 10831
>>> high 12997
>>> promo 15163
>>> spanned 786432
>>> present 785920
>>> contigu 1 <--- zone is contiguous
>>> managed 773041
>>> cma 0
>>>
>>> This shows we treat Node 1 Zone Normal as non-contiguous before, but treat
>>> it a contiguous zone after this patch.
>>>
>>> Reason:
>>>
>>> set_zone_contiguous()
>>> __pageblock_pfn_to_page()
>>> pfn_to_online_page()
>>> pfn_section_valid() <--- check subsection
>>>
>>> When SPARSEMEM_VMEMMEP is set, pfn_section_valid() checks subsection bit to
>>> decide if it is valid. For a hole, the corresponding bit is not set. So it
>>> is non-contiguous before the patch.
>>>
>>> After this patch, the memory map in this hole also contributes to
>>> pages_with_online_memmap, so it is treated as contiguous.
>>
>> That means that mm init code actually initialized a memmap, so there is
>> a memmap there that is properly initialized?
>>
>> So init_unavailable_range()->for_each_valid_pfn() processed these
>> sub-section holes I guess.
>>
>
> Yes, I think so.
>
> When memmap_init()->for_each_mem_pfn_range() iterate on the last memblock
> region, init_unavailable_range() will init the hole.
>
>> subsection_map_init() takes care of initializing the subsections. That
>> happens before memmap_init() in free_area_init().
>>
>
> Yes. I guess you mean sparse_init_subsection_map().
>
>> Is there a problem in for_each_valid_pfn()?
>>
>> And I think there is in first_valid_pfn:
>>
>
> You mean there is a problem in first_valid_pfn?
That is my theory.
>
>> if (valid_section(ms) &&
>> (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
>> rcu_read_unlock_sched();
>> return pfn;
>> }
>>
>> The PFN is valid, but we actually care about whether it will be online.
>> So likely, we should skip over sub-sections here also for early sections
>> (even though the memmap exist, nobody should be looking at it, just like
>> for an offline memory section).
>>
>
> And it should be like below?
>
> if (valid_section(ms) &&
> pfn_section_first_valid(ms, &pfn)) {
> rcu_read_unlock_sched();
> return pfn;
> }
Probably, yes. We have to understand if other users would be negatively
affected.
>
> IIUC, this would skip hole and leave allocated memory map uninitialized. And
> then those pages won't contribute to pages_with_online_memmap, which further
> leave the zone non-contiguous.
Yes.
>
> But we want zone to be contiguous when we have a hole like this, right?
Not if a subsection is marked invalid. That's why the existing scenario
is that it will not be contiguous.
Note that Wei reported that it was not contiguous but would now be
contiguous.
If you have a DAX device the plugs into that hole through
memremap_pages()->pagemap_range(), I think this could cause problems. I
doubt that this would happen in practice for such small holes, but if
they would be bigger, or at the start/end of the range, it could be
problematic.
--
Cheers,
David
prev parent reply other threads:[~2026-04-14 9:32 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-08 3:16 Yuan Liu
2026-04-08 7:36 ` David Hildenbrand (Arm)
2026-04-08 12:29 ` Liu, Yuan1
2026-04-08 12:31 ` David Hildenbrand (Arm)
2026-04-08 12:37 ` Liu, Yuan1
2026-04-09 14:40 ` Mike Rapoport
2026-04-09 15:08 ` David Hildenbrand (Arm)
2026-04-14 7:06 ` Liu, Yuan1
2026-04-14 9:24 ` David Hildenbrand (Arm)
2026-04-13 13:06 ` Wei Yang
2026-04-13 18:24 ` David Hildenbrand (Arm)
2026-04-14 2:12 ` Wei Yang
2026-04-14 9:32 ` David Hildenbrand (Arm) [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fd59736d-a02a-4693-9e70-23c593c9dac7@kernel.org \
--to=david@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nanhai.zou@intel.com \
--cc=osalvador@suse.de \
--cc=pan.deng@intel.com \
--cc=qiuxu.zhuo@intel.com \
--cc=richard.weiyang@gmail.com \
--cc=rppt@kernel.org \
--cc=tianyou.li@intel.com \
--cc=tim.c.chen@linux.intel.com \
--cc=yong.hu@intel.com \
--cc=yu.c.chen@intel.com \
--cc=yuan1.liu@intel.com \
--cc=zhangchen.kidd@jd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox