Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Yuan Liu <yuan1.liu@intel.com>,
	Oscar Salvador <osalvador@suse.de>,
	Mike Rapoport <rppt@kernel.org>,
	linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
	Nanhai Zou <nanhai.zou@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
	Tianyou Li <tianyou.li@intel.com>,
	Chen Zhang <zhangchen.kidd@jd.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Wed, 15 Apr 2026 11:11:12 +0200	[thread overview]
Message-ID: <b9e7e80a-f052-42ab-ad24-033d714911dc@kernel.org> (raw)
In-Reply-To: <20260415023015.5i7hrixrk64uzjqj@master>

On 4/15/26 04:30, Wei Yang wrote:
> On Tue, Apr 14, 2026 at 11:32:13AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/14/26 04:12, Wei Yang wrote:
>>>
>>> Yes, I think so.
>>>
>>> When memmap_init()->for_each_mem_pfn_range() iterate on the last memblock
>>> region, init_unavailable_range() will init the hole.
>>>
>>>
>>> Yes. I guess you mean sparse_init_subsection_map().
>>>
>>>
>>> You mean there is a problem in first_valid_pfn?
>>
>> That is my theory.
>>
>>>
>>>
>>> And it should be like below?
>>>
>>> 	if (valid_section(ms) &&
>>> 	    pfn_section_first_valid(ms, &pfn)) {
>>> 		rcu_read_unlock_sched();
>>> 		return pfn;
>>> 	}
>>
>> Probably, yes. We have to understand if other users would be negatively
>> affected.
>>
>>>
>>> IIUC, this would skip hole and leave allocated memory map uninitialized. And
>>> then those pages won't contribute to pages_with_online_memmap, which further
>>> leave the zone non-contiguous.
>>
>> Yes.
>>
>>>
>>> But we want zone to be contiguous when we have a hole like this, right?
>>
>> Not if a subsection is marked invalid. That's why the existing scenario
>> is that it will not be contiguous.
>>
> 
> Let me try to understand.
> 
> During previous discussion[1], we want to define "zone->contiguous guarantee
> pfn_to_page() is valid on the complete zone". So this definition is not true
> now, since we found the behavioral change.
> 
> Because __pageblock_pfn_to_page() won't treat range with hole as contiguous,
> detected by pfn_to_online_page() on invalid subsection.
> 
> [1]: https://lkml.org/lkml/2026/2/9/550
> 
> Now you are thinking the problem is in the iteration function,
> for_each_valid_pfn(), used in init_unavaiable_range(). It should only take
> valid subsections into consideration, even for early sections.
> 
> So your concern is if we change first_valid_pfn(), it will affect other users.

Let me clarify: We created a bit of a mess.

pfn_valid() says: "early sections always have a full memmap, so even invalid
subsections have a memmap".

pfn_to_online_page() says "it's an invalid subsection, so it sure can't be
online and the content must be stale"

for_each_valid_pfn() obeys to pfn_valid() semantics, and we use it to initialize 
the memmap (that's not going to be online) and accounts it with this patch here 
as "pages_with_online_memmap", which is wrong.


The *cleanest* thing would be to not handle early sections in a special way. That
is, don't allocate a memmap for subsections. That would remove the special
early handling from pfn_valid() and for_each_valid_pfn().


As an easy way forward, maybe we can just make pfn_valid() / for_each_valid_pfn()
behave just like pfn_to_online_page():


diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e6858b13a0be..bfa5c3df5ee2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2253,6 +2253,10 @@ void sparse_init_early_section(int nid, struct page *map, unsigned long pnum,
  * there is actual usable memory at that @pfn. The struct page may
  * represent a hole or an unusable page frame.
  *
+ * Note that this function returns "0" for PFNs that fall into
+ * invalid subsections as part of early sections, even though there would
+ * currently be a memmap allocated (that should not be touched).
+ *
  * Return: 1 for PFNs that have memory map entries and 0 otherwise
  */
 static inline int pfn_valid(unsigned long pfn)
@@ -2277,11 +2281,7 @@ static inline int pfn_valid(unsigned long pfn)
                rcu_read_unlock_sched();
                return 0;
        }
-       /*
-        * Traditionally early sections always returned pfn_valid() for
-        * the entire section-sized span.
-        */
-       ret = early_section(ms) || pfn_section_valid(ms, pfn);
+       ret = pfn_section_valid(ms, pfn);
        rcu_read_unlock_sched();
 
        return ret;
@@ -2297,8 +2297,7 @@ static inline unsigned long first_valid_pfn(unsigned long pfn, unsigned long end
        while (nr <= __highest_present_section_nr && pfn < end_pfn) {
                struct mem_section *ms = __pfn_to_section(pfn);
 
-               if (valid_section(ms) &&
-                   (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
+               if (valid_section(ms) && pfn_section_first_valid(ms, &pfn)) {
                        rcu_read_unlock_sched();
                        return pfn;
                }
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 330579365a0f..6d0bbf8a2159 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -775,8 +775,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
        }
 
        /*
-        * The memmap of early sections is always fully populated. See
-        * section_activate() and pfn_valid() .
+        * The memmap of early sections is currently always fully populated. See
+        * section_activate().
         */
        if (!section_is_early) {
                memmap_pages_add(-1L * (DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE)));



It would now return !pfn_valid() on subsection-sized holes. The price is another
test_bit(idx, usage->subsection_map) check for early sections. If that's a real
problem, we could add a section flag that just caches "full valid section".

> 
> If the above understanding is correct, maybe we can use spanned == present
> to do the trick? Because holes are marked subsection invalid and holes are
> counted into absent.
> 
> But I see the mirrored_kernel thing, not fully understand yet. This is the
> reason to prevent spanned == present approach?

I think we have to rework the way that mirrored .... stuff is handled. It's
way too intrusive, and the memmap init special casing deep down in there is
just nasty.

I'd think that we should just skip over mirrored part on a higher level.

My understanding is: mirror memory has two physical ranges that mirror each other.
We expose that memory to ZONE_MOABLE. But only one of the two physical ranges
is actually used by the buddy, has a memmap etc.

The other (mirrored) range should just be ignored entirely.

-- 
Cheers,

David

     prev parent reply	other threads:[~2026-04-15  9:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08  3:16 Yuan Liu
2026-04-08  7:36 ` David Hildenbrand (Arm)
2026-04-08 12:29   ` Liu, Yuan1
2026-04-08 12:31     ` David Hildenbrand (Arm)
2026-04-08 12:37       ` Liu, Yuan1
2026-04-09 14:40   ` Mike Rapoport
2026-04-09 15:08     ` David Hildenbrand (Arm)
2026-04-14  7:06       ` Liu, Yuan1
2026-04-14  9:24         ` David Hildenbrand (Arm)
2026-04-13 13:06 ` Wei Yang
2026-04-13 18:24   ` David Hildenbrand (Arm)
2026-04-14  2:12     ` Wei Yang
2026-04-14  9:32       ` David Hildenbrand (Arm)
2026-04-15  2:30         ` Wei Yang
2026-04-15  9:11           ` David Hildenbrand (Arm) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b9e7e80a-f052-42ab-ad24-033d714911dc@kernel.org \
    --to=david@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nanhai.zou@intel.com \
    --cc=osalvador@suse.de \
    --cc=pan.deng@intel.com \
    --cc=qiuxu.zhuo@intel.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rppt@kernel.org \
    --cc=tianyou.li@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=yong.hu@intel.com \
    --cc=yu.c.chen@intel.com \
    --cc=yuan1.liu@intel.com \
    --cc=zhangchen.kidd@jd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox