From: "Li, Tianyou" <tianyou.li@intel.com>
To: Mike Rapoport <rppt@kernel.org>,
	"David Hildenbrand (Arm)" <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>,
	Wei Yang <richard.weiyang@gmail.com>,
	Michal Hocko <mhocko@suse.com>, <linux-mm@kvack.org>,
	Yong Hu <yong.hu@intel.com>, Nanhai Zou <nanhai.zou@intel.com>,
	Yuan Liu <yuan1.liu@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
	Chen Zhang <zhangchen.kidd@jd.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
Date: Tue, 10 Feb 2026 23:28:20 +0800
Message-ID: <4660c8e3-92db-4555-8168-e8cc57c92230@intel.com>
In-Reply-To: <aYsaBkMecDG595Xg@kernel.org>


On 2/10/2026 7:44 PM, Mike Rapoport wrote:
> On Mon, Feb 09, 2026 at 01:44:45PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/9/26 11:52, David Hildenbrand (Arm) wrote:
>>> On 2/8/26 20:39, Mike Rapoport wrote:
>>>> On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
>>>>> Thanks for all your work on this and sorry for being slower with
>>>>> review the last month.
>>>>>
>>>>> While I was in the shower I was thinking about how much I hate
>>>>> zone->contiguous + the pageblock walking, and how we could just get
>>>>> rid of it.
>>>>>
>>>>> You know, just what you do while having a relaxing shower.
>>>>>
>>>>>
>>>>> And I was wondering:
>>>>>
>>>>> (a) in which case would we have zone_spanned_pages == zone_present_pages
>>>>> and the zone *not* being contiguous? I assume this just cannot happen,
>>>>> otherwise BUG.
>>>>>
>>>>> (b) in which case would we have zone_spanned_pages != zone_present_pages
>>>>> and the zone *being* contiguous? I assume in some cases where we
>>>>> have small
>>>>> holes within a pageblock?
>>>>>
>>>>> Reading the doc of __pageblock_pfn_to_page(), there are some weird
>>>>> scenarios with holes in pageblocks.
>>>> It seems that "zone->contiguous" is really a bad name for what this thing
>>>> represents.
>>>>
>>>> tl;dr I don't think zone_spanned_pages == zone_present_pages is
>>>> related to
>>>> zone->contiguous at all :)
>>> My point in (a) was that with "zone_spanned_pages == zone_present_pages"
>>> there are no holes so -> contiguous.
>>>
>>> (b), and what I said further below, is exactly about memory holes where
>>> we have a memmap, but it's not present memory.
>>>
>>>> If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
>>>> check for zone->contiguous should guarantee that the entire pageblock
>>>> has a
>>>> valid memory map and that the entire pageblock fits a zone and does not
>>>> cross zone/node boundaries.
>>> Right. But that must hold for each and every pageblock in the spanned
>>> zone range for it to be contiguous.
>>>
>>> zone->contiguous tells you "pfn_to_page() is valid on the complete zone
>>> range".
>>>
>>> That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on each
>>> and every pageblock.
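
(For reference, the probing loop being discussed is roughly the following,
paraphrased from set_zone_contiguous() in mm/mm_init.c rather than quoted
verbatim:)

void set_zone_contiguous(struct zone *zone)
{
	unsigned long block_start_pfn = zone->zone_start_pfn;
	unsigned long block_end_pfn;

	block_end_pfn = pageblock_end_pfn(block_start_pfn);
	for (; block_start_pfn < zone_end_pfn(zone);
	     block_start_pfn = block_end_pfn,
	     block_end_pfn += pageblock_nr_pages) {

		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));

		/* Any pageblock failing the probe leaves zone->contiguous false. */
		if (!__pageblock_pfn_to_page(block_start_pfn,
					     block_end_pfn, zone))
			return;
		cond_resched();
	}

	/* Every pageblock passed: pfn_to_page() is valid across the zone span. */
	zone->contiguous = true;
}
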
>>>> For coldplug memory the memory map is valid for every section that has
>>>> present memory, i.e. even if there is a hole in a section, its
>>>> memory map
>>>> will be populated and will have struct pages.
>>> There is this sub-section thing, and holes larger than a section might
>>> not have a memmap (unless reserved I guess).
>>>
>>>> When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
>>>> essentially checks if the first page in a pageblock is online and if
>>>> first
>>>> and last pages are in the zone being compacted.
>>>> AFAIU, in the hotplug case an entire pageblock is always onlined to the
>>>> same zone, so zone->contiguous won't change after the hotplug is complete.
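
(The slow path mentioned here is roughly the following, paraphrased from
__pageblock_pfn_to_page() in mm/page_alloc.c:)

struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
				     unsigned long end_pfn, struct zone *zone)
{
	struct page *start_page, *end_page;

	/* end_pfn is one past the range we are checking */
	end_pfn--;

	if (!pfn_valid(end_pfn))
		return NULL;

	/* the first page of the pageblock must be online ... */
	start_page = pfn_to_online_page(start_pfn);
	if (!start_page)
		return NULL;

	/* ... and both the first and the last page must sit in @zone */
	if (page_zone(start_page) != zone)
		return NULL;

	end_page = pfn_to_page(end_pfn);
	if (page_zone(end_page) != zone)
		return NULL;

	return start_page;
}
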
>>> I think you are missing a point: hot(un)plug might create holes in the
>>> zone span. Then, pfn_to_page() is no longer valid to be called on
>>> arbitrary pageblocks within the zone.
>>>
>>>> We might set it to false in the beginning of the hotplug to avoid
>>>> scanning
>>>> offline pages, although I'm not sure if it's possible.
>>>>
>>>> But in the end of hotplug we can simply restore the old value and
>>>> move on.
>>> No, you might create holes.
>>>
>>>> For the coldplug case I'm also not sure it's worth the hassle, we could
>>>> just let compaction scan a few more pfns for those rare weird pageblocks
>>>> and bail out on wrong page conditions.
>>> To recap:
>>>
>>> My idea is that "zone_spanned_pages == zone_present_pages" tells you
>>> that the zone is contiguous because there are no holes.
>>>
>>> To handle "non-memory with a struct page", you'd have to check
>>>
>>>       "zone_spanned_pages == zone_present_pages +
>>>            zone_non_present_memmap_pages"
>>>
>>> Or shorter
>>>
>>>       "zone_spanned_pages == zone_pages_with_memmap"
>>>
>>> Then, pfn_to_page() is valid within the complete zone.
>>>
>>> The question is how to best calculate the "zone_pages_with_memmap"
>>> during boot.
>>>
>>> During hot(un)plug we only add/remove zone_present_pages. The
>>> zone_non_present_memmap_pages will not change due to hot(un)plug later.
>>>
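
(A rough sketch of that idea; note that non_present_memmap_pages below is a
placeholder name for illustration only, not an existing zone field:)

static inline bool zone_is_contiguous(const struct zone *zone)
{
	/*
	 * Every spanned pfn has a memmap, whether backed by present memory
	 * or not, so pfn_to_page() is valid on the whole zone span.
	 */
	return zone->spanned_pages ==
	       zone->present_pages + zone->non_present_memmap_pages;
}
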
>> The following hack does the trick. But
>>
>> (a) I wish we could get rid of the pageblock walking in calc_online_pages().
>> (b) "online_pages" has weird semantics due to the pageblock handling.
>>      "online_pageblock_pages"? not sure.
>> (c) Calculating "online_pages" when we know there is a hole does not make sense,
>>      as we could just keep it 0 if there are holes and simply set it
>>      (zone->online_pageblock_pages) to zone->spanned_pages in case all are online.
>>
>>
>>  From d4cb825e91a6363afc68fb994c5d9b29c38c5f42 Mon Sep 17 00:00:00 2001
>> From: "David Hildenbrand (Arm)" <david@kernel.org>
>> Date: Mon, 9 Feb 2026 13:40:24 +0100
>> Subject: [PATCH] tmp
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>> ---
>>   include/linux/mmzone.h | 25 +++++++++++++++++++++++--
>>   mm/internal.h          |  8 +-------
>>   mm/memory_hotplug.c    | 20 ++++++--------------
>>   mm/mm_init.c           | 12 ++++++------
>>   4 files changed, 36 insertions(+), 29 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fc5d6c88d2f0..3f7d8d88c597 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -943,6 +943,11 @@ struct zone {
>>   	 * cma pages is present pages that are assigned for CMA use
>>   	 * (MIGRATE_CMA).
>>   	 *
>> +	 * online_pages is pages within the zone that have an online memmap.
>> +	 * online_pages include present pages and memory holes that have a
>> +	 * memmap. When spanned_pages == online_pages, pfn_to_page() can be
>> +	 * performed without further checks on any pfn within the zone span.
> Maybe pages_with_memmap? It would stand out from managed, spanned and
> present, but it's clearer than online IMHO.
>
>> +	 *
>>   	 * So present_pages may be used by memory hotplug or memory power
>>   	 * management logic to figure out unmanaged pages by checking
>>   	 * (present_pages - managed_pages). And managed_pages should be used
>> @@ -967,6 +972,7 @@ struct zone {
>>   	atomic_long_t		managed_pages;
>>   	unsigned long		spanned_pages;
>>   	unsigned long		present_pages;
>> +	unsigned long		online_pages;
>>   #if defined(CONFIG_MEMORY_HOTPLUG)
>>   	unsigned long		present_early_pages;
>>   #endif
>> @@ -1051,8 +1057,6 @@ struct zone {
>>   	bool			compact_blockskip_flush;
>>   #endif
>> -	bool			contiguous;
>> -
>>   	CACHELINE_PADDING(_pad3_);
>>   	/* Zone statistics */
>>   	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
>> @@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
>>   	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
>>   }
>> +/**
>> + * zone_is_contiguous - test whether a zone is contiguous
>> + * @zone: the zone to test.
>> + *
>> + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
>> + * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
>> + *
>> + * Returns: true if contiguous, otherwise false.
>> + */
>> +static inline bool zone_is_contiguous(const struct zone *zone)
>> +{
>> +	return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages);
>> +}
>> +
>>   static inline bool zone_is_initialized(const struct zone *zone)
>>   {
>>   	return zone->initialized;
>> diff --git a/mm/internal.h b/mm/internal.h
>> index f35dbcf99a86..6062f9b8ee62 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>   static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>>   				unsigned long end_pfn, struct zone *zone)
>>   {
>> -	if (zone->contiguous)
>> +	if (zone_is_contiguous(zone))
>>   		return pfn_to_page(start_pfn);
>>   	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>>   }
>> -void set_zone_contiguous(struct zone *zone);
>>   bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
>>   			   unsigned long nr_pages);
>> -static inline void clear_zone_contiguous(struct zone *zone)
>> -{
>> -	zone->contiguous = false;
>> -}
>> -
>>   extern int __isolate_free_page(struct page *page, unsigned int order);
>>   extern void __putback_isolated_page(struct page *page, unsigned int order,
>>   				    int mt);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index a63ec679d861..76496c1039a9 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>>   		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
>>   						zone_end_pfn(zone));
>>   		if (pfn) {
>> -			zone->spanned_pages = zone_end_pfn(zone) - pfn;
>> +			WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
>>   			zone->zone_start_pfn = pfn;
>>   		} else {
>>   			zone->zone_start_pfn = 0;
>> -			zone->spanned_pages = 0;
>> +			WRITE_ONCE(zone->spanned_pages, 0);
>>   		}
>>   	} else if (zone_end_pfn(zone) == end_pfn) {
>>   		/*
>> @@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>>   		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
>>   					       start_pfn);
>>   		if (pfn)
>> -			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
>> +			WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1);
>>   		else {
>>   			zone->zone_start_pfn = 0;
>> -			zone->spanned_pages = 0;
>> +			WRITE_ONCE(zone->spanned_pages, 0);
>>   		}
>>   	}
>>   }
>> @@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
>>   	/*
>>   	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
>> -	 * we will not try to shrink the zones - which is okay as
>> -	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
>> +	 * we will not try to shrink the zones.
>>   	 */
>>   	if (zone_is_zone_device(zone))
>>   		return;
>> -	clear_zone_contiguous(zone);
>> -
>>   	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>>   	update_pgdat_span(pgdat);
>> -
>> -	set_zone_contiguous(zone);
>>   }
>>   /**
>> @@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>>   	struct pglist_data *pgdat = zone->zone_pgdat;
>>   	int nid = pgdat->node_id;
>> -	clear_zone_contiguous(zone);
>> -
>>   	if (zone_is_empty(zone))
>>   		init_currently_empty_zone(zone, start_pfn, nr_pages);
>>   	resize_zone_range(zone, start_pfn, nr_pages);
>> @@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>>   	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
>>   			 MEMINIT_HOTPLUG, altmap, migratetype,
>>   			 isolate_pageblock);
>> -
>> -	set_zone_contiguous(zone);
>>   }
>>   struct auto_movable_stats {
>> @@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
>>   	if (early_section(__pfn_to_section(page_to_pfn(page))))
>>   		zone->present_early_pages += nr_pages;
>>   	zone->present_pages += nr_pages;
>> +	WRITE_ONCE(zone->online_pages,  zone->online_pages + nr_pages);
>>   	zone->zone_pgdat->node_present_pages += nr_pages;
>>   	if (group && movable)
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index 2a809cd8e7fa..e33caa6fb6fc 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page)
>>   }
>>   #endif
>> -void set_zone_contiguous(struct zone *zone)
>> +static void calc_online_pages(struct zone *zone)
>>   {
>>   	unsigned long block_start_pfn = zone->zone_start_pfn;
>> +	unsigned long online_pages = 0;
>>   	unsigned long block_end_pfn;
>>   	block_end_pfn = pageblock_end_pfn(block_start_pfn);
>> @@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone)
>>   		if (!__pageblock_pfn_to_page(block_start_pfn,
>>   					     block_end_pfn, zone))
>> -			return;
>> +			continue;
>>   		cond_resched();
>> +		online_pages += block_end_pfn - block_start_pfn;
> I think we can completely get rid of this with something like this untested
> patch to calculate zone->online_pages for coldplug:
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index e33caa6fb6fc..ff2f75e7b49f 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -845,9 +845,9 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>    *   zone/node above the hole except for the trailing pages in the last
>    *   section that will be appended to the zone/node below.
>    */
> -static void __init init_unavailable_range(unsigned long spfn,
> -					  unsigned long epfn,
> -					  int zone, int node)
> +static u64 __init init_unavailable_range(unsigned long spfn,
> +					 unsigned long epfn,
> +					 int zone, int node)
>   {
>   	unsigned long pfn;
>   	u64 pgcnt = 0;
> @@ -861,6 +861,8 @@ static void __init init_unavailable_range(unsigned long spfn,
>   	if (pgcnt)
>   		pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
>   			node, zone_names[zone], pgcnt);
> +
> +	return pgcnt;
>   }
>   
>   /*
> @@ -959,9 +961,10 @@ static void __init memmap_init_zone_range(struct zone *zone,
>   	memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
>   			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>   			  false);
> +	zone->online_pages += (end_pfn - start_pfn);
>   
>   	if (*hole_pfn < start_pfn)
> -		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> +		zone->online_pages += init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
>   
>   	*hole_pfn = end_pfn;
>   }



Sorry for the late response, I am trying to catch up with the discussion :)
Per my understanding, zone->contiguous combines two semantics: one is that
the pages completely fill the zone span, the other is that those pages can
be accessed because they have been onlined. The check zone_spanned_pages ==
zone_online_pages guarantees both. Both resize_zone_range() and
shrink_zone_span() change zone_spanned_pages, so they need to use
WRITE_ONCE() to guarantee ordering against the lockless readers; the same
applies to adjust_present_page_count(), where zone_online_pages gets updated.
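
As a minimal sketch of the pairing I have in mind (names taken from the hunks
quoted above; an illustration, not a complete patch):

/* Lockless reader, as in the proposed zone_is_contiguous(): */
static inline bool zone_is_contiguous(const struct zone *zone)
{
	return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages);
}

/*
 * Writer side: every update of the two counters goes through WRITE_ONCE()
 * so the reader above never sees a torn value, e.g. in shrink_zone_span():
 *
 *	WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
 *
 * and in adjust_present_page_count():
 *
 *	WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages);
 */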

Regards,

Tianyou




Thread overview: 13+ messages
2026-01-30 16:37 [PATCH v9 0/2] Optimize zone->contiguous update Tianyou Li
2026-01-30 16:37 ` [PATCH v9 1/2] mm/memory hotplug/unplug: Add online_memory_block_pages() and offline_memory_block_pages() Tianyou Li
2026-01-30 16:37 ` [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range Tianyou Li
2026-02-07 11:00   ` David Hildenbrand (Arm)
2026-02-08 19:39     ` Mike Rapoport
2026-02-09 10:52       ` David Hildenbrand (Arm)
2026-02-09 12:44         ` David Hildenbrand (Arm)
2026-02-10 11:44           ` Mike Rapoport
2026-02-10 15:28             ` Li, Tianyou [this message]
2026-02-11 12:19             ` David Hildenbrand (Arm)
2026-02-12  8:32               ` Mike Rapoport
2026-02-12  8:45                 ` David Hildenbrand (Arm)
2026-02-09 11:38       ` David Hildenbrand (Arm)
