Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch

From: Zi Yan <ziy@nvidia.com>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: "Mike Rapoport" <rppt@kernel.org>,
	"Michał Cłapiński" <mclapinski@google.com>,
	"Evangelos Petrongonas" <epetron@amazon.de>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	kexec@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
Date: Tue, 07 Apr 2026 09:21:25 -0400
Message-ID: <DFD0C0F5-F23B-44CA-B2C7-9D03F2397DCF@nvidia.com>
In-Reply-To: <2vxzwlyj9d0b.fsf@kernel.org>

On 7 Apr 2026, at 8:21, Pratyush Yadav wrote:

> On Sun, Mar 22 2026, Mike Rapoport wrote:
>
>> On Thu, Mar 19, 2026 at 07:17:48PM +0100, Michał Cłapiński wrote:
>>> On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@kernel.org> wrote:
> [...]
>>>> +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator)
>>>> +{
>>>> +       int index = iterator & 0xffffffff;
>>>
>>> I'm not sure about this. __next_mem_range() has this code:
>>> /*
>>> * The region which ends first is
>>> * advanced for the next iteration.
>>> */
>>> if (m_end <= r_end)
>>>         idx_a++;
>>> else
>>>         idx_b++;
>>>
>>> Therefore, the index you get from this might be correct or it might
>>> already be incremented.
>>
>> Hmm, right, missed that :/
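
To illustrate the point about the iterator, here is a minimal userspace
sketch of the merge step quoted above. It is not the kernel code:
next_free_range(), struct region and struct region_list are names made up
for this example. It assumes that the low 32 bits of the iterator hold the
memory index, that the high 32 bits hold the reserved index, and that the
region which ends first is the one advanced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for memblock regions; names are illustrative only. */
struct region {
	uint64_t base;
	uint64_t size;
};

struct region_list {
	const struct region *regions;
	int cnt;
};

/*
 * Yield the next range that is in "mem" but not in "rsv".
 * Returns 1 and fills out_start/out_end, or 0 when exhausted.
 * Low half of *idx indexes mem, high half indexes the gaps in rsv.
 */
static int next_free_range(uint64_t *idx, const struct region_list *mem,
			   const struct region_list *rsv,
			   uint64_t *out_start, uint64_t *out_end)
{
	int idx_a, idx_b;

	assert(idx && mem && rsv && out_start && out_end);
	if (*idx == UINT64_MAX)
		return 0;

	idx_a = (int)(*idx & 0xffffffff);
	idx_b = (int)(*idx >> 32);

	for (; idx_a < mem->cnt; idx_a++) {
		uint64_t m_start = mem->regions[idx_a].base;
		uint64_t m_end = m_start + mem->regions[idx_a].size;

		/* Gap number idx_b lies between reservation idx_b-1 and idx_b. */
		for (; idx_b < rsv->cnt + 1; idx_b++) {
			uint64_t r_start = idx_b ?
				rsv->regions[idx_b - 1].base +
				rsv->regions[idx_b - 1].size : 0;
			uint64_t r_end = idx_b < rsv->cnt ?
				rsv->regions[idx_b].base : UINT64_MAX;

			if (r_start >= m_end)
				break;
			if (m_start < r_end) {
				*out_start = m_start > r_start ? m_start : r_start;
				*out_end = m_end < r_end ? m_end : r_end;
				/* The region which ends first is advanced. */
				if (m_end <= r_end)
					idx_a++;
				else
					idx_b++;
				*idx = (uint32_t)idx_a | (uint64_t)idx_b << 32;
				return 1;
			}
		}
	}

	*idx = UINT64_MAX;
	return 0;
}
```

With one memory region [0, 100) and one reserved region [40, 50), the first
call yields [0, 40) and leaves the low index at 0, while the second call
yields [50, 100) and leaves the low index at 1, although both ranges came
from memory region 0. So the low half of the iterator cannot be trusted to
name the region the returned range belongs to.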
>>
>> Still, we can check if an address is inside scratch in
>> reserve_bootmem_regions() and in deferred_init_pages() and set migrate type
>> to CMA in that case.
>>
>> I think something like the patch below should work. It might not be the
>> most optimized, but it localizes the changes to mm_init and memblock and
>> does not complicate the code (well, almost).
>>
>> The patch is on top of
>> https://lore.kernel.org/linux-mm/20260322143144.3540679-1-rppt@kernel.org/T/#u
>>
>> and I pushed the entire set here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho-deferred-init
>>
>> It compiles and passes kho self test with both deferred pages enabled and
>> disabled, but I didn't do further testing yet.
>>
>> From 97aa1ea8e085a128dd5add73f81a5a1e4e0aad5e Mon Sep 17 00:00:00 2001
>> From: Michal Clapinski <mclapinski@google.com>
>> Date: Tue, 17 Mar 2026 15:15:33 +0100
>> Subject: [PATCH] kho: fix deferred initialization of scratch areas
>>
>> Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
>> kho_release_scratch() will initialize the struct pages and set migratetype
>> of KHO scratch. Unless the whole scratch fits below first_deferred_pfn, some
>> of that will be overwritten either by deferred_init_pages() or
>> memmap_init_reserved_range().
>>
>> To fix it, modify kho_release_scratch() to only set the migratetype on
>> already initialized pages and make deferred_init_pages() and
>> memmap_init_reserved_range() recognize KHO scratch regions and set
>> migratetype of pageblocks in those regions to MIGRATE_CMA.
>
> Hmm, I don't like how complex this is. It adds another layer of
> complexity to the initialization of the migratetype, and you have to dig
> through all the possible call sites to be sure that we catch all the
> cases. Makes it harder to wrap your head around it. Plus, makes it more
> likely for bugs to slip through if later refactors change some page init
> flow.
>
> Is the cost to look through the scratch array really that bad? I would
> suspect we'd have at most 4-6 per-node scratches, and one global one
> in lowmem. So I'd expect around 10 items to look through, and it will
> probably be in the cache anyway.

It is not only about the cost of walking the scratch array, but also
about adding KHO code to the generic init_pageblock_migratetype().
That would make every caller of init_pageblock_migratetype() perform
the check, whether or not it has anything to do with KHO. It is better
practice to do the check only where it is needed; otherwise, a
catch-all check might hide bugs in the future.
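
As a rough userspace sketch of what a check at the call site looks like
(not kernel code; region_search(), is_scratch(), migratetype_for_range()
and REGION_KHO_SCRATCH are hypothetical names for this example, loosely
modelled on the binary search plus flag test in the patch below):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model; the real enum has more entries. */
enum migratetype { MIGRATE_MOVABLE, MIGRATE_CMA };

#define REGION_KHO_SCRATCH 0x1u	/* hypothetical flag bit */

struct region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* Binary search over sorted, non-overlapping regions; -1 if addr is in a hole. */
static int region_search(const struct region *regions, unsigned int cnt,
			 uint64_t addr)
{
	unsigned int left = 0, right = cnt;

	assert(regions || cnt == 0);
	while (left < right) {
		unsigned int mid = left + (right - left) / 2;

		if (addr < regions[mid].base)
			right = mid;
		else if (addr >= regions[mid].base + regions[mid].size)
			left = mid + 1;
		else
			return (int)mid;
	}
	return -1;
}

static bool is_scratch(const struct region *regions, unsigned int cnt,
		       uint64_t addr)
{
	int i = region_search(regions, cnt, addr);

	return i >= 0 && (regions[i].flags & REGION_KHO_SCRATCH);
}

/*
 * Only the init paths that can actually see scratch memory call this and
 * then pass the result on; the generic pageblock setter stays unaware of
 * scratch regions.
 */
static enum migratetype migratetype_for_range(const struct region *regions,
					      unsigned int cnt, uint64_t start)
{
	return is_scratch(regions, cnt, start) ? MIGRATE_CMA : MIGRATE_MOVABLE;
}
```

The lookup is made once per range by the caller that needs it, and callers
unrelated to KHO never pay for it or depend on it.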

>
> Michal, did you ever run any numbers on how much extra time
> init_pageblock_migratetype() takes as a result of your patch?
>
> Anyway, Mike, if you do want to do it this way, it LGTM for the most
> part, but some comments below.
>
>>
>> Signed-off-by: Michal Clapinski <mclapinski@google.com>
>> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> ---
>>  include/linux/memblock.h           |  7 ++++--
>>  kernel/liveupdate/kexec_handover.c | 10 +++++---
>>  mm/memblock.c                      | 39 +++++++++++++-----------------
>>  mm/mm_init.c                       | 14 ++++++-----
>>  4 files changed, 36 insertions(+), 34 deletions(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index 6ec5e9ac0699..410f2a399691 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -614,11 +614,14 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
>>  #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>>  void memblock_set_kho_scratch_only(void);
>>  void memblock_clear_kho_scratch_only(void);
>> -void memmap_init_kho_scratch_pages(void);
>> +bool memblock_is_kho_scratch_memory(phys_addr_t addr);
>>  #else
>>  static inline void memblock_set_kho_scratch_only(void) { }
>>  static inline void memblock_clear_kho_scratch_only(void) { }
>> -static inline void memmap_init_kho_scratch_pages(void) {}
>> +static inline bool memblock_is_kho_scratch_memory(phys_addr_t addr)
>> +{
>> +	return false;
>> +}
>>  #endif
>>
>>  #endif /* _LINUX_MEMBLOCK_H */
>> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
>> index 532f455c5d4f..12292b83bf49 100644
>> --- a/kernel/liveupdate/kexec_handover.c
>> +++ b/kernel/liveupdate/kexec_handover.c
>> @@ -1457,8 +1457,7 @@ static void __init kho_release_scratch(void)
>>  {
>>  	phys_addr_t start, end;
>>  	u64 i;
>> -
>> -	memmap_init_kho_scratch_pages();
>> +	int nid;
>>
>>  	/*
>>  	 * Mark scratch mem as CMA before we return it. That way we
>> @@ -1466,10 +1465,13 @@ static void __init kho_release_scratch(void)
>>  	 * we can reuse it as scratch memory again later.
>>  	 */
>>  	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
>> -			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
>> +			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
>>  		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
>>  		ulong end_pfn = pageblock_align(PFN_UP(end));
>>  		ulong pfn;
>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> +		end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
>> +#endif
>
> Can we just get rid of this entirely? And just update
> memmap_init_zone_range() to also look for scratch and set the
> migratetype correctly from the get go? That's more consistent IMO. The
> two main places that initialize the struct page,
> memmap_init_zone_range() and deferred_init_memmap_chunk(), check for
> scratch and set the migratetype correctly.
>>
>>  		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
>>  			init_pageblock_migratetype(pfn_to_page(pfn),
>> @@ -1480,8 +1482,8 @@ static void __init kho_release_scratch(void)
>>  void __init kho_memory_init(void)
>>  {
>>  	if (kho_in.scratch_phys) {
>> -		kho_scratch = phys_to_virt(kho_in.scratch_phys);
>>  		kho_release_scratch();
>> +		kho_scratch = phys_to_virt(kho_in.scratch_phys);
>>
>>  		if (kho_mem_retrieve(kho_get_fdt()))
>>  			kho_in.fdt_phys = 0;
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 17aa8661b84d..fe50d60db9c6 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/seq_file.h>
>>  #include <linux/memblock.h>
>>  #include <linux/mutex.h>
>> +#include <linux/page-isolation.h>
>>
>>  #ifdef CONFIG_KEXEC_HANDOVER
>>  #include <linux/libfdt.h>
>> @@ -959,28 +960,6 @@ __init void memblock_clear_kho_scratch_only(void)
>>  {
>>  	kho_scratch_only = false;
>>  }
>> -
>> -__init void memmap_init_kho_scratch_pages(void)
>> -{
>> -	phys_addr_t start, end;
>> -	unsigned long pfn;
>> -	int nid;
>> -	u64 i;
>> -
>> -	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
>> -		return;
>> -
>> -	/*
>> -	 * Initialize struct pages for free scratch memory.
>> -	 * The struct pages for reserved scratch memory will be set up in
>> -	 * reserve_bootmem_region()
>> -	 */
>> -	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
>> -			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
>> -		for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
>> -			init_deferred_page(pfn, nid);
>> -	}
>> -}
>>  #endif
>>
>>  /**
>> @@ -1971,6 +1950,18 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>>  	return !memblock_is_nomap(&memblock.memory.regions[i]);
>>  }
>>
>> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>> +bool __init_memblock memblock_is_kho_scratch_memory(phys_addr_t addr)
>> +{
>> +	int i = memblock_search(&memblock.memory, addr);
>> +
>> +	if (i == -1)
>> +		return false;
>> +
>> +	return memblock_is_kho_scratch(&memblock.memory.regions[i]);
>> +}
>> +#endif
>> +
>>  int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>>  			 unsigned long *start_pfn, unsigned long *end_pfn)
>>  {
>> @@ -2262,6 +2253,10 @@ static void __init memmap_init_reserved_range(phys_addr_t start,
>>  		 * access it yet.
>>  		 */
>>  		__SetPageReserved(page);
>> +
>> +		if (memblock_is_kho_scratch_memory(PFN_PHYS(pfn)) &&
>> +		    pageblock_aligned(pfn))
>> +			init_pageblock_migratetype(page, MIGRATE_CMA, false);
>>  	}
>>  }
>>
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index 96ae6024a75f..5ead2b0f07c6 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -1971,7 +1971,7 @@ unsigned long __init node_map_pfn_alignment(void)
>>
>>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>>  static void __init deferred_free_pages(unsigned long pfn,
>> -		unsigned long nr_pages)
>> +		unsigned long nr_pages, enum migratetype mt)
>>  {
>>  	struct page *page;
>>  	unsigned long i;
>> @@ -1984,8 +1984,7 @@ static void __init deferred_free_pages(unsigned long pfn,
>>  	/* Free a large naturally-aligned chunk if possible */
>>  	if (nr_pages == MAX_ORDER_NR_PAGES && IS_MAX_ORDER_ALIGNED(pfn)) {
>>  		for (i = 0; i < nr_pages; i += pageblock_nr_pages)
>> -			init_pageblock_migratetype(page + i, MIGRATE_MOVABLE,
>> -					false);
>> +			init_pageblock_migratetype(page + i, mt, false);
>>  		__free_pages_core(page, MAX_PAGE_ORDER, MEMINIT_EARLY);
>>  		return;
>>  	}
>> @@ -1995,8 +1994,7 @@ static void __init deferred_free_pages(unsigned long pfn,
>>
>>  	for (i = 0; i < nr_pages; i++, page++, pfn++) {
>>  		if (pageblock_aligned(pfn))
>> -			init_pageblock_migratetype(page, MIGRATE_MOVABLE,
>> -					false);
>> +			init_pageblock_migratetype(page, mt, false);
>>  		__free_pages_core(page, 0, MEMINIT_EARLY);
>>  	}
>>  }
>> @@ -2052,6 +2050,7 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
>>  	u64 i = 0;
>>
>>  	for_each_free_mem_range(i, nid, 0, &start, &end, NULL) {
>> +		enum migratetype mt = MIGRATE_MOVABLE;
>>  		unsigned long spfn = PFN_UP(start);
>>  		unsigned long epfn = PFN_DOWN(end);
>>
>> @@ -2061,12 +2060,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
>>  		spfn = max(spfn, start_pfn);
>>  		epfn = min(epfn, end_pfn);
>>
>> +		if (memblock_is_kho_scratch_memory(PFN_PHYS(spfn)))
>> +			mt = MIGRATE_CMA;
>
> Would it make sense for for_each_free_mem_range() to also return the
> flags for the region? Then you won't have to do another search. It adds
> yet another parameter to it so no strong opinion, but something to
> consider.
>
>> +
>>  		while (spfn < epfn) {
>>  			unsigned long mo_pfn = ALIGN(spfn + 1, MAX_ORDER_NR_PAGES);
>>  			unsigned long chunk_end = min(mo_pfn, epfn);
>>
>>  			nr_pages += deferred_init_pages(zone, spfn, chunk_end);
>> -			deferred_free_pages(spfn, chunk_end - spfn);
>> +			deferred_free_pages(spfn, chunk_end - spfn, mt);
>>
>>  			spfn = chunk_end;
>>
>> -- 
>>
>> 2.53.0
>
> -- 
> Regards,
> Pratyush Yadav


Best Regards,
Yan, Zi
