From: Zi Yan <ziy@nvidia.com>
To: "Michał Cłapiński" <mclapinski@google.com>
Cc: Evangelos Petrongonas <epetron@amazon.de>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Mike Rapoport <rppt@kernel.org>,
Pratyush Yadav <pratyush@kernel.org>,
Alexander Graf <graf@amazon.com>,
Samiullah Khawaja <skhawaja@google.com>,
kexec@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
Date: Wed, 18 Mar 2026 13:36:07 -0400
Message-ID: <58A8B1B4-A73B-48D2-8492-A58A03634644@nvidia.com>
In-Reply-To: <CAAi7L5fvUPUqd3A4m6wubeYh90NH+S2KBRE1R8WKerBYhkU8kg@mail.gmail.com>
On 18 Mar 2026, at 13:19, Michał Cłapiński wrote:
> On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 18 Mar 2026, at 11:45, Michał Cłapiński wrote:
>>
>>> On Wed, Mar 18, 2026 at 4:26 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>
>>>> On 18 Mar 2026, at 11:18, Michał Cłapiński wrote:
>>>>
>>>>> On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>
>>>>>> On 17 Mar 2026, at 10:15, Michal Clapinski wrote:
>>>>>>
>>>>>>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize
>>>>>>> the struct pages and set migratetype of kho scratch. Unless the whole
>>>>>>> scratch fits below first_deferred_pfn, some of that will be overwritten
>>>>>>> either by deferred_init_pages or memmap_init_reserved_pages.
>>>>>>>
>>>>>>> To fix it, I modified kho_release_scratch to only set the migratetype
>>>>>>> on already initialized pages. Then, modified init_pageblock_migratetype
>>>>>>> to set the migratetype to CMA if the page is located inside scratch.
>>>>>>>
>>>>>>> Signed-off-by: Michal Clapinski <mclapinski@google.com>
>>>>>>> ---
>>>>>>> include/linux/memblock.h | 2 --
>>>>>>> kernel/liveupdate/kexec_handover.c | 10 ++++++----
>>>>>>> mm/memblock.c | 22 ----------------------
>>>>>>> mm/page_alloc.c | 7 +++++++
>>>>>>> 4 files changed, 13 insertions(+), 28 deletions(-)
>>>>>>>
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>> index ee81f5c67c18..5ca078dde61d 100644
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -55,6 +55,7 @@
>>>>>>> #include <linux/cacheinfo.h>
>>>>>>> #include <linux/pgalloc_tag.h>
>>>>>>> #include <linux/mmzone_lock.h>
>>>>>>> +#include <linux/kexec_handover.h>
>>>>>>> #include <asm/div64.h>
>>>>>>> #include "internal.h"
>>>>>>> #include "shuffle.h"
>>>>>>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
>>>>>>>  		     migratetype < MIGRATE_PCPTYPES))
>>>>>>>  		migratetype = MIGRATE_UNMOVABLE;
>>>>>>>
>>>>>>> +	/*
>>>>>>> +	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
>>>>>>> +	 */
>>>>>>> +	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
>>>>>>> +		migratetype = MIGRATE_CMA;
>>>>>>> +
>>>>>>
>>>>>> If this is only for deferred init code, why not put it in deferred_free_pages()?
>>>>>> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty
>>>>>> of traversing kho_scratch array.
>>>>>
>>>>> Because reserve_bootmem_region() doesn't call deferred_free_pages().
>>>>> So I would also have to modify it.
>>>>>
>>>>> And the early initialization won't pay the penalty of traversing the
>>>>> kho_scratch array, since then kho_scratch is NULL.
>>>>
>>>> How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(),
>>>> init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(),
>>>> __init_zone_device_page()?
>>>>
>>>> 1. do any of them cover a PFN range that overlaps with kho scratch?
>>>> 2. is kho_scratch NULL for them?
>>>>
>>>> 1 tells us whether putting code in init_pageblock_migratetype() could save
>>>> the hassle of changing all above locations.
>>>> 2 tells us how many callers are affected by traversing kho_scratch.
>>>
>>> I could try answering those questions but
>>>
>>> 1. I'm new to this and I'm not sure how correct the answers will be.
>>>
>>> 2. If you're not using CONFIG_KEXEC_HANDOVER, the performance penalty
>>> will be zero.
>>> If you are using it, currently you have to disable
>>> CONFIG_DEFERRED_STRUCT_PAGE_INIT and the performance hit from this is
>>> far, far greater. This solution saves 0.5s on my setup (100GB of
>>> memory). We can always improve the performance further in the future.
>>>
>>
>> OK, I asked Claude for help and the answer is that not all callers of
>> init_pageblock_migratetype() touch kho scratch memory regions. Basically,
>> you only need to perform the kho_scratch_overlap() check in
>> __init_page_from_nid() to achieve the same end result.
>>
>>
>> The below is the analysis from Claude.
>> Based on my understanding,
>> 1. memmap_init_range() is done before kho_memory_init(), so it does not need
>> the check.
>>
>> 2. __init_zone_device_page() is not relevant.
>>
>> 3. init_cma_reserved_pageblock() / init_cma_pageblock() are already set
>> to MIGRATE_CMA.
>>
>> 4. hugetlb is not used by kho scratch, so also does not need the check.
>>
>> 5. kho_release_scratch() already takes care of it.
>>
>> The remaining memblock_free_pages() needs a check, but I am not 100% sure.
>>
>>
>> # kho_scratch_overlap() in init_pageblock_migratetype() — scope analysis
>>
>> ## Context
>>
>> Commit a7700b3c6779 ("kho: fix deferred init of kho scratch") added a
>> kho_scratch_overlap() call inside init_pageblock_migratetype() in
>> mm/page_alloc.c:
>>
>> ```c
>> if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
>>         migratetype = MIGRATE_CMA;
>> ```
>>
>> kho_scratch_overlap() does a NULL check followed by a loop over the
>> kho_scratch array. For non-KHO boots (kho_scratch == NULL) the cost is
>> a single NULL load and branch. For KHO boots the loop runs on every call
>> to init_pageblock_migratetype().
>>
>> ## Question
>>
>> Does this add overhead for callers whose memory range cannot overlap
>> with scratch? Can the check be moved to the caller side?
>>
>> ## Call site analysis
>>
>> init_pageblock_migratetype() has nine call sites. The init call ordering
>> relevant to scratch is:
>>
>> ```
>> setup_arch()
>>   zone_sizes_init() -> free_area_init() -> memmap_init_range()    [1]
>>
>> mm_init_free_all() / start_kernel():
>>   kho_memory_init() -> kho_release_scratch()                      [2]
>>   memblock_free_all()
>>     free_low_memory_core_early()
>>       memmap_init_reserved_pages()
>>         reserve_bootmem_region() -> __init_deferred_page()
>>                                  -> __init_page_from_nid()        [3]
>>   deferred init kthreads -> __init_page_from_nid()                [4]
>> ```
>
> I don't understand this. deferred_free_pages() doesn't call
> __init_page_from_nid(). So I would clearly need to modify both
> deferred_free_pages and __init_page_from_nid.

Sure. But the other callers I mentioned above do not need to check
kho_scratch, right?
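
To make the question concrete, here is a rough userspace sketch of the
caller-side variant. It is not kernel code: struct scratch_range,
scratch_overlap(), pick_migratetype() and the may_hit_scratch flag are
hypothetical stand-ins. Only the NULL check followed by an array walk
mirrors what is described above for kho_scratch_overlap().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scratch region descriptor. */
struct scratch_range {
	uint64_t addr;
	uint64_t size;
};

/* Stays NULL when no scratch table exists (the non-KHO boot case). */
static const struct scratch_range *scratch_tbl;
static size_t scratch_cnt;

/* Simplified migratetypes, for illustration only. */
enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_CMA };

/* Returns true if [phys, phys + size) intersects any scratch range. */
static bool scratch_overlap(uint64_t phys, uint64_t size)
{
	if (!scratch_tbl)	/* no table: one load and one branch */
		return false;

	for (size_t i = 0; i < scratch_cnt; i++) {
		uint64_t start = scratch_tbl[i].addr;
		uint64_t end = start + scratch_tbl[i].size;

		/* Half-open interval intersection test. */
		if (phys < end && start < phys + size)
			return true;
	}
	return false;
}

/*
 * Caller-side variant: only call sites that can actually see scratch
 * memory pass may_hit_scratch = true and pay for the array walk.
 */
static enum migratetype pick_migratetype(uint64_t phys, uint64_t size,
					 enum migratetype requested,
					 bool may_hit_scratch)
{
	if (may_hit_scratch && scratch_overlap(phys, size))
		return MT_CMA;
	return requested;
}
```

In this sketch a caller that can never overlap scratch passes false and
skips the walk entirely, which is the saving I am asking about.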

Best Regards,
Yan, Zi