From: Mike Rapoport <rppt@kernel.org>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: "Michał Cłapiński" <mclapinski@google.com>,
"Zi Yan" <ziy@nvidia.com>,
"Evangelos Petrongonas" <epetron@amazon.de>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Alexander Graf" <graf@amazon.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
kexec@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
Date: Thu, 9 Apr 2026 21:06:08 +0300 [thread overview]
Message-ID: <adfqkOWVFgeAkItF@kernel.org> (raw)
In-Reply-To: <2vxzwlyj9d0b.fsf@kernel.org>
On Tue, Apr 07, 2026 at 12:21:56PM +0000, Pratyush Yadav wrote:
> On Sun, Mar 22 2026, Mike Rapoport wrote:
>
> > On Thu, Mar 19, 2026 at 07:17:48PM +0100, Michał Cłapiński wrote:
> >> On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@kernel.org> wrote:
> [...]
> >> > +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator)
> >> > +{
> >> > + int index = iterator & 0xffffffff;
> >>
> >> I'm not sure about this. __next_mem_range() has this code:
> >> /*
> >> * The region which ends first is
> >> * advanced for the next iteration.
> >> */
> >> if (m_end <= r_end)
> >> idx_a++;
> >> else
> >> idx_b++;
> >>
> >> Therefore, the index you get from this might be correct or it might
> >> already be incremented.
> >
> > Hmm, right, missed that :/
> >
> > Still, we can check if an address is inside scratch in
> > reserve_bootmem_regions() and in deferred_init_pages() and set migrate type
> > to CMA in that case.
> >
> > I think something like the patch below should work. It might not be the
> > most optimized, but it localizes the changes to mm_init and memblock and
> > does not complicated the code (well, almost).
> >
> > The patch is on top of
> > https://lore.kernel.org/linux-mm/20260322143144.3540679-1-rppt@kernel.org/T/#u
> >
> > and I pushed the entire set here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho-deferred-init
> >
> > It compiles and passes kho self test with both deferred pages enabled and
> > disabled, but I didn't do further testing yet.
> >
> > From 97aa1ea8e085a128dd5add73f81a5a1e4e0aad5e Mon Sep 17 00:00:00 2001
> > From: Michal Clapinski <mclapinski@google.com>
> > Date: Tue, 17 Mar 2026 15:15:33 +0100
> > Subject: [PATCH] kho: fix deferred initialization of scratch areas
> >
> > Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
> > kho_release_scratch() will initialize the struct pages and set migratetype
> > of KHO scratch. Unless the whole scratch fits below first_deferred_pfn, some
> > of that will be overwritten either by deferred_init_pages() or
> > memmap_init_reserved_range().
> >
> > To fix it, modify kho_release_scratch() to only set the migratetype on
> > already initialized pages and make deferred_init_pages() and
> > memmap_init_reserved_range() recognize KHO scratch regions and set
> > migratetype of pageblocks in that regions to MIGRATE_CMA.
>
> Hmm, I don't like that how complex this is. It adds another layer of
> complexity to the initialization of the migratetype, and you have to dig
> through all the possible call sites to be sure that we catch all the
> cases. Makes it harder to wrap your head around it. Plus, makes it more
> likely for bugs to slip through if later refactors change some page init
> flow.
>
> Is the cost to look through the scratch array really that bad? I would
> suspect we'd have at most 4-6 per-node scratches, and one global one
> lowmem. So I'd expect around 10 items to look through, and it will
> probably be in the cache anyway.
The check is most probably cheap enough, but if we stick it into
init_pageblock_migratetype(), we'd check each pageblock even though we have
that information already for much larger chunks. And we should use that
information rather than go back and forth for each pageblock.
> Michal, did you ever run any numbers on how much extra time
> init_pageblock_migratetype() takes as a result of your patch?
>
> Anyway, Mike, if you do want to do it this way, it LGTM for the most
> part, but some comments below.
>
> > @@ -1466,10 +1465,13 @@ static void __init kho_release_scratch(void)
> > * we can reuse it as scratch memory again later.
> > */
> > __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> > - MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
> > + MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> > ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
> > ulong end_pfn = pageblock_align(PFN_UP(end));
> > ulong pfn;
> > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> > + end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
> > +#endif
>
> Can we just get rid of this entirely? And just update
> memmap_init_zone_range() to also look for scratch and set the
> migratetype correctly from the get go? That's more consistent IMO. The
> two main places that initialize the struct page,
> memmap_init_zone_range() and deferred_init_memmap_chunk(), check for
> scratch and set the migratetype correctly.
We could. E.g. let memmap_init() check the memblock flags and pass the
migratetype to memmap_init_zone_range().
I wanted to avoid as much KHO code in mm/ as possible, but if it is must
have in deferred_init_memmap_chunk() we could add some to memmap_init() as
well.
> > @@ -2061,12 +2060,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> > spfn = max(spfn, start_pfn);
> > epfn = min(epfn, end_pfn);
> >
> > + if (memblock_is_kho_scratch_memory(PFN_PHYS(spfn)))
> > + mt = MIGRATE_CMA;
>
> Would it make sense for for_each_free_mem_range() to also return the
> flags for the region? Then you won't have to do another search. It adds
> yet another parameter to it so no strong opinion, but something to
> consider.
I hesitated a lot about this.
Have you seen memblock::__next_mem_range() signature? ;-)
I decided to start with something correct, but slowish and leave the churn
and speed for later.
> --
> Regards,
> Pratyush Yadav
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2026-04-09 18:06 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 14:15 [PATCH v7 0/3] kho: add support for deferred struct page init Michal Clapinski
2026-03-17 14:15 ` [PATCH v7 1/3] kho: make kho_scratch_overlap usable outside debugging Michal Clapinski
2026-03-18 9:16 ` Mike Rapoport
2026-04-07 10:55 ` Pratyush Yadav
2026-04-07 14:18 ` Pasha Tatashin
2026-04-07 16:09 ` Pratyush Yadav
2026-04-07 16:32 ` Pasha Tatashin
2026-03-17 14:15 ` [PATCH v7 2/3] kho: fix deferred init of kho scratch Michal Clapinski
2026-03-17 23:23 ` Vishal Moola (Oracle)
2026-03-18 0:08 ` SeongJae Park
2026-03-18 0:23 ` Andrew Morton
2026-03-18 9:33 ` Mike Rapoport
2026-03-18 10:28 ` Michał Cłapiński
2026-03-18 10:33 ` Michał Cłapiński
2026-03-18 11:02 ` Mike Rapoport
2026-03-18 15:10 ` Zi Yan
2026-03-18 15:18 ` Michał Cłapiński
2026-03-18 15:26 ` Zi Yan
2026-03-18 15:45 ` Michał Cłapiński
2026-03-18 17:08 ` Zi Yan
2026-03-18 17:19 ` Michał Cłapiński
2026-03-18 17:36 ` Zi Yan
2026-03-19 7:54 ` Mike Rapoport
2026-03-19 18:17 ` Michał Cłapiński
2026-03-22 14:45 ` Mike Rapoport
2026-04-07 12:21 ` Pratyush Yadav
2026-04-07 13:21 ` Zi Yan
2026-04-09 18:06 ` Mike Rapoport [this message]
2026-03-17 14:15 ` [PATCH v7 3/3] kho: make preserved pages compatible with deferred struct page init Michal Clapinski
2026-03-17 17:46 ` [PATCH v7 0/3] kho: add support for " Andrew Morton
2026-03-18 9:34 ` Mike Rapoport
2026-03-18 9:18 ` Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=adfqkOWVFgeAkItF@kernel.org \
--to=rppt@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=epetron@amazon.de \
--cc=graf@amazon.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mclapinski@google.com \
--cc=pasha.tatashin@soleen.com \
--cc=pratyush@kernel.org \
--cc=skhawaja@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox