* [PATCH] mm: Fix memblock_free_late() when using deferred struct page
@ 2026-02-03 8:02 Benjamin Herrenschmidt
2026-02-03 18:40 ` Mike Rapoport
2026-02-17 8:28 ` [PATCH v2] " Benjamin Herrenschmidt
0 siblings, 2 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-03 8:02 UTC (permalink / raw)
To: linux-mm; +Cc: Mike Rapoport
Currently, when using deferred struct page init, we may end up not
freeing the pages depending on where they are. Typically this happens
when efi_free_boot_services() tries to free UEFI Boot Services pages.
We can hit the !early_page_initialised() test in memblock_free_pages()
since the deferred initializer hasn't even started yet. As a result we
drop the pages on the floor.
Now, memblock_free_late() should only ever be called for pages that
are reserved, and thus for which the struct page has already been
initialized by memmap_init_reserved_pages().
So it should be safe to just free them normally and ignore the deferred
initializer, which will skip over them as it skips over anything still
in the memblock reserved list.
This recovers something like 130MB of RAM on EC2 t3a.nano instances, which
only have 512MB to begin with (as to why UEFI uses that much, that's a
question for another day).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
mm/internal.h | 2 +-
mm/memblock.c | 6 ++++--
mm/mm_init.c | 4 ++--
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 9e0577413087c..fe6da7c30caf0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -660,7 +660,7 @@ extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
extern void memblock_free_pages(struct page *page, unsigned long pfn,
- unsigned int order);
+ unsigned int order, bool reserved);
extern void __free_pages_core(struct page *page, unsigned int order,
enum meminit_context context);
diff --git a/mm/memblock.c b/mm/memblock.c
index 3d7b0114442c4..b836b638615d0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1718,8 +1718,10 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
cursor = PFN_UP(base);
end = PFN_DOWN(base + size);
+ /* Only free pages that were reserved */
+ VM_WARN_ON(!memblock_is_region_reserved(base, size));
for (; cursor < end; cursor++) {
- memblock_free_pages(pfn_to_page(cursor), cursor, 0);
+ memblock_free_pages(pfn_to_page(cursor), cursor, 0, true);
totalram_pages_inc();
}
}
@@ -2141,7 +2143,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
while (start + (1UL << order) > end)
order--;
- memblock_free_pages(pfn_to_page(start), start, order);
+ memblock_free_pages(pfn_to_page(start), start, order, false);
start += (1UL << order);
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 624c1f90ce050..34dc39a21b4bb 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2440,9 +2440,9 @@ void *__init alloc_large_system_hash(const char *tablename,
}
void __init memblock_free_pages(struct page *page, unsigned long pfn,
- unsigned int order)
+ unsigned int order, bool reserved)
{
- if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) {
+ if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT) && !reserved) {
int nid = early_pfn_to_nid(pfn);
if (!early_page_initialised(pfn, nid))
--
2.43.0
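For readers following along, the gate that the patch above changes can be modelled in plain userspace C. This is only a sketch: `toy_node`, `toy_early_page_initialised()` and `toy_memblock_free_pages()` are hypothetical stand-ins, not kernel code, and the model assumes that "initialised early" simply means "below the node's first deferred pfn".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node state the kernel keeps. */
struct toy_node {
	/* struct pages at or above this pfn are left to the deferred initializer */
	unsigned long first_deferred_pfn;
};

/* Toy version of early_page_initialised(): true if pfn is below the boundary. */
static bool toy_early_page_initialised(const struct toy_node *node,
				       unsigned long pfn)
{
	return pfn < node->first_deferred_pfn;
}

/*
 * Toy version of the check at the top of memblock_free_pages() with the
 * extra "reserved" argument from the patch. Returns true if the page would
 * be handed to the buddy allocator, false if it would be silently skipped.
 */
static bool toy_memblock_free_pages(const struct toy_node *node,
				    unsigned long pfn,
				    bool deferred_enabled, bool reserved)
{
	if (deferred_enabled && !reserved) {
		if (!toy_early_page_initialised(node, pfn))
			return false; /* dropped: assumed to be handled later */
	}
	return true;
}
```

With a boundary at pfn 0x8000, a reserved page at pfn 0x3d36b is skipped when `reserved` is false and freed when it is true, which is the behaviour the patch is after.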
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-03 8:02 [PATCH] mm: Fix memblock_free_late() when using deferred struct page Benjamin Herrenschmidt
@ 2026-02-03 18:40 ` Mike Rapoport
2026-02-03 19:53 ` Benjamin Herrenschmidt
2026-02-17 8:28 ` [PATCH v2] " Benjamin Herrenschmidt
1 sibling, 1 reply; 33+ messages in thread
From: Mike Rapoport @ 2026-02-03 18:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
Hi Ben,
On Tue, Feb 03, 2026 at 07:02:08PM +1100, Benjamin Herrenschmidt wrote:
> Currently, when using deferred struct page init, we may end up not
> freeing the pages depending on where they are. Typically this happens
> when efi_free_boot_services() tries to free UEFI Boot Services pages.
>
> We can hit the !early_page_initialised() test in memblock_free_pages()
> since the deferred initializer hasn't even started yet. As a result we
> drop the pages on the floor.
>
> Now, memblock_free_late() should only ever be called for pages that
> are reserved, and thus for which the struct page has already been
> initialized by memmap_init_reserved_pages().
>
> So it should be safe to just free them normally and ignore the deferred
> initializer, which will skip over them as it skips over anything still
> in the memblock reserved list.
>
> This recovers something like 130MB of RAM on EC2 t3a.nano instances who
> only have 512MB to begin with (as to why UEFI uses that much, that's a
> question for another day).
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> mm/internal.h | 2 +-
> mm/memblock.c | 6 ++++--
> mm/mm_init.c | 4 ++--
> 3 files changed, 7 insertions(+), 5 deletions(-)
...
> @@ -2440,9 +2440,9 @@ void *__init alloc_large_system_hash(const char *tablename,
> }
>
> void __init memblock_free_pages(struct page *page, unsigned long pfn,
> - unsigned int order)
> + unsigned int order, bool reserved)
I've been thinking about this over more coffee after our chat on IRC, and I
believe we don't need the bool reserved here.
Since the assumption is that memblock_free_late() is only called after
buddy is initialized, all the reserved pages should have their memmap set up
with PG_reserved set. So we can use PageReserved() instead of passing the
boolean.
> {
> - if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) {
> + if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT) && !reserved) {
> int nid = early_pfn_to_nid(pfn);
>
> if (!early_page_initialised(pfn, nid))
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-03 18:40 ` Mike Rapoport
@ 2026-02-03 19:53 ` Benjamin Herrenschmidt
2026-02-04 7:39 ` Mike Rapoport
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-03 19:53 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Tue, 2026-02-03 at 20:40 +0200, Mike Rapoport wrote:
> > void __init memblock_free_pages(struct page *page, unsigned long pfn,
> > - unsigned int order)
> > + unsigned int order, bool reserved)
>
> I've been thinking about after more coffee after our chat on IRC, and I
> believe we don't need the bool reserved here.
>
> Since the assumption that memblock_free_late() should be called only after
> buddy is initialized, all the reserved pages should have their memmap setup
> with PG_Reserved set. So we can use PageReserved() instead of passing the
> boolean.
What about free_low_memory_core_early() ->
__free_memory_core() ->
__free_pages_memory() ->
memblock_free_pages() ?
I might be missing something but I don't see what would restrict this
to the early pre-initialized struct pages other than that
early_page_initialised() test, so we can't rely on anything in struct
page inside memblock_free_pages().
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-03 19:53 ` Benjamin Herrenschmidt
@ 2026-02-04 7:39 ` Mike Rapoport
2026-02-04 9:02 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 33+ messages in thread
From: Mike Rapoport @ 2026-02-04 7:39 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
On Wed, Feb 04, 2026 at 06:53:29AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2026-02-03 at 20:40 +0200, Mike Rapoport wrote:
> > > void __init memblock_free_pages(struct page *page, unsigned long pfn,
> > > - unsigned int order)
> > > + unsigned int order, bool reserved)
> >
> > I've been thinking about after more coffee after our chat on IRC, and I
> > believe we don't need the bool reserved here.
> >
> > Since the assumption that memblock_free_late() should be called only after
> > buddy is initialized, all the reserved pages should have their memmap setup
> > with PG_Reserved set. So we can use PageReserved() instead of passing the
> > boolean.
>
> What about free_low_memory_core_early() ->
> __free_memory_core() ->
> __free_pages_memory() ->
> memblock_free_pages() ?
>
> I might be missing something but I don't see what would restrict this
> to the early pre-initialized struct pages other than that
> early_page_initialised() test, so we can't rely on anything in struct
> page inside memblock_free_pages().
Right, we can't rely on PG_Reserved being cleared for uninitialized pages :/
But I overlooked an easier and actually reliable way: use
free_reserved_area() instead of memblock_free_late().
> Cheers,
> Ben.
--
Sincerely yours,
Mike.
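The point about uninitialized struct pages can be illustrated with a small userspace model. It is a sketch under one stated assumption: that the memmap backing a not-yet-initialized struct page may hold arbitrary bytes rather than zeroes. `toy_page`, `TOY_PG_RESERVED` and the helpers are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PG_RESERVED 0x1UL /* hypothetical bit, not the kernel's value */

struct toy_page {
	unsigned long flags;
};

/* Imitate a struct page nobody has initialized: fill it with a byte pattern. */
static void toy_leave_uninitialised(struct toy_page *p, unsigned char pattern)
{
	memset(p, pattern, sizeof(*p));
}

/* Imitate what reserved-page init does for a reserved region. */
static void toy_init_reserved(struct toy_page *p)
{
	p->flags = TOY_PG_RESERVED;
}

/* A flag test in the spirit of PageReserved(). */
static bool toy_page_reserved(const struct toy_page *p)
{
	return (p->flags & TOY_PG_RESERVED) != 0;
}
```

An untouched page filled with 0xff reports "reserved" even though nobody reserved it, while one filled with 0x00 does not, so the flag only means something once the struct page has actually been initialized.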
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-04 7:39 ` Mike Rapoport
@ 2026-02-04 9:02 ` Benjamin Herrenschmidt
2026-02-06 10:33 ` Mike Rapoport
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-04 9:02 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Wed, 2026-02-04 at 09:39 +0200, Mike Rapoport wrote:
> > I might be missing something but I don't see what would restrict
> > this
> > to the early pre-initialized struct pages other than that
> > early_page_initialised() test, so we can't rely on anything in
> > struct
> > page inside memblock_free_pages().
>
> Right, we can't rely on PG_Reserved being cleared for uninitialized
> pages :/
>
> But I overlooked an easier and actually reliable way: use
> free_reserved_area() instead of memblock_free_late().
You mean replace all callers of memblock_free_late() and kill it ? Or
make memblock_free_late() use free_reserved_area() instead of
memblock_free_pages() ? :-)
The former misses:
- totalram_pages_inc() and kmemleak_free_part_phys() in
memblock_free_late()
They also both miss as far as I can tell:
if (!kmsan_memblock_free_pages(page, order)) {
/* KMSAN will take care of these pages. */
return;
}
But I don't know if that matters, I don't know anything about kmsan :-)
There are other subtle differences between the two implementations
which probably boil down to the same thing but it's been a while and I
don't have time today to dig into the gory details :-)
ie, one does
clear_page_tag_ref(page);
__free_pages_core(page, order, MEMINIT_EARLY);
i.e., clear_page_tag_ref() is done once for the whole "order" (though in
memblock_free_late() the order is always 0), then __free_pages_core(),
which kind of hard-resets the count to 0, etc.
The other one ends up setting the count to 1 then __free_page() which
does a LOT more "stuff" that is new to me since last I looked (such as
the pcp stuff), ie a lot more convoluted code path, but I don't know if
it differs practically for that use case :-)
I assume that the right approach here is to make memblock_free_late()
call free_reserved_area() instead of memblock_free_pages() so we
preserve totalram_pages_inc() and kmemleak_free_part_phys() but I might
be missing something (and I don't know about KMSAN).
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-04 9:02 ` Benjamin Herrenschmidt
@ 2026-02-06 10:33 ` Mike Rapoport
2026-02-10 1:04 ` Benjamin Herrenschmidt
2026-02-16 10:36 ` Alexander Potapenko
0 siblings, 2 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-06 10:33 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
(added KMSAN folks)
On Wed, Feb 04, 2026 at 08:02:13PM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2026-02-04 at 09:39 +0200, Mike Rapoport wrote:
> > > I might be missing something but I don't see what would restrict
> > > this
> > > to the early pre-initialized struct pages other than that
> > > early_page_initialised() test, so we can't rely on anything in
> > > struct
> > > page inside memblock_free_pages().
> >
> > Right, we can't rely on PG_Reserved being cleared for uninitialized
> > pages :/
> >
> > But I overlooked an easier and actually reliable way: use
> > free_reserved_area() instead of memblock_free_late().
>
> You mean replace all callers of memblock_free_late() and kill it ?
That would be great, but with all the subtle differences you note below
it's for the future :)
> Or make memblock_free_late() use free_reserved_area() instead of
> memblock_free_pages() ? :-)
Yes, I think either calling free_reserved_page() in the loop in
memblock_free_late() or replacing the entire loop with free_reserved_area()
would work.
> The former misses:
> - totalram_pages_inc() and kmemleak_free_part_phys() in
> memblock_free_late()
>
> They also both miss as far as I can tell:
>
> if (!kmsan_memblock_free_pages(page, order)) {
> /* KMSAN will take care of these pages. */
> return;
> }
>
> But I don't know if that matters, I don't know anything about kmsan :-)
AFAIU, here kmsan allocates metadata for each page freed to buddy, but it
handles reserved memory differently anyway, so it shouldn't be a problem.
> There are other subtle differences between the two implementations
> which probably boil down to the same thing but it's been a while and I
> don't have time today to dig into the gory details :-)
>
> ie, one does
>
> clear_page_tag_ref(page);
> __free_pages_core(page, order, MEMINIT_EARLY);
>
> ie, clear_page_tag_ref() is done once for the whole "order" (though in
> the memblock_free_late() order is always 0), then __free_pages_core()
> which kind-of hard resets count to 0 etc...
>
> The other one ends up setting the count to 1 then __free_page() which
> does a LOT more "stuff" that is new to me since last I looked (such as
> the pcp stuff), ie a lot more convoluted code path, but I don't know if
> it differs practically for that use case :-)
It does not :)
with __free_page() the pages may be freed to PCP lists rather than on
global free lists, but it does not really matter, the pages are free and
buddy can use them.
> I assume that the right approach here is to make memblock_free_late()
> call free_reserved_area() instead of memblock_free_pages() so we
> preserve totalram_pages_inc() and kmemleak_free_part_phys() but I might
> be missing something (and I don't know about KMSAN).
free_reserved_page() adjusts totalram internally, so we only need to
preserve the kmemleak part.
So it essentially becomes a "oneliner" :)
diff --git a/mm/memblock.c b/mm/memblock.c
index e76255e4ff36..6e984bcdf6cd 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1770,10 +1770,8 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
cursor = PFN_UP(base);
end = PFN_DOWN(base + size);
- for (; cursor < end; cursor++) {
- memblock_free_pages(pfn_to_page(cursor), cursor, 0);
- totalram_pages_inc();
- }
+ for (; cursor < end; cursor++)
+ free_reserved_page(pfn_to_page(cursor));
}
/*
> Cheers,
> Ben.
--
Sincerely yours,
Mike.
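The accounting argument above can be checked with a toy userspace model. This is a simplified sketch: the step order inside `toy_free_reserved_page()` is an assumption about what free_reserved_page() does (clear the reserved flag, set the count to 1, free the page, bump totalram), and every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	bool reserved;
	int refcount;
	bool in_buddy;
};

struct toy_counters {
	long totalram;
};

/* Old loop body: memblock_free_pages(page, pfn, 0); totalram_pages_inc(); */
static void toy_old_free(struct toy_page *p, struct toy_counters *c)
{
	p->reserved = false;
	p->refcount = 0;   /* count is hard-reset to 0 */
	p->in_buddy = true;
	c->totalram++;     /* explicit totalram_pages_inc() in the caller */
}

/* New loop body: free_reserved_page(page), modelled step by step. */
static void toy_free_reserved_page(struct toy_page *p, struct toy_counters *c)
{
	p->reserved = false;
	p->refcount = 1;           /* count set to 1 before freeing */
	if (--p->refcount == 0)    /* freeing drops the last reference */
		p->in_buddy = true;
	c->totalram++;             /* totalram adjusted internally */
}
```

In this model both paths leave the page unreserved, in the buddy allocator with a zero count, and counted once in totalram, which is why the explicit totalram_pages_inc() can go away.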
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-06 10:33 ` Mike Rapoport
@ 2026-02-10 1:04 ` Benjamin Herrenschmidt
2026-02-10 2:10 ` Benjamin Herrenschmidt
2026-02-16 10:36 ` Alexander Potapenko
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-10 1:04 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Fri, 2026-02-06 at 12:33 +0200, Mike Rapoport wrote:
>
> So it essentially becomes "oneliner" :)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index e76255e4ff36..6e984bcdf6cd 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1770,10 +1770,8 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
> cursor = PFN_UP(base);
> end = PFN_DOWN(base + size);
>
> - for (; cursor < end; cursor++) {
> - memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> - totalram_pages_inc();
> - }
> + for (; cursor < end; cursor++)
> + free_reserved_page(pfn_to_page(cursor));
> }
Nice and sweet :-)
I'll spin that & test it and send a v2. Thanks !
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 1:04 ` Benjamin Herrenschmidt
@ 2026-02-10 2:10 ` Benjamin Herrenschmidt
2026-02-10 6:17 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-10 2:10 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Tue, 2026-02-10 at 12:04 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2026-02-06 at 12:33 +0200, Mike Rapoport wrote:
> >
> > So it essentially becomes "oneliner" :)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index e76255e4ff36..6e984bcdf6cd 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1770,10 +1770,8 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
> > cursor = PFN_UP(base);
> > end = PFN_DOWN(base + size);
> >
> > - for (; cursor < end; cursor++) {
> > - memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> > - totalram_pages_inc();
> > - }
> > + for (; cursor < end; cursor++)
> > + free_reserved_page(pfn_to_page(cursor));
> > }
>
> Nice and sweet :-)
>
> I'll spin that & test it and send a v2. Thanks !
Tadaaa ! Looks like I'll need to dig deeper... Busy with something else
today but I'll get back to this asap.
[ 0.076840] BUG: unable to handle page fault for address: ffffce1a005a0788
[ 0.078226] #PF: supervisor read access in kernel mode
[ 0.078226] #PF: error_code(0x0000) - not-present page
[ 0.078226] PGD 0 P4D 0
[ 0.078226] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 0.078226] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.68-92.123.amzn2023.x86_64 #1
[ 0.078226] Hardware name: Amazon EC2 t3a.nano/, BIOS 1.0 10/16/2017
[ 0.078226] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[ 0.078226] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[ 0.078226] RSP: 0000:ffffffff9ac03cc0 EFLAGS: 00010006
[ 0.078226] RAX: dead000000000122 RBX: fffff56600459c80 RCX: ffffce1a005a0788
[ 0.078226] RDX: ffffce1a005e3e08 RSI: fffff56600459c88 RDI: fffff56600459c88
[ 0.078226] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
[ 0.078226] R10: 000000000000001c R11: 0000000000000200 R12: ffff8ca75bacbc80
[ 0.078226] R13: 0000000000000000 R14: 0000000000011673 R15: fffff56600459cc0
[ 0.078226] FS: 0000000000000000(0000) GS:ffff8ca752c00000(0000) knlGS:0000000000000000
[ 0.078226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.078226] CR2: ffffce1a005a0788 CR3: 0000000006c34000 CR4: 00000000003506f0
[ 0.078226] Call Trace:
[ 0.078226] <TASK>
[ 0.078226] __free_one_page+0x170/0x520
[ 0.078226] free_pcppages_bulk+0x151/0x1e0
[ 0.078226] free_unref_page_commit+0x263/0x320
[ 0.078226] free_unref_page+0x2c8/0x5b0
[ 0.078226] ? srso_return_thunk+0x5/0x5f
[ 0.078226] free_reserved_page+0x1c/0x30
[ 0.078226] memblock_free_late+0x6c/0xc0
[ 0.078226] efi_free_boot_services+0x11f/0x2e0
[ 0.078226] __efi_enter_virtual_mode+0x181/0x210
[ 0.078226] efi_enter_virtual_mode+0xcd/0x110
[ 0.078226] start_kernel+0x393/0x500
[ 0.078226] x86_64_start_reservations+0x14/0x30
[ 0.078226] x86_64_start_kernel+0x77/0x80
[ 0.078226] common_startup_64+0x13e/0x141
[ 0.078226] </TASK>
[ 0.078226] Modules linked in:
[ 0.078226] CR2: ffffce1a005a0788
[ 0.078226] ---[ end trace 0000000000000000 ]---
[ 0.078226] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[ 0.078226] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[ 0.078226] RSP: 0000:ffffffff9ac03cc0 EFLAGS: 00010006
[ 0.078226] RAX: dead000000000122 RBX: fffff56600459c80 RCX: ffffce1a005a0788
[ 0.078226] RDX: ffffce1a005e3e08 RSI: fffff56600459c88 RDI: fffff56600459c88
[ 0.078226] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
[ 0.078226] R10: 000000000000001c R11: 0000000000000200 R12: ffff8ca75bacbc80
[ 0.078226] R13: 0000000000000000 R14: 0000000000011673 R15: fffff56600459cc0
[ 0.078226] FS: 0000000000000000(0000) GS:ffff8ca752c00000(0000) knlGS:0000000000000000
[ 0.078226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.078226] CR2: ffffce1a005a0788 CR3: 0000000006c34000 CR4: 00000000003506f0
[ 0.078226] Kernel panic - not syncing: Fatal exception
[ 0.078226] ---[ end Kernel panic - not syncing: Fatal exception ]---
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 2:10 ` Benjamin Herrenschmidt
@ 2026-02-10 6:17 ` Benjamin Herrenschmidt
2026-02-10 8:34 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-10 6:17 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
So ... that was a backport to 6.12.68 and my original patch is crashing
the same way ! (it was working last week interestingly enough,
something else got backported that gets in the way maybe ?).
I'm going to have to go back to digging :-(
I suspect the pages aren't reserved. I swear this was working :-)
Cheers,
Ben.
[ 0.033998] RETBleed: WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!
[ 0.043386] BUG: unable to handle page fault for address: ffffe49c80307388
[ 0.043386] #PF: supervisor read access in kernel mode
[ 0.043386] #PF: error_code(0x0000) - not-present page
[ 0.043386] PGD 1024067 P4D 1024067 PUD 0
[ 0.043386] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 0.043386] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.68-92.123.amzn2023.x86_64 #1
[ 0.043386] Hardware name: Amazon EC2 t3.nano/, BIOS 1.0 10/16/2017
[ 0.043386] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[ 0.043386] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[ 0.043386] RSP: 0000:ffffffffb3c03da0 EFLAGS: 00010006
[ 0.043386] RAX: dead000000000122 RBX: fffff44480600300 RCX: ffffe49c80307388
[ 0.043386] RDX: ffffe49c804e5288 RSI: fffff44480600308 RDI: fffff44480600308
[ 0.043386] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000002
[ 0.043386] R10: 0000000000000000 R11: 0000000000000200 R12: ffff8cf21b8cbc80
[ 0.043386] R13: 0000000000000000 R14: 000000000001800d R15: fffff44480600340
[ 0.043386] FS: 0000000000000000(0000) GS:ffff8cf21aa00000(0000) knlGS:0000000000000000
[ 0.043386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.043386] CR2: ffffe49c80307388 CR3: 0000000006c34001 CR4: 00000000007706f0
[ 0.043386] PKRU: 55555554
[ 0.043386] Call Trace:
[ 0.043386] <TASK>
[ 0.043386] __free_one_page+0x170/0x520
[ 0.043386] free_one_page+0x4c/0x80
[ 0.043386] memblock_free_late+0x72/0xd0
[ 0.043386] efi_free_boot_services+0x11f/0x2e0
[ 0.043386] __efi_enter_virtual_mode+0x181/0x210
[ 0.043386] efi_enter_virtual_mode+0xcd/0x110
[ 0.043386] start_kernel+0x393/0x500
[ 0.043386] x86_64_start_reservations+0x14/0x30
[ 0.043386] x86_64_start_kernel+0x77/0x80
[ 0.043386] common_startup_64+0x13e/0x141
[ 0.043386] </TASK>
[ 0.043386] Modules linked in:
[ 0.043386] CR2: ffffe49c80307388
[ 0.043386] ---[ end trace 0000000000000000 ]---
[ 0.043386] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[ 0.043386] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[ 0.043386] RSP: 0000:ffffffffb3c03da0 EFLAGS: 00010006
[ 0.043386] RAX: dead000000000122 RBX: fffff44480600300 RCX: ffffe49c80307388
[ 0.043386] RDX: ffffe49c804e5288 RSI: fffff44480600308 RDI: fffff44480600308
[ 0.043386] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000002
[ 0.043386] R10: 0000000000000000 R11: 0000000000000200 R12: ffff8cf21b8cbc80
[ 0.043386] R13: 0000000000000000 R14: 000000000001800d R15: fffff44480600340
[ 0.043386] FS: 0000000000000000(0000) GS:ffff8cf21aa00000(0000) knlGS:0000000000000000
[ 0.043386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.043386] CR2: ffffe49c80307388 CR3: 0000000006c34001 CR4: 00000000007706f0
[ 0.043386] PKRU: 55555554
[ 0.043386] Kernel panic - not syncing: Fatal exception
[ 0.043386] ---[ end Kernel panic - not syncing: Fatal exception ]---
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 6:17 ` Benjamin Herrenschmidt
@ 2026-02-10 8:34 ` Benjamin Herrenschmidt
2026-02-10 14:32 ` Mike Rapoport
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-10 8:34 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Tue, 2026-02-10 at 17:17 +1100, Benjamin Herrenschmidt wrote:
>
> So ... that was a backport to 6.12.68 and my original patch is
> crashing
> the same way ! (it was working last week interestingly enough,
> something else got backported that gets in the way maybe ?).
>
> I'm going to have to go back to digging :-(
>
> I suspect the pages aren't reserved. I swear this was working :-)
So I rebuilt with a bit of extra debug prints, CONFIG_DEBUG_VM on, and
memblock=debug ... it's not hitting the reserved check, but it's also
not crashing the same way (still 6.12, I'll play with upstream again
later):
.../...
[ 0.045633] Freeing SMP alternatives memory: 36K
[ 0.045633] pid_max: default: 32768 minimum: 301
[ 0.045633] memblock_free_late: [0x000000003d36b000-0x000000003d37bfff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b336000-0x000000003d36afff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b317000-0x000000003b335fff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b2f7000-0x000000003b316fff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b000000-0x000000003b1fffff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x00000000393de000-0x00000000393defff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x0000000038e73000-0x00000000390cdfff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] LSM: initializing lsm=lockdown,capability,landlock,yama,safesetid,selinux,bpf,ima
[ 0.045633] landlock: Up and running.
[ 0.045633] Yama: becoming mindful.
[ 0.045633] SELinux: Initializing.
[ 0.045633] LSM support for eBPF active
[ 0.045633] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.045633] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.045633] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
[ 0.045633] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
[ 0.045633] signal: max sigframe size: 3632
[ 0.045633] rcu: Hierarchical SRCU implementation.
[ 0.045633] rcu: Max phase no-delay instances is 1000.
[ 0.045633] Timer migration: 1 hierarchy levels; 8 children per group; 1 crossnode level
[ 0.045633] smp: Bringing up secondary CPUs ...
[ 0.045633] smpboot: x86: Booting SMP configuration:
[ 0.045633] .... node #0, CPUs: #1
[ 0.045633] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 0.045633] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
[ 0.045633] smp: Brought up 1 node, 2 CPUs
[ 0.045633] smpboot: Total of 2 processors activated (9999.97 BogoMIPS)
[ 0.045633] node 0 deferred pages initialised in 0ms
[ 0.045633] Memory: 900460K/999468K available (16384K kernel code, 9440K rwdata, 11364K rodata, 3740K init, 6440K bss, 94600K reserved, 0K cma-reserved)
[ 0.045633] devtmpfs: initialized
[ 0.045633] x86/mm: Memory block size: 128MB
[ 0.045633] ------------[ cut here ]------------
[ 0.045633] page type is 1, passed migratetype is 0 (nr=16)
[ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:721 rmqueue_bulk+0x82e/0x880
[ 0.045633] Modules linked in:
[ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.12.68-93.123.amzn2023.x86_64 #1
[ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.045633] RIP: 0010:rmqueue_bulk+0x82e/0x880
[ 0.045633] Code: c6 05 be be 13 02 01 e8 b0 b5 ff ff 44 89 e9 8b 14 24 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 32 4f d2 ff <0f> 0b 4c 8b 44 24 48 e9 79 fc ff ff 48 c7 c6 e0 77 51 8e 4c 89 e7
[ 0.045633] RSP: 0000:ffffd592c002f898 EFLAGS: 00010086
[ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX: ffffffff8f1f0c68
[ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000001
[ 0.045633] RBP: fffffb9c40e3a408 R08: 0000000000000000 R09: ffffd592c002f740
[ 0.045633] R10: ffffd592c002f738 R11: ffffffff8f370ca8 R12: fffffb9c40e3a400
[ 0.045633] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000038e90
[ 0.045633] FS: 0000000000000000(0000) GS:ffff8e3639f00000(0000) knlGS:0000000000000000
[ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4: 00000000007706f0
[ 0.045633] PKRU: 55555554
[ 0.045633] Call Trace:
[ 0.045633] <TASK>
[ 0.045633] __rmqueue_pcplist+0x233/0x2c0
[ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
[ 0.045633] ? _raw_spin_unlock+0xa/0x30
[ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] get_page_from_freelist+0x16e/0x5f0
[ 0.045633] __alloc_pages_noprof+0x18a/0x350
[ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
[ 0.045633] ? shuffle_freelist+0x126/0x1b0
[ 0.045633] allocate_slab+0x2b3/0x410
[ 0.045633] ___slab_alloc+0x396/0x830
[ 0.045633] ? switch_hrtimer_base+0x8e/0x190
[ 0.045633] ? timerqueue_add+0x9b/0xc0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] ? start_dl_timer+0xb0/0x140
[ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] dup_task_struct+0x2d/0x1b0
[ 0.045633] copy_process+0x195/0x17e0
[ 0.045633] kernel_clone+0x9a/0x3b0
[ 0.045633] ? psi_task_switch+0x105/0x290
[ 0.045633] kernel_thread+0x6b/0x90
[ 0.045633] ? __pfx_kthread+0x10/0x10
[ 0.045633] kthreadd+0x276/0x2d0
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork+0x30/0x50
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork_asm+0x1a/0x30
[ 0.045633] </TASK>
[ 0.045633] ---[ end trace 0000000000000000 ]---
[ 0.045633] ------------[ cut here ]------------
[ 0.045633] page type is 1, passed migratetype is 0 (nr=8)
[ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:686 expand+0x1af/0x1e0
[ 0.045633] Modules linked in:
[ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Tainted: G W 6.12.68-93.123.amzn2023.x86_64 #1
[ 0.045633] Tainted: [W]=WARN
[ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.045633] RIP: 0010:expand+0x1af/0x1e0
[ 0.045633] Code: c6 05 af 06 14 02 01 e8 9f fd ff ff 89 e9 8b 54 24 34 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 21 97 d2 ff <0f> 0b e9 e5 fe ff ff 48 c7 c6 e0 6d 51 8e 4c 89 ff e8 eb 23 fc ff
[ 0.045633] RSP: 0000:ffffd592c002f828 EFLAGS: 00010082
[ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX: ffffffff8f1f0c68
[ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000001
[ 0.045633] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffd592c002f6d0
[ 0.045633] R10: ffffd592c002f6c8 R11: ffffffff8f370ca8 R12: 0000000000000008
[ 0.045633] R13: 0000000000038e98 R14: 0000000000000003 R15: fffffb9c40e3a600
[ 0.045633] FS: 0000000000000000(0000) GS:ffff8e3639f00000(0000) knlGS:0000000000000000
[ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4: 00000000007706f0
[ 0.045633] PKRU: 55555554
[ 0.045633] Call Trace:
[ 0.045633] <TASK>
[ 0.045633] rmqueue_bulk+0x541/0x880
[ 0.045633] __rmqueue_pcplist+0x233/0x2c0
[ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
[ 0.045633] ? _raw_spin_unlock+0xa/0x30
[ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] get_page_from_freelist+0x16e/0x5f0
[ 0.045633] __alloc_pages_noprof+0x18a/0x350
[ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
[ 0.045633] ? shuffle_freelist+0x126/0x1b0
[ 0.045633] allocate_slab+0x2b3/0x410
[ 0.045633] ___slab_alloc+0x396/0x830
[ 0.045633] ? switch_hrtimer_base+0x8e/0x190
[ 0.045633] ? timerqueue_add+0x9b/0xc0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] ? start_dl_timer+0xb0/0x140
[ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] dup_task_struct+0x2d/0x1b0
[ 0.045633] copy_process+0x195/0x17e0
[ 0.045633] kernel_clone+0x9a/0x3b0
[ 0.045633] ? psi_task_switch+0x105/0x290
[ 0.045633] kernel_thread+0x6b/0x90
[ 0.045633] ? __pfx_kthread+0x10/0x10
[ 0.045633] kthreadd+0x276/0x2d0
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork+0x30/0x50
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork_asm+0x1a/0x30
[ 0.045633] </TASK>
[ 0.045633] ---[ end trace 0000000000000000 ]---
> >
>
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 8:34 ` Benjamin Herrenschmidt
@ 2026-02-10 14:32 ` Mike Rapoport
2026-02-10 23:23 ` Benjamin Herrenschmidt
2026-02-16 4:53 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-10 14:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
Hi Ben,
On Tue, Feb 10, 2026 at 07:34:15PM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2026-02-10 at 17:17 +1100, Benjamin Herrenschmidt wrote:
> >
> > So ... that was a backport to 6.12.68 and my original patch is
> > crashing
> > the same way ! (it was working last week interestingly enough,
> > something else got backported that gets in the way maybe ?).
> >
> > I'm going to have to go back to digging :-(
> >
> > I suspect the pages aren't reserved. I swear this was working :-)
>
> So I rebuilt with a bit of extra debug prints, CONFIG_DEBUG_VM on, and
> memblock=debug ... it's not hitting the reserved check, but it's also
> not crashing the same way (still 6.12, I'll play with upstream again
> later):
>
> .../...
Do you mind sending the entire log?
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 14:32 ` Mike Rapoport
@ 2026-02-10 23:23 ` Benjamin Herrenschmidt
2026-02-11 5:20 ` Mike Rapoport
2026-02-16 5:34 ` Benjamin Herrenschmidt
2026-02-16 4:53 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-10 23:23 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Tue, 2026-02-10 at 16:32 +0200, Mike Rapoport wrote:
> Hi Ben,
>
> On Tue, Feb 10, 2026 at 07:34:15PM +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2026-02-10 at 17:17 +1100, Benjamin Herrenschmidt wrote:
> > >
> > > So ... that was a backport to 6.12.68 and my original patch is
> > > crashing
> > > the same way ! (it was working last week interestingly enough,
> > > something else got backported that gets in the way maybe ?).
> > >
> > > I'm going to have to go back to digging :-(
> > >
> > > I suspect the pages aren't reserved. I swear this was working :-)
> >
> > So I rebuilt with a bit of extra debug prints, CONFIG_DEBUG_VM on, and
> > memblock=debug ... it's not hitting the reserved check, but it's also
> > not crashing the same way (still 6.12, I'll play with upstream again
> > later):
> >
> > .../...
>
> Do you mind sending the entire log?
I misplaced it, but give me a few days to dig and see if I can find what's going on.
Cheers
Ben.
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 23:23 ` Benjamin Herrenschmidt
@ 2026-02-11 5:20 ` Mike Rapoport
2026-02-16 5:34 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-11 5:20 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Wed, Feb 11, 2026 at 10:23:06AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2026-02-10 at 16:32 +0200, Mike Rapoport wrote:
> > Hi Ben,
> >
> > On Tue, Feb 10, 2026 at 07:34:15PM +1100, Benjamin Herrenschmidt wrote:
> > > On Tue, 2026-02-10 at 17:17 +1100, Benjamin Herrenschmidt wrote:
> > > >
> > > > So ... that was a backport to 6.12.68 and my original patch is
> > > > crashing
> > > > the same way ! (it was working last week interestingly enough,
> > > > something else got backported that gets in the way maybe ?).
> > > >
> > > > I'm going to have to go back to digging :-(
> > > >
> > > > I suspect the pages aren't reserved. I swear this was working :-)
> > >
> > > So I rebuilt with a bit of extra debug prints, CONFIG_DEBUG_VM on, and
> > > memblock=debug ... it's not hitting the reserved check, but it's also
> > > not crashing the same way (still 6.12, I'll play with upstream again
> > > later):
> > >
> > > .../...
> >
> > Do you mind sending the entire log?
>
> I misplaced it, but give me a few days to dig and see if I can find what's going on.
A log from any nano instance even without debug would be helpful. I'd like
to see what e820 table it has.
There has always been some gap between e820 and memblock because historically
e820 does not consider some of the reserved memory as memory.
If large enough parts of the boot services memory are E820_TYPE_RESERVED,
there might be no memory map for them at all.
> Cheers
> Ben.
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 14:32 ` Mike Rapoport
2026-02-10 23:23 ` Benjamin Herrenschmidt
@ 2026-02-16 4:53 ` Benjamin Herrenschmidt
2026-02-16 15:28 ` Mike Rapoport
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-16 4:53 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
(stripping history)
So I went into a big refresher (or learning exercise since there's
quite a bit here that I never really looked at before either).
So here is a breakdown, in chronological order, of the setup and
initialization of the memory map, and how the reserve business
interacts with it as I understand it from reading the code.
Please correct me if I missed or misunderstood something :-) Also maybe
this is worth turning into a piece of doc ?
Then some conclusions (I think I know why the patches crashed).
1) Setting up the memblock maps
-------------------------------
This is the first thing that happens, usually deep in arch code (though
DT based archs use common code for it).
* memblock.memory is initialized (from e820 in our case). In the e820
case, we only populate what is explicitly marked as usable. So we have
a pile of holes in there, especially around low memory where ACPI
sticks a bunch of things.
So for example, this snippet:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000007fffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000800000-0x0000000000807fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000000808000-0x000000000080afff] usable
Will result in a 'hole' from 0x0000000000800000 to 0x0000000000807fff.
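To make the hole computation concrete, here is a small userspace sketch (not
kernel code) that walks a sorted table of ranges and reports the gaps between
consecutive usable entries. The struct, the enum and the function name are
made up for illustration; they are not the kernel's e820 or memblock types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified range entry; not the kernel's struct e820_entry. */
enum sketch_range_type { SKETCH_USABLE, SKETCH_OTHER };

struct sketch_range {
	uint64_t start;              /* first byte */
	uint64_t end;                /* last byte, inclusive, as in the boot log */
	enum sketch_range_type type;
};

/*
 * Walk a table sorted by start address and report every gap between
 * consecutive usable ranges. Non-usable entries (ACPI NVS, reserved, ...)
 * are simply not populated, so they end up inside a gap.
 * Returns the number of holes written to out[] (at most max).
 */
size_t sketch_find_holes(const struct sketch_range *tbl, size_t n,
			 struct sketch_range *out, size_t max)
{
	size_t count = 0;
	int have_prev = 0;
	uint64_t prev_end = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].type != SKETCH_USABLE)
			continue;
		assert(tbl[i].start <= tbl[i].end);
		if (have_prev && tbl[i].start > prev_end + 1 && count < max) {
			out[count].start = prev_end + 1;
			out[count].end = tbl[i].start - 1;
			out[count].type = SKETCH_OTHER;
			count++;
		}
		prev_end = tbl[i].end;
		have_prev = 1;
	}
	return count;
}
```

Fed the four entries from the snippet above, this reports the ACPI NVS hole
at 0x800000-0x807fff, and also the gap at 0xa0000-0xfffff between the first
two usable ranges.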
* This is also where we collect the EFI boot services memory map and
plonk it in memblock.reserved on x86 via efi_reserve_boot_services().
This will be useful later.
From this point, memblock is the memory allocator.
2) Allocation of memory backing for struct pages (memmap).
----------------------------------------------------------
Before we poke at struct pages, they need to exist.
On sparsemem systems, this happens at
setup_arch() -> ... -> paging_init() (in arch code)
which calls sparse_init() to do the job.
From my understanding the memmap is effectively created (though not
initialized) in sections by memblocks_present() in sparse.c which
iterates the memblock.memory list (coming from e820 above) and calls
memory_present() for each usable chunk.
On sparsemem, the section_mem_map in the memory sections is set to
track which sections have mapped backing pages, for use later by
pfn_valid().
Note that hole we had in my example is too small to result in a missing
sparsemem allocation, but any big enough hole (as big as a section)
could result in struct page(s) not existing at all.
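As a rough illustration of "big enough", the following userspace sketch checks
whether a hole contains at least one whole, aligned section. It assumes 128MB
sections (27 bits), which I believe is the x86_64 value; the macro and the
function name are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128MB sections. This is a sketch constant, not the kernel macro. */
#define SKETCH_SECTION_SIZE_BITS 27
#define SKETCH_SECTION_SIZE (UINT64_C(1) << SKETCH_SECTION_SIZE_BITS)

/*
 * True if the hole [start, end] (inclusive physical addresses) contains
 * at least one whole, aligned section. Such a section has no usable
 * memory in it at all, so no memmap would be allocated for it.
 */
bool sketch_hole_swallows_section(uint64_t start, uint64_t end)
{
	uint64_t first;

	assert(start <= end);

	/* First section boundary at or after start. */
	first = (start + SKETCH_SECTION_SIZE - 1) & ~(SKETCH_SECTION_SIZE - 1);

	/* Rounding up wrapped around, or the boundary is past the hole. */
	if (first < start || first > end)
		return false;

	/* The section starting at 'first' must end at or before 'end'. */
	return end - first >= SKETCH_SECTION_SIZE - 1;
}
```

The 32KB ACPI NVS hole from the example is far too small to pass this test. A
hole covering 0x08000000-0x0fffffff would pass, since that is exactly one
aligned 128MB section.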
For non-sparsemem systems, the mem_map allocation happens a little bit
later, in
paging_init() -> zone_sizes_init() -> free_area_init() ->
free_area_init_node(), but for all intents and purposes, it is the same
time.
3) Early initialization of struct pages
---------------------------------------
Once allocated, struct pages need to be initialized. We have a multi-
stage process due to the option of deferring that initialization to a
multithreaded process.
The first stage of initialization of struct pages happens in
paging_init() -> free_area_init(). So *right after* the allocation
mentioned above.
It sets up the zones and a bunch of other things (including
free_area_init_node() mentioned above), and eventually calls
memmap_init() which is the interesting bit here.
Ignoring the ZONE_DEVICE case for now, memmap_init() will iterate the
memblock.memory ranges (so the same ranges for which we ensured we have
allocated the corresponding sections of mem_map earlier) and the zones,
calling memmap_init_zone_range() for each combination:
First memmap_init_zone_range() will for each valid intersection of
memory range and zone, initialize struct pages until defer_init() says
no more (ie, deferring by setting pgdat->first_deferred_pfn to
something that isn't ULONG_MAX).
We start with only one section. This is where the "deferral point" is
established. (There is a mechanism to "grow" that early initialization
on demand if early allocs need it but I'll ignore that for now as
well).
It also tracks the holes between the regions and calls
init_unavailable_range() for those (additionally memmap_init() calls
init_unavailable_range() one last time for any hole after the last
region).
Note that init_unavailable_range() is thus called for *every* hole
between the memory regions, regardless of whether we have deferred
something or not and regardless of whether we have allocated sections
of memory map or not at this point. The pfn_valid() test inside
init_unavailable_range() will take care of skipping the unallocated
sections of memory map. So far so good...
So at this point, we have:
- mem_map allocated
- "usable" memory ranges have struct pages initialized up to the
"deferral" point for not-already-reserved regions. (additionally marked
reserved already for ZONE_DEVICE, otherwise not).
- holes between memory ranges have struct page initialized and
reserved provided they have corresponding backing struct pages
allocated (present sections).
- What is uninitialized at this point are any struct pages above the
deferral point. Anything else is initialized. Not all reservations are
represented yet.
IE. The memmap has backing memory, initialized for all holes and up to
the deferral point for the rest. Only reserved for holes (and
ZONE_DEVICE). We still have work to do :-)
Now, we go back from setup_arch() to the main boot process, memblock is
still "live" and our primary memory allocator.
4) Transition to the page allocator
-----------------------------------
A bit later, still fairly early during boot, it's time to enable the
page allocator and slab. It all starts with mm_core_init() ->
mem_init() in arch code.
Now mem_init() has been abused over time to do more than just this, but
the meat here is that it eventually calls memblock_free_all(). This is
when we start actually "freeing" pages and reserving memblock.reserved
pages.
* First we call free_unused_memmap(). So from what I can tell, this
frees bits of the mem_map that aren't covered in memblock.memory. Now
I'm not too sure what the purpose of this is at this point, as we
already only allocated the mem_map for what's in memblock.memory early
on. Could it be that we have code paths that take out sections of
memblock.memory between then and now that I missed ?
* Then the meat of the matter: free_low_memory_core_early() which does
the interesting stuff, notably memmap_init_reserved_pages() and
__free_memory_core(). The former reserves the stuff that should be
reserved, the latter sends non-deferred and non-reserved pages to buddy
for use. Let's focus on the former:
* memmap_init_reserved_pages() mostly does two passes. The first looks
for memblock.memory regions marked nomap and reserves them, which I'll
ignore. The second pass uses the for_each_reserved_mem_region() iterator
to mark memblock.reserved regions reserved using
reserve_bootmem_region().
This will just walk memblock.reserved blindly; it doesn't specifically
limit itself to things covered by memblock.memory (ie e820). The
saving grace here is that it checks for pfn_valid(), and so will avoid
holes in the mem_map.
There is no other check, so if a page happens to be marked as reserved
by the BIOS and also part of a "hole", the struct page will be
initialized twice.
In both cases we land in init_reserved_page() followed by
__SetPageReserved(). In both cases pfn_valid() should save the day if
the corresponding section of mem_map hasn't been allocated (which could
happen since we ignore memblock.memory).
Let's have a closer look. init_reserved_page() is called for every
reserved page in memblock.reserved for which a backing struct page
exists basically. However the first thing it does is:
if (early_page_initialised(pfn, nid))
return;
That means that anything below the deferral point is skipped. Fair
enough, it has already been initialized as we established earlier
(note: the marking of PG_reserved happens in the caller, so it happens
regardless of that test, as expected).
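The reserve path described above can be modelled in userspace roughly as
follows. This is only a toy: the structs and the sketch_* names are
hypothetical stand-ins, the pfn_valid() check is left out, and a counter is
added so that double initialization becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what this discussion needs. */
struct sketch_page {
	bool initialized;
	bool reserved;
	unsigned int init_count;   /* how many times the page was initialized */
};

struct sketch_node {
	struct sketch_page *pages; /* toy memmap, indexed by pfn */
	size_t nr_pages;
	size_t first_deferred_pfn; /* the "deferral point" */
};

/* Pages below the deferral point were already initialized in stage 3. */
bool sketch_early_page_initialised(const struct sketch_node *n, size_t pfn)
{
	return pfn < n->first_deferred_pfn;
}

void sketch_init_page(struct sketch_page *p)
{
	p->initialized = true;
	p->init_count++;
}

/* Models init_reserved_page() plus the PG_reserved marking in its caller. */
void sketch_reserve_pfn(struct sketch_node *n, size_t pfn)
{
	assert(pfn < n->nr_pages);

	/* Skip the init for anything below the deferral point. */
	if (!sketch_early_page_initialised(n, pfn))
		sketch_init_page(&n->pages[pfn]);

	/* The reserved marking happens regardless of the test above. */
	n->pages[pfn].reserved = true;
}
```

In this model, a page above the deferral point that was already initialized
as part of a hole ends up with init_count == 2 after being reserved.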
That does mean that there is a small window here for double-
initialization: reserved areas covering memory holes above the deferral
point will be initialized twice, once earlier as all holes are, and
once here. I don't think that's an issue however, is it ?
At this point, we thus have initialized and marked all
memblock.reserved pages properly (as long as they don't land in a
hole), whether they sit below or above the deferral point.
Next we actually free some memory into the page allocator with:
for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
NULL)
count += __free_memory_core(start, end);
Nothing much to add here, it skips reserved regions and "frees" the
remaining pages in the usable mem ranges.
One little nit: This iterates everything. The decision to skip pages
above the deferral point (since their struct page isn't initialized)
comes from the test early_page_initialised() inside
memblock_free_pages().
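A minimal userspace sketch of that skip, assuming one page per call (the real
code frees by order) and using invented sketch_* names. It also shows how a
late free that goes through the same test silently loses pages whose struct
page has not been handled by the deferred initializer yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping; none of this is a kernel structure. */
struct sketch_free_state {
	size_t first_deferred_pfn; /* the "deferral point" */
	size_t nr_freed;           /* pages handed to the imaginary buddy */
	size_t nr_dropped;         /* pages skipped by the deferral test */
};

/* Models the early_page_initialised() test in the free path. */
bool sketch_free_page(struct sketch_free_state *s, size_t pfn)
{
	if (pfn >= s->first_deferred_pfn) {
		/* Not initialized yet: left for the deferred initializer. */
		s->nr_dropped++;
		return false;
	}
	s->nr_freed++;
	return true;
}

/* Frees [start_pfn, end_pfn); returns how many pages were actually freed. */
size_t sketch_free_range(struct sketch_free_state *s,
			 size_t start_pfn, size_t end_pfn)
{
	size_t freed = 0;

	assert(start_pfn <= end_pfn);
	for (size_t pfn = start_pfn; pfn < end_pfn; pfn++)
		if (sketch_free_page(s, pfn))
			freed++;
	return freed;
}
```

With a deferral point at pfn 100, freeing pfns 90 to 109 releases only the
first ten pages; the other ten are counted as dropped.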
At this point, the page allocator is "live" and memblock is "dead"
(though the memblock data structures are still around, it is just not
supposed to be updated anymore).
5) Late freeing of memblock memory (EFI Boot Services and others)
-----------------------------------------------------------------
This is the result of something calling memblock_free_late() after the
above point.
Now, for the sake of this conversation, I assume this happens *before*
the deferred pages init. There could be cases where it happens after; I
haven't audited all callers of memblock_free_late(). I'm mostly
interested in what happens in efi_free_boot_services(), and that happens
before.
We also assume we cannot trust the EFI memory map to contain only
things referencing usable memory. So we get called with stuff that may
or may not be backed by a struct page, and if it is, the struct page
may or may not be initialized.
I think we can assume that:
* If pfn_valid() is true, the struct page exists; otherwise it doesn't.
* If it exists, then the struct page was initialized if (and only if)
it was marked reserved earlier. It doesn't matter if it sits in a hole
anymore at this point. If it was not marked reserved, the struct page
has also not been initialized if above the deferral point. We assume
that all those pages HAVE been marked reserved by
efi_reserve_boot_services() earlier, meaning they *are* initialized as
long as pfn_valid() is happy.
* One thing I have NOT yet figured out ... do we have a problem if the
page is in a hole that lands outside of a zone boundary ? I haven't
really got my head deep down into the details of zone initializations
(especially as we adjust the boundaries here or there), so this could
be a problem.
99) Conclusion :-)
------------------
Nothing firm yet here but a few hints at what could possibly go wrong
and one obvious issue with the previous patch(es).
First the obvious ... the proposed patch that just makes
memblock_free_late() call free_reserved_page() is missing a call to
pfn_valid(). Without this, it can (and will) hit holes in the mem_map,
and that's probably one of the crashes I reported.
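As a sketch of what the missing check would look like, here is a user-space toy model of a late free loop with the guard in place. The struct toy_mem_map type and the toy_* functions are hypothetical; setting freed[pfn] merely stands in for free_reserved_page(), and this is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PFNS 16

/* Toy mem_map: 'present' says whether a struct page backs the pfn at all. */
struct toy_mem_map {
	bool present[TOY_NR_PFNS];	/* models pfn_valid() */
	bool freed[TOY_NR_PFNS];	/* page was handed to the allocator */
};

static bool toy_pfn_valid(const struct toy_mem_map *mm, unsigned long pfn)
{
	return pfn < TOY_NR_PFNS && mm->present[pfn];
}

/*
 * Models a late free that releases each page directly, with the
 * pfn_valid() guard discussed above. Returns the number of pages freed.
 */
static unsigned long toy_free_late(struct toy_mem_map *mm,
				   unsigned long start_pfn,
				   unsigned long end_pfn)
{
	unsigned long pfn, count = 0;

	assert(start_pfn <= end_pfn);
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!toy_pfn_valid(mm, pfn))
			continue;	/* hole in mem_map: nothing to touch */
		mm->freed[pfn] = true;	/* stands in for free_reserved_page() */
		count++;
	}
	return count;
}
```

Without the toy_pfn_valid() test, the loop would write to entries that have no backing struct page, which is the toy equivalent of walking into a hole in the mem_map.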
Now, it would be nice to then go allocate those missing bits of
mem_map, because I really don't want to give up on that memory. Small
instances are a thing and with the current price of DRAM, a fairly
relevant one :-) But I'll look at that later.
My original patch had the exact same issue btw.
The other potential issue, for which I welcome your input as I'm
running short on time for the day is ... the impact to zones. I see a
possibility for those pages to be outside of any zone's
zone_start_pfn/spanned_pages range ... or not ? As I said, I haven't yet got
my head around the zones init and spanning adjustments that
happen, so I don't know if we really have potentially "holes" here or
not.
This leads to the question... could we work around a lot of those
issues easily by making the early efi_reserve_boot_services() *also*
add the regions to memblock.memory in addition to memblock.reserved ?
ie, those regions are marked as boot services code/data, so they must
be memory to begin with, and that's all early enough that we can do it.
We should still add the missing pfn_valid() of course, if only for
the sake of any other caller of memblock_free_late() ... or we could
change memblock_free_late() to only consider ranges that are both
reserved *and* in memblock.memory. You mentioned that might be slow
though.
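For the second option, a minimal sketch of clipping a range being freed against the memory regions could look like the following. This is a user-space illustration only: struct toy_range and toy_clip_to_memory() are hypothetical names, ranges are half-open, and the linear scan is the naive version that would probably need something smarter if speed matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Clip [req.start, req.end) against each region in mem[] and store the
 * non-empty overlaps in out[]. Returns how many overlaps were written.
 */
static size_t toy_clip_to_memory(struct toy_range req,
				 const struct toy_range *mem, size_t nr_mem,
				 struct toy_range *out, size_t max_out)
{
	size_t i, n = 0;

	assert(req.start <= req.end);
	for (i = 0; i < nr_mem && n < max_out; i++) {
		uint64_t s = req.start > mem[i].start ? req.start : mem[i].start;
		uint64_t e = req.end < mem[i].end ? req.end : mem[i].end;

		if (s < e) {
			out[n].start = s;
			out[n].end = e;
			n++;
		}
	}
	return n;
}
```

A request that straddles a gap between two memory regions comes back as two pieces, one on each side of the gap, and the part that falls in the gap is never handed to the allocator.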
Opinions ?
Cheers,
Ben.
On Tue, 2026-02-10 at 16:32 +0200, Mike Rapoport wrote:
> Hi Ben,
>
> On Tue, Feb 10, 2026 at 07:34:15PM +1100, Benjamin Herrenschmidt
> wrote:
> > On Tue, 2026-02-10 at 17:17 +1100, Benjamin Herrenschmidt wrote:
> > >
> > > So ... that was a backport to 6.12.68 and my original patch is
> > > crashing
> > > the same way ! (it was working last week interestingly enough,
> > > something else got backported that gets in the way maybe ?).
> > >
> > > I'm going to have to go back to digging :-(
> > >
> > > I suspect the pages aren't reserved. I swear this was working :-)
> >
> > So I rebuilt with a bit of extra debug prints, CONFIG_DEBUG_VM on,
> > and
> > memblock=debug ... it's not hitting the reserved check, but it's
> > also
> > not crashing the same way (still 6.12, I'll play with upstream
> > again
> > later):
> >
> > .../...
>
> Do you mind sending the entire log?
>
> >
> > [ 0.045633] Freeing SMP alternatives memory: 36K
> > [ 0.045633] pid_max: default: 32768 minimum: 301
> > [ 0.045633] memblock_free_late: [0x000000003d36b000-
> > 0x000000003d37bfff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] memblock_free_late: [0x000000003b336000-
> > 0x000000003d36afff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] memblock_free_late: [0x000000003b317000-
> > 0x000000003b335fff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] memblock_free_late: [0x000000003b2f7000-
> > 0x000000003b316fff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] memblock_free_late: [0x000000003b000000-
> > 0x000000003b1fffff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] memblock_free_late: [0x00000000393de000-
> > 0x00000000393defff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] memblock_free_late: [0x0000000038e73000-
> > 0x00000000390cdfff] efi_free_boot_services+0x11f/0x2e0
> > [ 0.045633] LSM: initializing
> > lsm=lockdown,capability,landlock,yama,safesetid,selinux,bpf,ima
> > [ 0.045633] landlock: Up and running.
> > [ 0.045633] Yama: becoming mindful.
> > [ 0.045633] SELinux: Initializing.
> > [ 0.045633] LSM support for eBPF active
> > [ 0.045633] Mount-cache hash table entries: 2048 (order: 2,
> > 16384 bytes, linear)
> > [ 0.045633] Mountpoint-cache hash table entries: 2048 (order: 2,
> > 16384 bytes, linear)
> > [ 0.045633] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8259CL CPU
> > @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
> > [ 0.045633] Performance Events: unsupported p6 CPU model 85 no
> > PMU driver, software events only.
> > [ 0.045633] signal: max sigframe size: 3632
> > [ 0.045633] rcu: Hierarchical SRCU implementation.
> > [ 0.045633] rcu: Max phase no-delay instances is 1000.
> > [ 0.045633] Timer migration: 1 hierarchy levels; 8 children per
> > group; 1 crossnode level
> > [ 0.045633] smp: Bringing up secondary CPUs ...
> > [ 0.045633] smpboot: x86: Booting SMP configuration:
> > [ 0.045633] .... node #0, CPUs: #1
> > [ 0.045633] MDS CPU bug present and SMT on, data leak possible.
> > See
> > https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
> > for more details.
> > [ 0.045633] MMIO Stale Data CPU bug present and SMT on, data
> > leak possible. See
> > https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html
> > for more details.
> > [ 0.045633] smp: Brought up 1 node, 2 CPUs
> > [ 0.045633] smpboot: Total of 2 processors activated (9999.97
> > BogoMIPS)
> > [ 0.045633] node 0 deferred pages initialised in 0ms
> > [ 0.045633] Memory: 900460K/999468K available (16384K kernel
> > code, 9440K rwdata, 11364K rodata, 3740K init, 6440K bss, 94600K
> > reserved, 0K cma-reserved)
> > [ 0.045633] devtmpfs: initialized
> > [ 0.045633] x86/mm: Memory block size: 128MB
> > [ 0.045633] ------------[ cut here ]------------
> > [ 0.045633] page type is 1, passed migratetype is 0 (nr=16)
> > [ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:721
> > rmqueue_bulk+0x82e/0x880
> > [ 0.045633] Modules linked in:
> > [ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Not tainted
> > 6.12.68-93.123.amzn2023.x86_64 #1
> > [ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0
> > 10/16/2017
> > [ 0.045633] RIP: 0010:rmqueue_bulk+0x82e/0x880
> > [ 0.045633] Code: c6 05 be be 13 02 01 e8 b0 b5 ff ff 44 89 e9
> > 8b 14 24 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1
> > e8 32 4f d2 ff <0f> 0b 4c 8b 44 24 48 e9 79 fc ff ff 48 c7 c6 e0 77
> > 51 8e 4c 89 e7
> > [ 0.045633] RSP: 0000:ffffd592c002f898 EFLAGS: 00010086
> > [ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX:
> > ffffffff8f1f0c68
> > [ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI:
> > 0000000000000001
> > [ 0.045633] RBP: fffffb9c40e3a408 R08: 0000000000000000 R09:
> > ffffd592c002f740
> > [ 0.045633] R10: ffffd592c002f738 R11: ffffffff8f370ca8 R12:
> > fffffb9c40e3a400
> > [ 0.045633] R13: 0000000000000004 R14: 0000000000000003 R15:
> > 0000000000038e90
> > [ 0.045633] FS: 0000000000000000(0000)
> > GS:ffff8e3639f00000(0000) knlGS:0000000000000000
> > [ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4:
> > 00000000007706f0
> > [ 0.045633] PKRU: 55555554
> > [ 0.045633] Call Trace:
> > [ 0.045633] <TASK>
> > [ 0.045633] __rmqueue_pcplist+0x233/0x2c0
> > [ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
> > [ 0.045633] ? _raw_spin_unlock+0xa/0x30
> > [ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
> > [ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
> > [ 0.045633] get_page_from_freelist+0x16e/0x5f0
> > [ 0.045633] __alloc_pages_noprof+0x18a/0x350
> > [ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
> > [ 0.045633] ? shuffle_freelist+0x126/0x1b0
> > [ 0.045633] allocate_slab+0x2b3/0x410
> > [ 0.045633] ___slab_alloc+0x396/0x830
> > [ 0.045633] ? switch_hrtimer_base+0x8e/0x190
> > [ 0.045633] ? timerqueue_add+0x9b/0xc0
> > [ 0.045633] ? dup_task_struct+0x2d/0x1b0
> > [ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
> > [ 0.045633] ? start_dl_timer+0xb0/0x140
> > [ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
> > [ 0.045633] ? dup_task_struct+0x2d/0x1b0
> > [ 0.045633] dup_task_struct+0x2d/0x1b0
> > [ 0.045633] copy_process+0x195/0x17e0
> > [ 0.045633] kernel_clone+0x9a/0x3b0
> > [ 0.045633] ? psi_task_switch+0x105/0x290
> > [ 0.045633] kernel_thread+0x6b/0x90
> > [ 0.045633] ? __pfx_kthread+0x10/0x10
> > [ 0.045633] kthreadd+0x276/0x2d0
> > [ 0.045633] ? __pfx_kthreadd+0x10/0x10
> > [ 0.045633] ret_from_fork+0x30/0x50
> > [ 0.045633] ? __pfx_kthreadd+0x10/0x10
> > [ 0.045633] ret_from_fork_asm+0x1a/0x30
> > [ 0.045633] </TASK>
> > [ 0.045633] ---[ end trace 0000000000000000 ]---
> > [ 0.045633] ------------[ cut here ]------------
> > [ 0.045633] page type is 1, passed migratetype is 0 (nr=8)
> > [ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:686
> > expand+0x1af/0x1e0
> > [ 0.045633] Modules linked in:
> > [ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Tainted:
> > G W 6.12.68-93.123.amzn2023.x86_64 #1
> > [ 0.045633] Tainted: [W]=WARN
> > [ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0
> > 10/16/2017
> > [ 0.045633] RIP: 0010:expand+0x1af/0x1e0
> > [ 0.045633] Code: c6 05 af 06 14 02 01 e8 9f fd ff ff 89 e9 8b
> > 54 24 34 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1
> > e8 21 97 d2 ff <0f> 0b e9 e5 fe ff ff 48 c7 c6 e0 6d 51 8e 4c 89 ff
> > e8 eb 23 fc ff
> > [ 0.045633] RSP: 0000:ffffd592c002f828 EFLAGS: 00010082
> > [ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX:
> > ffffffff8f1f0c68
> > [ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI:
> > 0000000000000001
> > [ 0.045633] RBP: 0000000000000003 R08: 0000000000000000 R09:
> > ffffd592c002f6d0
> > [ 0.045633] R10: ffffd592c002f6c8 R11: ffffffff8f370ca8 R12:
> > 0000000000000008
> > [ 0.045633] R13: 0000000000038e98 R14: 0000000000000003 R15:
> > fffffb9c40e3a600
> > [ 0.045633] FS: 0000000000000000(0000)
> > GS:ffff8e3639f00000(0000) knlGS:0000000000000000
> > [ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4:
> > 00000000007706f0
> > [ 0.045633] PKRU: 55555554
> > [ 0.045633] Call Trace:
> > [ 0.045633] <TASK>
> > [ 0.045633] rmqueue_bulk+0x541/0x880
> > [ 0.045633] __rmqueue_pcplist+0x233/0x2c0
> > [ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
> > [ 0.045633] ? _raw_spin_unlock+0xa/0x30
> > [ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
> > [ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
> > [ 0.045633] get_page_from_freelist+0x16e/0x5f0
> > [ 0.045633] __alloc_pages_noprof+0x18a/0x350
> > [ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
> > [ 0.045633] ? shuffle_freelist+0x126/0x1b0
> > [ 0.045633] allocate_slab+0x2b3/0x410
> > [ 0.045633] ___slab_alloc+0x396/0x830
> > [ 0.045633] ? switch_hrtimer_base+0x8e/0x190
> > [ 0.045633] ? timerqueue_add+0x9b/0xc0
> > [ 0.045633] ? dup_task_struct+0x2d/0x1b0
> > [ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
> > [ 0.045633] ? start_dl_timer+0xb0/0x140
> > [ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
> > [ 0.045633] ? dup_task_struct+0x2d/0x1b0
> > [ 0.045633] dup_task_struct+0x2d/0x1b0
> > [ 0.045633] copy_process+0x195/0x17e0
> > [ 0.045633] kernel_clone+0x9a/0x3b0
> > [ 0.045633] ? psi_task_switch+0x105/0x290
> > [ 0.045633] kernel_thread+0x6b/0x90
> > [ 0.045633] ? __pfx_kthread+0x10/0x10
> > [ 0.045633] kthreadd+0x276/0x2d0
> > [ 0.045633] ? __pfx_kthreadd+0x10/0x10
> > [ 0.045633] ret_from_fork+0x30/0x50
> > [ 0.045633] ? __pfx_kthreadd+0x10/0x10
> > [ 0.045633] ret_from_fork_asm+0x1a/0x30
> > [ 0.045633] </TASK>
> > [ 0.045633] ---[ end trace 0000000000000000 ]---
> >
> > > >
> > >
> >
>
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-10 23:23 ` Benjamin Herrenschmidt
2026-02-11 5:20 ` Mike Rapoport
@ 2026-02-16 5:34 ` Benjamin Herrenschmidt
2026-02-16 6:51 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-16 5:34 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Wed, 2026-02-11 at 10:23 +1100, Benjamin Herrenschmidt wrote:
> >
> > Do you mind sending the entire log?
>
> I misplaced it, but give me a few days to dig see if I can find what's going on.
Here's a log on a t3 instance with your patch plus some printk's of mine
showing the EFI memory reserves and with memblock debug.
This crash is NOT explained by the missing pfn_valid() I mentioned in my
earlier email. I don't know if there could be a zone issue here, it does
look like every page we are trying to free is contained in an e820 "usable"
range here unless I really misread something.
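For anyone who wants to double check that reading, here is a small user-space sketch that redoes the containment test mechanically. The addresses are copied from the BIOS-e820 "usable" lines and from the efi_reserve_boot_services() reservations in the log below; the struct toy_span type and the toy_* helpers are made up for this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_span {
	uint64_t first;	/* first byte, inclusive */
	uint64_t last;	/* last byte, inclusive, as printed in the log */
};

/* e820 "usable" entries from the BIOS-provided physical RAM map. */
static const struct toy_span toy_usable[] = {
	{ 0x0000000000000000ULL, 0x000000000009ffffULL },
	{ 0x0000000000100000ULL, 0x00000000390cdfffULL },
	{ 0x00000000393de000ULL, 0x000000003d37bfffULL },
};

/* Ranges reserved by efi_reserve_boot_services() in the same log. */
static const struct toy_span toy_efi[] = {
	{ 0x0000000038e73000ULL, 0x00000000390cdfffULL },
	{ 0x00000000393de000ULL, 0x00000000393defffULL },
	{ 0x000000003b000000ULL, 0x000000003b1fffffULL },
	{ 0x000000003b2f7000ULL, 0x000000003b316fffULL },
	{ 0x000000003b317000ULL, 0x000000003b335fffULL },
	{ 0x000000003b336000ULL, 0x000000003d36afffULL },
	{ 0x000000003d36b000ULL, 0x000000003d37bfffULL },
};

/* True if the span sits entirely inside a single usable entry. */
static bool toy_span_is_usable(struct toy_span s)
{
	size_t i;

	assert(s.first <= s.last);
	for (i = 0; i < sizeof(toy_usable) / sizeof(toy_usable[0]); i++) {
		if (s.first >= toy_usable[i].first &&
		    s.last <= toy_usable[i].last)
			return true;
	}
	return false;
}

/* Counts EFI spans that are not fully covered by a usable entry. */
static size_t toy_count_unusable_efi_spans(void)
{
	size_t i, bad = 0;

	for (i = 0; i < sizeof(toy_efi) / sizeof(toy_efi[0]); i++) {
		if (!toy_span_is_usable(toy_efi[i]))
			bad++;
	}
	return bad;
}
```

This only compares against the raw e820 map. It says nothing about zone boundaries or about whether the corresponding mem_map is populated.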
Just in case I accidentally screwed up something last week, I will run
a bunch of tests with both my original and your patch + the missing
pfn_valid(), see if that trips anything.
I will separately look at adding the EFI regions to memblock.memory,
as I think the very first crash I reported smelled a lot like an unmapped
piece of mem_map.
Cheers,
Ben.
[ec2-user@ip-172-31-29-240 ~]$ sudo dmesg
[ 0.000000] Linux version 6.12.68-93.123.amzn2023.x86_64 (mockbuild@ip-10-0-53-120) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.5) #1 SMP PREEMPT_DYNAMIC Tue Feb 10 06:40:43 UTC 2026
[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/boot/vmlinuz-6.12.68-93.123.amzn2023.x86_64 root=UUID=7813e2d4-9cdc-416a-a749-25de8a9f36d0 ro console=tty0 console=ttyS0,115200n8 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 selinux=1 security=selinux quiet memblock=debug
[ 0.000000] KASLR enabled
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000390cdfff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000390ce000-0x000000003934dfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000003934e000-0x000000003935dfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000003935e000-0x00000000393ddfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000393de000-0x000000003d37bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000003d37c000-0x000000003d3fffff] reserved
[ 0.000000] memblock_reserve: [0x0000000039350040-0x000000003935060f] efi_memblock_x86_reserve_range+0x159/0x1e0
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] APIC: Static calls initialized
[ 0.000000] e820: update [mem 0x372be018-0x372c6e57] usable ==> usable
[ 0.000000] extended physical RAM map:
[ 0.000000] reserve setup_data: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] reserve setup_data: [mem 0x0000000000100000-0x00000000372be017] usable
[ 0.000000] reserve setup_data: [mem 0x00000000372be018-0x00000000372c6e57] usable
[ 0.000000] reserve setup_data: [mem 0x00000000372c6e58-0x00000000390cdfff] usable
[ 0.000000] reserve setup_data: [mem 0x00000000390ce000-0x000000003934dfff] reserved
[ 0.000000] reserve setup_data: [mem 0x000000003934e000-0x000000003935dfff] ACPI data
[ 0.000000] reserve setup_data: [mem 0x000000003935e000-0x00000000393ddfff] ACPI NVS
[ 0.000000] reserve setup_data: [mem 0x00000000393de000-0x000000003d37bfff] usable
[ 0.000000] reserve setup_data: [mem 0x000000003d37c000-0x000000003d3fffff] reserved
[ 0.000000] efi: EFI v2.7 by EDK II
[ 0.000000] efi: SMBIOS=0x3926a000 ACPI=0x3935d000 ACPI 2.0=0x3935d014 MEMATTR=0x37a43a98
[ 0.000000] memblock_reserve: [0x0000000037a43a98-0x0000000037a43e37] efi_memattr_init+0x4d/0xa0
[ 0.000000] SMBIOS 2.7 present.
[ 0.000000] DMI: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.000000] DMI: Memory slots populated: 1/1
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] last_pfn = 0x3d37c max_arch_pfn = 0x400000000
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: using sched offset of 7715848997 cycles
[ 0.000003] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000006] tsc: Detected 2499.994 MHz processor
[ 0.000088] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000090] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000096] last_pfn = 0x3d37c max_arch_pfn = 0x400000000
[ 0.000123] MTRR map: 4 entries (2 fixed + 2 variable; max 18), built from 8 variable MTRRs
[ 0.000126] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.006943] memblock_reserve: [0x000000001d400000-0x000000001d40cfff] setup_arch+0x953/0xad0
[ 0.006950] memblock_add: [0x0000000000001000-0x000000000009ffff] e820__memblock_setup+0x6f/0xb0
[ 0.006954] memblock_add: [0x0000000000100000-0x00000000372be017] e820__memblock_setup+0x6f/0xb0
[ 0.006957] memblock_add: [0x00000000372be018-0x00000000372c6e57] e820__memblock_setup+0x6f/0xb0
[ 0.006960] memblock_add: [0x00000000372c6e58-0x00000000390cdfff] e820__memblock_setup+0x6f/0xb0
[ 0.006962] memblock_add: [0x00000000393de000-0x000000003d37bfff] e820__memblock_setup+0x6f/0xb0
[ 0.006965] MEMBLOCK configuration:
[ 0.006965] memory size = 0x000000003d00b000 reserved size = 0x00000000049ff7b0
[ 0.006966] memory.cnt = 0x3
[ 0.006967] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x0
[ 0.006969] memory[0x1] [0x0000000000100000-0x00000000390cdfff], 0x0000000038fce000 bytes flags: 0x0
[ 0.006970] memory[0x2] [0x00000000393de000-0x000000003d37bfff], 0x0000000003f9e000 bytes flags: 0x0
[ 0.006971] reserved.cnt = 0x7
[ 0.006972] reserved[0x0] [0x0000000000000000-0x000000000000ffff], 0x0000000000010000 bytes flags: 0x0
[ 0.006973] reserved[0x1] [0x000000000009f000-0x00000000000fffff], 0x0000000000061000 bytes flags: 0x0
[ 0.006975] reserved[0x2] [0x000000001a000000-0x000000001d40cfff], 0x000000000340d000 bytes flags: 0x0
[ 0.006976] reserved[0x3] [0x000000003254f000-0x0000000033ac6fff], 0x0000000001578000 bytes flags: 0x0
[ 0.006977] reserved[0x4] [0x00000000372be018-0x00000000372c6e57], 0x0000000000008e40 bytes flags: 0x0
[ 0.006978] reserved[0x5] [0x0000000037a43a98-0x0000000037a43e37], 0x00000000000003a0 bytes flags: 0x0
[ 0.006979] reserved[0x6] [0x0000000039350040-0x000000003935060f], 0x00000000000005d0 bytes flags: 0x0
[ 0.006981] EFI XX 0x0000000038e73000..00000000390cdfff
[ 0.006982] memblock_reserve: [0x0000000038e73000-0x00000000390cdfff] efi_reserve_boot_services+0xc1/0x100
[ 0.006985] EFI XX 0x00000000393de000..00000000393defff
[ 0.006986] memblock_reserve: [0x00000000393de000-0x00000000393defff] efi_reserve_boot_services+0xc1/0x100
[ 0.006988] EFI XX 0x000000003b000000..000000003b1fffff
[ 0.006989] memblock_reserve: [0x000000003b000000-0x000000003b1fffff] efi_reserve_boot_services+0xc1/0x100
[ 0.006991] EFI XX 0x000000003b2f7000..000000003b316fff
[ 0.006992] memblock_reserve: [0x000000003b2f7000-0x000000003b316fff] efi_reserve_boot_services+0xc1/0x100
[ 0.006994] EFI XX 0x000000003b317000..000000003b335fff
[ 0.006995] memblock_reserve: [0x000000003b317000-0x000000003b335fff] efi_reserve_boot_services+0xc1/0x100
[ 0.006997] EFI XX 0x000000003b336000..000000003d36afff
[ 0.006998] memblock_reserve: [0x000000003b336000-0x000000003d36afff] efi_reserve_boot_services+0xc1/0x100
[ 0.007000] EFI XX 0x000000003d36b000..000000003d37bfff
[ 0.007000] memblock_reserve: [0x000000003d36b000-0x000000003d37bfff] efi_reserve_boot_services+0xc1/0x100
[ 0.007004] memblock_phys_alloc_range: 28672 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000100000 reserve_real_mode+0x53/0x90
[ 0.007008] memblock_reserve: [0x0000000000098000-0x000000000009efff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007012] memblock_reserve: [0x0000000000000000-0x00000000000fffff] setup_arch+0x636/0xad0
[ 0.007014] Using GB pages for direct mapping
[ 0.007052] memblock_phys_alloc_range: 2097152 bytes align=0x200000 from=0x0000000000100000 max_addr=0x000000003d37c000 init_mem_mapping+0x140/0x2c0
[ 0.007055] memblock_reserve: [0x000000003ae00000-0x000000003affffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007058] memblock_phys_free: [0x000000003ae00000-0x000000003affffff] init_mem_mapping+0x160/0x2c0
[ 0.007261] Secure boot disabled
[ 0.007261] RAMDISK: [mem 0x3254f000-0x33ac6fff]
[ 0.007271] ACPI: Early table checksum verification disabled
[ 0.007275] ACPI: RSDP 0x000000003935D014 000024 (v02 AMAZON)
[ 0.007278] ACPI: XSDT 0x000000003935C0E8 00006C (v01 AMAZON AMZNFACP 00000001 01000013)
[ 0.007283] ACPI: FACP 0x0000000039355000 000114 (v01 AMAZON AMZNFACP 00000001 AMZN 00000001)
[ 0.007288] ACPI: DSDT 0x0000000039356000 00115A (v01 AMAZON AMZNDSDT 00000001 AMZN 00000001)
[ 0.007291] ACPI: FACS 0x00000000393D0000 000040
[ 0.007294] ACPI: WAET 0x000000003935B000 000028 (v01 AMAZON AMZNWAET 00000001 AMZN 00000001)
[ 0.007296] ACPI: SLIT 0x000000003935A000 00006C (v01 AMAZON AMZNSLIT 00000001 AMZN 00000001)
[ 0.007299] ACPI: APIC 0x0000000039359000 000076 (v01 AMAZON AMZNAPIC 00000001 AMZN 00000001)
[ 0.007301] ACPI: SRAT 0x0000000039358000 0000A0 (v01 AMAZON AMZNSRAT 00000001 AMZN 00000001)
[ 0.007304] ACPI: HPET 0x0000000039354000 000038 (v01 AMAZON AMZNHPET 00000001 AMZN 00000001)
[ 0.007306] ACPI: SSDT 0x0000000039353000 000759 (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[ 0.007309] ACPI: SSDT 0x0000000039352000 00007F (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[ 0.007312] ACPI: BGRT 0x0000000039351000 000038 (v01 AMAZON AMAZON 00000002 01000013)
[ 0.007314] ACPI: Reserving FACP table memory at [mem 0x39355000-0x39355113]
[ 0.007315] memblock_reserve: [0x0000000039355000-0x0000000039355113] acpi_reserve_initial_tables+0x46/0x70
[ 0.007319] ACPI: Reserving DSDT table memory at [mem 0x39356000-0x39357159]
[ 0.007320] memblock_reserve: [0x0000000039356000-0x0000000039357159] acpi_reserve_initial_tables+0x46/0x70
[ 0.007323] ACPI: Reserving FACS table memory at [mem 0x393d0000-0x393d003f]
[ 0.007323] memblock_reserve: [0x00000000393d0000-0x00000000393d003f] acpi_reserve_initial_tables+0x46/0x70
[ 0.007325] ACPI: Reserving WAET table memory at [mem 0x3935b000-0x3935b027]
[ 0.007326] memblock_reserve: [0x000000003935b000-0x000000003935b027] acpi_reserve_initial_tables+0x46/0x70
[ 0.007328] ACPI: Reserving SLIT table memory at [mem 0x3935a000-0x3935a06b]
[ 0.007329] memblock_reserve: [0x000000003935a000-0x000000003935a06b] acpi_reserve_initial_tables+0x46/0x70
[ 0.007331] ACPI: Reserving APIC table memory at [mem 0x39359000-0x39359075]
[ 0.007332] memblock_reserve: [0x0000000039359000-0x0000000039359075] acpi_reserve_initial_tables+0x46/0x70
[ 0.007334] ACPI: Reserving SRAT table memory at [mem 0x39358000-0x3935809f]
[ 0.007335] memblock_reserve: [0x0000000039358000-0x000000003935809f] acpi_reserve_initial_tables+0x46/0x70
[ 0.007337] ACPI: Reserving HPET table memory at [mem 0x39354000-0x39354037]
[ 0.007338] memblock_reserve: [0x0000000039354000-0x0000000039354037] acpi_reserve_initial_tables+0x46/0x70
[ 0.007340] ACPI: Reserving SSDT table memory at [mem 0x39353000-0x39353758]
[ 0.007341] memblock_reserve: [0x0000000039353000-0x0000000039353758] acpi_reserve_initial_tables+0x46/0x70
[ 0.007343] ACPI: Reserving SSDT table memory at [mem 0x39352000-0x3935207e]
[ 0.007344] memblock_reserve: [0x0000000039352000-0x000000003935207e] acpi_reserve_initial_tables+0x46/0x70
[ 0.007346] ACPI: Reserving BGRT table memory at [mem 0x39351000-0x39351037]
[ 0.007346] memblock_reserve: [0x0000000039351000-0x0000000039351037] acpi_reserve_initial_tables+0x46/0x70
[ 0.007398] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[ 0.007407] memblock_alloc_try_nid: 1 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 numa_alloc_distance+0x14a/0x200
[ 0.007414] memblock_reserve: [0x000000003b2f6000-0x000000003b2f6000] memblock_alloc_range_nid+0xbf/0x180
[ 0.007421] NUMA: Initialized distance table, cnt=1
[ 0.007430] memblock_reserve: [0x000000003b2cb680-0x000000003b2f5fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007433] NODE_DATA(0) allocated [mem 0x3b2cb680-0x3b2f5fff]
[ 0.007452] MEMBLOCK configuration:
[ 0.007452] memory size = 0x000000003d00b000 reserved size = 0x0000000006f9bdd1
[ 0.007453] memory.cnt = 0x3
[ 0.007454] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes on node 0 flags: 0x0
[ 0.007456] memory[0x1] [0x0000000000100000-0x00000000390cdfff], 0x0000000038fce000 bytes on node 0 flags: 0x0
[ 0.007458] memory[0x2] [0x00000000393de000-0x000000003d37bfff], 0x0000000003f9e000 bytes on node 0 flags: 0x0
[ 0.007459] reserved.cnt = 0x18
[ 0.007460] reserved[0x0] [0x0000000000000000-0x0000000000000fff], 0x0000000000001000 bytes flags: 0x0
[ 0.007462] reserved[0x1] [0x0000000000001000-0x00000000000fffff], 0x00000000000ff000 bytes on node 0 flags: 0x0
[ 0.007464] reserved[0x2] [0x000000001a000000-0x000000001d40cfff], 0x000000000340d000 bytes on node 0 flags: 0x0
[ 0.007465] reserved[0x3] [0x000000003254f000-0x0000000033ac6fff], 0x0000000001578000 bytes on node 0 flags: 0x0
[ 0.007467] reserved[0x4] [0x00000000372be018-0x00000000372c6e57], 0x0000000000008e40 bytes on node 0 flags: 0x0
[ 0.007468] reserved[0x5] [0x0000000037a43a98-0x0000000037a43e37], 0x00000000000003a0 bytes on node 0 flags: 0x0
[ 0.007470] reserved[0x6] [0x0000000038e73000-0x00000000390cdfff], 0x000000000025b000 bytes on node 0 flags: 0x0
[ 0.007471] reserved[0x7] [0x0000000039350040-0x000000003935060f], 0x00000000000005d0 bytes on node 0 flags: 0x0
[ 0.007473] reserved[0x8] [0x0000000039351000-0x0000000039351037], 0x0000000000000038 bytes on node 0 flags: 0x0
[ 0.007474] reserved[0x9] [0x0000000039352000-0x000000003935207e], 0x000000000000007f bytes on node 0 flags: 0x0
[ 0.007475] reserved[0xa] [0x0000000039353000-0x0000000039353758], 0x0000000000000759 bytes on node 0 flags: 0x0
[ 0.007477] reserved[0xb] [0x0000000039354000-0x0000000039354037], 0x0000000000000038 bytes on node 0 flags: 0x0
[ 0.007478] reserved[0xc] [0x0000000039355000-0x0000000039355113], 0x0000000000000114 bytes on node 0 flags: 0x0
[ 0.007479] reserved[0xd] [0x0000000039356000-0x0000000039357159], 0x000000000000115a bytes on node 0 flags: 0x0
[ 0.007480] reserved[0xe] [0x0000000039358000-0x000000003935809f], 0x00000000000000a0 bytes on node 0 flags: 0x0
[ 0.007482] reserved[0xf] [0x0000000039359000-0x0000000039359075], 0x0000000000000076 bytes on node 0 flags: 0x0
[ 0.007483] reserved[0x10] [0x000000003935a000-0x000000003935a06b], 0x000000000000006c bytes on node 0 flags: 0x0
[ 0.007485] reserved[0x11] [0x000000003935b000-0x000000003935b027], 0x0000000000000028 bytes on node 0 flags: 0x0
[ 0.007487] reserved[0x12] [0x00000000393d0000-0x00000000393d003f], 0x0000000000000040 bytes on node 0 flags: 0x0
[ 0.007488] reserved[0x13] [0x00000000393de000-0x00000000393defff], 0x0000000000001000 bytes on node 0 flags: 0x0
[ 0.007490] reserved[0x14] [0x000000003b000000-0x000000003b1fffff], 0x0000000000200000 bytes on node 0 flags: 0x0
[ 0.007491] reserved[0x15] [0x000000003b2cb680-0x000000003b2f5fff], 0x000000000002a980 bytes flags: 0x0
[ 0.007492] reserved[0x16] [0x000000003b2f6000-0x000000003b2f6000], 0x0000000000000001 bytes on node 0 flags: 0x0
[ 0.007493] reserved[0x17] [0x000000003b2f7000-0x000000003d37bfff], 0x0000000002085000 bytes on node 0 flags: 0x0
[ 0.007652] memblock_alloc_try_nid: 16384 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 memblocks_present+0x1d1/0x210
[ 0.007655] memblock_reserve: [0x000000003b2c7680-0x000000003b2cb67f] memblock_alloc_range_nid+0xbf/0x180
[ 0.007659] memblock_alloc_try_nid: 4096 bytes align=0x40 nid=0 from=0x0000000000000000 max_addr=0x0000000000000000 sparse_index_alloc+0x44/0x70
[ 0.007663] memblock_reserve: [0x000000003b2c6680-0x000000003b2c767f] memblock_alloc_range_nid+0xbf/0x180
[ 0.007667] memblock_alloc_try_nid: 448 bytes align=0x40 nid=0 from=0x0000000038000000 max_addr=0x0000000040000000 sparse_init_nid+0x9b/0x4e0
[ 0.007669] memblock_reserve: [0x000000003b2f6e40-0x000000003b2f6fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007672] memblock_alloc_exact_nid_raw: 16777216 bytes align=0x200000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 memmap_alloc+0x1f/0x60
[ 0.007675] memblock_reserve: [0x000000003a000000-0x000000003affffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.009338] memblock_alloc_try_nid_raw: 4096 bytes align=0x1000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 vmemmap_alloc_block_zero.constprop.0+0x11/0x50
[ 0.009345] memblock_reserve: [0x000000003b2c5000-0x000000003b2c5fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.009351] memblock_alloc_try_nid_raw: 4096 bytes align=0x1000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 vmemmap_alloc_block_zero.constprop.0+0x11/0x50
[ 0.009354] memblock_reserve: [0x000000003b2c4000-0x000000003b2c4fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.009360] Zone ranges:
[ 0.009361] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.009363] DMA32 [mem 0x0000000001000000-0x000000003d37bfff]
[ 0.009364] Normal empty
[ 0.009365] Device empty
[ 0.009366] Movable zone start for each node
[ 0.009369] Early memory node ranges
[ 0.009369] node 0: [mem 0x0000000000001000-0x000000000009ffff]
[ 0.009371] node 0: [mem 0x0000000000100000-0x00000000390cdfff]
[ 0.009372] node 0: [mem 0x00000000393de000-0x000000003d37bfff]
[ 0.009373] Initmem setup node 0 [mem 0x0000000000001000-0x000000003d37bfff]
[ 0.009378] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.009406] On node 0, zone DMA: 96 pages in unavailable ranges
[ 0.009905] On node 0, zone DMA32: 784 pages in unavailable ranges
[ 0.009994] On node 0, zone DMA32: 11396 pages in unavailable ranges
[ 0.010412] ACPI: PM-Timer IO Port: 0xb008
[ 0.010424] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.010480] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[ 0.010482] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.010484] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.010486] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.010487] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.010490] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.010491] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.010492] memblock_alloc_try_nid: 73 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 acpi_parse_hpet+0x91/0x150
[ 0.010498] memblock_reserve: [0x000000003b2f6dc0-0x000000003b2f6e08] memblock_alloc_range_nid+0xbf/0x180
[ 0.010503] TSC deadline timer available
[ 0.010508] CPU topo: Max. logical packages: 1
[ 0.010509] CPU topo: Max. logical dies: 1
[ 0.010509] CPU topo: Max. dies per package: 1
[ 0.010514] CPU topo: Max. threads per core: 2
[ 0.010515] CPU topo: Num. cores per package: 1
[ 0.010516] CPU topo: Num. threads per package: 2
[ 0.010516] CPU topo: Allowing 2 present CPUs plus 0 hotplug CPUs
[ 0.010518] memblock_alloc_try_nid: 75 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 io_apic_init_mappings+0x41/0x1e0
[ 0.010522] memblock_reserve: [0x000000003b2f6d40-0x000000003b2f6d8a] memblock_alloc_range_nid+0xbf/0x180
[ 0.010539] kvm-guest: APIC: eoi() replaced with kvm_guest_apic_eoi_write()
[ 0.010547] memblock_alloc_try_nid: 640 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 e820__reserve_resources+0x2e/0x1f0
[ 0.010552] memblock_reserve: [0x000000003b2f6ac0-0x000000003b2f6d3f] memblock_alloc_range_nid+0xbf/0x180
[ 0.010556] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010560] memblock_reserve: [0x000000003b2f6a40-0x000000003b2f6aa7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010563] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010566] memblock_reserve: [0x000000003b2f69c0-0x000000003b2f6a27] memblock_alloc_range_nid+0xbf/0x180
[ 0.010569] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010572] memblock_reserve: [0x000000003b2f6940-0x000000003b2f69a7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010574] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010577] memblock_reserve: [0x000000003b2f68c0-0x000000003b2f6927] memblock_alloc_range_nid+0xbf/0x180
[ 0.010580] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010583] memblock_reserve: [0x000000003b2f6840-0x000000003b2f68a7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010585] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010588] memblock_reserve: [0x000000003b2f67c0-0x000000003b2f6827] memblock_alloc_range_nid+0xbf/0x180
[ 0.010591] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010593] memblock_reserve: [0x000000003b2f6740-0x000000003b2f67a7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010596] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 register_nosave_region+0x53/0xe0
[ 0.010601] memblock_reserve: [0x000000003b2f6700-0x000000003b2f671f] memblock_alloc_range_nid+0xbf/0x180
[ 0.010604] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.010605] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 register_nosave_region+0x53/0xe0
[ 0.010608] memblock_reserve: [0x000000003b2f66c0-0x000000003b2f66df] memblock_alloc_range_nid+0xbf/0x180
[ 0.010611] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.010611] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 register_nosave_region+0x53/0xe0
[ 0.010615] memblock_reserve: [0x000000003b2f6680-0x000000003b2f669f] memblock_alloc_range_nid+0xbf/0x180
[ 0.010617] PM: hibernation: Registered nosave memory: [mem 0x390ce000-0x393ddfff]
[ 0.010619] [mem 0x3d400000-0xffffffff] available for PCI devices
[ 0.010621] Booting paravirtualized kernel on KVM
[ 0.010623] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.015925] memblock_alloc_try_nid: 265 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 setup_command_line+0x71/0x230
[ 0.015933] memblock_reserve: [0x000000003b2f6540-0x000000003b2f6648] memblock_alloc_range_nid+0xbf/0x180
[ 0.015937] memblock_alloc_try_nid: 265 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 setup_command_line+0xa4/0x230
[ 0.015940] memblock_reserve: [0x000000003b2f6400-0x000000003b2f6508] memblock_alloc_range_nid+0xbf/0x180
[ 0.015944] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[ 0.015954] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_build_alloc_info+0x2e1/0x5c0
[ 0.015958] memblock_reserve: [0x000000003b2c3000-0x000000003b2c3fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.015961] memblock_alloc_try_nid: 4096 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_embed_first_chunk+0x7a/0x580
[ 0.015964] memblock_reserve: [0x000000003b2c2000-0x000000003b2c2fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.015968] memblock_alloc_try_nid: 2097152 bytes align=0x200000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 pcpu_fc_alloc+0x101/0x170
[ 0.015971] memblock_reserve: [0x0000000039e00000-0x0000000039ffffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.016206] memblock_phys_free: [0x0000000039e41000-0x0000000039efffff] pcpu_embed_first_chunk+0x1e6/0x580
[ 0.016229] memblock_phys_free: [0x0000000039f41000-0x0000000039ffffff] pcpu_embed_first_chunk+0x1e6/0x580
[ 0.016231] percpu: Embedded 65 pages/cpu s229376 r8192 d28672 u1048576
[ 0.016233] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0xc3/0x900
[ 0.016236] memblock_reserve: [0x000000003b2f63c0-0x000000003b2f63c7] memblock_alloc_range_nid+0xbf/0x180
[ 0.016239] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0xee/0x900
[ 0.016242] memblock_reserve: [0x000000003b2f6380-0x000000003b2f6387] memblock_alloc_range_nid+0xbf/0x180
[ 0.016244] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0x120/0x900
[ 0.016247] memblock_reserve: [0x000000003b2f6340-0x000000003b2f6347] memblock_alloc_range_nid+0xbf/0x180
[ 0.016250] memblock_alloc_try_nid: 16 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0x156/0x900
[ 0.016253] memblock_reserve: [0x000000003b2f6300-0x000000003b2f630f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016256] pcpu-alloc: s229376 r8192 d28672 u1048576 alloc=1*2097152
[ 0.016258] pcpu-alloc: [0] 0 1
[ 0.016261] memblock_alloc_try_nid: 352 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0x707/0x900
[ 0.016264] memblock_reserve: [0x000000003b2f6180-0x000000003b2f62df] memblock_alloc_range_nid+0xbf/0x180
[ 0.016266] memblock_alloc_try_nid: 264 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x76/0x2f0
[ 0.016269] memblock_reserve: [0x000000003b2f6040-0x000000003b2f6147] memblock_alloc_range_nid+0xbf/0x180
[ 0.016272] memblock_alloc_try_nid: 256 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0xee/0x2f0
[ 0.016275] memblock_reserve: [0x000000003b2c6580-0x000000003b2c667f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016278] memblock_alloc_try_nid: 264 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x129/0x2f0
[ 0.016280] memblock_reserve: [0x000000003b2c6440-0x000000003b2c6547] memblock_alloc_range_nid+0xbf/0x180
[ 0.016283] memblock_alloc_try_nid: 64 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x15a/0x2f0
[ 0.016286] memblock_reserve: [0x000000003b2c6400-0x000000003b2c643f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016289] memblock_alloc_try_nid: 264 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x76/0x2f0
[ 0.016291] memblock_reserve: [0x000000003b2c62c0-0x000000003b2c63c7] memblock_alloc_range_nid+0xbf/0x180
[ 0.016294] memblock_alloc_try_nid: 896 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0xee/0x2f0
[ 0.016297] memblock_reserve: [0x000000003b2c1c80-0x000000003b2c1fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.016300] memblock_alloc_try_nid: 904 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x129/0x2f0
[ 0.016303] memblock_reserve: [0x000000003b2c18c0-0x000000003b2c1c47] memblock_alloc_range_nid+0xbf/0x180
[ 0.016305] memblock_alloc_try_nid: 224 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x15a/0x2f0
[ 0.016308] memblock_reserve: [0x000000003b2c61c0-0x000000003b2c629f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016311] memblock_phys_free: [0x000000003b2c3000-0x000000003b2c3fff] pcpu_embed_first_chunk+0x26c/0x580
[ 0.016314] memblock_phys_free: [0x000000003b2c2000-0x000000003b2c2fff] pcpu_embed_first_chunk+0x279/0x580
[ 0.016317] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_bootmem_cpumask_var+0x2a/0x60
[ 0.016320] memblock_reserve: [0x000000003b2c6180-0x000000003b2c6187] memblock_alloc_range_nid+0xbf/0x180
[ 0.016323] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_bootmem_cpumask_var+0x2a/0x60
[ 0.016325] memblock_reserve: [0x000000003b2c6140-0x000000003b2c6147] memblock_alloc_range_nid+0xbf/0x180
[ 0.016996] kvm-guest: PV spinlocks enabled
[ 0.016997] memblock_alloc_try_nid: 4096 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_large_system_hash+0x232/0x2d0
[ 0.017000] memblock_reserve: [0x000000003b2c3000-0x000000003b2c3fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017004] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.017006] Kernel command line: BOOT_IMAGE=(hd0,gpt1)/boot/vmlinuz-6.12.68-93.123.amzn2023.x86_64 root=UUID=7813e2d4-9cdc-416a-a749-25de8a9f36d0 ro console=tty0 console=ttyS0,115200n8 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 selinux=1 security=selinux quiet memblock=debug
[ 0.017133] random: crng init done
[ 0.017136] memblock_alloc_try_nid: 1048576 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_large_system_hash+0x232/0x2d0
[ 0.017139] memblock_reserve: [0x0000000039d00000-0x0000000039dfffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017255] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 0.017257] memblock_alloc_try_nid: 524288 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_large_system_hash+0x232/0x2d0
[ 0.017261] memblock_reserve: [0x000000003b2418c0-0x000000003b2c18bf] memblock_alloc_range_nid+0xbf/0x180
[ 0.017314] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 0.017318] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017323] memblock_reserve: [0x000000003b2c2000-0x000000003b2c2fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017326] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017328] memblock_reserve: [0x000000003b240000-0x000000003b240fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017331] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017334] memblock_reserve: [0x000000003b23f000-0x000000003b23ffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017347] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017350] memblock_reserve: [0x000000003b23e000-0x000000003b23efff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017353] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017356] memblock_reserve: [0x000000003b23d000-0x000000003b23dfff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017361] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017364] memblock_reserve: [0x000000003b23c000-0x000000003b23cfff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017388] Fallback order for Node 0: 0
[ 0.017391] Built 1 zonelists, mobility grouping on. Total pages: 249867
[ 0.017392] Policy zone: DMA32
[ 0.017393] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.018640] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.018654] Kernel/User page tables isolation: enabled
[ 0.018679] ftrace: allocating 46825 entries in 183 pages
[ 0.027381] ftrace: allocated 183 pages with 6 groups
[ 0.028278] Dynamic Preempt: none
[ 0.028324] rcu: Preemptible hierarchical RCU implementation.
[ 0.028325] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2.
[ 0.028327] Trampoline variant of Tasks RCU enabled.
[ 0.028328] Rude variant of Tasks RCU enabled.
[ 0.028328] Tracing variant of Tasks RCU enabled.
[ 0.028329] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.028330] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.028336] RCU Tasks: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.028338] RCU Tasks Rude: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.028339] RCU Tasks Trace: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.032427] NR_IRQS: 524544, nr_irqs: 440, preallocated irqs: 16
[ 0.032664] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.032720] Console: colour dummy device 80x25
[ 0.032722] printk: legacy console [tty0] enabled
[ 0.032818] printk: legacy console [ttyS0] enabled
[ 0.032884] ACPI: Core revision 20240827
[ 0.033070] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns
[ 0.033089] APIC: Switch to symmetric I/O mode setup
[ 0.033518] x2apic enabled
[ 0.033960] APIC: Switched APIC routing to: physical x2apic
[ 0.035630] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x240933eba6e, max_idle_ns: 440795246008 ns
[ 0.035635] Calibrating delay loop (skipped) preset value.. 4999.98 BogoMIPS (lpj=24999940)
[ 0.036010] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 0.036015] Last level dTLB entries: 4KB 64, 2MB 32, 4MB 32, 1GB 4
[ 0.036019] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.036022] Spectre V2 : Mitigation: Retpolines
[ 0.036023] Spectre V2 : Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT
[ 0.036024] RETBleed: WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!
[ 0.045633] RETBleed: Vulnerable
[ 0.045633] Speculative Store Bypass: Vulnerable
[ 0.045633] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.045633] MMIO Stale Data: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.045633] GDS: Unknown: Dependent on hypervisor status
[ 0.045633] active return thunk: its_return_thunk
[ 0.045633] ITS: Mitigation: Aligned branch/return thunks
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.045633] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.045633] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.045633] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.045633] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.045633] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.045633] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.045633] x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]: 8
[ 0.045633] x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
[ 0.045633] Freeing SMP alternatives memory: 36K
[ 0.045633] pid_max: default: 32768 minimum: 301
[ 0.045633] memblock_free_late: [0x000000003d36b000-0x000000003d37bfff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b336000-0x000000003d36afff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b317000-0x000000003b335fff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b2f7000-0x000000003b316fff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b000000-0x000000003b1fffff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x00000000393de000-0x00000000393defff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x0000000038e73000-0x00000000390cdfff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] LSM: initializing lsm=lockdown,capability,landlock,yama,safesetid,selinux,bpf,ima
[ 0.045633] landlock: Up and running.
[ 0.045633] Yama: becoming mindful.
[ 0.045633] SELinux: Initializing.
[ 0.045633] LSM support for eBPF active
[ 0.045633] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.045633] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.045633] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
[ 0.045633] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
[ 0.045633] signal: max sigframe size: 3632
[ 0.045633] rcu: Hierarchical SRCU implementation.
[ 0.045633] rcu: Max phase no-delay instances is 1000.
[ 0.045633] Timer migration: 1 hierarchy levels; 8 children per group; 1 crossnode level
[ 0.045633] smp: Bringing up secondary CPUs ...
[ 0.045633] smpboot: x86: Booting SMP configuration:
[ 0.045633] .... node #0, CPUs: #1
[ 0.045633] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 0.045633] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
[ 0.045633] smp: Brought up 1 node, 2 CPUs
[ 0.045633] smpboot: Total of 2 processors activated (9999.97 BogoMIPS)
[ 0.045633] node 0 deferred pages initialised in 0ms
[ 0.045633] Memory: 900460K/999468K available (16384K kernel code, 9440K rwdata, 11364K rodata, 3740K init, 6440K bss, 94600K reserved, 0K cma-reserved)
[ 0.045633] devtmpfs: initialized
[ 0.045633] x86/mm: Memory block size: 128MB
[ 0.045633] ------------[ cut here ]------------
[ 0.045633] page type is 1, passed migratetype is 0 (nr=16)
[ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:721 rmqueue_bulk+0x82e/0x880
[ 0.045633] Modules linked in:
[ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.12.68-93.123.amzn2023.x86_64 #1
[ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.045633] RIP: 0010:rmqueue_bulk+0x82e/0x880
[ 0.045633] Code: c6 05 be be 13 02 01 e8 b0 b5 ff ff 44 89 e9 8b 14 24 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 32 4f d2 ff <0f> 0b 4c 8b 44 24 48 e9 79 fc ff ff 48 c7 c6 e0 77 51 8e 4c 89 e7
[ 0.045633] RSP: 0000:ffffd592c002f898 EFLAGS: 00010086
[ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX: ffffffff8f1f0c68
[ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000001
[ 0.045633] RBP: fffffb9c40e3a408 R08: 0000000000000000 R09: ffffd592c002f740
[ 0.045633] R10: ffffd592c002f738 R11: ffffffff8f370ca8 R12: fffffb9c40e3a400
[ 0.045633] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000038e90
[ 0.045633] FS: 0000000000000000(0000) GS:ffff8e3639f00000(0000) knlGS:0000000000000000
[ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4: 00000000007706f0
[ 0.045633] PKRU: 55555554
[ 0.045633] Call Trace:
[ 0.045633] <TASK>
[ 0.045633] __rmqueue_pcplist+0x233/0x2c0
[ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
[ 0.045633] ? _raw_spin_unlock+0xa/0x30
[ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] get_page_from_freelist+0x16e/0x5f0
[ 0.045633] __alloc_pages_noprof+0x18a/0x350
[ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
[ 0.045633] ? shuffle_freelist+0x126/0x1b0
[ 0.045633] allocate_slab+0x2b3/0x410
[ 0.045633] ___slab_alloc+0x396/0x830
[ 0.045633] ? switch_hrtimer_base+0x8e/0x190
[ 0.045633] ? timerqueue_add+0x9b/0xc0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] ? start_dl_timer+0xb0/0x140
[ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] dup_task_struct+0x2d/0x1b0
[ 0.045633] copy_process+0x195/0x17e0
[ 0.045633] kernel_clone+0x9a/0x3b0
[ 0.045633] ? psi_task_switch+0x105/0x290
[ 0.045633] kernel_thread+0x6b/0x90
[ 0.045633] ? __pfx_kthread+0x10/0x10
[ 0.045633] kthreadd+0x276/0x2d0
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork+0x30/0x50
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork_asm+0x1a/0x30
[ 0.045633] </TASK>
[ 0.045633] ---[ end trace 0000000000000000 ]---
[ 0.045633] ------------[ cut here ]------------
[ 0.045633] page type is 1, passed migratetype is 0 (nr=8)
[ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:686 expand+0x1af/0x1e0
[ 0.045633] Modules linked in:
[ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Tainted: G W 6.12.68-93.123.amzn2023.x86_64 #1
[ 0.045633] Tainted: [W]=WARN
[ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.045633] RIP: 0010:expand+0x1af/0x1e0
[ 0.045633] Code: c6 05 af 06 14 02 01 e8 9f fd ff ff 89 e9 8b 54 24 34 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 21 97 d2 ff <0f> 0b e9 e5 fe ff ff 48 c7 c6 e0 6d 51 8e 4c 89 ff e8 eb 23 fc ff
[ 0.045633] RSP: 0000:ffffd592c002f828 EFLAGS: 00010082
[ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX: ffffffff8f1f0c68
[ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000001
[ 0.045633] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffd592c002f6d0
[ 0.045633] R10: ffffd592c002f6c8 R11: ffffffff8f370ca8 R12: 0000000000000008
[ 0.045633] R13: 0000000000038e98 R14: 0000000000000003 R15: fffffb9c40e3a600
[ 0.045633] FS: 0000000000000000(0000) GS:ffff8e3639f00000(0000) knlGS:0000000000000000
[ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4: 00000000007706f0
[ 0.045633] PKRU: 55555554
[ 0.045633] Call Trace:
[ 0.045633] <TASK>
[ 0.045633] rmqueue_bulk+0x541/0x880
[ 0.045633] __rmqueue_pcplist+0x233/0x2c0
[ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
[ 0.045633] ? _raw_spin_unlock+0xa/0x30
[ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] get_page_from_freelist+0x16e/0x5f0
[ 0.045633] __alloc_pages_noprof+0x18a/0x350
[ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
[ 0.045633] ? shuffle_freelist+0x126/0x1b0
[ 0.045633] allocate_slab+0x2b3/0x410
[ 0.045633] ___slab_alloc+0x396/0x830
[ 0.045633] ? switch_hrtimer_base+0x8e/0x190
[ 0.045633] ? timerqueue_add+0x9b/0xc0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] ? start_dl_timer+0xb0/0x140
[ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] dup_task_struct+0x2d/0x1b0
[ 0.045633] copy_process+0x195/0x17e0
[ 0.045633] kernel_clone+0x9a/0x3b0
[ 0.045633] ? psi_task_switch+0x105/0x290
[ 0.045633] kernel_thread+0x6b/0x90
[ 0.045633] ? __pfx_kthread+0x10/0x10
[ 0.045633] kthreadd+0x276/0x2d0
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork+0x30/0x50
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork_asm+0x1a/0x30
[ 0.045633] </TASK>
[ 0.045633] ---[ end trace 0000000000000000 ]---
[ 0.045633] ACPI: PM: Registering ACPI NVS region [mem 0x3935e000-0x393ddfff] (524288 bytes)
[ 0.045633] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.045633] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 0.045633] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.045633] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[ 0.045633] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 0.045633] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[ec2-user@ip-172-31-29-240 ~]$ sudo dmesg
[ 0.000000] Linux version 6.12.68-93.123.amzn2023.x86_64 (mockbuild@ip-10-0-53-120) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.5) #1 SMP PREEMPT_DYNAMIC Tue Feb 10 06:40:43 UTC 2026
[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/boot/vmlinuz-6.12.68-93.123.amzn2023.x86_64 root=UUID=7813e2d4-9cdc-416a-a749-25de8a9f36d0 ro console=tty0 console=ttyS0,115200n8 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 selinux=1 security=selinux quiet memblock=debug
[ 0.000000] KASLR enabled
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000390cdfff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000390ce000-0x000000003934dfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000003934e000-0x000000003935dfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000003935e000-0x00000000393ddfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000393de000-0x000000003d37bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000003d37c000-0x000000003d3fffff] reserved
[ 0.000000] memblock_reserve: [0x0000000039350040-0x000000003935060f] efi_memblock_x86_reserve_range+0x159/0x1e0
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] APIC: Static calls initialized
[ 0.000000] e820: update [mem 0x372be018-0x372c6e57] usable ==> usable
[ 0.000000] extended physical RAM map:
[ 0.000000] reserve setup_data: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] reserve setup_data: [mem 0x0000000000100000-0x00000000372be017] usable
[ 0.000000] reserve setup_data: [mem 0x00000000372be018-0x00000000372c6e57] usable
[ 0.000000] reserve setup_data: [mem 0x00000000372c6e58-0x00000000390cdfff] usable
[ 0.000000] reserve setup_data: [mem 0x00000000390ce000-0x000000003934dfff] reserved
[ 0.000000] reserve setup_data: [mem 0x000000003934e000-0x000000003935dfff] ACPI data
[ 0.000000] reserve setup_data: [mem 0x000000003935e000-0x00000000393ddfff] ACPI NVS
[ 0.000000] reserve setup_data: [mem 0x00000000393de000-0x000000003d37bfff] usable
[ 0.000000] reserve setup_data: [mem 0x000000003d37c000-0x000000003d3fffff] reserved
[ 0.000000] efi: EFI v2.7 by EDK II
[ 0.000000] efi: SMBIOS=0x3926a000 ACPI=0x3935d000 ACPI 2.0=0x3935d014 MEMATTR=0x37a43a98
[ 0.000000] memblock_reserve: [0x0000000037a43a98-0x0000000037a43e37] efi_memattr_init+0x4d/0xa0
[ 0.000000] SMBIOS 2.7 present.
[ 0.000000] DMI: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.000000] DMI: Memory slots populated: 1/1
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] last_pfn = 0x3d37c max_arch_pfn = 0x400000000
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: using sched offset of 7715848997 cycles
[ 0.000003] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000006] tsc: Detected 2499.994 MHz processor
[ 0.000088] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000090] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000096] last_pfn = 0x3d37c max_arch_pfn = 0x400000000
[ 0.000123] MTRR map: 4 entries (2 fixed + 2 variable; max 18), built from 8 variable MTRRs
[ 0.000126] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.006943] memblock_reserve: [0x000000001d400000-0x000000001d40cfff] setup_arch+0x953/0xad0
[ 0.006950] memblock_add: [0x0000000000001000-0x000000000009ffff] e820__memblock_setup+0x6f/0xb0
[ 0.006954] memblock_add: [0x0000000000100000-0x00000000372be017] e820__memblock_setup+0x6f/0xb0
[ 0.006957] memblock_add: [0x00000000372be018-0x00000000372c6e57] e820__memblock_setup+0x6f/0xb0
[ 0.006960] memblock_add: [0x00000000372c6e58-0x00000000390cdfff] e820__memblock_setup+0x6f/0xb0
[ 0.006962] memblock_add: [0x00000000393de000-0x000000003d37bfff] e820__memblock_setup+0x6f/0xb0
[ 0.006965] MEMBLOCK configuration:
[ 0.006965] memory size = 0x000000003d00b000 reserved size = 0x00000000049ff7b0
[ 0.006966] memory.cnt = 0x3
[ 0.006967] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x0
[ 0.006969] memory[0x1] [0x0000000000100000-0x00000000390cdfff], 0x0000000038fce000 bytes flags: 0x0
[ 0.006970] memory[0x2] [0x00000000393de000-0x000000003d37bfff], 0x0000000003f9e000 bytes flags: 0x0
[ 0.006971] reserved.cnt = 0x7
[ 0.006972] reserved[0x0] [0x0000000000000000-0x000000000000ffff], 0x0000000000010000 bytes flags: 0x0
[ 0.006973] reserved[0x1] [0x000000000009f000-0x00000000000fffff], 0x0000000000061000 bytes flags: 0x0
[ 0.006975] reserved[0x2] [0x000000001a000000-0x000000001d40cfff], 0x000000000340d000 bytes flags: 0x0
[ 0.006976] reserved[0x3] [0x000000003254f000-0x0000000033ac6fff], 0x0000000001578000 bytes flags: 0x0
[ 0.006977] reserved[0x4] [0x00000000372be018-0x00000000372c6e57], 0x0000000000008e40 bytes flags: 0x0
[ 0.006978] reserved[0x5] [0x0000000037a43a98-0x0000000037a43e37], 0x00000000000003a0 bytes flags: 0x0
[ 0.006979] reserved[0x6] [0x0000000039350040-0x000000003935060f], 0x00000000000005d0 bytes flags: 0x0
[ 0.006981] EFI XX 0x0000000038e73000..00000000390cdfff
[ 0.006982] memblock_reserve: [0x0000000038e73000-0x00000000390cdfff] efi_reserve_boot_services+0xc1/0x100
[ 0.006985] EFI XX 0x00000000393de000..00000000393defff
[ 0.006986] memblock_reserve: [0x00000000393de000-0x00000000393defff] efi_reserve_boot_services+0xc1/0x100
[ 0.006988] EFI XX 0x000000003b000000..000000003b1fffff
[ 0.006989] memblock_reserve: [0x000000003b000000-0x000000003b1fffff] efi_reserve_boot_services+0xc1/0x100
[ 0.006991] EFI XX 0x000000003b2f7000..000000003b316fff
[ 0.006992] memblock_reserve: [0x000000003b2f7000-0x000000003b316fff] efi_reserve_boot_services+0xc1/0x100
[ 0.006994] EFI XX 0x000000003b317000..000000003b335fff
[ 0.006995] memblock_reserve: [0x000000003b317000-0x000000003b335fff] efi_reserve_boot_services+0xc1/0x100
[ 0.006997] EFI XX 0x000000003b336000..000000003d36afff
[ 0.006998] memblock_reserve: [0x000000003b336000-0x000000003d36afff] efi_reserve_boot_services+0xc1/0x100
[ 0.007000] EFI XX 0x000000003d36b000..000000003d37bfff
[ 0.007000] memblock_reserve: [0x000000003d36b000-0x000000003d37bfff] efi_reserve_boot_services+0xc1/0x100
[ 0.007004] memblock_phys_alloc_range: 28672 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000100000 reserve_real_mode+0x53/0x90
[ 0.007008] memblock_reserve: [0x0000000000098000-0x000000000009efff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007012] memblock_reserve: [0x0000000000000000-0x00000000000fffff] setup_arch+0x636/0xad0
[ 0.007014] Using GB pages for direct mapping
[ 0.007052] memblock_phys_alloc_range: 2097152 bytes align=0x200000 from=0x0000000000100000 max_addr=0x000000003d37c000 init_mem_mapping+0x140/0x2c0
[ 0.007055] memblock_reserve: [0x000000003ae00000-0x000000003affffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007058] memblock_phys_free: [0x000000003ae00000-0x000000003affffff] init_mem_mapping+0x160/0x2c0
[ 0.007261] Secure boot disabled
[ 0.007261] RAMDISK: [mem 0x3254f000-0x33ac6fff]
[ 0.007271] ACPI: Early table checksum verification disabled
[ 0.007275] ACPI: RSDP 0x000000003935D014 000024 (v02 AMAZON)
[ 0.007278] ACPI: XSDT 0x000000003935C0E8 00006C (v01 AMAZON AMZNFACP 00000001 01000013)
[ 0.007283] ACPI: FACP 0x0000000039355000 000114 (v01 AMAZON AMZNFACP 00000001 AMZN 00000001)
[ 0.007288] ACPI: DSDT 0x0000000039356000 00115A (v01 AMAZON AMZNDSDT 00000001 AMZN 00000001)
[ 0.007291] ACPI: FACS 0x00000000393D0000 000040
[ 0.007294] ACPI: WAET 0x000000003935B000 000028 (v01 AMAZON AMZNWAET 00000001 AMZN 00000001)
[ 0.007296] ACPI: SLIT 0x000000003935A000 00006C (v01 AMAZON AMZNSLIT 00000001 AMZN 00000001)
[ 0.007299] ACPI: APIC 0x0000000039359000 000076 (v01 AMAZON AMZNAPIC 00000001 AMZN 00000001)
[ 0.007301] ACPI: SRAT 0x0000000039358000 0000A0 (v01 AMAZON AMZNSRAT 00000001 AMZN 00000001)
[ 0.007304] ACPI: HPET 0x0000000039354000 000038 (v01 AMAZON AMZNHPET 00000001 AMZN 00000001)
[ 0.007306] ACPI: SSDT 0x0000000039353000 000759 (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[ 0.007309] ACPI: SSDT 0x0000000039352000 00007F (v01 AMAZON AMZNSSDT 00000001 AMZN 00000001)
[ 0.007312] ACPI: BGRT 0x0000000039351000 000038 (v01 AMAZON AMAZON 00000002 01000013)
[ 0.007314] ACPI: Reserving FACP table memory at [mem 0x39355000-0x39355113]
[ 0.007315] memblock_reserve: [0x0000000039355000-0x0000000039355113] acpi_reserve_initial_tables+0x46/0x70
[ 0.007319] ACPI: Reserving DSDT table memory at [mem 0x39356000-0x39357159]
[ 0.007320] memblock_reserve: [0x0000000039356000-0x0000000039357159] acpi_reserve_initial_tables+0x46/0x70
[ 0.007323] ACPI: Reserving FACS table memory at [mem 0x393d0000-0x393d003f]
[ 0.007323] memblock_reserve: [0x00000000393d0000-0x00000000393d003f] acpi_reserve_initial_tables+0x46/0x70
[ 0.007325] ACPI: Reserving WAET table memory at [mem 0x3935b000-0x3935b027]
[ 0.007326] memblock_reserve: [0x000000003935b000-0x000000003935b027] acpi_reserve_initial_tables+0x46/0x70
[ 0.007328] ACPI: Reserving SLIT table memory at [mem 0x3935a000-0x3935a06b]
[ 0.007329] memblock_reserve: [0x000000003935a000-0x000000003935a06b] acpi_reserve_initial_tables+0x46/0x70
[ 0.007331] ACPI: Reserving APIC table memory at [mem 0x39359000-0x39359075]
[ 0.007332] memblock_reserve: [0x0000000039359000-0x0000000039359075] acpi_reserve_initial_tables+0x46/0x70
[ 0.007334] ACPI: Reserving SRAT table memory at [mem 0x39358000-0x3935809f]
[ 0.007335] memblock_reserve: [0x0000000039358000-0x000000003935809f] acpi_reserve_initial_tables+0x46/0x70
[ 0.007337] ACPI: Reserving HPET table memory at [mem 0x39354000-0x39354037]
[ 0.007338] memblock_reserve: [0x0000000039354000-0x0000000039354037] acpi_reserve_initial_tables+0x46/0x70
[ 0.007340] ACPI: Reserving SSDT table memory at [mem 0x39353000-0x39353758]
[ 0.007341] memblock_reserve: [0x0000000039353000-0x0000000039353758] acpi_reserve_initial_tables+0x46/0x70
[ 0.007343] ACPI: Reserving SSDT table memory at [mem 0x39352000-0x3935207e]
[ 0.007344] memblock_reserve: [0x0000000039352000-0x000000003935207e] acpi_reserve_initial_tables+0x46/0x70
[ 0.007346] ACPI: Reserving BGRT table memory at [mem 0x39351000-0x39351037]
[ 0.007346] memblock_reserve: [0x0000000039351000-0x0000000039351037] acpi_reserve_initial_tables+0x46/0x70
[ 0.007398] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[ 0.007407] memblock_alloc_try_nid: 1 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 numa_alloc_distance+0x14a/0x200
[ 0.007414] memblock_reserve: [0x000000003b2f6000-0x000000003b2f6000] memblock_alloc_range_nid+0xbf/0x180
[ 0.007421] NUMA: Initialized distance table, cnt=1
[ 0.007430] memblock_reserve: [0x000000003b2cb680-0x000000003b2f5fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007433] NODE_DATA(0) allocated [mem 0x3b2cb680-0x3b2f5fff]
[ 0.007452] MEMBLOCK configuration:
[ 0.007452] memory size = 0x000000003d00b000 reserved size = 0x0000000006f9bdd1
[ 0.007453] memory.cnt = 0x3
[ 0.007454] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes on node 0 flags: 0x0
[ 0.007456] memory[0x1] [0x0000000000100000-0x00000000390cdfff], 0x0000000038fce000 bytes on node 0 flags: 0x0
[ 0.007458] memory[0x2] [0x00000000393de000-0x000000003d37bfff], 0x0000000003f9e000 bytes on node 0 flags: 0x0
[ 0.007459] reserved.cnt = 0x18
[ 0.007460] reserved[0x0] [0x0000000000000000-0x0000000000000fff], 0x0000000000001000 bytes flags: 0x0
[ 0.007462] reserved[0x1] [0x0000000000001000-0x00000000000fffff], 0x00000000000ff000 bytes on node 0 flags: 0x0
[ 0.007464] reserved[0x2] [0x000000001a000000-0x000000001d40cfff], 0x000000000340d000 bytes on node 0 flags: 0x0
[ 0.007465] reserved[0x3] [0x000000003254f000-0x0000000033ac6fff], 0x0000000001578000 bytes on node 0 flags: 0x0
[ 0.007467] reserved[0x4] [0x00000000372be018-0x00000000372c6e57], 0x0000000000008e40 bytes on node 0 flags: 0x0
[ 0.007468] reserved[0x5] [0x0000000037a43a98-0x0000000037a43e37], 0x00000000000003a0 bytes on node 0 flags: 0x0
[ 0.007470] reserved[0x6] [0x0000000038e73000-0x00000000390cdfff], 0x000000000025b000 bytes on node 0 flags: 0x0
[ 0.007471] reserved[0x7] [0x0000000039350040-0x000000003935060f], 0x00000000000005d0 bytes on node 0 flags: 0x0
[ 0.007473] reserved[0x8] [0x0000000039351000-0x0000000039351037], 0x0000000000000038 bytes on node 0 flags: 0x0
[ 0.007474] reserved[0x9] [0x0000000039352000-0x000000003935207e], 0x000000000000007f bytes on node 0 flags: 0x0
[ 0.007475] reserved[0xa] [0x0000000039353000-0x0000000039353758], 0x0000000000000759 bytes on node 0 flags: 0x0
[ 0.007477] reserved[0xb] [0x0000000039354000-0x0000000039354037], 0x0000000000000038 bytes on node 0 flags: 0x0
[ 0.007478] reserved[0xc] [0x0000000039355000-0x0000000039355113], 0x0000000000000114 bytes on node 0 flags: 0x0
[ 0.007479] reserved[0xd] [0x0000000039356000-0x0000000039357159], 0x000000000000115a bytes on node 0 flags: 0x0
[ 0.007480] reserved[0xe] [0x0000000039358000-0x000000003935809f], 0x00000000000000a0 bytes on node 0 flags: 0x0
[ 0.007482] reserved[0xf] [0x0000000039359000-0x0000000039359075], 0x0000000000000076 bytes on node 0 flags: 0x0
[ 0.007483] reserved[0x10] [0x000000003935a000-0x000000003935a06b], 0x000000000000006c bytes on node 0 flags: 0x0
[ 0.007485] reserved[0x11] [0x000000003935b000-0x000000003935b027], 0x0000000000000028 bytes on node 0 flags: 0x0
[ 0.007487] reserved[0x12] [0x00000000393d0000-0x00000000393d003f], 0x0000000000000040 bytes on node 0 flags: 0x0
[ 0.007488] reserved[0x13] [0x00000000393de000-0x00000000393defff], 0x0000000000001000 bytes on node 0 flags: 0x0
[ 0.007490] reserved[0x14] [0x000000003b000000-0x000000003b1fffff], 0x0000000000200000 bytes on node 0 flags: 0x0
[ 0.007491] reserved[0x15] [0x000000003b2cb680-0x000000003b2f5fff], 0x000000000002a980 bytes flags: 0x0
[ 0.007492] reserved[0x16] [0x000000003b2f6000-0x000000003b2f6000], 0x0000000000000001 bytes on node 0 flags: 0x0
[ 0.007493] reserved[0x17] [0x000000003b2f7000-0x000000003d37bfff], 0x0000000002085000 bytes on node 0 flags: 0x0
[ 0.007652] memblock_alloc_try_nid: 16384 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 memblocks_present+0x1d1/0x210
[ 0.007655] memblock_reserve: [0x000000003b2c7680-0x000000003b2cb67f] memblock_alloc_range_nid+0xbf/0x180
[ 0.007659] memblock_alloc_try_nid: 4096 bytes align=0x40 nid=0 from=0x0000000000000000 max_addr=0x0000000000000000 sparse_index_alloc+0x44/0x70
[ 0.007663] memblock_reserve: [0x000000003b2c6680-0x000000003b2c767f] memblock_alloc_range_nid+0xbf/0x180
[ 0.007667] memblock_alloc_try_nid: 448 bytes align=0x40 nid=0 from=0x0000000038000000 max_addr=0x0000000040000000 sparse_init_nid+0x9b/0x4e0
[ 0.007669] memblock_reserve: [0x000000003b2f6e40-0x000000003b2f6fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.007672] memblock_alloc_exact_nid_raw: 16777216 bytes align=0x200000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 memmap_alloc+0x1f/0x60
[ 0.007675] memblock_reserve: [0x000000003a000000-0x000000003affffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.009338] memblock_alloc_try_nid_raw: 4096 bytes align=0x1000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 vmemmap_alloc_block_zero.constprop.0+0x11/0x50
[ 0.009345] memblock_reserve: [0x000000003b2c5000-0x000000003b2c5fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.009351] memblock_alloc_try_nid_raw: 4096 bytes align=0x1000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 vmemmap_alloc_block_zero.constprop.0+0x11/0x50
[ 0.009354] memblock_reserve: [0x000000003b2c4000-0x000000003b2c4fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.009360] Zone ranges:
[ 0.009361] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.009363] DMA32 [mem 0x0000000001000000-0x000000003d37bfff]
[ 0.009364] Normal empty
[ 0.009365] Device empty
[ 0.009366] Movable zone start for each node
[ 0.009369] Early memory node ranges
[ 0.009369] node 0: [mem 0x0000000000001000-0x000000000009ffff]
[ 0.009371] node 0: [mem 0x0000000000100000-0x00000000390cdfff]
[ 0.009372] node 0: [mem 0x00000000393de000-0x000000003d37bfff]
[ 0.009373] Initmem setup node 0 [mem 0x0000000000001000-0x000000003d37bfff]
[ 0.009378] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.009406] On node 0, zone DMA: 96 pages in unavailable ranges
[ 0.009905] On node 0, zone DMA32: 784 pages in unavailable ranges
[ 0.009994] On node 0, zone DMA32: 11396 pages in unavailable ranges
[ 0.010412] ACPI: PM-Timer IO Port: 0xb008
[ 0.010424] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.010480] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[ 0.010482] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.010484] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.010486] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.010487] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.010490] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.010491] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.010492] memblock_alloc_try_nid: 73 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 acpi_parse_hpet+0x91/0x150
[ 0.010498] memblock_reserve: [0x000000003b2f6dc0-0x000000003b2f6e08] memblock_alloc_range_nid+0xbf/0x180
[ 0.010503] TSC deadline timer available
[ 0.010508] CPU topo: Max. logical packages: 1
[ 0.010509] CPU topo: Max. logical dies: 1
[ 0.010509] CPU topo: Max. dies per package: 1
[ 0.010514] CPU topo: Max. threads per core: 2
[ 0.010515] CPU topo: Num. cores per package: 1
[ 0.010516] CPU topo: Num. threads per package: 2
[ 0.010516] CPU topo: Allowing 2 present CPUs plus 0 hotplug CPUs
[ 0.010518] memblock_alloc_try_nid: 75 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 io_apic_init_mappings+0x41/0x1e0
[ 0.010522] memblock_reserve: [0x000000003b2f6d40-0x000000003b2f6d8a] memblock_alloc_range_nid+0xbf/0x180
[ 0.010539] kvm-guest: APIC: eoi() replaced with kvm_guest_apic_eoi_write()
[ 0.010547] memblock_alloc_try_nid: 640 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 e820__reserve_resources+0x2e/0x1f0
[ 0.010552] memblock_reserve: [0x000000003b2f6ac0-0x000000003b2f6d3f] memblock_alloc_range_nid+0xbf/0x180
[ 0.010556] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010560] memblock_reserve: [0x000000003b2f6a40-0x000000003b2f6aa7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010563] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010566] memblock_reserve: [0x000000003b2f69c0-0x000000003b2f6a27] memblock_alloc_range_nid+0xbf/0x180
[ 0.010569] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010572] memblock_reserve: [0x000000003b2f6940-0x000000003b2f69a7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010574] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010577] memblock_reserve: [0x000000003b2f68c0-0x000000003b2f6927] memblock_alloc_range_nid+0xbf/0x180
[ 0.010580] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010583] memblock_reserve: [0x000000003b2f6840-0x000000003b2f68a7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010585] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010588] memblock_reserve: [0x000000003b2f67c0-0x000000003b2f6827] memblock_alloc_range_nid+0xbf/0x180
[ 0.010591] memblock_alloc_try_nid: 104 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 firmware_map_add_early+0x2c/0x60
[ 0.010593] memblock_reserve: [0x000000003b2f6740-0x000000003b2f67a7] memblock_alloc_range_nid+0xbf/0x180
[ 0.010596] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 register_nosave_region+0x53/0xe0
[ 0.010601] memblock_reserve: [0x000000003b2f6700-0x000000003b2f671f] memblock_alloc_range_nid+0xbf/0x180
[ 0.010604] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.010605] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 register_nosave_region+0x53/0xe0
[ 0.010608] memblock_reserve: [0x000000003b2f66c0-0x000000003b2f66df] memblock_alloc_range_nid+0xbf/0x180
[ 0.010611] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.010611] memblock_alloc_try_nid: 32 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 register_nosave_region+0x53/0xe0
[ 0.010615] memblock_reserve: [0x000000003b2f6680-0x000000003b2f669f] memblock_alloc_range_nid+0xbf/0x180
[ 0.010617] PM: hibernation: Registered nosave memory: [mem 0x390ce000-0x393ddfff]
[ 0.010619] [mem 0x3d400000-0xffffffff] available for PCI devices
[ 0.010621] Booting paravirtualized kernel on KVM
[ 0.010623] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.015925] memblock_alloc_try_nid: 265 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 setup_command_line+0x71/0x230
[ 0.015933] memblock_reserve: [0x000000003b2f6540-0x000000003b2f6648] memblock_alloc_range_nid+0xbf/0x180
[ 0.015937] memblock_alloc_try_nid: 265 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 setup_command_line+0xa4/0x230
[ 0.015940] memblock_reserve: [0x000000003b2f6400-0x000000003b2f6508] memblock_alloc_range_nid+0xbf/0x180
[ 0.015944] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[ 0.015954] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_build_alloc_info+0x2e1/0x5c0
[ 0.015958] memblock_reserve: [0x000000003b2c3000-0x000000003b2c3fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.015961] memblock_alloc_try_nid: 4096 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_embed_first_chunk+0x7a/0x580
[ 0.015964] memblock_reserve: [0x000000003b2c2000-0x000000003b2c2fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.015968] memblock_alloc_try_nid: 2097152 bytes align=0x200000 nid=0 from=0x0000000001000000 max_addr=0x0000000000000000 pcpu_fc_alloc+0x101/0x170
[ 0.015971] memblock_reserve: [0x0000000039e00000-0x0000000039ffffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.016206] memblock_phys_free: [0x0000000039e41000-0x0000000039efffff] pcpu_embed_first_chunk+0x1e6/0x580
[ 0.016229] memblock_phys_free: [0x0000000039f41000-0x0000000039ffffff] pcpu_embed_first_chunk+0x1e6/0x580
[ 0.016231] percpu: Embedded 65 pages/cpu s229376 r8192 d28672 u1048576
[ 0.016233] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0xc3/0x900
[ 0.016236] memblock_reserve: [0x000000003b2f63c0-0x000000003b2f63c7] memblock_alloc_range_nid+0xbf/0x180
[ 0.016239] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0xee/0x900
[ 0.016242] memblock_reserve: [0x000000003b2f6380-0x000000003b2f6387] memblock_alloc_range_nid+0xbf/0x180
[ 0.016244] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0x120/0x900
[ 0.016247] memblock_reserve: [0x000000003b2f6340-0x000000003b2f6347] memblock_alloc_range_nid+0xbf/0x180
[ 0.016250] memblock_alloc_try_nid: 16 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0x156/0x900
[ 0.016253] memblock_reserve: [0x000000003b2f6300-0x000000003b2f630f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016256] pcpu-alloc: s229376 r8192 d28672 u1048576 alloc=1*2097152
[ 0.016258] pcpu-alloc: [0] 0 1
[ 0.016261] memblock_alloc_try_nid: 352 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_setup_first_chunk+0x707/0x900
[ 0.016264] memblock_reserve: [0x000000003b2f6180-0x000000003b2f62df] memblock_alloc_range_nid+0xbf/0x180
[ 0.016266] memblock_alloc_try_nid: 264 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x76/0x2f0
[ 0.016269] memblock_reserve: [0x000000003b2f6040-0x000000003b2f6147] memblock_alloc_range_nid+0xbf/0x180
[ 0.016272] memblock_alloc_try_nid: 256 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0xee/0x2f0
[ 0.016275] memblock_reserve: [0x000000003b2c6580-0x000000003b2c667f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016278] memblock_alloc_try_nid: 264 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x129/0x2f0
[ 0.016280] memblock_reserve: [0x000000003b2c6440-0x000000003b2c6547] memblock_alloc_range_nid+0xbf/0x180
[ 0.016283] memblock_alloc_try_nid: 64 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x15a/0x2f0
[ 0.016286] memblock_reserve: [0x000000003b2c6400-0x000000003b2c643f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016289] memblock_alloc_try_nid: 264 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x76/0x2f0
[ 0.016291] memblock_reserve: [0x000000003b2c62c0-0x000000003b2c63c7] memblock_alloc_range_nid+0xbf/0x180
[ 0.016294] memblock_alloc_try_nid: 896 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0xee/0x2f0
[ 0.016297] memblock_reserve: [0x000000003b2c1c80-0x000000003b2c1fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.016300] memblock_alloc_try_nid: 904 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x129/0x2f0
[ 0.016303] memblock_reserve: [0x000000003b2c18c0-0x000000003b2c1c47] memblock_alloc_range_nid+0xbf/0x180
[ 0.016305] memblock_alloc_try_nid: 224 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 pcpu_alloc_first_chunk+0x15a/0x2f0
[ 0.016308] memblock_reserve: [0x000000003b2c61c0-0x000000003b2c629f] memblock_alloc_range_nid+0xbf/0x180
[ 0.016311] memblock_phys_free: [0x000000003b2c3000-0x000000003b2c3fff] pcpu_embed_first_chunk+0x26c/0x580
[ 0.016314] memblock_phys_free: [0x000000003b2c2000-0x000000003b2c2fff] pcpu_embed_first_chunk+0x279/0x580
[ 0.016317] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_bootmem_cpumask_var+0x2a/0x60
[ 0.016320] memblock_reserve: [0x000000003b2c6180-0x000000003b2c6187] memblock_alloc_range_nid+0xbf/0x180
[ 0.016323] memblock_alloc_try_nid: 8 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_bootmem_cpumask_var+0x2a/0x60
[ 0.016325] memblock_reserve: [0x000000003b2c6140-0x000000003b2c6147] memblock_alloc_range_nid+0xbf/0x180
[ 0.016996] kvm-guest: PV spinlocks enabled
[ 0.016997] memblock_alloc_try_nid: 4096 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_large_system_hash+0x232/0x2d0
[ 0.017000] memblock_reserve: [0x000000003b2c3000-0x000000003b2c3fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017004] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.017006] Kernel command line: BOOT_IMAGE=(hd0,gpt1)/boot/vmlinuz-6.12.68-93.123.amzn2023.x86_64 root=UUID=7813e2d4-9cdc-416a-a749-25de8a9f36d0 ro console=tty0 console=ttyS0,115200n8 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 selinux=1 security=selinux quiet memblock=debug
[ 0.017133] random: crng init done
[ 0.017136] memblock_alloc_try_nid: 1048576 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_large_system_hash+0x232/0x2d0
[ 0.017139] memblock_reserve: [0x0000000039d00000-0x0000000039dfffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017255] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 0.017257] memblock_alloc_try_nid: 524288 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_large_system_hash+0x232/0x2d0
[ 0.017261] memblock_reserve: [0x000000003b2418c0-0x000000003b2c18bf] memblock_alloc_range_nid+0xbf/0x180
[ 0.017314] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 0.017318] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017323] memblock_reserve: [0x000000003b2c2000-0x000000003b2c2fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017326] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017328] memblock_reserve: [0x000000003b240000-0x000000003b240fff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017331] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017334] memblock_reserve: [0x000000003b23f000-0x000000003b23ffff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017347] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017350] memblock_reserve: [0x000000003b23e000-0x000000003b23efff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017353] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017356] memblock_reserve: [0x000000003b23d000-0x000000003b23dfff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017361] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 spp_getpage+0x51/0xa0
[ 0.017364] memblock_reserve: [0x000000003b23c000-0x000000003b23cfff] memblock_alloc_range_nid+0xbf/0x180
[ 0.017388] Fallback order for Node 0: 0
[ 0.017391] Built 1 zonelists, mobility grouping on. Total pages: 249867
[ 0.017392] Policy zone: DMA32
[ 0.017393] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.018640] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.018654] Kernel/User page tables isolation: enabled
[ 0.018679] ftrace: allocating 46825 entries in 183 pages
[ 0.027381] ftrace: allocated 183 pages with 6 groups
[ 0.028278] Dynamic Preempt: none
[ 0.028324] rcu: Preemptible hierarchical RCU implementation.
[ 0.028325] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2.
[ 0.028327] Trampoline variant of Tasks RCU enabled.
[ 0.028328] Rude variant of Tasks RCU enabled.
[ 0.028328] Tracing variant of Tasks RCU enabled.
[ 0.028329] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.028330] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.028336] RCU Tasks: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.028338] RCU Tasks Rude: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.028339] RCU Tasks Trace: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.032427] NR_IRQS: 524544, nr_irqs: 440, preallocated irqs: 16
[ 0.032664] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.032720] Console: colour dummy device 80x25
[ 0.032722] printk: legacy console [tty0] enabled
[ 0.032818] printk: legacy console [ttyS0] enabled
[ 0.032884] ACPI: Core revision 20240827
[ 0.033070] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns
[ 0.033089] APIC: Switch to symmetric I/O mode setup
[ 0.033518] x2apic enabled
[ 0.033960] APIC: Switched APIC routing to: physical x2apic
[ 0.035630] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x240933eba6e, max_idle_ns: 440795246008 ns
[ 0.035635] Calibrating delay loop (skipped) preset value.. 4999.98 BogoMIPS (lpj=24999940)
[ 0.036010] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 0.036015] Last level dTLB entries: 4KB 64, 2MB 32, 4MB 32, 1GB 4
[ 0.036019] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.036022] Spectre V2 : Mitigation: Retpolines
[ 0.036023] Spectre V2 : Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT
[ 0.036024] RETBleed: WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!
[ 0.045633] RETBleed: Vulnerable
[ 0.045633] Speculative Store Bypass: Vulnerable
[ 0.045633] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.045633] MMIO Stale Data: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.045633] GDS: Unknown: Dependent on hypervisor status
[ 0.045633] active return thunk: its_return_thunk
[ 0.045633] ITS: Mitigation: Aligned branch/return thunks
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.045633] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.045633] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.045633] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.045633] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.045633] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.045633] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.045633] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.045633] x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]: 8
[ 0.045633] x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
[ 0.045633] Freeing SMP alternatives memory: 36K
[ 0.045633] pid_max: default: 32768 minimum: 301
[ 0.045633] memblock_free_late: [0x000000003d36b000-0x000000003d37bfff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b336000-0x000000003d36afff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b317000-0x000000003b335fff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b2f7000-0x000000003b316fff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x000000003b000000-0x000000003b1fffff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x00000000393de000-0x00000000393defff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] memblock_free_late: [0x0000000038e73000-0x00000000390cdfff] efi_free_boot_services+0x11f/0x2e0
[ 0.045633] LSM: initializing lsm=lockdown,capability,landlock,yama,safesetid,selinux,bpf,ima
[ 0.045633] landlock: Up and running.
[ 0.045633] Yama: becoming mindful.
[ 0.045633] SELinux: Initializing.
[ 0.045633] LSM support for eBPF active
[ 0.045633] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.045633] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.045633] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
[ 0.045633] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
[ 0.045633] signal: max sigframe size: 3632
[ 0.045633] rcu: Hierarchical SRCU implementation.
[ 0.045633] rcu: Max phase no-delay instances is 1000.
[ 0.045633] Timer migration: 1 hierarchy levels; 8 children per group; 1 crossnode level
[ 0.045633] smp: Bringing up secondary CPUs ...
[ 0.045633] smpboot: x86: Booting SMP configuration:
[ 0.045633] .... node #0, CPUs: #1
[ 0.045633] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 0.045633] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
[ 0.045633] smp: Brought up 1 node, 2 CPUs
[ 0.045633] smpboot: Total of 2 processors activated (9999.97 BogoMIPS)
[ 0.045633] node 0 deferred pages initialised in 0ms
[ 0.045633] Memory: 900460K/999468K available (16384K kernel code, 9440K rwdata, 11364K rodata, 3740K init, 6440K bss, 94600K reserved, 0K cma-reserved)
[ 0.045633] devtmpfs: initialized
[ 0.045633] x86/mm: Memory block size: 128MB
[ 0.045633] ------------[ cut here ]------------
[ 0.045633] page type is 1, passed migratetype is 0 (nr=16)
[ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:721 rmqueue_bulk+0x82e/0x880
[ 0.045633] Modules linked in:
[ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.12.68-93.123.amzn2023.x86_64 #1
[ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.045633] RIP: 0010:rmqueue_bulk+0x82e/0x880
[ 0.045633] Code: c6 05 be be 13 02 01 e8 b0 b5 ff ff 44 89 e9 8b 14 24 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 32 4f d2 ff <0f> 0b 4c 8b 44 24 48 e9 79 fc ff ff 48 c7 c6 e0 77 51 8e 4c 89 e7
[ 0.045633] RSP: 0000:ffffd592c002f898 EFLAGS: 00010086
[ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX: ffffffff8f1f0c68
[ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000001
[ 0.045633] RBP: fffffb9c40e3a408 R08: 0000000000000000 R09: ffffd592c002f740
[ 0.045633] R10: ffffd592c002f738 R11: ffffffff8f370ca8 R12: fffffb9c40e3a400
[ 0.045633] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000038e90
[ 0.045633] FS: 0000000000000000(0000) GS:ffff8e3639f00000(0000) knlGS:0000000000000000
[ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4: 00000000007706f0
[ 0.045633] PKRU: 55555554
[ 0.045633] Call Trace:
[ 0.045633] <TASK>
[ 0.045633] __rmqueue_pcplist+0x233/0x2c0
[ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
[ 0.045633] ? _raw_spin_unlock+0xa/0x30
[ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] get_page_from_freelist+0x16e/0x5f0
[ 0.045633] __alloc_pages_noprof+0x18a/0x350
[ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
[ 0.045633] ? shuffle_freelist+0x126/0x1b0
[ 0.045633] allocate_slab+0x2b3/0x410
[ 0.045633] ___slab_alloc+0x396/0x830
[ 0.045633] ? switch_hrtimer_base+0x8e/0x190
[ 0.045633] ? timerqueue_add+0x9b/0xc0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] ? start_dl_timer+0xb0/0x140
[ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] dup_task_struct+0x2d/0x1b0
[ 0.045633] copy_process+0x195/0x17e0
[ 0.045633] kernel_clone+0x9a/0x3b0
[ 0.045633] ? psi_task_switch+0x105/0x290
[ 0.045633] kernel_thread+0x6b/0x90
[ 0.045633] ? __pfx_kthread+0x10/0x10
[ 0.045633] kthreadd+0x276/0x2d0
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork+0x30/0x50
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork_asm+0x1a/0x30
[ 0.045633] </TASK>
[ 0.045633] ---[ end trace 0000000000000000 ]---
[ 0.045633] ------------[ cut here ]------------
[ 0.045633] page type is 1, passed migratetype is 0 (nr=8)
[ 0.045633] WARNING: CPU: 1 PID: 2 at mm/page_alloc.c:686 expand+0x1af/0x1e0
[ 0.045633] Modules linked in:
[ 0.045633] CPU: 1 UID: 0 PID: 2 Comm: kthreadd Tainted: G W 6.12.68-93.123.amzn2023.x86_64 #1
[ 0.045633] Tainted: [W]=WARN
[ 0.045633] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
[ 0.045633] RIP: 0010:expand+0x1af/0x1e0
[ 0.045633] Code: c6 05 af 06 14 02 01 e8 9f fd ff ff 89 e9 8b 54 24 34 48 c7 c7 a8 6d 51 8e 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 21 97 d2 ff <0f> 0b e9 e5 fe ff ff 48 c7 c6 e0 6d 51 8e 4c 89 ff e8 eb 23 fc ff
[ 0.045633] RSP: 0000:ffffd592c002f828 EFLAGS: 00010082
[ 0.045633] RAX: 0000000000000000 RBX: ffff8e363b2cbc80 RCX: ffffffff8f1f0c68
[ 0.045633] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: 0000000000000001
[ 0.045633] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffd592c002f6d0
[ 0.045633] R10: ffffd592c002f6c8 R11: ffffffff8f370ca8 R12: 0000000000000008
[ 0.045633] R13: 0000000000038e98 R14: 0000000000000003 R15: fffffb9c40e3a600
[ 0.045633] FS: 0000000000000000(0000) GS:ffff8e3639f00000(0000) knlGS:0000000000000000
[ 0.045633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045633] CR2: 0000000000000000 CR3: 000000001bc34001 CR4: 00000000007706f0
[ 0.045633] PKRU: 55555554
[ 0.045633] Call Trace:
[ 0.045633] <TASK>
[ 0.045633] rmqueue_bulk+0x541/0x880
[ 0.045633] __rmqueue_pcplist+0x233/0x2c0
[ 0.045633] rmqueue.constprop.0+0x4b6/0xe80
[ 0.045633] ? _raw_spin_unlock+0xa/0x30
[ 0.045633] ? rmqueue.constprop.0+0x557/0xe80
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] get_page_from_freelist+0x16e/0x5f0
[ 0.045633] __alloc_pages_noprof+0x18a/0x350
[ 0.045633] alloc_pages_mpol_noprof+0xf2/0x1e0
[ 0.045633] ? shuffle_freelist+0x126/0x1b0
[ 0.045633] allocate_slab+0x2b3/0x410
[ 0.045633] ___slab_alloc+0x396/0x830
[ 0.045633] ? switch_hrtimer_base+0x8e/0x190
[ 0.045633] ? timerqueue_add+0x9b/0xc0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] ? _raw_spin_unlock_irqrestore+0xa/0x30
[ 0.045633] ? start_dl_timer+0xb0/0x140
[ 0.045633] kmem_cache_alloc_node_noprof+0x271/0x2e0
[ 0.045633] ? dup_task_struct+0x2d/0x1b0
[ 0.045633] dup_task_struct+0x2d/0x1b0
[ 0.045633] copy_process+0x195/0x17e0
[ 0.045633] kernel_clone+0x9a/0x3b0
[ 0.045633] ? psi_task_switch+0x105/0x290
[ 0.045633] kernel_thread+0x6b/0x90
[ 0.045633] ? __pfx_kthread+0x10/0x10
[ 0.045633] kthreadd+0x276/0x2d0
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork+0x30/0x50
[ 0.045633] ? __pfx_kthreadd+0x10/0x10
[ 0.045633] ret_from_fork_asm+0x1a/0x30
[ 0.045633] </TASK>
[ 0.045633] ---[ end trace 0000000000000000 ]---
[ 0.045633] ACPI: PM: Registering ACPI NVS region [mem 0x3935e000-0x393ddfff] (524288 bytes)
[ 0.045633] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.045633] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 0.045633] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.045633] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[ 0.045633] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 0.045633] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-16 5:34 ` Benjamin Herrenschmidt
@ 2026-02-16 6:51 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-16 6:51 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Mon, 2026-02-16 at 16:34 +1100, Benjamin Herrenschmidt wrote:
>
> Here's a log on a t3 instance with your patch plus some printk's of
> mine showing the EFI memory reserves and with memblock debug.
Happens also on a qemu VM with 1GB of RAM, so .. :-) There's about 25MB
of RAM lost to UEFI on this with whatever version of qemu I was playing
with that the patch recovers.
It doesn't matter which "method" of freeing the page, the original one
or your free_reserved_page().
I'll switch to upstream to avoid confusion and test it all again.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-06 10:33 ` Mike Rapoport
2026-02-10 1:04 ` Benjamin Herrenschmidt
@ 2026-02-16 10:36 ` Alexander Potapenko
1 sibling, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2026-02-16 10:36 UTC (permalink / raw)
To: Mike Rapoport
Cc: Benjamin Herrenschmidt, linux-mm, Marco Elver, Dmitry Vyukov
On Fri, Feb 6, 2026 at 11:33 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> (added KMSAN folks)
>
> On Wed, Feb 04, 2026 at 08:02:13PM +1100, Benjamin Herrenschmidt wrote:
> > On Wed, 2026-02-04 at 09:39 +0200, Mike Rapoport wrote:
> > > > I might be missing something but I don't see what would restrict
> > > > this
> > > > to the early pre-initialized struct pages other than that
> > > > early_page_initialised() test, so we can't rely on anything in
> > > > struct
> > > > page inside memblock_free_pages().
> > >
> > > Right, we can't rely on PG_Reserved being cleared for uninitialized
> > > pages :/
> > >
> > > But I overlooked an easier and actually reliable way: use
> > > free_reserved_area() instead of memblock_free_late().
> >
> > You mean replace all callers of memblock_free_late() and kill it ?
>
> That would be great, but with all the subtle differences you note below
> it's for the future :)
>
> > Or make memblock_free_late() use free_reserved_area() instead of
> > memblock_free_pages() ? :-)
>
> Yes, I think either calling free_reserved_page() in the loop in
> memblock_free_late() or replacing the entire loop with free_reserved_area().
>
> > The former misses:
> > - totalram_pages_inc() and kmemleak_free_part_phys() in
> > memblock_free_late()
> >
> > They also both miss as far as I can tell:
> >
> > if (!kmsan_memblock_free_pages(page, order)) {
> > /* KMSAN will take care of these pages. */
> > return;
> > }
> >
> > But I don't know if that matters, I don't know anything about kmsan :-)
>
> AFAIU, here kmsan allocates metadata for each page freed to buddy, but it
> handles reserved memory differently anyway, so it shouldn't be a problem.
I am a bit late here, sorry.
Yes, kmsan_init_shadow() iterates over the reserved ranges (which I
believe happens before memblock_free_late(), right?) and reserves the
metadata pages for them. So when freeing these ranges we won't need to
call kmsan_memblock_free_pages().
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] mm: Fix memblock_free_late() when using deferred struct page
2026-02-16 4:53 ` Benjamin Herrenschmidt
@ 2026-02-16 15:28 ` Mike Rapoport
0 siblings, 0 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-16 15:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm, Alexander Potapenko, Marco Elver, Dmitry Vyukov
On Mon, Feb 16, 2026 at 03:53:33PM +1100, Benjamin Herrenschmidt wrote:
> (stripping history)
>
> So I went into a big refresher (or learning exercise since there's
> quite a bit here that I never really looked at before either).
>
> So here is a break down, in chronological order, of the setup and
> initialization of the memory map, and how the reserve business
> interacts with it as I understand it from reading the code.
>
> Please correct me if I missed or misunderstood something :-)
Your description is correct. There are some things that moved from arch to
mm_core_init() just recently, but the order remains the same.
> Also maybe this is worth turning into a piece of doc ?
Care to send a patch? ;-)
> Then some conclusions (I think I know why the patches crashed).
(trimmed down the description)
> * One thing I have NOT yet figured out ... do we have a problem if the
> page is in a hole that lands outside of a zone boundary ? I haven't
> really got my head deep down into the details of zone initializations
> (especially as we adjust the boundaries here or there), so this could
> be a problem.
Zone boundaries are determined by addressing constraints (DMA, DMA32,
NORMAL), node assignment and actual usable memory in memblock.memory.
For a machine like t3a.nano there will be ZONE_DMA up to 16M and ZONE_DMA32
from 16M till the end of the usable memory so there should be no problem
there :)
In the general case, I believe it is possible that some reserved memory
will not be covered by a zone, for example if the reserved memory is beyond
the end of the last zone.
I think that this is a really pathological case and we can dismiss it for
now.
> 99) Conclusion :-)
> ------------------
>
> Nothing firm yet here but a few hints at what could possibly go wrong
> and one obvious issue with the previous patch(es).
>
> First the obvious ... the proposed patch that just makes
> memblock_free_late() call free_reserved_page() is missing a call to
> pfn_valid(). Without this, it can (and will) hit holes in the mem_map,
> and that's probably one of the crashes I reported.
It can, not sure it will, but we want to stay on the safe side :)
> Now, it would be nice to then go allocate those missing bits of
> mem_map, because I really don't want to give up on that memory. Small
> instances are a thing and with the current price of DRAM, a fairly
> relevant one :-) But I'll look at that later.
>
> My original patch had the exact same issue btw.
>
> The other potential issue, for which I welcome your input as I'm
> running short on time for the day is ... the impact to zones. I see a
> possibility for those pages to be outside of any zone's
> zone_start_pfn/spanned_pages range ... or not ? As I said, I didn't get
> my head yet around the zones init and spanning adjustments that
> happens, so I don't know if we really have potentially "holes" here or
> not.
>
> This leads to the question... could we work around a lot of those
> issues easily by making the early efi_reserve_boot_services() *also*
> add the regions to memblock.memory in addition to memblock.reserve ?
> ie, those regions are marked as boot services code/data, so they must
> be memory to begin with, and that's all early enough that we can do it.
Adding the EFI boot services to memblock.memory would certainly solve both
potentially missing memory map and (unlikely) missing zone span.
More broadly, it would have been nice if e820/efi would add *anything* that
lives in DRAM to memblock.memory, but last time I tried I could not
convince x86 folks that even memory that's unusable to kernel is still
memory :)
> We should still add the missing pfn_valid() of course, if anything for
> the sake of any other caller of memblock_free_late() ... or we could
> change memblock_free_late() to only consider ranges that are both
> reserved *and* in memblock.memory. You mentioned that might be slow
> though.
memblock_free_late() can't consider memblock.memory and memblock.reserved
because it might be called after they were discarded.
And pfn_valid() should protect against accesses to non-existent memory map.
> Opinions ?
>
> Cheers,
> Ben.
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-03 8:02 [PATCH] mm: Fix memblock_free_late() when using deferred struct page Benjamin Herrenschmidt
2026-02-03 18:40 ` Mike Rapoport
@ 2026-02-17 8:28 ` Benjamin Herrenschmidt
2026-02-17 12:32 ` Mike Rapoport
2026-02-17 21:47 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-17 8:28 UTC (permalink / raw)
To: linux-mm; +Cc: Mike Rapoport
We have two issues:
- One is we don't check for pfn_valid(). If this is called for
a page corresponding to a big enough memory hole that we don't have
allocated a corresponding sparsemem section for it, it will crash.
- Then, when using deferred struct page init, we can end up not
freeing the pages at all. This happens routinely with some of the
UEFI Boot Services memory, as soon as they fall above the threshold
of pages whose initialization is deferred.
We can very easily hit the !early_page_initialised() test in
memblock_free_pages() since the deferred initializer hasn't even
started yet. As a result we drop the pages on the floor.
Now, memblock_free_late() should only ever be called for pages that
are reserved, and thus for which the struct page has already been
initialized by memmap_init_reserved_pages().... as long as we check
for pfn_valid() as a big enough hole might cause entire sections of
the mem_map to not be allocated at all.
So it should be safe to just free them normally and ignore the deferred
initializer, which will skip over them as it skips over anything still
in the memblock reserved list.
This helps recover something like 140MB of RAM on EC2 t3a.nano instances
who only have 512MB to begin with (as to why UEFI uses that much, that's
a question for another day).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2. Reworked a bit to add the pfn_valid() check, remove the bogus memblock
access in debug mode, and add a test of PageReserved() for sanity.
We could separately do a patch forcing UEFI Boot Services into
memblock.memory but so far I haven't hit a case where that is necessary.
mm/memblock.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 905d06b16348a..71eb25b68851e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1770,9 +1770,14 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
cursor = PFN_UP(base);
end = PFN_DOWN(base + size);
+ /* Only free pages that were reserved */
for (; cursor < end; cursor++) {
- memblock_free_pages(pfn_to_page(cursor), cursor, 0);
- totalram_pages_inc();
+ struct page *p;
+ if (!pfn_valid(cursor))
+ continue;
+ p = pfn_to_page(cursor);
+ if (!WARN_ON(!PageReserved(p)))
+ free_reserved_page(pfn_to_page(cursor));
}
}
--
2.43.0
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-17 8:28 ` [PATCH v2] " Benjamin Herrenschmidt
@ 2026-02-17 12:32 ` Mike Rapoport
2026-02-17 22:00 ` Benjamin Herrenschmidt
2026-02-17 21:47 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 33+ messages in thread
From: Mike Rapoport @ 2026-02-17 12:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
On Tue, Feb 17, 2026 at 07:28:12PM +1100, Benjamin Herrenschmidt wrote:
> We have two issues:
>
> - One is we don't check for pfn_valid(). If this is called for
> a page corresponding to a big enough memory hole that we don't have
> allocated a corresponding sparsemem section for it, it will crash.
>
> - Then, when using deferred struct page init, we can end up not
> freeing the pages at all. This happens routinely with some of the
> UEFI Boot Services memory, as soon as they fall above the threshold
> of pages whose initialization is deferred.
>
> We can very easily hit the !early_page_initialised() test in
> memblock_free_pages() since the deferred initializer hasn't even
> started yet. As a result we drop the pages on the floor.
>
> Now, memblock_free_late() should only ever be called for pages that
> are reserved, and thus for which the struct page has already been
> initialized by memmap_init_reserved_pages().... as long as we check
> for pfn_valid() as a big enough hole might cause entire sections of
> the mem_map to not be allocated at all.
>
> So it should be safe to just free them normally and ignore the deferred
> initializer, which will skip over them as it skips over anything still
> in the memblock reserved list.
>
> This helps recover something like 140MB of RAM on EC2 t3a.nano instances
> who only have 512MB to begin with (as to why UEFI uses that much, that's
> a question for another day).
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> v2. Reworked a bit to add the pfn_valid() check, remove the bogus memblock
> access in debug mode, and add a test of PageReserved() for sanity.
>
> We could separately do a patch forcing UEFI Boot Services into
> memblock.memory but so far I haven't hit a case where that is necessary.
>
> mm/memblock.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 905d06b16348a..71eb25b68851e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1770,9 +1770,14 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
> cursor = PFN_UP(base);
> end = PFN_DOWN(base + size);
>
> + /* Only free pages that were reserved */
> for (; cursor < end; cursor++) {
> - memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> - totalram_pages_inc();
> + struct page *p;
> + if (!pfn_valid(cursor))
> + continue;
> + p = pfn_to_page(cursor);
> + if (!WARN_ON(!PageReserved(p)))
Took me a second with the double negation. I like
if (WARN_ON(!PageReserved(p)))
continue;
more.
> + free_reserved_page(pfn_to_page(cursor));
We already have page here, no need to pfn_to_page() again :)
I can fix those up when applying.
> }
> }
>
> --
> 2.43.0
>
>
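For readers following along, the loop with both review fixups applied can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct mock_page, mock_map, mock_free_late() and nr_warnings are hypothetical stand-ins invented for illustration, the "valid" and "reserved" flags merely model pfn_valid() and PageReserved(), and nothing here is the code that was applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the two bits the loop tests. */
struct mock_page {
	bool valid;	/* models pfn_valid(): a memory map entry exists */
	bool reserved;	/* models PageReserved() */
};

#define NR_MOCK_PAGES 6

static struct mock_page mock_map[NR_MOCK_PAGES] = {
	{ true,  true  },	/* pfn 0: reserved, freed */
	{ true,  true  },	/* pfn 1: reserved, freed */
	{ false, false },	/* pfn 2: hole, no memory map entry */
	{ true,  false },	/* pfn 3: not reserved, warn and skip */
	{ true,  true  },	/* pfn 4: reserved, freed */
	{ false, false },	/* pfn 5: hole, no memory map entry */
};

/* Counts how often the (modelled) WARN_ON() would have fired. */
static unsigned int nr_warnings;

/* Walks [start, end) and returns the number of pages "freed". */
static unsigned int mock_free_late(unsigned long start, unsigned long end)
{
	unsigned int freed = 0;
	unsigned long cursor;

	assert(start <= end && end <= NR_MOCK_PAGES);

	for (cursor = start; cursor < end; cursor++) {
		struct mock_page *p;

		/* Check validity first: never look at a page in a hole. */
		if (!mock_map[cursor].valid)
			continue;

		p = &mock_map[cursor];

		/* Early continue instead of the double negation. */
		if (!p->reserved) {
			nr_warnings++;
			continue;
		}

		/* Reuse p rather than looking the page up a second time. */
		p->reserved = false;
		freed++;
	}

	return freed;
}
```

Calling mock_free_late(0, NR_MOCK_PAGES) on this table frees three pages and records one warning. Freeing clears the modelled reserved bit, so a second pass over the same range frees nothing and warns for each valid page.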
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-17 8:28 ` [PATCH v2] " Benjamin Herrenschmidt
2026-02-17 12:32 ` Mike Rapoport
@ 2026-02-17 21:47 ` Benjamin Herrenschmidt
2026-02-18 0:15 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-17 21:47 UTC (permalink / raw)
To: linux-mm; +Cc: Mike Rapoport
On Tue, 2026-02-17 at 19:28 +1100, Benjamin Herrenschmidt wrote:
> We have two issues:
.../...
So I ran this through our full regression suite and out of hundreds
(thousands ?) of runs, it hit this *once*:
[ 0.036100] RETBleed: WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!
[ 0.045442] BUG: unable to handle page fault for address: fffff1688051dc08
[ 0.045442] #PF: supervisor read access in kernel mode
[ 0.045442] #PF: error_code(0x0000) - not-present page
[ 0.045442] PGD 0 P4D 0
[ 0.045442] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 0.045442] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.68-92.123.amzn2023.x86_64 #1
[ 0.045442] Hardware name: Amazon EC2 t3.nano/, BIOS 1.0 10/16/2017
[ 0.045442] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[ 0.045442] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[ 0.045442] RSP: 0000:ffffffffadc03cc0 EFLAGS: 00010002
[ 0.045442] RAX: dead000000000122 RBX: fffff7c440651c80 RCX: fffff1688051dc08
[ 0.045442] RDX: fffff1688063ca48 RSI: fffff7c440651c88 RDI: fffff7c440651c88
[ 0.045442] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
[ 0.045442] R10: 000000000000003c R11: 0000000000000200 R12: ffff88831b8cbc80
[ 0.045442] R13: 0000000000000000 R14: 0000000000019473 R15: fffff7c440651cc0
[ 0.045442] FS: 0000000000000000(0000) GS:ffff88831aa00000(0000) knlGS:0000000000000000
[ 0.045442] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045442] CR2: fffff1688051dc08 CR3: 000000000ec34001 CR4: 00000000007706f0
[ 0.045442] PKRU: 55555554
[ 0.045442] Call Trace:
[ 0.045442] <TASK>
[ 0.045442] __free_one_page+0x170/0x520
[ 0.045442] free_pcppages_bulk+0x151/0x1e0
[ 0.045442] free_unref_page_commit+0x263/0x320
[ 0.045442] free_unref_page+0x2c8/0x5b0
[ 0.045442] free_reserved_page+0x1c/0x30
[ 0.045442] memblock_free_late+0xea/0x190
[ 0.045442] efi_free_boot_services+0x11f/0x2e0
[ 0.045442] __efi_enter_virtual_mode+0x181/0x210
[ 0.045442] efi_enter_virtual_mode+0xcd/0x110
[ 0.045442] start_kernel+0x393/0x500
[ 0.045442] x86_64_start_reservations+0x14/0x30
[ 0.045442] x86_64_start_kernel+0x77/0x80
[ 0.045442] common_startup_64+0x13e/0x141
[ 0.045442] </TASK>
[ 0.045442] Modules linked in:
[ 0.045442] CR2: fffff1688051dc08
[ 0.045442] ---[ end trace 0000000000000000 ]---
[ 0.045442] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[ 0.045442] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[ 0.045442] RSP: 0000:ffffffffadc03cc0 EFLAGS: 00010002
[ 0.045442] RAX: dead000000000122 RBX: fffff7c440651c80 RCX: fffff1688051dc08
[ 0.045442] RDX: fffff1688063ca48 RSI: fffff7c440651c88 RDI: fffff7c440651c88
[ 0.045442] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
[ 0.045442] R10: 000000000000003c R11: 0000000000000200 R12: ffff88831b8cbc80
[ 0.045442] R13: 0000000000000000 R14: 0000000000019473 R15: fffff7c440651cc0
[ 0.045442] FS: 0000000000000000(0000) GS:ffff88831aa00000(0000) knlGS:0000000000000000
[ 0.045442] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.045442] CR2: fffff1688051dc08 CR3: 000000000ec34001 CR4: 00000000007706f0
[ 0.045442] PKRU: 55555554
[ 0.045442] Kernel panic - not syncing: Fatal exception
[ 0.045442] ---[ end Kernel panic - not syncing: Fatal exception ]---
Unfortunately, I don't have a more complete log (those machines boot
with "quiet").
There is definitely something fishy going on, though I don't know what,
as the page is reserved so it should *not* be touched by the deferred
initialization... Could there be an issue by which we incorrectly go
look at the head page (which hasn't been initialized) of a *potential*
compound/huge page ?
Cheers,
Ben.
> - One is we don't check for pfn_valid(). If this is called for
> a page corresponding to a big enough memory hole that we don't have
> allocated a corresponding sparsemem section for it, it will crash.
>
> - Then, when using deferred struct page init, we can end up not
> freeing the pages at all. This happens routinely with some of the
> UEFI Boot Services memory, as soon as they fall above the threshold
> of pages whose initialization is deferred.
>
> We can very easily hit the !early_page_initialised() test in
> memblock_free_pages() since the deferred initializer hasn't even
> started yet. As a result we drop the pages on the floor.
>
> Now, memblock_free_late() should only ever be called for pages that
> are reserved, and thus for which the struct page has already been
> initialized by memmap_init_reserved_pages().... as long as we check
> for pfn_valid() as a big enough hole might cause entire sections of
> the mem_map to not be allocated at all.
>
> So it should be safe to just free them normally and ignore the
> deferred
> initializer, which will skip over them as it skips over anything
> still
> in the memblock reserved list.
>
> This helps recover something like 140MB of RAM on EC2 t3a.nano
> instances
> who only have 512MB to begin with (as to why UEFI uses that much,
> that's
> a question for another day).
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> v2. Reworked a bit to add the pfn_valid() check, remove the bogus
> memblock
> access in debug mode, and add a test of PageReserved() for sanity.
>
> We could separately do a patch forcing UEFI Boot Services into
> memblock.memory but so far I haven't hit a case where that is
> necessary.
>
> mm/memblock.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 905d06b16348a..71eb25b68851e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1770,9 +1770,14 @@ void __init memblock_free_late(phys_addr_t
> base, phys_addr_t size)
> cursor = PFN_UP(base);
> end = PFN_DOWN(base + size);
>
> + /* Only free pages that were reserved */
> for (; cursor < end; cursor++) {
> - memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> - totalram_pages_inc();
> + struct page *p;
> + if (!pfn_valid(cursor))
> + continue;
> + p = pfn_to_page(cursor);
> + if (!WARN_ON(!PageReserved(p)))
> + free_reserved_page(pfn_to_page(cursor));
> }
> }
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-17 12:32 ` Mike Rapoport
@ 2026-02-17 22:00 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-17 22:00 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Tue, 2026-02-17 at 14:32 +0200, Mike Rapoport wrote:
>
> Took me a second with the double negation. I like
>
> if (WARN_ON(!PageReserved(p)))
> continue;
Fair :-)
>
> > + free_reserved_page(pfn_to_page(cursor));
>
> We already have page here, no need to pfn_to_page() again :)
>
> I can fix those up when applying.
Haha, I added "p" and forgot to fix that one up. Don't apply though.
There's still a problem. See my other email.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-17 21:47 ` Benjamin Herrenschmidt
@ 2026-02-18 0:15 ` Benjamin Herrenschmidt
2026-02-18 8:05 ` Mike Rapoport
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-18 0:15 UTC (permalink / raw)
To: linux-mm; +Cc: Mike Rapoport
On Wed, 2026-02-18 at 08:47 +1100, Benjamin Herrenschmidt wrote:
> There is definitely something fishy going on, though I don't know what,
> as the page is reserved so it should *not* be touched by the deferred
> initialization... Could there be an issue by which we incorrectly go
> look at the head page (which hasn't been initialized) of a *potential*
> compound/huge page ?
So ... not 100% certain but I see this in __free_one_page():
buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
Then goes do things with the buddy. find_buddy_page_pfn() is
(stripping comments):
static inline unsigned long
__find_buddy_pfn(unsigned long page_pfn, unsigned int order)
{
return page_pfn ^ (1 << order);
}
static inline struct page *find_buddy_page_pfn(struct page *page,
unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
{
unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
struct page *buddy;
buddy = page + (__buddy_pfn - pfn);
if (buddy_pfn)
*buddy_pfn = __buddy_pfn;
if (page_is_buddy(page, buddy, order))
return buddy;
return NULL;
}
Now what happens if order is 0, page_pfn is a reserved page whose
"buddy" isn't reserved ... and whose struct page is not initialized
yet due to deferral ?
Unless I'm mistaken, we are going to poke around at uninitialized
struct pages and things can go anywhere from there, can't they ?
Or am I missing a piece of the puzzle ?
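To make the concern concrete, here is a small userspace sketch of the buddy arithmetic. It assumes nothing beyond the XOR shown in __find_buddy_pfn() above (with 1UL instead of 1 so the shift is done in unsigned long); struct pfn_range, in_range(), buddy_pfn_of() and buddy_escapes() are hypothetical names made up for illustration, not kernel APIs, and the pfn values used with them are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Same arithmetic as __find_buddy_pfn(): flip bit 'order' of the pfn. */
static unsigned long buddy_pfn_of(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* Hypothetical description of a reserved range, [start, end) in pfns. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

static bool in_range(const struct pfn_range *r, unsigned long pfn)
{
	return pfn >= r->start && pfn < r->end;
}

/*
 * Returns true when freeing 'pfn' at 'order' would make the allocator
 * inspect a buddy that lies outside the reserved range, i.e. a struct
 * page that the reserved-range initialization is not known to have
 * touched.
 */
static bool buddy_escapes(const struct pfn_range *reserved,
			  unsigned long pfn, unsigned int order)
{
	assert(in_range(reserved, pfn));
	return !in_range(reserved, buddy_pfn_of(pfn, order));
}
```

With a made-up reserved range of pfns [0x1001, 0x1010), the order-0 buddy of pfn 0x1001 is 0x1000, which sits just below the range, while the order-0 buddy of 0x1002 is 0x1003 and stays inside it. So any reserved range that starts on an odd pfn, or ends on one, has at least one page whose order-0 buddy is outside the range.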
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-18 0:15 ` Benjamin Herrenschmidt
@ 2026-02-18 8:05 ` Mike Rapoport
2026-02-19 2:48 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 33+ messages in thread
From: Mike Rapoport @ 2026-02-18 8:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
On Wed, Feb 18, 2026 at 11:15:59AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2026-02-18 at 08:47 +1100, Benjamin Herrenschmidt wrote:
> > There is definitely something fishy going on, though I don't know what,
> > as the page is reserved so it should *not* be touched by the deferred
> > initialization... Could there be an issue by which we incorrectly go
> > look at the head page (which hasn't been initialized) of a *potential*
> > compound/huge page ?
>
> So ... not 100% certain but I see this in __free_one_page():
>
>
> buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>
> Then goes do things with the buddy. find_buddy_page_pfn() is
> (stripping comments):
>
> static inline unsigned long
> __find_buddy_pfn(unsigned long page_pfn, unsigned int order)
> {
> return page_pfn ^ (1 << order);
> }
>
> static inline struct page *find_buddy_page_pfn(struct page *page,
> unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
> {
> unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
> struct page *buddy;
>
> buddy = page + (__buddy_pfn - pfn);
> if (buddy_pfn)
> *buddy_pfn = __buddy_pfn;
>
> if (page_is_buddy(page, buddy, order))
> return buddy;
> return NULL;
> }
>
> Now what happens if order is 0, page_pfn is a reserved page whose
> "buddy" isn't reserved ... and whose struct page is not initialized
> yet due to deferral ?
>
> Unless I'm mistaken, we are going to poke around at uninitialized
> struct pages and things can go anywhere from there, can't they ?
It's possible.
> Or am I missing a piece of the puzzle ?
Apparently we do miss some piece of the puzzle, otherwise you'd see no
crashes :)
I think an easy and backportable fix would be to make
efi_free_boot_services() an initcall, so that it will surely run after
deferred pages are initialized.
And since the boot services memory is not memblock_alloc()ed but rather
memblock_reserve()ed, it should be freed with free_reserved_area().
With the symptom fixed, we can audit memblock_free_late() and
free_reserved_area() callers and see how to make this all less messy and
more robust.
> Cheers,
> Ben.
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-18 8:05 ` Mike Rapoport
@ 2026-02-19 2:48 ` Benjamin Herrenschmidt
2026-02-19 10:16 ` Mike Rapoport
0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-19 2:48 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Wed, 2026-02-18 at 10:05 +0200, Mike Rapoport wrote:
> Apparently we do miss some piece of the puzzle, otherwise you'd see no
> crashes :)
>
> I think an easy and backportable fix would be to make
> efi_free_boot_services() an initcall, so that it will surely run after
> deferred pages are initialized.
> And since the boot services memory is not memblock_alloc()ed but rather
> memblock_reserve()ed, it should be freed with free_reserved_area().
>
> With the symptom fixed, we can audit memblock_free_late() and
> free_reserved_area() callers and see how to make this all less messy and
> more robust.
I will play around. The biggest issue I see with this is that
efi_free_boot_services() also manipulates the efi memmap, and that's
done without any locking whatsoever.
I'm semi tempted to split that part. We can unmap the boot services
from EFI memory in the current spot, and defer the actual freeing.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 2:48 ` Benjamin Herrenschmidt
@ 2026-02-19 10:16 ` Mike Rapoport
2026-02-19 22:46 ` Benjamin Herrenschmidt
` (3 more replies)
0 siblings, 4 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-19 10:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
On Thu, Feb 19, 2026 at 01:48:16PM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2026-02-18 at 10:05 +0200, Mike Rapoport wrote:
> > Apparently we do miss some piece of the puzzle, otherwise you'd see no
> > crashes :)
> >
> > I think an easy and backportable fix would be to make
> > efi_free_boot_services() an initcall, so that it will surely run after
> > deferred pages are initialized.
> > And since the boot services memory is not memblock_alloc()ed but rather
> > memblock_reserve()ed, it should be freed with free_reserved_area().
> >
> > With the symptom fixed, we can audit memblock_free_late() and
> > free_reserved_area() callers and see how to make this all less messy and
> > more robust.
>
> I will play around. The biggest issue I see with this is that
> efi_free_boot_services() also manipulates the efi memmap, and that's
> done without any locking whatsoever.
>
> I'm semi tempted to split that part. We can unmap the boot services
> from EFI memory in the current spot, and defer the actual freeing.
Let's split it. EFI does weird things with memory already, like mremapping
normal memory for example.
Here's my take on the split. Lightly tested on qemu and recovered ~45M of
ram with the OVMF version I have :)
From fdfbda756d6107a7bc7c3ad4eb589af810ddba49 Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Thu, 19 Feb 2026 11:22:53 +0200
Subject: [PATCH] x86/efi: defer freeing of boot services memory
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().
There are two issues with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().
More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.
If the freed memory resides in areas whose memory map is still
uninitialized, it won't actually be freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.
Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in an uninitialized part of the memory map.
Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.
A more robust approach is to only defer freeing of the EFI boot services memory.
Make efi_free_boot_services() collect ranges that should be freed into
an array and add an initcall efi_free_boot_services_memory() that walks
that array and actually frees the memory using free_reserved_area().
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/x86/platform/efi/quirks.c | 42 +++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 553f330198f2..bba1fb57a4bd 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -404,17 +404,32 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
}
+struct efi_freeable_range {
+ u64 start;
+ u64 end;
+};
+
+static struct efi_freeable_range *ranges_to_free;
+
void __init efi_free_boot_services(void)
{
struct efi_memory_map_data data = { 0 };
efi_memory_desc_t *md;
int num_entries = 0;
+ int idx = 0;
void *new, *new_md;
/* Keep all regions for /sys/kernel/debug/efi */
if (efi_enabled(EFI_DBG))
return;
+ ranges_to_free = kzalloc(sizeof(*ranges_to_free) * efi.memmap.nr_map,
+ GFP_KERNEL);
+ if (!ranges_to_free) {
+ pr_err("Failed to allocate storage for freeable EFI regions\n");
+ return;
+ }
+
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
@@ -471,7 +486,15 @@ void __init efi_free_boot_services(void)
start = SZ_1M;
}
- memblock_free_late(start, size);
+ /*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
+ * map are still not initialized and we can't reliably free
+ * memory here.
+ * Queue the ranges to free at a later point.
+ */
+ ranges_to_free[idx].start = start;
+ ranges_to_free[idx].end = start + size;
+ idx++;
}
if (!num_entries)
@@ -512,6 +535,23 @@ void __init efi_free_boot_services(void)
}
}
+static int __init efi_free_boot_services_memory(void)
+{
+ struct efi_freeable_range *range = ranges_to_free;
+
+ while (range->start) {
+ void *start = phys_to_virt(range->start);
+ void *end = phys_to_virt(range->end);
+
+ free_reserved_area(start, end, -1, NULL);
+ range++;
+ }
+ kfree(ranges_to_free);
+
+ return 0;
+}
+late_initcall(efi_free_boot_services_memory);
+
/*
* A number of config table entries get remapped to virtual addresses
* after entering EFI virtual mode. However, the kexec kernel requires
base-commit: 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
--
2.51.0
> Cheers,
> Ben.
>
--
Sincerely yours,
Mike.
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 10:16 ` Mike Rapoport
@ 2026-02-19 22:46 ` Benjamin Herrenschmidt
2026-02-20 4:57 ` Benjamin Herrenschmidt
2026-02-20 9:00 ` Mike Rapoport
2026-02-20 5:12 ` Benjamin Herrenschmidt
` (2 subsequent siblings)
3 siblings, 2 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-19 22:46 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Thu, 2026-02-19 at 12:16 +0200, Mike Rapoport wrote:
>
> Let's split it. EFI does weird things with memory already, like mremapping
> normal memory for example.
Yup.
> Here's my take on the split. Lightly tested on qemu and recovered ~45M of
> ram with the OVMF version I have :)
Nice :-) I'll test this here.
> >
> +struct efi_freeable_range {
> + u64 start;
> + u64 end;
> +};
>
Haha, you went the blunt way :-) I was trying to avoid creating yet-
another structure with "start/end" :-)
> +
> +static struct efi_freeable_range *ranges_to_free;
> +
> void __init efi_free_boot_services(void)
> {
I was going to call it efi_unmap_boot_services() to avoid having two
things with almost the same name.
> struct efi_memory_map_data data = { 0 };
> efi_memory_desc_t *md;
> int num_entries = 0;
> + int idx = 0;
> void *new, *new_md;
>
> /* Keep all regions for /sys/kernel/debug/efi */
> if (efi_enabled(EFI_DBG))
> return;
>
> + ranges_to_free = kzalloc(sizeof(*ranges_to_free) * efi.memmap.nr_map,
> + GFP_KERNEL);
> + if (!ranges_to_free) {
> + pr_err("Failed to allocate storage for freeable EFI regions\n");
> + return;
> + }
Do we still want to do the whole unmap dance in that case ? I mean, OOM
here means the system is pretty much a goner at that stage but ...
> for_each_efi_memory_desc(md) {
> unsigned long long start = md->phys_addr;
> unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> @@ -471,7 +486,15 @@ void __init efi_free_boot_services(void)
> start = SZ_1M;
> }
>
> - memblock_free_late(start, size);
> + /*
> + * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
> + * map are still not initialized and we can't reliably free
> + * memory here.
> + * Queue the ranges to free at a later point.
> + */
> + ranges_to_free[idx].start = start;
> + ranges_to_free[idx].end = start + size;
> + idx++;
Do we want to make this conditional to CONFIG_DEFERRED_STRUCT_PAGE_INIT
or we don't care ?
> }
>
> if (!num_entries)
> @@ -512,6 +535,23 @@ void __init efi_free_boot_services(void)
> }
> }
>
> +static int __init efi_free_boot_services_memory(void)
> +{
> + struct efi_freeable_range *range = ranges_to_free;
> +
> + while (range->start) {
> + void *start = phys_to_virt(range->start);
> + void *end = phys_to_virt(range->end);
> +
> + free_reserved_area(start, end, -1, NULL);
I assume here too the total_ram_page_inc stuff is taken care of ? I
haven't really looked. This feels like a fragile counter.
> + range++;
> + }
> + kfree(ranges_to_free);
> +
> + return 0;
> +}
> +late_initcall(efi_free_boot_services_memory);
> +
> /*
> * A number of config table entries get remapped to virtual addresses
> * after entering EFI virtual mode. However, the kexec kernel requires
>
> base-commit: 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
> --
> 2.51.0
>
>
> > Cheers,
> > Ben.
> >
>
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 22:46 ` Benjamin Herrenschmidt
@ 2026-02-20 4:57 ` Benjamin Herrenschmidt
2026-02-20 9:09 ` Mike Rapoport
2026-02-20 9:00 ` Mike Rapoport
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-20 4:57 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
> > +late_initcall(efi_free_boot_services_memory);
Why late btw ? Any particular reason ?
One very minor nit (but it kind of is annoying when you gather logs at
scale and some people do look at this :-) ) is that the memory isn't
accounted in the boot message:
Memory: 224440K/483372K available (16384K kernel code, 9440K rwdata,
11344K rodata, 3732K init, 6480K bss, 254088K reserved, 0K cma-
reserved)
I'm not going to cry about this, but it might be nice to have the
__initcall display how much extra is freed so it's just a log grep
away.
Cheers,
Ben.
> > /*
> > * A number of config table entries get remapped to virtual
> > addresses
> > * after entering EFI virtual mode. However, the kexec kernel
> > requires
> >
> > base-commit: 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
> > --
> > 2.51.0
> >
> >
> > > Cheers,
> > > Ben.
> > >
> >
>
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 10:16 ` Mike Rapoport
2026-02-19 22:46 ` Benjamin Herrenschmidt
@ 2026-02-20 5:12 ` Benjamin Herrenschmidt
2026-02-20 5:15 ` Benjamin Herrenschmidt
2026-02-20 5:47 ` Benjamin Herrenschmidt
3 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-20 5:12 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
Ugh ... backport to 6.12.68:
(sorry, no more logs, it's from the large scale testsuite, a couple of
hits out of 1450 tests but all run with "quiet" on the command line).
[ 0.014444] ACPI: SPCR: [Firmware Bug]: Unexpected SPCR Access Width. Defaulting to byte size
[ 1.015737] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1.016238] #PF: supervisor read access in kernel mode
[ 1.016651] #PF: error_code(0x0000) - not-present page
[ 1.017070] PGD 0 P4D 0
[ 1.017274] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 1.017664] CPU: 23 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.68-94.123.amzn2023.x86_64 #1
[ 1.018384] Hardware name: Amazon EC2 c5n.metal/Not Specified, BIOS 1.0 10/16/2017
[ 1.019011] RIP: 0010:efi_free_boot_services_memory+0xd/0x60
[ 1.019472] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 8b 2d 2b a2 39 00 <48> 8b 45 00 48 85 c0 74 31 48 8b 75 08 31 c9 ba ff ff ff ff 48 83
[ 1.021033] RSP: 0000:ffffcfcd40037e00 EFLAGS: 00010246
[ 1.021461] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1.022059] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffffffafa663d0
[ 1.022637] RBP: 0000000000000000 R08: 000001158a8b1d80 R09: 000001158a8b1d80
[ 1.023222] R10: 0000000000000012 R11: fefefefefefefeff R12: ffff8a8a5f142a00
[ 1.023813] R13: 0000000000000139 R14: 0000000000000000 R15: 0000000000000000
[ 1.024399] FS: 0000000000000000(0000) GS:ffff8a8a5ee80000(0000) knlGS:0000000000000000
[ 1.025064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.025513] CR2: 0000000000000000 CR3: 0000000683c34001 CR4: 00000000007706f0
[ 1.025513] PKRU: 55555554
[ 1.025513] Call Trace:
[ 1.025513] <TASK>
[ 1.025513] ? __pfx_efi_free_boot_services_memory+0x10/0x10
[ 1.025513] do_one_initcall+0x41/0x300
[ 1.025513] do_initcalls+0xac/0x130
[ 1.025513] kernel_init_freeable+0x256/0x310
[ 1.025513] ? __pfx_kernel_init+0x10/0x10
[ 1.025513] kernel_init+0x16/0x1c0
[ 1.025513] ret_from_fork+0x2d/0x50
[ 1.025513] ? __pfx_kernel_init+0x10/0x10
[ 1.025513] ret_from_fork_asm+0x1a/0x30
[ 1.025513] </TASK>
[ 1.025513] Modules linked in:
[ 1.025513] CR2: 0000000000000000
[ 1.025513] ---[ end trace 0000000000000000 ]---
[ 1.025513] RIP: 0010:efi_free_boot_services_memory+0xd/0x60
[ 1.025513] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 8b 2d 2b a2 39 00 <48> 8b 45 00 48 85 c0 74 31 48 8b 75 08 31 c9 ba ff ff ff ff 48 83
[ 1.025513] RSP: 0000:ffffcfcd40037e00 EFLAGS: 00010246
[ 1.025513] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1.025513] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffffffafa663d0
[ 1.025513] RBP: 0000000000000000 R08: 000001158a8b1d80 R09: 000001158a8b1d80
[ 1.025513] R10: 0000000000000012 R11: fefefefefefefeff R12: ffff8a8a5f142a00
[ 1.025513] R13: 0000000000000139 R14: 0000000000000000 R15: 0000000000000000
[ 1.025513] FS: 0000000000000000(0000) GS:ffff8a8a5ee80000(0000) knlGS:0000000000000000
[ 1.025513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.025513] CR2: 0000000000000000 CR3: 0000000683c34001 CR4: 00000000007706f0
[ 1.025513] PKRU: 55555554
[ 1.025513] Kernel panic - not syncing: Fatal exception
[ 1.025513] Kernel Offset: 0x2c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1.025513] ---[ end Kernel panic - not syncing: Fatal exception ]---
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 10:16 ` Mike Rapoport
2026-02-19 22:46 ` Benjamin Herrenschmidt
2026-02-20 5:12 ` Benjamin Herrenschmidt
@ 2026-02-20 5:15 ` Benjamin Herrenschmidt
2026-02-20 5:47 ` Benjamin Herrenschmidt
3 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-20 5:15 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Thu, 2026-02-19 at 12:16 +0200, Mike Rapoport wrote:
> static struct efi_freeable_range *ranges_to_free;
__initdata ?
> +static int __init efi_free_boot_services_memory(void)
> +{
> + struct efi_freeable_range *range = ranges_to_free;
> +
if (!range)
return 0;
IE. We might not even be running UEFI or we might have hit the
if (efi_enabled(EFI_DBG))
return;
test and thus not allocated the table.
> + while (range->start) {
> + void *start = phys_to_virt(range->start);
> + void *end = phys_to_virt(range->end);
> +
> + free_reserved_area(start, end, -1, NULL);
> + range++;
> + }
> + kfree(ranges_to_free);
> + return 0;
> +}
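A note on the oops reported for the 6.12.68 backport: the faulting
instruction appears to read through a register holding zero, which is
consistent with the initcall walking ranges_to_free when the table was
never allocated. The guard suggested above can be sketched in userspace C
as follows. This is only an illustration under stated assumptions:
`struct freeable_range` and `free_queued_ranges()` are hypothetical
stand-ins, and `free()` takes the place of the kernel's
`free_reserved_area()` and `kfree()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct efi_freeable_range. */
struct freeable_range {
	uint64_t start;
	uint64_t end;	/* exclusive: start + size */
};

/* Stays NULL when no table was allocated (e.g. no EFI, or debug mode). */
static struct freeable_range *ranges_to_free;

/* Walk the zero-terminated table and return the number of bytes covered. */
static uint64_t free_queued_ranges(void)
{
	struct freeable_range *range = ranges_to_free;
	uint64_t freed = 0;

	/* Without this check the loop below dereferences NULL. */
	if (!range)
		return 0;

	while (range->start) {
		assert(range->end >= range->start);
		/* The kernel version would call free_reserved_area() here. */
		freed += range->end - range->start;
		range++;
	}

	free(ranges_to_free);
	ranges_to_free = NULL;
	return freed;
}
```

Calling `free_queued_ranges()` before any table exists simply returns 0,
which is the behaviour the early return is meant to guarantee.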
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 10:16 ` Mike Rapoport
` (2 preceding siblings ...)
2026-02-20 5:15 ` Benjamin Herrenschmidt
@ 2026-02-20 5:47 ` Benjamin Herrenschmidt
3 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2026-02-20 5:47 UTC (permalink / raw)
To: Mike Rapoport; +Cc: linux-mm
On Thu, 2026-02-19 at 12:16 +0200, Mike Rapoport wrote:
> + ranges_to_free = kzalloc(sizeof(*ranges_to_free) * efi.memmap.nr_map,
> + GFP_KERNEL);
Another issue ... you need to alloc n+1 since you need a terminating
NULL entry and there's no guarantee that the memory map will contain
something you can't free :-) Very very hypothetical scenario but...
Cheers,
Ben.
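To make the sizing concrete: a table that is walked until a zeroed entry
needs one slot more than the number of ranges it may hold, and the extra
slot has to be added to the element count, not to the byte count
(`sizeof(x) * n + 1` grows the buffer by a single byte only). A small
userspace sketch, where `alloc_range_table()` and `count_ranges()` are
made-up names and `calloc()` stands in for a kernel allocator such as
`kcalloc()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct efi_freeable_range. */
struct freeable_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Allocate room for up to nr ranges plus one zeroed terminator entry.
 * The +1 applies to the element count, so the terminator is a whole
 * struct even when all nr slots end up being used.
 */
static struct freeable_range *alloc_range_table(size_t nr)
{
	return calloc(nr + 1, sizeof(struct freeable_range));
}

/* Count entries up to, but not including, the zeroed terminator. */
static size_t count_ranges(const struct freeable_range *table)
{
	size_t n = 0;

	assert(table != NULL);
	while (table[n].start)
		n++;
	return n;
}
```

With every one of the nr slots filled, `count_ranges()` still stops at the
extra zeroed entry instead of running off the end of the allocation.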
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-19 22:46 ` Benjamin Herrenschmidt
2026-02-20 4:57 ` Benjamin Herrenschmidt
@ 2026-02-20 9:00 ` Mike Rapoport
1 sibling, 0 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-20 9:00 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
On Fri, Feb 20, 2026 at 09:46:50AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2026-02-19 at 12:16 +0200, Mike Rapoport wrote:
> >
> > Let's split it. EFI does weird things with memory already, like mremapping
> > normal memory for example.
>
> Yup.
>
> > Here's my take on the split. Lightly tested on qemu and recovered ~45M of
> > ram with the OVMF version I have :)
>
> Nice :-) I'll test this here.
>
> > >
> > +struct efi_freeable_range {
> > + u64 start;
> > + u64 end;
> > +};
> >
>
> Haha, you went the blunt way :-) I was trying to avoid creating yet-
> another structure with "start/end" :-)
Well, seems to me the easiest and the most efficient :)
I could have used "struct range", but I don't like its semantics with
excluding the end. It would mean adding/subtracting 1 everywhere, seems
error prone to me.
> > +
> > +static struct efi_freeable_range *ranges_to_free;
> > +
> > void __init efi_free_boot_services(void)
> > {
>
> I was going to call it efi_unmap_boot_services() to avoid having two
> things with almost the same name.
I wanted to minimize churn, but in the end it's not that much to change and
efi_unmap_boot_services() is a better name.
> > struct efi_memory_map_data data = { 0 };
> > efi_memory_desc_t *md;
> > int num_entries = 0;
> > + int idx = 0;
> > void *new, *new_md;
> >
> > /* Keep all regions for /sys/kernel/debug/efi */
> > if (efi_enabled(EFI_DBG))
> > return;
> >
> > + ranges_to_free = kzalloc(sizeof(*ranges_to_free) * efi.memmap.nr_map,
> > + GFP_KERNEL);
> > + if (!ranges_to_free) {
> > + pr_err("Failed to allocate storage for freeable EFI regions\n");
> > + return;
> > + }
>
> Do we still want to do the whole unmap dance in that case ? I mean, OOM
> here means the system is pretty much a goner at that stage but ...
There is another potential OOM in that function. If it happens, we just
skip remapping and return. So return here is consistent :)
> > for_each_efi_memory_desc(md) {
> > unsigned long long start = md->phys_addr;
> > unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> > @@ -471,7 +486,15 @@ void __init efi_free_boot_services(void)
> > start = SZ_1M;
> > }
> >
> > - memblock_free_late(start, size);
> > + /*
> > + * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
> > + * map are still not initialized and we can't reliably free
> > + * memory here.
> > + * Queue the ranges to free at a later point.
> > + */
> > + ranges_to_free[idx].start = start;
> > + ranges_to_free[idx].end = start + size;
> > + idx++;
>
> Do we want to make this conditional to CONFIG_DEFERRED_STRUCT_PAGE_INIT
> or we don't care ?
I think it'll add ugliness for no good reason. If we want to keep systems
with CONFIG_DEFERRED_STRUCT_PAGE_INIT=n behave the same way as now, we need
several more if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) and it becomes hairy.
And the change is quite small IMHO to just make it for everything.
> > }
> >
> > if (!num_entries)
> > @@ -512,6 +535,23 @@ void __init efi_free_boot_services(void)
> > }
> > }
> >
> > +static int __init efi_free_boot_services_memory(void)
> > +{
> > + struct efi_freeable_range *range = ranges_to_free;
> > +
> > + while (range->start) {
> > + void *start = phys_to_virt(range->start);
> > + void *end = phys_to_virt(range->end);
> > +
> > + free_reserved_area(start, end, -1, NULL);
>
> I assume here too the total_ram_page_inc stuff is taken care of ? I
> haven't really looked. This feels like a fragile counter.
This is a fragile counter :)
free_reserved_area() -> free_reserved_page() take care of it.
> > + range++;
> > + }
> > + kfree(ranges_to_free);
> > +
> > + return 0;
> > +}
> > +late_initcall(efi_free_boot_services_memory);
> > +
> > /*
> > * A number of config table entries get remapped to virtual addresses
> > * after entering EFI virtual mode. However, the kexec kernel requires
> >
> > base-commit: 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
> > --
> > 2.51.0
> >
> >
> > > Cheers,
> > > Ben.
> > >
> >
>
--
Sincerely yours,
Mike.
* Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
2026-02-20 4:57 ` Benjamin Herrenschmidt
@ 2026-02-20 9:09 ` Mike Rapoport
0 siblings, 0 replies; 33+ messages in thread
From: Mike Rapoport @ 2026-02-20 9:09 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-mm
On Fri, Feb 20, 2026 at 03:57:58PM +1100, Benjamin Herrenschmidt wrote:
> > > +late_initcall(efi_free_boot_services_memory);
>
> Why late btw ? Any particular reason ?
It does not really matter, but then I thought that arch_initcall would read
more nicely there :)
> One very minor nit (but it kind of is annoying when you gather logs at
> scale and some people do look at this :-) ) is that the memory isn't
> accounted in the boot message:
>
> Memory: 224440K/483372K available (16384K kernel code, 9440K rwdata,
> 11344K rodata, 3732K init, 6480K bss, 254088K reserved, 0K cma-
> reserved)
>
> I'm not going to cry about this, but it might be nice to have the
> __initcall display how much extra is freed so it's just a log grep
> away.
Sure.
Here's v2 with fixes and updates. After you confirm it passes your
regression suite I'll send it properly to x86 folks.
From c05e37b848cd281a074b18ad28f0717a81649560 Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Date: Thu, 19 Feb 2026 11:22:53 +0200
Subject: [PATCH] x86/efi: defer freeing of boot services memory
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().
There are two issues with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().
More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.
Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB of RAM.
If the freed memory resides in areas whose memory map is still
uninitialized, it won't actually be freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.
Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in an uninitialized part of the memory map.
Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.
A more robust approach is to only defer freeing of the EFI boot services
memory.
Rename efi_free_boot_services() to efi_unmap_boot_services() and make it
collect ranges that should be freed into an array, then add an initcall,
now named efi_free_boot_services(), that walks that array and actually
frees the memory using free_reserved_area().
Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org
Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/x86/include/asm/efi.h | 2 +-
arch/x86/platform/efi/efi.c | 2 +-
arch/x86/platform/efi/quirks.c | 55 +++++++++++++++++++++++++++--
drivers/firmware/efi/mokvar-table.c | 2 +-
4 files changed, 55 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index f227a70ac91f..51b4cdbea061 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -138,7 +138,7 @@ extern void __init efi_apply_memmap_quirks(void);
extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
extern void efi_crash_gracefully_on_page_fault(unsigned long phys_addr);
-extern void efi_free_boot_services(void);
+extern void efi_unmap_boot_services(void);
void arch_efi_call_virt_setup(void);
void arch_efi_call_virt_teardown(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 463b784499a8..791c52c8393f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -837,7 +837,7 @@ static void __init __efi_enter_virtual_mode(void)
}
efi_check_for_embedded_firmwares();
- efi_free_boot_services();
+ efi_unmap_boot_services();
if (!efi_is_mixed())
efi_native_runtime_setup();
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 553f330198f2..35caa5746115 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -341,7 +341,7 @@ void __init efi_reserve_boot_services(void)
/*
* Because the following memblock_reserve() is paired
- * with memblock_free_late() for this region in
+ * with free_reserved_area() for this region in
* efi_free_boot_services(), we must be extremely
* careful not to reserve, and subsequently free,
* critical regions of memory (like the kernel image) or
@@ -404,17 +404,33 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
}
-void __init efi_free_boot_services(void)
+struct efi_freeable_range {
+ u64 start;
+ u64 end;
+};
+
+static struct efi_freeable_range *ranges_to_free;
+
+void __init efi_unmap_boot_services(void)
{
struct efi_memory_map_data data = { 0 };
efi_memory_desc_t *md;
int num_entries = 0;
+ int idx = 0;
+ size_t sz;
void *new, *new_md;
/* Keep all regions for /sys/kernel/debug/efi */
if (efi_enabled(EFI_DBG))
return;
+ sz = sizeof(*ranges_to_free) * (efi.memmap.nr_map + 1);
+ ranges_to_free = kzalloc(sz, GFP_KERNEL);
+ if (!ranges_to_free) {
+ pr_err("Failed to allocate storage for freeable EFI regions\n");
+ return;
+ }
+
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
@@ -471,7 +487,15 @@ void __init efi_free_boot_services(void)
start = SZ_1M;
}
- memblock_free_late(start, size);
+ /*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
+ * map are still not initialized and we can't reliably free
+ * memory here.
+ * Queue the ranges to free at a later point.
+ */
+ ranges_to_free[idx].start = start;
+ ranges_to_free[idx].end = start + size;
+ idx++;
}
if (!num_entries)
@@ -512,6 +536,31 @@ void __init efi_free_boot_services(void)
}
}
+static int __init efi_free_boot_services(void)
+{
+ struct efi_freeable_range *range = ranges_to_free;
+ unsigned long freed = 0;
+
+ if (!ranges_to_free)
+ return 0;
+
+ while (range->start) {
+ void *start = phys_to_virt(range->start);
+ void *end = phys_to_virt(range->end);
+
+ free_reserved_area(start, end, -1, NULL);
+ freed += (end - start);
+ range++;
+ }
+ kfree(ranges_to_free);
+
+ if (freed)
+ pr_info("Freeing EFI boot services memory: %luK\n", freed / SZ_1K);
+
+ return 0;
+}
+arch_initcall(efi_free_boot_services);
+
/*
* A number of config table entries get remapped to virtual addresses
* after entering EFI virtual mode. However, the kexec kernel requires
diff --git a/drivers/firmware/efi/mokvar-table.c b/drivers/firmware/efi/mokvar-table.c
index aedbbd627706..741674a0a70c 100644
--- a/drivers/firmware/efi/mokvar-table.c
+++ b/drivers/firmware/efi/mokvar-table.c
@@ -85,7 +85,7 @@ static struct kobject *mokvar_kobj;
* as an alternative to ordinary EFI variables, due to platform-dependent
* limitations. The memory occupied by this table is marked as reserved.
*
- * This routine must be called before efi_free_boot_services() in order
+ * This routine must be called before efi_unmap_boot_services() in order
* to guarantee that it can mark the table as reserved.
*
* Implicit inputs:
base-commit: 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
--
2.51.0
--
Sincerely yours,
Mike.
Thread overview: 33+ messages
2026-02-03 8:02 [PATCH] mm: Fix memblock_free_late() when using deferred struct page Benjamin Herrenschmidt
2026-02-03 18:40 ` Mike Rapoport
2026-02-03 19:53 ` Benjamin Herrenschmidt
2026-02-04 7:39 ` Mike Rapoport
2026-02-04 9:02 ` Benjamin Herrenschmidt
2026-02-06 10:33 ` Mike Rapoport
2026-02-10 1:04 ` Benjamin Herrenschmidt
2026-02-10 2:10 ` Benjamin Herrenschmidt
2026-02-10 6:17 ` Benjamin Herrenschmidt
2026-02-10 8:34 ` Benjamin Herrenschmidt
2026-02-10 14:32 ` Mike Rapoport
2026-02-10 23:23 ` Benjamin Herrenschmidt
2026-02-11 5:20 ` Mike Rapoport
2026-02-16 5:34 ` Benjamin Herrenschmidt
2026-02-16 6:51 ` Benjamin Herrenschmidt
2026-02-16 4:53 ` Benjamin Herrenschmidt
2026-02-16 15:28 ` Mike Rapoport
2026-02-16 10:36 ` Alexander Potapenko
2026-02-17 8:28 ` [PATCH v2] " Benjamin Herrenschmidt
2026-02-17 12:32 ` Mike Rapoport
2026-02-17 22:00 ` Benjamin Herrenschmidt
2026-02-17 21:47 ` Benjamin Herrenschmidt
2026-02-18 0:15 ` Benjamin Herrenschmidt
2026-02-18 8:05 ` Mike Rapoport
2026-02-19 2:48 ` Benjamin Herrenschmidt
2026-02-19 10:16 ` Mike Rapoport
2026-02-19 22:46 ` Benjamin Herrenschmidt
2026-02-20 4:57 ` Benjamin Herrenschmidt
2026-02-20 9:09 ` Mike Rapoport
2026-02-20 9:00 ` Mike Rapoport
2026-02-20 5:12 ` Benjamin Herrenschmidt
2026-02-20 5:15 ` Benjamin Herrenschmidt
2026-02-20 5:47 ` Benjamin Herrenschmidt