linux-mm.kvack.org archive mirror
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: linux-mm@kvack.org
Cc: Mike Rapoport <rppt@kernel.org>
Subject: Re: [PATCH v2] mm: Fix memblock_free_late() when using deferred struct page
Date: Wed, 18 Feb 2026 08:47:56 +1100
Message-ID: <14295eba34f10f5896e6cb7d3e1abd36199cd918.camel@kernel.crashing.org>
In-Reply-To: <bbffd0db46460cebd604f5c1e6668984fd4cc435.camel@kernel.crashing.org>


On Tue, 2026-02-17 at 19:28 +1100, Benjamin Herrenschmidt wrote:
> We have two issues:

 .../...

So I ran this through our full regression suite and out of hundreds
(thousands?) of runs, it hit this *once*:

[    0.036100] RETBleed: WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!
[    0.045442] BUG: unable to handle page fault for address: fffff1688051dc08
[    0.045442] #PF: supervisor read access in kernel mode
[    0.045442] #PF: error_code(0x0000) - not-present page
[    0.045442] PGD 0 P4D 0
[    0.045442] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[    0.045442] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.68-92.123.amzn2023.x86_64 #1
[    0.045442] Hardware name: Amazon EC2 t3.nano/, BIOS 1.0 10/16/2017
[    0.045442] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[    0.045442] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[    0.045442] RSP: 0000:ffffffffadc03cc0 EFLAGS: 00010002
[    0.045442] RAX: dead000000000122 RBX: fffff7c440651c80 RCX: fffff1688051dc08
[    0.045442] RDX: fffff1688063ca48 RSI: fffff7c440651c88 RDI: fffff7c440651c88
[    0.045442] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
[    0.045442] R10: 000000000000003c R11: 0000000000000200 R12: ffff88831b8cbc80
[    0.045442] R13: 0000000000000000 R14: 0000000000019473 R15: fffff7c440651cc0
[    0.045442] FS:  0000000000000000(0000) GS:ffff88831aa00000(0000) knlGS:0000000000000000
[    0.045442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.045442] CR2: fffff1688051dc08 CR3: 000000000ec34001 CR4: 00000000007706f0
[    0.045442] PKRU: 55555554
[    0.045442] Call Trace:
[    0.045442]  <TASK>
[    0.045442]  __free_one_page+0x170/0x520
[    0.045442]  free_pcppages_bulk+0x151/0x1e0
[    0.045442]  free_unref_page_commit+0x263/0x320
[    0.045442]  free_unref_page+0x2c8/0x5b0
[    0.045442]  free_reserved_page+0x1c/0x30
[    0.045442]  memblock_free_late+0xea/0x190
[    0.045442]  efi_free_boot_services+0x11f/0x2e0
[    0.045442]  __efi_enter_virtual_mode+0x181/0x210
[    0.045442]  efi_enter_virtual_mode+0xcd/0x110
[    0.045442]  start_kernel+0x393/0x500
[    0.045442]  x86_64_start_reservations+0x14/0x30
[    0.045442]  x86_64_start_kernel+0x77/0x80
[    0.045442]  common_startup_64+0x13e/0x141
[    0.045442]  </TASK>
[    0.045442] Modules linked in:
[    0.045442] CR2: fffff1688051dc08
[    0.045442] ---[ end trace 0000000000000000 ]---
[    0.045442] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
[    0.045442] Code: 89 fe 48 85 d2 74 3e 48 85 c9 74 47 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 46 48 b8 22 01 00 00 00 00 ad de 48 39 c1 74 45 <4c> 8b 01 49 39 f8 75 4e 4c 8b 4a 08 4d 39 c1 75 56 b8 01 00 00 00
[    0.045442] RSP: 0000:ffffffffadc03cc0 EFLAGS: 00010002
[    0.045442] RAX: dead000000000122 RBX: fffff7c440651c80 RCX: fffff1688051dc08
[    0.045442] RDX: fffff1688063ca48 RSI: fffff7c440651c88 RDI: fffff7c440651c88
[    0.045442] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
[    0.045442] R10: 000000000000003c R11: 0000000000000200 R12: ffff88831b8cbc80
[    0.045442] R13: 0000000000000000 R14: 0000000000019473 R15: fffff7c440651cc0
[    0.045442] FS:  0000000000000000(0000) GS:ffff88831aa00000(0000) knlGS:0000000000000000
[    0.045442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.045442] CR2: fffff1688051dc08 CR3: 000000000ec34001 CR4: 00000000007706f0
[    0.045442] PKRU: 55555554
[    0.045442] Kernel panic - not syncing: Fatal exception
[    0.045442] ---[ end Kernel panic - not syncing: Fatal exception ]---

Unfortunately, I don't have a more complete log (those machines boot
with "quiet").

There is definitely something fishy going on, though I don't know what:
the page is reserved, so it should *not* be touched by the deferred
initialization... Could there be an issue whereby we incorrectly look
at the head page (which hasn't been initialized) of a *potential*
compound/huge page?
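For context on how freeing one page can reach a *neighboring* struct page: when __free_one_page() merges a freed block of order n, it looks at the struct page of the buddy at pfn ^ (1 << n) (this is what __find_buddy_pfn() computes) and, if that page looks like a free buddy of the same order, removes it from its free list. That is at least consistent with the trace above faulting in __list_del_entry_valid_or_report(). The sketch below only models the pfn arithmetic, assuming every buddy along the way is free; the helper names are hypothetical and it does not model page_is_buddy() or any zone checks.

```c
#include <stddef.h>

/* pfn of the buddy of the order-`order` block starting at `pfn`. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/*
 * Hypothetical helper: record in out[] the buddy pfns a merge starting
 * from an order-0 free of `pfn` would inspect, assuming each buddy is
 * free, up to the caller-supplied max_order. Returns entries written.
 */
static size_t buddies_inspected(unsigned long pfn, unsigned int max_order,
				unsigned long *out, size_t max)
{
	size_t n = 0;

	for (unsigned int order = 0; order < max_order && n < max; order++) {
		unsigned long b = buddy_pfn(pfn, order);

		out[n++] = b;
		/* The merged block starts at the lower of the two pfns. */
		pfn &= b;
	}
	return n;
}
```

Freeing pfn 5 with a cap of order 3 inspects pfns 4, 6 and 0, none of which is the page actually being freed.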

Cheers,
Ben.


> - One is we don't check for pfn_valid(). If this is called for
> a page in a memory hole big enough that no sparsemem section was
> allocated for it, it will crash.
> 
> - Then, when using deferred struct page init, we can end up not
> freeing the pages at all. This happens routinely with some of the
> UEFI Boot Services memory, whenever it falls above the threshold
> beyond which struct page initialization is deferred.
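A minimal sketch of the first point, assuming classic SPARSEMEM where pfn_valid() boils down to "does the section containing this pfn have a mem_map?". The presence table, the `_sketch` names and the 8-section size are hypothetical stand-ins, not the kernel's mem_section API; the section shift of 15 corresponds to x86_64's 128 MiB sections with 4 KiB pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2^15 pages of 4 KiB = 128 MiB per section (x86_64 assumption). */
#define SECTION_SHIFT_SKETCH 15
#define NR_SECTIONS_SKETCH   8

/* Hypothetical table: true if a mem_map was allocated for the section. */
static bool section_present[NR_SECTIONS_SKETCH];

static bool pfn_valid_sketch(unsigned long pfn)
{
	unsigned long nr = pfn >> SECTION_SHIFT_SKETCH;

	if (nr >= NR_SECTIONS_SKETCH)
		return false;
	return section_present[nr];
}

/* Count pfns in [start, end) that the pfn_valid() check would skip. */
static size_t count_invalid(unsigned long start, unsigned long end)
{
	size_t n = 0;

	for (unsigned long pfn = start; pfn < end; pfn++)
		if (!pfn_valid_sketch(pfn))
			n++;
	return n;
}
```

Without such a check, calling pfn_to_page() on the skipped pfns would hand back a pointer into mem_map that was never allocated.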
> 
> We can very easily hit the !early_page_initialised() test in
> memblock_free_pages() since the deferred initializer hasn't even
> started yet. As a result we drop the pages on the floor.
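The second point can be modeled the same way. With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, memblock_free_pages() returns without freeing anything when early_page_initialised() reports the pfn as not yet initialized, essentially when it is at or above the node's first deferred pfn. The sketch assumes a single node and a hypothetical `first_deferred_pfn` argument, and counts how many pfns of a range the old loop actually hands to the allocator.

```c
#include <stddef.h>

/*
 * Model of the old memblock_free_late() loop: a pfn at or above the
 * (hypothetical) first_deferred_pfn fails the early_page_initialised()
 * test inside memblock_free_pages() and is silently not freed.
 * Returns how many pfns in [start, end) actually reach the allocator.
 */
static size_t old_free_late_freed(unsigned long start, unsigned long end,
				  unsigned long first_deferred_pfn)
{
	size_t freed = 0;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (pfn >= first_deferred_pfn)
			continue; /* dropped on the floor */
		freed++;
	}
	return freed;
}
```

Note that the removed totalram_pages_inc() in the old loop ran for every pfn, including the dropped ones.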
> 
> Now, memblock_free_late() should only ever be called for pages that
> are reserved, and thus for which the struct page has already been
> initialized by memmap_init_reserved_pages(). That holds as long as
> we check pfn_valid(), since a big enough hole might mean entire
> sections of the mem_map are never allocated at all.
> 
> So it should be safe to just free them normally and ignore the
> deferred initializer, which will skip over them as it skips over
> anything still in the memblock reserved list.
> 
> This helps recover something like 140MB of RAM on EC2 t3a.nano
> instances, which only have 512MB to begin with (as to why UEFI uses
> that much, that's a question for another day).
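For scale, assuming "MB" here means MiB and 4 KiB pages: 140 MiB is 35840 pages, out of 131072 pages for 512 MiB, a bit over a quarter of the instance's memory. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: number of pages of size page_size in `mib` MiB. */
static uint64_t mib_to_pages(uint64_t mib, uint64_t page_size)
{
	return (mib << 20) / page_size;
}

/* Integer percentage of `part` relative to `whole` (truncated). */
static unsigned int percent_of(uint64_t part, uint64_t whole)
{
	return (unsigned int)(part * 100 / whole);
}
```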
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> 
> v2. Reworked a bit to add the pfn_valid() check, remove the bogus
> memblock access in debug mode, and add a test of PageReserved() for
> sanity.
> 
> We could separately do a patch forcing UEFI Boot Services into
> memblock.memory but so far I haven't hit a case where that is
> necessary.
> 
>  mm/memblock.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 905d06b16348a..71eb25b68851e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1770,9 +1770,14 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
>  	cursor = PFN_UP(base);
>  	end = PFN_DOWN(base + size);
>  
> +	/* Only free pages that were reserved */
>  	for (; cursor < end; cursor++) {
> -		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> -		totalram_pages_inc();
> +		struct page *p;
> +		if (!pfn_valid(cursor))
> +			continue;
> +		p = pfn_to_page(cursor);
> +		if (!WARN_ON(!PageReserved(p)))
> +			free_reserved_page(p);
>  	}
>  }
>  
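A userspace model of what the reworked loop does per pfn, assuming pfn_valid() and PageReserved() can be reduced to two booleans (the struct and function names are hypothetical). Unlike the old loop there is no deferred-pfn cutoff: every valid, reserved page goes to free_reserved_page(), which does its own totalram accounting via adjust_managed_page_count(), so the explicit totalram_pages_inc() is no longer needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pfn state standing in for pfn_valid()/PageReserved(). */
struct pfn_state {
	bool valid;    /* a struct page exists for this pfn */
	bool reserved; /* PG_reserved is set */
};

/*
 * Model of the v2 loop: skip pfns without a struct page, count (where the
 * patch would WARN) pages that are not reserved and leave them alone,
 * and "free" the rest. Returns the number of pages freed.
 */
static size_t v2_free_late(const struct pfn_state *st, size_t n,
			   size_t *warned)
{
	size_t freed = 0;

	*warned = 0;
	for (size_t i = 0; i < n; i++) {
		if (!st[i].valid)
			continue;
		if (!st[i].reserved) {
			(*warned)++; /* WARN_ON() fires, page is skipped */
			continue;
		}
		freed++; /* free_reserved_page() */
	}
	return freed;
}
```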




Thread overview: 33+ messages
2026-02-03  8:02 [PATCH] " Benjamin Herrenschmidt
2026-02-03 18:40 ` Mike Rapoport
2026-02-03 19:53   ` Benjamin Herrenschmidt
2026-02-04  7:39     ` Mike Rapoport
2026-02-04  9:02       ` Benjamin Herrenschmidt
2026-02-06 10:33         ` Mike Rapoport
2026-02-10  1:04           ` Benjamin Herrenschmidt
2026-02-10  2:10             ` Benjamin Herrenschmidt
2026-02-10  6:17               ` Benjamin Herrenschmidt
2026-02-10  8:34                 ` Benjamin Herrenschmidt
2026-02-10 14:32                   ` Mike Rapoport
2026-02-10 23:23                     ` Benjamin Herrenschmidt
2026-02-11  5:20                       ` Mike Rapoport
2026-02-16  5:34                       ` Benjamin Herrenschmidt
2026-02-16  6:51                         ` Benjamin Herrenschmidt
2026-02-16  4:53                     ` Benjamin Herrenschmidt
2026-02-16 15:28                       ` Mike Rapoport
2026-02-16 10:36           ` Alexander Potapenko
2026-02-17  8:28 ` [PATCH v2] " Benjamin Herrenschmidt
2026-02-17 12:32   ` Mike Rapoport
2026-02-17 22:00     ` Benjamin Herrenschmidt
2026-02-17 21:47   ` Benjamin Herrenschmidt [this message]
2026-02-18  0:15     ` Benjamin Herrenschmidt
2026-02-18  8:05       ` Mike Rapoport
2026-02-19  2:48         ` Benjamin Herrenschmidt
2026-02-19 10:16           ` Mike Rapoport
2026-02-19 22:46             ` Benjamin Herrenschmidt
2026-02-20  4:57               ` Benjamin Herrenschmidt
2026-02-20  9:09                 ` Mike Rapoport
2026-02-20  9:00               ` Mike Rapoport
2026-02-20  5:12             ` Benjamin Herrenschmidt
2026-02-20  5:15             ` Benjamin Herrenschmidt
2026-02-20  5:47             ` Benjamin Herrenschmidt
