From: Usama Arif <usamaarif642@gmail.com>
To: Changyuan Lyu <changyuanl@google.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
Mike Rapoport <rppt@kernel.org>
Cc: anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
corbet@lwn.net, dave.hansen@linux.intel.com,
devicetree@vger.kernel.org, dwmw2@infradead.org,
ebiederm@xmission.com, graf@amazon.com, hpa@zytor.com,
jgowans@amazon.com, kexec@lists.infradead.org, krzk@kernel.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, luto@kernel.org, mark.rutland@arm.com,
mingo@redhat.com, pasha.tatashin@soleen.com, pbonzini@redhat.com,
peterz@infradead.org, ptyadav@amazon.de, robh@kernel.org,
rostedt@goodmis.org, rppt@kernel.org, saravanak@google.com,
skinsburskii@linux.microsoft.com, tglx@linutronix.de,
thomas.lendacky@amd.com, will@kernel.org, x86@kernel.org,
Breno Leitao <leitao@debian.org>,
thevlad@meta.com
Subject: Re: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M
Date: Mon, 24 Nov 2025 19:24:58 +0000
Message-ID: <a0f875f1-45ad-4dfc-b5c8-ecb51b242523@gmail.com>
In-Reply-To: <20250509074635.3187114-13-changyuanl@google.com>
On 09/05/2025 08:46, Changyuan Lyu wrote:
> From: Alexander Graf <graf@amazon.com>
>
> KHO kernels are special and use only scratch memory for memblock
> allocations, but memory below 1M is ignored by kernel after early boot
> and cannot be naturally marked as scratch.
>
> To allow allocation of the real-mode trampoline and a few (if any) other
> very early allocations from below 1M forcibly mark the memory below 1M
> as scratch.
>
> After real mode trampoline is allocated, clear that scratch marking.
>
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Co-developed-by: Changyuan Lyu <changyuanl@google.com>
> Signed-off-by: Changyuan Lyu <changyuanl@google.com>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> arch/x86/kernel/e820.c | 18 ++++++++++++++++++
> arch/x86/realmode/init.c | 2 ++
> 2 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 9920122018a0b..c3acbd26408ba 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void)
> memblock_add(entry->addr, entry->size);
> }
>
> + /*
> + * At this point memblock is only allowed to allocate from memory
> + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
> + * up in init_mem_mapping().
> + *
> + * KHO kernels are special and use only scratch memory for memblock
> + * allocations, but memory below 1M is ignored by kernel after early
> + * boot and cannot be naturally marked as scratch.
> + *
> + * To allow allocation of the real-mode trampoline and a few (if any)
> + * other very early allocations from below 1M forcibly mark the memory
> + * below 1M as scratch.
> + *
> + * After real mode trampoline is allocated, we clear that scratch
> + * marking.
> + */
> + memblock_mark_kho_scratch(0, SZ_1M);
> +
> /*
> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
> * to even less without it.
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index f9bc444a3064d..9b9f4534086d2 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void)
> * setup_arch().
> */
> memblock_reserve(0, SZ_1M);
> +
> + memblock_clear_kho_scratch(0, SZ_1M);
> }
>
> static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
Hello!
I am working with Breno, who reported that we are seeing the warning below at boot
while rolling out 6.16 across the Meta fleet. It is difficult to reproduce manually on
a single host, but we see it several times a day across the fleet.
20:16:33 ------------[ cut here ]------------
20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330
20:16:33 Modules linked in:
20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE
20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC
20:16:33 RIP: 0010:memblock_add_range+0x316/0x330
20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc
20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000
20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002
20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8
20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101
20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00
20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000
20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0
20:16:33 Call Trace:
20:16:33 <TASK>
20:16:33 ? __memblock_reserve+0x75/0x80
20:16:33 ? setup_arch+0x30f/0xb10
20:16:33 ? start_kernel+0x58/0x960
20:16:33 ? x86_64_start_reservations+0x20/0x20
20:16:33 ? x86_64_start_kernel+0x13d/0x140
20:16:33 ? common_startup_64+0x13e/0x140
20:16:33 </TASK>
20:16:33 ---[ end trace 0000000000000000 ]---
Rolling out with memblock=debug is not really an option in a large-scale fleet because of the
time it adds to boot. I did try it on one host (without reproducing the issue), and I see:
[ 0.000616] memory.cnt = 0x6
[ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40
[ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40
[ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0
...
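As a cross-check on the dump above, here is a minimal user-space sketch that recomputes each printed size from its inclusive range and tests the 0x40 flag bit. The struct and the toy_* names are hypothetical and exist only for this illustration; the values are copied from the three rows shown, and the elided rows are not guessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as printed in the dump; the macro name is illustrative only. */
#define TOY_FLAG_SCRATCH 0x40u

struct toy_region {
	uint64_t start;     /* first byte of the range */
	uint64_t end;       /* last byte of the range (inclusive) */
	uint64_t size;      /* size as printed in the dump */
	unsigned int flags; /* flags as printed in the dump */
};

/* The three rows visible in the dump. */
static const struct toy_region toy_dump[] = {
	{ 0x0000000000001000ULL, 0x000000000009bfffULL, 0x000000000009b000ULL, 0x40 },
	{ 0x000000000009f000ULL, 0x000000000009ffffULL, 0x0000000000001000ULL, 0x40 },
	{ 0x0000000000100000ULL, 0x000000005ed09fffULL, 0x000000005ec0a000ULL, 0x0 },
};

/* Size implied by an inclusive [start, end] range. */
static uint64_t toy_region_size(const struct toy_region *r)
{
	return r->end - r->start + 1;
}

/* True if the row carries the 0x40 bit. */
static bool toy_is_scratch(const struct toy_region *r)
{
	return (r->flags & TOY_FLAG_SCRATCH) != 0;
}

/*
 * Count the rows that carry the scratch bit, asserting along the way
 * that every printed size matches its printed range.
 */
static size_t toy_count_scratch(void)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(toy_dump) / sizeof(toy_dump[0]); i++) {
		assert(toy_region_size(&toy_dump[i]) == toy_dump[i].size);
		if (toy_is_scratch(&toy_dump[i]))
			n++;
	}
	return n;
}
```

Both regions below 1M carry the bit, while the region starting at 0x100000 does not.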
The 0x40 flag (MEMBLOCK_KHO_SCRATCH) comes from the memblock_mark_kho_scratch() call in e820__memblock_setup(). I believe this
call should be under an ifdef, as in the diff at the end? (Happy to send this as a patch for review if it makes sense.)
We have KEXEC_HANDOVER disabled in our defconfig, so MEMBLOCK_KHO_SCRATCH shouldn't be selected and
we shouldn't have any MEMBLOCK_KHO_SCRATCH regions in memblock.
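The effect of guarding the call sites can be sketched outside the kernel. This is a toy illustration only: TOY_KHO_SCRATCH stands in for CONFIG_MEMBLOCK_KHO_SCRATCH, and the toy_* functions are hypothetical stand-ins rather than the memblock API.

```c
#include <stdint.h>

/*
 * Hypothetical build switch standing in for CONFIG_MEMBLOCK_KHO_SCRATCH.
 * It is left undefined here to model a config with KHO disabled.
 */
/* #define TOY_KHO_SCRATCH 1 */

#define TOY_FLAG_SCRATCH 0x40u
#define TOY_SZ_1M        0x100000ULL

/* Flags of a single toy region covering the memory below 1M. */
static unsigned int toy_low_flags;

/* Toy stand-in for marking a range as scratch. */
static void toy_mark_scratch(uint64_t base, uint64_t size)
{
	(void)base;
	(void)size;
	toy_low_flags |= TOY_FLAG_SCRATCH;
}

/* Toy stand-in for clearing the scratch marking again. */
static void toy_clear_scratch(uint64_t base, uint64_t size)
{
	(void)base;
	(void)size;
	toy_low_flags &= ~TOY_FLAG_SCRATCH;
}

/* Early setup step: the mark call is compiled in only when the switch is set. */
static unsigned int toy_early_setup(void)
{
#ifdef TOY_KHO_SCRATCH
	toy_mark_scratch(0, TOY_SZ_1M);
#endif
	return toy_low_flags;
}

/* Later step: the matching clear call gets the same guard. */
static unsigned int toy_late_setup(void)
{
#ifdef TOY_KHO_SCRATCH
	toy_clear_scratch(0, TOY_SZ_1M);
#endif
	return toy_low_flags;
}
```

With the switch undefined, neither call is compiled in and the toy region never carries the 0x40 bit, which is the behaviour the proposed guard aims for when KHO is disabled.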
The other thing I did was insert a while(1) just before the warning and inspect the registers in qemu.
At that point R14 held the base and R15 held the size.
In the warning R14 is 0x100000, meaning that someone is reserving a region with a flag other than MEMBLOCK_NONE
at the boundary of the MEMBLOCK_KHO_SCRATCH range.
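The condition described above, a request with a flag other than MEMBLOCK_NONE meeting an existing region whose flags differ, can be modelled in a few lines. This is a toy model under that assumption, not the check at mm/memblock.c:668. All names and the 0x1 flag value are hypothetical.

```c
#include <stdint.h>

#define TOY_FLAG_NONE 0x0u

struct toy_rgn {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/*
 * Count the existing regions that overlap [base, base + size) while the
 * request carries a non-NONE flag that differs from the region's flags.
 * Each such region models one place where a warning would fire.
 */
static int toy_count_flag_conflicts(const struct toy_rgn *rgns, int nr,
				    uint64_t base, uint64_t size,
				    unsigned int flags)
{
	uint64_t end = base + size;
	int conflicts = 0;
	int i;

	for (i = 0; i < nr; i++) {
		uint64_t rend = rgns[i].base + rgns[i].size;

		/* Skip regions that do not overlap the request. */
		if (rgns[i].base >= end || rend <= base)
			continue;
		if (flags != TOY_FLAG_NONE && flags != rgns[i].flags)
			conflicts++;
	}
	return conflicts;
}

/* One toy region below 1M carrying flag 0x40. */
static const struct toy_rgn toy_rgns[] = {
	{ 0x0ULL, 0x100000ULL, 0x40u },
};
```

In this model a request with flag 0x1 that overlaps the flagged region counts as a conflict, while the same range requested with no flags, or a flagged range starting exactly at 0x100000, does not.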
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c3acbd26408ba..26e4062a0bd09 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1299,6 +1299,7 @@ void __init e820__memblock_setup(void)
memblock_add(entry->addr, entry->size);
}
+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
/*
* At this point memblock is only allowed to allocate from memory
* below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
@@ -1316,7 +1317,7 @@ void __init e820__memblock_setup(void)
* marking.
*/
memblock_mark_kho_scratch(0, SZ_1M);
-
+#endif
/*
* 32-bit systems are limited to 4BG of memory even with HIGHMEM and
* to even less without it.
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 88be32026768c..1cd80293a3e23 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -66,8 +66,9 @@ void __init reserve_real_mode(void)
* setup_arch().
*/
memblock_reserve(0, SZ_1M);
-
+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
memblock_clear_kho_scratch(0, SZ_1M);
+#endif
}
static void __init sme_sev_setup_real_mode(struct trampoline_header *th)