From: "H. Peter Anvin" <hpa@zytor.com>
To: Usama Arif <usamaarif642@gmail.com>,
Changyuan Lyu <changyuanl@google.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
Mike Rapoport <rppt@kernel.org>
Cc: anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
corbet@lwn.net, dave.hansen@linux.intel.com,
devicetree@vger.kernel.org, dwmw2@infradead.org,
ebiederm@xmission.com, graf@amazon.com, jgowans@amazon.com,
kexec@lists.infradead.org, krzk@kernel.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, luto@kernel.org, mark.rutland@arm.com,
mingo@redhat.com, pasha.tatashin@soleen.com, pbonzini@redhat.com,
peterz@infradead.org, ptyadav@amazon.de, robh@kernel.org,
rostedt@goodmis.org, rppt@kernel.org, saravanak@google.com,
skinsburskii@linux.microsoft.com, tglx@linutronix.de,
thomas.lendacky@amd.com, will@kernel.org, x86@kernel.org,
Breno Leitao <leitao@debian.org>,
thevlad@meta.com
Subject: Re: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M
Date: Mon, 24 Nov 2025 16:56:34 -0800
Message-ID: <22BDBF5C-C831-4BBC-A854-20CA77234084@zytor.com>
In-Reply-To: <a0f875f1-45ad-4dfc-b5c8-ecb51b242523@gmail.com>
On November 24, 2025 11:24:58 AM PST, Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>On 09/05/2025 08:46, Changyuan Lyu wrote:
>> From: Alexander Graf <graf@amazon.com>
>>
>> KHO kernels are special and use only scratch memory for memblock
>> allocations, but memory below 1M is ignored by the kernel after early
>> boot and cannot be naturally marked as scratch.
>>
>> To allow allocation of the real-mode trampoline and a few (if any) other
>> very early allocations from below 1M, forcibly mark the memory below 1M
>> as scratch.
>>
>> After the real-mode trampoline is allocated, clear that scratch marking.
>>
>> Signed-off-by: Alexander Graf <graf@amazon.com>
>> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Co-developed-by: Changyuan Lyu <changyuanl@google.com>
>> Signed-off-by: Changyuan Lyu <changyuanl@google.com>
>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>> ---
>> arch/x86/kernel/e820.c | 18 ++++++++++++++++++
>> arch/x86/realmode/init.c | 2 ++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>> index 9920122018a0b..c3acbd26408ba 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
>> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void)
>> memblock_add(entry->addr, entry->size);
>> }
>>
>> + /*
>> + * At this point memblock is only allowed to allocate from memory
>> + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
>> + * up in init_mem_mapping().
>> + *
>> + * KHO kernels are special and use only scratch memory for memblock
>> + * allocations, but memory below 1M is ignored by kernel after early
>> + * boot and cannot be naturally marked as scratch.
>> + *
>> + * To allow allocation of the real-mode trampoline and a few (if any)
>> + * other very early allocations from below 1M forcibly mark the memory
>> + * below 1M as scratch.
>> + *
>> + * After real mode trampoline is allocated, we clear that scratch
>> + * marking.
>> + */
>> + memblock_mark_kho_scratch(0, SZ_1M);
>> +
>> /*
>> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
>> * to even less without it.
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index f9bc444a3064d..9b9f4534086d2 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void)
>> * setup_arch().
>> */
>> memblock_reserve(0, SZ_1M);
>> +
>> + memblock_clear_kho_scratch(0, SZ_1M);
>> }
>>
>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
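A toy model of the lifecycle the quoted patch describes: mark [0, 1M) as scratch, let a scratch-only allocator serve early allocations from it, then clear the marking once the real-mode trampoline has been reserved. This is a standalone userspace sketch, not the memblock API. Every toy_ name is hypothetical, the region layout is borrowed from the memblock=debug output quoted later in this message, and it assumes region boundaries line up with the marked range (real memblock splits regions at the boundaries).

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SZ_1M        0x100000ULL
#define TOY_KHO_SCRATCH  0x40u        /* flag value seen in the memblock=debug output */
#define TOY_ALLOC_FAIL   UINT64_MAX   /* 0 is a valid address, so use a sentinel */

/* Hypothetical, heavily simplified stand-in for a memblock memory region. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* Layout loosely based on the memblock=debug output quoted in this thread. */
static struct toy_region regions[] = {
	{ 0x1000,   0x9b000,    0 },
	{ 0x9f000,  0x1000,     0 },
	{ 0x100000, 0x5ec0a000, 0 },
};
#define NR_REGIONS (sizeof(regions) / sizeof(regions[0]))

/*
 * Set (set != 0) or clear a flag on every region lying entirely inside
 * [base, base + size). Unlike real memblock, regions are never split.
 */
static void toy_setclr(uint64_t base, uint64_t size, unsigned int flag, int set)
{
	for (size_t i = 0; i < NR_REGIONS; i++) {
		struct toy_region *r = &regions[i];

		if (r->base >= base && r->base + r->size <= base + size) {
			if (set)
				r->flags |= flag;
			else
				r->flags &= ~flag;
		}
	}
}

/*
 * "Scratch-only" allocation: only regions carrying the scratch flag are
 * eligible, and the allocation must end at or below @limit. Memory is
 * carved off the bottom of the first region that fits.
 */
static uint64_t toy_alloc_scratch_only(uint64_t size, uint64_t limit)
{
	for (size_t i = 0; i < NR_REGIONS; i++) {
		struct toy_region *r = &regions[i];

		if (!(r->flags & TOY_KHO_SCRATCH) || r->size < size)
			continue;
		if (r->base + size > limit)
			continue;

		uint64_t addr = r->base;
		r->base += size;
		r->size -= size;
		return addr;
	}
	return TOY_ALLOC_FAIL;
}
```

Calling toy_setclr(0, TOY_SZ_1M, TOY_KHO_SCRATCH, 1) makes the two low regions eligible, so a 4K allocation below 1M succeeds at 0x1000. After the matching clear, the scratch-only allocator can no longer use that memory.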
>
>Hello!
>
>I am working with Breno, who reported that we are seeing the warning below at boot
>when rolling out 6.16 in the Meta fleet. It is difficult to reproduce manually on a single host,
>but we are seeing it several times a day across the fleet.
>
> 20:16:33 ------------[ cut here ]------------
> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330
> 20:16:33 Modules linked in:
> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE
> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC
> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330
> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc
> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000
> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002
> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8
> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101
> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00
> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000
> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0
> 20:16:33 Call Trace:
> 20:16:33 <TASK>
> 20:16:33 ? __memblock_reserve+0x75/0x80
> 20:16:33 ? setup_arch+0x30f/0xb10
> 20:16:33 ? start_kernel+0x58/0x960
> 20:16:33 ? x86_64_start_reservations+0x20/0x20
> 20:16:33 ? x86_64_start_kernel+0x13d/0x140
> 20:16:33 ? common_startup_64+0x13e/0x140
> 20:16:33 </TASK>
> 20:16:33 ---[ end trace 0000000000000000 ]---
>
>
>Rolling out with memblock=debug is not really an option in a large-scale fleet because of the
>boot time it adds. But I did try it on one of the hosts (without reproducing the issue), and I see:
>
>[ 0.000616] memory.cnt = 0x6
>[ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40
>[ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40
>[ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0
>...
>
>The 0x40 (MEMBLOCK_KHO_SCRATCH) comes from the memblock_mark_kho_scratch() call in e820__memblock_setup(). I believe this
>should be under an #ifdef, as in the diff at the end? (Happy to send this as a patch for review if it makes sense.)
>We have KEXEC_HANDOVER disabled in our defconfig, so MEMBLOCK_KHO_SCRATCH shouldn't be selected, and
>we shouldn't have any MEMBLOCK_KHO_SCRATCH-type regions in our memblock reservations.
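A small helper for scanning a memblock=debug dump like the one above for regions carrying the 0x40 (MEMBLOCK_KHO_SCRATCH) flag, and for checking whether any of them extend to or past 1M. If none do, the flagged regions are consistent with coming only from the [0, 1M) marking. The line format is inferred from the three lines quoted here; parse_memblock_line() and scratch_above_1m() are hypothetical names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DBG_SZ_1M        0x100000ULL
#define DBG_KHO_SCRATCH  0x40u   /* MEMBLOCK_KHO_SCRATCH, per the analysis above */

struct dbg_region {
	unsigned int idx;
	uint64_t start, end;   /* end is inclusive, as printed by memblock=debug */
	uint64_t size;
	unsigned int flags;
};

/*
 * Parse one "memory[...]" line of the dump (any timestamp prefix is skipped).
 * Returns 1 on success, 0 for lines such as "memory.cnt = 0x6".
 */
static int parse_memblock_line(const char *line, struct dbg_region *r)
{
	const char *p = strstr(line, "memory[");

	if (!p)
		return 0;
	return sscanf(p, "memory[0x%x] [0x%" SCNx64 "-0x%" SCNx64 "], 0x%" SCNx64
		      " bytes flags: 0x%x",
		      &r->idx, &r->start, &r->end, &r->size, &r->flags) == 5;
}

/* Returns 1 if a scratch-flagged region reaches 1M or beyond. */
static int scratch_above_1m(const struct dbg_region *r)
{
	return (r->flags & DBG_KHO_SCRATCH) && r->end >= DBG_SZ_1M;
}
```

Fed the three region lines above, the two 0x40 regions both end below 0x100000 and the region starting at 1M has flags 0x0.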
>
>The other thing I did was insert a while(1) just before the warning and inspect the registers in QEMU.
>R14 held the base and R15 held the size at that point.
>In the warning, R14 is 0x100000, meaning that someone is reserving a region with a flag different from MEMBLOCK_NONE
>at the boundary of the MEMBLOCK_KHO_SCRATCH range.
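A toy model of the kind of check that fires in memblock_add_range(). It assumes the warning is a flags comparison made when the lower, uncovered part of a new range is inserted in front of an existing region it overlaps. Check mm/memblock.c in 6.16 for the exact code. Regions are assumed sorted and non-overlapping; struct toy_rgn and toy_flag_mismatches() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified region; the array is sorted by base, non-overlapping. */
struct toy_rgn {
	uint64_t base, size;
	unsigned int flags;
};

/*
 * Count how often adding [base, base + size) with @flags would insert its
 * lower part in front of an existing region whose flags differ.
 * Assumed to mirror the flags check in memblock_add_range().
 */
static int toy_flag_mismatches(const struct toy_rgn *rgns, size_t n,
			       uint64_t base, uint64_t size, unsigned int flags)
{
	uint64_t end = base + size;
	int warnings = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t rbase = rgns[i].base;
		uint64_t rend = rbase + rgns[i].size;

		if (rbase >= end)
			break;          /* sorted: nothing further can overlap */
		if (rend <= base)
			continue;       /* entirely below the new range */
		if (rbase > base && flags != rgns[i].flags)
			warnings++;     /* the real code would WARN_ON() here */
		base = rend < end ? rend : end;  /* everything below rend is handled */
	}
	return warnings;
}
```

With the two 0x40 regions below 1M, adding [0, 1M) with flags 0 trips the check twice, the same range with flags 0x40 does not, and a range starting exactly at a region's base never does, because only the part in front of an existing region is compared.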
>
>diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>index c3acbd26408ba..26e4062a0bd09 100644
>--- a/arch/x86/kernel/e820.c
>+++ b/arch/x86/kernel/e820.c
>@@ -1299,6 +1299,7 @@ void __init e820__memblock_setup(void)
> memblock_add(entry->addr, entry->size);
> }
>
>+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> /*
> * At this point memblock is only allowed to allocate from memory
> * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
>@@ -1316,7 +1317,7 @@ void __init e820__memblock_setup(void)
> * marking.
> */
> memblock_mark_kho_scratch(0, SZ_1M);
>-
>+#endif
> /*
> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
> * to even less without it.
>diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>index 88be32026768c..1cd80293a3e23 100644
>--- a/arch/x86/realmode/init.c
>+++ b/arch/x86/realmode/init.c
>@@ -66,8 +66,9 @@ void __init reserve_real_mode(void)
> * setup_arch().
> */
> memblock_reserve(0, SZ_1M);
>-
>+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> memblock_clear_kho_scratch(0, SZ_1M);
>+#endif
> }
>
> static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
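An alternative to an #ifdef at each call site, in line with the kernel coding-style advice to keep conditional compilation in headers, is a pair of static inline no-op stubs used when the option is off. This is a standalone sketch: CONFIG_MEMBLOCK_KHO_SCRATCH is only simulated as a plain preprocessor macro, and all toy_ names are hypothetical stand-ins for memblock_mark_kho_scratch(), memblock_clear_kho_scratch() and their callers. It is not what include/linux/memblock.h contains.

```c
#include <stdint.h>

#define TOY_KHO_SCRATCH 0x40u

/* Simulated memblock flags for the [0, 1M) range. */
static unsigned int toy_low_mem_flags;

/* Header-style gating: real helpers when the option is on, no-op stubs otherwise. */
#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
static inline int toy_mark_kho_scratch(uint64_t base, uint64_t size)
{
	(void)base; (void)size;   /* the toy only tracks the single low range */
	toy_low_mem_flags |= TOY_KHO_SCRATCH;
	return 0;
}

static inline int toy_clear_kho_scratch(uint64_t base, uint64_t size)
{
	(void)base; (void)size;
	toy_low_mem_flags &= ~TOY_KHO_SCRATCH;
	return 0;
}
#else
static inline int toy_mark_kho_scratch(uint64_t base, uint64_t size)
{
	(void)base; (void)size;   /* option off: nothing is ever marked */
	return 0;
}

static inline int toy_clear_kho_scratch(uint64_t base, uint64_t size)
{
	(void)base; (void)size;
	return 0;
}
#endif

/* Call sites stay free of #ifdef, like e820__memblock_setup() and reserve_real_mode(). */
static void toy_e820_setup(void)
{
	toy_mark_kho_scratch(0, 0x100000);
}

static void toy_reserve_real_mode(void)
{
	toy_clear_kho_scratch(0, 0x100000);
}
```

Built without CONFIG_MEMBLOCK_KHO_SCRATCH, both calls compile to nothing and the scratch flag is never set. Checking IS_ENABLED() inside a single function body would be another way to get the same effect.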
What does "scratch" mean in this exact context? (Sorry, I don't have the code in front of me.)