From: Usama Arif <usamaarif642@gmail.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: Pratyush Yadav <pratyush@kernel.org>,
Changyuan Lyu <changyuanl@google.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
corbet@lwn.net, dave.hansen@linux.intel.com,
devicetree@vger.kernel.org, dwmw2@infradead.org,
ebiederm@xmission.com, graf@amazon.com, hpa@zytor.com,
jgowans@amazon.com, kexec@lists.infradead.org, krzk@kernel.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, luto@kernel.org, mark.rutland@arm.com,
mingo@redhat.com, pasha.tatashin@soleen.com, pbonzini@redhat.com,
peterz@infradead.org, robh@kernel.org, rostedt@goodmis.org,
saravanak@google.com, skinsburskii@linux.microsoft.com,
tglx@linutronix.de, thomas.lendacky@amd.com, will@kernel.org,
x86@kernel.org, Breno Leitao <leitao@debian.org>,
thevlad@meta.com
Subject: Re: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M
Date: Wed, 26 Nov 2025 07:25:48 +0000
Message-ID: <d38aeb5d-5a46-4e8d-b3c1-f2b4cdb15b04@gmail.com>
In-Reply-To: <aSaazgjKX8PfFDXf@kernel.org>
On 26/11/2025 06:14, Mike Rapoport wrote:
> On Tue, Nov 25, 2025 at 06:47:15PM +0000, Usama Arif wrote:
>>
>>
>> On 25/11/2025 13:50, Mike Rapoport wrote:
>>> Hi,
>>>
>>> On Tue, Nov 25, 2025 at 02:15:34PM +0100, Pratyush Yadav wrote:
>>>> On Mon, Nov 24 2025, Usama Arif wrote:
>>>
>>>>>> --- a/arch/x86/realmode/init.c
>>>>>> +++ b/arch/x86/realmode/init.c
>>>>>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void)
>>>>>> * setup_arch().
>>>>>> */
>>>>>> memblock_reserve(0, SZ_1M);
>>>>>> +
>>>>>> + memblock_clear_kho_scratch(0, SZ_1M);
>>>>>> }
>>>>>>
>>>>>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
>>>>>
>>>>> Hello!
>>>>>
>>>>> I am working with Breno, who reported that we are seeing the warning below at boot
>>>>> when rolling out 6.16 in the Meta fleet. It is difficult to reproduce manually on a single
>>>>> host, but we are seeing it several times a day across the fleet.
>>>>>
>>>>> 20:16:33 ------------[ cut here ]------------
>>>>> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330
>>>>> 20:16:33 Modules linked in:
>>>>> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE
>>>>> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC
>>>>> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330
>>>>> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc
>>>>> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000
>>>>> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002
>>>>> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8
>>>>> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101
>>>>> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00
>>>>> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000
>>>>> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
>>>>> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0
>>>>> 20:16:33 Call Trace:
>>>>> 20:16:33 <TASK>
>>>>> 20:16:33 ? __memblock_reserve+0x75/0x80
>>>
>>> Do you have faddr2line for this?
>>>>> 20:16:33 ? setup_arch+0x30f/0xb10
>>>
>>> And this?
>>>
>>
>>
>> Thanks for this! I think it helped narrow down the problem.
>>
>> The stack is:
>>
>> 20:16:33 ? __memblock_reserve (mm/memblock.c:936)
>> 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956)
>> 20:16:33 ? start_kernel (init/main.c:922)
>> 20:16:33 ? x86_64_start_reservations (arch/x86/kernel/ebda.c:57)
>> 20:16:33 ? x86_64_start_kernel (arch/x86/kernel/head64.c:231)
>> 20:16:33 ? common_startup_64 (arch/x86/kernel/head_64.S:419)
>>
>> This is a 6.16 kernel.
>>
>> 20:16:33 ? __memblock_reserve (mm/memblock.c:936)
>> That's the memblock_add_range() call in __memblock_reserve().
>>
>> 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956)
>> That is parse_setup_data -> add_early_ima_buffer -> memblock_reserve_kern
>>
>>
>> I put a simple print like below:
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 680d1b6dfea41..cc97ffc0083c7 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -409,6 +409,7 @@ static void __init add_early_ima_buffer(u64 phys_addr)
>> }
>>
>> if (data->size) {
>> + pr_err("PPP %s %s %d data->addr %llx, data->size %llx \n", __FILE__, __func__, __LINE__, data->addr, data->size);
>> memblock_reserve_kern(data->addr, data->size);
>> ima_kexec_buffer_phys = data->addr;
>> ima_kexec_buffer_size = data->size;
>>
>>
>> and I see (on a boot that did not reproduce the warning):
>>
>> [ 0.000000] PPP arch/x86/kernel/setup.c add_early_ima_buffer 412 data->addr 9e000, data->size 1000
>> ....
>
> So it looks like in cases when the warning reproduces there's something
> that reserves memory overlapping with IMA buffer before
> add_early_ima_buffer().
>
>>
>> [ 0.000348] MEMBLOCK configuration:
>> [ 0.000348] memory size = 0x0000003fea329ff0 reserved size = 0x00000000050c969b
>> [ 0.000350] memory.cnt = 0x5
>> [ 0.000351] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x40
>> [ 0.000353] memory[0x1] [0x0000000000100000-0x0000000067c65fff], 0x0000000067b66000 bytes flags: 0x0
>> [ 0.000355] memory[0x2] [0x000000006d8db000-0x000000006fffffff], 0x0000000002725000 bytes flags: 0x0
>> [ 0.000356] memory[0x3] [0x0000000100000000-0x000000407fff8fff], 0x0000003f7fff9000 bytes flags: 0x0
>> [ 0.000358] memory[0x4] [0x000000407fffa000-0x000000407fffffff], 0x0000000000006000 bytes flags: 0x0
>> [ 0.000359] reserved.cnt = 0x7
>>
>>
>> So MEMBLOCK_RSRV_KERN and MEMBLOCK_KHO_SCRATCH seem to overlap.
>
> It does not matter; they are set on different arrays. RSRV_KERN is set on
> regions in memblock.reserved and KHO_SCRATCH is set on regions in
> memblock.memory.
>
> So dumping memblock.memory is completely irrelevant, you need to check
> memblock.reserved for potential conflicts.
>
>>>>> 20:16:33 ? start_kernel+0x58/0x960
>>>>> 20:16:33 ? x86_64_start_reservations+0x20/0x20
>>>>> 20:16:33 ? x86_64_start_kernel+0x13d/0x140
>>>>> 20:16:33 ? common_startup_64+0x13e/0x140
>>>>> 20:16:33 </TASK>
>>>>> 20:16:33 ---[ end trace 0000000000000000 ]---
>>>>>
>>>>>
>>>>> Rolling out with memblock=debug is not really an option in a large-scale fleet because of
>>>>> the extra boot time. I did try it on one host (which did not reproduce the issue) and I see:
>>>
>>> Is it a problem to roll out a kernel that has additional debug printouts as
>>> Breno suggested earlier? I.e.
>>>
>>> if (flags != MEMBLOCK_NONE && flags != rgn->flags) {
>>> pr_warn("memblock: Flag mismatch at region [%pa-%pa]\n",
>>> &rgn->base, &rend);
>>> pr_warn(" Existing region flags: %#x\n", rgn->flags);
>>> pr_warn(" New range flags: %#x\n", flags);
>>> pr_warn(" New range: [%pa-%pa]\n", &base, &end);
>>> WARN_ON_ONCE(1);
>>> }
>>>
>>
>> I can add this, but it might be several weeks between putting it in the kernel and that
>> kernel being deployed to enough machines for the warning to show up. I think the IMA buffer
>> coinciding with memblock_mark_kho_scratch() in e820__memblock_setup() could be the reason for
>> the warning. It might be better to fix that case first, deploy it, and see whether the warnings
>> still show up. I can add these prints as well in case it doesn't fix the problem.
>
> I really don't think that effectively disabling memblock_mark_kho_scratch()
> when KHO is disabled will solve the problem because, as I said, the flags it
> sets are on a different structure than the flags set by
> memblock_reserve_kern().
>
>>> If you have the logs from failing boots up to the point where SLUB reports
>>> its initialization, e.g.
>>>
>>> [ 0.134377] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
>>>
>>> something there may hint about what's the issue.
>>
>> The boot doesn't fail; it's just giving warnings across the fleet.
>> I have added the dmesg to the end of the mail.
>
> Thanks, unfortunately nothing jumped at me there.
>
>> Does something like this look good? I can try deploying it (although it will take some time to find out).
>> We can get it upstream as well, as that makes backports easier.
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 154f1d73b61f2..257c6f0eee03d 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1119,8 +1119,13 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
>> */
>> __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
>> {
>> - return memblock_setclr_flag(&memblock.memory, base, size, 1,
>> - MEMBLOCK_KHO_SCRATCH);
>> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>> + if (is_kho_boot())
>
> Please use
>
> if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH))
>
> instead of #ifdef.
>
> If you send a formal patch with it, I'll take it.
> I'd suggest still deploying additional debug printouts internally.
Thanks! I will add the additional debug prints and [1] in the next release.
It will be some time before it makes it into production, so in the meantime I will try
to debug this further using the information you provided above.
[1] https://lore.kernel.org/all/20251126072051.546700-1-usamaarif642@gmail.com/
>
>> + return memblock_setclr_flag(&memblock.memory, base, size, 1,
>> + MEMBLOCK_KHO_SCRATCH);
>> +#else
>> + return 0;
>> +#endif
>> }
>>
>> /**
>> @@ -1133,8 +1138,13 @@ __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size)
>> */
>> __init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size)
>> {
>> - return memblock_setclr_flag(&memblock.memory, base, size, 0,
>> - MEMBLOCK_KHO_SCRATCH);
>> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>> + if (is_kho_boot())
>> + return memblock_setclr_flag(&memblock.memory, base, size, 0,
>> + MEMBLOCK_KHO_SCRATCH);
>> +#else
>
> If nothing sets the flag, _clear is a no-op anyway, but let's update it as well
> for symmetry.
>