From: Pratyush Yadav <pratyush@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>,
Mike Rapoport <rppt@kernel.org>,
Pratyush Yadav <pratyush@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Jonathan Corbet <corbet@lwn.net>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
Alexander Graf <graf@amazon.com>,
David Matlack <dmatlack@google.com>,
David Rientjes <rientjes@google.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Samiullah Khawaja <skhawaja@google.com>,
Vipin Sharma <vipinsh@google.com>,
Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC PATCH 02/10] kho: disable scratch-only earlier in boot
Date: Sun, 7 Dec 2025 00:02:12 +0100
Message-ID: <20251206230222.853493-3-pratyush@kernel.org>
In-Reply-To: <20251206230222.853493-1-pratyush@kernel.org>
Background
==========
Scratch areas
-------------
When KHO is used, allocations are only allowed from scratch areas. The
scratch areas are pre-reserved chunks of memory that are known to not
have any preserved memory. They can safely be used until KHO is able to
parse its serialized data to find out which pages are preserved.
The scratch areas are generally sized to ensure enough memory is
available for early boot allocations, but they should not be excessively
large, to avoid wasting memory.
Gigantic hugepage allocation
----------------------------
Gigantic hugepages are allocated early in boot before memblock releases
pages to the buddy allocator. This is to ensure enough contiguous chunks
of memory are available to satisfy huge page allocations. On x86 this is
done in setup_arch(). On other architectures, including ARM64 (the only
other arch that supports KHO), this is done in mm_core_init().
Problem
=======
Currently during KHO boot, scratch-only mode is still active when
gigantic hugepage allocations are attempted, on both x86 and ARM64.
Since the scratch areas are generally not large enough to accommodate
these allocations, they fail, leaving gigantic hugepages unavailable.
Solution
========
Moving KHO memory init
----------------------
Move KHO memory initialization to before gigantic hugepage allocation.
Disable scratch-only mode as soon as the bitmaps are deserialized, since
there is no longer any reason to stay in it. On x86 this can now be
called twice, once from setup_arch() and once from the generic path in
mm_core_init(), so add a flag to detect this and skip the
double-initialization.
Re-ordering hugepage allocation
-------------------------------
KHO memory initialization uses struct page to store the page order. On
x86, struct pages are not available until paging_init(). If
kho_memory_init() is called before paging_init(), it will cause a page
fault when trying to access them.
But hugepage allocations are currently done before paging_init(). Move
them to just after paging_init(), and call kho_memory_init() right
before them. While in theory this might increase the chance of hugepage
allocation failures, in practice it should have little impact, since
systems usually leave a fair bit of margin for non-hugepage workloads.
Testing results
===============
Normal boot
-----------
On my test system with 7GiB of memory, I tried allocating 6 1G
hugepages. I can get a maximum of 4 1G hugepages both with and without
this patch.
[ 0.039182] HugeTLB: allocating 6 of page size 1.00 GiB failed. Only allocated 4 hugepages.
KHO boot
--------
Without this patch, I cannot get any hugepages:
[ 0.098201] HugeTLB: allocating 6 of page size 1.00 GiB failed. Only allocated 0 hugepages.
With this patch, I am again able to get 4:
[ 0.194657] HugeTLB: allocating 6 of page size 1.00 GiB failed. Only allocated 4 hugepages.
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
---
Notes:
Only tested on x86 so far, not yet on ARM64. This patch can also be
taken independently of the rest of the series: even with plain KHO and
live update not enabled, gigantic hugepages fail to allocate because of
scratch-only mode.
arch/x86/kernel/setup.c | 12 +++++++-----
kernel/liveupdate/kexec_handover.c | 11 ++++++++++-
mm/memblock.c | 1 -
mm/mm_init.c | 8 ++------
4 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 74aa904be6dc..9bf00287c408 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1203,11 +1203,6 @@ void __init setup_arch(char **cmdline_p)
initmem_init();
dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
- if (boot_cpu_has(X86_FEATURE_GBPAGES)) {
- hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
- hugetlb_bootmem_alloc();
- }
-
/*
* Reserve memory for crash kernel after SRAT is parsed so that it
* won't consume hotpluggable memory.
@@ -1219,6 +1214,13 @@ void __init setup_arch(char **cmdline_p)
x86_init.paging.pagetable_init();
+ kho_memory_init();
+
+ if (boot_cpu_has(X86_FEATURE_GBPAGES)) {
+ hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
+ hugetlb_bootmem_alloc();
+ }
+
kasan_init();
/*
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 9aa128909ecf..4cfd5690f356 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1432,14 +1432,23 @@ static void __init kho_release_scratch(void)
}
}
+static bool kho_memory_initialized;
+
void __init kho_memory_init(void)
{
+ if (kho_memory_initialized)
+ return;
+
+ kho_memory_initialized = true;
+
if (kho_in.scratch_phys) {
kho_scratch = phys_to_virt(kho_in.scratch_phys);
- kho_release_scratch();
if (!kho_mem_deserialize(kho_get_fdt()))
kho_in.fdt_phys = 0;
+
+ memblock_clear_kho_scratch_only();
+ kho_release_scratch();
} else {
kho_reserve_scratch();
}
diff --git a/mm/memblock.c b/mm/memblock.c
index c7869860e659..a5682dff526d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2342,7 +2342,6 @@ void __init memblock_free_all(void)
free_unused_memmap();
reset_all_zones_managed_pages();
- memblock_clear_kho_scratch_only();
pages = free_low_memory_core_early();
totalram_pages_add(pages);
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 7712d887b696..93cec06c1c8a 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2679,6 +2679,8 @@ void __init __weak mem_init(void)
void __init mm_core_init(void)
{
arch_mm_preinit();
+
+ kho_memory_init();
hugetlb_bootmem_alloc();
/* Initializations relying on SMP setup */
@@ -2697,12 +2699,6 @@ void __init mm_core_init(void)
kmsan_init_shadow();
stack_depot_early_init();
- /*
- * KHO memory setup must happen while memblock is still active, but
- * as close as possible to buddy initialization
- */
- kho_memory_init();
-
memblock_free_all();
mem_init();
kmem_cache_init();
--
2.43.0