From: Pratyush Yadav <pratyush@kernel.org>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: akpm@linux-foundation.org, bhe@redhat.com, rppt@kernel.org,
jasonmiu@google.com, arnd@arndb.de, coxu@redhat.com,
dave@vasilevsky.ca, ebiggers@google.com, graf@amazon.com,
kees@kernel.org, linux-kernel@vger.kernel.org,
kexec@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 13/13] kho: Introduce high-level memory allocation API
Date: Fri, 14 Nov 2025 18:45:54 +0100 [thread overview]
Message-ID: <mafs0qzu05wz1.fsf@kernel.org> (raw)
In-Reply-To: <20251114155358.2884014-14-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:58 -0500")
On Fri, Nov 14 2025, Pasha Tatashin wrote:
> Currently, clients of KHO must manually allocate memory (e.g., via
> alloc_pages), calculate the page order, and explicitly call
> kho_preserve_folio(). Similarly, cleanup requires separate calls to
> unpreserve and free the memory.
>
> Introduce a high-level API to streamline this common pattern:
>
> - kho_alloc_preserve(size): Allocates physically contiguous, zeroed
> memory and immediately marks it for preservation.
> - kho_free_unpreserve(ptr, size): Unpreserves and frees the memory
> in the current kernel.
> - kho_free_restore(ptr, size): Restores the struct page state of
> preserved memory in the new kernel and immediately frees it to the
> page allocator.
Nit: kho_unpreserve_free() and kho_restore_free() make more sense to me
since that is the order of operations. Having them the other way round
is kind of confusing.
Also, why do the free functions need size? They can get the order from
folio_order(). This would save users of the API from having to store the
size somewhere and make things simpler.
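For example, the unpreserve path could drop the size parameter entirely
(hypothetical name following the suggested ordering, untested sketch):

```
void kho_unpreserve_free(void *mem)
{
	struct folio *folio;

	if (!mem)
		return;

	folio = virt_to_folio(mem);
	kho_unpreserve_folio(folio);
	folio_put(folio);
}
```

folio_put() releases the whole folio whatever its order, so neither the
caller nor the API needs to carry the size around.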
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/kexec_handover.h | 22 +++++--
> kernel/liveupdate/kexec_handover.c | 101 +++++++++++++++++++++++++++++
> 2 files changed, 116 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
> index 80ece4232617..76c496e01877 100644
> --- a/include/linux/kexec_handover.h
> +++ b/include/linux/kexec_handover.h
> @@ -2,8 +2,9 @@
> #ifndef LINUX_KEXEC_HANDOVER_H
> #define LINUX_KEXEC_HANDOVER_H
>
> -#include <linux/types.h>
> +#include <linux/err.h>
> #include <linux/errno.h>
> +#include <linux/types.h>
>
> struct kho_scratch {
> phys_addr_t addr;
> @@ -48,6 +49,9 @@ int kho_preserve_pages(struct page *page, unsigned int nr_pages);
> int kho_unpreserve_pages(struct page *page, unsigned int nr_pages);
> int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation);
> int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation);
> +void *kho_alloc_preserve(size_t size);
> +void kho_free_unpreserve(void *mem, size_t size);
> +void kho_free_restore(void *mem, size_t size);
> struct folio *kho_restore_folio(phys_addr_t phys);
> struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages);
> void *kho_restore_vmalloc(const struct kho_vmalloc *preservation);
> @@ -101,6 +105,14 @@ static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation)
> return -EOPNOTSUPP;
> }
>
> +void *kho_alloc_preserve(size_t size)
> +{
> + return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +void kho_free_unpreserve(void *mem, size_t size) { }
> +void kho_free_restore(void *mem, size_t size) { }
> +
> static inline struct folio *kho_restore_folio(phys_addr_t phys)
> {
> return NULL;
> @@ -122,18 +134,14 @@ static inline int kho_add_subtree(const char *name, void *fdt)
> return -EOPNOTSUPP;
> }
>
> -static inline void kho_remove_subtree(void *fdt)
> -{
> -}
> +static inline void kho_remove_subtree(void *fdt) { }
>
> static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
> {
> return -EOPNOTSUPP;
> }
>
> -static inline void kho_memory_init(void)
> -{
> -}
> +static inline void kho_memory_init(void) { }
>
> static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
> phys_addr_t scratch_phys, u64 scratch_len)
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index a905bccf5f65..9f05849fd68e 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -4,6 +4,7 @@
> * Copyright (C) 2023 Alexander Graf <graf@amazon.com>
> * Copyright (C) 2025 Microsoft Corporation, Mike Rapoport <rppt@kernel.org>
> * Copyright (C) 2025 Google LLC, Changyuan Lyu <changyuanl@google.com>
> + * Copyright (C) 2025 Pasha Tatashin <pasha.tatashin@soleen.com>
> */
>
> #define pr_fmt(fmt) "KHO: " fmt
> @@ -1151,6 +1152,106 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation)
> }
> EXPORT_SYMBOL_GPL(kho_restore_vmalloc);
>
> +/**
> + * kho_alloc_preserve - Allocate, zero, and preserve memory.
> + * @size: The number of bytes to allocate.
> + *
> + * Allocates a physically contiguous block of zeroed pages that is large
> + * enough to hold @size bytes. The allocated memory is then registered with
> + * KHO for preservation across a kexec.
> + *
> + * Note: The actual allocated size will be rounded up to the nearest
> + * power-of-two page boundary.
> + *
> + * @return A virtual pointer to the allocated and preserved memory on success,
> + * or an ERR_PTR() encoded error on failure.
> + */
> +void *kho_alloc_preserve(size_t size)
> +{
> + struct folio *folio;
> + int order, ret;
> +
> + if (!size)
> + return ERR_PTR(-EINVAL);
> +
> + order = get_order(size);
> + if (order > MAX_PAGE_ORDER)
> + return ERR_PTR(-E2BIG);
> +
> + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
> + if (!folio)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = kho_preserve_folio(folio);
> + if (ret) {
> + folio_put(folio);
> + return ERR_PTR(ret);
> + }
> +
> + return folio_address(folio);
> +}
> +EXPORT_SYMBOL_GPL(kho_alloc_preserve);
> +
> +/**
> + * kho_free_unpreserve - Unpreserve and free memory.
> + * @mem: Pointer to the memory allocated by kho_alloc_preserve().
> + * @size: The original size requested during allocation. This is used to
> + * recalculate the correct order for freeing the pages.
> + *
> + * Unregisters the memory from KHO preservation and frees the underlying
> + * pages back to the system. This function should be called to clean up
> + * memory allocated with kho_alloc_preserve().
> + */
> +void kho_free_unpreserve(void *mem, size_t size)
> +{
> + struct folio *folio;
> + unsigned int order;
> +
> + if (!mem || !size)
> + return;
> +
> + order = get_order(size);
> + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER))
> + return;
> +
> + folio = virt_to_folio(mem);
> + WARN_ON_ONCE(kho_unpreserve_folio(folio));
This is what I meant in my reply to the previous patch.
kho_unpreserve_folio() can be void now, so the WARN_ON_ONCE() is not
needed.
> + folio_put(folio);
> +}
> +EXPORT_SYMBOL_GPL(kho_free_unpreserve);
> +
> +/**
> + * kho_free_restore - Restore and free memory after kexec.
> + * @mem: Pointer to the memory (in the new kernel's address space)
> + * that was allocated by the old kernel.
> + * @size: The original size requested during allocation. This is used to
> + * recalculate the correct order for freeing the pages.
> + *
> + * This function is intended to be called in the new kernel (post-kexec)
> + * to take ownership of and free a memory region that was preserved by the
> + * old kernel using kho_alloc_preserve().
> + *
> + * It first restores the pages from KHO (using their physical address)
> + * and then frees the pages back to the new kernel's page allocator.
> + */
> +void kho_free_restore(void *mem, size_t size)
On restore side, callers are already using the phys addr directly. So do
kho_restore_folio() and kho_restore_pages() for example. This should
follow suit for uniformity. Would also save the callers a __va() call
and this function the __pa() call.
> +{
> + struct folio *folio;
> + unsigned int order;
> +
> + if (!mem || !size)
> + return;
> +
> + order = get_order(size);
> + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER))
> + return;
> +
> + folio = kho_restore_folio(__pa(mem));
> + if (!WARN_ON(!folio))
kho_restore_folio() already WARNs on failure. So the WARN_ON() here can
be skipped I think.
> + free_pages((unsigned long)mem, order);
folio_put() here makes more sense since we just restored a folio.
> +}
> +EXPORT_SYMBOL_GPL(kho_free_restore);
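Putting the three points above together, the restore-side helper could
shrink to something like this (sketch, assuming the phys-based signature
and a void kho_unpreserve_folio(); untested):

```
void kho_restore_free(phys_addr_t phys)
{
	struct folio *folio;

	/* kho_restore_folio() already warns on failure. */
	folio = kho_restore_folio(phys);
	if (folio)
		folio_put(folio);
}
```

No size or order recalculation needed, and no extra WARN_ON().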
> +
> int kho_finalize(void)
> {
> int ret;
--
Regards,
Pratyush Yadav