linux-mm.kvack.org archive mirror
* [RFC PATCH 0/4] kho: introduce the KHO array
@ 2025-09-09 14:44 Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 1/4] " Pratyush Yadav
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Pratyush Yadav @ 2025-09-09 14:44 UTC (permalink / raw)
  To: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pratyush Yadav, Pasha Tatashin, Jason Gunthorpe,
	Thomas Weißschuh, Chris Li, Jason Miu, David Matlack,
	David Rientjes
  Cc: linux-kernel, kexec, linux-mm

Hi,

This series introduces the KHO array concept and, to demonstrate it in action,
converts the preservation bitmaps to use it. It also takes Mike's vmalloc
preservation patches v2 [0] and ports them to the KHO array. The memfd
preservation proposed in [1] can also be a user of this.

The main idea behind the KHO array is to build fundamental and flexible
serialization data structures that other layers can use to make their logic
easier. There was extensive discussion on this topic in [1]. This series
provides actual code in action to give more substance to that discussion.

Patch 1 adds the KHO array. Read it for more details on the data structure.
Patch 2 converts the memory preservation bitmaps to use the KHO array. Patches 3
and 4 are from Mike's vmalloc series [0] and are modified to use the KHO array
instead of open-coding the linked lists. The full git tree can be found at [2].

Converting the two users to the KHO array did, I think, make the serialization
logic simpler for both. The actual judgement of whether this is "worth it" is of
course subjective, but this should at least give everyone a clear idea of what
it looks like.

[0] https://lore.kernel.org/linux-mm/20250905131302.3595582-1-rppt@kernel.org/T/#u
[1] https://lore.kernel.org/linux-mm/20250807014442.3829950-30-pasha.tatashin@soleen.com/T/#u
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pratyush/linux.git/tag/?h=kho-array-rfc-v1

Mike Rapoport (Microsoft) (2):
  kho: add support for preserving vmalloc allocations
  lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in
    fdt

Pratyush Yadav (2):
  kho: introduce the KHO array
  kho: use KHO array for preserved memory bitmap serialization

 MAINTAINERS                    |   2 +
 include/linux/kexec_handover.h |  21 +++
 include/linux/kho_array.h      | 300 +++++++++++++++++++++++++++++++++
 kernel/Makefile                |   1 +
 kernel/kexec_handover.c        | 291 +++++++++++++++++++++++---------
 kernel/kho_array.c             | 209 +++++++++++++++++++++++
 lib/test_kho.c                 |  30 +++-
 7 files changed, 768 insertions(+), 86 deletions(-)
 create mode 100644 include/linux/kho_array.h
 create mode 100644 kernel/kho_array.c


base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
-- 
2.47.3




* [RFC PATCH 1/4] kho: introduce the KHO array
  2025-09-09 14:44 [RFC PATCH 0/4] kho: introduce the KHO array Pratyush Yadav
@ 2025-09-09 14:44 ` Pratyush Yadav
  2025-09-09 15:28   ` Jason Gunthorpe
  2025-09-09 14:44 ` [RFC PATCH 2/4] kho: use KHO array for preserved memory bitmap serialization Pratyush Yadav
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Pratyush Yadav @ 2025-09-09 14:44 UTC (permalink / raw)
  To: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pratyush Yadav, Pasha Tatashin, Jason Gunthorpe,
	Thomas Weißschuh, Chris Li, Jason Miu, David Matlack,
	David Rientjes
  Cc: linux-kernel, kexec, linux-mm

The KHO Array is a data structure that behaves like a sparse array of
pointers. It is designed to be preserved and restored over Kexec
Handover (KHO), and targets only 64-bit platforms. It can store 8-byte
aligned pointers. It can also store integers between 0 and LONG_MAX. It
supports sparse indices, though it performs best with densely clustered
indices.

The goal of the KHO array is to provide a fundamental data type that can then
be used to build serialization logic for higher layers. Moving the complexity
of tracking these scattered lists of pages to the KHO array layer makes the
higher layers simpler.

The data format consists of a descriptor of the array which contains a
magic number, format version, and pointer to the first page. Each page
contains the starting position of the entries in the page and a pointer
to the next page, forming a linked list. This linked list allows for the
array to be built with non-contiguous pages. Visually, the data format
looks like below:

  kho_array
 +----------+
 |  Magic   |
 +----------+                   kho_array_page
 | Version  |         +----------+----------+-----------
 +----------+    +--->|   Next   | Startpos | Entries...
 | Reserved |    |    +----------+----------+-----------
 +----------+    |          |               kho_array_page
 |  First   |----+          |    +----------+----------+-----------
 +----------+               +--->|   Next   | Startpos | Entries...
                                 +----------+----------+-----------
                                       |
                                       |
                                       +--->...

Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
---
 MAINTAINERS               |   2 +
 include/linux/kho_array.h | 300 ++++++++++++++++++++++++++++++++++++++
 kernel/Makefile           |   1 +
 kernel/kho_array.c        | 209 ++++++++++++++++++++++++++
 4 files changed, 512 insertions(+)
 create mode 100644 include/linux/kho_array.h
 create mode 100644 kernel/kho_array.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6dcfbd11efef8..e66bc05bce0e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13550,7 +13550,9 @@ S:	Maintained
 F:	Documentation/admin-guide/mm/kho.rst
 F:	Documentation/core-api/kho/*
 F:	include/linux/kexec_handover.h
+F:	include/linux/kho_array.h
 F:	kernel/kexec_handover.c
+F:	kernel/kho_array.c
 F:	tools/testing/selftests/kho/
 
 KEYS-ENCRYPTED
diff --git a/include/linux/kho_array.h b/include/linux/kho_array.h
new file mode 100644
index 0000000000000..39ab5532ee765
--- /dev/null
+++ b/include/linux/kho_array.h
@@ -0,0 +1,300 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2025 Amazon.com Inc. or its affiliates.
+ * Pratyush Yadav <pratyush@kernel.org>
+ */
+
+/**
+ * DOC: KHO Array
+ *
+ * The KHO Array is a data structure that behaves like a sparse array of
+ * pointers. It is designed to be preserved and restored over Kexec Handover
+ * (KHO), and targets only 64-bit platforms. It can store 8-byte aligned
+ * pointers. It can also store integers between 0 and LONG_MAX. It supports
+ * sparse indices, though it performs best with densely clustered indices. The
+ * data structure does not provide any locking. Callers must ensure they have
+ * exclusive access.
+ *
+ * To keep the data format simple, the data structure is designed to only be
+ * accessed linearly. When reading or writing the data structure, the values
+ * should be accessed from the lowest index to the highest.
+ *
+ * The data format consists of a descriptor of the array which contains a magic
+ * number, format version, and pointer to the first page. Each page contains the
+ * starting position of the entries in the page and a pointer to the next page,
+ * forming a linked list. This linked list allows for the array to be built with
+ * non-contiguous pages.
+ *
+ * The starting position of each page is an offset that is applied to calculate
+ * the index of each entry in the array. For example, if the starting position
+ * is 1000, entry 0 has index 1000, entry 1 has index 1001, and so on. This
+ * facilitates memory-efficient handling of holes in the array.
+ *
+ * The diagram below shows the data format visually:
+ *
+ *   kho_array
+ *  +----------+
+ *  |  Magic   |
+ *  +----------+                   kho_array_page
+ *  | Version  |         +----------+----------+-----------
+ *  +----------+    +--->|   Next   | Startpos | Entries...
+ *  | Reserved |    |    +----------+----------+-----------
+ *  +----------+    |          |               kho_array_page
+ *  |  First   |----+          |    +----------+----------+-----------
+ *  +----------+               +--->|   Next   | Startpos | Entries...
+ *                                  +----------+----------+-----------
+ *                                        |
+ *                                        |
+ *                                        +--->...
+ */
+
+#ifndef LINUX_KHO_ARRAY_H
+#define LINUX_KHO_ARRAY_H
+
+#include <linux/bug.h>
+
+#define KHO_ARRAY_MAGIC		0x4b415252 /* ASCII for 'KARR' */
+#define KHO_ARRAY_VERSION	0
+
+/**
+ * struct kho_array - Descriptor for a KHO array.
+ * @magic: Magic number to ensure valid descriptor.
+ * @version: Data format version.
+ * @__reserved: Reserved bytes. Must be set to 0.
+ * @first: Physical address of the first page in the list of pages. If 0, the
+ *         list is empty.
+ */
+struct kho_array {
+	u32		magic;
+	u16		version;
+	u16		__reserved;
+	__aligned_u64	first;
+} __packed;
+
+/**
+ * struct kho_array_page - A page in the KHO array.
+ * @next: Physical address of the next page in the list. If 0, there is no next
+ *        page.
+ * @startpos: Position at which entries in this page start.
+ * @entries: Entries in the array.
+ */
+struct kho_array_page {
+	__aligned_u64	next;
+	__aligned_u64	startpos;
+	__aligned_u64	entries[];
+} __packed;
+
+#define KA_PAGE_NR_ENTRIES ((PAGE_SIZE - sizeof(struct kho_array_page)) / sizeof(u64))
+
+#define KA_ITER_PAGEPOS(iter) ((iter)->pos - (iter)->cur->startpos)
+#define KA_PAGE(phys) ((phys) ? (struct kho_array_page *)__va((phys)) : NULL)
+
+/**
+ * kho_array_valid() - Validate KHO array descriptor.
+ * @ka: KHO array.
+ *
+ * Return: %true if valid, %false otherwise.
+ */
+bool kho_array_valid(struct kho_array *ka);
+
+/**
+ * kho_array_init() - Initialize an empty KHO array.
+ * @ka: KHO array.
+ *
+ * Initializes @ka to an empty KHO array full of NULL entries.
+ */
+void kho_array_init(struct kho_array *ka);
+
+/**
+ * kho_array_destroy() - Free the KHO array.
+ * @ka: KHO array.
+ *
+ * After calling this function, @ka is destroyed and all its pages have been
+ * freed. It must be initialized again before reuse.
+ */
+void kho_array_destroy(struct kho_array *ka);
+
+/**
+ * kho_array_preserve() - KHO-preserve all pages of the array
+ * @ka: KHO array.
+ *
+ * Mark all pages of the array to be preserved across KHO.
+ *
+ * Note: the memory for the struct @ka itself is not marked as preserved. The
+ * caller must take care of doing that, likely embedding it in a larger
+ * serialized data structure.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int kho_array_preserve(struct kho_array *ka);
+
+/**
+ * kho_array_restore() - KHO-restore all pages of the array
+ * @ka: KHO array.
+ *
+ * Validate the magic and version of @ka and, if they match, restore all pages
+ * of @ka from KHO to set the array up for being accessed.
+ *
+ * Note: the memory for the struct @ka itself is not KHO-restored. The caller
+ * must take care of doing that, likely embedding it in a larger serialized data
+ * structure.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int kho_array_restore(struct kho_array *ka);
+
+/**
+ * ka_is_value() - Determine if an entry is a value.
+ * @entry: KHO array entry.
+ *
+ * Return: %true if the entry is a value, %false if it is a pointer.
+ */
+static inline bool ka_is_value(const void *entry)
+{
+	return (unsigned long)entry & 1;
+}
+
+/**
+ * ka_to_value() - Get value stored in a KHO array entry.
+ * @entry: KHO array entry.
+ *
+ * Return: The value stored in @entry.
+ */
+static inline unsigned long ka_to_value(const void *entry)
+{
+	return (unsigned long)entry >> 1;
+}
+
+/**
+ * ka_mk_value() - Create a KHO array entry from an integer.
+ * @v: Value to store in KHO array.
+ *
+ * Return: An entry suitable for storing in a KHO array.
+ */
+static inline void *ka_mk_value(unsigned long v)
+{
+	WARN_ON((long)v < 0);
+	return (void *)((v << 1) | 1);
+}
+
+enum ka_iter_mode {
+	KA_ITER_READ,
+	KA_ITER_WRITE,
+};
+
+struct ka_iter {
+	struct kho_array	*ka;
+	struct kho_array_page	*cur;
+	unsigned long		pos;
+	enum ka_iter_mode	mode;
+};
+
+/**
+ * ka_iter_init_read() - Initialize iterator for reading.
+ * @iter: KHO array iterator.
+ * @ka: KHO array.
+ *
+ * Initialize @iter in read mode for reading @ka. After the function returns,
+ * @iter points to the first non-empty entry in the array, if any. @ka must be a
+ * valid KHO array. No validation on @ka is performed.
+ */
+void ka_iter_init_read(struct ka_iter *iter, struct kho_array *ka);
+
+/**
+ * ka_iter_init_write() - Initialize iterator for writing.
+ * @iter: KHO array iterator.
+ * @ka: KHO array.
+ *
+ * Initialize @ka to an empty array and then initialize @iter in write mode
+ * for building @ka. All data in @ka is overwritten, so it must be an
+ * uninitialized array. After the function returns, @iter points to the first
+ * entry in the array.
+ */
+void ka_iter_init_write(struct ka_iter *iter, struct kho_array *ka);
+
+/**
+ * ka_iter_init_restore() - Restore KHO array and initialize iterator for reading.
+ * @iter: KHO array iterator.
+ * @ka: KHO array.
+ *
+ * KHO-restore @ka, performing version and format validation, and initialize
+ * @iter in read mode for reading the array. After the function returns, @iter
+ * points to the first non-empty entry in the array, if any.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int ka_iter_init_restore(struct ka_iter *iter, struct kho_array *ka);
+
+/**
+ * ka_iter_setentry() - Set entry at current iterator position.
+ * @iter: KHO array iterator in write mode.
+ * @value: Value or pointer to store.
+ *
+ * Store @value at the current position of @iter. @iter must be in write mode.
+ * The iterator position is not advanced.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int ka_iter_setentry(struct ka_iter *iter, const void *value);
+
+/**
+ * ka_iter_nextentry() - Advance iterator to next non-empty entry.
+ * @iter: KHO array iterator.
+ *
+ * Advance @iter to the next non-empty entry in the array, skipping over
+ * empty entries and holes between pages.
+ *
+ * Return: The entry, or %NULL if end of array reached.
+ */
+void *ka_iter_nextentry(struct ka_iter *iter);
+
+/**
+ * ka_iter_setpos() - Set iterator position.
+ * @iter: KHO array iterator.
+ * @pos: New position (must be >= current position).
+ *
+ * Set the iterator position to @pos. The position can only be moved forward.
+ * The iterator will point to the appropriate page for the given position.
+ *
+ * Return: 0 on success, -EINVAL if @pos is less than current position.
+ */
+int ka_iter_setpos(struct ka_iter *iter, unsigned long pos);
+
+/**
+ * ka_iter_end() - Check if iterator has reached end of array.
+ * @iter: KHO array iterator.
+ *
+ * Return: %true if iterator is at end of array, %false otherwise.
+ */
+bool ka_iter_end(struct ka_iter *iter);
+
+/**
+ * ka_iter_getpos() - Get current iterator position.
+ * @iter: KHO array iterator.
+ *
+ * Return: Current position in the array.
+ */
+static inline unsigned long ka_iter_getpos(struct ka_iter *iter)
+{
+	return iter->pos;
+}
+
+/**
+ * ka_iter_getentry() - Get entry at current iterator position.
+ * @iter: KHO array iterator.
+ *
+ * Return: Pointer to entry at current position, or %NULL if none.
+ */
+void *ka_iter_getentry(struct ka_iter *iter);
+
+/**
+ * ka_iter_for_each - Iterate over all non-empty entries in array.
+ * @iter: KHO array iterator.
+ * @entry: Variable to store current entry.
+ *
+ * Loop over all non-empty entries in the array starting from current position.
+ */
+#define ka_iter_for_each(iter, entry)					\
+	for ((entry) = ka_iter_getentry(iter); (entry); (entry) = ka_iter_nextentry((iter)))
+
+#endif /* LINUX_KHO_ARRAY_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index c60623448235f..8baef3cb3979f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
 obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
+obj-$(CONFIG_KEXEC_HANDOVER) += kho_array.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/
diff --git a/kernel/kho_array.c b/kernel/kho_array.c
new file mode 100644
index 0000000000000..bdac471c45c58
--- /dev/null
+++ b/kernel/kho_array.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Amazon.com Inc. or its affiliates.
+ * Pratyush Yadav <pratyush@kernel.org>
+ */
+
+#include <linux/kexec_handover.h>
+#include <linux/kho_array.h>
+#include <linux/string.h>
+#include <linux/mm.h>
+#include <linux/bug.h>
+#include <linux/types.h>
+
+#define KA_PAGE_NR_ENTRIES ((PAGE_SIZE - sizeof(struct kho_array_page)) / sizeof(u64))
+
+#define KA_ITER_PAGEPOS(iter) ((iter)->pos - (iter)->cur->startpos)
+#define KA_PAGE(phys) ((phys) ? (struct kho_array_page *)__va((phys)) : NULL)
+
+bool ka_iter_end(struct ka_iter *iter)
+{
+	return !iter->cur || (KA_ITER_PAGEPOS(iter) >= KA_PAGE_NR_ENTRIES && !iter->cur->next);
+}
+
+void *ka_iter_getentry(struct ka_iter *iter)
+{
+	if (!iter->cur || KA_ITER_PAGEPOS(iter) >= KA_PAGE_NR_ENTRIES)
+		return NULL;
+
+	return (void *)iter->cur->entries[KA_ITER_PAGEPOS(iter)];
+}
+
+static int ka_iter_extend(struct ka_iter *iter)
+{
+	struct kho_array_page *kap;
+	struct folio *folio;
+	u64 phys;
+
+	if (!ka_iter_end(iter))
+		return 0;
+
+	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+	if (!folio)
+		return -ENOMEM;
+
+	kap = folio_address(folio);
+	kap->startpos = rounddown(iter->pos, KA_PAGE_NR_ENTRIES);
+
+	phys = (u64)PFN_PHYS(folio_pfn(folio));
+	/*
+	 * If the iterator already has a page, insert the page after it.
+	 * Otherwise, set the page as the first in the array.
+	 */
+	if (iter->cur)
+		iter->cur->next = phys;
+	else
+		iter->ka->first = phys;
+
+	iter->cur = kap;
+
+	return 0;
+}
+
+void ka_iter_init_read(struct ka_iter *iter, struct kho_array *ka)
+{
+	memset(iter, 0, sizeof(*iter));
+	iter->ka = ka;
+	iter->mode = KA_ITER_READ;
+	iter->cur = KA_PAGE(ka->first);
+
+	/* Make the iterator point to first valid entry. */
+	if (!ka_iter_getentry(iter))
+		ka_iter_nextentry(iter);
+}
+
+void ka_iter_init_write(struct ka_iter *iter, struct kho_array *ka)
+{
+	kho_array_init(ka);
+	memset(iter, 0, sizeof(*iter));
+	iter->ka = ka;
+	iter->mode = KA_ITER_WRITE;
+}
+
+int ka_iter_init_restore(struct ka_iter *iter, struct kho_array *ka)
+{
+	int err;
+
+	err = kho_array_restore(ka);
+	if (err)
+		return err;
+
+	ka_iter_init_read(iter, ka);
+	return 0;
+}
+
+int ka_iter_setpos(struct ka_iter *iter, unsigned long pos)
+{
+	if (pos < iter->pos)
+		return -EINVAL;
+
+	iter->pos = pos;
+
+	/*
+	 * The iterator must point to the highest page with startpos <= pos.
+	 * Advance it as far as possible.
+	 */
+	while (iter->cur && KA_PAGE(iter->cur->next) &&
+	       KA_PAGE(iter->cur->next)->startpos <= pos)
+		iter->cur = KA_PAGE(iter->cur->next);
+
+	return 0;
+}
+
+int ka_iter_setentry(struct ka_iter *iter, const void *value)
+{
+	int err = 0;
+
+	if (iter->mode != KA_ITER_WRITE)
+		return -EPERM;
+
+	err = ka_iter_extend(iter);
+	if (err)
+		return err;
+
+	iter->cur->entries[KA_ITER_PAGEPOS(iter)] = (u64)value;
+	return 0;
+}
+
+void *ka_iter_nextentry(struct ka_iter *iter)
+{
+	ka_iter_setpos(iter, iter->pos + 1);
+	while (!ka_iter_end(iter) && !ka_iter_getentry(iter)) {
+		/*
+		 * If we are in the hole between two pages, jump to the next
+		 * page.
+		 */
+		if (KA_ITER_PAGEPOS(iter) >= KA_PAGE_NR_ENTRIES)
+			/*
+			 * The check for ka_iter_end() above makes sure next
+			 * page exists.
+			 *
+			 * TODO: This is a bit nasty and might attract review
+			 * comments. Can I make it cleaner?
+			 */
+			ka_iter_setpos(iter, KA_PAGE(iter->cur->next)->startpos);
+		else
+			ka_iter_setpos(iter, iter->pos + 1);
+	}
+
+	return ka_iter_getentry(iter);
+}
+
+bool kho_array_valid(struct kho_array *ka)
+{
+	return ka->magic == KHO_ARRAY_MAGIC && ka->version == KHO_ARRAY_VERSION;
+}
+
+void kho_array_init(struct kho_array *ka)
+{
+	memset(ka, 0, sizeof(*ka));
+	ka->magic = KHO_ARRAY_MAGIC;
+	ka->version = KHO_ARRAY_VERSION;
+}
+
+void kho_array_destroy(struct kho_array *ka)
+{
+	u64 cur = ka->first, next;
+
+	while (cur) {
+		next = KA_PAGE(cur)->next;
+		folio_put(pfn_folio(PHYS_PFN(cur)));
+		cur = next;
+	}
+
+	ka->magic = 0;
+}
+
+int kho_array_preserve(struct kho_array *ka)
+{
+	u64 cur = ka->first;
+	int err;
+
+	while (cur) {
+		err = kho_preserve_folio(pfn_folio(PHYS_PFN(cur)));
+		if (err)
+			return err;
+
+		cur = KA_PAGE(cur)->next;
+	}
+
+	return 0;
+}
+
+int kho_array_restore(struct kho_array *ka)
+{
+	u64 cur = ka->first;
+	struct folio *folio;
+
+	if (!kho_array_valid(ka))
+		return -EOPNOTSUPP;
+
+	while (cur) {
+		folio = kho_restore_folio(cur);
+		if (!folio)
+			return -ENOMEM;
+		cur = KA_PAGE(cur)->next;
+	}
+
+	return 0;
+}
-- 
2.47.3




* [RFC PATCH 2/4] kho: use KHO array for preserved memory bitmap serialization
  2025-09-09 14:44 [RFC PATCH 0/4] kho: introduce the KHO array Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 1/4] " Pratyush Yadav
@ 2025-09-09 14:44 ` Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 3/4] kho: add support for preserving vmalloc allocations Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 4/4] lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt Pratyush Yadav
  3 siblings, 0 replies; 8+ messages in thread
From: Pratyush Yadav @ 2025-09-09 14:44 UTC (permalink / raw)
  To: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pratyush Yadav, Pasha Tatashin, Jason Gunthorpe,
	Thomas Weißschuh, Chris Li, Jason Miu, David Matlack,
	David Rientjes
  Cc: linux-kernel, kexec, linux-mm

The preserved memory bitmap serialization creates a linked list of pages
to track the bitmaps for preserved memory. Essentially, it is a scattered
list of pointers grouped by folio order. Use a KHO array to hold the
pointers to the bitmaps instead. This moves the burden of tracking this
metadata to the KHO array layer and makes the KHO core simpler.

Currently, the bitmaps are held in chunks, which is a fixed-size array
of pointers, plus some metadata including the order of the preserved
folios. The KHO array holds only pointers and has no mechanism for
grouping. To make the serialization format simpler, move the folio order
from struct khoser_mem_chunk to struct khoser_mem_bitmap_ptr.

The chunks holding the bitmaps are not KHO-preserved since they are only
used during the scratch-only phase. The same holds true for the KHO array:
the pages which track the KHO array metadata are not KHO-preserved and are
thus only valid during the scratch phase of the next kernel. After that,
they are discarded and freed back to the buddy allocator.

Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
---

The diff is a bit hard to read. The final result can be found at
https://git.kernel.org/pub/scm/linux/kernel/git/pratyush/linux.git/tree/kernel/kexec_handover.c?h=kho-array-rfc-v1#n227

 kernel/kexec_handover.c | 148 +++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 79 deletions(-)

diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index ecd1ac210dbd7..26f9f5295f07d 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -18,6 +18,7 @@
 #include <linux/memblock.h>
 #include <linux/notifier.h>
 #include <linux/page-isolation.h>
+#include <linux/kho_array.h>
 
 #include <asm/early_ioremap.h>
 
@@ -80,15 +81,13 @@ struct kho_mem_track {
 	struct xarray orders;
 };
 
-struct khoser_mem_chunk;
-
 struct kho_serialization {
 	struct page *fdt;
 	struct list_head fdt_list;
 	struct dentry *sub_fdt_dir;
 	struct kho_mem_track track;
-	/* First chunk of serialized preserved memory map */
-	struct khoser_mem_chunk *preserved_mem_map;
+	/* Serialized preserved memory map */
+	struct kho_array *preserved_mem_map;
 };
 
 static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
@@ -226,11 +225,11 @@ EXPORT_SYMBOL_GPL(kho_restore_folio);
 
 /* Serialize and deserialize struct kho_mem_phys across kexec
  *
- * Record all the bitmaps in a linked list of pages for the next kernel to
- * process. Each chunk holds bitmaps of the same order and each block of bitmaps
- * starts at a given physical address. This allows the bitmaps to be sparse. The
- * xarray is used to store them in a tree while building up the data structure,
- * but the KHO successor kernel only needs to process them once in order.
+ * Record all the bitmaps in a KHO array for the next kernel to process. Each
+ * bitmap stores the order of the folios and starts at a given physical address.
+ * This allows the bitmaps to be sparse. The xarray is used to store them in a
+ * tree while building up the data structure, but the KHO successor kernel only
+ * needs to process them once in order.
  *
  * All of this memory is normal kmalloc() memory and is not marked for
  * preservation. The successor kernel will remain isolated to the scratch space
@@ -240,118 +239,107 @@ EXPORT_SYMBOL_GPL(kho_restore_folio);
 
 struct khoser_mem_bitmap_ptr {
 	phys_addr_t phys_start;
-	DECLARE_KHOSER_PTR(bitmap, struct kho_mem_phys_bits *);
-};
-
-struct khoser_mem_chunk_hdr {
-	DECLARE_KHOSER_PTR(next, struct khoser_mem_chunk *);
 	unsigned int order;
-	unsigned int num_elms;
-};
-
-#define KHOSER_BITMAP_SIZE                                   \
-	((PAGE_SIZE - sizeof(struct khoser_mem_chunk_hdr)) / \
-	 sizeof(struct khoser_mem_bitmap_ptr))
-
-struct khoser_mem_chunk {
-	struct khoser_mem_chunk_hdr hdr;
-	struct khoser_mem_bitmap_ptr bitmaps[KHOSER_BITMAP_SIZE];
+	unsigned int __reserved;
+	DECLARE_KHOSER_PTR(bitmap, struct kho_mem_phys_bits *);
 };
 
-static_assert(sizeof(struct khoser_mem_chunk) == PAGE_SIZE);
-
-static struct khoser_mem_chunk *new_chunk(struct khoser_mem_chunk *cur_chunk,
-					  unsigned long order)
+static struct khoser_mem_bitmap_ptr *new_bitmap(phys_addr_t start,
+						struct kho_mem_phys_bits *bits,
+						unsigned int order)
 {
-	struct khoser_mem_chunk *chunk;
+	struct khoser_mem_bitmap_ptr *bitmap;
 
-	chunk = kzalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!chunk)
+	bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
+	if (!bitmap)
 		return NULL;
-	chunk->hdr.order = order;
-	if (cur_chunk)
-		KHOSER_STORE_PTR(cur_chunk->hdr.next, chunk);
-	return chunk;
+
+	bitmap->phys_start = start;
+	bitmap->order = order;
+	KHOSER_STORE_PTR(bitmap->bitmap, bits);
+	return bitmap;
 }
 
-static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk)
+static void kho_mem_ser_free(struct kho_array *ka)
 {
-	struct khoser_mem_chunk *chunk = first_chunk;
+	struct khoser_mem_bitmap_ptr *elm;
+	struct ka_iter iter;
 
-	while (chunk) {
-		struct khoser_mem_chunk *tmp = chunk;
+	if (!ka)
+		return;
 
-		chunk = KHOSER_LOAD_PTR(chunk->hdr.next);
-		kfree(tmp);
-	}
+	ka_iter_init_read(&iter, ka);
+	ka_iter_for_each(&iter, elm)
+		kfree(elm);
+
+	kho_array_destroy(ka);
+	kfree(ka);
 }
 
 static int kho_mem_serialize(struct kho_serialization *ser)
 {
-	struct khoser_mem_chunk *first_chunk = NULL;
-	struct khoser_mem_chunk *chunk = NULL;
 	struct kho_mem_phys *physxa;
-	unsigned long order;
+	unsigned long order, pos = 0;
+	struct kho_array *ka = NULL;
+	struct ka_iter iter;
+
+	ka = kzalloc(sizeof(*ka), GFP_KERNEL);
+	if (!ka)
+		return -ENOMEM;
+	ka_iter_init_write(&iter, ka);
 
 	xa_for_each(&ser->track.orders, order, physxa) {
 		struct kho_mem_phys_bits *bits;
 		unsigned long phys;
 
-		chunk = new_chunk(chunk, order);
-		if (!chunk)
-			goto err_free;
-
-		if (!first_chunk)
-			first_chunk = chunk;
-
 		xa_for_each(&physxa->phys_bits, phys, bits) {
 			struct khoser_mem_bitmap_ptr *elm;
+			phys_addr_t start;
+
+			start = (phys * PRESERVE_BITS) << (order + PAGE_SHIFT);
+			elm = new_bitmap(start, bits, order);
+			if (!elm)
+				goto err_free;
 
-			if (chunk->hdr.num_elms == ARRAY_SIZE(chunk->bitmaps)) {
-				chunk = new_chunk(chunk, order);
-				if (!chunk)
-					goto err_free;
-			}
-
-			elm = &chunk->bitmaps[chunk->hdr.num_elms];
-			chunk->hdr.num_elms++;
-			elm->phys_start = (phys * PRESERVE_BITS)
-					  << (order + PAGE_SHIFT);
-			KHOSER_STORE_PTR(elm->bitmap, bits);
+			ka_iter_setpos(&iter, pos);
+			if (ka_iter_setentry(&iter, elm))
+				goto err_free;
+			pos++;
 		}
 	}
 
-	ser->preserved_mem_map = first_chunk;
+	ser->preserved_mem_map = ka;
 
 	return 0;
 
 err_free:
-	kho_mem_ser_free(first_chunk);
+	kho_mem_ser_free(ka);
 	return -ENOMEM;
 }
 
-static void __init deserialize_bitmap(unsigned int order,
-				      struct khoser_mem_bitmap_ptr *elm)
+static void __init deserialize_bitmap(struct khoser_mem_bitmap_ptr *elm)
 {
 	struct kho_mem_phys_bits *bitmap = KHOSER_LOAD_PTR(elm->bitmap);
 	unsigned long bit;
 
 	for_each_set_bit(bit, bitmap->preserve, PRESERVE_BITS) {
-		int sz = 1 << (order + PAGE_SHIFT);
+		int sz = 1 << (elm->order + PAGE_SHIFT);
 		phys_addr_t phys =
-			elm->phys_start + (bit << (order + PAGE_SHIFT));
+			elm->phys_start + (bit << (elm->order + PAGE_SHIFT));
 		struct page *page = phys_to_page(phys);
 
 		memblock_reserve(phys, sz);
 		memblock_reserved_mark_noinit(phys, sz);
-		page->private = order;
+		page->private = elm->order;
 	}
 }
 
 static void __init kho_mem_deserialize(const void *fdt)
 {
-	struct khoser_mem_chunk *chunk;
+	struct khoser_mem_bitmap_ptr *elm;
 	const phys_addr_t *mem;
+	struct kho_array *ka;
+	struct ka_iter iter;
 	int len;
 
 	mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len);
@@ -361,15 +349,17 @@ static void __init kho_mem_deserialize(const void *fdt)
 		return;
 	}
 
-	chunk = *mem ? phys_to_virt(*mem) : NULL;
-	while (chunk) {
-		unsigned int i;
-
-		for (i = 0; i != chunk->hdr.num_elms; i++)
-			deserialize_bitmap(chunk->hdr.order,
-					   &chunk->bitmaps[i]);
-		chunk = KHOSER_LOAD_PTR(chunk->hdr.next);
+	ka = *mem ? phys_to_virt(*mem) : NULL;
+	if (!ka)
+		return;
+	if (!kho_array_valid(ka)) {
+		pr_err("invalid KHO array for preserved memory bitmaps\n");
+		return;
 	}
+
+	ka_iter_init_read(&iter, ka);
+	ka_iter_for_each(&iter, elm)
+		deserialize_bitmap(elm);
 }
 
 /*
-- 
2.47.3




* [RFC PATCH 3/4] kho: add support for preserving vmalloc allocations
  2025-09-09 14:44 [RFC PATCH 0/4] kho: introduce the KHO array Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 1/4] " Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 2/4] kho: use KHO array for preserved memory bitmap serialization Pratyush Yadav
@ 2025-09-09 14:44 ` Pratyush Yadav
  2025-09-09 14:44 ` [RFC PATCH 4/4] lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt Pratyush Yadav
  3 siblings, 0 replies; 8+ messages in thread
From: Pratyush Yadav @ 2025-09-09 14:44 UTC (permalink / raw)
  To: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pratyush Yadav, Pasha Tatashin, Jason Gunthorpe,
	Thomas Weißschuh, Chris Li, Jason Miu, David Matlack,
	David Rientjes
  Cc: linux-kernel, kexec, linux-mm

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

A vmalloc allocation is preserved using a binary structure similar to the
global KHO memory tracker: a list of pages where each page is an array of
physical addresses of the pages backing the vmalloc area.

kho_preserve_vmalloc() fills in a preservation descriptor for the caller.
This descriptor is used as the argument to kho_restore_vmalloc() to restore
the mapping in the vmalloc address space and populate it with the preserved
pages.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
[pratyush@kernel.org: use KHO array instead of linked list of pages to
track physical addresses]
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
---
 include/linux/kexec_handover.h |  21 +++++
 kernel/kexec_handover.c        | 143 +++++++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+)

diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index 348844cffb136..633f94cec1a35 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/errno.h>
+#include <linux/kho_array.h>
 
 struct kho_scratch {
 	phys_addr_t addr;
@@ -37,13 +38,23 @@ struct notifier_block;
 	})
 
 struct kho_serialization;
+struct kho_vmalloc;
 
 #ifdef CONFIG_KEXEC_HANDOVER
+struct kho_vmalloc {
+	struct kho_array ka;
+	unsigned int total_pages;
+	unsigned int flags;
+	unsigned short order;
+};
+
 bool kho_is_enabled(void);
 
 int kho_preserve_folio(struct folio *folio);
+int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation);
 int kho_preserve_phys(phys_addr_t phys, size_t size);
 struct folio *kho_restore_folio(phys_addr_t phys);
+void *kho_restore_vmalloc(struct kho_vmalloc *preservation);
 int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt);
 int kho_retrieve_subtree(const char *name, phys_addr_t *phys);
 
@@ -70,11 +81,21 @@ static inline int kho_preserve_phys(phys_addr_t phys, size_t size)
 	return -EOPNOTSUPP;
 }
 
+static inline int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline struct folio *kho_restore_folio(phys_addr_t phys)
 {
 	return NULL;
 }
 
+static inline void *kho_restore_vmalloc(struct kho_vmalloc *preservation)
+{
+	return NULL;
+}
+
 static inline int kho_add_subtree(struct kho_serialization *ser,
 				  const char *name, void *fdt)
 {
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index 26f9f5295f07d..5f89134ceeee0 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -19,6 +19,7 @@
 #include <linux/notifier.h>
 #include <linux/page-isolation.h>
 #include <linux/kho_array.h>
+#include <linux/vmalloc.h>
 
 #include <asm/early_ioremap.h>
 
@@ -723,6 +724,148 @@ int kho_preserve_phys(phys_addr_t phys, size_t size)
 }
 EXPORT_SYMBOL_GPL(kho_preserve_phys);
 
+#define KHO_VMALLOC_FLAGS_MASK	(VM_ALLOC | VM_ALLOW_HUGE_VMAP)
+
+/**
+ * kho_preserve_vmalloc - preserve memory allocated with vmalloc() across kexec
+ * @ptr: pointer to the area in vmalloc address space
+ * @preservation: pointer to metadata for preserved data.
+ *
+ * Instructs KHO to preserve the area in vmalloc address space at @ptr. The
+ * physical pages mapped at @ptr will be preserved and on successful return
+ * @preservation will hold the structure that describes the metadata for the
+ * preserved pages. @preservation itself is not KHO-preserved. The caller must
+ * do that.
+ *
+ * NOTE: Memory allocated with the vmalloc_node() variants cannot be reliably
+ * restored on the same node.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation)
+{
+	struct kho_mem_track *track = &kho_out.ser.track;
+	struct vm_struct *vm = find_vm_area(ptr);
+	unsigned int order, flags;
+	struct ka_iter iter;
+	int err;
+
+	if (!vm)
+		return -EINVAL;
+
+	if (vm->flags & ~KHO_VMALLOC_FLAGS_MASK)
+		return -EOPNOTSUPP;
+
+	flags = vm->flags & KHO_VMALLOC_FLAGS_MASK;
+	order = get_vm_area_page_order(vm);
+
+	preservation->total_pages = vm->nr_pages;
+	preservation->flags = flags;
+	preservation->order = order;
+
+	ka_iter_init_write(&iter, &preservation->ka);
+
+	for (int i = 0, pos = 0; i < vm->nr_pages; i += (1 << order), pos++) {
+		phys_addr_t phys = page_to_phys(vm->pages[i]);
+
+		err = __kho_preserve_order(track, PHYS_PFN(phys), order);
+		if (err)
+			goto err_free;
+
+		err = ka_iter_setpos(&iter, pos);
+		if (err)
+			goto err_free;
+
+		err = ka_iter_setentry(&iter, ka_mk_value(phys));
+		if (err)
+			goto err_free;
+	}
+
+	err = kho_array_preserve(&preservation->ka);
+	if (err)
+		goto err_free;
+
+	return 0;
+
+err_free:
+	kho_array_destroy(&preservation->ka);
+	return err;
+}
+EXPORT_SYMBOL_GPL(kho_preserve_vmalloc);
+
+/**
+ * kho_restore_vmalloc - recreates and populates an area in vmalloc address
+ * space from the preserved memory.
+ * @preservation: the preservation metadata.
+ *
+ * Recreates an area in vmalloc address space and populates it with memory that
+ * was preserved using kho_preserve_vmalloc().
+ *
+ * Return: pointer to the area in the vmalloc address space, NULL on failure.
+ */
+void *kho_restore_vmalloc(struct kho_vmalloc *preservation)
+{
+	unsigned int align, order, shift, flags;
+	unsigned int idx = 0, nr;
+	unsigned long addr, size;
+	struct vm_struct *area;
+	struct page **pages;
+	struct ka_iter iter;
+	void *entry;
+	int err;
+
+	flags = preservation->flags;
+	if (flags & ~KHO_VMALLOC_FLAGS_MASK)
+		return NULL;
+
+	err = ka_iter_init_restore(&iter, &preservation->ka);
+	if (err)
+		return NULL;
+
+	nr = preservation->total_pages;
+	pages = kvmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
+	if (!pages)
+		goto err_ka_destroy;
+	order = preservation->order;
+	shift = PAGE_SHIFT + order;
+	align = 1 << shift;
+
+	ka_iter_for_each(&iter, entry) {
+		phys_addr_t phys = ka_to_value(entry);
+		struct page *page;
+
+		page = phys_to_page(phys);
+		kho_restore_page(page, 0);
+		pages[idx++] = page;
+	}
+
+	area = __get_vm_area_node(nr * PAGE_SIZE, align, shift, flags,
+				  VMALLOC_START, VMALLOC_END, NUMA_NO_NODE,
+				  GFP_KERNEL, __builtin_return_address(0));
+	if (!area)
+		goto err_free_pages_array;
+
+	addr = (unsigned long)area->addr;
+	size = get_vm_area_size(area);
+	err = vmap_pages_range(addr, addr + size, PAGE_KERNEL, pages, shift);
+	if (err)
+		goto err_free_vm_area;
+
+	kho_array_destroy(&preservation->ka);
+
+	return area->addr;
+
+err_free_vm_area:
+	free_vm_area(area);
+err_free_pages_array:
+	kvfree(pages);
+err_ka_destroy:
+	kho_array_destroy(&preservation->ka);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(kho_restore_vmalloc);
+
 /* Handling for debug/kho/out */
 
 static struct dentry *debugfs_root;
-- 
2.47.3




* [RFC PATCH 4/4] lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt
  2025-09-09 14:44 [RFC PATCH 0/4] kho: introduce the KHO array Pratyush Yadav
                   ` (2 preceding siblings ...)
  2025-09-09 14:44 ` [RFC PATCH 3/4] kho: add support for preserving vmalloc allocations Pratyush Yadav
@ 2025-09-09 14:44 ` Pratyush Yadav
  3 siblings, 0 replies; 8+ messages in thread
From: Pratyush Yadav @ 2025-09-09 14:44 UTC (permalink / raw)
  To: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pratyush Yadav, Pasha Tatashin, Jason Gunthorpe,
	Thomas Weißschuh, Chris Li, Jason Miu, David Matlack,
	David Rientjes
  Cc: linux-kernel, kexec, linux-mm

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

The KHO test stores physical addresses of the preserved folios directly
in the FDT. Use kho_preserve_vmalloc() instead, and kho_restore_vmalloc()
to retrieve the addresses after kexec.

This makes the test more scalable on the one hand, and adds test
coverage for kho_preserve_vmalloc() on the other.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
[pratyush@kernel.org: use the KHO-array version of kho_restore_vmalloc()]
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
---
 lib/test_kho.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/lib/test_kho.c b/lib/test_kho.c
index c2eb899c3b456..3f4cb39cd917e 100644
--- a/lib/test_kho.c
+++ b/lib/test_kho.c
@@ -32,6 +32,7 @@ module_param(max_mem, long, 0644);
 struct kho_test_state {
 	unsigned int nr_folios;
 	struct folio **folios;
+	phys_addr_t *folios_info;
 	struct folio *fdt;
 	__wsum csum;
 };
@@ -67,14 +68,18 @@ static struct notifier_block kho_test_nb = {
 
 static int kho_test_save_data(struct kho_test_state *state, void *fdt)
 {
+	struct kho_vmalloc folios_info_preservation = {};
 	phys_addr_t *folios_info __free(kvfree) = NULL;
 	int err = 0;
 
-	folios_info = kvmalloc_array(state->nr_folios, sizeof(*folios_info),
-				     GFP_KERNEL);
+	folios_info = vmalloc_array(state->nr_folios, sizeof(*folios_info));
 	if (!folios_info)
 		return -ENOMEM;
 
+	err = kho_preserve_vmalloc(folios_info, &folios_info_preservation);
+	if (err)
+		return err;
+
 	for (int i = 0; i < state->nr_folios; i++) {
 		struct folio *folio = state->folios[i];
 		unsigned int order = folio_order(folio);
@@ -89,11 +94,14 @@ static int kho_test_save_data(struct kho_test_state *state, void *fdt)
 	err |= fdt_begin_node(fdt, "data");
 	err |= fdt_property(fdt, "nr_folios", &state->nr_folios,
 			    sizeof(state->nr_folios));
-	err |= fdt_property(fdt, "folios_info", folios_info,
-			    state->nr_folios * sizeof(*folios_info));
+	err |= fdt_property(fdt, "folios_info", &folios_info_preservation,
+			    sizeof(folios_info_preservation));
 	err |= fdt_property(fdt, "csum", &state->csum, sizeof(state->csum));
 	err |= fdt_end_node(fdt);
 
+	if (!err)
+		state->folios_info = no_free_ptr(folios_info);
+
 	return err;
 }
 
@@ -197,7 +205,8 @@ static int kho_test_save(void)
 static int kho_test_restore_data(const void *fdt, int node)
 {
 	const unsigned int *nr_folios;
-	const phys_addr_t *folios_info;
+	const struct kho_vmalloc *folios_info_preservation;
+	phys_addr_t *folios_info;
 	const __wsum *old_csum;
 	__wsum csum = 0;
 	int len;
@@ -212,8 +221,12 @@ static int kho_test_restore_data(const void *fdt, int node)
 	if (!old_csum || len != sizeof(*old_csum))
 		return -EINVAL;
 
-	folios_info = fdt_getprop(fdt, node, "folios_info", &len);
-	if (!folios_info || len != sizeof(*folios_info) * *nr_folios)
+	folios_info_preservation = fdt_getprop(fdt, node, "folios_info", &len);
+	if (!folios_info_preservation || len != sizeof(*folios_info_preservation))
+		return -EINVAL;
+
+	folios_info = kho_restore_vmalloc((struct kho_vmalloc *)folios_info_preservation);
+	if (!folios_info)
 		return -EINVAL;
 
 	for (int i = 0; i < *nr_folios; i++) {
@@ -233,6 +246,8 @@ static int kho_test_restore_data(const void *fdt, int node)
 		folio_put(folio);
 	}
 
+	vfree(folios_info);
+
 	if (csum != *old_csum)
 		return -EINVAL;
 
@@ -291,6 +306,7 @@ static void kho_test_cleanup(void)
 		folio_put(kho_test_state.folios[i]);
 
 	kvfree(kho_test_state.folios);
+	vfree(kho_test_state.folios_info);
 }
 
 static void __exit kho_test_exit(void)
-- 
2.47.3




* Re: [RFC PATCH 1/4] kho: introduce the KHO array
  2025-09-09 14:44 ` [RFC PATCH 1/4] " Pratyush Yadav
@ 2025-09-09 15:28   ` Jason Gunthorpe
  2025-09-09 15:40     ` Pratyush Yadav
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2025-09-09 15:28 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pasha Tatashin, Thomas Weißschuh, Chris Li,
	Jason Miu, David Matlack, David Rientjes, linux-kernel, kexec,
	linux-mm

On Tue, Sep 09, 2025 at 04:44:21PM +0200, Pratyush Yadav wrote:
> The KHO Array is a data structure that behaves like a sparse array of
> pointers. It is designed to be preserved and restored over Kexec
> Handover (KHO), and targets only 64-bit platforms. It can store 8-byte
> aligned pointers. It can also store integers between 0 and LONG_MAX. It
> supports sparse indices, though it performs best with densely clustered
> indices.

That is a bit of an understatement; it looks like the worst-case cost is
4K per entry. I would expect better efficiency than this if we are
serious about supporting sparsity.

I think you need to encode the start pos within the entries in some
way so the worst-case cost is bounded to more like 16/24 bytes per entry.

For instance if the page was broken up into an array of structs like

struct entries_block {
  u64 flags:1;
  u64 num_entries:13;
  u64 pos_increment:50;
  u64 entries[]; // contiguous pos
};

And if a high 64-bit pos can't be represented with pos_increment, then
you'd have flags = X and entries[0] == pos instead.

Jason



* Re: [RFC PATCH 1/4] kho: introduce the KHO array
  2025-09-09 15:28   ` Jason Gunthorpe
@ 2025-09-09 15:40     ` Pratyush Yadav
  2025-09-09 15:50       ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Pratyush Yadav @ 2025-09-09 15:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Pratyush Yadav, Alexander Graf, Mike Rapoport, Changyuan Lyu,
	Andrew Morton, Baoquan He, Pasha Tatashin, Thomas Weißschuh,
	Chris Li, Jason Miu, David Matlack, David Rientjes, linux-kernel,
	kexec, linux-mm

Hi Jason,

On Tue, Sep 09 2025, Jason Gunthorpe wrote:

> On Tue, Sep 09, 2025 at 04:44:21PM +0200, Pratyush Yadav wrote:
>> The KHO Array is a data structure that behaves like a sparse array of
>> pointers. It is designed to be preserved and restored over Kexec
>> Handover (KHO), and targets only 64-bit platforms. It can store 8-byte
>> aligned pointers. It can also store integers between 0 and LONG_MAX. It
>> supports sparse indices, though it performs best with densely clustered
>> indices.
>
> That is a bit of an understatement, it looks like worst case cost is
> 4k per entry. I would expect better efficiency than this if we are
> serious about supporting sparsity..
>
> I think you need to encode the start pos within the entries in some
> way so worst case cost is bounded to more like 16/24 byte per entry.
>
> For instance if the page was broken up into an array of structs like
>
> struct entries_block {
>   u64 flags:1;
>   u64 num_entries:13;
>   u64 pos_increment:50;
>   u64 entries[]; // contiguous pos
> };

Right, good idea. I can look into this, but only if we agree that this
whole idea is worth pursuing. I don't want to waste time on something
that will not make it in at a fundamental level :-)

Another idea could be run-length encoding, to make this even more
efficient. But I have stayed away from it so far since I think it can
get tricky and bug-prone to create and parse.

PS: do you know if bitfield layout is reliable for serialization? Can
different compiler versions move them around? I always thought they
could. If not, I can also use them in the memfd code since they make
the code neater.

>
> And if a high 64 bit pos can't be represented with pos_increment then
> you'd have flags = X and entries[0] == pos instead.
>
> Jason

-- 
Regards,
Pratyush Yadav



* Re: [RFC PATCH 1/4] kho: introduce the KHO array
  2025-09-09 15:40     ` Pratyush Yadav
@ 2025-09-09 15:50       ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2025-09-09 15:50 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Alexander Graf, Mike Rapoport, Changyuan Lyu, Andrew Morton,
	Baoquan He, Pasha Tatashin, Thomas Weißschuh, Chris Li,
	Jason Miu, David Matlack, David Rientjes, linux-kernel, kexec,
	linux-mm

On Tue, Sep 09, 2025 at 05:40:21PM +0200, Pratyush Yadav wrote:
> PS: do you know if bitfield layout is reliable for serialization? Can
> different compiler versions move them around? I always thought they can.
> If not, I can also use them in memfd code since they make the code
> neater.

It is specified by the SysV ABI; compilers are not free to change it.

https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex?ref_type=heads#L337

I didn't check ARM64 but would be shocked to find it is not specified.

Jason



end of thread, other threads:[~2025-09-09 15:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-09 14:44 [RFC PATCH 0/4] kho: introduce the KHO array Pratyush Yadav
2025-09-09 14:44 ` [RFC PATCH 1/4] " Pratyush Yadav
2025-09-09 15:28   ` Jason Gunthorpe
2025-09-09 15:40     ` Pratyush Yadav
2025-09-09 15:50       ` Jason Gunthorpe
2025-09-09 14:44 ` [RFC PATCH 2/4] kho: use KHO array for preserved memory bitmap serialization Pratyush Yadav
2025-09-09 14:44 ` [RFC PATCH 3/4] kho: add support for preserving vmalloc allocations Pratyush Yadav
2025-09-09 14:44 ` [RFC PATCH 4/4] lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt Pratyush Yadav
