* [PATCH v3 0/7] KSTATE: [de]serialization framework for KHO
@ 2025-09-09 20:14 Andrey Ryabinin
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
Main changes from v2 [1]:
- Rebase on top of KHO+LUO. Instead of being a standalone solution, KSTATE now
relies on KHO for preserving memory.
Main changes from v1 [2]:
- Stop abusing crashkernel and implement a proper way to pass memory to the new kernel
- Lots of misc cleanups/refactoring.
Series dependencies:
- is_kho_boot() - https://lkml.kernel.org/r/cover.1755721529.git.epetron@amazon.de
- LUO v3 series - https://lkml.kernel.org/r/20250807014442.3829950-1-pasha.tatashin@soleen.com
GIT:
git fetch https://github.com/aryabinin/linux.git kstate-v3
TODO:
- KSTATE currently has only one global stream of data. We need to add
substreams (kind of like subtrees in FDT) and integrate them with LUO,
so we can have per-file streams.
This is planned to be fixed in v4.
KSTATE (kernel state) is a mechanism to describe some part of the internal
kernel state, save it into the memory preserved by KHO and restore the
state after kexec in the new kernel.
The end goal here is to be able to update the host kernel under VMs
with VFIO pass-through devices running on that host. This implies that
we need to save/restore a lot of different structs/state across different subsystems.
The purpose of KSTATE is to provide common infrastructure for saving/restoring
complex in-kernel state. Currently KHO uses FDT for that purpose; KSTATE
aims to provide an easier-to-use alternative.
In this series KSTATE provides an alternative to FDT usage in KHO, without
replacing it completely. Both can be used, and FDT users can be converted
to KSTATE later if needed.
As a demonstration, memblock's reserved tables are converted from FDT to KSTATE,
making the code simpler and smaller:
include/linux/kstate.h | 1
mm/memblock.c | 158
2 files changed, 49 insertions(+), 110 deletions(-)
The idea behind KSTATE resembles QEMU's migration framework [3], which
solves a quite similar problem: migrating the state of a VM and its emulated
devices across different versions of QEMU.
So why not use FDT?
- The main reason is that FDT doesn't provide a simple and convenient internal
API for drivers/subsystems to preserve internal data.
E.g. let's say we have some variable of type 'struct a'
that needs to be preserved:
struct a {
int i;
unsigned long *p_ulong;
char s[10];
struct folio *folio;
};
The FDT way requires the driver/subsystem to carry a bunch of code
dealing with FDT, something like:
a_kho_write()
{
...
err |= fdt_property(fdt, "i", &a.i, sizeof(a.i));
err |= fdt_property(fdt, "ulong", a.p_ulong, sizeof(*a.p_ulong));
err |= fdt_property(fdt, "s", &a.s, sizeof(a.s));
if (err)
...
}
a_kho_restore()
{
...
prop = fdt_getprop(fdt, offset, "i", &len);
if (!prop || len != sizeof(a.i))
goto err;
a.i = *(const int *)prop;
*a.p_ulong = fdt_getprop....
}
Each driver/subsystem has to solve this problem in its own way.
Also, if we use FDT properties for individual fields, that can be wasteful
in terms of memory, as these properties use strings as keys.
KSTATE solves the same problem in a more elegant way, with this:
struct kstate_description test_state = {
.name = "a_struct",
.version_id = 1,
.id = KSTATE_TEST_ID,
.fields = (const struct kstate_field[]) {
KSTATE_BASE_TYPE(i, struct a, int),
KSTATE_BASE_TYPE(s, struct a, char [10]),
KSTATE_POINTER(p_ulong, struct a),
KSTATE_FOLIO(folio, struct a),
KSTATE_END_OF_LIST()
},
};
saving:
{
static unsigned long ulong;
static struct a a_data = { .p_ulong = &ulong };
const int a_data_instance_id = 123;
kstate_register(&test_state, &a_data, a_data_instance_id);
}
restoring:
{
static unsigned long ulong;
static struct a a_data = { .p_ulong = &ulong };
const int a_data_instance_id = 123;
kstate_restore(&test_state, &a_data, a_data_instance_id);
}
The driver only needs to have a proper 'kstate_description' and provide some ID that
uniquely identifies 'a_data' among other instances of 'struct a'. It then calls kstate_register(),
which registers a_data to be saved at the KHO-finalize stage of the kexec reboot.
After reboot, the kstate_restore() call should restore all parts of a_data in accordance with
the kstate_description.
So basically 'struct kstate_description' provides instructions on how to save/restore 'struct a'.
Now to how this works.
The state of kernel data (usually some struct) is described by a
'struct kstate_description' containing an array of individual
field descriptions - 'struct kstate_field'. Each field
has a set of bits in ->flags which specify how to save/restore
that field of the struct. E.g.:
- KS_BASE_TYPE - the field can simply be copied by value.
- KS_POINTER - the struct member is a pointer to the actual
data, so it needs to be dereferenced before saving/restoring data
to/from the kstate data stream.
- KS_STRUCT - the field contains another struct; field->ksd must point to
another 'struct kstate_description'.
- KS_CUSTOM - some non-trivial field that requires custom kstate_field->save()
and ->restore() callbacks to save/restore data.
- KS_ARRAY_OF_POINTER - an array of pointers; the size of the array is determined by the
field->count() callback.
- KS_ADDRESS - the field is a pointer into either the vmemmap area (struct page) or
the linear mapping. Stored as an offset from the base address.
- KS_END - special flag marking the end of the fields list.
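
To make the flag semantics above more concrete, here is a minimal userspace
sketch of a table-driven save/restore loop. It is not the kernel code: the
names toy_field, TOY_BASE_TYPE, TOY_POINTER, TOY_END, toy_save() and
toy_restore() are made up for illustration, and only two of the flags are
modelled. The point is just that one array of field descriptors drives both
directions, and that a pointer field is dereferenced before its data is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for KS_BASE_TYPE / KS_POINTER / KS_END. */
enum toy_flags { TOY_BASE_TYPE = 1 << 0, TOY_POINTER = 1 << 1, TOY_END = 1 << 2 };

struct toy_field {
	size_t offset;		/* offset of the member inside the object */
	size_t size;		/* number of bytes to copy for this field */
	enum toy_flags flags;
};

struct a {
	int i;
	unsigned long *p_ulong;
	char s[10];
};

static const struct toy_field a_fields[] = {
	{ offsetof(struct a, i), sizeof(int), TOY_BASE_TYPE },
	{ offsetof(struct a, s), sizeof(char[10]), TOY_BASE_TYPE },
	{ offsetof(struct a, p_ulong), sizeof(unsigned long), TOY_POINTER },
	{ 0, 0, TOY_END },
};

/* Resolve where the field's data lives: in place, or behind a pointer. */
static void *toy_field_data(void *obj, const struct toy_field *f)
{
	char *member = (char *)obj + f->offset;

	if (f->flags & TOY_POINTER) {
		void *target;

		memcpy(&target, member, sizeof(target));
		return target;
	}
	return member;
}

/* Copy every described field into buf; returns the number of bytes written. */
static size_t toy_save(void *obj, const struct toy_field *f, unsigned char *buf)
{
	size_t pos = 0;

	for (; f->flags != TOY_END; f++) {
		memcpy(buf + pos, toy_field_data(obj, f), f->size);
		pos += f->size;
	}
	return pos;
}

/* Copy the fields back, in the same order, into an already set up object. */
static size_t toy_restore(void *obj, const struct toy_field *f,
			  const unsigned char *buf)
{
	size_t pos = 0;

	for (; f->flags != TOY_END; f++) {
		memcpy(toy_field_data(obj, f), buf + pos, f->size);
		pos += f->size;
	}
	return pos;
}

/* Round-trip demo: returns 0 when the restored copy matches the original. */
static int toy_demo(void)
{
	unsigned long src_val = 42, dst_val = 0;
	struct a src = { .i = 7, .p_ulong = &src_val, .s = "kstate" };
	struct a dst = { .p_ulong = &dst_val };
	unsigned char buf[64];
	size_t n = toy_save(&src, a_fields, buf);

	assert(n == sizeof(int) + sizeof(char[10]) + sizeof(unsigned long));
	toy_restore(&dst, a_fields, buf);
	return !(dst.i == 7 && dst_val == 42 && strcmp(dst.s, "kstate") == 0);
}
```

Note that the restore side in this sketch expects the destination pointer to
be valid already, which mirrors the 'restoring' example earlier where a_data
is initialized with .p_ulong before kstate_restore() is called.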
kstate_register() accepts a kstate_description along with an instance
of an object and registers it in the global 'states' list.
During the 'finalize' phase of KHO we go through the list of 'kstate_description's,
and each instance of kstate_description forms a 'struct kstate_entry',
which is saved into kstate's data stream.
The 'kstate_entry' contains information such as the ID of the kstate_description, its
version, the size of the migration data and the data itself. The ->data is formed in
accordance with the kstate_fields of the corresponding kstate_description.
After the reboot, when kstate_restore() is called, it parses KSTATE's data
stream, finds the appropriate 'kstate_entry' and restores the contents of
the object in accordance with the kstate_description and its ->fields.
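
The stream layout described above (a header per entry, followed by its data,
with an end marker after the last entry) can be sketched in userspace as
follows. This is only an illustration under assumptions: struct toy_entry,
toy_append(), toy_finish() and toy_find() are hypothetical names, the header
fields merely mirror the ones listed above, and the real lookup logic in
kernel/liveupdate/kstate.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section markers, modelled on the description above. */
enum { TOY_SECTION = 2, TOY_EOF = -1 };

/* Hypothetical entry header; the payload follows it directly in the stream. */
struct toy_entry {
	int32_t section_type;
	int32_t state_id;
	int32_t version_id;
	int32_t instance_id;
	int32_t size;		/* payload bytes after the header */
};

/* Append one entry (header + payload) at offset pos; returns the new offset. */
static size_t toy_append(unsigned char *stream, size_t pos, int32_t state_id,
			 int32_t instance_id, const void *data, int32_t size)
{
	struct toy_entry e = {
		.section_type = TOY_SECTION,
		.state_id = state_id,
		.version_id = 1,
		.instance_id = instance_id,
		.size = size,
	};

	memcpy(stream + pos, &e, sizeof(e));
	memcpy(stream + pos + sizeof(e), data, (size_t)size);
	return pos + sizeof(e) + (size_t)size;
}

/* Terminate the stream with an EOF marker. */
static size_t toy_finish(unsigned char *stream, size_t pos)
{
	struct toy_entry e = { .section_type = TOY_EOF };

	memcpy(stream + pos, &e, sizeof(e));
	return pos + sizeof(e);
}

/*
 * Walk the stream entry by entry, using ->size to hop over payloads, until
 * the matching (state_id, instance_id) pair or the EOF marker is found.
 * Returns a pointer to the payload, or NULL if there is no such entry.
 */
static const unsigned char *toy_find(const unsigned char *stream,
				     int32_t state_id, int32_t instance_id,
				     int32_t *size)
{
	size_t pos = 0;

	for (;;) {
		struct toy_entry e;

		memcpy(&e, stream + pos, sizeof(e));
		if (e.section_type == TOY_EOF)
			return NULL;
		if (e.state_id == state_id && e.instance_id == instance_id) {
			*size = e.size;
			return stream + pos + sizeof(e);
		}
		pos += sizeof(e) + (size_t)e.size;
	}
}

/* Demo: store two instances of one state and look the second one up. */
static int toy_stream_demo(void)
{
	unsigned char stream[128];
	const unsigned char *payload;
	int32_t first = 11, second = 22, size = 0, value = 0;
	size_t pos = 0;

	pos = toy_append(stream, pos, 5, 123, &first, sizeof(first));
	pos = toy_append(stream, pos, 5, 124, &second, sizeof(second));
	pos = toy_finish(stream, pos);
	assert(pos <= sizeof(stream));

	payload = toy_find(stream, 5, 124, &size);
	if (!payload || size != (int32_t)sizeof(value))
		return -1;
	memcpy(&value, payload, sizeof(value));
	return value;
}
```

Headers are copied in and out with memcpy() rather than cast in place, so the
sketch does not depend on payload sizes keeping later headers aligned.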
[1] https://lkml.kernel.org/r/20250310120318.2124-1-arbn@yandex-team.com
[2] https://lkml.kernel.org/r/20241002160722.20025-1-arbn@yandex-team.com
[3] https://www.qemu.org/docs/master/devel/migration/main.html#vmstate
Andrey Ryabinin (7):
kho: move fdt setup in separate helper.
kho: move scratch memory in separate helper.
kstate: Add KSTATE - [de]serialization framework for KHO
kho: replace KHO FDT with kstate metadata
kstate, test: add test module for testing kstate subsystem.
mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table
Documentation, kstate: Add KSTATE documentation
Documentation/core-api/index.rst | 1 +
Documentation/core-api/kstate.rst | 117 ++++++
MAINTAINERS | 8 +
arch/x86/include/uapi/asm/setup_data.h | 4 +-
arch/x86/kernel/kexec-bzimage64.c | 6 +-
arch/x86/kernel/setup.c | 3 +-
drivers/of/fdt.c | 6 +-
include/linux/kexec.h | 2 +-
include/linux/kstate.h | 235 +++++++++++
kernel/liveupdate/Kconfig | 16 +
kernel/liveupdate/Makefile | 2 +
kernel/liveupdate/kexec_handover.c | 95 ++++-
kernel/liveupdate/kstate.c | 536 +++++++++++++++++++++++++
lib/Makefile | 2 +
lib/test_kstate.c | 116 ++++++
mm/memblock.c | 158 +++-----
16 files changed, 1174 insertions(+), 133 deletions(-)
create mode 100644 Documentation/core-api/kstate.rst
create mode 100644 include/linux/kstate.h
create mode 100644 kernel/liveupdate/kstate.c
create mode 100644 lib/test_kstate.c
--
2.49.1
* [PATCH v3 1/7] kho: move fdt setup in separate helper.
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
Move FDT initialization logic from kho_populate() into a new
helper function kho_fdt_init(). The helper takes care of mapping,
validating, and unmapping the FDT.
This prepares for using KSTATE in KHO instead of FDT.
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
kernel/liveupdate/kexec_handover.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 364a69a2fa1b..e5a436893f5b 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1077,13 +1077,10 @@ void __init kho_memory_init(void)
}
}
-void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
- phys_addr_t scratch_phys, u64 scratch_len)
+static int __init kho_fdt_init(phys_addr_t fdt_phys, u64 fdt_len)
{
void *fdt = NULL;
- struct kho_scratch *scratch = NULL;
int err = 0;
- unsigned int scratch_cnt = scratch_len / sizeof(*kho_scratch);
/* Validate the input FDT */
fdt = early_memremap(fdt_phys, fdt_len);
@@ -1107,6 +1104,26 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
goto out;
}
+out:
+ if (fdt)
+ early_memunmap(fdt, fdt_len);
+
+ return err;
+}
+
+void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
+ phys_addr_t scratch_phys, u64 scratch_len)
+{
+
+ struct kho_scratch *scratch = NULL;
+ int err = 0;
+ unsigned int scratch_cnt = scratch_len / sizeof(*kho_scratch);
+
+
+ err = kho_fdt_init(fdt_phys, fdt_len);
+ if (err)
+ goto out;
+
scratch = early_memremap(scratch_phys, scratch_len);
if (!scratch) {
pr_warn("setup: failed to memremap scratch (phys=0x%llx, len=%lld)\n",
@@ -1151,8 +1168,6 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
pr_info("found kexec handover data. Will skip init for some devices\n");
out:
- if (fdt)
- early_memunmap(fdt, fdt_len);
if (scratch)
early_memunmap(scratch, scratch_len);
if (err)
--
2.49.1
* [PATCH v3 2/7] kho: move scratch memory in separate helper.
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
Move scratch memory initialization logic from kho_populate() into a new
helper function kho_scratch_init().
This prepares for using KSTATE in KHO instead of FDT.
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
kernel/liveupdate/kexec_handover.c | 34 ++++++++++++++++++++----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index e5a436893f5b..a99425fdfce4 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1111,19 +1111,12 @@ static int __init kho_fdt_init(phys_addr_t fdt_phys, u64 fdt_len)
return err;
}
-void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
- phys_addr_t scratch_phys, u64 scratch_len)
+static int __init kho_scratch_init(phys_addr_t scratch_phys, u64 scratch_len)
{
-
- struct kho_scratch *scratch = NULL;
int err = 0;
+ struct kho_scratch *scratch = NULL;
unsigned int scratch_cnt = scratch_len / sizeof(*kho_scratch);
-
- err = kho_fdt_init(fdt_phys, fdt_len);
- if (err)
- goto out;
-
scratch = early_memremap(scratch_phys, scratch_len);
if (!scratch) {
pr_warn("setup: failed to memremap scratch (phys=0x%llx, len=%lld)\n",
@@ -1161,6 +1154,27 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
* memory reservations from the previous kernel.
*/
memblock_set_kho_scratch_only();
+out:
+ if (scratch)
+ early_memunmap(scratch, scratch_len);
+
+ return err;
+}
+
+void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
+ phys_addr_t scratch_phys, u64 scratch_len)
+{
+
+ int err = 0;
+ unsigned int scratch_cnt = scratch_len / sizeof(*kho_scratch);
+
+ err = kho_fdt_init(fdt_phys, fdt_len);
+ if (err)
+ goto out;
+
+ err = kho_scratch_init(scratch_phys, scratch_len);
+ if (err)
+ goto out;
kho_in.fdt_phys = fdt_phys;
kho_in.scratch_phys = scratch_phys;
@@ -1168,8 +1182,6 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
pr_info("found kexec handover data. Will skip init for some devices\n");
out:
- if (scratch)
- early_memunmap(scratch, scratch_len);
if (err)
pr_warn("disabling KHO revival: %d\n", err);
}
--
2.49.1
* [PATCH v3 3/7] kstate: Add KSTATE - [de]serialization framework for KHO
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
KSTATE (kernel state) is a mechanism to describe some part of the internal
kernel state, save it into the memory preserved by KHO and restore the
state after kexec in the new kernel.
The end goal here is to be able to update the host kernel under VMs
with VFIO pass-through devices running on that host. This implies that
we need to save/restore a lot of different structs/state across different subsystems.
The purpose of KSTATE is to provide common infrastructure for saving/restoring
complex in-kernel state. Currently KHO uses FDT for that purpose; KSTATE
aims to provide an easier-to-use alternative.
In this series KSTATE provides an alternative to FDT usage in KHO, without
replacing it completely. Both can be used, and FDT users can be converted
to KSTATE later if needed.
The idea behind KSTATE resembles QEMU's migration framework [1], which
solves a quite similar problem: migrating the state of a VM and its emulated
devices across different versions of QEMU.
The state of kernel data (usually some struct) is described by a
'struct kstate_description' containing an array of individual
field descriptions - 'struct kstate_field'. Each field
has a set of bits in ->flags which specify how to save/restore
that field of the struct. E.g.:
- KS_BASE_TYPE - the field can simply be copied by value.
- KS_POINTER - the struct member is a pointer to the actual
data, so it needs to be dereferenced before saving/restoring data
to/from the kstate data stream.
- KS_STRUCT - the field contains another struct; field->ksd must point to
another 'struct kstate_description'.
- KS_CUSTOM - some non-trivial field that requires custom kstate_field->save()
and ->restore() callbacks to save/restore data.
- KS_ARRAY_OF_POINTER - an array of pointers; the size of the array is determined by the
field->count() callback.
- KS_ADDRESS - the field is a pointer into either the vmemmap area (struct page) or
the linear mapping. Stored as an offset from the base address.
- KS_END - special flag marking the end of the fields list.
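
The KS_ADDRESS idea of storing an offset instead of a raw pointer can be
illustrated with plain integer arithmetic. The sketch below runs in userspace
and uses made-up base addresses (OLD_BASE, NEW_BASE) and hypothetical helper
names; in the kernel the base would be the start of the vmemmap area or of the
linear mapping, which may differ between the old and the new kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the base of the same area in the old and the
 * new kernel. With KASLR the two bases generally differ between boots.
 */
#define OLD_BASE 0x1000000UL
#define NEW_BASE 0x7000000UL

/* Save side: store the distance from the base, not the pointer value. */
static uintptr_t toy_addr_save(uintptr_t ptr, uintptr_t base)
{
	return ptr - base;
}

/* Restore side: add the new kernel's base to the stored offset. */
static uintptr_t toy_addr_restore(uintptr_t offset, uintptr_t base)
{
	return offset + base;
}

/* Demo: an object 0x2340 bytes into the area is found again after rebasing. */
static int toy_addr_demo(void)
{
	uintptr_t old_ptr = OLD_BASE + 0x2340;
	uintptr_t saved = toy_addr_save(old_ptr, OLD_BASE);
	uintptr_t new_ptr = toy_addr_restore(saved, NEW_BASE);

	assert(saved == 0x2340);
	return new_ptr == NEW_BASE + 0x2340;
}
```

The stored value is meaningful only together with the area it is relative to,
which is why a field of this kind also has to record which base applies.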
kstate_register() accepts a kstate_description along with an instance
of an object and registers it in the global 'states' list.
During the 'finalize' phase of KHO we go through the list of 'kstate_description's,
and each instance of kstate_description forms a 'struct kstate_entry',
which is saved into kstate's data stream.
The 'kstate_entry' contains information such as the ID of the kstate_description, its
version, the size of the migration data and the data itself. The ->data is formed in
accordance with the kstate_fields of the corresponding kstate_description.
After the reboot, when kstate_restore() is called, it parses KSTATE's data
stream, finds the appropriate 'kstate_entry' and restores the contents of
the object in accordance with the kstate_description and its ->fields.
[1] https://www.qemu.org/docs/master/devel/migration/main.html#vmstate
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
MAINTAINERS | 6 +
include/linux/kstate.h | 231 +++++++++++++
kernel/liveupdate/Kconfig | 8 +
kernel/liveupdate/Makefile | 2 +
kernel/liveupdate/kexec_handover.c | 4 +
kernel/liveupdate/kstate.c | 536 +++++++++++++++++++++++++++++
6 files changed, 787 insertions(+)
create mode 100644 include/linux/kstate.h
create mode 100644 kernel/liveupdate/kstate.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 8296bf89e040..2cd9e49abee5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13718,6 +13718,12 @@ F: Documentation/admin-guide/auxdisplay/ks0108.rst
F: drivers/auxdisplay/ks0108.c
F: include/linux/ks0108.h
+KSTATE
+M: Andrey Ryabinin <ryabinin.a.a@gmail.com>
+S: Maintained
+F: include/linux/kstate.h
+F: kernel/liveupdate/kstate.c
+
KTD253 BACKLIGHT DRIVER
M: Linus Walleij <linus.walleij@linaro.org>
S: Maintained
diff --git a/include/linux/kstate.h b/include/linux/kstate.h
new file mode 100644
index 000000000000..53992593cb19
--- /dev/null
+++ b/include/linux/kstate.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _KSTATE_H
+#define _KSTATE_H
+
+#include <linux/atomic.h>
+#include <linux/build_bug.h>
+#include <linux/list.h>
+#include <linux/stringify.h>
+
+struct kstate_description;
+struct kstate_stream;
+struct kimage;
+
+enum kstate_flags {
+
+ /*
+ * The struct member at 'obj + kstate_field.offset' is some basic
+ * type, just copy it by value. The size is kstate_field->size.
+ */
+
+ KS_BASE_TYPE = (1 << 0),
+
+ /*
+ * The struct member at 'obj + kstate_field.offset' is a pointer
+ * to the actual data (e.g. struct a { int *b; }).
+ * save_kstate() will dereference the pointer to get the actual data
+ * and store it to the stream. restore_kstate() will copy the data from
+ * the stream to wherever the pointer points to.
+ */
+ KS_POINTER = (1 << 1),
+
+ /*
+ * The struct member at 'obj + kstate_field.offset' is another struct.
+ * kstate_field->ksd points to 'kstate_description' of that struct.
+ */
+ KS_STRUCT = (1 << 2),
+
+ /*
+ * Some non-trivial field that requires custom kstate_field->save()
+ * ->restore() callbacks to save/restore data.
+ */
+ KS_CUSTOM = (1 << 3),
+
+ /*
+ * The field is an array of kstate_field->count() pointers
+ * (e.g. struct a { uint8_t *b[]; }). Dereference each array entry
+ * before store/restore data.
+ */
+ KS_ARRAY_OF_POINTER = (1 << 4),
+
+ /*
+ * The field is a pointer to vmemmap or linear memory (determined by
+ * kstate_field->addr_type). This is used for pointers to persistent
+ * pages/data. Store offset from the start of the area instead of
+ * pointer itself, so we could defeat KASLR on restore phase (by adding
+ * new kernel's corresponding offset).
+ */
+ KS_ADDRESS = (1 << 5),
+
+ /*
+ * The field used to exist in older versions. kstate_field->version_id
+ * is the latest version that has this field.
+ */
+ KS_DEPRECATED = (1 << 6),
+
+ /* Marks the end of fields list */
+ KS_END = (1UL << 31),
+};
+
+enum kstate_addr_type {
+ KS_VMEMMAP_ADDR,
+ KS_LINEAR_ADDR,
+};
+
+struct kstate_stream {
+ void *pos;
+ struct folio *folio;
+};
+
+struct kstate_field {
+ const char *name;
+ size_t offset;
+ size_t size;
+ enum kstate_flags flags;
+ const struct kstate_description *ksd;
+ enum kstate_addr_type addr_type;
+ int version_id;
+ int (*restore)(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field);
+ int (*save)(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field);
+ int (*count)(void);
+};
+
+enum kstate_ids {
+ KSTATE_FOLIO_ID = 1,
+ KSTATE_LAST_ID = -1,
+};
+
+struct kstate_description {
+ const char *name;
+ enum kstate_ids id;
+ atomic_t instance_id;
+ int version_id;
+ int min_version_id;
+
+ const struct kstate_field *fields;
+ const struct kstate_description **subsections;
+};
+
+struct state_entry {
+ u64 id;
+ struct list_head list;
+ struct kstate_description *kstd;
+ void *obj;
+};
+
+static inline bool kstate_get_byte(struct kstate_stream *stream)
+{
+ bool ret = *(u8 *)stream->pos;
+
+ stream->pos++;
+ return ret;
+}
+
+static inline unsigned long kstate_get_ulong(struct kstate_stream *stream)
+{
+ unsigned long ret = *(unsigned long *)stream->pos;
+
+ stream->pos += sizeof(unsigned long);
+ return ret;
+}
+
+extern struct kstate_description page_state;
+
+#ifdef CONFIG_KSTATE
+
+extern phys_addr_t kstate_out_paddr;
+
+int kstate_save_state(void);
+void free_kstate_stream(void);
+
+int kstate_save_data(struct kstate_stream *stream, const void *val,
+ size_t size);
+void kstate_restore_data(struct kstate_stream *stream, void *val, size_t size);
+int kstate_register(struct kstate_description *state, void *obj, int id);
+void kstate_unregister(struct kstate_description *state, void *obj, int id);
+int kstate_restore(struct kstate_description *state, void *obj, int id);
+int kstate_register_restore(struct kstate_description *state, void *obj);
+
+struct kstate_entry;
+
+int kstate_folio_save(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field);
+int kstate_folio_restore(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field);
+
+int kstate_abort(void);
+int kstate_finalize(void);
+
+int kstate_early_init(phys_addr_t kstate_entries, u64 len);
+
+#else
+
+#define kstate_register(state, obj, id)
+
+static inline int kstate_save_data(struct kstate_stream *stream,
+ const void *val, size_t size)
+{
+ return 0;
+}
+static inline void kstate_restore_data(struct kstate_stream *stream, void *val,
+ size_t size)
+{
+}
+
+#endif
+
+#define KSTATE_BASE_TYPE_V(_f, _state, _type, _v) { \
+ .name = (__stringify(_f)), \
+ .version_id = (_v), \
+ .size = sizeof(_type) + BUILD_BUG_ON_ZERO( \
+ !__same_type(typeof_member(_state, _f), _type)),\
+ .flags = KS_BASE_TYPE, \
+ .offset = offsetof(_state, _f), \
+}
+
+#define KSTATE_BASE_TYPE(_f, _state, _type) \
+ KSTATE_BASE_TYPE_V(_f, _state, _type, 0)
+
+#define KSTATE_BASE_TYPE_DEPRECATED(_f, _type, _v) { \
+ .name = (__stringify(_f)), \
+ .version_id = (_v), \
+ .size = sizeof(_type), \
+ .flags = KS_DEPRECATED, \
+}
+
+#define KSTATE_POINTER_V(_f, _state, _v) { \
+ .name = (__stringify(_f)), \
+ .version_id = (_v), \
+ .size = sizeof(*(((_state *)0)->_f)), \
+ .flags = KS_POINTER, \
+ .offset = offsetof(_state, _f), \
+ }
+
+#define KSTATE_POINTER(_f, _state) KSTATE_POINTER_V(_f, _state, 0)
+
+#define KSTATE_ADDRESS_V(_f, _state, _addr_type, _v) { \
+ .name = (__stringify(_f)), \
+ .version_id = (_v), \
+ .size = sizeof(*(((_state *)0)->_f)), \
+ .addr_type = (_addr_type), \
+ .flags = KS_ADDRESS, \
+ .offset = offsetof(_state, _f), \
+ }
+#define KSTATE_ADDRESS(_f, _state, _addr_type) \
+ KSTATE_ADDRESS_V(_f, _state, _addr_type, 0)
+
+#define KSTATE_FOLIO(_f, _state) { \
+ .name = "folio", \
+ .flags = KS_CUSTOM, \
+ .offset = offsetof(_state, _f), \
+ .save = kstate_folio_save, \
+ .restore = kstate_folio_restore, \
+ }
+
+#define KSTATE_END_OF_LIST() { \
+ .flags = KS_END, \
+ }
+
+#endif
diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index 5be04ede357d..b6ea861006bf 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -62,12 +62,20 @@ config LIVEUPDATE_SELFTESTS
If you are unsure or are building a production kernel where size
or attack surface is a concern, say N.
+config KSTATE
+ bool
+ help
+ KSTATE (kernel state) is a mechanism to describe internal kernel
+ state, save it into the memory and restore the state after kexec
+ in new kernel.
+
config KEXEC_HANDOVER
bool "kexec handover"
depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE
depends on !DEFERRED_STRUCT_PAGE_INIT
select MEMBLOCK_KHO_SCRATCH
select KEXEC_FILE
+ select KSTATE
select DEBUG_FS
select LIBFDT
select CMA
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index 9b8b69517463..0dc5122147d0 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -12,6 +12,8 @@ luo-y := \
obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) += kexec_handover_debug.o
+obj-$(CONFIG_KSTATE) += kstate.o
+
obj-$(CONFIG_LIVEUPDATE) += luo.o
obj-$(CONFIG_LIVEUPDATE_SELFTESTS) += luo_selftests.o
obj-$(CONFIG_LIVEUPDATE_SYSFS_API) += luo_sysfs.o
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index a99425fdfce4..f1c6378b2657 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -12,6 +12,7 @@
#include <linux/count_zeros.h>
#include <linux/kexec.h>
#include <linux/kexec_handover.h>
+#include <linux/kstate.h>
#include <linux/libfdt.h>
#include <linux/list.h>
#include <linux/memblock.h>
@@ -885,6 +886,9 @@ int kho_finalize(void)
ret = -EEXIST;
goto unlock;
}
+ ret = kstate_finalize();
+ if (ret)
+ goto unlock;
ret = __kho_finalize();
if (ret)
diff --git a/kernel/liveupdate/kstate.c b/kernel/liveupdate/kstate.c
new file mode 100644
index 000000000000..dde40899fd99
--- /dev/null
+++ b/kernel/liveupdate/kstate.c
@@ -0,0 +1,536 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/ctype.h>
+#include <linux/gfp.h>
+#include <linux/kexec.h>
+#include <linux/kexec_handover.h>
+#include <linux/kstate.h>
+#include <linux/memblock.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+
+#define KSTATE_MAGIC 0x3B37778C
+
+static DEFINE_MUTEX(states_lock);
+static LIST_HEAD(states);
+
+phys_addr_t kstate_out_paddr;
+
+void *kstate_stream_addr;
+
+struct kstate_entry {
+ int section_type;
+ int state_id;
+ int version_id;
+ int instance_id;
+ int size;
+ DECLARE_FLEX_ARRAY(u8, data);
+};
+
+struct kstate_stream kstate_stream;
+
+enum {
+ KS_SUBSECTION = 1,
+ KS_SECTION,
+ KS_EOF = -1,
+};
+
+static unsigned long get_addr_offset(const struct kstate_field *field)
+{
+ switch (field->addr_type) {
+ case KS_VMEMMAP_ADDR:
+ return VMEMMAP_START;
+ case KS_LINEAR_ADDR:
+ return PAGE_OFFSET;
+ default:
+ WARN_ON(1);
+ }
+ return 0;
+}
+
+static struct folio *folio_realloc(struct folio *folio, int new_order,
+ gfp_t gfp_mask)
+{
+ struct folio *new_folio = folio_alloc(gfp_mask, new_order);
+
+ if (!new_folio)
+ return NULL;
+
+ memcpy(folio_address(new_folio), folio_address(folio), folio_size(folio));
+ folio_put(folio);
+ return new_folio;
+}
+
+static int alloc_space(struct kstate_stream *stream, size_t size)
+{
+ int new_order;
+ struct folio *new_folio;
+ size_t cur_size = stream->pos - folio_address(stream->folio);
+
+ size = size + 4; /* Always alloc extra for KS_EOF */
+ if (cur_size + size < folio_size(stream->folio))
+ return 0;
+
+ new_order = get_order(cur_size) + 1;
+
+ new_folio = folio_realloc(stream->folio, new_order, GFP_KERNEL);
+ if (!new_folio)
+ return -ENOMEM;
+ stream->folio = new_folio;
+ stream->pos = folio_address(stream->folio) + cur_size;
+ return 0;
+}
+
+int kstate_save_data(struct kstate_stream *stream, const void *val,
+ size_t size)
+{
+ int ret;
+
+ ret = alloc_space(stream, size);
+ if (ret)
+ return ret;
+ memcpy(stream->pos, val, size);
+ stream->pos += size;
+ return 0;
+}
+
+static int save_kstate(struct kstate_stream *stream, int id,
+ const struct kstate_description *kstate,
+ void *obj, int section_type)
+{
+ const struct kstate_field *field = kstate->fields;
+ struct kstate_entry *ke;
+ unsigned long ke_off;
+ int ret = 0;
+
+ ret = alloc_space(stream, sizeof(*ke));
+ if (ret)
+ goto err;
+
+ ke_off = stream->pos - folio_address(stream->folio);
+ ke = stream->pos;
+ stream->pos += sizeof(*ke);
+
+ ke->section_type = section_type;
+ ke->state_id = kstate->id;
+ ke->version_id = kstate->version_id;
+ ke->instance_id = id;
+
+ while (field->flags != KS_END) {
+ void *first, *cur;
+ int n_elems = 1;
+ int size, i;
+
+ first = obj + field->offset;
+ /* Fields of higher versions shouldn't exist */
+ if (WARN_ON(field->version_id > kstate->version_id)) {
+ field++;
+ continue;
+ }
+ if (field->flags & KS_DEPRECATED) {
+ field++;
+ continue;
+ }
+
+ if (field->flags & KS_POINTER)
+ first = *(void **)(obj + field->offset);
+ if (field->count)
+ n_elems = field->count();
+ size = field->size;
+ for (i = 0; i < n_elems; i++) {
+ cur = first + i * size;
+
+ if (field->flags & KS_ARRAY_OF_POINTER)
+ cur = *(void **)cur;
+
+ if (field->flags & KS_STRUCT) {
+ ret = save_kstate(stream, 0, field->ksd, cur, section_type);
+ if (ret)
+ goto err;
+ } else if (field->flags & KS_CUSTOM) {
+ if (field->save) {
+ ret = field->save(stream, cur, field);
+ if (ret)
+ goto err;
+ }
+ } else if (field->flags & (KS_BASE_TYPE|KS_POINTER)) {
+ ret = kstate_save_data(stream, cur, size);
+ if (ret)
+ goto err;
+ } else if (field->flags & KS_ADDRESS) {
+ void *addr_offset = *(void **)cur
+ - get_addr_offset(field);
+ ret = kstate_save_data(stream, &addr_offset,
+ sizeof(addr_offset));
+ if (ret)
+ goto err;
+ } else
+ WARN_ON_ONCE(1);
+ }
+ field++;
+
+ }
+
+ ke = folio_address(stream->folio) + ke_off;
+ ke->size = (stream->pos - folio_address(stream->folio)) - (ke_off + sizeof(*ke));
+err:
+ if (ret)
+ pr_err("kstate: save of state %s failed\n", kstate->name);
+
+ return ret;
+}
+
+static int save_kstates(struct kstate_stream *stream, int id,
+ const struct kstate_description *kstate,
+ void *obj)
+{
+ int ret = 0;
+ const struct kstate_description *const *section;
+
+ ret = save_kstate(stream, id, kstate, obj, KS_SECTION);
+ if (ret)
+ return ret;
+
+ if (!kstate->subsections)
+ return ret;
+
+ section = kstate->subsections;
+ while (*section) {
+ ret = save_kstate(stream, id, *section, obj, KS_SUBSECTION);
+ if (ret)
+ break;
+ section++;
+ }
+
+ return ret;
+}
+
+static int alloc_kstate_stream(void)
+{
+ struct folio *folio;
+ u32 *buf;
+
+ folio = folio_alloc(GFP_KERNEL, 0);
+ if (!folio)
+ return -ENOMEM;
+
+ buf = folio_address(folio);
+ *buf++ = KSTATE_MAGIC;
+ kstate_stream.pos = buf;
+ kstate_stream.folio = folio;
+ return 0;
+}
+
+void free_kstate_stream(void)
+{
+ if (kstate_stream.folio)
+ folio_put(kstate_stream.folio);
+
+ kstate_stream.folio = NULL;
+ kstate_stream.pos = NULL;
+}
+
+int kstate_save_state(void)
+{
+ struct state_entry *se;
+ struct kstate_entry *ke;
+ int err = 0;
+
+ err = alloc_kstate_stream();
+ if (err)
+ return err;
+
+ mutex_lock(&states_lock);
+ list_for_each_entry(se, &states, list) {
+ err = save_kstates(&kstate_stream, se->id, se->kstd, se->obj);
+ if (err)
+ goto out;
+ }
+ ke = kstate_stream.pos;
+ ke->section_type = KS_EOF;
+out:
+ mutex_unlock(&states_lock);
+ if (err)
+ free_kstate_stream();
+ return err;
+}
+
+void kstate_restore_data(struct kstate_stream *stream, void *val, size_t size)
+{
+ memcpy(val, stream->pos, size);
+ stream->pos += size;
+}
+
+static void restore_kstate(struct kstate_stream *stream, int id,
+ const struct kstate_description *kstate, void *obj)
+{
+ const struct kstate_field *field = kstate->fields;
+ struct kstate_entry *ke = stream->pos;
+
+ stream->pos = ke->data;
+
+ WARN_ONCE(ke->version_id != kstate->version_id, "version mismatch %d %d\n",
+ ke->version_id, kstate->version_id);
+
+ WARN_ONCE(ke->instance_id != id, "instance id mismatch %d %d\n",
+ ke->instance_id, id);
+
+ while (field->flags != KS_END) {
+ void *first, *cur;
+ int n_elems = 1;
+ int size, i;
+
+ if (field->version_id > ke->version_id) {
+ field++;
+ continue;
+ }
+ if (field->flags & KS_DEPRECATED) {
+ if (ke->version_id <= field->version_id)
+ stream->pos += field->size;
+ field++;
+ continue;
+ }
+
+ first = obj + field->offset;
+ if (field->flags & KS_POINTER)
+ first = *(void **)(obj + field->offset);
+ if (field->count)
+ n_elems = field->count();
+ size = field->size;
+ for (i = 0; i < n_elems; i++) {
+ cur = first + i * size;
+
+ if (field->flags & KS_ARRAY_OF_POINTER)
+ cur = *(void **)cur;
+
+ if (field->flags & KS_STRUCT)
+ restore_kstate(stream, 0, field->ksd, cur);
+ else if (field->flags & KS_CUSTOM) {
+ if (field->restore)
+ field->restore(stream, cur, field);
+ } else if (field->flags & (KS_BASE_TYPE | KS_POINTER)) {
+ memcpy(cur, stream->pos, size);
+ stream->pos += size;
+ } else if (field->flags & KS_ADDRESS) {
+ *(void **)cur = (*(void **)stream->pos) +
+ get_addr_offset(field);
+ stream->pos += sizeof(void *);
+ } else
+ WARN_ON_ONCE(1);
+
+ }
+ field++;
+ }
+}
+
+static struct kstate_entry *find_subsection(struct kstate_stream *stream, int id)
+{
+ struct kstate_entry *ke = stream->pos;
+
+ while (ke->section_type == KS_SUBSECTION) {
+ if (ke->state_id == id)
+ return ke;
+
+ ke = (struct kstate_entry *)(ke->data + ke->size);
+ }
+ return NULL;
+}
+
+static void subsection_load(struct kstate_stream *stream, int id,
+ struct kstate_description *ksd, void *obj)
+{
+ struct kstate_entry *start_ke, *ke;
+ const struct kstate_description *const *section;
+
+ if (!ksd->subsections)
+ return;
+
+ start_ke = stream->pos;
+ section = ksd->subsections;
+ while (*section) {
+ stream->pos = start_ke;
+ ke = find_subsection(stream, (*section)->id);
+ if (ke) {
+ stream->pos = ke;
+ restore_kstate(stream, id, *section, obj);
+ }
+ section++;
+ }
+}
+
+int kstate_restore(struct kstate_description *state, void *obj, int id)
+{
+ struct kstate_stream stream;
+ struct kstate_entry *ke;
+
+ if (kstate_stream_addr == NULL)
+ return -ENOENT;
+
+ if (*(u32 *)kstate_stream_addr != KSTATE_MAGIC) {
+ kstate_stream_addr = NULL;
+ return -ENOENT;
+ }
+
+ ke = (struct kstate_entry *)(kstate_stream_addr + sizeof(u32));
+ if (WARN_ON_ONCE(ke->state_id == 0))
+ return -ENOENT;
+
+ stream.pos = ke;
+ while (ke->section_type != KS_EOF) {
+ if (ke->state_id != state->id ||
+ ke->instance_id != id) {
+ ke = (struct kstate_entry *)(ke->data + ke->size);
+ continue;
+ }
+ stream.pos = ke;
+ restore_kstate(&stream, id, state, obj);
+ stream.pos = ke = (struct kstate_entry *)(ke->data + ke->size);
+ subsection_load(&stream, id, state, obj);
+ return 0;
+ }
+ return -ENOENT;
+}
+
+int kstate_register(struct kstate_description *state, void *obj, int id)
+{
+ struct state_entry *se;
+
+ se = kmalloc(sizeof(*se), GFP_KERNEL);
+ if (!se)
+ return -ENOMEM;
+
+ se->kstd = state;
+ se->id = id;
+ se->obj = obj;
+
+ mutex_lock(&states_lock);
+ list_add(&se->list, &states);
+ mutex_unlock(&states_lock);
+ return 0;
+}
+
+void kstate_unregister(struct kstate_description *state, void *obj, int id)
+{
+ struct state_entry *se, *tmp;
+
+ mutex_lock(&states_lock);
+ list_for_each_entry_safe(se, tmp, &states, list) {
+ if (se->id == id && se->obj == obj) {
+ list_del(&se->list);
+ kfree(se); /* free only an entry that actually matched */
+ break;
+ }
+ }
+ mutex_unlock(&states_lock);
+}
+
+int kstate_register_restore(struct kstate_description *state, void *obj)
+{
+ int id = atomic_inc_return(&state->instance_id);
+ int err = kstate_register(state, obj, id);
+
+ return err ? err : kstate_restore(state, obj, id);
+}
+
+int kstate_folio_restore(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field)
+{
+ phys_addr_t paddr;
+ struct folio *folio;
+
+ kstate_restore_data(stream, &paddr, sizeof(paddr));
+ folio = kho_restore_folio(paddr);
+ if (!folio)
+ return -ENOENT;
+
+ *(struct folio **)obj = folio;
+ return 0;
+}
+
+int kstate_folio_save(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field)
+{
+ struct folio *folio = *(struct folio **)obj;
+ phys_addr_t paddr = PFN_PHYS(folio_pfn(folio));
+ int ret;
+
+ ret = kstate_save_data(stream, &paddr, sizeof(paddr));
+ if (ret)
+ return ret;
+
+ return kho_preserve_folio(folio);
+}
+
+
+struct kstate_out {
+ union {
+ phys_addr_t kstate_paddr;
+ u8 data[PAGE_SIZE];
+ };
+};
+
+int kstate_abort(void)
+{
+ free_kstate_stream();
+ return 0;
+}
+
+int kstate_finalize(void)
+{
+ int err = 0;
+ struct kstate_out *kstate_out = phys_to_virt(kstate_out_paddr);
+ struct folio *kstate_out_folio = page_folio(phys_to_page(kstate_out_paddr));
+
+ err = kstate_save_state();
+ if (err)
+ return err;
+
+ err = kho_preserve_folio(kstate_out_folio);
+ if (err)
+ goto out_save_state;
+
+ err = kho_preserve_folio(kstate_stream.folio);
+ if (err)
+ goto out;
+
+ kstate_out->kstate_paddr = PFN_PHYS(folio_pfn(kstate_stream.folio));
+out:
+ if (err)
+ kho_unpreserve_folio(kstate_out_folio);
+out_save_state:
+ if (err)
+ free_kstate_stream();
+
+ return err;
+}
+
+static int __init kstate_init(void)
+{
+ struct page *page;
+ int err = 0;
+
+ if (!kho_is_enabled())
+ return 0;
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+
+ kstate_out_paddr = page_to_phys(page);
+ return err;
+}
+late_initcall(kstate_init);
+
+int __init kstate_early_init(phys_addr_t kstate_entries, u64 len)
+{
+ struct kstate_out *kstate_out;
+
+ kstate_out = early_memremap(kstate_entries, len);
+ if (!kstate_out) {
+ pr_err("%s failed\n", __func__);
+ return -ENOMEM;
+ }
+ kstate_stream_addr = phys_to_virt(kstate_out->kstate_paddr);
+ early_memunmap(kstate_out, len);
+ return 0;
+}
--
2.49.1
* [PATCH v3 4/7] kho: replace KHO FDT with kstate metadata
2025-09-09 20:14 [PATCH v3 0/7] KSTATE: [de]serialization framework for KHO Andrey Ryabinin
` (2 preceding siblings ...)
2025-09-09 20:14 ` [PATCH v3 3/7] kstate: Add KSTATE - [de]serialization framework for KHO Andrey Ryabinin
@ 2025-09-09 20:14 ` Andrey Ryabinin
2025-09-10 16:50 ` Rob Herring
2025-09-09 20:14 ` [PATCH v3 5/7] kstate, test: add test module for testing kstate subsystem Andrey Ryabinin
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin, Rob Herring, Saravana Kannan,
devicetree
Store the KSTATE physical address and size in kho_data instead of the FDT's,
and initialize KSTATE from kho_populate().
To keep the FDT usable in parallel with KSTATE, place the FDT
address and size into 'struct kho_fdt' and save/restore it using KSTATE.
This finishes wiring KSTATE into KHO, making it fully functional.
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
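For reviewers new to KSTATE: the idea of describing 'struct kho_fdt' by a
field table and then saving/restoring it can be sketched in userspace as
below. This is only an illustration under stated assumptions, not the kernel
implementation: struct kho_fdt_demo, struct demo_field, DEMO_FIELD,
demo_save() and demo_restore() are hypothetical names, uint64_t stands in for
phys_addr_t, and the "stream" is a plain byte buffer with no entry header.

```c
/* Userspace sketch only: all demo_* names are hypothetical, not kernel API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct kho_fdt_demo {
	uint64_t fdt_phys;	/* stand-in for phys_addr_t */
	uint64_t fdt_len;
};

/* One table entry per saved field: where it lives and how big it is. */
struct demo_field {
	size_t offset;
	size_t size;
};

#define DEMO_FIELD(type, member) \
	{ offsetof(type, member), sizeof(((type *)0)->member) }

static const struct demo_field kho_fdt_fields[] = {
	DEMO_FIELD(struct kho_fdt_demo, fdt_phys),
	DEMO_FIELD(struct kho_fdt_demo, fdt_len),
};

#define DEMO_NFIELDS (sizeof(kho_fdt_fields) / sizeof(kho_fdt_fields[0]))

/* Copy each described field into the stream; returns bytes written, 0 if it does not fit. */
static size_t demo_save(const void *obj, unsigned char *stream, size_t cap)
{
	size_t pos = 0;

	for (size_t i = 0; i < DEMO_NFIELDS; i++) {
		const struct demo_field *f = &kho_fdt_fields[i];

		if (f->size > cap - pos)
			return 0;
		memcpy(stream + pos, (const unsigned char *)obj + f->offset, f->size);
		pos += f->size;
	}
	return pos;
}

/* Walk the same table in the same order to fill the object back in. */
static size_t demo_restore(void *obj, const unsigned char *stream, size_t len)
{
	size_t pos = 0;

	for (size_t i = 0; i < DEMO_NFIELDS; i++) {
		const struct demo_field *f = &kho_fdt_fields[i];

		if (f->size > len - pos)
			return 0;
		memcpy((unsigned char *)obj + f->offset, stream + pos, f->size);
		pos += f->size;
	}
	return pos;
}
```

The point of the table is that save and restore share one description, so
the two sides cannot disagree about field order or size.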
arch/x86/include/uapi/asm/setup_data.h | 4 +-
arch/x86/kernel/kexec-bzimage64.c | 6 +--
arch/x86/kernel/setup.c | 3 +-
drivers/of/fdt.c | 6 +--
include/linux/kexec.h | 2 +-
include/linux/kstate.h | 1 +
kernel/liveupdate/kexec_handover.c | 60 +++++++++++++++++++++-----
7 files changed, 61 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/uapi/asm/setup_data.h b/arch/x86/include/uapi/asm/setup_data.h
index 2671c4e1b3a0..844f5b93473f 100644
--- a/arch/x86/include/uapi/asm/setup_data.h
+++ b/arch/x86/include/uapi/asm/setup_data.h
@@ -83,8 +83,8 @@ struct ima_setup_data {
* Locations of kexec handover metadata
*/
struct kho_data {
- __u64 fdt_addr;
- __u64 fdt_size;
+ __u64 kstate_addr;
+ __u64 kstate_size;
__u64 scratch_addr;
__u64 scratch_size;
} __attribute__((packed));
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 24a41f0e0cf1..1bf9474d4286 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -253,12 +253,12 @@ static void setup_kho(const struct kimage *image, struct boot_params *params,
sd->len = sizeof(struct kho_data);
/* Only add if we have all KHO images in place */
- if (!image->kho.fdt || !image->kho.scratch)
+ if (!image->kho.kstate || !image->kho.scratch)
return;
/* Add setup data */
- kho->fdt_addr = image->kho.fdt;
- kho->fdt_size = PAGE_SIZE;
+ kho->kstate_addr = image->kho.kstate;
+ kho->kstate_size = PAGE_SIZE;
kho->scratch_addr = image->kho.scratch->mem;
kho->scratch_size = image->kho.scratch->bufsz;
sd->next = params->hdr.setup_data;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1b2edd07a3e1..eedcf4be8985 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -467,7 +467,8 @@ static void __init add_kho(u64 phys_addr, u32 data_len)
return;
}
- kho_populate(kho->fdt_addr, kho->fdt_size, kho->scratch_addr, kho->scratch_size);
+ kho_populate(kho->kstate_addr, kho->kstate_size, kho->scratch_addr,
+ kho->scratch_size);
early_memunmap(kho, size);
}
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 0edd639898a6..f682ef6a1187 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -893,8 +893,8 @@ static void __init early_init_dt_check_kho(void)
if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32))
return;
- fdt_start = dt_mem_next_cell(dt_root_addr_cells, &p);
- fdt_size = dt_mem_next_cell(dt_root_addr_cells, &p);
+ kstate_start = dt_mem_next_cell(dt_root_addr_cells, &p);
+ kstate_size = dt_mem_next_cell(dt_root_addr_cells, &p);
p = of_get_flat_dt_prop(node, "linux,kho-scratch", &l);
if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32))
@@ -903,7 +903,7 @@ static void __init early_init_dt_check_kho(void)
scratch_start = dt_mem_next_cell(dt_root_addr_cells, &p);
scratch_size = dt_mem_next_cell(dt_root_addr_cells, &p);
- kho_populate(fdt_start, fdt_size, scratch_start, scratch_size);
+ kho_populate(kstate_start, kstate_size, scratch_start, scratch_size);
}
#ifdef CONFIG_SERIAL_EARLYCON
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 39fe3e6cd282..ebf70361269f 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -416,7 +416,7 @@ struct kimage {
struct {
struct kexec_segment *scratch;
- phys_addr_t fdt;
+ phys_addr_t kstate;
} kho;
/* Core ELF header buffer */
diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 53992593cb19..5a95960e5b03 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -94,6 +94,7 @@ struct kstate_field {
enum kstate_ids {
KSTATE_FOLIO_ID = 1,
+ KSTATE_KHO_FDT_ID,
KSTATE_LAST_ID = -1,
};
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index f1c6378b2657..216cb58553d6 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -806,6 +806,10 @@ int kho_abort(void)
goto unlock;
}
+ ret = kstate_abort();
+ if (ret)
+ goto unlock;
+
ret = __kho_abort();
if (ret)
goto unlock;
@@ -984,6 +988,25 @@ int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
}
EXPORT_SYMBOL_GPL(kho_retrieve_subtree);
+
+#define KHO_FDT_INSTANCE_ID 1
+
+static struct kho_fdt {
+ phys_addr_t fdt_phys;
+ u64 fdt_len;
+} kho_fdt;
+
+struct kstate_description kho_fdt_state = {
+ .name = "kho_fdt",
+ .version_id = 1,
+ .id = KSTATE_KHO_FDT_ID,
+ .fields = (const struct kstate_field[]) {
+ KSTATE_BASE_TYPE(fdt_phys, struct kho_fdt, phys_addr_t),
+ KSTATE_BASE_TYPE(fdt_len, struct kho_fdt, u64),
+ KSTATE_END_OF_LIST()
+ },
+};
+
static __init int kho_init(void)
{
int err = 0;
@@ -1000,13 +1023,20 @@ static __init int kho_init(void)
}
kho_out.fdt = page_to_virt(fdt_page);
- err = kho_debugfs_init();
+ err = kstate_register(&kho_fdt_state, &kho_fdt, KHO_FDT_INSTANCE_ID);
if (err)
goto err_free_fdt;
+ kho_fdt.fdt_phys = page_to_phys(fdt_page);
+ kho_fdt.fdt_len = PAGE_SIZE;
+
+ err = kho_debugfs_init();
+ if (err)
+ goto err_free_kstate;
+
err = kho_out_debugfs_init(&kho_out.dbg);
if (err)
- goto err_free_fdt;
+ goto err_free_kstate;
if (fdt) {
kho_in_debugfs_init(&kho_in.dbg, fdt);
@@ -1025,6 +1055,8 @@ static __init int kho_init(void)
return 0;
+err_free_kstate:
+ kstate_unregister(&kho_fdt_state, &kho_fdt, KHO_FDT_INSTANCE_ID);
err_free_fdt:
put_page(fdt_page);
kho_out.fdt = NULL;
@@ -1165,24 +1197,30 @@ static int __init kho_scratch_init(phys_addr_t scratch_phys, u64 scratch_len)
return err;
}
-void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
- phys_addr_t scratch_phys, u64 scratch_len)
+void __init kho_populate(phys_addr_t kstate_phys, u64 kstate_len,
+ phys_addr_t scratch_phys, u64 scratch_len)
{
-
int err = 0;
- unsigned int scratch_cnt = scratch_len / sizeof(*kho_scratch);
- err = kho_fdt_init(fdt_phys, fdt_len);
+ err = kho_scratch_init(scratch_phys, scratch_len);
if (err)
goto out;
- err = kho_scratch_init(scratch_phys, scratch_len);
+ err = kstate_early_init(kstate_phys, kstate_len);
+ if (err)
+ goto out;
+
+ err = kstate_restore(&kho_fdt_state, &kho_fdt, KHO_FDT_INSTANCE_ID);
+ if (err)
+ goto out;
+
+ err = kho_fdt_init(kho_fdt.fdt_phys, kho_fdt.fdt_len);
if (err)
goto out;
- kho_in.fdt_phys = fdt_phys;
+ kho_in.fdt_phys = kho_fdt.fdt_phys;
kho_in.scratch_phys = scratch_phys;
- kho_scratch_cnt = scratch_cnt;
+ kho_scratch_cnt = scratch_len / sizeof(*kho_scratch);
pr_info("found kexec handover data. Will skip init for some devices\n");
out:
@@ -1201,7 +1239,7 @@ int kho_fill_kimage(struct kimage *image)
if (!kho_enable)
return 0;
- image->kho.fdt = virt_to_phys(kho_out.fdt);
+ image->kho.kstate = kstate_out_paddr;
scratch_size = sizeof(*kho_scratch) * kho_scratch_cnt;
scratch = (struct kexec_buf){
--
2.49.1
* [PATCH v3 5/7] kstate, test: add test module for testing kstate subsystem.
2025-09-09 20:14 [PATCH v3 0/7] KSTATE: [de]serialization framework for KHO Andrey Ryabinin
` (3 preceding siblings ...)
2025-09-09 20:14 ` [PATCH v3 4/7] kho: replace KHO FDT with kstate metadata Andrey Ryabinin
@ 2025-09-09 20:14 ` Andrey Ryabinin
2025-09-10 0:33 ` Randy Dunlap
2025-09-09 20:14 ` [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table Andrey Ryabinin
2025-09-09 20:14 ` [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation Andrey Ryabinin
6 siblings, 1 reply; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
This is a simple test and playground, useful for kstate subsystem development.
It contains a structure with different kinds of data, which is migrated
across kexec to the new kernel using kstate.
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
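The test registers 'test_state_v2' as a subsection of 'test_state'. How a
loader can locate a subsection by id, and pass over ids it does not know,
can be sketched in userspace as below. This is an illustration under stated
assumptions, not the kstate stream format: struct demo_entry, the DEMO_*
constants, demo_put() and demo_find_subsection() are hypothetical, and the
header layout (id, type, payload size) is invented for the example.

```c
/* Userspace sketch only: the entry layout and all demo_* names are hypothetical. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_SECTION = 1, DEMO_SUBSECTION = 2, DEMO_EOF = 3 };

struct demo_entry {
	uint32_t id;
	uint32_t type;
	uint32_t size;	/* payload bytes following the header */
};

/* Append one entry (header plus payload); returns the new write offset. */
static size_t demo_put(unsigned char *buf, size_t pos, uint32_t id,
		       uint32_t type, const void *data, uint32_t size)
{
	struct demo_entry e = { id, type, size };

	memcpy(buf + pos, &e, sizeof(e));
	if (size)
		memcpy(buf + pos + sizeof(e), data, size);
	return pos + sizeof(e) + size;
}

/*
 * Scan consecutive subsection entries starting at pos. Returns the payload
 * offset of the entry whose id matches, or -1 if the run of subsections
 * ends first. Entries with other ids are stepped over using their size.
 */
static long demo_find_subsection(const unsigned char *buf, size_t pos,
				 size_t len, uint32_t id, uint32_t *size)
{
	while (pos + sizeof(struct demo_entry) <= len) {
		struct demo_entry e;

		memcpy(&e, buf + pos, sizeof(e));
		if (e.type != DEMO_SUBSECTION)
			break;
		if (e.size > len - pos - sizeof(e))
			break;	/* truncated entry */
		if (e.id == id) {
			*size = e.size;
			return (long)(pos + sizeof(e));
		}
		pos += sizeof(e) + e.size;
	}
	return -1;
}
```

Because every entry carries its own size, a receiver can step over a
subsection without knowing anything about its contents.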
MAINTAINERS | 1 +
include/linux/kstate.h | 2 +
kernel/liveupdate/Kconfig | 8 +++
lib/Makefile | 2 +
lib/test_kstate.c | 116 ++++++++++++++++++++++++++++++++++++++
5 files changed, 129 insertions(+)
create mode 100644 lib/test_kstate.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 2cd9e49abee5..e96da6d97e75 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13723,6 +13723,7 @@ M: Andrey Ryabinin <ryabinin.a.a@gmail.com>
S: Maintained
F: include/linux/kstate.h
F: kernel/livupdate/kstate.c
+F: lib/test_kstate.c
KTD253 BACKLIGHT DRIVER
M: Linus Walleij <linus.walleij@linaro.org>
diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 5a95960e5b03..0ced0da37c8f 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -95,6 +95,8 @@ struct kstate_field {
enum kstate_ids {
KSTATE_FOLIO_ID = 1,
KSTATE_KHO_FDT_ID,
+ KSTATE_TEST_ID,
+ KSTATE_TEST_ID_V2,
KSTATE_LAST_ID = -1,
};
diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index b6ea861006bf..af9a25bdcd6e 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -69,6 +69,14 @@ config KSTATE
state, save it into the memory and restore the state after kexec
in new kernel.
+config KSTATE_TEST
+ bool "KSTATE test code"
+ help
+ Build simple test/playground code that is useful for kstate
+ subsystem development. It contains a structure with different
+ kinds of data, which is migrated across kexec to the new kernel
+ using KSTATE.
+
config KEXEC_HANDOVER
bool "kexec handover"
depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE
diff --git a/lib/Makefile b/lib/Makefile
index 392ff808c9b9..46616577caf3 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -316,6 +316,8 @@ obj-$(CONFIG_PARMAN) += parman.o
obj-y += group_cpus.o
+obj-$(CONFIG_KSTATE_TEST) += test_kstate.o
+
# GCC library routines
obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
diff --git a/lib/test_kstate.c b/lib/test_kstate.c
new file mode 100644
index 000000000000..70534e8c718f
--- /dev/null
+++ b/lib/test_kstate.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) "kstate test: " fmt
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/kstate.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+
+static unsigned long ulong_val;
+struct kstate_test_data {
+ int i;
+ unsigned long *p_ulong;
+ char s[10];
+ struct folio *folio;
+};
+
+#define KSTATE_TEST_DATA_ID 123
+
+struct kstate_description test_state_v2 = {
+ .name = "test_v2",
+ .version_id = 1,
+ .id = KSTATE_TEST_ID_V2,
+ .fields = (const struct kstate_field[]) {
+ KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
+ KSTATE_END_OF_LIST()
+ },
+};
+
+struct kstate_description test_state = {
+ .name = "test",
+ .version_id = 2,
+ .id = KSTATE_TEST_ID,
+ .fields = (const struct kstate_field[]) {
+ KSTATE_BASE_TYPE(s, struct kstate_test_data, char [10]),
+ KSTATE_POINTER(p_ulong, struct kstate_test_data),
+ KSTATE_FOLIO(folio, struct kstate_test_data),
+ KSTATE_BASE_TYPE_DEPRECATED(k, u16, 1),
+ KSTATE_END_OF_LIST()
+ },
+ .subsections = (const struct kstate_description *[]){
+ &test_state_v2,
+ NULL
+ },
+};
+
+static struct kstate_test_data test_data;
+
+static int init_test_data(void)
+{
+ struct folio *folio;
+ int i;
+
+ test_data.i = 10;
+ ulong_val = 20;
+ memcpy(test_data.s, "abcdefghk", sizeof(test_data.s));
+ folio = folio_alloc(GFP_KERNEL, 0);
+ if (!folio)
+ return -ENOMEM;
+
+ for (i = 0; i < folio_size(folio)/sizeof(u32); i += 4)
+ *((u32 *)folio_address(folio) + i) = 0xdeadbeef;
+ test_data.folio = folio;
+ return 0;
+}
+
+static void validate_test_data(void)
+{
+ int i;
+
+ if (WARN_ON(test_data.i != 10))
+ return;
+ if (WARN_ON(*test_data.p_ulong != 20))
+ return;
+ if (WARN_ON(strcmp(test_data.s, "abcdefghk") != 0))
+ return;
+
+ for (i = 0; i < folio_size(test_data.folio)/4; i += 4) {
+ u32 val = *((u32 *)folio_address(test_data.folio) + i);
+
+ if (WARN_ON_ONCE(val != 0xdeadbeef))
+ return;
+ }
+}
+
+static int __init test_kstate_init(void)
+{
+ int ret = 0;
+
+ test_data.p_ulong = &ulong_val;
+
+ ret = kstate_register(&test_state, &test_data, KSTATE_TEST_DATA_ID);
+ if (ret) {
+ pr_err("register failed %d\n", ret);
+ goto out;
+ }
+
+ if (!is_kho_boot()) {
+ ret = init_test_data();
+ if (ret)
+ goto out;
+ } else {
+ pr_info("restoring data\n");
+ ret = kstate_restore(&test_state, &test_data, KSTATE_TEST_DATA_ID);
+ if (ret) {
+ pr_err("restore failed %d\n", ret);
+ goto out;
+ }
+
+ }
+
+ validate_test_data();
+
+out:
+ return ret;
+}
+late_initcall(test_kstate_init);
--
2.49.1
* [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table
2025-09-09 20:14 [PATCH v3 0/7] KSTATE: [de]serialization framework for KHO Andrey Ryabinin
` (4 preceding siblings ...)
2025-09-09 20:14 ` [PATCH v3 5/7] kstate, test: add test module for testing kstate subsystem Andrey Ryabinin
@ 2025-09-09 20:14 ` Andrey Ryabinin
2025-09-15 11:47 ` Jason Gunthorpe
2025-09-09 20:14 ` [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation Andrey Ryabinin
6 siblings, 1 reply; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
Currently reserved_mem_table is serialized/deserialized using FDT. Use KSTATE
instead, as it makes the code simpler and more compact.
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
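Each reservation is registered under an instance id derived from a CRC of
its name. Such an id is only stable if both the saving and the restoring
side hash exactly the same bytes. One way to guarantee that is to copy the
name into a zero-padded fixed-size buffer first, as sketched in userspace
below. This is an illustration under stated assumptions: DEMO_NAME_SIZE (an
assumed stand-in for RESERVE_MEM_NAME_SIZE), demo_crc32() and
demo_instance_id() are hypothetical, and demo_crc32() is a plain bitwise
CRC-32 that the sketch does not claim to be bit-identical to the kernel's
crc32().

```c
/* Userspace sketch only: all demo_* names are hypothetical, not kernel API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NAME_SIZE 16	/* assumed stand-in for RESERVE_MEM_NAME_SIZE */

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), no final inversion. */
static uint32_t demo_crc32(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hash a name through a zero-padded buffer so the id never depends on trailing bytes. */
static uint32_t demo_instance_id(const char *name)
{
	unsigned char padded[DEMO_NAME_SIZE] = { 0 };
	size_t n = strlen(name);

	if (n > DEMO_NAME_SIZE - 1)
		n = DEMO_NAME_SIZE - 1;
	memcpy(padded, name, n);
	return demo_crc32(~0u, padded, sizeof(padded));
}
```

With this shape, a name that points into a longer string (for example a
parsed command line) and the copy stored in a table produce the same id.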
include/linux/kstate.h | 1 +
mm/memblock.c | 158 +++++++++++++----------------------------
2 files changed, 49 insertions(+), 110 deletions(-)
diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 0ced0da37c8f..db8ba07e2319 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -97,6 +97,7 @@ enum kstate_ids {
KSTATE_KHO_FDT_ID,
KSTATE_TEST_ID,
KSTATE_TEST_ID_V2,
+ KSTATE_RESERVED_MEM_ID,
KSTATE_LAST_ID = -1,
};
diff --git a/mm/memblock.c b/mm/memblock.c
index 6af0b51b1bb7..b9d84d1ffd83 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -14,11 +14,13 @@
#include <linux/pfn.h>
#include <linux/debugfs.h>
#include <linux/kmemleak.h>
+#include <linux/kstate.h>
#include <linux/seq_file.h>
#include <linux/memblock.h>
#include <linux/mutex.h>
#ifdef CONFIG_KEXEC_HANDOVER
+#include <linux/crc32.h>
#include <linux/libfdt.h>
#include <linux/kexec_handover.h>
#endif /* CONFIG_KEXEC_HANDOVER */
@@ -2498,140 +2500,76 @@ int reserve_mem_release_by_name(const char *name)
}
#ifdef CONFIG_KEXEC_HANDOVER
-#define MEMBLOCK_KHO_FDT "memblock"
-#define MEMBLOCK_KHO_NODE_COMPATIBLE "memblock-v1"
-#define RESERVE_MEM_KHO_NODE_COMPATIBLE "reserve-mem-v1"
-
-static int __init prepare_kho_fdt(void)
-{
- int err = 0, i;
- struct page *fdt_page;
- void *fdt;
-
- fdt_page = alloc_page(GFP_KERNEL);
- if (!fdt_page)
- return -ENOMEM;
-
- fdt = page_to_virt(fdt_page);
-
- err |= fdt_create(fdt, PAGE_SIZE);
- err |= fdt_finish_reservemap(fdt);
-
- err |= fdt_begin_node(fdt, "");
- err |= fdt_property_string(fdt, "compatible", MEMBLOCK_KHO_NODE_COMPATIBLE);
- for (i = 0; i < reserved_mem_count; i++) {
- struct reserve_mem_table *map = &reserved_mem_table[i];
-
- err |= kho_preserve_phys(map->start, map->size);
- err |= fdt_begin_node(fdt, map->name);
- err |= fdt_property_string(fdt, "compatible", RESERVE_MEM_KHO_NODE_COMPATIBLE);
- err |= fdt_property(fdt, "start", &map->start, sizeof(map->start));
- err |= fdt_property(fdt, "size", &map->size, sizeof(map->size));
- err |= fdt_end_node(fdt);
- }
- err |= fdt_end_node(fdt);
- err |= fdt_finish(fdt);
-
- err |= kho_preserve_folio(page_folio(fdt_page));
- err |= kho_add_subtree(MEMBLOCK_KHO_FDT, fdt);
-
- if (err) {
- pr_err("failed to prepare memblock FDT for KHO: %d\n", err);
- put_page(fdt_page);
- }
-
- return err;
-}
+static int kstate_preserve_phys(struct kstate_stream *stream, void *obj,
+ const struct kstate_field *field)
+{
+ struct reserve_mem_table *map = obj;
+
+ return kho_preserve_phys(map->start, map->size);
+}
+
+struct kstate_description kstate_reserve_mem = {
+ .name = "reserved_mem",
+ .id = KSTATE_RESERVED_MEM_ID,
+ .fields = (const struct kstate_field[]) {
+ KSTATE_BASE_TYPE(name, struct reserve_mem_table,
+ char[RESERVE_MEM_NAME_SIZE]),
+ KSTATE_BASE_TYPE(start, struct reserve_mem_table, phys_addr_t),
+ KSTATE_BASE_TYPE(size, struct reserve_mem_table, phys_addr_t),
+ {
+ .name = "phys_range",
+ .flags = KS_CUSTOM,
+ .save = kstate_preserve_phys,
+ },
+ KSTATE_END_OF_LIST(),
+ },
+};
static int __init reserve_mem_init(void)
{
int err;
+ int i;
if (!kho_is_enabled() || !reserved_mem_count)
return 0;
- err = prepare_kho_fdt();
- if (err)
- return err;
- return err;
-}
-late_initcall(reserve_mem_init);
-
-static void *__init reserve_mem_kho_retrieve_fdt(void)
-{
- phys_addr_t fdt_phys;
- static void *fdt;
- int err;
-
- if (fdt)
- return fdt;
-
- err = kho_retrieve_subtree(MEMBLOCK_KHO_FDT, &fdt_phys);
- if (err) {
- if (err != -ENOENT)
- pr_warn("failed to retrieve FDT '%s' from KHO: %d\n",
- MEMBLOCK_KHO_FDT, err);
- return NULL;
- }
-
- fdt = phys_to_virt(fdt_phys);
+ for (i = 0; i < reserved_mem_count; i++) {
+ struct reserve_mem_table *map = &reserved_mem_table[i];
- err = fdt_node_check_compatible(fdt, 0, MEMBLOCK_KHO_NODE_COMPATIBLE);
- if (err) {
- pr_warn("FDT '%s' is incompatible with '%s': %d\n",
- MEMBLOCK_KHO_FDT, MEMBLOCK_KHO_NODE_COMPATIBLE, err);
- fdt = NULL;
+ err = kstate_register(&kstate_reserve_mem,
+ map, crc32(~0, map->name, RESERVE_MEM_NAME_SIZE));
+ if (err)
+ goto out;
}
-
- return fdt;
+out:
+ return err;
}
+late_initcall(reserve_mem_init);
static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size,
phys_addr_t align)
{
- int err, len_start, len_size, offset;
- const phys_addr_t *p_start, *p_size;
- const void *fdt;
+ struct reserve_mem_table *map = &reserved_mem_table[reserved_mem_count];
- fdt = reserve_mem_kho_retrieve_fdt();
- if (!fdt)
+ if (kstate_restore(&kstate_reserve_mem, map,
+ crc32(~0, name, RESERVE_MEM_NAME_SIZE)))
return false;
- offset = fdt_subnode_offset(fdt, 0, name);
- if (offset < 0) {
- pr_warn("FDT '%s' has no child '%s': %d\n",
- MEMBLOCK_KHO_FDT, name, offset);
- return false;
- }
- err = fdt_node_check_compatible(fdt, offset, RESERVE_MEM_KHO_NODE_COMPATIBLE);
- if (err) {
- pr_warn("Node '%s' is incompatible with '%s': %d\n",
- name, RESERVE_MEM_KHO_NODE_COMPATIBLE, err);
+ if (map->start & (align - 1)) {
+ pr_warn("KHO reserve-mem '%s' has wrong alignment (%pa, %pa)\n",
+ name, &align, &map->start);
return false;
}
- p_start = fdt_getprop(fdt, offset, "start", &len_start);
- p_size = fdt_getprop(fdt, offset, "size", &len_size);
- if (!p_start || len_start != sizeof(*p_start) || !p_size ||
- len_size != sizeof(*p_size)) {
+ if (map->size != size) {
+ pr_warn("KHO reserve-mem '%s' has wrong size (%pa != %pa)\n",
+ name, &map->size, &size);
return false;
}
- if (*p_start & (align - 1)) {
- pr_warn("KHO reserve-mem '%s' has wrong alignment (0x%lx, 0x%lx)\n",
- name, (long)align, (long)*p_start);
- return false;
- }
-
- if (*p_size != size) {
- pr_warn("KHO reserve-mem '%s' has wrong size (0x%lx != 0x%lx)\n",
- name, (long)*p_size, (long)size);
- return false;
- }
-
- reserved_mem_add(*p_start, size, name);
- pr_info("Revived memory reservation '%s' from KHO\n", name);
-
+ pr_info("Revived memory reservation '%s' %pa %pa from KHO\n",
+ name, &map->start, &map->size);
+ reserved_mem_count++;
return true;
}
#else
--
2.49.1
* [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation
2025-09-09 20:14 [PATCH v3 0/7] KSTATE: [de]serialization framework for KHO Andrey Ryabinin
` (5 preceding siblings ...)
2025-09-09 20:14 ` [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table Andrey Ryabinin
@ 2025-09-09 20:14 ` Andrey Ryabinin
2025-09-10 0:53 ` Bagas Sanjaya
2025-09-10 1:00 ` Randy Dunlap
6 siblings, 2 replies; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-09 20:14 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Andrey Ryabinin
Add KSTATE documentation. Describe 'struct kstate_description' and
how fields are versioned.
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
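The versioning rules described in this document (a field added in version N
is absent from older streams; a deprecated field is still present in streams
up to its last version and must be skipped) can be sketched in userspace as
below. This follows the rules as documented and is not a copy of
restore_kstate(). All names here (struct demo_vfield, struct demo_obj,
DEMO_PLAIN, DEMO_DEPRECATED, demo_restore_versioned()) are hypothetical.

```c
/* Userspace sketch only: all demo_* names are hypothetical, not kernel API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_PLAIN = 0, DEMO_DEPRECATED = 1 };

struct demo_vfield {
	size_t offset;		/* unused for deprecated fields */
	size_t size;
	int version_id;		/* first version with the field, or last one if deprecated */
	int flags;
};

struct demo_obj {
	uint32_t a;
	uint32_t i;
};

/* Version 2 layout: 'a' always present, a u16 existed up to v1, 'i' added in v2. */
static const struct demo_vfield demo_fields[] = {
	{ offsetof(struct demo_obj, a), sizeof(uint32_t), 0, DEMO_PLAIN },
	{ 0, sizeof(uint16_t), 1, DEMO_DEPRECATED },
	{ offsetof(struct demo_obj, i), sizeof(uint32_t), 2, DEMO_PLAIN },
};

#define DEMO_NVFIELDS (sizeof(demo_fields) / sizeof(demo_fields[0]))

/* Restore obj from a stream written at saved_version; returns bytes consumed. */
static size_t demo_restore_versioned(struct demo_obj *obj,
				     const unsigned char *stream,
				     int saved_version)
{
	size_t pos = 0;

	for (size_t n = 0; n < DEMO_NVFIELDS; n++) {
		const struct demo_vfield *f = &demo_fields[n];

		if (f->flags & DEMO_DEPRECATED) {
			if (saved_version <= f->version_id)
				pos += f->size;	/* old stream still carries it: skip */
			continue;
		}
		if (f->version_id > saved_version)
			continue;	/* field is newer than the stream: nothing saved */
		memcpy((unsigned char *)obj + f->offset, stream + pos, f->size);
		pos += f->size;
	}
	return pos;
}
```

A version 1 stream therefore restores 'a' and skips the deprecated u16,
while a version 2 stream restores both 'a' and 'i'.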
Documentation/core-api/index.rst | 1 +
Documentation/core-api/kstate.rst | 117 ++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
3 files changed, 119 insertions(+)
create mode 100644 Documentation/core-api/kstate.rst
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index a8b7d1417f0a..6c0466e0bb35 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -136,6 +136,7 @@ Documents that don't fit elsewhere or which have yet to be categorized.
.. toctree::
:maxdepth: 1
+ kstate
librs
liveupdate
netlink
diff --git a/Documentation/core-api/kstate.rst b/Documentation/core-api/kstate.rst
new file mode 100644
index 000000000000..981ba162109c
--- /dev/null
+++ b/Documentation/core-api/kstate.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+KSTATE: Kernel state preservation framework
+===========================================
+
+KSTATE (kernel state) is a framework for migrating parts of the internal
+kernel state (device drivers, memory, etc.) from one kernel to another across
+a kexec reboot.
+
+kstate_description
+------------------
+
+Most kernel state lives in structs, and a struct can be described by a
+kstate_description. For example:
+
+struct kstate_test_data {
+ int i;
+ unsigned long *p_ulong;
+ char s[10];
+ struct folio *folio;
+};
+
+struct kstate_description test_state = {
+ .name = "test",
+ .version_id = 1,
+ .id = KSTATE_TEST_ID,
+ .fields = (const struct kstate_field[]) {
+ KSTATE_BASE_TYPE(s, struct kstate_test_data, char [10]),
+ KSTATE_POINTER(p_ulong, struct kstate_test_data),
+ KSTATE_FOLIO(folio, struct kstate_test_data),
+ KSTATE_END_OF_LIST()
+ },
+};
+
+Changing data structures
+------------------------
+
+KSTATE saves/restores structs as a series of fields. When kernel structs
+change, the state may need to change too, to store more or different information.
+
+Versions
+--------
+
+Version numbers are intended for major changes that are not backward
+compatible.
+
+Each version is associated with a series of saved fields. The state is always
+saved as the newest version, specified by ->version_id.
+Loading, however, can sometimes accept state saved by an older version.
+
+There are two version fields:
+
+ - version_id: the maximum version_id supported by the kstate_description.
+ - min_version_id: the minimum version_id that the kstate_description is able to understand.
+
+KSTATE is able to read versions from min_version_id to version_id.
+
+Many KSTATE_ macros have _V forms for version-dependent fields, e.g.
+
+ KSTATE_BASE_TYPE_V(i, struct kstate_test_data, int, 2),
+
+only loads that field for versions 2 and newer.
+
+Saving state will always create a section with the 'version_id' value and thus can't
+be loaded by any older kernel.
+
+Removing field
+--------------
+If a field is no longer needed, it can be marked deprecated using the
+KSTATE_*_DEPRECATED macro and bumping ->version_id of the kstate_description:
+
+ KSTATE_BASE_TYPE_DEPRECATED(k, u16, 1),
+
+The last parameter of the macro is the last version number that has this field.
+An old kernel will save such a field, but a new kernel will skip it on load.
+The new kernel will also not save such a field (as there is nothing to save).
+Such a change is not backward compatible.
+
+Adding new field
+----------------
+
+A new field can be added as a version-dependent field by using the _V form of
+the KSTATE_ macro:
+ KSTATE_BASE_TYPE_V(i, struct kstate_test_data, int, 2),
+
+This indicates that only 'test_state' of version 2 and above has the field '->i'.
+If the new kernel sees an incoming 'test_state' of version 1, it will skip restoring '->i'
+as nothing was saved. This is not backward compatible, as an old kernel doesn't
+understand the new V2 'test_state'.
+
+Subsections
+-----------
+Another option is adding a subsection to kstate_description. A subsection is
+an additional kstate_description which is linked to the main one:
+
+struct kstate_description test_state_v2 = {
+ .name = "test_v2",
+ .id = KSTATE_TEST_ID_V2,
+ .fields = (const struct kstate_field[]) {
+ KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
+ KSTATE_END_OF_LIST()
+ },
+};
+
+struct kstate_description test_state = {
+ ......
+ .subsections = (const struct kstate_description *[]){
+ &test_state_v2,
+ NULL
+ },
+};
+
+
+A subsection must have a unique ->id. If the receiving side finds a subsection
+with an unknown id, it will be ignored. This makes subsections suitable for backward
+compatible changes (migrate from N+1 to N kernel), assuming the old kernel is ok without
+the information in the subsection.
diff --git a/MAINTAINERS b/MAINTAINERS
index e96da6d97e75..a9baf49cdbeb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13721,6 +13721,7 @@ F: include/linux/ks0108.h
KSTATE
M: Andrey Ryabinin <ryabinin.a.a@gmail.com>
S: Maintained
+F: Documentation/core-api/kstate.rst
F: include/linux/kstate.h
F: kernel/liveupdate/kstate.c
F: lib/test_kstate.c
--
2.49.1
* Re: [PATCH v3 5/7] kstate, test: add test module for testing kstate subsystem.
2025-09-09 20:14 ` [PATCH v3 5/7] kstate, test: add test module for testing kstate subsystem Andrey Ryabinin
@ 2025-09-10 0:33 ` Randy Dunlap
2025-09-11 17:00 ` Andrey Ryabinin
0 siblings, 1 reply; 18+ messages in thread
From: Randy Dunlap @ 2025-09-10 0:33 UTC (permalink / raw)
To: Andrey Ryabinin, linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack
On 9/9/25 1:14 PM, Andrey Ryabinin wrote:
> diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
> index b6ea861006bf..af9a25bdcd6e 100644
> --- a/kernel/liveupdate/Kconfig
> +++ b/kernel/liveupdate/Kconfig
> @@ -69,6 +69,14 @@ config KSTATE
> state, save it into the memory and restore the state after kexec
> in new kernel.
>
> +config KSTATE_TEST
> + bool "KSTATE test code"
> + help
> + Build a simple test/playground code that is useful for kstate
Build simple
> + subsystem development. It contains some structure with different
structures
?
> + kind of data which migrated across kexec to the new kernel
kinds which are migrated
> + using KSTATE.
> +
--
~Randy
* Re: [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation
2025-09-09 20:14 ` [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation Andrey Ryabinin
@ 2025-09-10 0:53 ` Bagas Sanjaya
2025-09-11 17:07 ` Andrey Ryabinin
2025-09-10 1:00 ` Randy Dunlap
1 sibling, 1 reply; 18+ messages in thread
From: Bagas Sanjaya @ 2025-09-10 0:53 UTC (permalink / raw)
To: Andrey Ryabinin, linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack
On Tue, Sep 09, 2025 at 10:14:42PM +0200, Andrey Ryabinin wrote:
> +There are _V forms of many KSTATE_ macros to load fields for version dependent fields, e.g.
Escape the trailing underscore (i.e. KSTATE\_).
> +Addition of new field can be done as version dependent field by using _V form of
> +KSTATE_ macro:
Ditto.
> +Subsections
> +-----------
> +Another option is adding subsection to kstate_description. A subsection is
> +additional kstate_description which linked to the main one:
> +
> +struct kstate_description test_state_v2 = {
> + .name = "test_v2",
> + .id = KSTATE_TEST_ID_V2,
> + .fields = (const struct kstate_field[]) {
> + KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
> + KSTATE_END_OF_LIST()
> + },
> +};
> +
> +struct kstate_description test_state = {
> + ......
> + .subsections = (const struct kstate_description *[]){
> + &test_state_v2,
> + NULL
> + },
> +};
Sphinx errors out on struct snippets like above:
Documentation/core-api/kstate.rst:17: WARNING: Inline emphasis start-string without end-string. [docutils]
Documentation/core-api/kstate.rst:17: WARNING: Inline emphasis start-string without end-string. [docutils]
Documentation/core-api/kstate.rst:21: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/core-api/kstate.rst:28: ERROR: Unexpected indentation. [docutils]
Documentation/core-api/kstate.rst:32: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Documentation/core-api/kstate.rst:33: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/core-api/kstate.rst:84: ERROR: Unexpected indentation. [docutils]
Documentation/core-api/kstate.rst:100: ERROR: Unexpected indentation. [docutils]
Documentation/core-api/kstate.rst:102: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Documentation/core-api/kstate.rst:103: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/core-api/kstate.rst:106: CRITICAL: Unexpected section title or transition.
...... [docutils]
reStructuredText markup error!
I have to wrap them in literal code blocks:
---- >8 ----
diff --git a/Documentation/core-api/kstate.rst b/Documentation/core-api/kstate.rst
index 981ba162109c34..620d7c126c2038 100644
--- a/Documentation/core-api/kstate.rst
+++ b/Documentation/core-api/kstate.rst
@@ -11,16 +11,16 @@ kstate_description
------------------
Most kernel's state is in structs and structs could be described by
-kstate_description. E.g.
+kstate_description. E.g.::
-struct kstate_test_data {
+ struct kstate_test_data {
int i;
unsigned long *p_ulong;
char s[10];
struct folio *folio;
-};
+ };
-struct kstate_description test_state = {
+ struct kstate_description test_state = {
.name = "test",
.version_id = 1,
.id = KSTATE_TEST_ID,
@@ -30,7 +30,7 @@ struct kstate_description test_state = {
KSTATE_FOLIO(folio, struct kstate_test_data),
KSTATE_END_OF_LIST()
},
-};
+ };
Changing data structures
------------------------
@@ -55,7 +55,7 @@ There are two version fields:
KSTATE is able to read versions from minimum_version_id to version_id.
-There are _V forms of many KSTATE_ macros to load fields for version dependent fields, e.g.
+There are _V forms of many KSTATE_ macros to load fields for version dependent fields, e.g.::
KSTATE_BASE_TYPE_V(i, struct kstate_test_data, int, 2),
@@ -67,7 +67,7 @@ be loaded by any older kernel.
Removing field
--------------
If field is no longer needed it could be marked deprecated using
-KSTATE_*_DEPRECATED macro and bumping ->version_id of kstate_description:
+KSTATE_*_DEPRECATED macro and bumping ->version_id of kstate_description::
KSTATE_BASE_TYPE_DEPRECATED(k, u16, 1),
@@ -80,7 +80,8 @@ Adding new field
----------------
Addition of new field can be done as version dependent field by using _V form of
-KSTATE_ macro:
+KSTATE_ macro::
+
KSTATE_BASE_TYPE_V(i, struct kstate_test_data, int, 2),
This indicates that 'test_state' only from version 2 and above have field '->i'.
@@ -91,24 +92,24 @@ understand the new V2 'test_state'.
Subsections
-----------
Another option is adding subsection to kstate_description. A subsection is
-additional kstate_description which linked to the main one:
+additional kstate_description which linked to the main one::
-struct kstate_description test_state_v2 = {
+ struct kstate_description test_state_v2 = {
.name = "test_v2",
.id = KSTATE_TEST_ID_V2,
.fields = (const struct kstate_field[]) {
KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
KSTATE_END_OF_LIST()
},
-};
+ };
-struct kstate_description test_state = {
+ struct kstate_description test_state = {
......
.subsections = (const struct kstate_description *[]){
&test_state_v2,
NULL
},
-};
+ };
Subsection must have a unique ->id. If the receiving side finds a subsection
Thanks.
--
An old man doll... just what I always wanted! - Clara
* Re: [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation
2025-09-09 20:14 ` [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation Andrey Ryabinin
2025-09-10 0:53 ` Bagas Sanjaya
@ 2025-09-10 1:00 ` Randy Dunlap
1 sibling, 0 replies; 18+ messages in thread
From: Randy Dunlap @ 2025-09-10 1:00 UTC (permalink / raw)
To: Andrey Ryabinin, linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack
On 9/9/25 1:14 PM, Andrey Ryabinin wrote:
> Add KSTATE doc. Describe 'struct kstate_description' and information
> about versioning fields.
>
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
> Documentation/core-api/index.rst | 1 +
> Documentation/core-api/kstate.rst | 117 ++++++++++++++++++++++++++++++
> MAINTAINERS | 1 +
> 3 files changed, 119 insertions(+)
> create mode 100644 Documentation/core-api/kstate.rst
>
> diff --git a/Documentation/core-api/kstate.rst b/Documentation/core-api/kstate.rst
> new file mode 100644
> index 000000000000..981ba162109c
> --- /dev/null
> +++ b/Documentation/core-api/kstate.rst
> @@ -0,0 +1,117 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +KSTATE: Kernel state preservation framework
> +===========================================
> +
> +KSTATE (kernel state) is framework to migrate some part of the internal
> +kernel state (device driver, memory, etc) from one kernel to another across
> +kexec reboot.
> +
> +kstate_description
> +------------------
> +
> +Most kernel's state is in structs and structs could be described by
Most kernel state
> +kstate_description. E.g. <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> +
> +struct kstate_test_data {
> + int i;
> + unsigned long *p_ulong;
> + char s[10];
> + struct folio *folio;
> +};
> +
> +struct kstate_description test_state = {
> + .name = "test",
> + .version_id = 1,
> + .id = KSTATE_TEST_ID,
> + .fields = (const struct kstate_field[]) {
> + KSTATE_BASE_TYPE(s, struct kstate_test_data, char [10]),
> + KSTATE_POINTER(p_ulong, struct kstate_test_data),
> + KSTATE_FOLIO(folio, struct kstate_test_data),
> + KSTATE_END_OF_LIST()
> + },
> +};
> +
> +Changing data structures
> +------------------------
> +
> +KSTATE saves/restores structs as a series of fields. When the kernel structs
> +are changed we may need to change the state to store more/different information.
> +
> +Versions
> +--------
> +
> +Version numbers are intended for major incompatible changes, that are not
no comma
Drop "incompatible" since that is implied in the rest of the sentence.
> +backward compatible.
> +
> +Each version is associated with a series of fields saved. The state is always
> +saved as the newest version specified by ->version_id.
> +But loading state sometimes is able to load state from an older version.
> +
> +There are two version fields:
> +
> + - version_id: the maximum version_id supported by kstate_description.
> + - min_version_id: the minimum version_id that given kstate_description is able to understand.
> +
> +KSTATE is able to read versions from minimum_version_id to version_id.
> +
> +There are _V forms of many KSTATE_ macros to load fields for version dependent fields, e.g.
> +
> + KSTATE_BASE_TYPE_V(i, struct kstate_test_data, int, 2),
> +
> +only loads that field for versions 2 and newer.
> +
> +Saving state will always create a section with the ‘version_id’ value and thus can’t
> +be loaded by any older kernel.
> +
> +Removing field
> +--------------
> +If field is no longer needed it could be marked deprecated using
If a field
> +KSTATE_*_DEPRECATED macro and bumping ->version_id of kstate_description:
> +
> + KSTATE_BASE_TYPE_DEPRECATED(k, u16, 1),
> +
> +The last parameter of the macro is the last version number that have this field.
has
> +Old kernel will save such field, but new kernel will skip it on load. Also
An old kernel a new kernel
> +the new kernel will not save such field (as there is nothing to save).
> +Such change is not backward compatible.
> +
> +Adding new field
> +----------------
> +
> +Addition of new field can be done as version dependent field by using _V form of
of a new field as a version-dependent field by using the _V form of
> +KSTATE_ macro:
> + KSTATE_BASE_TYPE_V(i, struct kstate_test_data, int, 2),
> +
> +This indicates that 'test_state' only from version 2 and above have field '->i'.
> +If new kernel sees incoming 'test_state' of version 1 it will skip restoring '->i'
If a new kernel
or
If the new kernel
> +as nothing was saved. This is not backward compatible, as old kernel doesn't
as an old kernel doesn't
> +understand the new V2 'test_state'.
> +
> +Subsections
> +-----------
> +Another option is adding subsection to kstate_description. A subsection is
adding a subsection
> +additional kstate_description which linked to the main one:
an additional which is linked to the main one:
> +
> +struct kstate_description test_state_v2 = {
> + .name = "test_v2",
> + .id = KSTATE_TEST_ID_V2,
> + .fields = (const struct kstate_field[]) {
> + KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
> + KSTATE_END_OF_LIST()
> + },
> +};
> +
> +struct kstate_description test_state = {
> + ......
> + .subsections = (const struct kstate_description *[]){
> + &test_state_v2,
> + NULL
> + },
> +};
> +
> +
> +Subsection must have a unique ->id. If the receiving side finds a subsection
A subsection
> +with unknown id it will be ignored. This make subsections suitable for backward
> +compatible changes (migrate from N+1 to N kernel) assuming old kernel is ok without
assuming the old kernel
> +information in subsection.
in the subsection.
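
To make the "unknown id is ignored" behaviour easier to picture, here is a small user-space sketch. It is not KSTATE code: the record header `struct toy_rec` (an id and a payload length) and the function `toy_walk()` are assumptions made up for illustration, and the real stream format may differ. The point is only that a receiver can step over a record it does not recognise as long as the record carries its own length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; 'len' payload bytes follow it directly. */
struct toy_rec {
	uint32_t id;
	uint32_t len;
};

/*
 * Walk a stream of records. Records whose id is listed in 'known' are
 * counted as handled; all others are skipped using their length.
 * Returns the number of handled records, or -1 if the stream is truncated.
 */
int toy_walk(const unsigned char *buf, size_t size,
	     const uint32_t *known, size_t nknown)
{
	size_t pos = 0;
	int handled = 0;

	while (pos < size) {
		struct toy_rec hdr;
		size_t i;

		if (size - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr));
		pos += sizeof(hdr);

		if (hdr.len > size - pos)
			return -1;

		for (i = 0; i < nknown; i++) {
			if (known[i] == hdr.id) {
				handled++;
				break;
			}
		}

		/* Advance past the payload whether or not the id was known. */
		pos += hdr.len;
	}

	return handled;
}
```

In this model an "old kernel" that knows only id 1 still walks a stream containing ids 1 and 2 without error; it simply never looks at the second payload.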
--
~Randy
* Re: [PATCH v3 4/7] kho: replace KHO FDT with kstate metadata
2025-09-09 20:14 ` [PATCH v3 4/7] kho: replace KHO FDT with kstate metadata Andrey Ryabinin
@ 2025-09-10 16:50 ` Rob Herring
2025-09-11 16:54 ` Andrey Ryabinin
0 siblings, 1 reply; 18+ messages in thread
From: Rob Herring @ 2025-09-10 16:50 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: linux-kernel, Alexander Graf, Mike Rapoport, James Gowans,
Andrew Morton, linux-mm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Baoquan He,
kexec, Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin,
David Rientjes, Pratyush Yadav, Changyuan Lyu, Jonathan Corbet,
linux-doc, Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack, Saravana Kannan, devicetree
On Tue, Sep 09, 2025 at 10:14:39PM +0200, Andrey Ryabinin wrote:
> Store KSTATE physical address & size instead of FDT in kho_data.
> and initialize KSTATE from kho_populate().
> To be able to use FDT in parallel with KSTATE place FDT
> address&size into 'struct kho_fdt' and save/restore it using KSTATE.
>
> This finishes wiring KSTATE with KHO, making it fully functional.
>
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
> arch/x86/include/uapi/asm/setup_data.h | 4 +-
> arch/x86/kernel/kexec-bzimage64.c | 6 +--
> arch/x86/kernel/setup.c | 3 +-
> drivers/of/fdt.c | 6 +--
> include/linux/kexec.h | 2 +-
> include/linux/kstate.h | 1 +
> kernel/liveupdate/kexec_handover.c | 60 +++++++++++++++++++++-----
> 7 files changed, 61 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/setup_data.h b/arch/x86/include/uapi/asm/setup_data.h
> index 2671c4e1b3a0..844f5b93473f 100644
> --- a/arch/x86/include/uapi/asm/setup_data.h
> +++ b/arch/x86/include/uapi/asm/setup_data.h
> @@ -83,8 +83,8 @@ struct ima_setup_data {
> * Locations of kexec handover metadata
> */
> struct kho_data {
> - __u64 fdt_addr;
> - __u64 fdt_size;
> + __u64 kstate_addr;
> + __u64 kstate_size;
> __u64 scratch_addr;
> __u64 scratch_size;
> } __attribute__((packed));
Isn't this part of the ABI, which you can't just change?
Rob
* Re: [PATCH v3 4/7] kho: replace KHO FDT with kstate metadata
2025-09-10 16:50 ` Rob Herring
@ 2025-09-11 16:54 ` Andrey Ryabinin
0 siblings, 0 replies; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-11 16:54 UTC (permalink / raw)
To: Rob Herring, Andrey Ryabinin
Cc: linux-kernel, Alexander Graf, Mike Rapoport, James Gowans,
Andrew Morton, linux-mm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Baoquan He,
kexec, Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin,
David Rientjes, Pratyush Yadav, Changyuan Lyu, Jonathan Corbet,
linux-doc, Chris Li, Ashish.Kalra, William Tu, David Matlack,
Saravana Kannan, devicetree
On 9/10/25 6:50 PM, Rob Herring wrote:
> On Tue, Sep 09, 2025 at 10:14:39PM +0200, Andrey Ryabinin wrote:
>> Store KSTATE physical address & size instead of FDT in kho_data.
>> and initialize KSTATE from kho_populate().
>> To be able to use FDT in parallel with KSTATE place FDT
>> address&size into 'struct kho_fdt' and save/restore it using KSTATE.
>>
>> This finishes wiring KSTATE with KHO, making it fully functional.
>>
>> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>> ---
>> arch/x86/include/uapi/asm/setup_data.h | 4 +-
>> arch/x86/kernel/kexec-bzimage64.c | 6 +--
>> arch/x86/kernel/setup.c | 3 +-
>> drivers/of/fdt.c | 6 +--
>> include/linux/kexec.h | 2 +-
>> include/linux/kstate.h | 1 +
>> kernel/liveupdate/kexec_handover.c | 60 +++++++++++++++++++++-----
>> 7 files changed, 61 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/setup_data.h b/arch/x86/include/uapi/asm/setup_data.h
>> index 2671c4e1b3a0..844f5b93473f 100644
>> --- a/arch/x86/include/uapi/asm/setup_data.h
>> +++ b/arch/x86/include/uapi/asm/setup_data.h
>> @@ -83,8 +83,8 @@ struct ima_setup_data {
>> * Locations of kexec handover metadata
>> */
>> struct kho_data {
>> - __u64 fdt_addr;
>> - __u64 fdt_size;
>> + __u64 kstate_addr;
>> + __u64 kstate_size;
>> __u64 scratch_addr;
>> __u64 scratch_size;
>> } __attribute__((packed));
>
> Isn't this part of the ABI which you can't just change.
>
> Rob
Yes, it is part of the ABI, and the layout of this struct can't be changed.
But this change only affects the semantics of the fields; the layout stays the same,
so this should be ok.
If we kexec from an old kernel and kho_data contains an FDT address,
the initialization of KSTATE will just fail and boot should continue as
if there was no kho_data at all. At the current stage of KHO development,
without real users, a change like this should be fine.
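
The "layout stays the same" part can be checked mechanically. Below is a user-space sketch, not part of the series: `struct kho_data_old` and `struct kho_data_new` are hypothetical names for the before/after definitions from the hunk above, and `uint64_t` stands in for `__u64`. It only demonstrates that the rename changes neither the size nor any field offset; it says nothing about the semantic change:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout before the patch (names taken from the quoted hunk). */
struct kho_data_old {
	uint64_t fdt_addr;
	uint64_t fdt_size;
	uint64_t scratch_addr;
	uint64_t scratch_size;
} __attribute__((packed));

/* Field layout after the patch: same types and order, two fields renamed. */
struct kho_data_new {
	uint64_t kstate_addr;
	uint64_t kstate_size;
	uint64_t scratch_addr;
	uint64_t scratch_size;
} __attribute__((packed));

/* Compile-time checks: the build fails if the layouts ever diverge. */
static_assert(sizeof(struct kho_data_old) == sizeof(struct kho_data_new),
	      "kho_data size changed");
static_assert(offsetof(struct kho_data_old, fdt_addr) ==
	      offsetof(struct kho_data_new, kstate_addr),
	      "first field moved");
static_assert(offsetof(struct kho_data_old, fdt_size) ==
	      offsetof(struct kho_data_new, kstate_size),
	      "second field moved");
static_assert(offsetof(struct kho_data_old, scratch_addr) ==
	      offsetof(struct kho_data_new, scratch_addr),
	      "scratch_addr moved");
static_assert(offsetof(struct kho_data_old, scratch_size) ==
	      offsetof(struct kho_data_new, scratch_size),
	      "scratch_size moved");
```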
* Re: [PATCH v3 5/7] kstate, test: add test module for testing kstate subsystem.
2025-09-10 0:33 ` Randy Dunlap
@ 2025-09-11 17:00 ` Andrey Ryabinin
0 siblings, 0 replies; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-11 17:00 UTC (permalink / raw)
To: Randy Dunlap, Andrey Ryabinin, linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Chris Li, Ashish.Kalra, William Tu, David Matlack
On 9/10/25 2:33 AM, Randy Dunlap wrote:
>
>
> On 9/9/25 1:14 PM, Andrey Ryabinin wrote:
>> diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
>> index b6ea861006bf..af9a25bdcd6e 100644
>> --- a/kernel/liveupdate/Kconfig
>> +++ b/kernel/liveupdate/Kconfig
>> @@ -69,6 +69,14 @@ config KSTATE
>> state, save it into the memory and restore the state after kexec
>> in new kernel.
>>
>> +config KSTATE_TEST
>> + bool "KSTATE test code"
>> + help
>> + Build a simple test/playground code that is useful for kstate
>
> Build simple
>
>> + subsystem development. It contains some structure with different
>
> structures
> ?
>
>> + kind of data which migrated across kexec to the new kernel
>
> kinds which are migrated
>
>> + using KSTATE.
>> +
>
Thanks, Randy. I'll fix this up, and the documentation as well.
* Re: [PATCH v3 7/7] Documentation, kstate: Add KSTATE documentation
2025-09-10 0:53 ` Bagas Sanjaya
@ 2025-09-11 17:07 ` Andrey Ryabinin
0 siblings, 0 replies; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-11 17:07 UTC (permalink / raw)
To: Bagas Sanjaya, Andrey Ryabinin, linux-kernel
Cc: Alexander Graf, Mike Rapoport, James Gowans, Andrew Morton,
linux-mm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Baoquan He, kexec,
Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Chris Li, Ashish.Kalra, William Tu, David Matlack
On 9/10/25 2:53 AM, Bagas Sanjaya wrote:
> On Tue, Sep 09, 2025 at 10:14:42PM +0200, Andrey Ryabinin wrote:
>> +There are _V forms of many KSTATE_ macros to load fields for version dependent fields, e.g.
>
> Escape the trailing underscore (i.e. KSTATE\_).
>
>> +Addition of new field can be done as version dependent field by using _V form of
>> +KSTATE_ macro:
>
> Ditto.
>
>> +Subsections
>> +-----------
>> +Another option is adding subsection to kstate_description. A subsection is
>> +additional kstate_description which linked to the main one:
>> +
>> +struct kstate_description test_state_v2 = {
>> + .name = "test_v2",
>> + .id = KSTATE_TEST_ID_V2,
>> + .fields = (const struct kstate_field[]) {
>> + KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
>> + KSTATE_END_OF_LIST()
>> + },
>> +};
>> +
>> +struct kstate_description test_state = {
>> + ......
>> + .subsections = (const struct kstate_description *[]){
>> + &test_state_v2,
>> + NULL
>> + },
>> +};
>
> Sphinx errors out on struct snippets like above:
>
> Documentation/core-api/kstate.rst:17: WARNING: Inline emphasis start-string without end-string. [docutils]
> Documentation/core-api/kstate.rst:17: WARNING: Inline emphasis start-string without end-string. [docutils]
> Documentation/core-api/kstate.rst:21: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/core-api/kstate.rst:28: ERROR: Unexpected indentation. [docutils]
> Documentation/core-api/kstate.rst:32: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
> Documentation/core-api/kstate.rst:33: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/core-api/kstate.rst:84: ERROR: Unexpected indentation. [docutils]
> Documentation/core-api/kstate.rst:100: ERROR: Unexpected indentation. [docutils]
> Documentation/core-api/kstate.rst:102: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
> Documentation/core-api/kstate.rst:103: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/core-api/kstate.rst:106: CRITICAL: Unexpected section title or transition.
>
> ...... [docutils]
>
> reStructuredText markup error!
>
> I have to wrap them in literal code blocks:
>
Thanks, I will make sure to check that the documentation builds next time.
* Re: [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table
2025-09-09 20:14 ` [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table Andrey Ryabinin
@ 2025-09-15 11:47 ` Jason Gunthorpe
2025-09-18 19:00 ` Andrey Ryabinin
0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 11:47 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: linux-kernel, Alexander Graf, Mike Rapoport, James Gowans,
Andrew Morton, linux-mm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Baoquan He,
kexec, Pratyush Yadav, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Andrey Ryabinin, Chris Li, Ashish.Kalra, William Tu,
David Matlack
On Tue, Sep 09, 2025 at 10:14:41PM +0200, Andrey Ryabinin wrote:
> +static int kstate_preserve_phys(struct kstate_stream *stream, void *obj,
> + const struct kstate_field *field)
> +{
> + struct reserve_mem_table *map = obj;
> +
> + return kho_preserve_phys(map->start, map->size);
> +}
> +
> +struct kstate_description kstate_reserve_mem = {
> + .name = "reserved_mem",
> + .id = KSTATE_RESERVED_MEM_ID,
> + .fields = (const struct kstate_field[]) {
> + KSTATE_BASE_TYPE(name, struct reserve_mem_table,
> + char[RESERVE_MEM_NAME_SIZE]),
> + KSTATE_BASE_TYPE(start, struct reserve_mem_table, phys_addr_t),
> + KSTATE_BASE_TYPE(size, struct reserve_mem_table, phys_addr_t),
> + {
> + .name = "phys_range",
> + .flags = KS_CUSTOM,
> + .save = kstate_preserve_phys,
> + },
> + KSTATE_END_OF_LIST(),
> + },
> +};
>
> static int __init reserve_mem_init(void)
> {
> int err;
> + int i;
>
> if (!kho_is_enabled() || !reserved_mem_count)
> return 0;
>
> + for (i = 0; i < reserved_mem_count; i++) {
> + struct reserve_mem_table *map = &reserved_mem_table[i];
>
> + err = kstate_register(&kstate_reserve_mem,
> + map, crc32(~0, map->name, RESERVE_MEM_NAME_SIZE));
> + if (err)
> + goto out;
> }
As I've said to the other proposals, this doesn't seem to be bringing
that much value compared to just using a normal struct:
	for (i = 0; i < reserved_mem_count; i++) {
		struct reserve_mem_table *map = &reserved_mem_table[i];
		struct khoser_reserve_mem_table abi_map = {.name = map->name, .start = map->start, .size = map->size};

		err = kho_preserve_phys(map->start, map->size);
		if (err)
			return err; // Should unwind the other preservations!

		luo_preserve_key(luo_obj, map->name, &abi_map, sizeof(abi_map), VERSION_0);
	}
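
The unwind mentioned in the comment above could follow the usual goto pattern. This is a user-space sketch under assumptions: `preserve_range()` and `unpreserve_range()` are hypothetical stand-ins (the first models kho_preserve_phys(), the second models whatever inverse operation is available), and a zero-sized range is used only to simulate a failure:

```c
#include <assert.h>
#include <stddef.h>

struct range {
	unsigned long start;
	unsigned long size;
	int preserved;
};

/* Hypothetical stand-in for a preserve call; fails on an empty range. */
static int preserve_range(struct range *r)
{
	if (r->size == 0)
		return -1;
	r->preserved = 1;
	return 0;
}

/* Hypothetical inverse of preserve_range(). */
static void unpreserve_range(struct range *r)
{
	r->preserved = 0;
}

/* Preserve every entry, or none: on failure, undo the earlier ones. */
int preserve_all(struct range *tbl, size_t n)
{
	size_t i;
	int err = 0;

	for (i = 0; i < n; i++) {
		err = preserve_range(&tbl[i]);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	/* Entries 0..i-1 succeeded; release them in reverse order. */
	while (i--)
		unpreserve_range(&tbl[i]);
	return err;
}
```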
Jason
* Re: [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table
2025-09-15 11:47 ` Jason Gunthorpe
@ 2025-09-18 19:00 ` Andrey Ryabinin
2025-09-18 23:14 ` Jason Gunthorpe
0 siblings, 1 reply; 18+ messages in thread
From: Andrey Ryabinin @ 2025-09-18 19:00 UTC (permalink / raw)
To: Jason Gunthorpe, Andrey Ryabinin
Cc: linux-kernel, Alexander Graf, Mike Rapoport, James Gowans,
Andrew Morton, linux-mm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Baoquan He,
kexec, Pratyush Yadav, Pasha Tatashin, David Rientjes,
Pratyush Yadav, Changyuan Lyu, Jonathan Corbet, linux-doc,
Chris Li, Ashish.Kalra, William Tu, David Matlack
On 9/15/25 1:47 PM, Jason Gunthorpe wrote:
> On Tue, Sep 09, 2025 at 10:14:41PM +0200, Andrey Ryabinin wrote:
>> +static int kstate_preserve_phys(struct kstate_stream *stream, void *obj,
>> + const struct kstate_field *field)
>> +{
>> + struct reserve_mem_table *map = obj;
>> +
>> + return kho_preserve_phys(map->start, map->size);
>> +}
>> +
>> +struct kstate_description kstate_reserve_mem = {
>> + .name = "reserved_mem",
>> + .id = KSTATE_RESERVED_MEM_ID,
>> + .fields = (const struct kstate_field[]) {
>> + KSTATE_BASE_TYPE(name, struct reserve_mem_table,
>> + char[RESERVE_MEM_NAME_SIZE]),
>> + KSTATE_BASE_TYPE(start, struct reserve_mem_table, phys_addr_t),
>> + KSTATE_BASE_TYPE(size, struct reserve_mem_table, phys_addr_t),
>> + {
>> + .name = "phys_range",
>> + .flags = KS_CUSTOM,
>> + .save = kstate_preserve_phys,
>> + },
>> + KSTATE_END_OF_LIST(),
>> + },
>> +};
>>
>> static int __init reserve_mem_init(void)
>> {
>> int err;
>> + int i;
>>
>> if (!kho_is_enabled() || !reserved_mem_count)
>> return 0;
>>
>> + for (i = 0; i < reserved_mem_count; i++) {
>> + struct reserve_mem_table *map = &reserved_mem_table[i];
>>
>> + err = kstate_register(&kstate_reserve_mem,
>> + map, crc32(~0, map->name, RESERVE_MEM_NAME_SIZE));
>> + if (err)
>> + goto out;
>> }
>
> As I've said to the other proposals, this doesn't seem to be bringing
> that much value compared to just using a normal struct:
We expect to have many such ABI maps across the kernel.
These maps will share common elements - simple types, folios, and preserved
regions.
With the approach you're suggesting, we'd need to re-implement the same
preserve/unpreserve/recover logic, error handling, and unwind code for
every individual ABI map. That quickly becomes repetitive and error-prone.
By contrast, KSTATE centralizes this logic. It avoids duplicating code
and lets us express the preservation details declaratively instead
of re-implementing them per struct.
> for (i = 0; i < reserved_mem_count; i++) {
> struct reserve_mem_table *map = &reserved_mem_table[i];
> struct khoser_reserve_mem_table abi_map = {.name = map->name, .start = map->start, .size = map->size};
>
> err = kho_preserve_phys(map->start, map->size);
> if (err)
> return err; // Should unwind the other preservations!
>
> luo_preserve_key(luo_obj, map->name, &abi_map, sizeof(abi_map), VERSION_0);
On the versioning side:
With this approach, introducing a new ABI version (say, abi_map_v1)
would require us to maintain restore logic for each supported version,
and carefully handle upgrades between them.
With KSTATE, versioning is built in. For example, adding a new field can
simply be expressed as:
KSTATE_BASE_TYPE_V(new_field, struct reserve_mem_table, int, 1);
This way, the framework handles compatibility, and we don’t need to manually
write version-specific restore paths for each ABI map.
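
As a rough illustration of what a version-dependent field means here, a user-space toy model could look like the following. It is not the KSTATE implementation: `struct toy_field`, `TOY_FIELD_V()`, `struct toy_state` and `toy_load()` are made-up names, and the model assumes that fields are stored back to back in declaration order:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a declarative field table (hypothetical format). */
struct toy_field {
	size_t offset;	/* offset of the member in the target struct */
	size_t size;	/* size of the member in bytes */
	int since;	/* first stream version that contains the member */
};

#define TOY_FIELD_V(member, type, v) \
	{ offsetof(type, member), sizeof(((type *)0)->member), (v) }

struct toy_state {
	unsigned long start;
	unsigned long size;
	int new_field;		/* added in version 1 */
};

static const struct toy_field toy_fields[] = {
	TOY_FIELD_V(start, struct toy_state, 0),
	TOY_FIELD_V(size, struct toy_state, 0),
	TOY_FIELD_V(new_field, struct toy_state, 1),
};

/*
 * Copy the fields present in a stream of version 'ver' from 'buf' into
 * 'obj'. Fields newer than 'ver' are not in the stream and are left
 * untouched. Returns the number of bytes consumed.
 */
size_t toy_load(struct toy_state *obj, const unsigned char *buf, int ver)
{
	size_t pos = 0;
	size_t i;

	for (i = 0; i < sizeof(toy_fields) / sizeof(toy_fields[0]); i++) {
		const struct toy_field *f = &toy_fields[i];

		if (f->since > ver)
			continue;
		memcpy((unsigned char *)obj + f->offset, buf + pos, f->size);
		pos += f->size;
	}

	return pos;
}
```

Loading a version-0 stream leaves `new_field` at whatever value the caller initialized it to, so in this model the caller still decides what the field should hold when nothing was saved.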
* Re: [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table
2025-09-18 19:00 ` Andrey Ryabinin
@ 2025-09-18 23:14 ` Jason Gunthorpe
0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2025-09-18 23:14 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Andrey Ryabinin, linux-kernel, Alexander Graf, Mike Rapoport,
James Gowans, Andrew Morton, linux-mm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Baoquan He, kexec, Pratyush Yadav, Pasha Tatashin,
David Rientjes, Pratyush Yadav, Changyuan Lyu, Jonathan Corbet,
linux-doc, Chris Li, Ashish.Kalra, William Tu, David Matlack
On Thu, Sep 18, 2025 at 09:00:31PM +0200, Andrey Ryabinin wrote:
> By contrast, KSTATE centralizes this logic. It avoids duplicating code
> and lets us express the preservation details declaratively instead
> of re-implementing them per struct.
I didn't really see it centralize much of anything, it is just a long
way to spell "memcpy" the way it is being shown here.
I'm all for consolidating, but please do actually show some
consolidation.
> On the versioning side:
> With this approach, introducing a new ABI version (say, abi_map_v1)
> would require us to maintain restore logic for each supported version,
> and carefully handle upgrades between them.
Yes, you MUST do this. It cannot be magically avoided.
> With KSTATE, versioning is built in. For example, adding a new field can
> simply be expressed as:
No, it isn't. The code still has to process versions and still has to
understand what to do when the unpacked struct didn't have its fields
written.
If anything it is making it more obfuscated and complicated to tell if
the compatibility is done correctly or not.
Jason