* [PATCH v1 0/3] Make KHO Stateless
@ 2025-10-01 1:19 Jason Miu
From: Jason Miu @ 2025-10-01 1:19 UTC
To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
David Matlack, David Rientjes, Jason Gunthorpe, Jason Miu,
Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec,
linux-kernel, linux-mm

This series transitions KHO from an xarray-based metadata tracking system
with serialization to using a radix tree data structure that can be
passed directly to the next kernel.

The key motivations for this change are to:
- Eliminate the need for data serialization before kexec.
- Remove the former KHO state machine by deprecating the finalize
  and abort states.
- Pass preservation metadata more directly to the next kernel via the FDT.

The new approach uses a radix tree to mark preserved pages. A page's
physical address and its order are encoded into a single value. The tree
is composed of multiple levels of page-sized tables, with leaf nodes
being bitmap tables where each set bit represents a preserved page. The
physical address of the radix tree's root is passed in the FDT, allowing
the next kernel to reconstruct the preserved memory map.
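
As a rough illustration of the encoding described above, here is a
user-space sketch for a 64-bit value with 4 KiB pages. The `sketch_*`
names and the fixed constants are assumptions made for this example, not
kernel interfaces (the patch's own helpers are `kho_radix_encode()` and
`kho_radix_decode()`), and `__builtin_clzll` is a GCC/Clang builtin
standing in for the kernel's `fls64()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 64-bit unsigned long, 4 KiB pages. */
#define SKETCH_BITS_PER_LONG 64
#define SKETCH_PAGE_SHIFT    12
/* Width of the order-0 page offset: 64 - 12 = 52 bits. */
#define SKETCH_OFFSET_BITS   (SKETCH_BITS_PER_LONG - SKETCH_PAGE_SHIFT)

/* 1-based position of the highest set bit, 0 if x == 0 (like fls64()). */
static unsigned int sketch_fls64(uint64_t x)
{
	return x ? 64u - (unsigned int)__builtin_clzll(x) : 0u;
}

/*
 * Encode a physical address and order into one value: a single
 * "order bit" at position (52 - order), with the order-normalized
 * page offset in the bits below it.
 */
static uint64_t sketch_encode(uint64_t pa, unsigned int order)
{
	/* Keep both shifts below 64 to avoid undefined behaviour. */
	assert(order < SKETCH_OFFSET_BITS);

	uint64_t order_bit = 1ULL << (SKETCH_OFFSET_BITS - order);
	uint64_t offset = pa >> (SKETCH_PAGE_SHIFT + order);

	return order_bit | offset;
}

/* Recover the physical address and order from an encoded value. */
static uint64_t sketch_decode(uint64_t encoded, unsigned int *order)
{
	unsigned int top = sketch_fls64(encoded);

	/* The order bit lives at positions 1..52, so fls is 2..53. */
	assert(top >= 2 && top <= SKETCH_OFFSET_BITS + 1);

	*order = SKETCH_OFFSET_BITS + 1 - top;
	/* The left shift pushes the order bit out of the 64-bit value. */
	return encoded << (SKETCH_PAGE_SHIFT + *order);
}
```

For example, a 2 MiB page at physical address 0x200000 (order 9) encodes
to bit 43 set plus a page offset of 1, and decoding that value returns
the original address and order.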

The series includes the following changes:

1. kho: Adopt KHO radix tree data structures: Replaces the xarray-based
   tracker with the new radix tree implementation and removes the
   serialization/finalization code, thereby eliminating the KHO finalize
   and abort states.

2. memblock: Remove KHO notifier usage: Decouples the memblock subsystem
   from the KHO notifier system, switching it to use direct KHO API calls
   and adjusting KHO FDT completion timing.

3. kho: Remove notifier system infrastructure: Removes the now-unused
   notifier infrastructure from the KHO core.

Jason Miu (3):
  kho: Adopt KHO radix tree data structures
  memblock: Remove KHO notifier usage
  kho: Remove notifier system infrastructure

 include/linux/kexec_handover.h |  44 +-
 kernel/kexec_handover.c        | 788 +++++++++++++++------------------
 mm/memblock.c                  |  45 +-
 3 files changed, 362 insertions(+), 515 deletions(-)
--
2.51.0.618.g983fd99d29-goog

* [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
From: Jason Miu @ 2025-10-01 1:19 UTC
To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
    David Matlack, David Rientjes, Jason Gunthorpe, Jason Miu,
    Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec,
    linux-kernel, linux-mm

Introduce a radix tree data structure for tracking preserved memory pages
in KHO, which will replace the current xarray-based implementation.

The primary motivation for this change is to eliminate the need for
serialization. By marking preserved pages directly in the new KHO radix
tree and passing the tree to the next kernel, the entire serialization
process can be removed. This ultimately allows for the removal of the KHO
finalize and abort states, simplifying the overall design.

The physical address of a preserved page and its order are encoded into a
single value. The KHO radix tree has multiple levels of nodes, where each
node is a table containing descriptors to the next level of nodes. The
encoded value is split, and its parts index the nodes along the tree
traversal. The traversal ends at the `kho_bitmap_table`, where each bit
represents a single preserved page.

Instead of serializing the memory map, the first kernel stores the KHO
radix tree root in the FDT. This root is passed to the second kernel
after kexec, which eliminates the KHO finalize and abort states.

The second kernel walks the passed-in KHO radix tree from its root. It
restores the memory pages and their orders by decoding the value stored
in the KHO radix tree.

This architectural shift to using a shared radix tree structure simplifies the KHO design and eliminates the overhead of serializing and deserializing the preserved memory map. Signed-off-by: Jason Miu <jasonmiu@google.com> --- include/linux/kexec_handover.h | 17 - kernel/kexec_handover.c | 729 +++++++++++++++------------------ 2 files changed, 322 insertions(+), 424 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 348844cffb13..c8229cb11f4b 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -19,23 +19,6 @@ enum kho_event { struct folio; struct notifier_block; -#define DECLARE_KHOSER_PTR(name, type) \ - union { \ - phys_addr_t phys; \ - type ptr; \ - } name -#define KHOSER_STORE_PTR(dest, val) \ - ({ \ - typeof(val) v = val; \ - typecheck(typeof((dest).ptr), v); \ - (dest).phys = virt_to_phys(v); \ - }) -#define KHOSER_LOAD_PTR(src) \ - ({ \ - typeof(src) s = src; \ - (typeof((s).ptr))((s).phys ? phys_to_virt((s).phys) : NULL); \ - }) - struct kho_serialization; #ifdef CONFIG_KEXEC_HANDOVER diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c index ecd1ac210dbd..34cf0ce4f359 100644 --- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -18,6 +18,7 @@ #include <linux/memblock.h> #include <linux/notifier.h> #include <linux/page-isolation.h> +#include <linux/rwsem.h> #include <asm/early_ioremap.h> @@ -29,7 +30,7 @@ #include "kexec_internal.h" #define KHO_FDT_COMPATIBLE "kho-v1" -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map" +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree" #define PROP_SUB_FDT "fdt" static bool kho_enable __ro_after_init; @@ -46,143 +47,306 @@ static int __init kho_parse_enable(char *p) } early_param("kho", kho_parse_enable); +typedef int (*kho_radix_tree_walk_callback_t)(unsigned long encoded); + /* - * Keep track of memory that is to be preserved across KHO. + * The KHO radix tree tracks preserved memory pages. 
It is a hierarchical + * structure that starts with a single root `kho_radix_tree`. This single + * tree stores pages of all orders. + * + * This is achieved by encoding the page's physical address and its order into + * a single `unsigned long` value. This encoded value is then used to traverse + * the tree. + * + * The tree hierarchy is shown below: + * + * kho_radix_tree_root + * +-------------------+ + * | Level 6 | (struct kho_radix_tree) + * +-------------------+ + * | + * v + * +-------------------+ + * | Level 5 | (struct kho_radix_tree) + * +-------------------+ + * | + * | ... (intermediate levels) + * | + * v + * +-------------------+ + * | Level 1 | (struct kho_bitmap_table) + * +-------------------+ + * + * The following diagram illustrates how the encoded value is split into + * indices for the tree levels: * - * The serializing side uses two levels of xarrays to manage chunks of per-order - * 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order of a - * 1TB system would fit inside a single 512 byte bitmap. For order 0 allocations - * each bitmap will cover 16M of address space. Thus, for 16G of memory at most - * 512K of bitmap memory will be needed for order 0. + * 63:60 59:51 50:42 41:33 32:24 23:15 14:0 + * +---------+--------+--------+--------+--------+--------+-----------------+ + * | 0 | Lv 6 | Lv 5 | Lv 4 | Lv 3 | Lv 2 | Lv 1 (bitmap) | + * +---------+--------+--------+--------+--------+--------+-----------------+ * - * This approach is fully incremental, as the serialization progresses folios - * can continue be aggregated to the tracker. The final step, immediately prior - * to kexec would serialize the xarray information into a linked list for the - * successor kernel to parse. + * Each `kho_radix_tree` (Levels 2-6) and `kho_bitmap_table` (Level 1) is + * PAGE_SIZE. Each entry in a `kho_radix_tree` is a descriptor (a physical + * address) pointing to the next level node. 
For Level 2 `kho_radix_tree` + * nodes, these descriptors point to a `kho_bitmap_table`. The final + * `kho_bitmap_table` is a bitmap where each set bit represents a single + * preserved page. */ +struct kho_radix_tree { + unsigned long table[PAGE_SIZE / sizeof(unsigned long)]; +}; -#define PRESERVE_BITS (512 * 8) - -struct kho_mem_phys_bits { - DECLARE_BITMAP(preserve, PRESERVE_BITS); +struct kho_bitmap_table { + unsigned long bitmaps[PAGE_SIZE / sizeof(unsigned long)]; }; -struct kho_mem_phys { +/* + * `kho_radix_tree_root` points to a page thats serves as the root of the + * KHO radix tree. This page is allocated during KHO module initialization. + * Its physical address is written to the FDT and passed to the next kernel + * during kexec. + */ +static struct kho_radix_tree *kho_radix_tree_root; +static DECLARE_RWSEM(kho_radix_tree_root_sem); + +static int kho_radix_tree_max_depth(void) +{ + int page_offset_bit_num = BITS_PER_LONG - PAGE_SHIFT; + int order_bit_num = ilog2(__roundup_pow_of_two(page_offset_bit_num)); + int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE); + int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long)); + int table_level_num = DIV_ROUND_UP(page_offset_bit_num - + bitmap_bit_num + order_bit_num, + table_bit_num); + /* - * Points to kho_mem_phys_bits, a sparse bitmap array. Each bit is sized - * to order. + * The total tree depth is the number of intermediate levels + * and 1 bitmap level. */ - struct xarray phys_bits; -}; + return table_level_num + 1; +} -struct kho_mem_track { - /* Points to kho_mem_phys, each order gets its own bitmap tree */ - struct xarray orders; -}; +static struct kho_radix_tree *kho_alloc_radix_tree(void) +{ + return (struct kho_radix_tree *)get_zeroed_page(GFP_KERNEL); +} -struct khoser_mem_chunk; +/* + * The KHO radix tree tracks preserved pages by encoding a page's physical + * address (pa) and its order into a single unsigned long value. This value + * is then used to traverse the tree. 
The encoded value is composed of two + * parts: the 'order bits' in the upper part and the 'page offset' in the + * lower part. + * + * <-- Higher Bits ------------------------------------ Lower Bits --> + * +--------------------------+-----------------------------------------+ + * | Order Bits | Page Offset | + * +--------------------------+-----------------------------------------+ + * | ... 0 0 1 0 0 ... | pa >> (PAGE_SHIFT + order) | + * +--------------------------+-----------------------------------------+ + * ^ + * | + * This single '1' bit's position + * uniquely identifies the 'order'. + * + * + * Page Offset: + * The 'page offset' is the physical address normalized for its order. It + * effectively represents the page offset for the given order. + * + * Order Bits: + * The 'order bits' encode the page order by setting a single bit at a + * specific position. The position of this bit itself represents the order. + * + * For instance, on a 64-bit system with 4KB pages (PAGE_SHIFT = 12), the + * maximum range for a page offset (for order 0) is 52 bits (64 - 12). This + * offset occupies bits [0-51]. For order 0, the order bit is set at + * position 52. + * + * As the order increases, the number of bits required for the 'page offset' + * decreases. For example, order 1 requires one less bit for its page + * offset. This allows its order bit to be set at position 51 without + * conflicting with the page offset bits. + * + * This scheme ensures that the single order bit is always in a higher + * position than any bit used by the page offset for that same order, + * preventing collisions. 
+ */ +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order) +{ + unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order); + unsigned long l = pa >> (PAGE_SHIFT + order); -struct kho_serialization { - struct page *fdt; - struct list_head fdt_list; - struct dentry *sub_fdt_dir; - struct kho_mem_track track; - /* First chunk of serialized preserved memory map */ - struct khoser_mem_chunk *preserved_mem_map; -}; + return h | l; +} -static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz) +static unsigned long kho_radix_decode(unsigned long encoded, unsigned int *order) { - void *elm, *res; + unsigned long order_bit = fls64(encoded); + unsigned long pa; - elm = xa_load(xa, index); - if (elm) - return elm; + *order = BITS_PER_LONG - PAGE_SHIFT - order_bit + 1; + pa = encoded << (PAGE_SHIFT + *order); - elm = kzalloc(sz, GFP_KERNEL); - if (!elm) - return ERR_PTR(-ENOMEM); - - res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL); - if (xa_is_err(res)) - res = ERR_PTR(xa_err(res)); + return pa; +} - if (res) { - kfree(elm); - return res; +static unsigned long kho_radix_get_index(unsigned long encoded, int level) +{ + int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long)); + int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE); + unsigned long mask; + int s; + + if (level == 1) { + s = 0; + mask = (1UL << bitmap_bit_num) - 1; + } else { + s = ((level - 2) * table_bit_num) + bitmap_bit_num; + mask = (1UL << table_bit_num) - 1; } - return elm; + return (encoded >> s) & mask; } -static void __kho_unpreserve(struct kho_mem_track *track, unsigned long pfn, - unsigned long end_pfn) +static int kho_radix_set_bitmap(struct kho_bitmap_table *bit_tlb, unsigned long offset) { - struct kho_mem_phys_bits *bits; - struct kho_mem_phys *physxa; + if (!bit_tlb || + offset >= PAGE_SIZE * BITS_PER_BYTE) + return -EINVAL; - while (pfn < end_pfn) { - const unsigned int order = - min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); - 
const unsigned long pfn_high = pfn >> order; + set_bit(offset, bit_tlb->bitmaps); + return 0; +} - physxa = xa_load(&track->orders, order); - if (!physxa) - continue; +static int kho_radix_preserve_page(unsigned long pa, unsigned int order) +{ + unsigned long encoded = kho_radix_encode(pa, order); + int num_tree_level = kho_radix_tree_max_depth(); + struct kho_radix_tree *current_tree, *new_tree; + struct kho_bitmap_table *bitmap_table; + int err = 0; + int i, idx; - bits = xa_load(&physxa->phys_bits, pfn_high / PRESERVE_BITS); - if (!bits) - continue; + down_write(&kho_radix_tree_root_sem); - clear_bit(pfn_high % PRESERVE_BITS, bits->preserve); + current_tree = kho_radix_tree_root; - pfn += 1 << order; + /* Go from high levels to low levels */ + for (i = num_tree_level; i >= 1; i--) { + idx = kho_radix_get_index(encoded, i); + + if (i == 1) { + bitmap_table = (struct kho_bitmap_table *)current_tree; + err = kho_radix_set_bitmap(bitmap_table, idx); + goto out; + } + + if (!current_tree->table[idx]) { + new_tree = kho_alloc_radix_tree(); + if (!new_tree) { + err = -ENOMEM; + goto out; + } + + current_tree->table[idx] = + (unsigned long)virt_to_phys(new_tree); + } + + current_tree = (struct kho_radix_tree *) + phys_to_virt(current_tree->table[idx]); } + +out: + up_write(&kho_radix_tree_root_sem); + return err; } -static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, - unsigned int order) +static int kho_radix_walk_bitmaps(struct kho_bitmap_table *bit_tlb, + unsigned long offset, + kho_radix_tree_walk_callback_t cb) { - struct kho_mem_phys_bits *bits; - struct kho_mem_phys *physxa, *new_physxa; - const unsigned long pfn_high = pfn >> order; + unsigned long encoded = offset << (PAGE_SHIFT + ilog2(BITS_PER_BYTE)); + unsigned long *bitmap = (unsigned long *)bit_tlb; + int err = 0; + int i; - might_sleep(); + for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) { + err = cb(encoded | i); + if (err) + return err; + } - physxa = 
xa_load(&track->orders, order); - if (!physxa) { - int err; + return 0; +} - new_physxa = kzalloc(sizeof(*physxa), GFP_KERNEL); - if (!new_physxa) - return -ENOMEM; +static int kho_radix_walk_trees(struct kho_radix_tree *root, int level, + unsigned long offset, + kho_radix_tree_walk_callback_t cb) +{ + int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long)); + struct kho_radix_tree *next_tree; + unsigned long encoded, i; + int err = 0; - xa_init(&new_physxa->phys_bits); - physxa = xa_cmpxchg(&track->orders, order, NULL, new_physxa, - GFP_KERNEL); + if (level == 1) { + encoded = offset; + return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root, + encoded, cb); + } - err = xa_err(physxa); - if (err || physxa) { - xa_destroy(&new_physxa->phys_bits); - kfree(new_physxa); + for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) { + if (root->table[i]) { + encoded = offset << level_shift | i; + next_tree = (struct kho_radix_tree *) + phys_to_virt(root->table[i]); + err = kho_radix_walk_trees(next_tree, level - 1, encoded, cb); if (err) return err; - } else { - physxa = new_physxa; } } - bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS, - sizeof(*bits)); - if (IS_ERR(bits)) - return PTR_ERR(bits); + return 0; +} - set_bit(pfn_high % PRESERVE_BITS, bits->preserve); +static int kho_memblock_reserve(phys_addr_t pa, int order) +{ + int sz = 1 << (order + PAGE_SHIFT); + struct page *page = phys_to_page(pa); + + memblock_reserve(pa, sz); + memblock_reserved_mark_noinit(pa, sz); + page->private = order; return 0; } +static int kho_radix_walk_trees_callback(unsigned long encoded) +{ + unsigned int order; + unsigned long pa; + + pa = kho_radix_decode(encoded, &order); + + return kho_memblock_reserve(pa, order); +} + +struct kho_serialization { + struct page *fdt; + struct list_head fdt_list; + struct dentry *sub_fdt_dir; +}; + +static int __kho_preserve_order(unsigned long pfn, unsigned int order) +{ + unsigned long pa = PFN_PHYS(pfn); + + 
might_sleep(); + + return kho_radix_preserve_page(pa, order); +} + /* almost as free_reserved_page(), just don't free the page */ static void kho_restore_page(struct page *page, unsigned int order) { @@ -224,152 +388,29 @@ struct folio *kho_restore_folio(phys_addr_t phys) } EXPORT_SYMBOL_GPL(kho_restore_folio); -/* Serialize and deserialize struct kho_mem_phys across kexec - * - * Record all the bitmaps in a linked list of pages for the next kernel to - * process. Each chunk holds bitmaps of the same order and each block of bitmaps - * starts at a given physical address. This allows the bitmaps to be sparse. The - * xarray is used to store them in a tree while building up the data structure, - * but the KHO successor kernel only needs to process them once in order. - * - * All of this memory is normal kmalloc() memory and is not marked for - * preservation. The successor kernel will remain isolated to the scratch space - * until it completes processing this list. Once processed all the memory - * storing these ranges will be marked as free. 
- */ - -struct khoser_mem_bitmap_ptr { - phys_addr_t phys_start; - DECLARE_KHOSER_PTR(bitmap, struct kho_mem_phys_bits *); -}; - -struct khoser_mem_chunk_hdr { - DECLARE_KHOSER_PTR(next, struct khoser_mem_chunk *); - unsigned int order; - unsigned int num_elms; -}; - -#define KHOSER_BITMAP_SIZE \ - ((PAGE_SIZE - sizeof(struct khoser_mem_chunk_hdr)) / \ - sizeof(struct khoser_mem_bitmap_ptr)) - -struct khoser_mem_chunk { - struct khoser_mem_chunk_hdr hdr; - struct khoser_mem_bitmap_ptr bitmaps[KHOSER_BITMAP_SIZE]; -}; - -static_assert(sizeof(struct khoser_mem_chunk) == PAGE_SIZE); - -static struct khoser_mem_chunk *new_chunk(struct khoser_mem_chunk *cur_chunk, - unsigned long order) -{ - struct khoser_mem_chunk *chunk; - - chunk = kzalloc(PAGE_SIZE, GFP_KERNEL); - if (!chunk) - return NULL; - chunk->hdr.order = order; - if (cur_chunk) - KHOSER_STORE_PTR(cur_chunk->hdr.next, chunk); - return chunk; -} - -static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk) -{ - struct khoser_mem_chunk *chunk = first_chunk; - - while (chunk) { - struct khoser_mem_chunk *tmp = chunk; - - chunk = KHOSER_LOAD_PTR(chunk->hdr.next); - kfree(tmp); - } -} - -static int kho_mem_serialize(struct kho_serialization *ser) -{ - struct khoser_mem_chunk *first_chunk = NULL; - struct khoser_mem_chunk *chunk = NULL; - struct kho_mem_phys *physxa; - unsigned long order; - - xa_for_each(&ser->track.orders, order, physxa) { - struct kho_mem_phys_bits *bits; - unsigned long phys; - - chunk = new_chunk(chunk, order); - if (!chunk) - goto err_free; - - if (!first_chunk) - first_chunk = chunk; - - xa_for_each(&physxa->phys_bits, phys, bits) { - struct khoser_mem_bitmap_ptr *elm; - - if (chunk->hdr.num_elms == ARRAY_SIZE(chunk->bitmaps)) { - chunk = new_chunk(chunk, order); - if (!chunk) - goto err_free; - } - - elm = &chunk->bitmaps[chunk->hdr.num_elms]; - chunk->hdr.num_elms++; - elm->phys_start = (phys * PRESERVE_BITS) - << (order + PAGE_SHIFT); - KHOSER_STORE_PTR(elm->bitmap, bits); - } - } 
- - ser->preserved_mem_map = first_chunk; - - return 0; - -err_free: - kho_mem_ser_free(first_chunk); - return -ENOMEM; -} - -static void __init deserialize_bitmap(unsigned int order, - struct khoser_mem_bitmap_ptr *elm) -{ - struct kho_mem_phys_bits *bitmap = KHOSER_LOAD_PTR(elm->bitmap); - unsigned long bit; - - for_each_set_bit(bit, bitmap->preserve, PRESERVE_BITS) { - int sz = 1 << (order + PAGE_SHIFT); - phys_addr_t phys = - elm->phys_start + (bit << (order + PAGE_SHIFT)); - struct page *page = phys_to_page(phys); - - memblock_reserve(phys, sz); - memblock_reserved_mark_noinit(phys, sz); - page->private = order; - } -} - static void __init kho_mem_deserialize(const void *fdt) { - struct khoser_mem_chunk *chunk; const phys_addr_t *mem; int len; + struct kho_radix_tree *tree_root; - mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); + /* Retrieve the KHO radix tree from passed-in FDT. */ + mem = fdt_getprop(fdt, 0, PROP_PRESERVED_PAGE_RADIX_TREE, &len); if (!mem || len != sizeof(*mem)) { - pr_err("failed to get preserved memory bitmaps\n"); + pr_err("failed to get preserved KHO memory tree\n"); return; } - chunk = *mem ? phys_to_virt(*mem) : NULL; - while (chunk) { - unsigned int i; + tree_root = *mem ? 
+ (struct kho_radix_tree *)phys_to_virt(*mem) : + NULL; - for (i = 0; i != chunk->hdr.num_elms; i++) - deserialize_bitmap(chunk->hdr.order, - &chunk->bitmaps[i]); - chunk = KHOSER_LOAD_PTR(chunk->hdr.next); - } + if (!tree_root) + return; + + kho_radix_walk_trees(tree_root, kho_radix_tree_max_depth(), + 0, kho_radix_walk_trees_callback); } /* @@ -633,25 +674,15 @@ EXPORT_SYMBOL_GPL(kho_add_subtree); struct kho_out { struct blocking_notifier_head chain_head; - struct dentry *dir; - - struct mutex lock; /* protects KHO FDT finalization */ - struct kho_serialization ser; - bool finalized; }; static struct kho_out kho_out = { .chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head), - .lock = __MUTEX_INITIALIZER(kho_out.lock), .ser = { .fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list), - .track = { - .orders = XARRAY_INIT(kho_out.ser.track.orders, 0), - }, }, - .finalized = false, }; int register_kho_notifier(struct notifier_block *nb) @@ -679,12 +710,8 @@ int kho_preserve_folio(struct folio *folio) { const unsigned long pfn = folio_pfn(folio); const unsigned int order = folio_order(folio); - struct kho_mem_track *track = &kho_out.ser.track; - if (kho_out.finalized) - return -EBUSY; - - return __kho_preserve_order(track, pfn, order); + return __kho_preserve_order(pfn, order); } EXPORT_SYMBOL_GPL(kho_preserve_folio); @@ -701,14 +728,8 @@ EXPORT_SYMBOL_GPL(kho_preserve_folio); int kho_preserve_phys(phys_addr_t phys, size_t size) { unsigned long pfn = PHYS_PFN(phys); - unsigned long failed_pfn = 0; - const unsigned long start_pfn = pfn; const unsigned long end_pfn = PHYS_PFN(phys + size); int err = 0; - struct kho_mem_track *track = &kho_out.ser.track; - - if (kho_out.finalized) - return -EBUSY; if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size)) return -EINVAL; @@ -717,19 +738,14 @@ int kho_preserve_phys(phys_addr_t phys, size_t size) const unsigned int order = min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); - err = __kho_preserve_order(track, pfn, order); - if (err) { 
- failed_pfn = pfn; - break; - } + err = __kho_preserve_order(pfn, order); + if (err) + return err; pfn += 1 << order; } - if (err) - __kho_unpreserve(track, start_pfn, failed_pfn); - - return err; + return 0; } EXPORT_SYMBOL_GPL(kho_preserve_phys); @@ -737,150 +753,6 @@ EXPORT_SYMBOL_GPL(kho_preserve_phys); static struct dentry *debugfs_root; -static int kho_out_update_debugfs_fdt(void) -{ - int err = 0; - struct fdt_debugfs *ff, *tmp; - - if (kho_out.finalized) { - err = kho_debugfs_fdt_add(&kho_out.ser.fdt_list, kho_out.dir, - "fdt", page_to_virt(kho_out.ser.fdt)); - } else { - list_for_each_entry_safe(ff, tmp, &kho_out.ser.fdt_list, list) { - debugfs_remove(ff->file); - list_del(&ff->list); - kfree(ff); - } - } - - return err; -} - -static int kho_abort(void) -{ - int err; - unsigned long order; - struct kho_mem_phys *physxa; - - xa_for_each(&kho_out.ser.track.orders, order, physxa) { - struct kho_mem_phys_bits *bits; - unsigned long phys; - - xa_for_each(&physxa->phys_bits, phys, bits) - kfree(bits); - - xa_destroy(&physxa->phys_bits); - kfree(physxa); - } - xa_destroy(&kho_out.ser.track.orders); - - if (kho_out.ser.preserved_mem_map) { - kho_mem_ser_free(kho_out.ser.preserved_mem_map); - kho_out.ser.preserved_mem_map = NULL; - } - - err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_ABORT, - NULL); - err = notifier_to_errno(err); - - if (err) - pr_err("Failed to abort KHO finalization: %d\n", err); - - return err; -} - -static int kho_finalize(void) -{ - int err = 0; - u64 *preserved_mem_map; - void *fdt = page_to_virt(kho_out.ser.fdt); - - err |= fdt_create(fdt, PAGE_SIZE); - err |= fdt_finish_reservemap(fdt); - err |= fdt_begin_node(fdt, ""); - err |= fdt_property_string(fdt, "compatible", KHO_FDT_COMPATIBLE); - /** - * Reserve the preserved-memory-map property in the root FDT, so - * that all property definitions will precede subnodes created by - * KHO callers. 
- */ - err |= fdt_property_placeholder(fdt, PROP_PRESERVED_MEMORY_MAP, - sizeof(*preserved_mem_map), - (void **)&preserved_mem_map); - if (err) - goto abort; - - err = kho_preserve_folio(page_folio(kho_out.ser.fdt)); - if (err) - goto abort; - - err = blocking_notifier_call_chain(&kho_out.chain_head, - KEXEC_KHO_FINALIZE, &kho_out.ser); - err = notifier_to_errno(err); - if (err) - goto abort; - - err = kho_mem_serialize(&kho_out.ser); - if (err) - goto abort; - - *preserved_mem_map = (u64)virt_to_phys(kho_out.ser.preserved_mem_map); - - err |= fdt_end_node(fdt); - err |= fdt_finish(fdt); - -abort: - if (err) { - pr_err("Failed to convert KHO state tree: %d\n", err); - kho_abort(); - } - - return err; -} - -static int kho_out_finalize_get(void *data, u64 *val) -{ - mutex_lock(&kho_out.lock); - *val = kho_out.finalized; - mutex_unlock(&kho_out.lock); - - return 0; -} - -static int kho_out_finalize_set(void *data, u64 _val) -{ - int ret = 0; - bool val = !!_val; - - mutex_lock(&kho_out.lock); - - if (val == kho_out.finalized) { - if (kho_out.finalized) - ret = -EEXIST; - else - ret = -ENOENT; - goto unlock; - } - - if (val) - ret = kho_finalize(); - else - ret = kho_abort(); - - if (ret) - goto unlock; - - kho_out.finalized = val; - ret = kho_out_update_debugfs_fdt(); - -unlock: - mutex_unlock(&kho_out.lock); - return ret; -} - -DEFINE_DEBUGFS_ATTRIBUTE(fops_kho_out_finalize, kho_out_finalize_get, - kho_out_finalize_set, "%llu\n"); - static int scratch_phys_show(struct seq_file *m, void *v) { for (int i = 0; i < kho_scratch_cnt; i++) @@ -921,11 +793,6 @@ static __init int kho_out_debugfs_init(void) if (IS_ERR(f)) goto err_rmdir; - f = debugfs_create_file("finalize", 0600, dir, NULL, - &fops_kho_out_finalize); - if (IS_ERR(f)) - goto err_rmdir; - kho_out.dir = dir; kho_out.ser.sub_fdt_dir = sub_fdt_dir; return 0; @@ -1037,6 +904,37 @@ static __init int kho_in_debugfs_init(const void *fdt) return err; } +static int kho_out_fdt_init(void) +{ + int err = 0; + void *fdt = 
page_to_virt(kho_out.ser.fdt); + u64 *preserved_radix_tree_root; + + err |= fdt_create(fdt, PAGE_SIZE); + err |= fdt_finish_reservemap(fdt); + err |= fdt_begin_node(fdt, ""); + err |= fdt_property_string(fdt, "compatible", KHO_FDT_COMPATIBLE); + + err |= fdt_property_placeholder(fdt, PROP_PRESERVED_PAGE_RADIX_TREE, + sizeof(*preserved_radix_tree_root), + (void **)&preserved_radix_tree_root); + if (err) + goto abort; + + down_read(&kho_radix_tree_root_sem); + *preserved_radix_tree_root = (u64)virt_to_phys(kho_radix_tree_root); + up_read(&kho_radix_tree_root_sem); + + err |= fdt_end_node(fdt); + err |= fdt_finish(fdt); + +abort: + if (err) + pr_err("Failed to convert KHO memory tree: %d\n", err); + + return err; +} + static __init int kho_init(void) { int err = 0; @@ -1051,15 +949,29 @@ static __init int kho_init(void) goto err_free_scratch; } + kho_radix_tree_root = (struct kho_radix_tree *) + kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!kho_radix_tree_root) { + err = -ENOMEM; + goto err_free_fdt; + } + + err = kho_out_fdt_init(); + if (err) + goto err_free_kho_radix_tree_root; + debugfs_root = debugfs_create_dir("kho", NULL); if (IS_ERR(debugfs_root)) { err = -ENOENT; - goto err_free_fdt; + goto err_free_kho_radix_tree_root; } err = kho_out_debugfs_init(); if (err) - goto err_free_fdt; + goto err_free_kho_radix_tree_root; + + /* Preserve the memory page of FDT for the next kernel */ + kho_preserve_phys(page_to_phys(kho_out.ser.fdt), PAGE_SIZE); if (fdt) { err = kho_in_debugfs_init(fdt); @@ -1087,6 +999,9 @@ static __init int kho_init(void) return 0; +err_free_kho_radix_tree_root: + kfree(kho_radix_tree_root); + kho_radix_tree_root = NULL; err_free_fdt: put_page(kho_out.ser.fdt); kho_out.ser.fdt = NULL; -- 2.51.0.618.g983fd99d29-goog ^ permalink raw reply [flat|nested] 13+ messages in thread
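
The level split documented in the patch comment (a 15-bit bitmap index
followed by 9-bit table indices) can be sketched in user space as
follows. This assumes a 64-bit build with 4 KiB pages; the `sketch_*`
names are hypothetical and only mirror what `kho_radix_tree_max_depth()`
and `kho_radix_get_index()` compute in the patch. `__builtin_clzll` is a
GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 64-bit unsigned long, 4 KiB pages. */
#define SKETCH_BITS_PER_LONG 64
#define SKETCH_PAGE_SHIFT    12
#define SKETCH_PAGE_SIZE     (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_BITS_PER_BYTE 8

/* floor(log2(x)) for x > 0. */
static int sketch_ilog2(uint64_t x)
{
	assert(x > 0);
	return 63 - __builtin_clzll(x);
}

/* Smallest power of two >= x, valid for 1 <= x <= 2^63. */
static uint64_t sketch_roundup_pow2(uint64_t x)
{
	assert(x >= 1 && x <= (1ULL << 63));
	return x == 1 ? 1 : 1ULL << (64 - __builtin_clzll(x - 1));
}

/* Index bits consumed by one table level: log2(4096 / 8) = 9. */
static int sketch_table_bits(void)
{
	return sketch_ilog2(SKETCH_PAGE_SIZE / sizeof(uint64_t));
}

/* Index bits consumed by the leaf bitmap: 12 + log2(8) = 15. */
static int sketch_bitmap_bits(void)
{
	return SKETCH_PAGE_SHIFT + sketch_ilog2(SKETCH_BITS_PER_BYTE);
}

/* Total depth: table levels needed for the remaining bits, plus the leaf. */
static int sketch_max_depth(void)
{
	int offset_bits = SKETCH_BITS_PER_LONG - SKETCH_PAGE_SHIFT;
	int order_bits = sketch_ilog2(sketch_roundup_pow2((uint64_t)offset_bits));
	int remaining = offset_bits - sketch_bitmap_bits() + order_bits;
	int table_bits = sketch_table_bits();

	return (remaining + table_bits - 1) / table_bits + 1;
}

/* Slice of the encoded value used as the index at a given level. */
static unsigned int sketch_get_index(uint64_t encoded, int level)
{
	assert(level >= 1 && level <= sketch_max_depth());

	if (level == 1)
		return (unsigned int)(encoded &
				      ((1ULL << sketch_bitmap_bits()) - 1));

	int shift = (level - 2) * sketch_table_bits() + sketch_bitmap_bits();

	return (unsigned int)((encoded >> shift) &
			      ((1ULL << sketch_table_bits()) - 1));
}
```

Under these assumptions the depth works out to 6, matching the Level 6 to
Level 1 diagram in the patch comment. An order-0 page at physical address
0x1000 encodes to `(1UL << 52) | 1`, which selects entry 2 at Level 6,
entry 0 at Levels 5 through 2, and bit 1 in the leaf bitmap.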

* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
From: kernel test robot @ 2025-10-02 4:29 UTC
To: Jason Miu, Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
    David Matlack, David Rientjes, Jason Gunthorpe, Mike Rapoport,
    Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel
Cc: oe-kbuild-all, Linux Memory Management List

Hi Jason,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rppt-memblock/for-next]
[also build test WARNING on linus/master v6.17]
[cannot apply to rppt-memblock/fixes akpm-mm/mm-everything next-20250929]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Jason-Miu/kho-Adopt-KHO-radix-tree-data-structures/20251001-092230
base: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git for-next
patch link: https://lore.kernel.org/r/20251001011941.1513050-2-jasonmiu%40google.com
patch subject: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
config: x86_64-randconfig-005-20251001 (https://download.01.org/0day-ci/archive/20251002/202510021229.mKL5i2Vt-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251002/202510021229.mKL5i2Vt-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510021229.mKL5i2Vt-lkp@intel.com/

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> WARNING: modpost: vmlinux: section mismatch in reference: kho_radix_walk_trees_callback+0x83 (section: .text) -> __memblock_reserve (section: .init.text)
>> WARNING: modpost: vmlinux: section mismatch in reference: kho_radix_walk_trees_callback+0x91 (section: .text) -> memblock_reserved_mark_noinit (section: .init.text)

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
2025-10-01 1:19 ` [PATCH v1 1/3] kho: Adopt KHO radix tree data structures Jason Miu
2025-10-02 4:29 ` kernel test robot
@ 2025-10-06 14:14 ` Jason Gunthorpe
2025-10-06 17:26 ` Pasha Tatashin
2025-10-09 2:07 ` Jason Miu
1 sibling, 2 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:14 UTC (permalink / raw)
To: Jason Miu
Cc: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
David Matlack, David Rientjes, Mike Rapoport, Pasha Tatashin,
Pratyush Yadav, kexec, linux-kernel, linux-mm

On Tue, Sep 30, 2025 at 06:19:39PM -0700, Jason Miu wrote:
> @@ -29,7 +30,7 @@
>  #include "kexec_internal.h"
>
>  #define KHO_FDT_COMPATIBLE "kho-v1"

We don't bump this?

> -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map"
> +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree"
>  #define PROP_SUB_FDT "fdt"

I'd really like to see all of these sorts of definitions in some
structured ABI header not open coded all over the place..

>  /*
> + * The KHO radix tree tracks preserved memory pages. It is a hierarchical
> + * structure that starts with a single root `kho_radix_tree`. This single
> + * tree stores pages of all orders.
> + *
> + * This is achieved by encoding the page's physical address and its order into
> + * a single `unsigned long` value. This encoded value is then used to traverse
> + * the tree.
> + *
> + * The tree hierarchy is shown below:
> + *
> + * kho_radix_tree_root
> + * +-------------------+
> + * |      Level 6      | (struct kho_radix_tree)
> + * +-------------------+
> + *           |
> + *           v
> + * +-------------------+
> + * |      Level 5      | (struct kho_radix_tree)
> + * +-------------------+
> + *           |
> + *           | ... (intermediate levels)
> + *           |
> + *           v
> + * +-------------------+
> + * |      Level 1      | (struct kho_bitmap_table)
> + * +-------------------+
> + *
> + * The following diagram illustrates how the encoded value is split into
> + * indices for the tree levels:
>   *
> + *    63:60    59:51    50:42    41:33    32:24    23:15         14:0
> + * +---------+--------+--------+--------+--------+--------+-----------------+
> + * |    0    |  Lv 6  |  Lv 5  |  Lv 4  |  Lv 3  |  Lv 2  |  Lv 1 (bitmap)  |
> + * +---------+--------+--------+--------+--------+--------+-----------------+
>   *
> + * Each `kho_radix_tree` (Levels 2-6) and `kho_bitmap_table` (Level 1) is
> + * PAGE_SIZE. Each entry in a `kho_radix_tree` is a descriptor (a physical
> + * address) pointing to the next level node. For Level 2 `kho_radix_tree`
> + * nodes, these descriptors point to a `kho_bitmap_table`. The final
> + * `kho_bitmap_table` is a bitmap where each set bit represents a single
> + * preserved page.

Maybe a note that this example is for PAGE_SIZE=4k.

>   */
> +struct kho_radix_tree {
> +	unsigned long table[PAGE_SIZE / sizeof(unsigned long)];

This should be phys_addr_t.

> +};

You dropped the macros so now we don't know these are actually
pointers to 'struct kho_radix_tree'

> +/*
> + * `kho_radix_tree_root` points to a page thats serves as the root of the
> + * KHO radix tree. This page is allocated during KHO module initialization.
> + * Its physical address is written to the FDT and passed to the next kernel
> + * during kexec.
> + */
> +static struct kho_radix_tree *kho_radix_tree_root;
> +static DECLARE_RWSEM(kho_radix_tree_root_sem);
> +
> +static int kho_radix_tree_max_depth(void)
> +{
> +	int page_offset_bit_num = BITS_PER_LONG - PAGE_SHIFT;
> +	int order_bit_num = ilog2(__roundup_pow_of_two(page_offset_bit_num));
> +	int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE);
> +	int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long));
> +	int table_level_num = DIV_ROUND_UP(page_offset_bit_num -
> +					   bitmap_bit_num + order_bit_num,
> +					   table_bit_num);

All should be unsigned int. Below I suggest to put it in an enum and
use different names.. And since the function is constant it can just
be an enum TOP_LEVEL too.

> +/*
> + * The KHO radix tree tracks preserved pages by encoding a page's physical
> + * address (pa) and its order into a single unsigned long value. This value
> + * is then used to traverse the tree. The encoded value is composed of two
> + * parts: the 'order bits' in the upper part and the 'page offset' in the
> + * lower part.
> + *
> + * <-- Higher Bits ------------------------------------ Lower Bits -->
> + * +--------------------------+-----------------------------------------+
> + * |        Order Bits        |               Page Offset               |
> + * +--------------------------+-----------------------------------------+
> + * |    ... 0 0 1 0 0 ...     |       pa >> (PAGE_SHIFT + order)        |
> + * +--------------------------+-----------------------------------------+
> + *              ^
> + *              |
> + *  This single '1' bit's position
> + *  uniquely identifies the 'order'.
> + *
> + *
> + * Page Offset:
> + * The 'page offset' is the physical address normalized for its order. It
> + * effectively represents the page offset for the given order.
> + *
> + * Order Bits:
> + * The 'order bits' encode the page order by setting a single bit at a
> + * specific position. The position of this bit itself represents the order.
> + *
> + * For instance, on a 64-bit system with 4KB pages (PAGE_SHIFT = 12), the
> + * maximum range for a page offset (for order 0) is 52 bits (64 - 12). This
> + * offset occupies bits [0-51]. For order 0, the order bit is set at
> + * position 52.
> + *
> + * As the order increases, the number of bits required for the 'page offset'
> + * decreases. For example, order 1 requires one less bit for its page
> + * offset. This allows its order bit to be set at position 51 without
> + * conflicting with the page offset bits.
> + *
> + * This scheme ensures that the single order bit is always in a higher
> + * position than any bit used by the page offset for that same order,
> + * preventing collisions.

Should explain why it is like this:

 This scheme allows storing all the multi-order page sizes in a single
 6 level table with a good sharing of lower table levels for 0 top
 address bits. A single algorithm can efficiently process everything.

> + */
> +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order)

pa is phys_addr_t in the kernel, never unsigned long.

If you want to make it all dynamic then this should be phys_addr_t

> +{
> +	unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order);

And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which
may not exist.

Use an enum ORDER_0_LG2 maybe

> +	unsigned long l = pa >> (PAGE_SHIFT + order);
>
> +	return h | l;
> +}
>
> +static unsigned long kho_radix_decode(unsigned long encoded, unsigned int *order)

Returns phys_addr_t

>  {
> -	void *elm, *res;
> +	unsigned long order_bit = fls64(encoded);

unsigned int

> +	unsigned long pa;

phys_addr_t

> +	*order = BITS_PER_LONG - PAGE_SHIFT - order_bit + 1;

ORDER_0_LG2

> +	pa = encoded << (PAGE_SHIFT + *order);

I'd add a comment that the shift always discards order.

> +	return pa;
> +}
>
> +static unsigned long kho_radix_get_index(unsigned long encoded, int level)

unsigned int level

> +{
> +	int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long));
> +	int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE);

Stick all the constants that kho_radix_tree_max_depth() is computing
in an enum instead of recomputing them..

> +	unsigned long mask;
> +	int s;

unsigned for all of these.

> +
> +	if (level == 1) {

I think the math is easier if level 0 == bitmap..

> +		s = 0;
> +		mask = (1UL << bitmap_bit_num) - 1;
> +	} else {
> +		s = ((level - 2) * table_bit_num) + bitmap_bit_num;

eg here you are doing level-2 which is a bit weird only because of the
arbitrary choice to make level=1 be the bitmap.

I'd also use some different names

 table_bit_num == TABLE_SIZE_LG2
 BITMAP_BIT_NUM = BITMAP_SIZE_LG2

Log2 designates the value is 1<<LG2

> +		mask = (1UL << table_bit_num) - 1;
>  	}
>
> -	return elm;
> +	return (encoded >> s) & mask;

It is just:

   return encoded % (1 << BITMAP_SIZE_LG2);
   return (encoded >> s) % (1 << TABLE_SIZE_LG2);

The compiler is smart enough to choose bit logic if that is the
fastest option and the above is more readable.

> +static int kho_radix_set_bitmap(struct kho_bitmap_table *bit_tlb, unsigned long offset)
>  {
> +	if (!bit_tlb ||
> +	    offset >= PAGE_SIZE * BITS_PER_BYTE)
> +		return -EINVAL;
>
> +	set_bit(offset, bit_tlb->bitmaps);

set_bit is an atomic, you want __set_bit()

> +	return 0;
> +}
>
> +static int kho_radix_preserve_page(unsigned long pa, unsigned int order)

phys_addr_t

> +{
> +	unsigned long encoded = kho_radix_encode(pa, order);
> +	int num_tree_level = kho_radix_tree_max_depth();

kho_radix_tree_max_depth() is constant, stick it in an enum with the
rest of them.

> +	struct kho_radix_tree *current_tree, *new_tree;
> +	struct kho_bitmap_table *bitmap_table;
> +	int err = 0;
> +	int i, idx;

various unsigned int.

>
> +	down_write(&kho_radix_tree_root_sem);
>
> +	current_tree = kho_radix_tree_root;
>
> +	/* Go from high levels to low levels */
> +	for (i = num_tree_level; i >= 1; i--) {
> +		idx = kho_radix_get_index(encoded, i);
> +
> +		if (i == 1) {
> +			bitmap_table = (struct kho_bitmap_table *)current_tree;
> +			err = kho_radix_set_bitmap(bitmap_table, idx);
> +			goto out;
> +		}
> +
> +		if (!current_tree->table[idx]) {
> +			new_tree = kho_alloc_radix_tree();
> +			if (!new_tree) {
> +				err = -ENOMEM;
> +				goto out;
> +			}
> +
> +			current_tree->table[idx] =
> +				(unsigned long)virt_to_phys(new_tree);

current_tree = new_tree

> +		}

else

> +
> +		current_tree = (struct kho_radix_tree *)
> +			phys_to_virt(current_tree->table[idx]);
>  	}
> +
> +out:
> +	up_write(&kho_radix_tree_root_sem);
> +	return err;
>  }
>
> +static int kho_radix_walk_bitmaps(struct kho_bitmap_table *bit_tlb,
> +				  unsigned long offset,

phys_addr_t

> +				  kho_radix_tree_walk_callback_t cb)
>  {
> +	unsigned long encoded = offset << (PAGE_SHIFT + ilog2(BITS_PER_BYTE));
> +	unsigned long *bitmap = (unsigned long *)bit_tlb;
> +	int err = 0;
> +	int i;
>
> +	for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) {
> +		err = cb(encoded | i);
> +		if (err)
> +			return err;
> +	}
>
> +	return 0;
> +}
>
> +static int kho_radix_walk_trees(struct kho_radix_tree *root, int level,

unsigned int

> +				unsigned long offset,

phys_addr_t. I would call this start not offset..

> +				kho_radix_tree_walk_callback_t cb)
> +{
> +	int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long));
> +	struct kho_radix_tree *next_tree;
> +	unsigned long encoded, i;
> +	int err = 0;
>
> +	if (level == 1) {
> +		encoded = offset;
> +		return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root,
> +					      encoded, cb);

Better to do this in the caller a few lines below

> +	}
>
> +	for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) {
> +		if (root->table[i]) {
> +			encoded = offset << level_shift | i;

This doesn't seem right..

The argument to the walker should be the starting encoded value of the
table it is about to walk.

Since everything always starts at 0 it should always be
  start | (i << level_shift)

?

> +			next_tree = (struct kho_radix_tree *)
> +				phys_to_virt(root->table[i]);
> +			err = kho_radix_walk_trees(next_tree, level - 1, encoded, cb);
>  			if (err)
>  				return err;
>  		}
>  	}
>
> +	return 0;
> +}
>
> +static int kho_memblock_reserve(phys_addr_t pa, int order)
> +{
> +	int sz = 1 << (order + PAGE_SHIFT);
> +	struct page *page = phys_to_page(pa);
> +
> +	memblock_reserve(pa, sz);
> +	memblock_reserved_mark_noinit(pa, sz);
> +	page->private = order;
>
>  	return 0;
>  }
>
> +static int kho_radix_walk_trees_callback(unsigned long encoded)
> +{
> +	unsigned int order;
> +	unsigned long pa;
> +
> +	pa = kho_radix_decode(encoded, &order);
> +
> +	return kho_memblock_reserve(pa, order);
> +}
> +
> +struct kho_serialization {
> +	struct page *fdt;
> +	struct list_head fdt_list;
> +	struct dentry *sub_fdt_dir;
> +};
> +
> +static int __kho_preserve_order(unsigned long pfn, unsigned int order)
> +{
> +	unsigned long pa = PFN_PHYS(pfn);

phys_addr_t

Jason
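For readers following the review above, the suggested shape (constants in an enum, fixed-width address types, level 0 as the bitmap, modulo instead of explicit masks) can be tried out in userspace. The following is only a sketch under stated assumptions, not code from the series: it hard-codes a 64-bit build with 4 KiB pages, uses uint64_t in place of phys_addr_t, replaces fls64() with the GCC/Clang builtin __builtin_clzll(), and every sketch_* / SKETCH_* name is hypothetical.

```c
#include <stdint.h>

/* Assumed geometry: 64-bit encoded values, 4 KiB pages. */
enum {
	SKETCH_PAGE_SHIFT      = 12,
	/* Bit position of the order marker for order 0 (64 - 12 = 52). */
	SKETCH_ORDER_0_LG2     = 64 - SKETCH_PAGE_SHIFT,
	/* A bitmap page holds PAGE_SIZE * 8 bits (2^15). */
	SKETCH_BITMAP_SIZE_LG2 = SKETCH_PAGE_SHIFT + 3,
	/* A table page holds PAGE_SIZE / 8 entries (2^9). */
	SKETCH_TABLE_SIZE_LG2  = SKETCH_PAGE_SHIFT - 3,
};

/* Encode: one marker bit for the order, OR'ed with the order-normalized address. */
static uint64_t sketch_encode(uint64_t pa, unsigned int order)
{
	uint64_t h = 1ULL << (SKETCH_ORDER_0_LG2 - order);
	uint64_t l = pa >> (SKETCH_PAGE_SHIFT + order);

	return h | l;
}

/* Decode: recover the order from the highest set bit, then the address. */
static uint64_t sketch_decode(uint64_t encoded, unsigned int *order)
{
	/* 1-based position of the highest set bit, like fls64(); encoded != 0. */
	unsigned int order_bit = 64 - (unsigned int)__builtin_clzll(encoded);

	*order = SKETCH_ORDER_0_LG2 - order_bit + 1;
	/* The marker sits at bit (52 - order), so this shift always discards it. */
	return encoded << (SKETCH_PAGE_SHIFT + *order);
}

/* Index of 'encoded' at a given level, with level 0 being the bitmap. */
static unsigned int sketch_get_index(uint64_t encoded, unsigned int level)
{
	unsigned int s;

	if (level == 0)
		return (unsigned int)(encoded % (1ULL << SKETCH_BITMAP_SIZE_LG2));

	s = (level - 1) * SKETCH_TABLE_SIZE_LG2 + SKETCH_BITMAP_SIZE_LG2;
	return (unsigned int)((encoded >> s) % (1ULL << SKETCH_TABLE_SIZE_LG2));
}

/* Round-trip check; 'pa' must be aligned to (PAGE_SIZE << order). */
static int sketch_roundtrip_ok(uint64_t pa, unsigned int order)
{
	unsigned int got_order;
	uint64_t got_pa = sketch_decode(sketch_encode(pa, order), &got_order);

	return got_pa == pa && got_order == order;
}
```

With these assumptions, an order-0 page at 0x1000 encodes to bit 52 plus bit 0, and an order-9 page at 0x200000 encodes to bit 43 plus bit 0; both survive the round trip. The sketch only covers small orders, since the shifts become undefined as the order approaches 52.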
* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures 2025-10-06 14:14 ` Jason Gunthorpe @ 2025-10-06 17:26 ` Pasha Tatashin 2025-10-06 22:50 ` Jason Gunthorpe 2025-10-09 2:07 ` Jason Miu 1 sibling, 1 reply; 13+ messages in thread From: Pasha Tatashin @ 2025-10-06 17:26 UTC (permalink / raw) To: Jason Gunthorpe Cc: Jason Miu, Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Mike Rapoport, Pratyush Yadav, kexec, linux-kernel, linux-mm On Mon, Oct 6, 2025 at 10:14 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Sep 30, 2025 at 06:19:39PM -0700, Jason Miu wrote: > > @@ -29,7 +30,7 @@ > > #include "kexec_internal.h" > > > > #define KHO_FDT_COMPATIBLE "kho-v1" > > We don't bump this? > > > -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map" > > +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree" > > #define PROP_SUB_FDT "fdt" > > I'de really like to see all of these sorts of definitions in some > structured ABI header not open coded all over the place.. > > > /* > > + * The KHO radix tree tracks preserved memory pages. It is a hierarchical > > + * structure that starts with a single root `kho_radix_tree`. This single > > + * tree stores pages of all orders. > > + * > > + * This is achieved by encoding the page's physical address and its order into > > + * a single `unsigned long` value. This encoded value is then used to traverse > > + * the tree. > > + * > > + * The tree hierarchy is shown below: > > + * > > + * kho_radix_tree_root > > + * +-------------------+ > > + * | Level 6 | (struct kho_radix_tree) > > + * +-------------------+ > > + * | > > + * v > > + * +-------------------+ > > + * | Level 5 | (struct kho_radix_tree) > > + * +-------------------+ > > + * | > > + * | ... 
(intermediate levels) > > + * | > > + * v > > + * +-------------------+ > > + * | Level 1 | (struct kho_bitmap_table) > > + * +-------------------+ > > + * > > + * The following diagram illustrates how the encoded value is split into > > + * indices for the tree levels: > > * > > + * 63:60 59:51 50:42 41:33 32:24 23:15 14:0 > > + * +---------+--------+--------+--------+--------+--------+-----------------+ > > + * | 0 | Lv 6 | Lv 5 | Lv 4 | Lv 3 | Lv 2 | Lv 1 (bitmap) | > > + * +---------+--------+--------+--------+--------+--------+-----------------+ > > * > > + * Each `kho_radix_tree` (Levels 2-6) and `kho_bitmap_table` (Level 1) is > > + * PAGE_SIZE. Each entry in a `kho_radix_tree` is a descriptor (a physical > > + * address) pointing to the next level node. For Level 2 `kho_radix_tree` > > + * nodes, these descriptors point to a `kho_bitmap_table`. The final > > + * `kho_bitmap_table` is a bitmap where each set bit represents a single > > + * preserved page. > > Maybe a note that this is example is for PAGE_SIZE=4k. > > > > */ > > +struct kho_radix_tree { > > + unsigned long table[PAGE_SIZE / sizeof(unsigned long)]; > > This should be phys_addr_t. Maybe u64 ? This is a preserved data, I would specify the size, and not care about 32-bit arches. Also, if we ever have to support larger physical spaces, this radix tree version would need to be bumped anyway. > > > +}; > > You dropped the macros so now we don't know these are actually > pointers to 'struct kho_radix_tree' > > > +/* > > + * `kho_radix_tree_root` points to a page thats serves as the root of the > > + * KHO radix tree. This page is allocated during KHO module initialization. > > + * Its physical address is written to the FDT and passed to the next kernel > > + * during kexec. 
> > + */ > > +static struct kho_radix_tree *kho_radix_tree_root; > > +static DECLARE_RWSEM(kho_radix_tree_root_sem); > > + > > +static int kho_radix_tree_max_depth(void) > > +{ > > + int page_offset_bit_num = BITS_PER_LONG - PAGE_SHIFT; > > + int order_bit_num = ilog2(__roundup_pow_of_two(page_offset_bit_num)); > > + int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE); > > + int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > + int table_level_num = DIV_ROUND_UP(page_offset_bit_num - > > + bitmap_bit_num + order_bit_num, > > + table_bit_num); > > All should be unsigned int. Below I suggest to put it in an enum and > use different names.. And since the function is constant it can just > be an enum TOP_LEVEL too. > > > +/* > > + * The KHO radix tree tracks preserved pages by encoding a page's physical > > + * address (pa) and its order into a single unsigned long value. This value > > + * is then used to traverse the tree. The encoded value is composed of two > > + * parts: the 'order bits' in the upper part and the 'page offset' in the > > + * lower part. > > + * > > + * <-- Higher Bits ------------------------------------ Lower Bits --> > > + * +--------------------------+-----------------------------------------+ > > + * | Order Bits | Page Offset | > > + * +--------------------------+-----------------------------------------+ > > + * | ... 0 0 1 0 0 ... | pa >> (PAGE_SHIFT + order) | > > + * +--------------------------+-----------------------------------------+ > > + * ^ > > + * | > > + * This single '1' bit's position > > + * uniquely identifies the 'order'. > > + * > > + * > > + * Page Offset: > > + * The 'page offset' is the physical address normalized for its order. It > > + * effectively represents the page offset for the given order. > > + * > > + * Order Bits: > > + * The 'order bits' encode the page order by setting a single bit at a > > + * specific position. The position of this bit itself represents the order. 
> > + * > > + * For instance, on a 64-bit system with 4KB pages (PAGE_SHIFT = 12), the > > + * maximum range for a page offset (for order 0) is 52 bits (64 - 12). This > > + * offset occupies bits [0-51]. For order 0, the order bit is set at > > + * position 52. > > + * > > + * As the order increases, the number of bits required for the 'page offset' > > + * decreases. For example, order 1 requires one less bit for its page > > + * offset. This allows its order bit to be set at position 51 without > > + * conflicting with the page offset bits. > > + * > > + * This scheme ensures that the single order bit is always in a higher > > + * position than any bit used by the page offset for that same order, > > + * preventing collisions. > > Should explain why it is like this: > > This scheme allows storing all the multi-order page sizes in a single > 6 level table with a good sharing of lower tables levels for 0 top > address bits. A single algorithm can efficiently process everything. > > > + */ > > +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order) > > pa is phys_addr_t in the kernel, never unsigned long. > > If you want to make it all dynamic then this should be phys_addr_t > > > +{ > > + unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order); > > And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which > may not exist. > > Use an enum ORDER_0_LG2 maybe > > > + unsigned long l = pa >> (PAGE_SHIFT + order); > > > > + return h | l; > > +} > > > > +static unsigned long kho_radix_decode(unsigned long encoded, unsigned int *order) > > Returns phys_addr_t > > > { > > - void *elm, *res; > > + unsigned long order_bit = fls64(encoded); > > unsigned int > > > + unsigned long pa; > > phys_addr_t > > > + *order = BITS_PER_LONG - PAGE_SHIFT - order_bit + 1; > > ORDER_0_LG2 > > > + pa = encoded << (PAGE_SHIFT + *order); > > I'd add a comment that the shift always discards order. 
> > > + return pa; > > +} > > > > +static unsigned long kho_radix_get_index(unsigned long encoded, int level) > > unsigned int level > > > +{ > > + int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > + int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE); > > Stick all the constants that kho_radix_tree_max_depth() are computing > in an enum instead of recomputing them.. > > > + unsigned long mask; > > + int s; > > unsigned for all of these. > > > + > > + if (level == 1) { > > I think the math is easier if level 0 == bitmap.. > > > + s = 0; > > + mask = (1UL << bitmap_bit_num) - 1; > > + } else { > > + s = ((level - 2) * table_bit_num) + bitmap_bit_num; > > eg here you are doing level-2 which is a bit weird only because of the > arbitary choice to make level=1 be the bitmap. > > I'd also use some different names > > table_bit_num == TABLE_SIZE_LG2 > BITMAP_BIT_NUM = BITMAP_SIZE_LG2 > > Log2 designates the value is 1<<LG2 > > > + mask = (1UL << table_bit_num) - 1; > > } > > > > - return elm; > > + return (encoded >> s) & mask; > > It is just: > > return encoded % (1 << BITMAP_SIZE_LG2); > return (encoded >> s) % (1 << TABLE_SIZE_LG2); > > The compiler is smart enough to choose bit logic if that is the > fastest option and the above is more readable. > > > +static int kho_radix_set_bitmap(struct kho_bitmap_table *bit_tlb, unsigned long offset) > > { > > + if (!bit_tlb || > > + offset >= PAGE_SIZE * BITS_PER_BYTE) > > + return -EINVAL; > > > > + set_bit(offset, bit_tlb->bitmaps); > > set_bit is an atomic, you want __set_bit() > > > + return 0; > > +} > > > > +static int kho_radix_preserve_page(unsigned long pa, unsigned int order) > > phys_addr_t > > > +{ > > + unsigned long encoded = kho_radix_encode(pa, order); > > + int num_tree_level = kho_radix_tree_max_depth(); > > kho_radix_tree_max_depth() is constant, stick it in an enum with the > rest of them. 
> > > + struct kho_radix_tree *current_tree, *new_tree; > > + struct kho_bitmap_table *bitmap_table; > > + int err = 0; > > + int i, idx; > > various unsigned int. > > > > > + down_write(&kho_radix_tree_root_sem); > > > > + current_tree = kho_radix_tree_root; > > > > + /* Go from high levels to low levels */ > > + for (i = num_tree_level; i >= 1; i--) { > > + idx = kho_radix_get_index(encoded, i); > > + > > + if (i == 1) { > > + bitmap_table = (struct kho_bitmap_table *)current_tree; > > + err = kho_radix_set_bitmap(bitmap_table, idx); > > + goto out; > > + } > > + > > + if (!current_tree->table[idx]) { > > + new_tree = kho_alloc_radix_tree(); > > + if (!new_tree) { > > + err = -ENOMEM; > > + goto out; > > + } > > + > > + current_tree->table[idx] = > > + (unsigned long)virt_to_phys(new_tree); > > current_tree = new_tree > > + } > > else > > > + > > + current_tree = (struct kho_radix_tree *) > > + phys_to_virt(current_tree->table[idx]); > > } > > + > > +out: > > + up_write(&kho_radix_tree_root_sem); > > + return err; > > } > > > > +static int kho_radix_walk_bitmaps(struct kho_bitmap_table *bit_tlb, > > + unsigned long offset, > > phys_addr_t > > > + kho_radix_tree_walk_callback_t cb) > > { > > + unsigned long encoded = offset << (PAGE_SHIFT + ilog2(BITS_PER_BYTE)); > > + unsigned long *bitmap = (unsigned long *)bit_tlb; > > + int err = 0; > > + int i; > > > > + for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) { > > + err = cb(encoded | i); > > + if (err) > > + return err; > > + } > > > > + return 0; > > +} > > > > +static int kho_radix_walk_trees(struct kho_radix_tree *root, int level, > > unsigned int > > > + unsigned long offset, > > phys_addr_t. I would call this start not offset.. 
> > > + kho_radix_tree_walk_callback_t cb) > > +{ > > + int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > + struct kho_radix_tree *next_tree; > > + unsigned long encoded, i; > > + int err = 0; > > > > + if (level == 1) { > > + encoded = offset; > > + return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root, > > + encoded, cb); > > Better to do this in the caller a few lines below > > > + } > > > > > + for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) { > > + if (root->table[i]) { > > + encoded = offset << level_shift | i; > > This doesn't seem right.. > > The argument to the walker should be the starting encoded of the table > it is about to walk. > > Since everything always starts at 0 it should always be > start | (i << level_shift) > > ? > > > + next_tree = (struct kho_radix_tree *) > > + phys_to_virt(root->table[i]); > > + err = kho_radix_walk_trees(next_tree, level - 1, encoded, cb); > > if (err) > > return err; > > } > > } > > > > + return 0; > > +} > > > > +static int kho_memblock_reserve(phys_addr_t pa, int order) > > +{ > > + int sz = 1 << (order + PAGE_SHIFT); > > + struct page *page = phys_to_page(pa); > > + > > + memblock_reserve(pa, sz); > > + memblock_reserved_mark_noinit(pa, sz); > > + page->private = order; > > > > return 0; > > } > > > > +static int kho_radix_walk_trees_callback(unsigned long encoded) > > +{ > > + unsigned int order; > > + unsigned long pa; > > + > > + pa = kho_radix_decode(encoded, &order); > > + > > + return kho_memblock_reserve(pa, order); > > +} > > + > > +struct kho_serialization { > > + struct page *fdt; > > + struct list_head fdt_list; > > + struct dentry *sub_fdt_dir; > > +}; > > + > > +static int __kho_preserve_order(unsigned long pfn, unsigned int order) > > +{ > > + unsigned long pa = PFN_PHYS(pfn); > > phys_addr_t > > Jason ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
2025-10-06 17:26 ` Pasha Tatashin
@ 2025-10-06 22:50 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 22:50 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Jason Miu, Alexander Graf, Andrew Morton, Baoquan He,
Changyuan Lyu, David Matlack, David Rientjes, Mike Rapoport,
Pratyush Yadav, kexec, linux-kernel, linux-mm

On Mon, Oct 06, 2025 at 01:26:57PM -0400, Pasha Tatashin wrote:
> > > +struct kho_radix_tree {
> > > +	unsigned long table[PAGE_SIZE / sizeof(unsigned long)];
> >
> > This should be phys_addr_t.
>
> Maybe u64 ? This is a preserved data, I would specify the size, and
> not care about 32-bit arches. Also, if we ever have to support larger
> physical spaces, this radix tree version would need to be bumped
> anyway.

Yeah, that is a good plan.

Jason
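The point of choosing u64 over unsigned long or phys_addr_t is that the preserved table keeps the same layout no matter which kernel reads it. A minimal userspace sketch of that idea follows, assuming 4 KiB pages and using uint64_t as a stand-in for the kernel's u64; the sketch_* names are hypothetical and not taken from the series.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* One table node: a page of fixed-width physical addresses. */
struct sketch_radix_table {
	uint64_t table[SKETCH_PAGE_SIZE / sizeof(uint64_t)];
};

/* The node must fill exactly one page for the layout to be stable. */
_Static_assert(sizeof(struct sketch_radix_table) == SKETCH_PAGE_SIZE,
	       "table node must be exactly one page");

/* Number of descriptors per node; 512 under the assumptions above. */
static size_t sketch_table_entries(void)
{
	return sizeof(((struct sketch_radix_table *)0)->table) / sizeof(uint64_t);
}
```

Because the element width no longer depends on the word size, the entry count (and therefore the 9-bit index per table level) is fixed by the page size alone.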
* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
2025-10-06 14:14 ` Jason Gunthorpe
2025-10-06 17:26 ` Pasha Tatashin
@ 2025-10-09 2:07 ` Jason Miu
2025-10-09 17:52 ` Jason Gunthorpe
1 sibling, 1 reply; 13+ messages in thread
From: Jason Miu @ 2025-10-09 2:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu,
David Matlack, David Rientjes, Mike Rapoport, Pasha Tatashin,
Pratyush Yadav, kexec, linux-kernel, linux-mm

Hi Jason,

Thank you very much for your feedback again.

On Mon, Oct 6, 2025 at 7:14 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >  #define KHO_FDT_COMPATIBLE "kho-v1"
>
> We don't bump this?

Will do. Will be "kho-v2".

>
> > -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map"
> > +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree"
> >  #define PROP_SUB_FDT "fdt"
>
> I'd really like to see all of these sorts of definitions in some
> structured ABI header not open coded all over the place..

Do you think `include/linux/kexec_handover.h` is the appropriate
place, or would you prefer a new, dedicated ABI header (e.g., in
`include/uapi/linux/`) for all KHO-related FDT constants?

>
> >   */
> > +struct kho_radix_tree {
> > +	unsigned long table[PAGE_SIZE / sizeof(unsigned long)];
>
> This should be phys_addr_t.
>
> > +};
>
> You dropped the macros so now we don't know these are actually
> pointers to 'struct kho_radix_tree'
>

Agreed. Will change this to `u64` per Pasha's comment. We also use
explicit casts like `(u64)virt_to_phys(new_tree)` and
`(struct kho_radix_tree *)phys_to_virt(table_entry)` in the current
series. I believe this, along with the `u64` type, makes it clear that
the table stores physical addresses.

> > +static int kho_radix_tree_max_depth(void)
> > +{
> > +	int page_offset_bit_num = BITS_PER_LONG - PAGE_SHIFT;
> > +	int order_bit_num = ilog2(__roundup_pow_of_two(page_offset_bit_num));
> > +	int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE);
> > +	int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long));
> > +	int table_level_num = DIV_ROUND_UP(page_offset_bit_num -
> > +					   bitmap_bit_num + order_bit_num,
> > +					   table_bit_num);
>
> All should be unsigned int. Below I suggest to put it in an enum and
> use different names.. And since the function is constant it can just
> be an enum TOP_LEVEL too.
>

Yes, I did think of returning a const for `kho_radix_tree_max_depth()`.
I think using enums is a better idea, and I can place all of the above
values in enums.

> > + */
> > +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order)
>
> pa is phys_addr_t in the kernel, never unsigned long.
>
> If you want to make it all dynamic then this should be phys_addr_t

Should this also be `u64`, or do we stay with `phys_addr_t` for all
page addresses?

>
> > +{
> > +	unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order);
>
> And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which
> may not exist.
>
> Use an enum ORDER_0_LG2 maybe

I prefer `KHO_RADIX_ORDER_0_BIT_POS` (defined as `BITS_PER_LONG -
PAGE_SHIFT`) over `ORDER_0_LG2`, as I think the latter is a bit hard
to understand. What do you think? This constant, along with the
others, will be placed in the enum.

> > > + unsigned long l = pa >> (PAGE_SHIFT + order); > > > > + return h | l; > > +} > > > > +static unsigned long kho_radix_decode(unsigned long encoded, unsigned int *order) > > Returns phys_addr_t > > > { > > - void *elm, *res; > > + unsigned long order_bit = fls64(encoded); > > unsigned int > > > + unsigned long pa; > > phys_addr_t > > > + *order = BITS_PER_LONG - PAGE_SHIFT - order_bit + 1; > > ORDER_0_LG2 > > > + pa = encoded << (PAGE_SHIFT + *order); > > I'd add a comment that the shift always discards order. > > > + return pa; > > +} > > > > +static unsigned long kho_radix_get_index(unsigned long encoded, int level) > > unsigned int level > > > +{ > > + int table_bit_num = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > + int bitmap_bit_num = PAGE_SHIFT + ilog2(BITS_PER_BYTE); > > Stick all the constants that kho_radix_tree_max_depth() are computing > in an enum instead of recomputing them.. > > > + unsigned long mask; > > + int s; > > unsigned for all of these. > > > + > > + if (level == 1) { > > I think the math is easier if level 0 == bitmap.. > > > + s = 0; > > + mask = (1UL << bitmap_bit_num) - 1; > > + } else { > > + s = ((level - 2) * table_bit_num) + bitmap_bit_num; > > eg here you are doing level-2 which is a bit weird only because of the > arbitary choice to make level=1 be the bitmap. > > I'd also use some different names > > table_bit_num == TABLE_SIZE_LG2 > BITMAP_BIT_NUM = BITMAP_SIZE_LG2 > > Log2 designates the value is 1<<LG2 Good point on the level numbering, we'll switch to 0-based where level 0 is the bitmap. The modulo operations you suggested play nicely with the 0-based numbering too, thanks. Will also update the names and put them in enum. 
> > > + mask = (1UL << table_bit_num) - 1; > > } > > > > - return elm; > > + return (encoded >> s) & mask; > > It is just: > > return encoded % (1 << BITMAP_SIZE_LG2); > return (encoded >> s) % (1 << TABLE_SIZE_LG2); > > The compiler is smart enough to choose bit logic if that is the > fastest option and the above is more readable. > > > +static int kho_radix_set_bitmap(struct kho_bitmap_table *bit_tlb, unsigned long offset) > > { > > + if (!bit_tlb || > > + offset >= PAGE_SIZE * BITS_PER_BYTE) > > + return -EINVAL; > > > > + set_bit(offset, bit_tlb->bitmaps); > > set_bit is an atomic, you want __set_bit() > > > + return 0; > > +} > > > > +static int kho_radix_preserve_page(unsigned long pa, unsigned int order) > > phys_addr_t > > > +{ > > + unsigned long encoded = kho_radix_encode(pa, order); > > + int num_tree_level = kho_radix_tree_max_depth(); > > kho_radix_tree_max_depth() is constant, stick it in an enum with the > rest of them. > > > + struct kho_radix_tree *current_tree, *new_tree; > > + struct kho_bitmap_table *bitmap_table; > > + int err = 0; > > + int i, idx; > > various unsigned int. 
> > > > > + down_write(&kho_radix_tree_root_sem); > > > > + current_tree = kho_radix_tree_root; > > > > + /* Go from high levels to low levels */ > > + for (i = num_tree_level; i >= 1; i--) { > > + idx = kho_radix_get_index(encoded, i); > > + > > + if (i == 1) { > > + bitmap_table = (struct kho_bitmap_table *)current_tree; > > + err = kho_radix_set_bitmap(bitmap_table, idx); > > + goto out; > > + } > > + > > + if (!current_tree->table[idx]) { > > + new_tree = kho_alloc_radix_tree(); > > + if (!new_tree) { > > + err = -ENOMEM; > > + goto out; > > + } > > + > > + current_tree->table[idx] = > > + (unsigned long)virt_to_phys(new_tree); > > current_tree = new_tree > > + } > > else > > > + > > + current_tree = (struct kho_radix_tree *) > > + phys_to_virt(current_tree->table[idx]); > > } > > + > > +out: > > + up_write(&kho_radix_tree_root_sem); > > + return err; > > } > > > > +static int kho_radix_walk_bitmaps(struct kho_bitmap_table *bit_tlb, > > + unsigned long offset, > > phys_addr_t > > > + kho_radix_tree_walk_callback_t cb) > > { > > + unsigned long encoded = offset << (PAGE_SHIFT + ilog2(BITS_PER_BYTE)); > > + unsigned long *bitmap = (unsigned long *)bit_tlb; > > + int err = 0; > > + int i; > > > > + for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) { > > + err = cb(encoded | i); > > + if (err) > > + return err; > > + } > > > > + return 0; > > +} > > > > +static int kho_radix_walk_trees(struct kho_radix_tree *root, int level, > > unsigned int > > > + unsigned long offset, > > phys_addr_t. I would call this start not offset.. 
> > > + kho_radix_tree_walk_callback_t cb) > > +{ > > + int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > + struct kho_radix_tree *next_tree; > > + unsigned long encoded, i; > > + int err = 0; > > > > + if (level == 1) { > > + encoded = offset; > > + return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root, > > + encoded, cb); > > Better to do this in the caller a few lines below But the caller is in a different tree level? Should we only walk the bitmaps at the lowest level? > > > + } > > > > > + for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) { > > + if (root->table[i]) { > > + encoded = offset << level_shift | i; > > This doesn't seem right.. > > The argument to the walker should be the starting encoded of the table > it is about to walk. > > Since everything always starts at 0 it should always be > start | (i << level_shift) > > ? You're right that this line might not be immediately intuitive. The var `level_shift` (which is constant value 9 here) is applied to the *accumulated* `offset` from the parent level. Let's consider an example of a preserved page at physical address `0x1000`, which encodes to `0x10000000000001` (bit 52 is set for order 0, bit 0 is set for page 1). If we were to use `start | (i << level_shift)` where `level_shift` is a constant 9, and `start` is the value from the parent call: - At Level 6, `start` is `0`. `i` is 2 as bit 51:59 = 2. Result: `0 | (2 << 9) = 0x400`. This is passed to Level 5. - At Level 5, `start` is `0x400`, `i` is 0 as bit 50:42 = 0. Result: `0x400 | (0 << 9) = 0x400`. This is passed to Level 4. - At Level 4, `start` is `0x400`, `i` is 0 as bit 33:41 = 0. Result: `0x400 | (0 << 9) = 0x400`. And so on. As you can see, the encoded value is not correctly reconstructed. This will work if the `level_shift` represents the total shift from the LSB for each specific level, but it is not the case here. I will, however, add a detailed comment to `kho_radix_walk_trees` to clarify this logic. 
I also agree to change the name of `offset` to make it clearer. How about `base_encoded`, or do you still prefer `start`? > > > + next_tree = (struct kho_radix_tree *) > > + phys_to_virt(root->table[i]); > > + err = kho_radix_walk_trees(next_tree, level - 1, encoded, cb); > > if (err) > > return err; > > } > > } > > > > + return 0; > > +} > > > > +static int kho_memblock_reserve(phys_addr_t pa, int order) > > +{ > > + int sz = 1 << (order + PAGE_SHIFT); > > + struct page *page = phys_to_page(pa); > > + > > + memblock_reserve(pa, sz); > > + memblock_reserved_mark_noinit(pa, sz); > > + page->private = order; > > > > return 0; > > } > > > > +static int kho_radix_walk_trees_callback(unsigned long encoded) > > +{ > > + unsigned int order; > > + unsigned long pa; > > + > > + pa = kho_radix_decode(encoded, &order); > > + > > + return kho_memblock_reserve(pa, order); > > +} > > + > > +struct kho_serialization { > > + struct page *fdt; > > + struct list_head fdt_list; > > + struct dentry *sub_fdt_dir; > > +}; > > + > > +static int __kho_preserve_order(unsigned long pfn, unsigned int order) > > +{ > > + unsigned long pa = PFN_PHYS(pfn); > > phys_addr_t > > Jason Will do the update in the next patch version. Thanks again. -- Jason Miu ^ permalink raw reply [flat|nested] 13+ messages in thread
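The arithmetic in the reply above can be checked with a small user-space sketch. It is not the kernel code: it assumes 64-bit values, 4 KiB pages, 512-entry tables, a 32768-bit bitmap page and six levels, and it assumes the encoded value is the order marker from the quoted `h` line OR-ed with `pa >> (PAGE_SHIFT + order)`, since the full body of `kho_radix_encode()` is not shown in this part of the thread. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64-bit keys, 4 KiB pages, 8-byte table entries. */
enum {
	PAGE_SHIFT_ASSUMED = 12,
	ORDER_0_LG2 = 64 - PAGE_SHIFT_ASSUMED,	/* marker bit for order 0 */
	TABLE_SIZE_LG2 = 9,			/* 512 entries per table page */
	BITMAP_SIZE_LG2 = 15,			/* 32768 bits per bitmap page */
	TREE_LEVELS = 6,			/* one bitmap level, five table levels */
};

/* Assumed encoding: order marker bit OR-ed with the order-aligned frame number. */
static uint64_t encode(uint64_t pa, unsigned int order)
{
	assert(order < ORDER_0_LG2);
	return (1ULL << (ORDER_0_LG2 - order)) |
	       (pa >> (PAGE_SHIFT_ASSUMED + order));
}

/* Index consumed at a level: level 1 is the bitmap, levels 2..6 are tables. */
static unsigned int radix_index(uint64_t encoded, unsigned int level)
{
	unsigned int shift;

	assert(level >= 1 && level <= TREE_LEVELS);
	if (level == 1)
		return encoded % (1ULL << BITMAP_SIZE_LG2);
	shift = BITMAP_SIZE_LG2 + (level - 2) * TABLE_SIZE_LG2;
	return (encoded >> shift) % (1ULL << TABLE_SIZE_LG2);
}

/*
 * Rebuild the key as the v1 walker does: shift the accumulated value
 * by a constant 9 at each table level, then by 15 at the bitmap.
 */
static uint64_t rebuild_accumulated(uint64_t encoded)
{
	uint64_t acc = 0;
	unsigned int level;

	for (level = TREE_LEVELS; level >= 2; level--)
		acc = (acc << TABLE_SIZE_LG2) | radix_index(encoded, level);
	return (acc << BITMAP_SIZE_LG2) | radix_index(encoded, 1);
}

/* The variant from the counter-example: start | (i << 9) at every table level. */
static uint64_t rebuild_constant_shift(uint64_t encoded)
{
	uint64_t start = 0;
	unsigned int level;

	for (level = TREE_LEVELS; level >= 2; level--)
		start |= (uint64_t)radix_index(encoded, level) << TABLE_SIZE_LG2;
	return start;
}
```

For the page at `0x1000`, order 0, the level 6 index is 2, the accumulated form returns the original key, and the constant-shift form stops at `0x400`, matching the numbers in the reply.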
* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures 2025-10-09 2:07 ` Jason Miu @ 2025-10-09 17:52 ` Jason Gunthorpe 2025-10-22 0:59 ` Jason Miu 0 siblings, 1 reply; 13+ messages in thread From: Jason Gunthorpe @ 2025-10-09 17:52 UTC (permalink / raw) To: Jason Miu Cc: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel, linux-mm > > > -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map" > > > +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree" > > > #define PROP_SUB_FDT "fdt" > > > > I'de really like to see all of these sorts of definitions in some > > structured ABI header not open coded all over the place.. > > Do you think `include/linux/kexec_handover.h` is the appropriate > place, or would you prefer a new, dedicated ABI header (e.g., in > `include/uapi/linux/`) for all KHO-related FDT constants? I would avoid uapi, but maybe Pasha has some idea. include/linux/live_update/abi/ ? > Agreed. Will change `u64` according to Pasha's comment. And we use > explicit casts like `(u64)virt_to_phys(new_tree)` and `(struct > kho_radix_tree *)phys_to_virt(table_entry)` in the current series. I > believe this, along with the `u64` type, makes it clear that the table > stores physical addresses. Well, the macros were intended to automate this and avoid mistakes from opencoding.. Just keep using them? > > > + */ > > > +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order) > > > > pa is phys_addr_t in the kernel, never unsigned long. > > > > If you want to make it all dynamic then this should be phys_addr_t > > Should this also be `u64`, or we stay with `phys_addr_t` for all page > addresses? 
you should use phys_addr_t for everything that originates from a phys_addr_t, and u64 for all the ABI > > > +{ > > > + unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order); > > > > And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which > > may not exist. > > > > Use an enum ORDER_0_LG2 maybe > > I prefer `KHO_RADIX_ORDER_0_BIT_POS` (defined as `BITS_PER_LONG - > PAGE_SHIFT`) over `ORDER_0_LG2`, as I think the latter is a bit hard > to understand, what do you think? This constant, along with others, > will be placed in the enum. Sure, though I prefer LG2 to BIT_POS BIT_POS to me implies it is being used as bit wise operation, while log2 is a mathematical concept X_lg2 = ilog2(X) && X == 1 << X_lg2 > > > + kho_radix_tree_walk_callback_t cb) > > > +{ > > > + int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > > + struct kho_radix_tree *next_tree; > > > + unsigned long encoded, i; > > > + int err = 0; > > > > > > + if (level == 1) { > > > + encoded = offset; > > > + return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root, > > > + encoded, cb); > > > > Better to do this in the caller a few lines below > > But the caller is in a different tree level? Should we only walk the > bitmaps at the lowest level? I mean just have the caller do if (level-1 ==0) kho_radix_walk_bitmaps() else .. Avoids a function call > > > + for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) { > > > + if (root->table[i]) { > > > + encoded = offset << level_shift | i; > > > > This doesn't seem right.. > > > > The argument to the walker should be the starting encoded of the table > > it is about to walk. > > > > Since everything always starts at 0 it should always be > > start | (i << level_shift) > > > > ? > > You're right that this line might not be immediately intuitive. The > var `level_shift` (which is constant value 9 here) is applied to the > *accumulated* `offset` from the parent level. 
Let's consider an > example of a preserved page at physical address `0x1000`, which > encodes to `0x10000000000001` (bit 52 is set for order 0, bit 0 is set > for page 1). Oh, weird, too weird maybe. I'd just keep all the values as fully shifted, level_shift should be adjusted to have the full shift for this level. Easier to understand. Also, I think the order bits might have become a bit confused, I think I explained it wrong. My idea was to try to share the radix levels to save space eg if we have like this patch does: Order phys 00001 abcd 00010 0bcd 00100 00cd 01000 000d Then we don't get too much page level sharing, the middle ends up with 0 indexes in tables that cannot be shared. What I was going for was to push all the shared pages to the left 00001 abcd 00000 1bcd 00000 01cd 00000 001d Here the first radix level has index 0 or 1 and is fully shared. So eg Order 4 and 5 will have all the same 0 index table levels. This also reduces the max height of the tree because only +1 bit is needed to store order. Jason ^ permalink raw reply [flat|nested] 13+ messages in thread
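The two suggestions in this reply, fully shifted values per level and bitmap dispatch in the caller, can be sketched in user space as follows. This is only an illustration under assumptions: the geometry (9-bit table index, 15-bit bitmap index, six levels) is taken from the discussion, nodes hold plain pointers instead of physical addresses, and every name (`struct node`, `preserve`, `walk_tables`, `collect`) is hypothetical rather than part of the series. Freeing the tree is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum {
	TABLE_SIZE_LG2 = 9,	/* 512 entries per table */
	BITMAP_SIZE_LG2 = 15,	/* 32768 bits per bitmap */
	TREE_LEVELS = 6,	/* level 1 is the bitmap */
};

/* One node is either a table of child pointers or a bitmap leaf. */
struct node {
	union {
		struct node *table[1 << TABLE_SIZE_LG2];
		uint8_t bitmap[(1 << BITMAP_SIZE_LG2) / 8];
	};
};

typedef int (*walk_cb_t)(uint64_t encoded, void *ctx);

/* Full shift of the index field for a table level (2..TREE_LEVELS). */
static unsigned int level_shift(unsigned int level)
{
	return BITMAP_SIZE_LG2 + (level - 2) * TABLE_SIZE_LG2;
}

/* Mark one encoded key, allocating intermediate tables on the way down. */
static int preserve(struct node *root, uint64_t encoded)
{
	struct node *cur = root;
	unsigned int level, idx;

	assert((encoded >> (level_shift(TREE_LEVELS) + TABLE_SIZE_LG2)) == 0);
	for (level = TREE_LEVELS; level >= 2; level--) {
		idx = (encoded >> level_shift(level)) % (1u << TABLE_SIZE_LG2);
		if (!cur->table[idx]) {
			cur->table[idx] = calloc(1, sizeof(struct node));
			if (!cur->table[idx])
				return -1;
		}
		cur = cur->table[idx];
	}
	idx = encoded % (1u << BITMAP_SIZE_LG2);
	cur->bitmap[idx / 8] |= 1u << (idx % 8);
	return 0;
}

/* Report every set bit; start already holds all higher index fields. */
static int walk_bitmap(const struct node *leaf, uint64_t start,
		       walk_cb_t cb, void *ctx)
{
	unsigned int i;
	int err;

	for (i = 0; i < (1u << BITMAP_SIZE_LG2); i++) {
		if (!(leaf->bitmap[i / 8] & (1u << (i % 8))))
			continue;
		err = cb(start | i, ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Each child gets its fully shifted starting key; leaves are dispatched here. */
static int walk_tables(const struct node *tbl, unsigned int level,
		       uint64_t start, walk_cb_t cb, void *ctx)
{
	unsigned int i;
	int err;

	for (i = 0; i < (1u << TABLE_SIZE_LG2); i++) {
		uint64_t child_start;

		if (!tbl->table[i])
			continue;
		child_start = start | ((uint64_t)i << level_shift(level));
		if (level == 2)
			err = walk_bitmap(tbl->table[i], child_start, cb, ctx);
		else
			err = walk_tables(tbl->table[i], level - 1,
					  child_start, cb, ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: remember up to eight keys in walk order. */
struct seen {
	uint64_t keys[8];
	unsigned int n;
};

static int collect(uint64_t encoded, void *ctx)
{
	struct seen *s = ctx;

	if (s->n == 8)
		return -1;
	s->keys[s->n++] = encoded;
	return 0;
}
```

With this layout the value passed down is always a valid prefix of the final key, so it can be read without knowing how many levels remain. Assuming the marker-bit encoding quoted earlier in the thread (marker at bit 52 minus the order), every key of order 2 or higher has top-level index 0 and therefore shares that top-level slot.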
* Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures 2025-10-09 17:52 ` Jason Gunthorpe @ 2025-10-22 0:59 ` Jason Miu 0 siblings, 0 replies; 13+ messages in thread From: Jason Miu @ 2025-10-22 0:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel, linux-mm Thanks Jason, I uploaded the patch v2 according to your feedback. On Thu, Oct 9, 2025 at 10:52 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > > -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map" > > > > +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree" > > > > #define PROP_SUB_FDT "fdt" > > > > > > I'de really like to see all of these sorts of definitions in some > > > structured ABI header not open coded all over the place.. > > > > Do you think `include/linux/kexec_handover.h` is the appropriate > > place, or would you prefer a new, dedicated ABI header (e.g., in > > `include/uapi/linux/`) for all KHO-related FDT constants? > > I would avoid uapi, but maybe Pasha has some > idea. > > include/linux/live_update/abi/ ? Yes, moved to include/linux/live_update/abi/. > > > Agreed. Will change `u64` according to Pasha's comment. And we use > > explicit casts like `(u64)virt_to_phys(new_tree)` and `(struct > > kho_radix_tree *)phys_to_virt(table_entry)` in the current series. I > > believe this, along with the `u64` type, makes it clear that the table > > stores physical addresses. > > Well, the macros were intended to automate this and avoid mistakes > from opencoding.. Just keep using them? > Sure, added two inline functions `kho_radix_tree_desc()` and `kho_radix_tree()` back for converting. > > > > + */ > > > > +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order) > > > > > > pa is phys_addr_t in the kernel, never unsigned long. 
> > > > > > If you want to make it all dynamic then this should be phys_addr_t > > > > Should this also be `u64`, or we stay with `phys_addr_t` for all page > > addresses? > > you should use phys_addr_t for everything that originates from a > phys_addr_t, and u64 for all the ABI > done > > > > +{ > > > > + unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order); > > > > > > And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which > > > may not exist. > > > > > > Use an enum ORDER_0_LG2 maybe > > > > I prefer `KHO_RADIX_ORDER_0_BIT_POS` (defined as `BITS_PER_LONG - > > PAGE_SHIFT`) over `ORDER_0_LG2`, as I think the latter is a bit hard > > to understand, what do you think? This constant, along with others, > > will be placed in the enum. > > Sure, though I prefer LG2 to BIT_POS Lets pick LG2. =) > > BIT_POS to me implies it is being used as bit wise operation, while > log2 is a mathematical concept > > X_lg2 = ilog2(X) && X == 1 << X_lg2 > > > > > + kho_radix_tree_walk_callback_t cb) > > > > +{ > > > > + int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long)); > > > > + struct kho_radix_tree *next_tree; > > > > + unsigned long encoded, i; > > > > + int err = 0; > > > > > > > > + if (level == 1) { > > > > + encoded = offset; > > > > + return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root, > > > > + encoded, cb); > > > > > > Better to do this in the caller a few lines below > > > > But the caller is in a different tree level? Should we only walk the > > bitmaps at the lowest level? > > I mean just have the caller do > > if (level-1 ==0) > kho_radix_walk_bitmaps() > else > .. > > Avoids a function call I see. Done. > > > > > + for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) { > > > > + if (root->table[i]) { > > > > + encoded = offset << level_shift | i; > > > > > > This doesn't seem right.. > > > > > > The argument to the walker should be the starting encoded of the table > > > it is about to walk. 
> > > > > > Since everything always starts at 0 it should always be > > > start | (i << level_shift) > > > > > > ? > > > > You're right that this line might not be immediately intuitive. The > > var `level_shift` (which is constant value 9 here) is applied to the > > *accumulated* `offset` from the parent level. Let's consider an > > example of a preserved page at physical address `0x1000`, which > > encodes to `0x10000000000001` (bit 52 is set for order 0, bit 0 is set > > for page 1). > > Oh, weird, too weird maybe. I'd just keep all the values as fully > shifted, level_shift should be adjusted to have the full shift for > this level. Easier to understand. > > Also, I think the order bits might have become a bit confused, I think > I explained it wrong. > > My idea was to try to share the radix levels to save space eg if we > have like this patch does: > > Order phys > 00001 abcd > 00010 0bcd > 00100 00cd > 01000 000d > > Then we don't get too much page level sharing, the middle ends up with > 0 indexes in tables that cannot be shared. > > What I was going for was to push all the shared pages to the left > > 00001 abcd > 00000 1bcd > 00000 01cd > 00000 001d > > Here the first radix level has index 0 or 1 and is fully shared. So eg > Order 4 and 5 will have all the same 0 index table levels. This also > reduces the max height of the tree because only +1 bit is needed to > store order. > > Jason Thanks for the clarification. I updated the logic by keeping the encoded value fully shifted and adjusting the `level_shift` according to the current level. And yes we are having the shared pages on the left side (zeros in the encoded prefix) while having the order bits shift to right when the page order increases. I hope the updated code makes this more clearer. -- Jason Miu ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v1 2/3] memblock: Remove KHO notifier usage 2025-10-01 1:19 [PATCH v1 0/3] Make KHO Stateless Jason Miu 2025-10-01 1:19 ` [PATCH v1 1/3] kho: Adopt KHO radix tree data structures Jason Miu @ 2025-10-01 1:19 ` Jason Miu 2025-10-01 16:35 ` kernel test robot 2025-10-01 1:19 ` [PATCH v1 3/3] kho: Remove notifier system infrastructure Jason Miu 2 siblings, 1 reply; 13+ messages in thread From: Jason Miu @ 2025-10-01 1:19 UTC (permalink / raw) To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Jason Gunthorpe, Jason Miu, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel, linux-mm Remove the KHO notifier registration and callbacks from the memblock subsystem. These notifiers were tied to the former KHO finalize and abort events, which are no longer used. Memblock now preserves its `reserve_mem` regions and registers its metadata by calling kho_preserve_phys(), kho_preserve_folio(), and kho_add_subtree() directly within its initialization function. 
Signed-off-by: Jason Miu <jasonmiu@google.com> --- include/linux/kexec_handover.h | 5 ++-- kernel/kexec_handover.c | 48 +++++++++++++++++++++------------- mm/memblock.c | 45 +++++++------------------------ 3 files changed, 42 insertions(+), 56 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index c8229cb11f4b..9566c90a3501 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -27,7 +27,7 @@ bool kho_is_enabled(void); int kho_preserve_folio(struct folio *folio); int kho_preserve_phys(phys_addr_t phys, size_t size); struct folio *kho_restore_folio(phys_addr_t phys); -int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt); +int kho_add_subtree(const char *name, void *fdt); int kho_retrieve_subtree(const char *name, phys_addr_t *phys); int register_kho_notifier(struct notifier_block *nb); @@ -58,8 +58,7 @@ static inline struct folio *kho_restore_folio(phys_addr_t phys) return NULL; } -static inline int kho_add_subtree(struct kho_serialization *ser, - const char *name, void *fdt) +static inline int kho_add_subtree(const char *name, void *fdt) { return -EOPNOTSUPP; } diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c index 34cf0ce4f359..ee4f430dfae0 100644 --- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -640,9 +640,21 @@ static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir, return 0; } +struct kho_out { + struct blocking_notifier_head chain_head; + struct dentry *dir; + struct kho_serialization ser; +}; + +static struct kho_out kho_out = { + .chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head), + .ser = { + .fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list), + }, +}; + /** * kho_add_subtree - record the physical address of a sub FDT in KHO root tree. - * @ser: serialization control object passed by KHO notifiers. * @name: name of the sub tree. * @fdt: the sub tree blob. 
* @@ -655,16 +667,29 @@ static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir, * * Return: 0 on success, error code on failure */ -int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt) +int kho_add_subtree(const char *name, void *fdt) { + struct kho_serialization *ser = &kho_out.ser; int err = 0; + int root_node_offset, subnode_offset; u64 phys = (u64)virt_to_phys(fdt); void *root = page_to_virt(ser->fdt); - err |= fdt_begin_node(root, name); - err |= fdt_property(root, PROP_SUB_FDT, &phys, sizeof(phys)); - err |= fdt_end_node(root); + /* Reload the KHO root FDT to the same buffer */ + err = fdt_open_into(root, root, PAGE_SIZE); + if (err) + return err; + + root_node_offset = fdt_path_offset(root, "/"); + if (root_node_offset < 0) + return root_node_offset; + + subnode_offset = fdt_add_subnode(root, root_node_offset, name); + if (subnode_offset < 0) + return subnode_offset; + err = fdt_setprop(root, subnode_offset, + PROP_SUB_FDT, &phys, sizeof(phys)); if (err) return err; @@ -672,19 +697,6 @@ int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt) } EXPORT_SYMBOL_GPL(kho_add_subtree); -struct kho_out { - struct blocking_notifier_head chain_head; - struct dentry *dir; - struct kho_serialization ser; -}; - -static struct kho_out kho_out = { - .chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head), - .ser = { - .fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list), - }, -}; - int register_kho_notifier(struct notifier_block *nb) { return blocking_notifier_chain_register(&kho_out.chain_head, nb); diff --git a/mm/memblock.c b/mm/memblock.c index 117d963e677c..602a16cb467a 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -2510,39 +2510,6 @@ int reserve_mem_release_by_name(const char *name) #define RESERVE_MEM_KHO_NODE_COMPATIBLE "reserve-mem-v1" static struct page *kho_fdt; -static int reserve_mem_kho_finalize(struct kho_serialization *ser) -{ - int err = 0, i; - - for (i = 0; i < reserved_mem_count; i++)
{ - struct reserve_mem_table *map = &reserved_mem_table[i]; - - err |= kho_preserve_phys(map->start, map->size); - } - - err |= kho_preserve_folio(page_folio(kho_fdt)); - err |= kho_add_subtree(ser, MEMBLOCK_KHO_FDT, page_to_virt(kho_fdt)); - - return notifier_from_errno(err); -} - -static int reserve_mem_kho_notifier(struct notifier_block *self, - unsigned long cmd, void *v) -{ - switch (cmd) { - case KEXEC_KHO_FINALIZE: - return reserve_mem_kho_finalize((struct kho_serialization *)v); - case KEXEC_KHO_ABORT: - return NOTIFY_DONE; - default: - return NOTIFY_BAD; - } -} - -static struct notifier_block reserve_mem_kho_nb = { - .notifier_call = reserve_mem_kho_notifier, -}; - static int __init prepare_kho_fdt(void) { int err = 0, i; @@ -2583,7 +2550,7 @@ static int __init prepare_kho_fdt(void) static int __init reserve_mem_init(void) { - int err; + int err, i; if (!kho_is_enabled() || !reserved_mem_count) return 0; @@ -2592,7 +2559,15 @@ static int __init reserve_mem_init(void) if (err) return err; - err = register_kho_notifier(&reserve_mem_kho_nb); + for (i = 0; i < reserved_mem_count; i++) { + struct reserve_mem_table *map = &reserved_mem_table[i]; + + err |= kho_preserve_phys(map->start, map->size); + } + + err |= kho_preserve_folio(page_folio(kho_fdt)); + err |= kho_add_subtree(MEMBLOCK_KHO_FDT, page_to_virt(kho_fdt)); + if (err) { put_page(kho_fdt); kho_fdt = NULL; -- 2.51.0.618.g983fd99d29-goog ^ permalink raw reply [flat|nested] 13+ messages in thread
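The converted `reserve_mem_init()` calls the preservation functions directly and keeps the `err |= ...` accumulation from the notifier version. The sketch below shows, in plain user-space C, what that accumulation can return when two calls fail with different codes, next to a variant that stops at the first failure. The two stub functions, their failure conditions and the region type are hypothetical stand-ins, not the KHO API; the numeric codes are the Linux values of ENOMEM and EINVAL.

```c
#include <assert.h>
#include <stddef.h>

/* Linux numeric values for ENOMEM and EINVAL, written out for portability. */
enum { E_NOMEM = 12, E_INVAL = 22 };

struct region {
	unsigned long start;
	unsigned long size;
};

/* Hypothetical stand-in for kho_preserve_phys(): rejects empty regions. */
static int stub_preserve_phys(unsigned long start, unsigned long size)
{
	(void)start;
	return size ? 0 : -E_INVAL;
}

/* Hypothetical stand-in for kho_add_subtree(): fails without a blob. */
static int stub_add_subtree(const char *name, void *fdt)
{
	(void)name;
	return fdt ? 0 : -E_NOMEM;
}

/* Direct calls with |= accumulation, mirroring the shape of the patch. */
static int init_accumulate(const struct region *r, size_t n, void *fdt)
{
	int err = 0;
	size_t i;

	for (i = 0; i < n; i++)
		err |= stub_preserve_phys(r[i].start, r[i].size);
	err |= stub_add_subtree("example", fdt);
	return err;
}

/* Same calls, but the first failure's code is returned unchanged. */
static int init_first_error(const struct region *r, size_t n, void *fdt)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = stub_preserve_phys(r[i].start, r[i].size);
		if (err)
			return err;
	}
	return stub_add_subtree("example", fdt);
}
```

With one empty region and a missing blob, the accumulating version returns -2, which is the bitwise OR of -22 and -12 and a code that neither callee produced. That is harmless while the result is only tested for non-zero, but it matters if the value is propagated to a caller that inspects it.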
* Re: [PATCH v1 2/3] memblock: Remove KHO notifier usage 2025-10-01 1:19 ` [PATCH v1 2/3] memblock: Remove KHO notifier usage Jason Miu @ 2025-10-01 16:35 ` kernel test robot 0 siblings, 0 replies; 13+ messages in thread From: kernel test robot @ 2025-10-01 16:35 UTC (permalink / raw) To: Jason Miu, Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Jason Gunthorpe, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel Cc: llvm, oe-kbuild-all, Linux Memory Management List Hi Jason, kernel test robot noticed the following build errors: [auto build test ERROR on rppt-memblock/for-next] [also build test ERROR on linus/master v6.17] [cannot apply to rppt-memblock/fixes akpm-mm/mm-everything next-20250929] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Jason-Miu/kho-Adopt-KHO-radix-tree-data-structures/20251001-092230 base: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git for-next patch link: https://lore.kernel.org/r/20251001011941.1513050-3-jasonmiu%40google.com patch subject: [PATCH v1 2/3] memblock: Remove KHO notifier usage config: x86_64-randconfig-003-20251001 (https://download.01.org/0day-ci/archive/20251002/202510020000.IIPFcxsW-lkp@intel.com/config) compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251002/202510020000.IIPFcxsW-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202510020000.IIPFcxsW-lkp@intel.com/ All errors (new ones prefixed by >>): >> lib/test_kho.c:59:44: error: too many arguments to function call, expected 2, have 3 59 | err |= kho_add_subtree(ser, KHO_TEST_FDT, folio_address(state->fdt)); | ~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/kexec_handover.h:30:5: note: 'kho_add_subtree' declared here 30 | int kho_add_subtree(const char *name, void *fdt); | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. vim +59 lib/test_kho.c b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 40) b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 41) static int kho_test_notifier(struct notifier_block *self, unsigned long cmd, b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 42) void *v) b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 43) { b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 44) struct kho_test_state *state = &kho_test_state; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 45) struct kho_serialization *ser = v; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 46) int err = 0; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 47) b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 48) switch (cmd) { b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 49) case KEXEC_KHO_ABORT: b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 50) return NOTIFY_DONE; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 51) case KEXEC_KHO_FINALIZE: b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 52) /* Handled below */ b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 53) break; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 54) default: b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 55) return NOTIFY_BAD; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 56) } b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 57) b753522bed0b7e Mike 
Rapoport (Microsoft 2025-07-27 58) err |= kho_preserve_folio(state->fdt); b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 @59) err |= kho_add_subtree(ser, KHO_TEST_FDT, folio_address(state->fdt)); b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 60) b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 61) return err ? NOTIFY_BAD : NOTIFY_DONE; b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 62) } b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 63) -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v1 3/3] kho: Remove notifier system infrastructure 2025-10-01 1:19 [PATCH v1 0/3] Make KHO Stateless Jason Miu 2025-10-01 1:19 ` [PATCH v1 1/3] kho: Adopt KHO radix tree data structures Jason Miu 2025-10-01 1:19 ` [PATCH v1 2/3] memblock: Remove KHO notifier usage Jason Miu @ 2025-10-01 1:19 ` Jason Miu 2025-10-01 18:07 ` kernel test robot 2 siblings, 1 reply; 13+ messages in thread From: Jason Miu @ 2025-10-01 1:19 UTC (permalink / raw) To: Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Jason Gunthorpe, Jason Miu, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel, linux-mm Eliminate the core KHO notifier API functions (`register_kho_notifier`, `unregister_kho_notifier`), the `kho_event` enum, and the notifier chain head from KHO internal structures. This infrastructure was used to support the now-removed finalize and abort states and is no longer required. Client subsystems now interact with KHO through direct API calls. 
Signed-off-by: Jason Miu <jasonmiu@google.com> --- include/linux/kexec_handover.h | 22 ---------------------- kernel/kexec_handover.c | 15 --------------- 2 files changed, 37 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 9566c90a3501..09e8f0b0fcab 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -10,16 +10,7 @@ struct kho_scratch { phys_addr_t size; }; -/* KHO Notifier index */ -enum kho_event { - KEXEC_KHO_FINALIZE = 0, - KEXEC_KHO_ABORT = 1, -}; - struct folio; -struct notifier_block; - -struct kho_serialization; #ifdef CONFIG_KEXEC_HANDOVER bool kho_is_enabled(void); @@ -30,9 +21,6 @@ struct folio *kho_restore_folio(phys_addr_t phys); int kho_add_subtree(const char *name, void *fdt); int kho_retrieve_subtree(const char *name, phys_addr_t *phys); -int register_kho_notifier(struct notifier_block *nb); -int unregister_kho_notifier(struct notifier_block *nb); - void kho_memory_init(void); void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, phys_addr_t scratch_phys, @@ -68,16 +56,6 @@ static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys) return -EOPNOTSUPP; } -static inline int register_kho_notifier(struct notifier_block *nb) -{ - return -EOPNOTSUPP; -} - -static inline int unregister_kho_notifier(struct notifier_block *nb) -{ - return -EOPNOTSUPP; -} - static inline void kho_memory_init(void) { } diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c index ee4f430dfae0..fc290226e58b 100644 --- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -16,7 +16,6 @@ #include <linux/libfdt.h> #include <linux/list.h> #include <linux/memblock.h> -#include <linux/notifier.h> #include <linux/page-isolation.h> #include <linux/rwsem.h> @@ -641,13 +640,11 @@ static int kho_debugfs_fdt_add(struct list_head *list, struct dentry *dir, } struct kho_out { - struct blocking_notifier_head chain_head; struct dentry *dir; struct kho_serialization ser; }; static struct 
kho_out kho_out = { - .chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head), .ser = { .fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list), }, @@ -697,18 +694,6 @@ int kho_add_subtree(const char *name, void *fdt) } EXPORT_SYMBOL_GPL(kho_add_subtree); -int register_kho_notifier(struct notifier_block *nb) -{ - return blocking_notifier_chain_register(&kho_out.chain_head, nb); -} -EXPORT_SYMBOL_GPL(register_kho_notifier); - -int unregister_kho_notifier(struct notifier_block *nb) -{ - return blocking_notifier_chain_unregister(&kho_out.chain_head, nb); -} -EXPORT_SYMBOL_GPL(unregister_kho_notifier); - /** * kho_preserve_folio - preserve a folio across kexec. * @folio: folio to preserve. -- 2.51.0.618.g983fd99d29-goog ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 3/3] kho: Remove notifier system infrastructure 2025-10-01 1:19 ` [PATCH v1 3/3] kho: Remove notifier system infrastructure Jason Miu @ 2025-10-01 18:07 ` kernel test robot 0 siblings, 0 replies; 13+ messages in thread From: kernel test robot @ 2025-10-01 18:07 UTC (permalink / raw) To: Jason Miu, Alexander Graf, Andrew Morton, Baoquan He, Changyuan Lyu, David Matlack, David Rientjes, Jason Gunthorpe, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, kexec, linux-kernel Cc: llvm, oe-kbuild-all, Linux Memory Management List Hi Jason, kernel test robot noticed the following build errors: [auto build test ERROR on rppt-memblock/for-next] [also build test ERROR on linus/master v6.17] [cannot apply to rppt-memblock/fixes akpm-mm/mm-everything next-20250929] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Jason-Miu/kho-Adopt-KHO-radix-tree-data-structures/20251001-092230 base: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git for-next patch link: https://lore.kernel.org/r/20251001011941.1513050-4-jasonmiu%40google.com patch subject: [PATCH v1 3/3] kho: Remove notifier system infrastructure config: x86_64-randconfig-003-20251001 (https://download.01.org/0day-ci/archive/20251002/202510020105.a05LM8TX-lkp@intel.com/config) compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251002/202510020105.a05LM8TX-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202510020105.a05LM8TX-lkp@intel.com/ All errors (new ones prefixed by >>): >> lib/test_kho.c:49:7: error: use of undeclared identifier 'KEXEC_KHO_ABORT' 49 | case KEXEC_KHO_ABORT: | ^ >> lib/test_kho.c:51:7: error: use of undeclared identifier 'KEXEC_KHO_FINALIZE' 51 | case KEXEC_KHO_FINALIZE: | ^ lib/test_kho.c:59:44: error: too many arguments to function call, expected 2, have 3 59 | err |= kho_add_subtree(ser, KHO_TEST_FDT, folio_address(state->fdt)); | ~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/kexec_handover.h:21:5: note: 'kho_add_subtree' declared here 21 | int kho_add_subtree(const char *name, void *fdt); | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> lib/test_kho.c:194:9: error: call to undeclared function 'register_kho_notifier'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 194 | return register_kho_notifier(&kho_test_nb); | ^ lib/test_kho.c:194:9: note: did you mean 'register_module_notifier'? include/linux/module.h:745:5: note: 'register_module_notifier' declared here 745 | int register_module_notifier(struct notifier_block *nb); | ^ >> lib/test_kho.c:298:2: error: call to undeclared function 'unregister_kho_notifier'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 298 | unregister_kho_notifier(&kho_test_nb); | ^ lib/test_kho.c:298:2: note: did you mean 'unregister_module_notifier'? include/linux/module.h:746:5: note: 'unregister_module_notifier' declared here 746 | int unregister_module_notifier(struct notifier_block *nb); | ^ 5 errors generated. 
vim +/KEXEC_KHO_ABORT +49 lib/test_kho.c

b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   40)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   41) static int kho_test_notifier(struct notifier_block *self, unsigned long cmd,
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   42) 			     void *v)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   43) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   44) 	struct kho_test_state *state = &kho_test_state;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   45) 	struct kho_serialization *ser = v;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   46) 	int err = 0;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   47)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   48) 	switch (cmd) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  @49) 	case KEXEC_KHO_ABORT:
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   50) 		return NOTIFY_DONE;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  @51) 	case KEXEC_KHO_FINALIZE:
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   52) 		/* Handled below */
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   53) 		break;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   54) 	default:
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   55) 		return NOTIFY_BAD;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   56) 	}
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   57)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   58) 	err |= kho_preserve_folio(state->fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   59) 	err |= kho_add_subtree(ser, KHO_TEST_FDT, folio_address(state->fdt));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   60)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   61) 	return err ? NOTIFY_BAD : NOTIFY_DONE;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   62) }
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   63)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   64) static struct notifier_block kho_test_nb = {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   65) 	.notifier_call = kho_test_notifier,
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   66) };
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   67)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   68) static int kho_test_save_data(struct kho_test_state *state, void *fdt)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   69) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   70) 	phys_addr_t *folios_info __free(kvfree) = NULL;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   71) 	int err = 0;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   72)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   73) 	folios_info = kvmalloc_array(state->nr_folios, sizeof(*folios_info),
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   74) 				     GFP_KERNEL);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   75) 	if (!folios_info)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   76) 		return -ENOMEM;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   77)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   78) 	for (int i = 0; i < state->nr_folios; i++) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   79) 		struct folio *folio = state->folios[i];
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   80) 		unsigned int order = folio_order(folio);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   81)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   82) 		folios_info[i] = virt_to_phys(folio_address(folio)) | order;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   83)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   84) 		err = kho_preserve_folio(folio);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   85) 		if (err)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   86) 			return err;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   87) 	}
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   88)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   89) 	err |= fdt_begin_node(fdt, "data");
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   90) 	err |= fdt_property(fdt, "nr_folios", &state->nr_folios,
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   91) 			    sizeof(state->nr_folios));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   92) 	err |= fdt_property(fdt, "folios_info", folios_info,
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   93) 			    state->nr_folios * sizeof(*folios_info));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   94) 	err |= fdt_property(fdt, "csum", &state->csum, sizeof(state->csum));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   95) 	err |= fdt_end_node(fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   96)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   97) 	return err;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   98) }
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27   99)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  100) static int kho_test_prepare_fdt(struct kho_test_state *state)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  101) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  102) 	const char compatible[] = KHO_TEST_COMPAT;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  103) 	unsigned int magic = KHO_TEST_MAGIC;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  104) 	ssize_t fdt_size;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  105) 	int err = 0;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  106) 	void *fdt;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  107)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  108) 	fdt_size = state->nr_folios * sizeof(phys_addr_t) + PAGE_SIZE;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  109) 	state->fdt = folio_alloc(GFP_KERNEL, get_order(fdt_size));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  110) 	if (!state->fdt)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  111) 		return -ENOMEM;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  112)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  113) 	fdt = folio_address(state->fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  114)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  115) 	err |= fdt_create(fdt, fdt_size);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  116) 	err |= fdt_finish_reservemap(fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  117)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  118) 	err |= fdt_begin_node(fdt, "");
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  119) 	err |= fdt_property(fdt, "compatible", compatible, sizeof(compatible));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  120) 	err |= fdt_property(fdt, "magic", &magic, sizeof(magic));
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  121) 	err |= kho_test_save_data(state, fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  122) 	err |= fdt_end_node(fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  123)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  124) 	err |= fdt_finish(fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  125)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  126) 	if (err)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  127) 		folio_put(state->fdt);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  128)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  129) 	return err;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  130) }
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  131)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  132) static int kho_test_generate_data(struct kho_test_state *state)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  133) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  134) 	size_t alloc_size = 0;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  135) 	__wsum csum = 0;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  136)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  137) 	while (alloc_size < max_mem) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  138) 		int order = get_random_u32() % NR_PAGE_ORDERS;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  139) 		struct folio *folio;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  140) 		unsigned int size;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  141) 		void *addr;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  142)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  143) 		/* cap allocation so that we won't exceed max_mem */
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  144) 		if (alloc_size + (PAGE_SIZE << order) > max_mem) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  145) 			order = get_order(max_mem - alloc_size);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  146) 			if (order)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  147) 				order--;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  148) 		}
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  149) 		size = PAGE_SIZE << order;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  150)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  151) 		folio = folio_alloc(GFP_KERNEL | __GFP_NORETRY, order);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  152) 		if (!folio)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  153) 			goto err_free_folios;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  154)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  155) 		state->folios[state->nr_folios++] = folio;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  156) 		addr = folio_address(folio);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  157) 		get_random_bytes(addr, size);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  158) 		csum = csum_partial(addr, size, csum);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  159) 		alloc_size += size;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  160) 	}
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  161)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  162) 	state->csum = csum;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  163) 	return 0;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  164)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  165) err_free_folios:
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  166) 	for (int i = 0; i < state->nr_folios; i++)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  167) 		folio_put(state->folios[i]);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  168) 	return -ENOMEM;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  169) }
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  170)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  171) static int kho_test_save(void)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  172) {
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  173) 	struct kho_test_state *state = &kho_test_state;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  174) 	struct folio **folios __free(kvfree) = NULL;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  175) 	unsigned long max_nr;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  176) 	int err;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  177)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  178) 	max_mem = PAGE_ALIGN(max_mem);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  179) 	max_nr = max_mem >> PAGE_SHIFT;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  180)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  181) 	folios = kvmalloc_array(max_nr, sizeof(*state->folios), GFP_KERNEL);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  182) 	if (!folios)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  183) 		return -ENOMEM;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  184) 	state->folios = folios;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  185)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  186) 	err = kho_test_generate_data(state);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  187) 	if (err)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  188) 		return err;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  189)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  190) 	err = kho_test_prepare_fdt(state);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  191) 	if (err)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  192) 		return err;
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  193)
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27 @194) 	return register_kho_notifier(&kho_test_nb);
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  195) }
b753522bed0b7e Mike Rapoport (Microsoft 2025-07-27  196)

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Thread overview: 13+ messages
2025-10-01 1:19 [PATCH v1 0/3] Make KHO Stateless Jason Miu
2025-10-01 1:19 ` [PATCH v1 1/3] kho: Adopt KHO radix tree data structures Jason Miu
2025-10-02 4:29 ` kernel test robot
2025-10-06 14:14 ` Jason Gunthorpe
2025-10-06 17:26 ` Pasha Tatashin
2025-10-06 22:50 ` Jason Gunthorpe
2025-10-09 2:07 ` Jason Miu
2025-10-09 17:52 ` Jason Gunthorpe
2025-10-22 0:59 ` Jason Miu
2025-10-01 1:19 ` [PATCH v1 2/3] memblock: Remove KHO notifier usage Jason Miu
2025-10-01 16:35 ` kernel test robot
2025-10-01 1:19 ` [PATCH v1 3/3] kho: Remove notifier system infrastructure Jason Miu
2025-10-01 18:07 ` kernel test robot