Subject: [BUG] maple_tree: KCSAN: data-race in mas_topiary_replace / mas_walk

Dear Maintainers,

We are writing to report a KCSAN-detected data race in the Linux kernel, found by our custom fuzzing tool, RacePilot. The race occurs in the maple tree component when concurrent node replacement runs against a lockless RCU tree traversal. We observed this on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context

==================================================================
BUG: KCSAN: data-race in mas_topiary_replace / mas_walk

write to 0xffff88800c45d600 of 8 bytes by task 17331 on cpu 1:
 mte_set_node_dead lib/maple_tree.c:335 [inline]
 mas_put_in_tree lib/maple_tree.c:1571 [inline]
 mas_topiary_replace+0x14e/0x14a0 lib/maple_tree.c:2350
 mas_wmb_replace lib/maple_tree.c:2443 [inline]
 mas_split lib/maple_tree.c:3067 [inline]
 mas_commit_b_node lib/maple_tree.c:3087 [inline]
 mas_wr_bnode+0xd2a/0x23b0 lib/maple_tree.c:3755
 mas_wr_store_entry+0x77b/0x1120 lib/maple_tree.c:3787
 mas_store_prealloc+0x47c/0xa60 lib/maple_tree.c:5191
 vma_iter_store_overwrite mm/vma.h:481 [inline]
 vma_iter_store_new mm/vma.h:488 [inline]
 vma_complete+0x6a9/0x8a0 mm/vma.c:353
 __split_vma+0x5fb/0x6f0 mm/vma.c:567
 split_vma mm/vma.c:597 [inline]
 vma_modify+0xac8/0xdd0 mm/vma.c:1635
 vma_modify_flags+0x16c/0x1a0 mm/vma.c:1662
 mprotect_fixup+0x170/0x660 mm/mprotect.c:816
 do_mprotect_pkey+0x5fe/0x930 mm/mprotect.c:990
 __do_sys_mprotect mm/mprotect.c:1011 [inline]
 __se_sys_mprotect mm/mprotect.c:1008 [inline]
 __x64_sys_mprotect+0x47/0x60 mm/mprotect.c:1008
 x64_sys_call+0xc6c/0x2030 arch/x86/include/generated/asm/syscalls_64.h:11
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xae/0x2c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff88800c45d600 of 8 bytes by task 17339 on cpu 0:
 ma_dead_node lib/maple_tree.c:576 [inline]
 mte_dead_node lib/maple_tree.c:591 [inline]
 mas_start lib/maple_tree.c:1211 [inline]
 mas_start lib/maple_tree.c:1194 [inline]
 mas_state_walk lib/maple_tree.c:3306 [inline]
 mas_walk+0x257/0x400 lib/maple_tree.c:4617
 lock_vma_under_rcu+0xd3/0x710 mm/mmap_lock.c:238
 do_user_addr_fault arch/x86/mm/fault.c:1327 [inline]
 handle_page_fault arch/x86/mm/fault.c:1476 [inline]
 exc_page_fault+0x294/0x10d0 arch/x86/mm/fault.c:1532
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618

value changed: 0xffff88802d63d141 -> 0xffff88800c45d600

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 17339 Comm: syz.3.451 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #44 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================

Execution Flow & Code Context

The task on CPU 1 modifies memory regions via `mprotect`, causing a VMA split, and calls `mas_store_prealloc()` to update the maple tree. During node replacement in `mas_topiary_replace()`, `mte_set_node_dead()` marks a node as dead by pointing the node's parent pointer at the node itself, using a plain C store:

```c
// lib/maple_tree.c
static inline void mte_set_node_dead(struct maple_enode *mn)
{
	mte_to_node(mn)->parent = ma_parent_ptr(mte_to_node(mn)); // <-- plain write
	smp_wmb(); /* Needed for RCU */
}
```

Meanwhile, CPU 0 concurrently handles a page fault on user memory via the RCU lookup in `lock_vma_under_rcu()`. During `mas_walk()`, `mas_start()` reads the root node and checks whether it is dead via `mte_dead_node() -> ma_dead_node()`.
The reader fetches `node->parent` with a plain, unmarked memory access:

```c
// lib/maple_tree.c
static __always_inline bool ma_dead_node(const struct maple_node *node)
{
	struct maple_node *parent;

	/* Do not reorder reads from the node prior to the parent check */
	smp_rmb();
	parent = (void *)((unsigned long)node->parent & ~MAPLE_NODE_MASK); // <-- lockless plain read
	return (parent == node);
}
```

Root Cause Analysis

The race is over `node->parent`. During tree structural updates, the writer assigns `node->parent` to point at the node itself, without any atomic annotation. Simultaneously, a lockless reader inspects `node->parent` to decide whether it has stepped into a dead subtree. Without `READ_ONCE()`/`WRITE_ONCE()`, the compiler is free to tear the load or store, or to re-fetch and hoist the read, breaking the assumption of a single clean pointer access.

Unfortunately, we were unable to generate a reproducer for this bug.

Potential Impact

If `ma_dead_node()` observes a torn or stale pointer value, a dead node could incorrectly appear alive. An RCU reader could then continue navigating into a freed or reused subtree, leading to use-after-free conditions, NULL pointer dereferences, or unbounded traversal loops inside the maple tree walk code, and ultimately a kernel panic or local denial of service.

Proposed Fix

To resolve the data race without adding latency to the RCU walk fast path, the lockless accesses to `node->parent` should be annotated: the writer should store the parent with `WRITE_ONCE()`, and the reader should load it with `READ_ONCE()`.
```diff
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -332,7 +332,7 @@ static inline struct maple_node *mas_mn(const struct ma_state *mas)
 static inline void mte_set_node_dead(struct maple_enode *mn)
 {
-	mte_to_node(mn)->parent = ma_parent_ptr(mte_to_node(mn));
+	WRITE_ONCE(mte_to_node(mn)->parent, ma_parent_ptr(mte_to_node(mn)));
 	smp_wmb(); /* Needed for RCU */
 }
@@ -576,7 +576,8 @@ static __always_inline bool ma_dead_node(const struct maple_node *node)
 	/* Do not reorder reads from the node prior to the parent check */
 	smp_rmb();
-	parent = (void *)((unsigned long)node->parent & ~MAPLE_NODE_MASK);
+	parent = (void *)((unsigned long)READ_ONCE(node->parent) &
+			  ~MAPLE_NODE_MASK);
 	return (parent == node);
 }
```

We hope this report is helpful.

Best regards,
RacePilot Team