From: "Jianzhou Zhao" <luckd0g@163.com>
To: akpm@linux-foundation.org, Liam.Howlett@oracle.com,
aliceryhl@google.com, andrewjballance@gmail.com,
linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
linux-mm@kvack.org
Subject: KCSAN: data-race in mas_topiary_replace / mas_walk
Date: Wed, 11 Mar 2026 11:33:49 +0800 (CST)
Message-ID: <35e96d54.3914.19cdaf536a9.Coremail.luckd0g@163.com>
Subject: [BUG] maple_tree: KCSAN: data-race in mas_topiary_replace / mas_walk
Dear Maintainers,
We are writing to report a data race detected by KCSAN in the Linux kernel. The race was found by our custom fuzzing tool, RacePilot. It occurs in the maple tree code when a node replacement runs concurrently with a tree traversal under RCU. We observed it on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.
Call Trace & Context
==================================================================
BUG: KCSAN: data-race in mas_topiary_replace / mas_walk
write to 0xffff88800c45d600 of 8 bytes by task 17331 on cpu 1:
mte_set_node_dead lib/maple_tree.c:335 [inline]
mas_put_in_tree lib/maple_tree.c:1571 [inline]
mas_topiary_replace+0x14e/0x14a0 lib/maple_tree.c:2350
mas_wmb_replace lib/maple_tree.c:2443 [inline]
mas_split lib/maple_tree.c:3067 [inline]
mas_commit_b_node lib/maple_tree.c:3087 [inline]
mas_wr_bnode+0xd2a/0x23b0 lib/maple_tree.c:3755
mas_wr_store_entry+0x77b/0x1120 lib/maple_tree.c:3787
mas_store_prealloc+0x47c/0xa60 lib/maple_tree.c:5191
vma_iter_store_overwrite mm/vma.h:481 [inline]
vma_iter_store_new mm/vma.h:488 [inline]
vma_complete+0x6a9/0x8a0 mm/vma.c:353
__split_vma+0x5fb/0x6f0 mm/vma.c:567
split_vma mm/vma.c:597 [inline]
vma_modify+0xac8/0xdd0 mm/vma.c:1635
vma_modify_flags+0x16c/0x1a0 mm/vma.c:1662
mprotect_fixup+0x170/0x660 mm/mprotect.c:816
do_mprotect_pkey+0x5fe/0x930 mm/mprotect.c:990
__do_sys_mprotect mm/mprotect.c:1011 [inline]
__se_sys_mprotect mm/mprotect.c:1008 [inline]
__x64_sys_mprotect+0x47/0x60 mm/mprotect.c:1008
x64_sys_call+0xc6c/0x2030 arch/x86/include/generated/asm/syscalls_64.h:11
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xae/0x2c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff88800c45d600 of 8 bytes by task 17339 on cpu 0:
ma_dead_node lib/maple_tree.c:576 [inline]
mte_dead_node lib/maple_tree.c:591 [inline]
mas_start lib/maple_tree.c:1211 [inline]
mas_start lib/maple_tree.c:1194 [inline]
mas_state_walk lib/maple_tree.c:3306 [inline]
mas_walk+0x257/0x400 lib/maple_tree.c:4617
lock_vma_under_rcu+0xd3/0x710 mm/mmap_lock.c:238
do_user_addr_fault arch/x86/mm/fault.c:1327 [inline]
handle_page_fault arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x294/0x10d0 arch/x86/mm/fault.c:1532
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
value changed: 0xffff88802d63d141 -> 0xffff88800c45d600
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 17339 Comm: syz.3.451 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #44 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================
Execution Flow & Code Context
The task on CPU 1 changes memory protections via `mprotect`, which causes a VMA split. It calls `mas_store_prealloc()` to update the maple tree. During node replacement in `mas_topiary_replace()`, `mte_set_node_dead()` marks the node as dead by setting its parent pointer to point at the node itself, using a plain C store:
```c
// lib/maple_tree.c
static inline void mte_set_node_dead(struct maple_enode *mn)
{
	mte_to_node(mn)->parent = ma_parent_ptr(mte_to_node(mn)); // <-- Write
	smp_wmb(); /* Needed for RCU */
}
```
Meanwhile, CPU 0 handles a page fault on a user address and looks up the VMA under RCU via `lock_vma_under_rcu()`. During `mas_walk()`, `mas_start()` reads the root node and checks whether it is dead via `mte_dead_node()` -> `ma_dead_node()`. The reader loads `node->parent` with a plain C access:
```c
// lib/maple_tree.c
static __always_inline bool ma_dead_node(const struct maple_node *node)
{
	struct maple_node *parent;

	/* Do not reorder reads from the node prior to the parent check */
	smp_rmb();
	parent = (void *)((unsigned long)node->parent & ~MAPLE_NODE_MASK); // <-- Lockless Read
	return (parent == node);
}
```
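The dead-node convention used by the two functions above can be modelled in userspace. The following is a minimal sketch, not the kernel implementation: `struct demo_node`, `DEMO_NODE_MASK`, and the `demo_*` helpers are hypothetical stand-ins, and the mask value and 256-byte alignment are assumptions made for illustration. It shows that a live node keeps metadata in the low bits of `parent`, while a dead node's `parent` is its own address, which matches the `value changed` line in the report where the word becomes 0xffff88800c45d600, the address being accessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask for the metadata bits stored in the low bits of ->parent. */
#define DEMO_NODE_MASK ((uintptr_t)0xFF)

/* Hypothetical node: aligned so the low 8 bits of its address are zero. */
struct demo_node {
	_Alignas(256) struct demo_node *parent;
};

/* Live node: parent address combined with metadata in the low bits. */
static inline void demo_set_parent(struct demo_node *node,
				   struct demo_node *parent, uintptr_t meta)
{
	node->parent = (struct demo_node *)((uintptr_t)parent |
					    (meta & DEMO_NODE_MASK));
}

/* Dead node: the parent pointer is the node's own address. */
static inline void demo_set_dead(struct demo_node *node)
{
	node->parent = node;
}

/* A node is dead when its parent, with metadata bits cleared, is itself. */
static inline bool demo_is_dead(const struct demo_node *node)
{
	uintptr_t raw = (uintptr_t)node->parent;

	return (const struct demo_node *)(raw & ~DEMO_NODE_MASK) == node;
}
```

The accesses here are plain loads and stores, as in the current kernel code, so this model is only meaningful single-threaded.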
Root Cause Analysis
The data race is on `node->parent`. The writer sets `node->parent` to point at the node itself, without `WRITE_ONCE()`, during structural updates to the tree. At the same time, a lockless reader loads `node->parent` to decide whether it has stepped into a dead subtree. Without `READ_ONCE()`/`WRITE_ONCE()`, the compiler is free to tear, fuse, or repeat the accesses, so the reader is not guaranteed a single, whole pointer load.
Unfortunately, we were unable to generate a reproducer for this bug.
Potential Impact
If the compiler tears or re-loads the plain access, `ma_dead_node()` could in principle observe a value that is neither the old nor the new parent pointer, and a dead node could incorrectly appear alive. An RCU reader might then continue walking into a freed or corrupted subtree, which could lead to use-after-free, NULL pointer dereferences, endless loops in the maple tree walk, and ultimately a kernel panic or local DoS. We have not observed any of these outcomes in practice.
Proposed Fix
To resolve this data race without adding cost to the RCU walk fast path, we propose marking the accesses to `node->parent` that race between lockless readers and writers. The writer should store the parent via `WRITE_ONCE()`, and the reader should load it via `READ_ONCE()`.
```diff
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -332,7 +332,7 @@ static inline struct maple_node *mas_mn(const struct ma_state *mas)
 static inline void mte_set_node_dead(struct maple_enode *mn)
 {
-	mte_to_node(mn)->parent = ma_parent_ptr(mte_to_node(mn));
+	WRITE_ONCE(mte_to_node(mn)->parent, ma_parent_ptr(mte_to_node(mn)));
 	smp_wmb(); /* Needed for RCU */
 }
@@ -576,7 +576,8 @@ static __always_inline bool ma_dead_node(const struct maple_node *node)
 	/* Do not reorder reads from the node prior to the parent check */
 	smp_rmb();
-	parent = (void *)((unsigned long)node->parent & ~MAPLE_NODE_MASK);
+	parent = (void *)((unsigned long)READ_ONCE(node->parent) &
+			  ~MAPLE_NODE_MASK);
 	return (parent == node);
 }
```
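For readers who want to see what the annotations do outside the kernel, here is a rough userspace approximation. It is a sketch under stated assumptions, not the kernel's definitions: `SKETCH_READ_ONCE()` and `SKETCH_WRITE_ONCE()` are hypothetical macros that imitate the marked accesses with volatile casts (using the GCC/Clang `__typeof__` extension), and `struct sketch_node` and `SKETCH_NODE_MASK` are illustrative stand-ins. The `smp_wmb()`/`smp_rmb()` barriers from the real code are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical approximations of marked accesses: going through a volatile
 * lvalue asks the compiler not to tear, fuse, or repeat the access. For an
 * aligned, word-sized object this is expected to compile to one load or store.
 */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Assumed mask for the metadata bits stored in the low bits of ->parent. */
#define SKETCH_NODE_MASK ((uintptr_t)0xFF)

/* Illustrative node, aligned so its address has zero low bits. */
struct sketch_node {
	_Alignas(256) struct sketch_node *parent;
};

/* Writer side: mark the node dead with a single marked store. */
static inline void sketch_set_dead(struct sketch_node *node)
{
	SKETCH_WRITE_ONCE(node->parent, node);
}

/* Reader side: take one marked load, then mask and compare the snapshot. */
static inline bool sketch_is_dead(const struct sketch_node *node)
{
	uintptr_t raw = (uintptr_t)SKETCH_READ_ONCE(node->parent);

	return (const struct sketch_node *)(raw & ~SKETCH_NODE_MASK) == node;
}
```

The point of the reader-side shape is that the masking and comparison operate on one snapshot of `parent`, rather than on an expression the compiler may evaluate with more than one load.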
We would be highly honored if this could be of any help.
Best regards,
RacePilot Team