From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: maple-tree@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: BUG: maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
Date: Thu, 28 Sep 2023 15:59:26 -0400 [thread overview]
Message-ID: <20230928195926.ucp7it3c3c75erzk@revolver> (raw)
In-Reply-To: <5d908c60-88fb-49c0-a150-bfb468eaa513@alu.unizg.hr>
[-- Attachment #1: Type: text/plain, Size: 8431 bytes --]
* Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> [230923 03:27]:
> On 9/22/23 15:51, Liam R. Howlett wrote:
...
> > > [ 6413.367326] ==================================================================
> > > [ 6413.367349] BUG: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
> > >
> > > [ 6413.367375] write to 0xffff8883a0c5db00 of 8 bytes by task 5475 on cpu 24:
> > > [ 6413.367386] mas_topiary_replace (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:491 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:1701 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2590)
> > > [ 6413.367399] mas_spanning_rebalance.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2664 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2961)
> > > [ 6413.367413] mas_wr_spanning_store.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:3894)
> > > [ 6413.367428] mas_wr_store_entry.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4242)
> > > [ 6413.367442] mas_store_prealloc (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:256 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:5408)
> > > [ 6413.367455] vma_merge (/home/marvin/linux/kernel/torvalds2/mm/internal.h:1127 /home/marvin/linux/kernel/torvalds2/mm/mmap.c:1005)
> > > [ 6413.367466] mprotect_fixup (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:632)
> > > [ 6413.367480] do_mprotect_pkey (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:819)
> > > [ 6413.367494] __x64_sys_mprotect (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:837)
> > > [ 6413.367503] do_syscall_64 (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:50 /home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:80)
> > > [ 6413.367517] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:120)
> > >
> > > [ 6413.367534] read to 0xffff8883a0c5db00 of 8 bytes by task 5558 on cpu 11:
> > > [ 6413.367545] mtree_range_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:539 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2789)
> > > [ 6413.367556] mas_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:251 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4844)
> > > [ 6413.367567] lock_vma_under_rcu (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5436)
> > > [ 6413.367579] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1357)
> > > [ 6413.367593] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561)
> > > [ 6413.367602] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570)
> > >
> > > [ 6413.367617] value changed: 0xffff888341d43116 -> 0xffff888340f92116
> > >
> > > [ 6413.367632] Reported by Kernel Concurrency Sanitizer on:
> > > [ 6413.367640] CPU: 11 PID: 5558 Comm: ThreadPoolForeg Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41
> > > [ 6413.367653] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> > > [ 6413.367660] ==================================================================
> > >
> > > For your convenience, took the trouble of finding those suspicious lines of code:
> > >
> > > The write side:
> > >
> > > lib/maple_tree.c:491
> > > --------------------
> > > 456 /*
> > > 457 * mas_set_parent() - Set the parent node and encode the slot
> > > 458 * @enode: The encoded maple node.
> > > 459 * @parent: The encoded maple node that is the parent of @enode.
> > > 460 * @slot: The slot that @enode resides in @parent.
> > > 461 *
> > > 462 * Slot number is encoded in the enode->parent bit 3-6 or 2-6, depending on the
> > > 463 * parent type.
> > > 464 */
> > > 465 static inline
> > > 466 void mas_set_parent(struct ma_state *mas, struct maple_enode *enode,
> > > 467 const struct maple_enode *parent, unsigned char slot)
> > > 468 {
> > > 469 unsigned long val = (unsigned long)parent;
> > > 470 unsigned long shift;
> > > 471 unsigned long type;
> > > 472 enum maple_type p_type = mte_node_type(parent);
> > > 473
> > > 474 MAS_BUG_ON(mas, p_type == maple_dense);
> > > 475 MAS_BUG_ON(mas, p_type == maple_leaf_64);
> > > 476
> > > 477 switch (p_type) {
> > > 478 case maple_range_64:
> > > 479 case maple_arange_64:
> > > 480 shift = MAPLE_PARENT_SLOT_SHIFT;
> > > 481 type = MAPLE_PARENT_RANGE64;
> > > 482 break;
> > > 483 default:
> > > 484 case maple_dense:
> > > 485 case maple_leaf_64:
> > > 486 shift = type = 0;
> > > 487 break;
> > > 488 }
> > > 489
> > > 490 val &= ~MAPLE_NODE_MASK; /* Clear all node metadata in parent */
> > > → 491 val |= (slot << shift) | type;
> > > 492 mte_to_node(enode)->parent = ma_parent_ptr(val);
> > > 493 }
This should probably be 492, not 491. I know what is racing here and it
shouldn't cause issue.
...
> > > The read side:
> > >
> > > 527 /*
> > > 528 * ma_dead_node() - check if the @enode is dead.
> > > 529 * @enode: The encoded maple node
> > > 530 *
> > > 531 * Return: true if dead, false otherwise.
> > > 532 */
> > > 533 static inline bool ma_dead_node(const struct maple_node *node)
> > > 534 {
> > > 535 struct maple_node *parent;
> > > 536
> > > 537 /* Do not reorder reads from the node prior to the parent check */
> > > 538 smp_rmb();
> > > → 539 parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
> > > 540 return (parent == node);
> > > 541 }
This is the correct line.
...
> > >
> > > as above, but the smb_rmb() protection is here, so the KCSAN warning should be double-checked for
> > > validity.
> > >
> > > But I do not really understand maple trees to their depth, I am only reporting the symptomatic
> > > outlook of the reported data-race.
> >
> > This is the most complex part of the maple tree, and unique to the
> > ability to store a range over multiple existing entries. I recently
> > rewrote this particular section to avoid a potential live-lock issue.
>
> I see.
>
> > I am confident that the parent pointer will not be set to the node
> > pointer as checked by ma_dead_node(). But in an abundance of caution
> > and to resolve this potential false-positive, I am looking at this in
> > more detail.
> >
> > This race is to see if the walk down the tree into unchanged data will
> > be seen before it is placed in the new subtree with its data also
> > unchanged. Since we know the parent can never be the node, I feel this
> > is safe - but I'm not willing to live with the complaints from kasan.
>
> I cannot analyse a couple of thousand lines at such a short notice.
Don't worry, I will :)
>
> It looks suspicious because val in line 491 in a local variable and thus
> isn't available elsewhere.
It is used in the node->parent, as described above. It is a race, but
it doesn't matter who wins.
>
> Maybe the compiler (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0) did something
> to the code?
Probably off-by-one line.
>
> > > This is all-in-all a very interested subject, and I wish there was a way to just slurp all those
> > > interesting kernel intrinsics into the brain, but it just ain't that easy. Forgive me for being
> > > open ...
> >
> > I appreciate that and your detailed analysis with code pointers of where
> > this happens. Is this easy to recreate? If so, how? Can you attach
> > your kernel config?
>
> Got that attached first, so I do not forget. :-/
>
> Recreate? Actually it happened quite a number of times on my configuration (480+?).
I'm having issues recreating it because I hit a lot of races before this
one in my test setup. I will keep working on reproducing this race, but
in the mean time can you test the attached patch with your setup?
...
Thanks,
Liam
[-- Attachment #2: race_fix.patch --]
[-- Type: text/x-diff, Size: 381 bytes --]
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index ee1ff0c59fd7..8f68582451c4 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1699,6 +1699,7 @@ static inline void mas_adopt_children(struct ma_state *mas,
do {
child = mas_slot_locked(mas, slots, offset);
mas_set_parent(mas, child, parent, offset);
+ smp_wmb(); /* Needed for RCU */
} while (offset--);
}
next prev parent reply other threads:[~2023-09-28 19:59 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-21 21:01 Mirsad Todorovac
2023-09-22 13:51 ` Liam R. Howlett
2023-09-23 7:26 ` Mirsad Todorovac
2023-09-28 19:59 ` Liam R. Howlett [this message]
2023-09-30 7:02 ` Mirsad Todorovac
2023-10-01 21:08 ` BUG: maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk [EXPERIMENTAL PATCH] Mirsad Todorovac
2023-10-02 16:43 ` Liam R. Howlett
2023-10-02 19:50 ` Mirsad Todorovac
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230928195926.ucp7it3c3c75erzk@revolver \
--to=liam.howlett@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=maple-tree@lists.infradead.org \
--cc=mirsad.todorovac@alu.unizg.hr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox