linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* BUG: maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
@ 2023-09-21 21:01 Mirsad Todorovac
  2023-09-22 13:51 ` Liam R. Howlett
  0 siblings, 1 reply; 8+ messages in thread
From: Mirsad Todorovac @ 2023-09-21 21:01 UTC (permalink / raw)
  To: maple-tree; +Cc: Liam R. Howlett, linux-mm, linux-kernel

Hi,

On the 6.6-rc2 vanilla torvalds tree kernel, KCSAN had reported a number of data-races:

[ 6413.367326] ==================================================================
[ 6413.367349] BUG: KCSAN: data-race in mas_topiary_replace / mtree_range_walk

[ 6413.367375] write to 0xffff8883a0c5db00 of 8 bytes by task 5475 on cpu 24:
[ 6413.367386] mas_topiary_replace (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:491 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:1701 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2590)
[ 6413.367399] mas_spanning_rebalance.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2664 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2961)
[ 6413.367413] mas_wr_spanning_store.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:3894)
[ 6413.367428] mas_wr_store_entry.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4242)
[ 6413.367442] mas_store_prealloc (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:256 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:5408)
[ 6413.367455] vma_merge (/home/marvin/linux/kernel/torvalds2/mm/internal.h:1127 /home/marvin/linux/kernel/torvalds2/mm/mmap.c:1005)
[ 6413.367466] mprotect_fixup (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:632)
[ 6413.367480] do_mprotect_pkey (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:819)
[ 6413.367494] __x64_sys_mprotect (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:837)
[ 6413.367503] do_syscall_64 (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:50 /home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:80)
[ 6413.367517] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:120)

[ 6413.367534] read to 0xffff8883a0c5db00 of 8 bytes by task 5558 on cpu 11:
[ 6413.367545] mtree_range_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:539 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2789)
[ 6413.367556] mas_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:251 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4844)
[ 6413.367567] lock_vma_under_rcu (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5436)
[ 6413.367579] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1357)
[ 6413.367593] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561)
[ 6413.367602] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570)

[ 6413.367617] value changed: 0xffff888341d43116 -> 0xffff888340f92116

[ 6413.367632] Reported by Kernel Concurrency Sanitizer on:
[ 6413.367640] CPU: 11 PID: 5558 Comm: ThreadPoolForeg Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41
[ 6413.367653] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 6413.367660] ==================================================================

For your convenience, took the trouble of finding those suspicious lines of code:

The write side:

lib/maple_tree.c:491
--------------------
   456 /*
   457  * mas_set_parent() - Set the parent node and encode the slot
   458  * @enode: The encoded maple node.
   459  * @parent: The encoded maple node that is the parent of @enode.
   460  * @slot: The slot that @enode resides in @parent.
   461  *
   462  * Slot number is encoded in the enode->parent bit 3-6 or 2-6, depending on the
   463  * parent type.
   464  */
   465 static inline
   466 void mas_set_parent(struct ma_state *mas, struct maple_enode *enode,
   467                     const struct maple_enode *parent, unsigned char slot)
   468 {
   469         unsigned long val = (unsigned long)parent;
   470         unsigned long shift;
   471         unsigned long type;
   472         enum maple_type p_type = mte_node_type(parent);
   473
   474         MAS_BUG_ON(mas, p_type == maple_dense);
   475         MAS_BUG_ON(mas, p_type == maple_leaf_64);
   476
   477         switch (p_type) {
   478         case maple_range_64:
   479         case maple_arange_64:
   480                 shift = MAPLE_PARENT_SLOT_SHIFT;
   481                 type = MAPLE_PARENT_RANGE64;
   482                 break;
   483         default:
   484         case maple_dense:
   485         case maple_leaf_64:
   486                 shift = type = 0;
   487                 break;
   488         }
   489
   490         val &= ~MAPLE_NODE_MASK; /* Clear all node metadata in parent */
→ 491         val |= (slot << shift) | type;
   492         mte_to_node(enode)->parent = ma_parent_ptr(val);
   493 }

lib/maple_tree.c:1701
---------------------
   1682 /*
   1683  * mas_adopt_children() - Set the parent pointer of all nodes in @parent to
   1684  * @parent with the slot encoded.
   1685  * @mas - the maple state (for the tree)
   1686  * @parent - the maple encoded node containing the children.
   1687  */
   1688 static inline void mas_adopt_children(struct ma_state *mas,
   1689                 struct maple_enode *parent)
   1690 {
   1691         enum maple_type type = mte_node_type(parent);
   1692         struct maple_node *node = mte_to_node(parent);
   1693         void __rcu **slots = ma_slots(node, type);
   1694         unsigned long *pivots = ma_pivots(node, type);
   1695         struct maple_enode *child;
   1696         unsigned char offset;
   1697
   1698         offset = ma_data_end(node, type, pivots, mas->max);
   1699         do {
   1700                 child = mas_slot_locked(mas, slots, offset);
→ 1701                 mas_set_parent(mas, child, parent, offset);
   1702         } while (offset--);
   1703 }

   2562 static inline void mas_topiary_replace(struct ma_state *mas,
   2563                 struct maple_enode *old_enode)
   2564 {
   2565         struct ma_state tmp[3], tmp_next[3];
   2566         MA_TOPIARY(subtrees, mas->tree);
   2567         bool in_rcu;
   2568         int i, n;
   2569
   2570         /* Place data in tree & then mark node as old */
   2571         mas_put_in_tree(mas, old_enode);
   2572
   2573         /* Update the parent pointers in the tree */
   2574         tmp[0] = *mas;
   2575         tmp[0].offset = 0;
   2576         tmp[1].node = MAS_NONE;
   2577         tmp[2].node = MAS_NONE;
   2578         while (!mte_is_leaf(tmp[0].node)) {
   2579                 n = 0;
   2580                 for (i = 0; i < 3; i++) {
   2581                         if (mas_is_none(&tmp[i]))
   2582                                 continue;
   2583
   2584                         while (n < 3) {
   2585                                 if (!mas_find_child(&tmp[i], &tmp_next[n]))
   2586                                         break;
   2587                                 n++;
   2588                         }
   2589
→ 2590                        mas_adopt_children(&tmp[i], tmp[i].node);
   2591                 }
   2592
   2593                 if (MAS_WARN_ON(mas, n == 0))
   2594                         break;
   2595
   2596                 while (n < 3)
   2597                         tmp_next[n++].node = MAS_NONE;
   2598
   2599                 for (i = 0; i < 3; i++)
   2600                         tmp[i] = tmp_next[i];
   2601         }
   2602
   2603         /* Collect the old nodes that need to be discarded */
   2604         if (mte_is_leaf(old_enode))
   2605                 return mas_free(mas, old_enode);
   2606
   2607         tmp[0] = *mas;
   2608         tmp[0].offset = 0;
   2609         tmp[0].node = old_enode;
   2610         tmp[1].node = MAS_NONE;
   2611         tmp[2].node = MAS_NONE;
   2612         in_rcu = mt_in_rcu(mas->tree);
   2613         do {
   2614                 n = 0;
   2615                 for (i = 0; i < 3; i++) {
   2616                         if (mas_is_none(&tmp[i]))
   2617                                 continue;

The read side:

    527 /*
    528  * ma_dead_node() - check if the @enode is dead.
    529  * @enode: The encoded maple node
    530  *
    531  * Return: true if dead, false otherwise.
    532  */
    533 static inline bool ma_dead_node(const struct maple_node *node)
    534 {
    535         struct maple_node *parent;
    536
    537         /* Do not reorder reads from the node prior to the parent check */
    538         smp_rmb();
→  539         parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
    540         return (parent == node);
    541 }

   2767 static inline void *mtree_range_walk(struct ma_state *mas)
   2768 {
   2769         unsigned long *pivots;
   2770         unsigned char offset;
   2771         struct maple_node *node;
   2772         struct maple_enode *next, *last;
   2773         enum maple_type type;
   2774         void __rcu **slots;
   2775         unsigned char end;
   2776         unsigned long max, min;
   2777         unsigned long prev_max, prev_min;
   2778
   2779         next = mas->node;
   2780         min = mas->min;
   2781         max = mas->max;
   2782         do {
   2783                 offset = 0;
   2784                 last = next;
   2785                 node = mte_to_node(next);
   2786                 type = mte_node_type(next);
   2787                 pivots = ma_pivots(node, type);
   2788                 end = ma_data_end(node, type, pivots, max);
→ 2789                 if (unlikely(ma_dead_node(node)))
   2790                         goto dead_node;
   2791
   2792                 if (pivots[offset] >= mas->index) {
   2793                         prev_max = max;
   2794                         prev_min = min;
   2795                         max = pivots[offset];
   2796                         goto next;
   2797                 }
   2798
   2799                 do {
   2800                         offset++;
   2801                 } while ((offset < end) && (pivots[offset] < mas->index));

As it is evident, ma_dead_node() expands to:

    527 /*
    528  * ma_dead_node() - check if the @enode is dead.
    529  * @enode: The encoded maple node
    530  *
    531  * Return: true if dead, false otherwise.
    532  */
    533 static inline bool ma_dead_node(const struct maple_node *node)
    534 {
    535         struct maple_node *parent;
    536
    537         /* Do not reorder reads from the node prior to the parent check */
    538         smp_rmb();
→  539         parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
    540         return (parent == node);
    541 }

as above, but the smb_rmb() protection is here, so the KCSAN warning should be double-checked for
validity.

But I do not really understand maple trees to their depth, I am only reporting the symptomatic
outlook of the reported data-race.

This is all-in-all a very interested subject, and I wish there was a way to just slurp all those
interesting kernel intrinsics into the brain, but it just ain't that easy. Forgive me for being
open ...

Best regards,
Mirsad Todorovac


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-02 19:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21 21:01 BUG: maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk Mirsad Todorovac
2023-09-22 13:51 ` Liam R. Howlett
2023-09-23  7:26   ` Mirsad Todorovac
2023-09-28 19:59     ` Liam R. Howlett
2023-09-30  7:02       ` Mirsad Todorovac
2023-10-01 21:08       ` BUG: maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk [EXPERIMENTAL PATCH] Mirsad Todorovac
2023-10-02 16:43         ` Liam R. Howlett
2023-10-02 19:50           ` Mirsad Todorovac

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox