linux-mm.kvack.org archive mirror
* [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts
@ 2025-02-21 16:36 Sidhartha Kumar
  2025-02-21 16:36 ` [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

v1[1] -> v2:
  - fix comment for vacant_height which referred to depth, per Wei

  - add a patch to reorder switch case statements in mas_prealloc_calc and
    mas_wr_store_entry

  - use sufficient height in spanning stores

  - modify patch 2 to use a counter to track ascending the tree rather
    than overloading mas->depth for this purpose.

================ overview ========================
Currently, the maple tree preallocates the worst-case number of nodes for
a given store type by taking into account the whole height of the tree. This
comes from a worst case scenario of every node in the tree being full and
having to propagate node allocation upwards until we reach the root of the
tree. This can be optimized if there are vacancies in nodes that are at a
lower depth than the root node. This series implements tracking the level
at which there is a vacant node so we only need to allocate until this
level is reached, rather than always using the full height of the tree.
The ma_wr_state struct is modified to add a field which keeps track of the
vacant height and is updated during walks of the tree. This value is then
read in mas_prealloc_calc() when we decide how many nodes to allocate.

For rebalancing and spanning stores, we also need to track the lowest
height at which a node has 1 more entry than the minimum sufficient number
of entries. This is because rebalancing can cause a parent node to become
insufficient which results in further node allocations. In this case, we
need to use the sufficient height as the worst case rather than the vacant
height.

patch 1-2: preparatory patches
patch 3: implement vacant height tracking + update the tests
patch 4: support vacant height tracking for rebalancing writes
patch 5: implement sufficient height tracking
patch 6: reorder switch case statements

================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running stress-ng mmap for 120s. The histograms below show the
count argument of requests to kmem_cache_alloc_bulk(), i.e. how many maple
nodes the caller is requesting.

command: stress-ng --mmap 4 --timeout 120

mm-unstable 

@bulk_alloc_req:
[3, 4)                 4 |                                                    |
[4, 5)             54170 |@                                                   |
[5, 6)                 0 |                                                    |
[6, 7)            893057 |@@@@@@@@@@@@@@@@@@@@                                |
[7, 8)                 4 |                                                    |
[8, 9)           2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)            55811 |@                                                   |
[10, 11)           77834 |@                                                   |
[11, 12)               0 |                                                    |
[12, 13)         1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[13, 14)               0 |                                                    |
[14, 15)               0 |                                                    |
[15, 16)          367197 |@@@@@@@@                                            |


@maple_node_total: 46,630,160
@total_vmas: 46,184,591

mm-unstable + this series

@bulk_alloc_req:
[2, 3)               198 |                                                    |
[3, 4)                 4 |                                                    |
[4, 5)                43 |                                                    |
[5, 6)                 0 |                                                    |
[6, 7)           1069503 |@@@@@@@@@@@@@@@@@@@@@                               |
[7, 8)                 4 |                                                    |
[8, 9)           2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)           472191 |@@@@@@@@@                                           |
[10, 11)          191904 |@@@                                                 |
[11, 12)               0 |                                                    |
[12, 13)          247316 |@@@@                                                |
[13, 14)               0 |                                                    |
[14, 15)               0 |                                                    |
[15, 16)           98769 |@                                                   |


@maple_node_total: 37,813,856
@total_vmas: 43,493,287

This represents a ~19% reduction in the number of bulk maple nodes allocated.

For more reproducible results, a histogram of the return value of
mas_prealloc_calc() is displayed while running the maple_tree tests,
which have a deterministic store pattern.

mas_prealloc_calc() return value mm-unstable
1   :  							 (12068)
3   :  							 (11836)
5   : ***** 						 (271192)
7   : ************************************************** (2329329)
9   : *********** 					 (534186)
10  :  							 (435)
11  : *************** 					 (704306)
13  : ******** 						 (409781)

mas_prealloc_calc() return value mm-unstable + this series
1   : 							 (12070)
3   : ************************************************** (3548777)
5   : ********                                           (633458)
7   :  							 (65081)
9   :  							 (11224)
10  :  							 (341)
11  :  							 (2973)
13  :  							 (68)

do_mmap latency was also measured for regressions:
command: stress-ng --mmap 4 --timeout 120

mm-unstable:
avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034

mm-unstable + this series:
avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726


[1]: https://lore.kernel.org/lkml/20241114170524.64391-1-sidhartha.kumar@oracle.com/T/

Sidhartha Kumar (6):
  maple_tree: convert mas_prealloc_calc() to take in a maple write state
  maple_tree: use height and depth consistently
  maple_tree: use vacant nodes to reduce worst case allocations
  maple_tree: break on convergence in mas_spanning_rebalance()
  maple_tree: add sufficient height
  maple_tree: reorder mas->store_type case statements

 include/linux/maple_tree.h       |   4 +
 lib/maple_tree.c                 | 193 ++++++++++++++++++-------------
 tools/testing/radix-tree/maple.c | 130 +++++++++++++++++++--
 3 files changed, 240 insertions(+), 87 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
@ 2025-02-21 16:36 ` Sidhartha Kumar
  2025-02-25 15:40   ` Liam R. Howlett
  2025-02-21 16:36 ` [PATCH v2 2/6] maple_tree: use height and depth consistently Sidhartha Kumar
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

In a subsequent patch, mas_prealloc_calc() will need to access fields only
in the ma_wr_state. Convert the function to take in a ma_wr_state and
modify all callers. There is no functional change.

Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
 lib/maple_tree.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 42c65974a56c..0410e13a099e 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4144,13 +4144,14 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
 /**
  * mas_prealloc_calc() - Calculate number of nodes needed for a
  * given store oepration
- * @mas: The maple state
+ * @wr_mas: The maple write state
  * @entry: The entry to store into the tree
  *
  * Return: Number of nodes required for preallocation.
  */
-static inline int mas_prealloc_calc(struct ma_state *mas, void *entry)
+static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
 {
+	struct ma_state *mas = wr_mas->mas;
 	int ret = mas_mt_height(mas) * 3 + 1;
 
 	switch (mas->store_type) {
@@ -4247,16 +4248,15 @@ static inline enum store_type mas_wr_store_type(struct ma_wr_state *wr_mas)
  */
 static inline void mas_wr_preallocate(struct ma_wr_state *wr_mas, void *entry)
 {
-	struct ma_state *mas = wr_mas->mas;
 	int request;
 
 	mas_wr_prealloc_setup(wr_mas);
-	mas->store_type = mas_wr_store_type(wr_mas);
-	request = mas_prealloc_calc(mas, entry);
+	wr_mas->mas->store_type = mas_wr_store_type(wr_mas);
+	request = mas_prealloc_calc(wr_mas, entry);
 	if (!request)
 		return;
 
-	mas_node_count(mas, request);
+	mas_node_count(wr_mas->mas, request);
 }
 
 /**
@@ -5401,7 +5401,7 @@ void *mas_store(struct ma_state *mas, void *entry)
 		return wr_mas.content;
 	}
 
-	request = mas_prealloc_calc(mas, entry);
+	request = mas_prealloc_calc(&wr_mas, entry);
 	if (!request)
 		goto store;
 
@@ -5498,7 +5498,7 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
 
 	mas_wr_prealloc_setup(&wr_mas);
 	mas->store_type = mas_wr_store_type(&wr_mas);
-	request = mas_prealloc_calc(mas, entry);
+	request = mas_prealloc_calc(&wr_mas, entry);
 	if (!request)
 		return ret;
 
-- 
2.43.0




* [PATCH v2 2/6] maple_tree: use height and depth consistently
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
  2025-02-21 16:36 ` [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
@ 2025-02-21 16:36 ` Sidhartha Kumar
  2025-02-25 15:39   ` Liam R. Howlett
  2025-02-21 16:36 ` [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

For the maple tree, the root node is defined to have a depth of 0 with a
height of 1. Each level down from the root, both values are incremented
by 1. Various code paths define a root with depth 1, which is inconsistent
with this definition. Modify the code to be consistent with this
definition.

Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
 lib/maple_tree.c | 85 +++++++++++++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 40 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 0410e13a099e..d7dac3119748 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -211,14 +211,14 @@ static void ma_free_rcu(struct maple_node *node)
 	call_rcu(&node->rcu, mt_free_rcu);
 }
 
-static void mas_set_height(struct ma_state *mas)
+static void mt_set_height(struct maple_tree *mt, unsigned char height)
 {
-	unsigned int new_flags = mas->tree->ma_flags;
+	unsigned int new_flags = mt->ma_flags;
 
 	new_flags &= ~MT_FLAGS_HEIGHT_MASK;
-	MAS_BUG_ON(mas, mas->depth > MAPLE_HEIGHT_MAX);
-	new_flags |= mas->depth << MT_FLAGS_HEIGHT_OFFSET;
-	mas->tree->ma_flags = new_flags;
+	MT_BUG_ON(mt, height > MAPLE_HEIGHT_MAX);
+	new_flags |= height << MT_FLAGS_HEIGHT_OFFSET;
+	mt->ma_flags = new_flags;
 }
 
 static unsigned int mas_mt_height(struct ma_state *mas)
@@ -1375,7 +1375,7 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
 		root = mas_root(mas);
 		/* Tree with nodes */
 		if (likely(xa_is_node(root))) {
-			mas->depth = 1;
+			mas->depth = 0;
 			mas->status = ma_active;
 			mas->node = mte_safe_root(root);
 			mas->offset = 0;
@@ -1716,9 +1716,10 @@ static inline void mas_adopt_children(struct ma_state *mas,
  * node as dead.
  * @mas: the maple state with the new node
  * @old_enode: The old maple encoded node to replace.
+ * @new_height: if we are inserting a root node, update the height of the tree
  */
 static inline void mas_put_in_tree(struct ma_state *mas,
-		struct maple_enode *old_enode)
+		struct maple_enode *old_enode, char new_height)
 	__must_hold(mas->tree->ma_lock)
 {
 	unsigned char offset;
@@ -1727,7 +1728,7 @@ static inline void mas_put_in_tree(struct ma_state *mas,
 	if (mte_is_root(mas->node)) {
 		mas_mn(mas)->parent = ma_parent_ptr(mas_tree_parent(mas));
 		rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
-		mas_set_height(mas);
+		mt_set_height(mas->tree, new_height);
 	} else {
 
 		offset = mte_parent_slot(mas->node);
@@ -1747,10 +1748,10 @@ static inline void mas_put_in_tree(struct ma_state *mas,
  * @old_enode: The old maple encoded node.
  */
 static inline void mas_replace_node(struct ma_state *mas,
-		struct maple_enode *old_enode)
+		struct maple_enode *old_enode, unsigned char new_height)
 	__must_hold(mas->tree->ma_lock)
 {
-	mas_put_in_tree(mas, old_enode);
+	mas_put_in_tree(mas, old_enode, new_height);
 	mas_free(mas, old_enode);
 }
 
@@ -2543,7 +2544,7 @@ static inline void mas_topiary_node(struct ma_state *mas,
  *
  */
 static inline void mas_topiary_replace(struct ma_state *mas,
-		struct maple_enode *old_enode)
+		struct maple_enode *old_enode, unsigned char new_height)
 {
 	struct ma_state tmp[3], tmp_next[3];
 	MA_TOPIARY(subtrees, mas->tree);
@@ -2551,7 +2552,7 @@ static inline void mas_topiary_replace(struct ma_state *mas,
 	int i, n;
 
 	/* Place data in tree & then mark node as old */
-	mas_put_in_tree(mas, old_enode);
+	mas_put_in_tree(mas, old_enode, new_height);
 
 	/* Update the parent pointers in the tree */
 	tmp[0] = *mas;
@@ -2639,10 +2640,10 @@ static inline void mas_topiary_replace(struct ma_state *mas,
  * Updates gap as necessary.
  */
 static inline void mas_wmb_replace(struct ma_state *mas,
-		struct maple_enode *old_enode)
+		struct maple_enode *old_enode, unsigned char new_height)
 {
 	/* Insert the new data in the tree */
-	mas_topiary_replace(mas, old_enode);
+	mas_topiary_replace(mas, old_enode, new_height);
 
 	if (mte_is_leaf(mas->node))
 		return;
@@ -2828,6 +2829,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
 {
 	unsigned char split, mid_split;
 	unsigned char slot = 0;
+	unsigned char new_height = 0; /* used if node is a new root */
 	struct maple_enode *left = NULL, *middle = NULL, *right = NULL;
 	struct maple_enode *old_enode;
 
@@ -2877,7 +2879,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
 		 */
 		memset(mast->bn, 0, sizeof(struct maple_big_node));
 		mast->bn->type = mte_node_type(left);
-		l_mas.depth++;
+		new_height++;
 
 		/* Root already stored in l->node. */
 		if (mas_is_root_limits(mast->l))
@@ -2901,8 +2903,10 @@ static void mas_spanning_rebalance(struct ma_state *mas,
 			continue;
 
 		/* May be a new root stored in mast->bn */
-		if (mas_is_root_limits(mast->orig_l))
+		if (mas_is_root_limits(mast->orig_l)) {
+			new_height++;
 			break;
+		}
 
 		mast_spanning_rebalance(mast);
 
@@ -2913,7 +2917,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
 
 	l_mas.node = mt_mk_node(ma_mnode_ptr(mas_pop_node(mas)),
 				mte_node_type(mast->orig_l->node));
-	l_mas.depth++;
+
 	mab_mas_cp(mast->bn, 0, mt_slots[mast->bn->type] - 1, &l_mas, true);
 	mas_set_parent(mas, left, l_mas.node, slot);
 	if (middle)
@@ -2937,7 +2941,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
 	mas->min = l_mas.min;
 	mas->max = l_mas.max;
 	mas->offset = l_mas.offset;
-	mas_wmb_replace(mas, old_enode);
+	mas_wmb_replace(mas, old_enode, new_height);
 	mtree_range_walk(mas);
 	return;
 }
@@ -3013,6 +3017,7 @@ static inline void mas_destroy_rebalance(struct ma_state *mas, unsigned char end
 	void __rcu **l_slots, **slots;
 	unsigned long *l_pivs, *pivs, gap;
 	bool in_rcu = mt_in_rcu(mas->tree);
+	unsigned char new_height = mas_mt_height(mas);
 
 	MA_STATE(l_mas, mas->tree, mas->index, mas->last);
 
@@ -3107,7 +3112,7 @@ static inline void mas_destroy_rebalance(struct ma_state *mas, unsigned char end
 	mas_ascend(mas);
 
 	if (in_rcu) {
-		mas_replace_node(mas, old_eparent);
+		mas_replace_node(mas, old_eparent, new_height);
 		mas_adopt_children(mas, mas->node);
 	}
 
@@ -3118,10 +3123,9 @@ static inline void mas_destroy_rebalance(struct ma_state *mas, unsigned char end
  * mas_split_final_node() - Split the final node in a subtree operation.
  * @mast: the maple subtree state
  * @mas: The maple state
- * @height: The height of the tree in case it's a new root.
  */
 static inline void mas_split_final_node(struct maple_subtree_state *mast,
-					struct ma_state *mas, int height)
+					struct ma_state *mas)
 {
 	struct maple_enode *ancestor;
 
@@ -3130,7 +3134,6 @@ static inline void mas_split_final_node(struct maple_subtree_state *mast,
 			mast->bn->type = maple_arange_64;
 		else
 			mast->bn->type = maple_range_64;
-		mas->depth = height;
 	}
 	/*
 	 * Only a single node is used here, could be root.
@@ -3153,8 +3156,7 @@ static inline void mas_split_final_node(struct maple_subtree_state *mast,
  * @skip: The number of entries to skip for new nodes insertion.
  */
 static inline void mast_fill_bnode(struct maple_subtree_state *mast,
-					 struct ma_state *mas,
-					 unsigned char skip)
+					 struct ma_state *mas, unsigned char skip, int *height)
 {
 	bool cp = true;
 	unsigned char split;
@@ -3226,7 +3228,7 @@ static inline void mast_split_data(struct maple_subtree_state *mast,
  *
  * Return: True if pushed, false otherwise.
  */
-static inline bool mas_push_data(struct ma_state *mas, int height,
+static inline bool mas_push_data(struct ma_state *mas, int *height,
 				 struct maple_subtree_state *mast, bool left)
 {
 	unsigned char slot_total = mast->bn->b_end;
@@ -3283,8 +3285,8 @@ static inline bool mas_push_data(struct ma_state *mas, int height,
 		mast->orig_l->offset += end + 1;
 
 	mast_split_data(mast, mas, split);
-	mast_fill_bnode(mast, mas, 2);
-	mas_split_final_node(mast, mas, height + 1);
+	mast_fill_bnode(mast, mas, 2, height);
+	mas_split_final_node(mast, mas);
 	return true;
 }
 
@@ -3297,6 +3299,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
 {
 	struct maple_subtree_state mast;
 	int height = 0;
+	unsigned int orig_height = mas_mt_height(mas);
 	unsigned char mid_split, split = 0;
 	struct maple_enode *old;
 
@@ -3323,7 +3326,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
 	MA_STATE(prev_r_mas, mas->tree, mas->index, mas->last);
 
 	trace_ma_op(__func__, mas);
-	mas->depth = mas_mt_height(mas);
+	mas->depth = orig_height;
 
 	mast.l = &l_mas;
 	mast.r = &r_mas;
@@ -3333,7 +3336,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
 
 	while (height++ <= mas->depth) {
 		if (mt_slots[b_node->type] > b_node->b_end) {
-			mas_split_final_node(&mast, mas, height);
+			mas_split_final_node(&mast, mas);
 			break;
 		}
 
@@ -3348,11 +3351,15 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
 		 * is a significant savings.
 		 */
 		/* Try to push left. */
-		if (mas_push_data(mas, height, &mast, true))
+		if (mas_push_data(mas, &height, &mast, true)) {
+			height++;
 			break;
+		}
 		/* Try to push right. */
-		if (mas_push_data(mas, height, &mast, false))
+		if (mas_push_data(mas, &height, &mast, false)) {
+			height++;
 			break;
+		}
 
 		split = mab_calc_split(mas, b_node, &mid_split);
 		mast_split_data(&mast, mas, split);
@@ -3361,7 +3368,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
 		 * r->max.
 		 */
 		mast.r->max = mas->max;
-		mast_fill_bnode(&mast, mas, 1);
+		mast_fill_bnode(&mast, mas, 1, &height);
 		prev_l_mas = *mast.l;
 		prev_r_mas = *mast.r;
 	}
@@ -3369,7 +3376,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
 	/* Set the original node as dead */
 	old = mas->node;
 	mas->node = l_mas.node;
-	mas_wmb_replace(mas, old);
+	mas_wmb_replace(mas, old, height);
 	mtree_range_walk(mas);
 	return;
 }
@@ -3428,8 +3435,7 @@ static inline void mas_root_expand(struct ma_state *mas, void *entry)
 	if (mas->last != ULONG_MAX)
 		pivots[++slot] = ULONG_MAX;
 
-	mas->depth = 1;
-	mas_set_height(mas);
+	mt_set_height(mas->tree, 1);
 	ma_set_meta(node, maple_leaf_64, 0, slot);
 	/* swap the new root into the tree */
 	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
@@ -3673,8 +3679,7 @@ static inline void mas_new_root(struct ma_state *mas, void *entry)
 	WARN_ON_ONCE(mas->index || mas->last != ULONG_MAX);
 
 	if (!entry) {
-		mas->depth = 0;
-		mas_set_height(mas);
+		mt_set_height(mas->tree, 0);
 		rcu_assign_pointer(mas->tree->ma_root, entry);
 		mas->status = ma_start;
 		goto done;
@@ -3688,8 +3693,7 @@ static inline void mas_new_root(struct ma_state *mas, void *entry)
 	mas->status = ma_active;
 	rcu_assign_pointer(slots[0], entry);
 	pivots[0] = mas->last;
-	mas->depth = 1;
-	mas_set_height(mas);
+	mt_set_height(mas->tree, 1);
 	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
 
 done:
@@ -3808,6 +3812,7 @@ static inline void mas_wr_node_store(struct ma_wr_state *wr_mas,
 	struct maple_node reuse, *newnode;
 	unsigned char copy_size, node_pivots = mt_pivots[wr_mas->type];
 	bool in_rcu = mt_in_rcu(mas->tree);
+	unsigned char height = mas_mt_height(mas);
 
 	if (mas->last == wr_mas->end_piv)
 		offset_end++; /* don't copy this offset */
@@ -3864,7 +3869,7 @@ static inline void mas_wr_node_store(struct ma_wr_state *wr_mas,
 		struct maple_enode *old_enode = mas->node;
 
 		mas->node = mt_mk_node(newnode, wr_mas->type);
-		mas_replace_node(mas, old_enode);
+		mas_replace_node(mas, old_enode, height);
 	} else {
 		memcpy(wr_mas->node, newnode, sizeof(struct maple_node));
 	}
-- 
2.43.0




* [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
  2025-02-21 16:36 ` [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
  2025-02-21 16:36 ` [PATCH v2 2/6] maple_tree: use height and depth consistently Sidhartha Kumar
@ 2025-02-21 16:36 ` Sidhartha Kumar
  2025-02-25 15:46   ` Liam R. Howlett
  2025-02-21 16:36 ` [PATCH v2 4/6] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

In order to determine the store type for a maple tree operation, a walk
of the tree is done through mas_wr_walk(). This function descends the
tree until a spanning write is detected or we reach a leaf node. While
descending, keep track of the height at which we encounter a node with
available space. This is done by checking if mas->end is less than the
number of slots a given node type can fit.

Now that the height of the vacant node is tracked, we can use the
difference between the height of the tree and the height of the vacant
node to know how many levels we will have to propagate creating new
nodes. Update mas_prealloc_calc() to consider the vacant height and
reduce the number of worst-case allocations.

Rebalancing and spanning stores are not supported and fall back to using
the full height of the tree for allocations.

Update preallocation testing assertions to take into account vacant
height.

Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
 include/linux/maple_tree.h       |   2 +
 lib/maple_tree.c                 |  13 ++--
 tools/testing/radix-tree/maple.c | 102 ++++++++++++++++++++++++++++---
 3 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index cbbcd18d4186..7d777aa2d9ed 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -463,6 +463,7 @@ struct ma_wr_state {
 	void __rcu **slots;		/* mas->node->slots pointer */
 	void *entry;			/* The entry to write */
 	void *content;			/* The existing entry that is being overwritten */
+	unsigned char vacant_height;	/* Height of lowest node with free space */
 };
 
 #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
@@ -498,6 +499,7 @@ struct ma_wr_state {
 		.mas = ma_state,					\
 		.content = NULL,					\
 		.entry = wr_entry,					\
+		.vacant_height = 0					\
 	}
 
 #define MA_TOPIARY(name, tree)						\
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index d7dac3119748..ef71af0529f4 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3542,6 +3542,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
 		if (ma_is_leaf(wr_mas->type))
 			return true;
 
+		if (mas->end < mt_slots[wr_mas->type] - 1)
+			wr_mas->vacant_height = mas->depth + 1;
+
 		mas_wr_walk_traverse(wr_mas);
 	}
 
@@ -4157,7 +4160,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
 static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
 {
 	struct ma_state *mas = wr_mas->mas;
-	int ret = mas_mt_height(mas) * 3 + 1;
+	unsigned char height = mas_mt_height(mas);
+	int ret = height * 3 + 1;
+	unsigned char delta = height - wr_mas->vacant_height;
 
 	switch (mas->store_type) {
 	case wr_invalid:
@@ -4175,13 +4180,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
 			ret = 0;
 		break;
 	case wr_spanning_store:
-		ret =  mas_mt_height(mas) * 3 + 1;
+		ret = height * 3 + 1;
 		break;
 	case wr_split_store:
-		ret =  mas_mt_height(mas) * 2 + 1;
+		ret = delta * 2 + 1;
 		break;
 	case wr_rebalance:
-		ret =  mas_mt_height(mas) * 2 - 1;
+		ret = height * 2 + 1;
 		break;
 	case wr_node_store:
 		ret = mt_in_rcu(mas->tree) ? 1 : 0;
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index bc30050227fd..d22c1008dffe 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -35475,12 +35475,85 @@ static void check_dfs_preorder(struct maple_tree *mt)
 }
 /* End of depth first search tests */
 
+/* same implementation as mas_is_span_wr() in lib/maple_tree.c */
+static bool is_span_wr(struct ma_state *mas, unsigned long r_max,
+				enum maple_type type, void *entry)
+{
+	unsigned long max = r_max;
+	unsigned long last = mas->last;
+
+	/* Contained in this pivot, fast path */
+	if (last < max)
+		return false;
+
+	if (ma_is_leaf(type)) {
+		max = mas->max;
+		if (last < max)
+			return false;
+	}
+
+	if (last == max) {
+		/*
+		 * The last entry of leaf node cannot be NULL unless it is the
+		 * rightmost node (writing ULONG_MAX), otherwise it spans slots.
+		 */
+		if (entry || last == ULONG_MAX)
+			return false;
+	}
+
+	return true;
+}
+
+/* get height of the lowest non-leaf node with free space */
+static unsigned char get_vacant_height(struct ma_state *mas, void *entry)
+{
+	char vacant_height = 0;
+	enum maple_type type;
+	unsigned long *pivots;
+	unsigned long min = 0;
+	unsigned long max = ULONG_MAX;
+
+	/* start traversal */
+	mas_reset(mas);
+	mas_start(mas);
+	if (!xa_is_node(mas_root(mas)))
+		return 0;
+
+	type = mte_node_type(mas->node);
+	while (!ma_is_leaf(type)) {
+		mas_node_walk(mas, mte_to_node(mas->node), type, &min, &max);
+		mas->end = mas_data_end(mas);
+		pivots = ma_pivots(mte_to_node(mas->node), type);
+
+		if (pivots) {
+			if (mas->offset)
+				min = pivots[mas->offset - 1];
+			if (mas->offset < mas->end)
+				max = pivots[mas->offset];
+		}
+
+		/* detect spanning write */
+		if (is_span_wr(mas, max, type, entry))
+			break;
+
+		if (mas->end < mt_slot_count(mas->node) - 1)
+			vacant_height = mas->depth + 1;
+
+		mas_descend(mas);
+		type = mte_node_type(mas->node);
+		mas->depth++;
+	}
+
+	return vacant_height;
+}
+
 /* Preallocation testing */
 static noinline void __init check_prealloc(struct maple_tree *mt)
 {
 	unsigned long i, max = 100;
 	unsigned long allocated;
 	unsigned char height;
+	unsigned char vacant_height;
 	struct maple_node *mn;
 	void *ptr = check_prealloc;
 	MA_STATE(mas, mt, 10, 20);
@@ -35494,8 +35567,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
+	vacant_height = get_vacant_height(&mas, ptr);
 	MT_BUG_ON(mt, allocated == 0);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
 	mas_destroy(&mas);
 	allocated = mas_allocated(&mas);
 	MT_BUG_ON(mt, allocated != 0);
@@ -35503,8 +35577,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
+	vacant_height = get_vacant_height(&mas, ptr);
 	MT_BUG_ON(mt, allocated == 0);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	mas_destroy(&mas);
 	allocated = mas_allocated(&mas);
@@ -35514,7 +35589,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	vacant_height = get_vacant_height(&mas, ptr);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
 	mn->parent = ma_parent_ptr(mn);
@@ -35527,7 +35603,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	vacant_height = get_vacant_height(&mas, ptr);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
@@ -35540,7 +35617,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	vacant_height = get_vacant_height(&mas, ptr);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
 	mas_push_node(&mas, mn);
@@ -35553,7 +35631,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	vacant_height = get_vacant_height(&mas, ptr);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
 	mas_store_prealloc(&mas, ptr);
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 
@@ -35578,7 +35657,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
-	MT_BUG_ON(mt, allocated != 1 + height * 2);
+	vacant_height = get_vacant_height(&mas, ptr);
+	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 2);
 	mas_store_prealloc(&mas, ptr);
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 	mt_set_non_kernel(1);
@@ -35595,8 +35675,14 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
 	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
 	allocated = mas_allocated(&mas);
 	height = mas_mt_height(&mas);
+	vacant_height = get_vacant_height(&mas, ptr);
 	MT_BUG_ON(mt, allocated == 0);
-	MT_BUG_ON(mt, allocated != 1 + height * 3);
+	/*
+	 * vacant height cannot be used to compute the number of nodes needed
+	 * as the root contains two entries which means it is on the verge of
+	 * insufficiency. The worst case full height of the tree is needed.
+	 */
+	MT_BUG_ON(mt, allocated != height * 3 + 1);
 	mas_store_prealloc(&mas, ptr);
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 	mas_set_range(&mas, 0, 200);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 4/6] maple_tree: break on convergence in mas_spanning_rebalance()
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
                   ` (2 preceding siblings ...)
  2025-02-21 16:36 ` [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
@ 2025-02-21 16:36 ` Sidhartha Kumar
  2025-02-25 15:47   ` Liam R. Howlett
  2025-02-21 16:36 ` [PATCH v2 5/6] maple_tree: add sufficient height Sidhartha Kumar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

This adds support for using the vacant height to calculate the worst
case number of nodes needed for the wr_rebalance operation.
mas_spanning_rebalance() was seen to perform unnecessary node allocations.
We can reduce allocations by breaking early during the rebalancing loop
once we realize that we have ascended to a common ancestor.

Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
 lib/maple_tree.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index ef71af0529f4..4de257003251 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -2896,11 +2896,24 @@ static void mas_spanning_rebalance(struct ma_state *mas,
 		mast_combine_cp_right(mast);
 		mast->orig_l->last = mast->orig_l->max;
 
-		if (mast_sufficient(mast))
-			continue;
+		if (mast_sufficient(mast)) {
+			if (mast_overflow(mast))
+				continue;
+
+			if (mast->orig_l->node == mast->orig_r->node) {
+			       /*
+				* The data in b_node should be stored in one
+				* node and in the tree
+				*/
+				slot = mast->l->offset;
+				/* May be a new root stored in mast->bn */
+				if (mas_is_root_limits(mast->orig_l))
+					new_height++;
+				break;
+			}
 
-		if (mast_overflow(mast))
 			continue;
+		}
 
 		/* May be a new root stored in mast->bn */
 		if (mas_is_root_limits(mast->orig_l)) {
-- 
2.43.0




* [PATCH v2 5/6] maple_tree: add sufficient height
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
                   ` (3 preceding siblings ...)
  2025-02-21 16:36 ` [PATCH v2 4/6] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
@ 2025-02-21 16:36 ` Sidhartha Kumar
  2025-02-25 16:02   ` Liam R. Howlett
  2025-02-21 16:36 ` [PATCH v2 6/6] maple_tree: reorder mas->store_type case statements Sidhartha Kumar
  2025-02-25 15:40 ` [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Liam R. Howlett
  6 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

If a parent node is vacant but holds mt_min_slots + 1 entries,
rebalancing with a leaf node could cause this parent node to become
insufficient. This will lead to another level of rebalancing in the tree
and requires more node allocations. Therefore, we also have to track the
level at which there is a node with > mt_min_slots entries. We can use
this as the worst case for the spanning and rebalancing stores.

Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
 include/linux/maple_tree.h       |  4 +++-
 lib/maple_tree.c                 | 17 +++++++++++++++--
 tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 7d777aa2d9ed..37dc9525dff6 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -464,6 +464,7 @@ struct ma_wr_state {
 	void *entry;			/* The entry to write */
 	void *content;			/* The existing entry that is being overwritten */
 	unsigned char vacant_height;	/* Depth of lowest node with free space */
+	unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
 };
 
 #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
@@ -499,7 +500,8 @@ struct ma_wr_state {
 		.mas = ma_state,					\
 		.content = NULL,					\
 		.entry = wr_entry,					\
-		.vacant_height = 0					\
+		.vacant_height = 0,					\
+		.sufficient_height = 0					\
 	}
 
 #define MA_TOPIARY(name, tree)						\
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4de257003251..8fdd3f477198 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3558,6 +3558,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
 		if (mas->end < mt_slots[wr_mas->type] - 1)
 			wr_mas->vacant_height = mas->depth + 1;
 
+		if (ma_is_root(mas_mn(mas))) {
+			/* root needs more than 2 entries to be sufficient + 1 */
+			if (mas->end > 2)
+				wr_mas->sufficient_height = 1;
+		} else if (mas->end > mt_min_slots[wr_mas->type] + 1)
+			wr_mas->sufficient_height = mas->depth + 1;
+
 		mas_wr_walk_traverse(wr_mas);
 	}
 
@@ -4193,13 +4200,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
 			ret = 0;
 		break;
 	case wr_spanning_store:
-		ret = height * 3 + 1;
+		if (wr_mas->sufficient_height < wr_mas->vacant_height)
+			ret = (height - wr_mas->sufficient_height) * 3 + 1;
+		else
+			ret = delta * 3 + 1;
 		break;
 	case wr_split_store:
 		ret = delta * 2 + 1;
 		break;
 	case wr_rebalance:
-		ret = height * 2 + 1;
+		if (wr_mas->sufficient_height < wr_mas->vacant_height)
+			ret = (height - wr_mas->sufficient_height) * 2 + 1;
+		else
+			ret = delta * 2 + 1;
 		break;
 	case wr_node_store:
 		ret = mt_in_rcu(mas->tree) ? 1 : 0;
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index d22c1008dffe..d40f70671cb8 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -36334,6 +36334,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
 
 extern void test_kmem_cache_bulk(void);
 
+/*
+ * Test to check the path of a spanning rebalance which results in
+ * a collapse where the rebalancing of the child node leads to
+ * insufficiency in the parent node.
+ */
+static void check_collapsing_rebalance(struct maple_tree *mt)
+{
+	int i = 0;
+	MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
+
+	/* create a height 4 tree */
+	while (mt_height(mt) < 4) {
+		mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
+		i += 9;
+	}
+
+	/* delete all entries one at a time, starting from the right */
+	do {
+		mas_erase(&mas);
+	} while (mas_prev(&mas, 0) != NULL);
+
+	mtree_unlock(mt);
+}
+
 /* callback function used for check_nomem_writer_race() */
 static void writer2(void *maple_tree)
 {
@@ -36500,6 +36524,10 @@ void farmer_tests(void)
 	check_spanning_write(&tree);
 	mtree_destroy(&tree);
 
+	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+	check_collapsing_rebalance(&tree);
+	mtree_destroy(&tree);
+
 	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
 	check_null_expand(&tree);
 	mtree_destroy(&tree);
-- 
2.43.0




* [PATCH v2 6/6] maple_tree: reorder mas->store_type case statements
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
                   ` (4 preceding siblings ...)
  2025-02-21 16:36 ` [PATCH v2 5/6] maple_tree: add sufficient height Sidhartha Kumar
@ 2025-02-21 16:36 ` Sidhartha Kumar
  2025-02-25 16:03   ` Liam R. Howlett
  2025-02-25 15:40 ` [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Liam R. Howlett
  6 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-21 16:36 UTC (permalink / raw)
  To: linux-kernel, maple-tree
  Cc: linux-mm, akpm, liam.howlett, richard.weiyang, Sidhartha Kumar

Move the unlikely case that mas->store_type is invalid to be the last
evaluated case and put likelier cases higher up.

Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
 lib/maple_tree.c | 51 ++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 8fdd3f477198..36d7d7a27e32 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4091,15 +4091,6 @@ static inline void mas_wr_store_entry(struct ma_wr_state *wr_mas)
 	unsigned char new_end = mas_wr_new_end(wr_mas);
 
 	switch (mas->store_type) {
-	case wr_invalid:
-		MT_BUG_ON(mas->tree, 1);
-		return;
-	case wr_new_root:
-		mas_new_root(mas, wr_mas->entry);
-		break;
-	case wr_store_root:
-		mas_store_root(mas, wr_mas->entry);
-		break;
 	case wr_exact_fit:
 		rcu_assign_pointer(wr_mas->slots[mas->offset], wr_mas->entry);
 		if (!!wr_mas->entry ^ !!wr_mas->content)
@@ -4121,6 +4112,14 @@ static inline void mas_wr_store_entry(struct ma_wr_state *wr_mas)
 	case wr_rebalance:
 		mas_wr_bnode(wr_mas);
 		break;
+	case wr_new_root:
+		mas_new_root(mas, wr_mas->entry);
+		break;
+	case wr_store_root:
+		mas_store_root(mas, wr_mas->entry);
+		break;
+	case wr_invalid:
+		MT_BUG_ON(mas->tree, 1);
 	}
 
 	return;
@@ -4185,19 +4184,10 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
 	unsigned char delta = height - wr_mas->vacant_height;
 
 	switch (mas->store_type) {
-	case wr_invalid:
-		WARN_ON_ONCE(1);
-		break;
-	case wr_new_root:
-		ret = 1;
-		break;
-	case wr_store_root:
-		if (likely((mas->last != 0) || (mas->index != 0)))
-			ret = 1;
-		else if (((unsigned long) (entry) & 3) == 2)
-			ret = 1;
-		else
-			ret = 0;
+	case wr_exact_fit:
+	case wr_append:
+	case wr_slot_store:
+		ret = 0;
 		break;
 	case wr_spanning_store:
 		if (wr_mas->sufficient_height < wr_mas->vacant_height)
@@ -4217,10 +4207,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
 	case wr_node_store:
 		ret = mt_in_rcu(mas->tree) ? 1 : 0;
 		break;
-	case wr_append:
-	case wr_exact_fit:
-	case wr_slot_store:
-		ret = 0;
+	case wr_new_root:
+		ret = 1;
+		break;
+	case wr_store_root:
+		if (likely((mas->last != 0) || (mas->index != 0)))
+			ret = 1;
+		else if (((unsigned long) (entry) & 3) == 2)
+			ret = 1;
+		else
+			ret = 0;
+		break;
+	case wr_invalid:
+		WARN_ON_ONCE(1);
 	}
 
 	return ret;
-- 
2.43.0




* Re: [PATCH v2 2/6] maple_tree: use height and depth consistently
  2025-02-21 16:36 ` [PATCH v2 2/6] maple_tree: use height and depth consistently Sidhartha Kumar
@ 2025-02-25 15:39   ` Liam R. Howlett
  0 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 15:39 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> For the maple tree, the root node is defined to have a depth of 0 with a
> height of 1. Each level down from the root, these values are incremented
> by 1. Various code paths define a root with depth 1 which is inconsistent
> with the definition. Modify the code to be consistent with this
> definition.
> 
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> ---
>  lib/maple_tree.c | 85 +++++++++++++++++++++++++-----------------------
>  1 file changed, 45 insertions(+), 40 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 0410e13a099e..d7dac3119748 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -211,14 +211,14 @@ static void ma_free_rcu(struct maple_node *node)
>  	call_rcu(&node->rcu, mt_free_rcu);
>  }
>  
> -static void mas_set_height(struct ma_state *mas)
> +static void mt_set_height(struct maple_tree *mt, unsigned char height)
>  {
> -	unsigned int new_flags = mas->tree->ma_flags;
> +	unsigned int new_flags = mt->ma_flags;
>  
>  	new_flags &= ~MT_FLAGS_HEIGHT_MASK;
> -	MAS_BUG_ON(mas, mas->depth > MAPLE_HEIGHT_MAX);
> -	new_flags |= mas->depth << MT_FLAGS_HEIGHT_OFFSET;
> -	mas->tree->ma_flags = new_flags;
> +	MT_BUG_ON(mt, height > MAPLE_HEIGHT_MAX);
> +	new_flags |= height << MT_FLAGS_HEIGHT_OFFSET;
> +	mt->ma_flags = new_flags;
>  }
>  
>  static unsigned int mas_mt_height(struct ma_state *mas)
> @@ -1375,7 +1375,7 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
>  		root = mas_root(mas);
>  		/* Tree with nodes */
>  		if (likely(xa_is_node(root))) {
> -			mas->depth = 1;
> +			mas->depth = 0;
>  			mas->status = ma_active;
>  			mas->node = mte_safe_root(root);
>  			mas->offset = 0;
> @@ -1716,9 +1716,10 @@ static inline void mas_adopt_children(struct ma_state *mas,
>   * node as dead.
>   * @mas: the maple state with the new node
>   * @old_enode: The old maple encoded node to replace.
> + * @new_height: if we are inserting a root node, update the height of the tree
>   */
>  static inline void mas_put_in_tree(struct ma_state *mas,
> -		struct maple_enode *old_enode)
> +		struct maple_enode *old_enode, char new_height)
>  	__must_hold(mas->tree->ma_lock)
>  {
>  	unsigned char offset;
> @@ -1727,7 +1728,7 @@ static inline void mas_put_in_tree(struct ma_state *mas,
>  	if (mte_is_root(mas->node)) {
>  		mas_mn(mas)->parent = ma_parent_ptr(mas_tree_parent(mas));
>  		rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
> -		mas_set_height(mas);
> +		mt_set_height(mas->tree, new_height);
>  	} else {
>  
>  		offset = mte_parent_slot(mas->node);
> @@ -1747,10 +1748,10 @@ static inline void mas_put_in_tree(struct ma_state *mas,
>   * @old_enode: The old maple encoded node.
>   */
>  static inline void mas_replace_node(struct ma_state *mas,
> -		struct maple_enode *old_enode)
> +		struct maple_enode *old_enode, unsigned char new_height)
>  	__must_hold(mas->tree->ma_lock)
>  {
> -	mas_put_in_tree(mas, old_enode);
> +	mas_put_in_tree(mas, old_enode, new_height);
>  	mas_free(mas, old_enode);
>  }
>  
> @@ -2543,7 +2544,7 @@ static inline void mas_topiary_node(struct ma_state *mas,
>   *
>   */
>  static inline void mas_topiary_replace(struct ma_state *mas,
> -		struct maple_enode *old_enode)
> +		struct maple_enode *old_enode, unsigned char new_height)
>  {
>  	struct ma_state tmp[3], tmp_next[3];
>  	MA_TOPIARY(subtrees, mas->tree);
> @@ -2551,7 +2552,7 @@ static inline void mas_topiary_replace(struct ma_state *mas,
>  	int i, n;
>  
>  	/* Place data in tree & then mark node as old */
> -	mas_put_in_tree(mas, old_enode);
> +	mas_put_in_tree(mas, old_enode, new_height);
>  
>  	/* Update the parent pointers in the tree */
>  	tmp[0] = *mas;
> @@ -2639,10 +2640,10 @@ static inline void mas_topiary_replace(struct ma_state *mas,
>   * Updates gap as necessary.
>   */
>  static inline void mas_wmb_replace(struct ma_state *mas,
> -		struct maple_enode *old_enode)
> +		struct maple_enode *old_enode, unsigned char new_height)
>  {
>  	/* Insert the new data in the tree */
> -	mas_topiary_replace(mas, old_enode);
> +	mas_topiary_replace(mas, old_enode, new_height);
>  
>  	if (mte_is_leaf(mas->node))
>  		return;
> @@ -2828,6 +2829,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
>  {
>  	unsigned char split, mid_split;
>  	unsigned char slot = 0;
> +	unsigned char new_height = 0; /* used if node is a new root */
>  	struct maple_enode *left = NULL, *middle = NULL, *right = NULL;
>  	struct maple_enode *old_enode;
>  
> @@ -2877,7 +2879,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
>  		 */
>  		memset(mast->bn, 0, sizeof(struct maple_big_node));
>  		mast->bn->type = mte_node_type(left);
> -		l_mas.depth++;
> +		new_height++;
>  
>  		/* Root already stored in l->node. */
>  		if (mas_is_root_limits(mast->l))
> @@ -2901,8 +2903,10 @@ static void mas_spanning_rebalance(struct ma_state *mas,
>  			continue;
>  
>  		/* May be a new root stored in mast->bn */
> -		if (mas_is_root_limits(mast->orig_l))
> +		if (mas_is_root_limits(mast->orig_l)) {
> +			new_height++;
>  			break;
> +		}
>  
>  		mast_spanning_rebalance(mast);
>  
> @@ -2913,7 +2917,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
>  
>  	l_mas.node = mt_mk_node(ma_mnode_ptr(mas_pop_node(mas)),
>  				mte_node_type(mast->orig_l->node));
> -	l_mas.depth++;
> +
>  	mab_mas_cp(mast->bn, 0, mt_slots[mast->bn->type] - 1, &l_mas, true);
>  	mas_set_parent(mas, left, l_mas.node, slot);
>  	if (middle)
> @@ -2937,7 +2941,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
>  	mas->min = l_mas.min;
>  	mas->max = l_mas.max;
>  	mas->offset = l_mas.offset;
> -	mas_wmb_replace(mas, old_enode);
> +	mas_wmb_replace(mas, old_enode, new_height);
>  	mtree_range_walk(mas);
>  	return;
>  }
> @@ -3013,6 +3017,7 @@ static inline void mas_destroy_rebalance(struct ma_state *mas, unsigned char end
>  	void __rcu **l_slots, **slots;
>  	unsigned long *l_pivs, *pivs, gap;
>  	bool in_rcu = mt_in_rcu(mas->tree);
> +	unsigned char new_height = mas_mt_height(mas);
>  
>  	MA_STATE(l_mas, mas->tree, mas->index, mas->last);
>  
> @@ -3107,7 +3112,7 @@ static inline void mas_destroy_rebalance(struct ma_state *mas, unsigned char end
>  	mas_ascend(mas);
>  
>  	if (in_rcu) {
> -		mas_replace_node(mas, old_eparent);
> +		mas_replace_node(mas, old_eparent, new_height);
>  		mas_adopt_children(mas, mas->node);
>  	}
>  
> @@ -3118,10 +3123,9 @@ static inline void mas_destroy_rebalance(struct ma_state *mas, unsigned char end
>   * mas_split_final_node() - Split the final node in a subtree operation.
>   * @mast: the maple subtree state
>   * @mas: The maple state
> - * @height: The height of the tree in case it's a new root.
>   */
>  static inline void mas_split_final_node(struct maple_subtree_state *mast,
> -					struct ma_state *mas, int height)
> +					struct ma_state *mas)
>  {
>  	struct maple_enode *ancestor;
>  
> @@ -3130,7 +3134,6 @@ static inline void mas_split_final_node(struct maple_subtree_state *mast,
>  			mast->bn->type = maple_arange_64;
>  		else
>  			mast->bn->type = maple_range_64;
> -		mas->depth = height;
>  	}
>  	/*
>  	 * Only a single node is used here, could be root.
> @@ -3153,8 +3156,7 @@ static inline void mas_split_final_node(struct maple_subtree_state *mast,
>   * @skip: The number of entries to skip for new nodes insertion.
>   */
>  static inline void mast_fill_bnode(struct maple_subtree_state *mast,
> -					 struct ma_state *mas,
> -					 unsigned char skip)
> +					 struct ma_state *mas, unsigned char skip, int *height)

Update the argument list in the comments?

>  {
>  	bool cp = true;
>  	unsigned char split;
> @@ -3226,7 +3228,7 @@ static inline void mast_split_data(struct maple_subtree_state *mast,
>   *
>   * Return: True if pushed, false otherwise.
>   */
> -static inline bool mas_push_data(struct ma_state *mas, int height,
> +static inline bool mas_push_data(struct ma_state *mas, int *height,
>  				 struct maple_subtree_state *mast, bool left)

Update the argument list in the comments?

>  {
>  	unsigned char slot_total = mast->bn->b_end;
> @@ -3283,8 +3285,8 @@ static inline bool mas_push_data(struct ma_state *mas, int height,
>  		mast->orig_l->offset += end + 1;
>  
>  	mast_split_data(mast, mas, split);
> -	mast_fill_bnode(mast, mas, 2);
> -	mas_split_final_node(mast, mas, height + 1);
> +	mast_fill_bnode(mast, mas, 2, height);
> +	mas_split_final_node(mast, mas);
>  	return true;
>  }
>  
> @@ -3297,6 +3299,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>  {
>  	struct maple_subtree_state mast;
>  	int height = 0;
> +	unsigned int orig_height = mas_mt_height(mas);
>  	unsigned char mid_split, split = 0;
>  	struct maple_enode *old;
>  
> @@ -3323,7 +3326,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>  	MA_STATE(prev_r_mas, mas->tree, mas->index, mas->last);
>  
>  	trace_ma_op(__func__, mas);
> -	mas->depth = mas_mt_height(mas);
> +	mas->depth = orig_height;

Why is this still needed?  It might be worth adding a comment or
removing it?

>  
>  	mast.l = &l_mas;
>  	mast.r = &r_mas;
> @@ -3333,7 +3336,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>  
>  	while (height++ <= mas->depth) {
>  		if (mt_slots[b_node->type] > b_node->b_end) {
> -			mas_split_final_node(&mast, mas, height);
> +			mas_split_final_node(&mast, mas);
>  			break;
>  		}
>  
> @@ -3348,11 +3351,15 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>  		 * is a significant savings.
>  		 */
>  		/* Try to push left. */
> -		if (mas_push_data(mas, height, &mast, true))
> +		if (mas_push_data(mas, &height, &mast, true)) {
> +			height++;
>  			break;
> +		}
>  		/* Try to push right. */
> -		if (mas_push_data(mas, height, &mast, false))
> +		if (mas_push_data(mas, &height, &mast, false)) {
> +			height++;
>  			break;
> +		}
>  
>  		split = mab_calc_split(mas, b_node, &mid_split);
>  		mast_split_data(&mast, mas, split);
> @@ -3361,7 +3368,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>  		 * r->max.
>  		 */
>  		mast.r->max = mas->max;
> -		mast_fill_bnode(&mast, mas, 1);
> +		mast_fill_bnode(&mast, mas, 1, &height);
>  		prev_l_mas = *mast.l;
>  		prev_r_mas = *mast.r;
>  	}
> @@ -3369,7 +3376,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>  	/* Set the original node as dead */
>  	old = mas->node;
>  	mas->node = l_mas.node;
> -	mas_wmb_replace(mas, old);
> +	mas_wmb_replace(mas, old, height);
>  	mtree_range_walk(mas);
>  	return;
>  }
> @@ -3428,8 +3435,7 @@ static inline void mas_root_expand(struct ma_state *mas, void *entry)
>  	if (mas->last != ULONG_MAX)
>  		pivots[++slot] = ULONG_MAX;
>  
> -	mas->depth = 1;
> -	mas_set_height(mas);
> +	mt_set_height(mas->tree, 1);
>  	ma_set_meta(node, maple_leaf_64, 0, slot);
>  	/* swap the new root into the tree */
>  	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
> @@ -3673,8 +3679,7 @@ static inline void mas_new_root(struct ma_state *mas, void *entry)
>  	WARN_ON_ONCE(mas->index || mas->last != ULONG_MAX);
>  
>  	if (!entry) {
> -		mas->depth = 0;
> -		mas_set_height(mas);
> +		mt_set_height(mas->tree, 0);
>  		rcu_assign_pointer(mas->tree->ma_root, entry);
>  		mas->status = ma_start;
>  		goto done;
> @@ -3688,8 +3693,7 @@ static inline void mas_new_root(struct ma_state *mas, void *entry)
>  	mas->status = ma_active;
>  	rcu_assign_pointer(slots[0], entry);
>  	pivots[0] = mas->last;
> -	mas->depth = 1;
> -	mas_set_height(mas);
> +	mt_set_height(mas->tree, 1);
>  	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
>  
>  done:
> @@ -3808,6 +3812,7 @@ static inline void mas_wr_node_store(struct ma_wr_state *wr_mas,
>  	struct maple_node reuse, *newnode;
>  	unsigned char copy_size, node_pivots = mt_pivots[wr_mas->type];
>  	bool in_rcu = mt_in_rcu(mas->tree);
> +	unsigned char height = mas_mt_height(mas);
>  
>  	if (mas->last == wr_mas->end_piv)
>  		offset_end++; /* don't copy this offset */
> @@ -3864,7 +3869,7 @@ static inline void mas_wr_node_store(struct ma_wr_state *wr_mas,
>  		struct maple_enode *old_enode = mas->node;
>  
>  		mas->node = mt_mk_node(newnode, wr_mas->type);
> -		mas_replace_node(mas, old_enode);
> +		mas_replace_node(mas, old_enode, height);
>  	} else {
>  		memcpy(wr_mas->node, newnode, sizeof(struct maple_node));
>  	}
> -- 
> 2.43.0
> 



* Re: [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts
  2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
                   ` (5 preceding siblings ...)
  2025-02-21 16:36 ` [PATCH v2 6/6] maple_tree: reorder mas->store_type case statements Sidhartha Kumar
@ 2025-02-25 15:40 ` Liam R. Howlett
  6 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 15:40 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> v1[1] -> v2:
>   - fix comment for vacant_height which refers to depth per Wei 
> 
>   - add a patch to reorder switch case statements in mas_prealloc_calc and
>     mas_wr_store_entry
> 
>   - use sufficient height in spanning stores
> 
>   - modify patch 2 to use a counter to track ascending the tree rather
>     than overloading mas->depth to have this function.
> 
> ================ overview ========================
> Currently, the maple tree preallocates the worst case number of nodes for
> given store type by taking into account the whole height of the tree. This
> comes from a worst case scenario of every node in the tree being full and
> having to propagate node allocation upwards until we reach the root of the
> tree. This can be optimized if there are vacancies in nodes that are at a
> lower depth than the root node. This series implements tracking the level
> at which there is a vacant node so we only need to allocate until this
> level is reached, rather than always using the full height of the tree.
> The ma_wr_state struct is modified to add a field which keeps track of the
> vacant height and is updated during walks of the tree. This value is then
> read in mas_prealloc_calc() when we decide how many nodes to allocate.
> 
> For rebalancing and spanning stores, we also need to track the lowest
> height at which a node has 1 more entry than the minimum sufficient number
> of entries. This is because rebalancing can cause a parent node to become
> insufficient which results in further node allocations. In this case, we
> need to use the sufficient height as the worst case rather than the vacant
> height.
> 
> patch 1-2: preparatory patches
> patch 3: implement vacant height tracking + update the tests
> patch 4: support vacant height tracking for rebalacning writes
                                              ^^^^^^^^^^^- Typo
> patch 5: implement sufficient height tracking
> patch 6: reorder switch case statements
> 
> ================ results =========================
> Bpftrace was used to profile the allocation path for requesting new maple
> nodes while running stress-ng mmap 120s. The histograms below represent
> requests to kmem_cache_alloc_bulk() and show the count argument. This
> represents how many maple nodes the caller is requesting in
> kmem_cache_alloc_bulk()
> 
> command: stress-ng --mmap 4 --timeout 120
> 
> mm-unstable 
> 
> @bulk_alloc_req:
> [3, 4)                 4 |                                                    |
> [4, 5)             54170 |@                                                   |
> [5, 6)                 0 |                                                    |
> [6, 7)            893057 |@@@@@@@@@@@@@@@@@@@@                                |
> [7, 8)                 4 |                                                    |
> [8, 9)           2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [9, 10)            55811 |@                                                   |
> [10, 11)           77834 |@                                                   |
> [11, 12)               0 |                                                    |
> [12, 13)         1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
> [13, 14)               0 |                                                    |
> [14, 15)               0 |                                                    |
> [15, 16)          367197 |@@@@@@@@                                            |
> 
> 
> @maple_node_total: 46,630,160
> @total_vmas: 46184591
> 
> mm-unstable + this series
> 
> @bulk_alloc_req:
> [2, 3)               198 |                                                    |
> [3, 4)                 4 |                                                    |
> [4, 5)                43 |                                                    |
> [5, 6)                 0 |                                                    |
> [6, 7)           1069503 |@@@@@@@@@@@@@@@@@@@@@                               |
> [7, 8)                 4 |                                                    |
> [8, 9)           2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [9, 10)           472191 |@@@@@@@@@                                           |
> [10, 11)          191904 |@@@                                                 |
> [11, 12)               0 |                                                    |
> [12, 13)          247316 |@@@@                                                |
> [13, 14)               0 |                                                    |
> [14, 15)               0 |                                                    |
> [15, 16)           98769 |@                                                   |
> 
> 
> @maple_node_total: 37,813,856
> @total_vmas: 43493287
> 
> This represents a ~19% reduction in the number of bulk maple nodes allocated.
> 
> For more reproducible results, a histogram of the return value of
> mas_prealloc_calc() is displayed while running the maple_tree_tests
> which have a deterministic store pattern
> 
> mas_prealloc_calc() return value mm-unstable
> 1   :  							 (12068)
> 3   :  							 (11836)
> 5   : ***** 						 (271192)
> 7   : ************************************************** (2329329)
> 9   : *********** 					 (534186)
> 10  :  							 (435)
> 11  : *************** 					 (704306)
> 13  : ******** 						 (409781)
> 
> mas_prealloc_calc() return value mm-unstable + this series
> 1   : 							 (12070)
> 3   : ************************************************** (3548777)
> 5   : ********                                           (633458)
> 7   :  							 (65081)
> 9   :  							 (11224)
> 10  :  							 (341)
> 11  :  							 (2973)
> 13  :  							 (68)
> 
> do_mmap latency was also measured for regressions:
> command: stress-ng --mmap 4 --timeout 120
> 
> mm-unstable:
> avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034
> 
> mm-unstable + this series:
> avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726
> 
> 
> [1]: https://lore.kernel.org/lkml/20241114170524.64391-1-sidhartha.kumar@oracle.com/T/
> 
> Sidhartha Kumar (6):
>   maple_tree: convert mas_prealloc_calc() to take in a maple write state
>   maple_tree: use height and depth consistently
>   maple_tree: use vacant nodes to reduce worst case allocations
>   maple_tree: break on convergence in mas_spanning_rebalance()
>   maple_tree: add sufficient height
>   maple_tree: reorder mas->store_type case statements
> 
>  include/linux/maple_tree.h       |   4 +
>  lib/maple_tree.c                 | 193 ++++++++++++++++++-------------
>  tools/testing/radix-tree/maple.c | 130 +++++++++++++++++++--
>  3 files changed, 240 insertions(+), 87 deletions(-)
> 
> -- 
> 2.43.0
> 



* Re: [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state
  2025-02-21 16:36 ` [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
@ 2025-02-25 15:40   ` Liam R. Howlett
  0 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 15:40 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> In a subsequent patch, mas_prealloc_calc() will need to access fields only
> in the ma_wr_state. Convert the function to take in a ma_wr_state and
> modify all callers. There is no functional change.
> 
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

> ---
>  lib/maple_tree.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 42c65974a56c..0410e13a099e 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -4144,13 +4144,14 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
>  /**
>   * mas_prealloc_calc() - Calculate number of nodes needed for a
>   * given store oepration
> - * @mas: The maple state
> + * @wr_mas: The maple write state
>   * @entry: The entry to store into the tree
>   *
>   * Return: Number of nodes required for preallocation.
>   */
> -static inline int mas_prealloc_calc(struct ma_state *mas, void *entry)
> +static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>  {
> +	struct ma_state *mas = wr_mas->mas;
>  	int ret = mas_mt_height(mas) * 3 + 1;
>  
>  	switch (mas->store_type) {
> @@ -4247,16 +4248,15 @@ static inline enum store_type mas_wr_store_type(struct ma_wr_state *wr_mas)
>   */
>  static inline void mas_wr_preallocate(struct ma_wr_state *wr_mas, void *entry)
>  {
> -	struct ma_state *mas = wr_mas->mas;
>  	int request;
>  
>  	mas_wr_prealloc_setup(wr_mas);
> -	mas->store_type = mas_wr_store_type(wr_mas);
> -	request = mas_prealloc_calc(mas, entry);
> +	wr_mas->mas->store_type = mas_wr_store_type(wr_mas);
> +	request = mas_prealloc_calc(wr_mas, entry);
>  	if (!request)
>  		return;
>  
> -	mas_node_count(mas, request);
> +	mas_node_count(wr_mas->mas, request);
>  }
>  
>  /**
> @@ -5401,7 +5401,7 @@ void *mas_store(struct ma_state *mas, void *entry)
>  		return wr_mas.content;
>  	}
>  
> -	request = mas_prealloc_calc(mas, entry);
> +	request = mas_prealloc_calc(&wr_mas, entry);
>  	if (!request)
>  		goto store;
>  
> @@ -5498,7 +5498,7 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
>  
>  	mas_wr_prealloc_setup(&wr_mas);
>  	mas->store_type = mas_wr_store_type(&wr_mas);
> -	request = mas_prealloc_calc(mas, entry);
> +	request = mas_prealloc_calc(&wr_mas, entry);
>  	if (!request)
>  		return ret;
>  
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations
  2025-02-21 16:36 ` [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
@ 2025-02-25 15:46   ` Liam R. Howlett
  2025-02-25 16:22     ` Sidhartha Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 15:46 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> In order to determine the store type for a maple tree operation, a walk
> of the tree is done through mas_wr_walk(). This function descends the
> tree until a spanning write is detected or we reach a leaf node. While
> descending, keep track of the height at which we encounter a node with
> available space. This is done by checking if mas->end is less than the
> number of slots a given node type can fit.
> 
> Now that the height of the vacant node is tracked, we can use the
> difference between the height of the tree and the height of the vacant
> node to know how many levels we will have to propagate creating new
> nodes. Update mas_prealloc_calc() to consider the vacant height and
> reduce the number of worst-case allocations.
> 
> Rebalancing and spanning stores are not supported and fall back to using
> the full height of the tree for allocations.
> 
> Update preallocation testing assertions to take into account vacant
> height.
> 
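As an illustration of the delta-based calculation described above, the split-store rule can be modeled as a small standalone sketch. The function name here is hypothetical and this is not the kernel's mas_prealloc_calc(); it only mirrors the arithmetic, assuming vacant_height == 0 means no vacancy was found so the full tree height is used:

```c
#include <assert.h>

/*
 * Hypothetical standalone model of the split-store preallocation rule
 * in this patch; not the kernel implementation. vacant_height counts
 * from the root (root = 1), and 0 means no vacant node was found on
 * the walk, so delta degenerates to the full tree height.
 */
int split_prealloc_calc(unsigned char height, unsigned char vacant_height)
{
	/* levels that may gain a node: from the leaves up to the vacancy */
	unsigned char delta = height - vacant_height;

	/* a split can create two new nodes per propagated level, plus one */
	return delta * 2 + 1;
}
```

For a height-4 tree with a vacant node at height 2, only 2 * 2 + 1 = 5 nodes are reserved instead of the former worst case of 4 * 2 + 1 = 9.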
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> ---
>  include/linux/maple_tree.h       |   2 +
>  lib/maple_tree.c                 |  13 ++--
>  tools/testing/radix-tree/maple.c | 102 ++++++++++++++++++++++++++++---
>  3 files changed, 105 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index cbbcd18d4186..7d777aa2d9ed 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -463,6 +463,7 @@ struct ma_wr_state {
>  	void __rcu **slots;		/* mas->node->slots pointer */
>  	void *entry;			/* The entry to write */
>  	void *content;			/* The existing entry that is being overwritten */
> +	unsigned char vacant_height;	/* Height of lowest node with free space */
>  };
>  
>  #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
> @@ -498,6 +499,7 @@ struct ma_wr_state {
>  		.mas = ma_state,					\
>  		.content = NULL,					\
>  		.entry = wr_entry,					\
> +		.vacant_height = 0					\
>  	}
>  
>  #define MA_TOPIARY(name, tree)						\
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index d7dac3119748..ef71af0529f4 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -3542,6 +3542,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>  		if (ma_is_leaf(wr_mas->type))
>  			return true;
>  
> +		if (mas->end < mt_slots[wr_mas->type] - 1)
> +			wr_mas->vacant_height = mas->depth + 1;
> +
>  		mas_wr_walk_traverse(wr_mas);
>  	}
>  
> @@ -4157,7 +4160,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
>  static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>  {
>  	struct ma_state *mas = wr_mas->mas;
> -	int ret = mas_mt_height(mas) * 3 + 1;
> +	unsigned char height = mas_mt_height(mas);
> +	int ret = height * 3 + 1;
> +	unsigned char delta = height - wr_mas->vacant_height;
>  
>  	switch (mas->store_type) {
>  	case wr_invalid:
> @@ -4175,13 +4180,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>  			ret = 0;
>  		break;
>  	case wr_spanning_store:
> -		ret =  mas_mt_height(mas) * 3 + 1;
> +		ret = height * 3 + 1;

ret is already height * 3 + 1, you could add a WARN_ON_ONCE or such to
ensure that's the case here and it's not missed in updates to the code.

>  		break;
>  	case wr_split_store:
> -		ret =  mas_mt_height(mas) * 2 + 1;
> +		ret = delta * 2 + 1;
>  		break;
>  	case wr_rebalance:
> -		ret =  mas_mt_height(mas) * 2 - 1;
> +		ret = height * 2 + 1;
>  		break;
>  	case wr_node_store:
>  		ret = mt_in_rcu(mas->tree) ? 1 : 0;
> diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
> index bc30050227fd..d22c1008dffe 100644
> --- a/tools/testing/radix-tree/maple.c
> +++ b/tools/testing/radix-tree/maple.c
> @@ -35475,12 +35475,85 @@ static void check_dfs_preorder(struct maple_tree *mt)
>  }
>  /* End of depth first search tests */
>  
> +/* same implementation as mas_is_span_wr() in lib/maple_tree.c */

Is this needed then?  At the start of this file we have:
#include "../../../lib/maple_tree.c" so I would think you could use the
one already defined?

> +static bool is_span_wr(struct ma_state *mas, unsigned long r_max,
> +				enum maple_type type, void *entry)
> +{
> +	unsigned long max = r_max;
> +	unsigned long last = mas->last;
> +
> +	/* Contained in this pivot, fast path */
> +	if (last < max)
> +		return false;
> +
> +	if (ma_is_leaf(type)) {
> +		max = mas->max;
> +		if (last < max)
> +			return false;
> +	}
> +
> +	if (last == max) {
> +		/*
> +		 * The last entry of leaf node cannot be NULL unless it is the
> +		 * rightmost node (writing ULONG_MAX), otherwise it spans slots.
> +		 */
> +		if (entry || last == ULONG_MAX)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/* get height of the lowest non-leaf node with free space */
> +static unsigned char get_vacant_height(struct ma_state *mas, void *entry)
> +{
> +	char vacant_height = 0;
> +	enum maple_type type;
> +	unsigned long *pivots;
> +	unsigned long min = 0;
> +	unsigned long max = ULONG_MAX;
> +
> +	/* start traversal */
> +	mas_reset(mas);
> +	mas_start(mas);
> +	if (!xa_is_node(mas_root(mas)))
> +		return 0;
> +
> +	type = mte_node_type(mas->node);
> +	while (!ma_is_leaf(type)) {
> +		mas_node_walk(mas, mte_to_node(mas->node), type, &min, &max);
> +		mas->end = mas_data_end(mas);
> +		pivots = ma_pivots(mte_to_node(mas->node), type);
> +
> +		if (pivots) {
> +			if (mas->offset)
> +				min = pivots[mas->offset - 1];
> +			if (mas->offset < mas->end)
> +				max = pivots[mas->offset];
> +		}
> +
> +		/* detect spanning write */
> +		if (is_span_wr(mas, max, type, entry))
> +			break;
> +
> +		if (mas->end < mt_slot_count(mas->node) - 1)
> +			vacant_height = mas->depth + 1;
> +
> +		mas_descend(mas);
> +		type = mte_node_type(mas->node);
> +		mas->depth++;
> +	}
> +
> +	return vacant_height;
> +}
> +
>  /* Preallocation testing */
>  static noinline void __init check_prealloc(struct maple_tree *mt)
>  {
>  	unsigned long i, max = 100;
>  	unsigned long allocated;
>  	unsigned char height;
> +	unsigned char vacant_height;
>  	struct maple_node *mn;
>  	void *ptr = check_prealloc;
>  	MA_STATE(mas, mt, 10, 20);
> @@ -35494,8 +35567,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> +	vacant_height = get_vacant_height(&mas, ptr);
>  	MT_BUG_ON(mt, allocated == 0);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>  	mas_destroy(&mas);
>  	allocated = mas_allocated(&mas);
>  	MT_BUG_ON(mt, allocated != 0);
> @@ -35503,8 +35577,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> +	vacant_height = get_vacant_height(&mas, ptr);
>  	MT_BUG_ON(mt, allocated == 0);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	mas_destroy(&mas);
>  	allocated = mas_allocated(&mas);
> @@ -35514,7 +35589,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	vacant_height = get_vacant_height(&mas, ptr);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>  	mn = mas_pop_node(&mas);
>  	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
>  	mn->parent = ma_parent_ptr(mn);
> @@ -35527,7 +35603,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	vacant_height = get_vacant_height(&mas, ptr);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>  	mn = mas_pop_node(&mas);
>  	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> @@ -35540,7 +35617,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	vacant_height = get_vacant_height(&mas, ptr);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>  	mn = mas_pop_node(&mas);
>  	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
>  	mas_push_node(&mas, mn);
> @@ -35553,7 +35631,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	vacant_height = get_vacant_height(&mas, ptr);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>  	mas_store_prealloc(&mas, ptr);
>  	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
>  
> @@ -35578,7 +35657,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> -	MT_BUG_ON(mt, allocated != 1 + height * 2);
> +	vacant_height = get_vacant_height(&mas, ptr);
> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 2);
>  	mas_store_prealloc(&mas, ptr);
>  	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
>  	mt_set_non_kernel(1);
> @@ -35595,8 +35675,14 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>  	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>  	allocated = mas_allocated(&mas);
>  	height = mas_mt_height(&mas);
> +	vacant_height = get_vacant_height(&mas, ptr);
>  	MT_BUG_ON(mt, allocated == 0);
> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> +	/*
> +	 * vacant height cannot be used to compute the number of nodes needed
> +	 * as the root contains two entries which means it is on the verge of
> +	 * insufficiency. The worst case full height of the tree is needed.
> +	 */
> +	MT_BUG_ON(mt, allocated != height * 3 + 1);
>  	mas_store_prealloc(&mas, ptr);
>  	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
>  	mas_set_range(&mas, 0, 200);
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/6] maple_tree: break on convergence in mas_spanning_rebalance()
  2025-02-21 16:36 ` [PATCH v2 4/6] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
@ 2025-02-25 15:47   ` Liam R. Howlett
  0 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 15:47 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> This allows support for using the vacant height to calculate the worst
> case number of nodes needed for wr_rebalance operation.
> mas_spanning_rebalance() was seen to perform unnecessary node allocations.
> We can reduce allocations by breaking early during the rebalancing loop
> once we realize that we have ascended to a common ancestor.
> 
> Suggested-by: Liam Howlett <liam.howlett@oracle.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

> ---
>  lib/maple_tree.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index ef71af0529f4..4de257003251 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -2896,11 +2896,24 @@ static void mas_spanning_rebalance(struct ma_state *mas,
>  		mast_combine_cp_right(mast);
>  		mast->orig_l->last = mast->orig_l->max;
>  
> -		if (mast_sufficient(mast))
> -			continue;
> +		if (mast_sufficient(mast)) {
> +			if (mast_overflow(mast))
> +				continue;
> +
> +			if (mast->orig_l->node == mast->orig_r->node) {
> +			       /*
> +				* The data in b_node should be stored in one
> +				* node and in the tree
> +				*/
> +				slot = mast->l->offset;
> +				/* May be a new root stored in mast->bn */
> +				if (mas_is_root_limits(mast->orig_l))
> +					new_height++;
> +				break;
> +			}
>  
> -		if (mast_overflow(mast))
>  			continue;
> +		}
>  
>  		/* May be a new root stored in mast->bn */
>  		if (mas_is_root_limits(mast->orig_l)) {
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/6] maple_tree: add sufficient height
  2025-02-21 16:36 ` [PATCH v2 5/6] maple_tree: add sufficient height Sidhartha Kumar
@ 2025-02-25 16:02   ` Liam R. Howlett
  2025-02-25 20:36     ` Sidhartha Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 16:02 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> If a parent node is vacant but holds mt_min_slots + 1 entries,
> rebalancing with a leaf node could cause this parent node to become
> insufficient. This will lead to another level of rebalancing in the tree
> and requires more node allocations. Therefore, we also have to track the
> level at which there is a node with > mt_min_slots entries. We can use
> this as the worst case for the spanning and rebalancing stores.

This may not explain the situation fully; we also have to track the last
level at which there is a node that will not become insufficient.  We
know that during rebalance, the number of entries in a non-leaf node may
decrease by one.  Tracking the last node that will remain sufficient and
stop the cascading operation can be used to reduce the number of nodes
preallocated for the operation.

Note that this can happen at any level of an operation and not just a
node containing leaves.

The spanning store operation can also be treated the same because the
walk down the tree stops when it is detected.  That means the location
of the walk that detects the spanning store may be reduced to be
insufficient and will be rebalanced or may be split and need to absorb
up to two entries.

I think this commit needs some more text explaining these changes.
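To make the rule concrete, the wr_rebalance branch can be sketched as a hedged standalone model (the function name is hypothetical; this captures only the arithmetic, not the kernel code). Heights count from the root, so a larger value is closer to the leaves: when the lowest node guaranteed to stay sufficient is farther from the leaves than the lowest vacant node, the cascade can run past the vacancy and the larger delta bounds the allocation count.

```c
#include <assert.h>

/*
 * Hypothetical model of the wr_rebalance preallocation arithmetic in
 * this patch; not the kernel function. Heights count from the root
 * (root = 1), so a larger value is closer to the leaves, and 0 means
 * "not found" (fall back to the full tree height).
 */
int rebalance_prealloc_calc(unsigned char height,
			    unsigned char vacant_height,
			    unsigned char sufficient_height)
{
	/*
	 * A rebalance can shrink each ancestor by one entry, so the
	 * cascade only stops at a node that stays sufficient. If that
	 * node is farther from the leaves than the vacancy, it bounds
	 * the allocation count instead of the vacant node.
	 */
	if (sufficient_height < vacant_height)
		return (height - sufficient_height) * 2 + 1;

	return (height - vacant_height) * 2 + 1;
}
```

With height 4, a vacancy at height 3, and the last sufficient node at height 1, this reserves (4 - 1) * 2 + 1 = 7 nodes rather than trusting the vacancy alone.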


> 
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> ---
>  include/linux/maple_tree.h       |  4 +++-
>  lib/maple_tree.c                 | 17 +++++++++++++++--
>  tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
>  3 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index 7d777aa2d9ed..37dc9525dff6 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -464,6 +464,7 @@ struct ma_wr_state {
>  	void *entry;			/* The entry to write */
>  	void *content;			/* The existing entry that is being overwritten */
>  	unsigned char vacant_height;	/* Depth of lowest node with free space */
> +	unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
>  };
>  
>  #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
> @@ -499,7 +500,8 @@ struct ma_wr_state {
>  		.mas = ma_state,					\
>  		.content = NULL,					\
>  		.entry = wr_entry,					\
> -		.vacant_height = 0					\
> +		.vacant_height = 0,					\
> +		.sufficient_height = 0					\
>  	}
>  
>  #define MA_TOPIARY(name, tree)						\
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 4de257003251..8fdd3f477198 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -3558,6 +3558,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>  		if (mas->end < mt_slots[wr_mas->type] - 1)
>  			wr_mas->vacant_height = mas->depth + 1;
>  
> +		if (ma_is_root(mas_mn(mas))) {
> +			/* root needs more than 2 entries to be sufficient + 1 */
> +			if (mas->end > 2)
> +				wr_mas->sufficient_height = 1;
> +		} else if (mas->end > mt_min_slots[wr_mas->type] + 1)
> +			wr_mas->sufficient_height = mas->depth + 1;
> +
>  		mas_wr_walk_traverse(wr_mas);
>  	}
>  
> @@ -4193,13 +4200,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>  			ret = 0;
>  		break;
>  	case wr_spanning_store:
> -		ret = height * 3 + 1;
> +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
> +			ret = (height - wr_mas->sufficient_height) * 3 + 1;
> +		else
> +			ret = delta * 3 + 1;

Ah, ret was short lived.  Okay.

I still think this stuff needs some more context in the commit message.

>  		break;
>  	case wr_split_store:
>  		ret = delta * 2 + 1;
>  		break;
>  	case wr_rebalance:
> -		ret = height * 2 + 1;
> +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
> +			ret = (height - wr_mas->sufficient_height) * 2 + 1;
> +		else
> +			ret = delta * 2 + 1;
>  		break;
>  	case wr_node_store:
>  		ret = mt_in_rcu(mas->tree) ? 1 : 0;
> diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
> index d22c1008dffe..d40f70671cb8 100644
> --- a/tools/testing/radix-tree/maple.c
> +++ b/tools/testing/radix-tree/maple.c
> @@ -36334,6 +36334,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
>  
>  extern void test_kmem_cache_bulk(void);
>  
> +/*
> + * Test to check the path of a spanning rebalance which results in
> + * a collapse where the rebalancing of the child node leads to
> > +	 * insufficiency in the parent node.
> + */
> +static void check_collapsing_rebalance(struct maple_tree *mt)
> +{
> +	int i = 0;
> +	MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
> +
> +	/* create a height 4 tree */
> +	while (mt_height(mt) < 4) {
> +		mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
> +		i += 9;
> +	}
> +
> +	/* delete all entries one at a time, starting from the right */
> +	do {
> +		mas_erase(&mas);
> +	} while (mas_prev(&mas, 0) != NULL);
> +
> +	mtree_unlock(mt);
> +}
> +
>  /* callback function used for check_nomem_writer_race() */
>  static void writer2(void *maple_tree)
>  {
> @@ -36500,6 +36524,10 @@ void farmer_tests(void)
>  	check_spanning_write(&tree);
>  	mtree_destroy(&tree);
>  
> +	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
> +	check_collapsing_rebalance(&tree);
> +	mtree_destroy(&tree);
> +
>  	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
>  	check_null_expand(&tree);
>  	mtree_destroy(&tree);
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 6/6] maple_tree: reorder mas->store_type case statements
  2025-02-21 16:36 ` [PATCH v2 6/6] maple_tree: reorder mas->store_type case statements Sidhartha Kumar
@ 2025-02-25 16:03   ` Liam R. Howlett
  0 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-25 16:03 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> Move the unlikely case that mas->store_type is invalid to be the last
> evaluated case and put likelier cases higher up.
> 
> Suggested-by: Liam Howlett <liam.howlett@oracle.com>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

Thanks for doing this.  There will probably be zero change in the
benchmarking besides instruction count, but it's low risk and low effort
to do.

Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

> ---
>  lib/maple_tree.c | 51 ++++++++++++++++++++++++------------------------
>  1 file changed, 25 insertions(+), 26 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 8fdd3f477198..36d7d7a27e32 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -4091,15 +4091,6 @@ static inline void mas_wr_store_entry(struct ma_wr_state *wr_mas)
>  	unsigned char new_end = mas_wr_new_end(wr_mas);
>  
>  	switch (mas->store_type) {
> -	case wr_invalid:
> -		MT_BUG_ON(mas->tree, 1);
> -		return;
> -	case wr_new_root:
> -		mas_new_root(mas, wr_mas->entry);
> -		break;
> -	case wr_store_root:
> -		mas_store_root(mas, wr_mas->entry);
> -		break;
>  	case wr_exact_fit:
>  		rcu_assign_pointer(wr_mas->slots[mas->offset], wr_mas->entry);
>  		if (!!wr_mas->entry ^ !!wr_mas->content)
> @@ -4121,6 +4112,14 @@ static inline void mas_wr_store_entry(struct ma_wr_state *wr_mas)
>  	case wr_rebalance:
>  		mas_wr_bnode(wr_mas);
>  		break;
> +	case wr_new_root:
> +		mas_new_root(mas, wr_mas->entry);
> +		break;
> +	case wr_store_root:
> +		mas_store_root(mas, wr_mas->entry);
> +		break;
> +	case wr_invalid:
> +		MT_BUG_ON(mas->tree, 1);
>  	}
>  
>  	return;
> @@ -4185,19 +4184,10 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>  	unsigned char delta = height - wr_mas->vacant_height;
>  
>  	switch (mas->store_type) {
> -	case wr_invalid:
> -		WARN_ON_ONCE(1);
> -		break;
> -	case wr_new_root:
> -		ret = 1;
> -		break;
> -	case wr_store_root:
> -		if (likely((mas->last != 0) || (mas->index != 0)))
> -			ret = 1;
> -		else if (((unsigned long) (entry) & 3) == 2)
> -			ret = 1;
> -		else
> -			ret = 0;
> +	case wr_exact_fit:
> +	case wr_append:
> +	case wr_slot_store:
> +		ret = 0;
>  		break;
>  	case wr_spanning_store:
>  		if (wr_mas->sufficient_height < wr_mas->vacant_height)
> @@ -4217,10 +4207,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>  	case wr_node_store:
>  		ret = mt_in_rcu(mas->tree) ? 1 : 0;
>  		break;
> -	case wr_append:
> -	case wr_exact_fit:
> -	case wr_slot_store:
> -		ret = 0;
> +	case wr_new_root:
> +		ret = 1;
> +		break;
> +	case wr_store_root:
> +		if (likely((mas->last != 0) || (mas->index != 0)))
> +			ret = 1;
> +		else if (((unsigned long) (entry) & 3) == 2)
> +			ret = 1;
> +		else
> +			ret = 0;
> +		break;
> +	case wr_invalid:
> +		WARN_ON_ONCE(1);
>  	}
>  
>  	return ret;
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations
  2025-02-25 15:46   ` Liam R. Howlett
@ 2025-02-25 16:22     ` Sidhartha Kumar
  2025-02-26 15:30       ` Liam R. Howlett
  0 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-25 16:22 UTC (permalink / raw)
  To: Liam R. Howlett, linux-kernel, maple-tree, linux-mm, akpm,
	richard.weiyang

On 2/25/25 10:46 AM, Liam R. Howlett wrote:
> * Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
>> In order to determine the store type for a maple tree operation, a walk
>> of the tree is done through mas_wr_walk(). This function descends the
>> tree until a spanning write is detected or we reach a leaf node. While
>> descending, keep track of the height at which we encounter a node with
>> available space. This is done by checking if mas->end is less than the
>> number of slots a given node type can fit.
>>
>> Now that the height of the vacant node is tracked, we can use the
>> difference between the height of the tree and the height of the vacant
>> node to know how many levels we will have to propagate creating new
>> nodes. Update mas_prealloc_calc() to consider the vacant height and
>> reduce the number of worst-case allocations.
>>
>> Rebalancing and spanning stores are not supported and fall back to using
>> the full height of the tree for allocations.
>>
>> Update preallocation testing assertions to take into account vacant
>> height.
>>
>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> ---
>>   include/linux/maple_tree.h       |   2 +
>>   lib/maple_tree.c                 |  13 ++--
>>   tools/testing/radix-tree/maple.c | 102 ++++++++++++++++++++++++++++---
>>   3 files changed, 105 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index cbbcd18d4186..7d777aa2d9ed 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -463,6 +463,7 @@ struct ma_wr_state {
>>   	void __rcu **slots;		/* mas->node->slots pointer */
>>   	void *entry;			/* The entry to write */
>>   	void *content;			/* The existing entry that is being overwritten */
>> +	unsigned char vacant_height;	/* Height of lowest node with free space */
>>   };
>>   
>>   #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
>> @@ -498,6 +499,7 @@ struct ma_wr_state {
>>   		.mas = ma_state,					\
>>   		.content = NULL,					\
>>   		.entry = wr_entry,					\
>> +		.vacant_height = 0					\
>>   	}
>>   
>>   #define MA_TOPIARY(name, tree)						\
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index d7dac3119748..ef71af0529f4 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3542,6 +3542,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>>   		if (ma_is_leaf(wr_mas->type))
>>   			return true;
>>   
>> +		if (mas->end < mt_slots[wr_mas->type] - 1)
>> +			wr_mas->vacant_height = mas->depth + 1;
>> +
>>   		mas_wr_walk_traverse(wr_mas);
>>   	}
>>   
>> @@ -4157,7 +4160,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
>>   static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>>   {
>>   	struct ma_state *mas = wr_mas->mas;
>> -	int ret = mas_mt_height(mas) * 3 + 1;
>> +	unsigned char height = mas_mt_height(mas);
>> +	int ret = height * 3 + 1;
>> +	unsigned char delta = height - wr_mas->vacant_height;
>>   
>>   	switch (mas->store_type) {
>>   	case wr_invalid:
>> @@ -4175,13 +4180,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>>   			ret = 0;
>>   		break;
>>   	case wr_spanning_store:
>> -		ret =  mas_mt_height(mas) * 3 + 1;
>> +		ret = height * 3 + 1;
> 
> ret is already height * 3 + 1, you could add a WARN_ON_ONCE or such to
> ensure that's the case here and it's not missed in updates to the code.
> 
>>   		break;
>>   	case wr_split_store:
>> -		ret =  mas_mt_height(mas) * 2 + 1;
>> +		ret = delta * 2 + 1;
>>   		break;
>>   	case wr_rebalance:
>> -		ret =  mas_mt_height(mas) * 2 - 1;
>> +		ret = height * 2 + 1;
>>   		break;
>>   	case wr_node_store:
>>   		ret = mt_in_rcu(mas->tree) ? 1 : 0;
>> diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
>> index bc30050227fd..d22c1008dffe 100644
>> --- a/tools/testing/radix-tree/maple.c
>> +++ b/tools/testing/radix-tree/maple.c
>> @@ -35475,12 +35475,85 @@ static void check_dfs_preorder(struct maple_tree *mt)
>>   }
>>   /* End of depth first search tests */
>>   
>> +/* same implementation as mas_is_span_wr() in lib/maple_tree.c */
> 
> Is this needed then?  At the start of this file we have:
> #include "../../../lib/maple_tree.c" so I would think you could use the
> one already defined?


I don't think we can use that one because it takes in a maple write 
state as an argument, which is not exposed externally. The 
check_prealloc() tests use a maple state, and the maple write state is 
then defined internally in mas_preallocate(), so I'm not sure how to get 
an external-facing interface to the maple write state.

That's why a similar implementation is needed but one that takes in a 
maple state rather than a maple write state.

Thanks,
Sid


> 
>> +static bool is_span_wr(struct ma_state *mas, unsigned long r_max,
>> +				enum maple_type type, void *entry)
>> +{
>> +	unsigned long max = r_max;
>> +	unsigned long last = mas->last;
>> +
>> +	/* Contained in this pivot, fast path */
>> +	if (last < max)
>> +		return false;
>> +
>> +	if (ma_is_leaf(type)) {
>> +		max = mas->max;
>> +		if (last < max)
>> +			return false;
>> +	}
>> +
>> +	if (last == max) {
>> +		/*
>> +		 * The last entry of leaf node cannot be NULL unless it is the
>> +		 * rightmost node (writing ULONG_MAX), otherwise it spans slots.
>> +		 */
>> +		if (entry || last == ULONG_MAX)
>> +			return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/* get height of the lowest non-leaf node with free space */
>> +static unsigned char get_vacant_height(struct ma_state *mas, void *entry)
>> +{
>> +	char vacant_height = 0;
>> +	enum maple_type type;
>> +	unsigned long *pivots;
>> +	unsigned long min = 0;
>> +	unsigned long max = ULONG_MAX;
>> +
>> +	/* start traversal */
>> +	mas_reset(mas);
>> +	mas_start(mas);
>> +	if (!xa_is_node(mas_root(mas)))
>> +		return 0;
>> +
>> +	type = mte_node_type(mas->node);
>> +	while (!ma_is_leaf(type)) {
>> +		mas_node_walk(mas, mte_to_node(mas->node), type, &min, &max);
>> +		mas->end = mas_data_end(mas);
>> +		pivots = ma_pivots(mte_to_node(mas->node), type);
>> +
>> +		if (pivots) {
>> +			if (mas->offset)
>> +				min = pivots[mas->offset - 1];
>> +			if (mas->offset < mas->end)
>> +				max = pivots[mas->offset];
>> +		}
>> +
>> +		/* detect spanning write */
>> +		if (is_span_wr(mas, max, type, entry))
>> +			break;
>> +
>> +		if (mas->end < mt_slot_count(mas->node) - 1)
>> +			vacant_height = mas->depth + 1;
>> +
>> +		mas_descend(mas);
>> +		type = mte_node_type(mas->node);
>> +		mas->depth++;
>> +	}
>> +
>> +	return vacant_height;
>> +}
>> +
>>   /* Preallocation testing */
>>   static noinline void __init check_prealloc(struct maple_tree *mt)
>>   {
>>   	unsigned long i, max = 100;
>>   	unsigned long allocated;
>>   	unsigned char height;
>> +	unsigned char vacant_height;
>>   	struct maple_node *mn;
>>   	void *ptr = check_prealloc;
>>   	MA_STATE(mas, mt, 10, 20);
>> @@ -35494,8 +35567,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>>   	MT_BUG_ON(mt, allocated == 0);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>>   	mas_destroy(&mas);
>>   	allocated = mas_allocated(&mas);
>>   	MT_BUG_ON(mt, allocated != 0);
>> @@ -35503,8 +35577,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>>   	MT_BUG_ON(mt, allocated == 0);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	mas_destroy(&mas);
>>   	allocated = mas_allocated(&mas);
>> @@ -35514,7 +35589,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>>   	mn = mas_pop_node(&mas);
>>   	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
>>   	mn->parent = ma_parent_ptr(mn);
>> @@ -35527,7 +35603,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>>   	mn = mas_pop_node(&mas);
>>   	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>> @@ -35540,7 +35617,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>>   	mn = mas_pop_node(&mas);
>>   	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
>>   	mas_push_node(&mas, mn);
>> @@ -35553,7 +35631,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
>>   	mas_store_prealloc(&mas, ptr);
>>   	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
>>   
>> @@ -35578,7 +35657,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 2);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>> +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 2);
>>   	mas_store_prealloc(&mas, ptr);
>>   	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
>>   	mt_set_non_kernel(1);
>> @@ -35595,8 +35675,14 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
>>   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
>>   	allocated = mas_allocated(&mas);
>>   	height = mas_mt_height(&mas);
>> +	vacant_height = get_vacant_height(&mas, ptr);
>>   	MT_BUG_ON(mt, allocated == 0);
>> -	MT_BUG_ON(mt, allocated != 1 + height * 3);
>> +	/*
>> +	 * vacant height cannot be used to compute the number of nodes needed
>> +	 * as the root contains two entries which means it is on the verge of
>> +	 * insufficiency. The worst case full height of the tree is needed.
>> +	 */
>> +	MT_BUG_ON(mt, allocated != height * 3 + 1);
>>   	mas_store_prealloc(&mas, ptr);
>>   	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
>>   	mas_set_range(&mas, 0, 200);
>> -- 
>> 2.43.0
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/6] maple_tree: add sufficient height
  2025-02-25 16:02   ` Liam R. Howlett
@ 2025-02-25 20:36     ` Sidhartha Kumar
  2025-02-26 15:15       ` Liam R. Howlett
  0 siblings, 1 reply; 18+ messages in thread
From: Sidhartha Kumar @ 2025-02-25 20:36 UTC (permalink / raw)
  To: Liam R. Howlett, linux-kernel, maple-tree, linux-mm, akpm,
	richard.weiyang

On 2/25/25 11:02 AM, Liam R. Howlett wrote:
> * Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
>> If a parent node is vacant but holds mt_min_slots + 1 entries,
>> rebalancing with a leaf node could cause this parent node to become
>> insufficient. This will lead to another level of rebalancing in the tree
>> and requires more node allocations. Therefore, we also have to track the
>> level at which there is a node with > mt_min_slots entries. We can use
>> this as the worst case for the spanning and rebalancing stores.
> 
> This may not explain the situation fully; We also have to track the last
> level at which there is a node that will not become insufficient.  We
> know that during rebalance, the number of entries in a non-leaf node may
> decrease by one.  Tracking the last node that will remain sufficient and
> stop the cascading operation can be used to reduce the number of nodes
> preallocated for the operation.
> 
> Note that this can happen at any level of an operation and not just a
> node containing leaves.
> 
> The spanning store operation can also be treated the same because the
> walk down the tree stops when it is detected.  That means the location
> of the walk that detects the spanning store may be reduced to be
> insufficient and will be rebalanced or may be split and need to absorb
> up to two entries.
> 
> I think this commit needs some more text explaining these changes.
> 

Does this commit message work better?


Using vacant height to reduce the worst case maple node allocation count 
can lead to a shortcoming of nodes in the following scenarios.

For rebalancing writes, when a leaf node becomes insufficient, we push 
the now insufficient number of entries into a sibling node. This means 
that the parent node which has entries for this children will lose one 
entry. If this parent node was only sufficient because it had the 
minimum number of entries to be sufficient, losing one entry will now 
cause this parent node to be insufficient. This leads to a cascading 
operation of rebalancing at different levels and can lead to more node 
allocations that simply using vacant height can return.

For spanning writes, a similar situation occurs. At the location at 
which a spanning write is detected, the number of ancestor nodes may 
similarly need to be rebalanced into a smaller number of nodes, and the 
same cascading situation could occur.

To use less than the full height of the tree for the number of 
allocations, we also need to track the height at which a non-leaf node 
cannot become insufficient. This means even if a rebalance occurs to a 
child of this node, it currently has enough entries that losing one 
entry will not cause this node to be insufficient. This field is stored 
in the maple write state as sufficient height. In mas_prealloc_calc() 
when figuring out how many nodes to allocate, we check if the vacant 
node is lower in the tree than a sufficient node (has a larger value). 
If it is, we cannot use the vacant height and must use the difference 
between the height and the sufficient height as the basis for the 
number of nodes needed.




> 
>>
>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> ---
>>   include/linux/maple_tree.h       |  4 +++-
>>   lib/maple_tree.c                 | 17 +++++++++++++++--
>>   tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
>>   3 files changed, 46 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index 7d777aa2d9ed..37dc9525dff6 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -464,6 +464,7 @@ struct ma_wr_state {
>>   	void *entry;			/* The entry to write */
>>   	void *content;			/* The existing entry that is being overwritten */
>>   	unsigned char vacant_height;	/* Depth of lowest node with free space */
>> +	unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
>>   };
>>   
>>   #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
>> @@ -499,7 +500,8 @@ struct ma_wr_state {
>>   		.mas = ma_state,					\
>>   		.content = NULL,					\
>>   		.entry = wr_entry,					\
>> -		.vacant_height = 0					\
>> +		.vacant_height = 0,					\
>> +		.sufficient_height = 0					\
>>   	}
>>   
>>   #define MA_TOPIARY(name, tree)						\
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 4de257003251..8fdd3f477198 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3558,6 +3558,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>>   		if (mas->end < mt_slots[wr_mas->type] - 1)
>>   			wr_mas->vacant_height = mas->depth + 1;
>>   
>> +		if (ma_is_root(mas_mn(mas))) {
>> +			/* root needs more than 2 entries to be sufficient + 1 */
>> +			if (mas->end > 2)
>> +				wr_mas->sufficient_height = 1;
>> +		} else if (mas->end > mt_min_slots[wr_mas->type] + 1)
>> +			wr_mas->sufficient_height = mas->depth + 1;
>> +
>>   		mas_wr_walk_traverse(wr_mas);
>>   	}
>>   
>> @@ -4193,13 +4200,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>>   			ret = 0;
>>   		break;
>>   	case wr_spanning_store:
>> -		ret = height * 3 + 1;
>> +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
>> +			ret = (height - wr_mas->sufficient_height) * 3 + 1;
>> +		else
>> +			ret = delta * 3 + 1;
> 
> Ah, ret was short lived.  Okay.
> 
> I still think this stuff needs some more context in the commit message.
> 
>>   		break;
>>   	case wr_split_store:
>>   		ret = delta * 2 + 1;
>>   		break;
>>   	case wr_rebalance:
>> -		ret = height * 2 + 1;
>> +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
>> +			ret = (height - wr_mas->sufficient_height) * 2 + 1;
>> +		else
>> +			ret = delta * 2 + 1;
>>   		break;
>>   	case wr_node_store:
>>   		ret = mt_in_rcu(mas->tree) ? 1 : 0;
>> diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
>> index d22c1008dffe..d40f70671cb8 100644
>> --- a/tools/testing/radix-tree/maple.c
>> +++ b/tools/testing/radix-tree/maple.c
>> @@ -36334,6 +36334,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
>>   
>>   extern void test_kmem_cache_bulk(void);
>>   
>> +/*
>> + * Test to check the path of a spanning rebalance which results in
>> + * a collapse where the rebalancing of the child node leads to
>> + * insufficiency in the parent node.
>> + */
>> +static void check_collapsing_rebalance(struct maple_tree *mt)
>> +{
>> +	int i = 0;
>> +	MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
>> +
>> +	/* create a height 4 tree */
>> +	while (mt_height(mt) < 4) {
>> +		mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
>> +		i += 9;
>> +	}
>> +
>> +	/* delete all entries one at a time, starting from the right */
>> +	do {
>> +		mas_erase(&mas);
>> +	} while (mas_prev(&mas, 0) != NULL);
>> +
>> +	mtree_unlock(mt);
>> +}
>> +
>>   /* callback function used for check_nomem_writer_race() */
>>   static void writer2(void *maple_tree)
>>   {
>> @@ -36500,6 +36524,10 @@ void farmer_tests(void)
>>   	check_spanning_write(&tree);
>>   	mtree_destroy(&tree);
>>   
>> +	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
>> +	check_collapsing_rebalance(&tree);
>> +	mtree_destroy(&tree);
>> +
>>   	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
>>   	check_null_expand(&tree);
>>   	mtree_destroy(&tree);
>> -- 
>> 2.43.0
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/6] maple_tree: add sufficient height
  2025-02-25 20:36     ` Sidhartha Kumar
@ 2025-02-26 15:15       ` Liam R. Howlett
  0 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-26 15:15 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250225 15:36]:
> On 2/25/25 11:02 AM, Liam R. Howlett wrote:
> > * Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> > > If a parent node is vacant but holds mt_min_slots + 1 entries,
> > > rebalancing with a leaf node could cause this parent node to become
> > > insufficient. This will lead to another level of rebalancing in the tree
> > > and requires more node allocations. Therefore, we also have to track the
> > > level at which there is a node with > mt_min_slots entries. We can use
> > > this as the worst case for the spanning and rebalancing stores.
> > 
> > This may not explain the situation fully; We also have to track the last
> > level at which there is a node that will not become insufficient.  We
> > know that during rebalance, the number of entries in a non-leaf node may
> > decrease by one.  Tracking the last node that will remain sufficient and
> > stop the cascading operation can be used to reduce the number of nodes
> > preallocated for the operation.
> > 
> > Note that this can happen at any level of an operation and not just a
> > node containing leaves.
> > 
> > The spanning store operation can also be treated the same because the
> > walk down the tree stops when it is detected.  That means the location
> > of the walk that detects the spanning store may be reduced to be
> > insufficient and will be rebalanced or may be split and need to absorb
> > up to two entries.
> > 
> > I think this commit needs some more text explaining these changes.
> > 
> 
> Does this commit message work better?
> 
> 
> Using vacant height to reduce the worst case maple node allocation count can
> lead to a shortcoming of nodes in the following scenarios.

This sounds like you are fixing an issue you introduced, but really you
are making the calculation lower for cases that were not covered before.

> 
> For rebalancing writes, when a leaf node becomes insufficient, we push the
> now insufficient number of entries into a sibling node.

when a leaf node becomes insufficient, it may be combined with a sibling
into a single node.

>This means that the
> parent node which has entries for this children will lose one entry. If this
> parent node was only sufficient because it had the minimum number of entries
> to be sufficient,

"if this parent node was just meeting the minimum entries,"

>losing one entry will now cause this parent node to be
> insufficient. This leads to a cascading operation of rebalancing at
> different levels and can lead to more node allocations that simply using
                                                         ^^^^- than
> vacant height can return.
> 
> For spanning writes, a similar situation occurs. At the location at which a
> spanning write is detected, the number of ancestor nodes may similarly need
> to be rebalanced into a smaller number of nodes and the same cascading
> situation could occur.
> 
> To use less than the full height of the tree for the number of allocations,
> we also need to track the height at which a non-leaf node cannot become
> insufficient. This means even if a rebalance occurs to a child of this node,
> it currently has enough entries that losing one entry will not cause this
> node to be insufficient.

"currently has enough entries that it can lose one without any further
action."

> This field is stored in the maple write state as
> sufficient height. In mas_prealloc_calc() when figuring out how many nodes
> to allocate, we check if the vacant node is lower in the tree than a
> sufficient node (has a larger value). If it is, we cannot use the vacant
> height and must use the difference between the height and the sufficient
> height as the basis for the number of nodes needed.

Sounds good.

> 
> 
> 
> 
> > 
> > > 
> > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > > ---
> > >   include/linux/maple_tree.h       |  4 +++-
> > >   lib/maple_tree.c                 | 17 +++++++++++++++--
> > >   tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
> > >   3 files changed, 46 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > > index 7d777aa2d9ed..37dc9525dff6 100644
> > > --- a/include/linux/maple_tree.h
> > > +++ b/include/linux/maple_tree.h
> > > @@ -464,6 +464,7 @@ struct ma_wr_state {
> > >   	void *entry;			/* The entry to write */
> > >   	void *content;			/* The existing entry that is being overwritten */
> > >   	unsigned char vacant_height;	/* Depth of lowest node with free space */
> > > +	unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
> > >   };
> > >   #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
> > > @@ -499,7 +500,8 @@ struct ma_wr_state {
> > >   		.mas = ma_state,					\
> > >   		.content = NULL,					\
> > >   		.entry = wr_entry,					\
> > > -		.vacant_height = 0					\
> > > +		.vacant_height = 0,					\
> > > +		.sufficient_height = 0					\
> > >   	}
> > >   #define MA_TOPIARY(name, tree)						\
> > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > > index 4de257003251..8fdd3f477198 100644
> > > --- a/lib/maple_tree.c
> > > +++ b/lib/maple_tree.c
> > > @@ -3558,6 +3558,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
> > >   		if (mas->end < mt_slots[wr_mas->type] - 1)
> > >   			wr_mas->vacant_height = mas->depth + 1;
> > > +		if (ma_is_root(mas_mn(mas))) {
> > > +			/* root needs more than 2 entries to be sufficient + 1 */
> > > +			if (mas->end > 2)
> > > +				wr_mas->sufficient_height = 1;
> > > +		} else if (mas->end > mt_min_slots[wr_mas->type] + 1)
> > > +			wr_mas->sufficient_height = mas->depth + 1;
> > > +
> > >   		mas_wr_walk_traverse(wr_mas);
> > >   	}
> > > @@ -4193,13 +4200,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
> > >   			ret = 0;
> > >   		break;
> > >   	case wr_spanning_store:
> > > -		ret = height * 3 + 1;
> > > +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
> > > +			ret = (height - wr_mas->sufficient_height) * 3 + 1;
> > > +		else
> > > +			ret = delta * 3 + 1;
> > 
> > Ah, ret was short lived.  Okay.
> > 
> > I still think this stuff needs some more context in the commit message.
> > 
> > >   		break;
> > >   	case wr_split_store:
> > >   		ret = delta * 2 + 1;
> > >   		break;
> > >   	case wr_rebalance:
> > > -		ret = height * 2 + 1;
> > > +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
> > > +			ret = (height - wr_mas->sufficient_height) * 2 + 1;
> > > +		else
> > > +			ret = delta * 2 + 1;
> > >   		break;
> > >   	case wr_node_store:
> > >   		ret = mt_in_rcu(mas->tree) ? 1 : 0;
> > > diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
> > > index d22c1008dffe..d40f70671cb8 100644
> > > --- a/tools/testing/radix-tree/maple.c
> > > +++ b/tools/testing/radix-tree/maple.c
> > > @@ -36334,6 +36334,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
> > >   extern void test_kmem_cache_bulk(void);
> > > +/*
> > > + * Test to check the path of a spanning rebalance which results in
> > > + * a collapse where the rebalancing of the child node leads to
> > > + * insufficiency in the parent node.
> > > + */
> > > +static void check_collapsing_rebalance(struct maple_tree *mt)
> > > +{
> > > +	int i = 0;
> > > +	MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
> > > +
> > > +	/* create a height 4 tree */
> > > +	while (mt_height(mt) < 4) {
> > > +		mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
> > > +		i += 9;
> > > +	}
> > > +
> > > +	/* delete all entries one at a time, starting from the right */
> > > +	do {
> > > +		mas_erase(&mas);
> > > +	} while (mas_prev(&mas, 0) != NULL);
> > > +
> > > +	mtree_unlock(mt);
> > > +}
> > > +
> > >   /* callback function used for check_nomem_writer_race() */
> > >   static void writer2(void *maple_tree)
> > >   {
> > > @@ -36500,6 +36524,10 @@ void farmer_tests(void)
> > >   	check_spanning_write(&tree);
> > >   	mtree_destroy(&tree);
> > > +	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
> > > +	check_collapsing_rebalance(&tree);
> > > +	mtree_destroy(&tree);
> > > +
> > >   	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
> > >   	check_null_expand(&tree);
> > >   	mtree_destroy(&tree);
> > > -- 
> > > 2.43.0
> > > 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations
  2025-02-25 16:22     ` Sidhartha Kumar
@ 2025-02-26 15:30       ` Liam R. Howlett
  0 siblings, 0 replies; 18+ messages in thread
From: Liam R. Howlett @ 2025-02-26 15:30 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, richard.weiyang

* Sidhartha Kumar <sidhartha.kumar@oracle.com> [250225 11:22]:
> On 2/25/25 10:46 AM, Liam R. Howlett wrote:
> > * Sidhartha Kumar <sidhartha.kumar@oracle.com> [250221 11:36]:
> > > In order to determine the store type for a maple tree operation, a walk
> > > of the tree is done through mas_wr_walk(). This function descends the
> > > tree until a spanning write is detected or we reach a leaf node. While
> > > descending, keep track of the height at which we encounter a node with
> > > available space. This is done by checking if mas->end is less than the
> > > number of slots a given node type can fit.
> > > 
> > > Now that the height of the vacant node is tracked, we can use the
> > > difference between the height of the tree and the height of the vacant
> > > node to know how many levels we will have to propagate creating new
> > > nodes. Update mas_prealloc_calc() to consider the vacant height and
> > > reduce the number of worst-case allocations.
> > > 
> > > Rebalancing and spanning stores are not supported and fall back to using
> > > the full height of the tree for allocations.
> > > 
> > > Update preallocation testing assertions to take into account vacant
> > > height.
> > > 
> > > Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
> > > ---
> > >   include/linux/maple_tree.h       |   2 +
> > >   lib/maple_tree.c                 |  13 ++--
> > >   tools/testing/radix-tree/maple.c | 102 ++++++++++++++++++++++++++++---
> > >   3 files changed, 105 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > > index cbbcd18d4186..7d777aa2d9ed 100644
> > > --- a/include/linux/maple_tree.h
> > > +++ b/include/linux/maple_tree.h
> > > @@ -463,6 +463,7 @@ struct ma_wr_state {
> > >   	void __rcu **slots;		/* mas->node->slots pointer */
> > >   	void *entry;			/* The entry to write */
> > >   	void *content;			/* The existing entry that is being overwritten */
> > > +	unsigned char vacant_height;	/* Height of lowest node with free space */
> > >   };
> > >   #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
> > > @@ -498,6 +499,7 @@ struct ma_wr_state {
> > >   		.mas = ma_state,					\
> > >   		.content = NULL,					\
> > >   		.entry = wr_entry,					\
> > > +		.vacant_height = 0					\
> > >   	}
> > >   #define MA_TOPIARY(name, tree)						\
> > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > > index d7dac3119748..ef71af0529f4 100644
> > > --- a/lib/maple_tree.c
> > > +++ b/lib/maple_tree.c
> > > @@ -3542,6 +3542,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
> > >   		if (ma_is_leaf(wr_mas->type))
> > >   			return true;
> > > +		if (mas->end < mt_slots[wr_mas->type] - 1)
> > > +			wr_mas->vacant_height = mas->depth + 1;
> > > +
> > >   		mas_wr_walk_traverse(wr_mas);
> > >   	}
> > > @@ -4157,7 +4160,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
> > >   static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
> > >   {
> > >   	struct ma_state *mas = wr_mas->mas;
> > > -	int ret = mas_mt_height(mas) * 3 + 1;
> > > +	unsigned char height = mas_mt_height(mas);
> > > +	int ret = height * 3 + 1;
> > > +	unsigned char delta = height - wr_mas->vacant_height;
> > >   	switch (mas->store_type) {
> > >   	case wr_invalid:
> > > @@ -4175,13 +4180,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
> > >   			ret = 0;
> > >   		break;
> > >   	case wr_spanning_store:
> > > -		ret =  mas_mt_height(mas) * 3 + 1;
> > > +		ret = height * 3 + 1;
> > 
> > ret is already height * 3 + 1, you could add a WARN_ON_ONCE or such to
> > ensure that's the case here and it's not missed in updates to the code.
> > 
> > >   		break;
> > >   	case wr_split_store:
> > > -		ret =  mas_mt_height(mas) * 2 + 1;
> > > +		ret = delta * 2 + 1;
> > >   		break;
> > >   	case wr_rebalance:
> > > -		ret =  mas_mt_height(mas) * 2 - 1;
> > > +		ret = height * 2 + 1;
> > >   		break;
> > >   	case wr_node_store:
> > >   		ret = mt_in_rcu(mas->tree) ? 1 : 0;
> > > diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
> > > index bc30050227fd..d22c1008dffe 100644
> > > --- a/tools/testing/radix-tree/maple.c
> > > +++ b/tools/testing/radix-tree/maple.c
> > > @@ -35475,12 +35475,85 @@ static void check_dfs_preorder(struct maple_tree *mt)
> > >   }
> > >   /* End of depth first search tests */
> > > +/* same implementation as mas_is_span_wr() in lib/maple_tree.c */
> > 
> > Is this needed then?  At the start of this file we have:
> > #include "../../../lib/maple_tree.c" so I would think you could use the
> > one already defined?
> 
> 
> I don't think we can use that one because it takes in a maple write
> state as an argument which is not exposed externally. The check_prealloc()
> tests use a maple state and the maple write state is then defined internally
> in mas_preallocate() so I'm not sure how to get an external facing interface
> to the maple write state.

Ah, yes.  But you have everything it uses, so you can use it by setting:
wr_mas.mas = mas,
wr_mas.r_max = r_max,
wr_mas.type = type,
wr_mas.entry = entry

You can add a new MA_WR_STATE #define like in the maple_tree.h file, but
only to the testing side.

I think it's worth doing because, if the internal checking changes, then
this test will still work (or will fail and force an update to the wr_mas
setup, or whatever..).  If we have duplicated code that becomes out of
sync, we risk passing this test without actually getting the right answer
in the kernel where it matters most.

> 
> That's why a similar implementation is needed but one that takes in a maple
> state rather than a maple write state.
> 
> Thanks,
> Sid
> 
> 
> > 
> > > +static bool is_span_wr(struct ma_state *mas, unsigned long r_max,
> > > +				enum maple_type type, void *entry)
> > > +{
> > > +	unsigned long max = r_max;
> > > +	unsigned long last = mas->last;
> > > +
> > > +	/* Contained in this pivot, fast path */
> > > +	if (last < max)
> > > +		return false;
> > > +
> > > +	if (ma_is_leaf(type)) {
> > > +		max = mas->max;
> > > +		if (last < max)
> > > +			return false;
> > > +	}
> > > +
> > > +	if (last == max) {
> > > +		/*
> > > +		 * The last entry of leaf node cannot be NULL unless it is the
> > > +		 * rightmost node (writing ULONG_MAX), otherwise it spans slots.
> > > +		 */
> > > +		if (entry || last == ULONG_MAX)
> > > +			return false;
> > > +	}
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +/* get height of the lowest non-leaf node with free space */
> > > +static unsigned char get_vacant_height(struct ma_state *mas, void *entry)
> > > +{
> > > +	char vacant_height = 0;
> > > +	enum maple_type type;
> > > +	unsigned long *pivots;
> > > +	unsigned long min = 0;
> > > +	unsigned long max = ULONG_MAX;
> > > +
> > > +	/* start traversal */
> > > +	mas_reset(mas);
> > > +	mas_start(mas);
> > > +	if (!xa_is_node(mas_root(mas)))
> > > +		return 0;
> > > +
> > > +	type = mte_node_type(mas->node);
> > > +	while (!ma_is_leaf(type)) {
> > > +		mas_node_walk(mas, mte_to_node(mas->node), type, &min, &max);
> > > +		mas->end = mas_data_end(mas);
> > > +		pivots = ma_pivots(mte_to_node(mas->node), type);
> > > +
> > > +		if (pivots) {
> > > +			if (mas->offset)
> > > +				min = pivots[mas->offset - 1];
> > > +			if (mas->offset < mas->end)
> > > +				max = pivots[mas->offset];
> > > +		}
> > > +
> > > +		/* detect spanning write */
> > > +		if (is_span_wr(mas, max, type, entry))
> > > +			break;
> > > +
> > > +		if (mas->end < mt_slot_count(mas->node) - 1)
> > > +			vacant_height = mas->depth + 1;
> > > +
> > > +		mas_descend(mas);
> > > +		type = mte_node_type(mas->node);
> > > +		mas->depth++;
> > > +	}
> > > +
> > > +	return vacant_height;
> > > +}
> > > +
> > >   /* Preallocation testing */
> > >   static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   {
> > >   	unsigned long i, max = 100;
> > >   	unsigned long allocated;
> > >   	unsigned char height;
> > > +	unsigned char vacant_height;
> > >   	struct maple_node *mn;
> > >   	void *ptr = check_prealloc;
> > >   	MA_STATE(mas, mt, 10, 20);
> > > @@ -35494,8 +35567,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > >   	MT_BUG_ON(mt, allocated == 0);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
> > >   	mas_destroy(&mas);
> > >   	allocated = mas_allocated(&mas);
> > >   	MT_BUG_ON(mt, allocated != 0);
> > > @@ -35503,8 +35577,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > >   	MT_BUG_ON(mt, allocated == 0);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	mas_destroy(&mas);
> > >   	allocated = mas_allocated(&mas);
> > > @@ -35514,7 +35589,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
> > >   	mn = mas_pop_node(&mas);
> > >   	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
> > >   	mn->parent = ma_parent_ptr(mn);
> > > @@ -35527,7 +35603,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
> > >   	mn = mas_pop_node(&mas);
> > >   	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > > @@ -35540,7 +35617,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
> > >   	mn = mas_pop_node(&mas);
> > >   	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
> > >   	mas_push_node(&mas, mn);
> > > @@ -35553,7 +35631,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
> > >   	mas_store_prealloc(&mas, ptr);
> > >   	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
> > > @@ -35578,7 +35657,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 2);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > > +	MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 2);
> > >   	mas_store_prealloc(&mas, ptr);
> > >   	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
> > >   	mt_set_non_kernel(1);
> > > @@ -35595,8 +35675,14 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
> > >   	MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
> > >   	allocated = mas_allocated(&mas);
> > >   	height = mas_mt_height(&mas);
> > > +	vacant_height = get_vacant_height(&mas, ptr);
> > >   	MT_BUG_ON(mt, allocated == 0);
> > > -	MT_BUG_ON(mt, allocated != 1 + height * 3);
> > > +	/*
> > > +	 * The vacant height cannot be used to compute the number of nodes
> > > +	 * needed here: the root contains two entries, meaning it is on the
> > > +	 * verge of insufficiency, so the worst-case full height of the
> > > +	 * tree is needed.
> > > +	 */
> > > +	MT_BUG_ON(mt, allocated != height * 3 + 1);
> > >   	mas_store_prealloc(&mas, ptr);
> > >   	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
> > >   	mas_set_range(&mas, 0, 200);
> > > -- 
> > > 2.43.0
> > > 
> 



Thread overview: 18+ messages
2025-02-21 16:36 [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
2025-02-21 16:36 ` [PATCH v2 1/6] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
2025-02-25 15:40   ` Liam R. Howlett
2025-02-21 16:36 ` [PATCH v2 2/6] maple_tree: use height and depth consistently Sidhartha Kumar
2025-02-25 15:39   ` Liam R. Howlett
2025-02-21 16:36 ` [PATCH v2 3/6] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
2025-02-25 15:46   ` Liam R. Howlett
2025-02-25 16:22     ` Sidhartha Kumar
2025-02-26 15:30       ` Liam R. Howlett
2025-02-21 16:36 ` [PATCH v2 4/6] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
2025-02-25 15:47   ` Liam R. Howlett
2025-02-21 16:36 ` [PATCH v2 5/6] maple_tree: add sufficient height Sidhartha Kumar
2025-02-25 16:02   ` Liam R. Howlett
2025-02-25 20:36     ` Sidhartha Kumar
2025-02-26 15:15       ` Liam R. Howlett
2025-02-21 16:36 ` [PATCH v2 6/6] maple_tree: reorder mas->store_type case statements Sidhartha Kumar
2025-02-25 16:03   ` Liam R. Howlett
2025-02-25 15:40 ` [PATCH v2 0/6] Track node vacancy to reduce worst case allocation counts Liam R. Howlett