* [PATCH 0/5] Track node vacancy to reduce worst case allocation counts
@ 2024-11-14 17:05 Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 1/5] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-14 17:05 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett, Sidhartha Kumar
================ overview ========================
Currently, the maple tree preallocates the worst case number of nodes for
a given store type by taking into account the whole height of the tree. This
comes from a worst case scenario of every node in the tree being full and
having to propagate node allocation upwards until we reach the root of the
tree. This can be optimized if there are vacancies in nodes that are at a
lower depth than the root node. This series implements tracking the level
at which there is a vacant node so we only need to allocate until this
level is reached, rather than always using the full height of the tree.
The ma_wr_state struct is modified to add a field which keeps track of the
vacant height and is updated during walks of the tree. This value is then
read in mas_prealloc_calc() when we decide how many nodes to allocate.
For rebalancing stores, we also need to track the lowest height at which
a node has 1 more entry than the minimum sufficient number of entries.
This is because rebalancing can cause a parent node to become insufficient
which results in further node allocations. In this case, we need to use
the sufficient height as the worst case rather than the vacant height.
patch 1-2: preparatory patches
patch 3: implement vacant height tracking + update the tests
patch 4: support vacant height tracking for rebalancing writes
patch 5: implement sufficient height tracking
================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running the ./mmap1_processes test from mmtests. The two paths
for allocation are requests for a single node and the bulk allocation path.
The histogram represents the number of calls to these paths and shows the
distribution of the number of nodes requested for the bulk allocation path.
mm-unstable 11/13/24
@bulk_alloc_req:
[2, 4) 10 |@@@@@@@@@@@@@ |
[4, 8) 38 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
mm-unstable 11/13/24 + this series
@bulk_alloc_req:
[2, 4) 9 |@@@@@@@@@@ |
[4, 8) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16) 15 |@@@@@@@@@@@@@@@@@@ |
We can see the worst case bulk allocations of [8,16) nodes are reduced after
this series.
Sidhartha Kumar (5):
maple_tree: convert mas_prealloc_calc() to take in a maple write state
maple_tree: use height and depth consistently
maple_tree: use vacant nodes to reduce worst case allocations
maple_tree: break on convergence in mas_spanning_rebalance()
maple_tree: add sufficient height
include/linux/maple_tree.h | 4 +
lib/maple_tree.c | 89 +++++++++++++---------
tools/testing/radix-tree/maple.c | 125 +++++++++++++++++++++++++++++--
3 files changed, 176 insertions(+), 42 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/5] maple_tree: convert mas_prealloc_calc() to take in a maple write state
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
@ 2024-11-14 17:05 ` Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 2/5] maple_tree: use height and depth consistently Sidhartha Kumar
` (4 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-14 17:05 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett, Sidhartha Kumar
In a subsequent patch, mas_prealloc_calc() will need to access fields only
in the ma_wr_state. Convert the function to take in a ma_wr_state and
modify all callers.
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
lib/maple_tree.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index d0ae808f3a14..318d496debd7 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4155,13 +4155,14 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
/**
* mas_prealloc_calc() - Calculate number of nodes needed for a
* given store oepration
- * @mas: The maple state
+ * @wr_mas: The maple write state
* @entry: The entry to store into the tree
*
* Return: Number of nodes required for preallocation.
*/
-static inline int mas_prealloc_calc(struct ma_state *mas, void *entry)
+static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
{
+ struct ma_state *mas = wr_mas->mas;
int ret = mas_mt_height(mas) * 3 + 1;
switch (mas->store_type) {
@@ -4258,16 +4259,15 @@ static inline enum store_type mas_wr_store_type(struct ma_wr_state *wr_mas)
*/
static inline void mas_wr_preallocate(struct ma_wr_state *wr_mas, void *entry)
{
- struct ma_state *mas = wr_mas->mas;
int request;
mas_wr_prealloc_setup(wr_mas);
- mas->store_type = mas_wr_store_type(wr_mas);
- request = mas_prealloc_calc(mas, entry);
+ wr_mas->mas->store_type = mas_wr_store_type(wr_mas);
+ request = mas_prealloc_calc(wr_mas, entry);
if (!request)
return;
- mas_node_count(mas, request);
+ mas_node_count(wr_mas->mas, request);
}
/**
@@ -5441,7 +5441,7 @@ void *mas_store(struct ma_state *mas, void *entry)
return wr_mas.content;
}
- request = mas_prealloc_calc(mas, entry);
+ request = mas_prealloc_calc(&wr_mas, entry);
if (!request)
goto store;
@@ -5538,7 +5538,7 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
mas_wr_prealloc_setup(&wr_mas);
mas->store_type = mas_wr_store_type(&wr_mas);
- request = mas_prealloc_calc(mas, entry);
+ request = mas_prealloc_calc(&wr_mas, entry);
if (!request)
return ret;
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/5] maple_tree: use height and depth consistently
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 1/5] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
@ 2024-11-14 17:05 ` Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-14 17:05 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett, Sidhartha Kumar
For the maple tree, the root node is defined to have a depth of 0 with a
height of 1. Each level down from the root, these values are incremented
by 1. Various code paths define a root with depth 1, which is inconsistent
with this definition. Modify the code to be consistent with this
definition.
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
lib/maple_tree.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 318d496debd7..21289e350382 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -211,14 +211,14 @@ static void ma_free_rcu(struct maple_node *node)
call_rcu(&node->rcu, mt_free_rcu);
}
-static void mas_set_height(struct ma_state *mas)
+static void mt_set_height(struct maple_tree *mt, unsigned char height)
{
- unsigned int new_flags = mas->tree->ma_flags;
+ unsigned int new_flags = mt->ma_flags;
new_flags &= ~MT_FLAGS_HEIGHT_MASK;
- MAS_BUG_ON(mas, mas->depth > MAPLE_HEIGHT_MAX);
- new_flags |= mas->depth << MT_FLAGS_HEIGHT_OFFSET;
- mas->tree->ma_flags = new_flags;
+ MT_BUG_ON(mt, height > MAPLE_HEIGHT_MAX);
+ new_flags |= height << MT_FLAGS_HEIGHT_OFFSET;
+ mt->ma_flags = new_flags;
}
static unsigned int mas_mt_height(struct ma_state *mas)
@@ -1375,7 +1375,6 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
root = mas_root(mas);
/* Tree with nodes */
if (likely(xa_is_node(root))) {
- mas->depth = 1;
mas->status = ma_active;
mas->node = mte_safe_root(root);
mas->offset = 0;
@@ -1727,7 +1726,7 @@ static inline void mas_put_in_tree(struct ma_state *mas,
if (mte_is_root(mas->node)) {
mas_mn(mas)->parent = ma_parent_ptr(mas_tree_parent(mas));
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
- mas_set_height(mas);
+ mt_set_height(mas->tree, mas->depth + 1);
} else {
offset = mte_parent_slot(mas->node);
@@ -2888,12 +2887,12 @@ static void mas_spanning_rebalance(struct ma_state *mas,
*/
memset(mast->bn, 0, sizeof(struct maple_big_node));
mast->bn->type = mte_node_type(left);
- l_mas.depth++;
/* Root already stored in l->node. */
if (mas_is_root_limits(mast->l))
goto new_root;
+ l_mas.depth++;
mast_ascend(mast);
mast_combine_cp_left(mast);
l_mas.offset = mast->bn->b_end;
@@ -3141,7 +3140,7 @@ static inline void mas_split_final_node(struct maple_subtree_state *mast,
mast->bn->type = maple_arange_64;
else
mast->bn->type = maple_range_64;
- mas->depth = height;
+ mas->depth = height - 1;
}
/*
* Only a single node is used here, could be root.
@@ -3308,6 +3307,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
{
struct maple_subtree_state mast;
int height = 0;
+ unsigned int orig_height = mas_mt_height(mas);
unsigned char mid_split, split = 0;
struct maple_enode *old;
@@ -3334,7 +3334,6 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
MA_STATE(prev_r_mas, mas->tree, mas->index, mas->last);
trace_ma_op(__func__, mas);
- mas->depth = mas_mt_height(mas);
mast.l = &l_mas;
mast.r = &r_mas;
@@ -3342,7 +3341,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
mast.orig_r = &prev_r_mas;
mast.bn = b_node;
- while (height++ <= mas->depth) {
+ while (height++ <= orig_height) {
if (mt_slots[b_node->type] > b_node->b_end) {
mas_split_final_node(&mast, mas, height);
break;
@@ -3439,8 +3438,7 @@ static inline void mas_root_expand(struct ma_state *mas, void *entry)
if (mas->last != ULONG_MAX)
pivots[++slot] = ULONG_MAX;
- mas->depth = 1;
- mas_set_height(mas);
+ mt_set_height(mas->tree, 1);
ma_set_meta(node, maple_leaf_64, 0, slot);
/* swap the new root into the tree */
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
@@ -3684,8 +3682,7 @@ static inline void mas_new_root(struct ma_state *mas, void *entry)
WARN_ON_ONCE(mas->index || mas->last != ULONG_MAX);
if (!entry) {
- mas->depth = 0;
- mas_set_height(mas);
+ mt_set_height(mas->tree, 0);
rcu_assign_pointer(mas->tree->ma_root, entry);
mas->status = ma_start;
goto done;
@@ -3699,8 +3696,7 @@ static inline void mas_new_root(struct ma_state *mas, void *entry)
mas->status = ma_active;
rcu_assign_pointer(slots[0], entry);
pivots[0] = mas->last;
- mas->depth = 1;
- mas_set_height(mas);
+ mt_set_height(mas->tree, 1);
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
done:
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 1/5] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 2/5] maple_tree: use height and depth consistently Sidhartha Kumar
@ 2024-11-14 17:05 ` Sidhartha Kumar
2024-11-15 7:52 ` Wei Yang
2024-11-14 17:05 ` [PATCH 4/5] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
` (2 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-14 17:05 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett, Sidhartha Kumar
In order to determine the store type for a maple tree operation, a walk
of the tree is done through mas_wr_walk(). This function descends the
tree until a spanning write is detected or we reach a leaf node. While
descending, keep track of the height at which we encounter a node with
available space. This is done by checking if mas->end is less than the
number of slots a given node type can fit.
Now that the height of the vacant node is tracked, we can use the
difference between the height of the tree and the height of the vacant
node to know how many levels we will have to propagate creating new
nodes. Update mas_prealloc_calc() to consider the vacant height and
reduce the number of worst allocations.
Rebalancing stores are not supported and fall back to using the full
height of the tree for allocations.
Update preallocation testing assertions to take into account vacant
height.
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
include/linux/maple_tree.h | 2 +
lib/maple_tree.c | 13 +++--
tools/testing/radix-tree/maple.c | 97 +++++++++++++++++++++++++++++---
3 files changed, 100 insertions(+), 12 deletions(-)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index cbbcd18d4186..7d777aa2d9ed 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -463,6 +463,7 @@ struct ma_wr_state {
void __rcu **slots; /* mas->node->slots pointer */
void *entry; /* The entry to write */
void *content; /* The existing entry that is being overwritten */
+ unsigned char vacant_height; /* Depth of lowest node with free space */
};
#define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
@@ -498,6 +499,7 @@ struct ma_wr_state {
.mas = ma_state, \
.content = NULL, \
.entry = wr_entry, \
+ .vacant_height = 0 \
}
#define MA_TOPIARY(name, tree) \
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 21289e350382..f14d70c171c2 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3545,6 +3545,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
if (ma_is_leaf(wr_mas->type))
return true;
+ if (mas->end < mt_slots[wr_mas->type] - 1)
+ wr_mas->vacant_height = mas->depth + 1;
+
mas_wr_walk_traverse(wr_mas);
}
@@ -4159,7 +4162,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
{
struct ma_state *mas = wr_mas->mas;
- int ret = mas_mt_height(mas) * 3 + 1;
+ unsigned char height = mas_mt_height(mas);
+ int ret = height * 3 + 1;
+ unsigned char delta = height - wr_mas->vacant_height;
switch (mas->store_type) {
case wr_invalid:
@@ -4177,13 +4182,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
ret = 0;
break;
case wr_spanning_store:
- ret = mas_mt_height(mas) * 3 + 1;
+ ret = delta * 3 + 1;
break;
case wr_split_store:
- ret = mas_mt_height(mas) * 2 + 1;
+ ret = delta * 2 + 1;
break;
case wr_rebalance:
- ret = mas_mt_height(mas) * 2 - 1;
+ ret = height * 2 + 1;
break;
case wr_node_store:
ret = mt_in_rcu(mas->tree) ? 1 : 0;
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index bc30050227fd..bc8b107e0177 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -35475,12 +35475,85 @@ static void check_dfs_preorder(struct maple_tree *mt)
}
/* End of depth first search tests */
+/* same implementation as mas_is_span_wr() in lib/maple_tree.c */
+static bool is_span_wr(struct ma_state *mas, unsigned long r_max,
+ enum maple_type type, void *entry)
+{
+ unsigned long max = r_max;
+ unsigned long last = mas->last;
+
+ /* Contained in this pivot, fast path */
+ if (last < max)
+ return false;
+
+ if (ma_is_leaf(type)) {
+ max = mas->max;
+ if (last < max)
+ return false;
+ }
+
+ if (last == max) {
+ /*
+ * The last entry of leaf node cannot be NULL unless it is the
+ * rightmost node (writing ULONG_MAX), otherwise it spans slots.
+ */
+ if (entry || last == ULONG_MAX)
+ return false;
+ }
+
+ return true;
+}
+
+/* get height of the lowest non-leaf node with free space */
+static unsigned char get_vacant_height(struct ma_state *mas, void *entry)
+{
+ char vacant_height = 0;
+ enum maple_type type;
+ unsigned long *pivots;
+ unsigned long min = 0;
+ unsigned long max = ULONG_MAX;
+
+ /* start traversal */
+ mas_reset(mas);
+ mas_start(mas);
+ if (!xa_is_node(mas_root(mas)))
+ return 0;
+
+ type = mte_node_type(mas->node);
+ while (!ma_is_leaf(type)) {
+ mas_node_walk(mas, mte_to_node(mas->node), type, &min, &max);
+ mas->end = mas_data_end(mas);
+ pivots = ma_pivots(mte_to_node(mas->node), type);
+
+ if (pivots) {
+ if (mas->offset)
+ min = pivots[mas->offset - 1];
+ if (mas->offset < mas->end)
+ max = pivots[mas->offset];
+ }
+
+ /* detect spanning write */
+ if (is_span_wr(mas, max, type, entry))
+ break;
+
+ if (mas->end < mt_slot_count(mas->node) - 1)
+ vacant_height = mas->depth + 1;
+
+ mas_descend(mas);
+ type = mte_node_type(mas->node);
+ mas->depth++;
+ }
+
+ return vacant_height;
+}
+
/* Preallocation testing */
static noinline void __init check_prealloc(struct maple_tree *mt)
{
unsigned long i, max = 100;
unsigned long allocated;
unsigned char height;
+ unsigned char vacant_height;
struct maple_node *mn;
void *ptr = check_prealloc;
MA_STATE(mas, mt, 10, 20);
@@ -35494,8 +35567,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
+ vacant_height = get_vacant_height(&mas, ptr);
MT_BUG_ON(mt, allocated == 0);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
mas_destroy(&mas);
allocated = mas_allocated(&mas);
MT_BUG_ON(mt, allocated != 0);
@@ -35503,8 +35577,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
+ vacant_height = get_vacant_height(&mas, ptr);
MT_BUG_ON(mt, allocated == 0);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
mas_destroy(&mas);
allocated = mas_allocated(&mas);
@@ -35514,7 +35589,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ vacant_height = get_vacant_height(&mas, ptr);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
mn->parent = ma_parent_ptr(mn);
@@ -35527,7 +35603,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ vacant_height = get_vacant_height(&mas, ptr);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
@@ -35540,7 +35617,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ vacant_height = get_vacant_height(&mas, ptr);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
mas_push_node(&mas, mn);
@@ -35553,7 +35631,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ vacant_height = get_vacant_height(&mas, ptr);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
mas_store_prealloc(&mas, ptr);
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -35578,7 +35657,8 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1 + height * 2);
+ vacant_height = get_vacant_height(&mas, ptr);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 2);
mas_store_prealloc(&mas, ptr);
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
mt_set_non_kernel(1);
@@ -35595,8 +35675,9 @@ static noinline void __init check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
+ vacant_height = get_vacant_height(&mas, ptr);
MT_BUG_ON(mt, allocated == 0);
- MT_BUG_ON(mt, allocated != 1 + height * 3);
+ MT_BUG_ON(mt, allocated != 1 + (height - vacant_height) * 3);
mas_store_prealloc(&mas, ptr);
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
mas_set_range(&mas, 0, 200);
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/5] maple_tree: break on convergence in mas_spanning_rebalance()
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
` (2 preceding siblings ...)
2024-11-14 17:05 ` [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
@ 2024-11-14 17:05 ` Sidhartha Kumar
2024-11-15 7:14 ` Wei Yang
2024-11-14 17:05 ` [PATCH 5/5] maple_tree: add sufficient height Sidhartha Kumar
2024-11-14 21:39 ` [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sid Kumar
5 siblings, 1 reply; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-14 17:05 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett, Sidhartha Kumar
This allows support for using the vacant height to calculate the worst
case number of nodes needed for a wr_rebalance operation.
mas_spanning_rebalance() was seen to perform unnecessary node allocations.
We can reduce allocations by breaking early during the rebalancing loop
once we realize that we have ascended to a common ancestor.
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
lib/maple_tree.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index f14d70c171c2..59c5c3f8db30 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -2904,11 +2904,21 @@ static void mas_spanning_rebalance(struct ma_state *mas,
mast_combine_cp_right(mast);
mast->orig_l->last = mast->orig_l->max;
- if (mast_sufficient(mast))
- continue;
+ if (mast_sufficient(mast)) {
+ if (mast_overflow(mast))
+ continue;
+
+ if (mast->orig_l->node == mast->orig_r->node) {
+ /*
+ * The data in b_node should be stored in one
+ * node and in the tree
+ */
+ slot = mast->l->offset;
+ break;
+ }
- if (mast_overflow(mast))
continue;
+ }
/* May be a new root stored in mast->bn */
if (mas_is_root_limits(mast->orig_l))
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 5/5] maple_tree: add sufficient height
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
` (3 preceding siblings ...)
2024-11-14 17:05 ` [PATCH 4/5] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
@ 2024-11-14 17:05 ` Sidhartha Kumar
2024-11-14 21:39 ` [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sid Kumar
5 siblings, 0 replies; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-14 17:05 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett, Sidhartha Kumar
If a parent node is vacant but holds mt_min_slots + 1 entries,
rebalancing with a leaf node could cause this parent node to become
insufficient. This will lead to another level of rebalancing in the tree
and requires more node allocations. Therefore, we also have to track the
level at which there is a node with > mt_min_slots entries. We can use
this as the worst case for the wr_rebalance case.
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
---
include/linux/maple_tree.h | 4 +++-
lib/maple_tree.c | 14 +++++++++++++-
tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
3 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 7d777aa2d9ed..37dc9525dff6 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -464,6 +464,7 @@ struct ma_wr_state {
void *entry; /* The entry to write */
void *content; /* The existing entry that is being overwritten */
unsigned char vacant_height; /* Depth of lowest node with free space */
+ unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
};
#define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
@@ -499,7 +500,8 @@ struct ma_wr_state {
.mas = ma_state, \
.content = NULL, \
.entry = wr_entry, \
- .vacant_height = 0 \
+ .vacant_height = 0, \
+ .sufficient_height = 0 \
}
#define MA_TOPIARY(name, tree) \
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 59c5c3f8db30..3d39773601d3 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3558,6 +3558,14 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
if (mas->end < mt_slots[wr_mas->type] - 1)
wr_mas->vacant_height = mas->depth + 1;
+ if (ma_is_root(mas_mn(mas)))
+ /* root needs more than 2 entries to be sufficient + 1 */
+ if (mas->end > 2)
+ wr_mas->sufficient_height = 1;
+
+ if (mas->end > mt_min_slots[wr_mas->type] + 1)
+ wr_mas->sufficient_height = mas->depth + 1;
+
mas_wr_walk_traverse(wr_mas);
}
@@ -4198,7 +4206,11 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
ret = delta * 2 + 1;
break;
case wr_rebalance:
- ret = height * 2 + 1;
+ if (wr_mas->sufficient_height < wr_mas->vacant_height) {
+ ret = (height - wr_mas->sufficient_height) * 2 + 1;
+ break;
+ }
+ ret = delta * 2 + 1;
break;
case wr_node_store:
ret = mt_in_rcu(mas->tree) ? 1 : 0;
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index bc8b107e0177..f55ab8c8b491 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -36329,6 +36329,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
extern void test_kmem_cache_bulk(void);
+/*
+ * Test to check the path of a spanning rebalance which results in
+ * a collapse where the rebalancing of the child node leads to
+ * insufficiency in the parent node.
+ */
+static void check_collapsing_rebalance(struct maple_tree *mt)
+{
+ int i = 0;
+ MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
+
+ /* create a height 4 tree */
+ while (mt_height(mt) < 4) {
+ mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
+ i += 9;
+ }
+
+ /* delete all entries one at a time, starting from the right */
+ do {
+ mas_erase(&mas);
+ } while (mas_prev(&mas, 0) != NULL);
+
+ mtree_unlock(mt);
+}
+
/* callback function used for check_nomem_writer_race() */
static void writer2(void *maple_tree)
{
@@ -36495,6 +36519,10 @@ void farmer_tests(void)
check_spanning_write(&tree);
mtree_destroy(&tree);
+ mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+ check_collapsing_rebalance(&tree);
+ mtree_destroy(&tree);
+
mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
check_null_expand(&tree);
mtree_destroy(&tree);
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/5] Track node vacancy to reduce worst case allocation counts
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
` (4 preceding siblings ...)
2024-11-14 17:05 ` [PATCH 5/5] maple_tree: add sufficient height Sidhartha Kumar
@ 2024-11-14 21:39 ` Sid Kumar
2024-11-19 9:59 ` Wei Yang
5 siblings, 1 reply; 16+ messages in thread
From: Sid Kumar @ 2024-11-14 21:39 UTC (permalink / raw)
To: linux-kernel, maple-tree; +Cc: linux-mm, akpm, liam.howlett
On 11/14/24 12:05 PM, Sidhartha Kumar wrote:
> ================ overview ========================
> Currently, the maple tree preallocates the worst case number of nodes for
> a given store type by taking into account the whole height of the tree. This
> comes from a worst case scenario of every node in the tree being full and
> having to propagate node allocation upwards until we reach the root of the
> tree. This can be optimized if there are vacancies in nodes that are at a
> lower depth than the root node. This series implements tracking the level
> at which there is a vacant node so we only need to allocate until this
> level is reached, rather than always using the full height of the tree.
> The ma_wr_state struct is modified to add a field which keeps track of the
> vacant height and is updated during walks of the tree. This value is then
> read in mas_prealloc_calc() when we decide how many nodes to allocate.
>
> For rebalancing stores, we also need to track the lowest height at which
> a node has 1 more entry than the minimum sufficient number of entries.
> This is because rebalancing can cause a parent node to become insufficient
> which results in further node allocations. In this case, we need to use
> the sufficient height as the worst case rather than the vacant height.
>
> patch 1-2: preparatory patches
> patch 3: implement vacant height tracking + update the tests
> patch 4: support vacant height tracking for rebalancing writes
> patch 5: implement sufficient height tracking
>
> ================ results =========================
> Bpftrace was used to profile the allocation path for requesting new maple
> nodes while running the ./mmap1_processes test from mmtests. The two paths
> for allocation are requests for a single node and the bulk allocation path.
> The histogram represents the number of calls to these paths and shows the
> distribution of the number of nodes requested for the bulk allocation path.
>
>
> mm-unstable 11/13/24
> @bulk_alloc_req:
> [2, 4) 10 |@@@@@@@@@@@@@ |
> [4, 8) 38 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [8, 16) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
>
>
> mm-unstable 11/13/24 + this series
> @bulk_alloc_req:
> [2, 4) 9 |@@@@@@@@@@ |
> [4, 8) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [8, 16) 15 |@@@@@@@@@@@@@@@@@@ |
>
> We can see the worst case bulk allocations of [8,16) nodes are reduced after
> this series.
From running the ./malloc1_threads test case we eliminate almost all
bulk allocation requests that fall between 8 and 16 nodes.
./malloc1_threads -t 8 -s 100
mm-unstable + this series
@bulk_alloc_req:
[2, 4)      2 |                                                    |
[4, 8)   3381 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)     2 |                                                    |
mm-unstable
@bulk_alloc_req:
[2, 4)      1 |                                                    |
[4, 8)   1427 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[8, 16)  2790 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
> Sidhartha Kumar (5):
> maple_tree: convert mas_prealloc_calc() to take in a maple write state
> maple_tree: use height and depth consistently
> maple_tree: use vacant nodes to reduce worst case allocations
> maple_tree: break on convergence in mas_spanning_rebalance()
> maple_tree: add sufficient height
>
> include/linux/maple_tree.h | 4 +
> lib/maple_tree.c | 89 +++++++++++++---------
> tools/testing/radix-tree/maple.c | 125 +++++++++++++++++++++++++++++--
> 3 files changed, 176 insertions(+), 42 deletions(-)
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] maple_tree: break on convergence in mas_spanning_rebalance()
2024-11-14 17:05 ` [PATCH 4/5] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
@ 2024-11-15 7:14 ` Wei Yang
0 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2024-11-15 7:14 UTC (permalink / raw)
To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On Thu, Nov 14, 2024 at 12:05:23PM -0500, Sidhartha Kumar wrote:
>This allows support for using the vacant height to calculate the worst
>case number of nodes needed for wr_rebalance operation.
>mas_spanning_rebalance() was seen to perform unnecessary node allocations.
>We can reduce allocations by breaking early during the rebalancing loop
>once we realize that we have ascended to a common ancestor.
>
>Suggested-by: Liam Howlett <liam.howlett@oracle.com>
>Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
I am just about to send a similar patch :-)
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-14 17:05 ` [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
@ 2024-11-15 7:52 ` Wei Yang
2024-11-15 20:34 ` Sidhartha Kumar
0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2024-11-15 7:52 UTC (permalink / raw)
To: Sidhartha Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On Thu, Nov 14, 2024 at 12:05:22PM -0500, Sidhartha Kumar wrote:
>In order to determine the store type for a maple tree operation, a walk
>of the tree is done through mas_wr_walk(). This function descends the
>tree until a spanning write is detected or we reach a leaf node. While
>descending, keep track of the height at which we encounter a node with
>available space. This is done by checking if mas->end is less than the
>number of slots a given node type can fit.
>
>Now that the height of the vacant node is tracked, we can use the
>difference between the height of the tree and the height of the vacant
>node to know how many levels we will have to propagate creating new
>nodes. Update mas_prealloc_calc() to consider the vacant height and
>reduce the number of worst allocations.
>
>Rebalancing stores are not supported and fall back to using the full
>height of the tree for allocations.
>
>Update preallocation testing assertions to take into account vacant
>height.
>
>Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
>---
> include/linux/maple_tree.h | 2 +
> lib/maple_tree.c | 13 +++--
> tools/testing/radix-tree/maple.c | 97 +++++++++++++++++++++++++++++---
> 3 files changed, 100 insertions(+), 12 deletions(-)
>
>diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>index cbbcd18d4186..7d777aa2d9ed 100644
>--- a/include/linux/maple_tree.h
>+++ b/include/linux/maple_tree.h
>@@ -463,6 +463,7 @@ struct ma_wr_state {
> void __rcu **slots; /* mas->node->slots pointer */
> void *entry; /* The entry to write */
> void *content; /* The existing entry that is being overwritten */
>+ unsigned char vacant_height; /* Depth of lowest node with free space */
^^^ ^^^
Would this be a little misleading?
> };
>
> #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
>@@ -498,6 +499,7 @@ struct ma_wr_state {
> .mas = ma_state, \
> .content = NULL, \
> .entry = wr_entry, \
>+ .vacant_height = 0 \
> }
>
> #define MA_TOPIARY(name, tree) \
>diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>index 21289e350382..f14d70c171c2 100644
>--- a/lib/maple_tree.c
>+++ b/lib/maple_tree.c
>@@ -3545,6 +3545,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
> if (ma_is_leaf(wr_mas->type))
> return true;
>
>+ if (mas->end < mt_slots[wr_mas->type] - 1)
>+ wr_mas->vacant_height = mas->depth + 1;
For some cases in rebalance, we may split data into three parts, which means
we need 2 extra vacant slots.
Maybe this check is not accurate?
>+
> mas_wr_walk_traverse(wr_mas);
> }
>
>@@ -4159,7 +4162,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
> static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
> {
> struct ma_state *mas = wr_mas->mas;
>- int ret = mas_mt_height(mas) * 3 + 1;
>+ unsigned char height = mas_mt_height(mas);
>+ int ret = height * 3 + 1;
>+ unsigned char delta = height - wr_mas->vacant_height;
>
> switch (mas->store_type) {
> case wr_invalid:
>@@ -4177,13 +4182,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
> ret = 0;
> break;
> case wr_spanning_store:
>- ret = mas_mt_height(mas) * 3 + 1;
>+ ret = delta * 3 + 1;
Hmm... I am afraid we need to put this patch after the next one.
Without the change in the next patch, we still need to go up the tree to the
root to rebalance.
> break;
> case wr_split_store:
>- ret = mas_mt_height(mas) * 2 + 1;
>+ ret = delta * 2 + 1;
> break;
> case wr_rebalance:
>- ret = mas_mt_height(mas) * 2 - 1;
>+ ret = height * 2 + 1;
Looks like the current calculation is not correct?
If so, do we need a fix to be backported?
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
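The mas_prealloc_calc() arithmetic under discussion can be sketched as a
simplified model (Python; a reading of the diff above, not the kernel code —
store-type handling is reduced to the three cases the patch changes, and all
other types are collapsed to 0 here purely for illustration):

```python
# Simplified model of mas_prealloc_calc() with vacant-height tracking
# (a sketch of the logic in the diff above, not the kernel implementation).
# 'height' is the tree height; 'vacant_height' is the height of the lowest
# node seen with free space during the walk (0 if none was found, which
# matches the .vacant_height = 0 initializer and yields the full height).
def prealloc_calc(store_type, height, vacant_height=0):
    # Levels that may need new nodes: from the leaf up to the vacant node.
    delta = height - vacant_height
    if store_type == "wr_spanning_store":
        return delta * 3 + 1
    if store_type == "wr_split_store":
        return delta * 2 + 1
    if store_type == "wr_rebalance":
        # Rebalance is not supported by this patch and falls back to the
        # full height; the diff's "height * 2 + 1" was acknowledged as a
        # typo for the existing "height * 2 - 1".
        return height * 2 - 1
    return 0  # other store types omitted from this sketch

# A height-4 tree with a vacant node at height 2: a split store now
# preallocates (4 - 2) * 2 + 1 = 5 nodes instead of 4 * 2 + 1 = 9.
```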
* Re: [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-15 7:52 ` Wei Yang
@ 2024-11-15 20:34 ` Sidhartha Kumar
2024-11-16 1:41 ` Wei Yang
0 siblings, 1 reply; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-15 20:34 UTC (permalink / raw)
To: Wei Yang; +Cc: linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On 11/15/24 2:52 AM, Wei Yang wrote:
> On Thu, Nov 14, 2024 at 12:05:22PM -0500, Sidhartha Kumar wrote:
>> In order to determine the store type for a maple tree operation, a walk
>> of the tree is done through mas_wr_walk(). This function descends the
>> tree until a spanning write is detected or we reach a leaf node. While
>> descending, keep track of the height at which we encounter a node with
>> available space. This is done by checking if mas->end is less than the
>> number of slots a given node type can fit.
>>
>> Now that the height of the vacant node is tracked, we can use the
>> difference between the height of the tree and the height of the vacant
>> node to know how many levels we will have to propagate creating new
>> nodes. Update mas_prealloc_calc() to consider the vacant height and
>> reduce the number of worst allocations.
>>
>> Rebalancing stores are not supported and fall back to using the full
>> height of the tree for allocations.
>>
>> Update preallocation testing assertions to take into account vacant
>> height.
>>
>> Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
>> ---
>> include/linux/maple_tree.h | 2 +
>> lib/maple_tree.c | 13 +++--
>> tools/testing/radix-tree/maple.c | 97 +++++++++++++++++++++++++++++---
>> 3 files changed, 100 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index cbbcd18d4186..7d777aa2d9ed 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -463,6 +463,7 @@ struct ma_wr_state {
>> void __rcu **slots; /* mas->node->slots pointer */
>> void *entry; /* The entry to write */
>> void *content; /* The existing entry that is being overwritten */
>> + unsigned char vacant_height; /* Depth of lowest node with free space */
> ^^^ ^^^
>
> Would this be a little misleading?
>
Could you elaborate on how it's misleading?
>> };
>>
>> #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
>> @@ -498,6 +499,7 @@ struct ma_wr_state {
>> .mas = ma_state, \
>> .content = NULL, \
>> .entry = wr_entry, \
>> + .vacant_height = 0 \
>> }
>>
>> #define MA_TOPIARY(name, tree) \
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 21289e350382..f14d70c171c2 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3545,6 +3545,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>> if (ma_is_leaf(wr_mas->type))
>> return true;
>>
>> + if (mas->end < mt_slots[wr_mas->type] - 1)
>> + wr_mas->vacant_height = mas->depth + 1;
>
> For some cases in rebalance, we may split data into three parts, which means
> we need 2 extra vacant slot.
>
> Maybe this check is not accurate?
>
The triple split scenario which you are describing comes from the
spanning store case, not the wr_rebalance case. There is a check to
return if is_span_wr() before we set the vacant height, so I believe
this is still correct.
>> +
>> mas_wr_walk_traverse(wr_mas);
>> }
>>
>> @@ -4159,7 +4162,9 @@ static inline void mas_wr_prealloc_setup(struct ma_wr_state *wr_mas)
>> static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>> {
>> struct ma_state *mas = wr_mas->mas;
>> - int ret = mas_mt_height(mas) * 3 + 1;
>> + unsigned char height = mas_mt_height(mas);
>> + int ret = height * 3 + 1;
>> + unsigned char delta = height - wr_mas->vacant_height;
>>
>> switch (mas->store_type) {
>> case wr_invalid:
>> @@ -4177,13 +4182,13 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>> ret = 0;
>> break;
>> case wr_spanning_store:
>> - ret = mas_mt_height(mas) * 3 + 1;
>> + ret = delta * 3 + 1;
>
> Hmm... I am afraid we need to put this patch after next one.
>
> Without the change in next patch, we still need to go up the tree till root to
> rebalance.
>
I think you are right here, as mas_wr_spanning_store() calls
mas_spanning_rebalance(); I'll switch the order of patch 3 and patch 4.
>> break;
>> case wr_split_store:
>> - ret = mas_mt_height(mas) * 2 + 1;
>> + ret = delta * 2 + 1;
>> break;
>> case wr_rebalance:
>> - ret = mas_mt_height(mas) * 2 - 1;
>> + ret = height * 2 + 1;
>
> Looks current calculation is not correct?
> If so, do we need to have a fix to be backported?
>
This was a typo; it can remain height * 2 - 1.
Thanks for taking a look,
Sidhartha Kumar
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-15 20:34 ` Sidhartha Kumar
@ 2024-11-16 1:41 ` Wei Yang
2024-11-18 16:36 ` Sidhartha Kumar
0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2024-11-16 1:41 UTC (permalink / raw)
To: Sidhartha Kumar
Cc: Wei Yang, linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On Fri, Nov 15, 2024 at 03:34:55PM -0500, Sidhartha Kumar wrote:
>On 11/15/24 2:52 AM, Wei Yang wrote:
>> On Thu, Nov 14, 2024 at 12:05:22PM -0500, Sidhartha Kumar wrote:
>> > In order to determine the store type for a maple tree operation, a walk
>> > of the tree is done through mas_wr_walk(). This function descends the
>> > tree until a spanning write is detected or we reach a leaf node. While
>> > descending, keep track of the height at which we encounter a node with
>> > available space. This is done by checking if mas->end is less than the
>> > number of slots a given node type can fit.
>> >
>> > Now that the height of the vacant node is tracked, we can use the
>> > difference between the height of the tree and the height of the vacant
>> > node to know how many levels we will have to propagate creating new
>> > nodes. Update mas_prealloc_calc() to consider the vacant height and
>> > reduce the number of worst allocations.
>> >
>> > Rebalancing stores are not supported and fall back to using the full
>> > height of the tree for allocations.
>> >
>> > Update preallocation testing assertions to take into account vacant
>> > height.
>> >
>> > Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
>> > ---
>> > include/linux/maple_tree.h | 2 +
>> > lib/maple_tree.c | 13 +++--
>> > tools/testing/radix-tree/maple.c | 97 +++++++++++++++++++++++++++++---
>> > 3 files changed, 100 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> > index cbbcd18d4186..7d777aa2d9ed 100644
>> > --- a/include/linux/maple_tree.h
>> > +++ b/include/linux/maple_tree.h
>> > @@ -463,6 +463,7 @@ struct ma_wr_state {
>> > void __rcu **slots; /* mas->node->slots pointer */
>> > void *entry; /* The entry to write */
>> > void *content; /* The existing entry that is being overwritten */
>> > + unsigned char vacant_height; /* Depth of lowest node with free space */
>> ^^^ ^^^
>>
>> Would this be a little misleading?
>>
>
>Could you elaborate on how its misleading?
>
As you mentioned in the previous patch, depth and height have different
meanings.
The root node has a depth of 0 and a height of 1. So I was wondering whether
this is depth or height.
>> > };
>> >
>> > #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
>> > @@ -498,6 +499,7 @@ struct ma_wr_state {
>> > .mas = ma_state, \
>> > .content = NULL, \
>> > .entry = wr_entry, \
>> > + .vacant_height = 0 \
>> > }
>> >
>> > #define MA_TOPIARY(name, tree) \
>> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> > index 21289e350382..f14d70c171c2 100644
>> > --- a/lib/maple_tree.c
>> > +++ b/lib/maple_tree.c
>> > @@ -3545,6 +3545,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>> > if (ma_is_leaf(wr_mas->type))
>> > return true;
>> >
>> > + if (mas->end < mt_slots[wr_mas->type] - 1)
>> > + wr_mas->vacant_height = mas->depth + 1;
>>
>> For some cases in rebalance, we may split data into three parts, which means
>> we need 2 extra vacant slot.
>>
>> Maybe this check is not accurate?
>>
>
>The triple split scenario which you are describing comes from the spanning
>store case not on the wr_rebalance case. There is a check before we set
>vacant height to return if is_span_wr() so I believe this is correct still.
>
Hmm... I came up with a case in which vacant_height may not be high enough.
Here is the subtree where the spanning write is located. The first level is
the parent node of the second level nodes.
vacant node
+--------+-+-+-------+
| |l|r| |
+--------+-+-+-------+
l r
+------+ +----+-------+ +---------+--+ +------+
| | | | | | | | | |
+------+ +----+-------+ +---------+--+ +------+
^ ^
| |
index last
When mas_wr_walk_descend() reaches node l, mas_is_span_wr() returns true
since last is in the right sibling, node r. Let's say the parent is vacant
and l/r are leaves. So the requested number is (1 * 3) + 1.
Let's assume:
* vacant node is just sufficient
* l's left part + r's right part is sufficient but not overflow
Then the "new vacant node" would be deficient and need another round of
rebalancing.
For this case, I am afraid it doesn't allocate enough nodes. Or did I
misunderstand this?
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
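The scenario above can be restated numerically using the patched
wr_spanning_store formula (an illustrative sketch; the concrete heights are
assumptions matching Wei's description, not taken from the kernel source):

```python
# Worked example of the concern above: a two-level subtree where the
# parent ("vacant node") has a free slot and l/r are leaves.
height = 2          # parent plus the leaf level
vacant_height = 1   # the parent is the lowest node with free space
delta = height - vacant_height

# The patched wr_spanning_store formula requests only:
requested = delta * 3 + 1
assert requested == 4   # the "(1 * 3) + 1" from the mail above

# If the merged left part of l and right part of r stays sufficient (no
# overflow), the parent loses a child. If the parent was only just
# sufficient, it becomes deficient and a further rebalance round needs
# nodes beyond the 4 requested — the case Wei is pointing out.
```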
* Re: [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-16 1:41 ` Wei Yang
@ 2024-11-18 16:36 ` Sidhartha Kumar
2024-11-19 2:30 ` Wei Yang
0 siblings, 1 reply; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-18 16:36 UTC (permalink / raw)
To: Wei Yang; +Cc: linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On 11/15/24 8:41 PM, Wei Yang wrote:
> On Fri, Nov 15, 2024 at 03:34:55PM -0500, Sidhartha Kumar wrote:
>> On 11/15/24 2:52 AM, Wei Yang wrote:
>>> On Thu, Nov 14, 2024 at 12:05:22PM -0500, Sidhartha Kumar wrote:
>>>> In order to determine the store type for a maple tree operation, a walk
>>>> of the tree is done through mas_wr_walk(). This function descends the
>>>> tree until a spanning write is detected or we reach a leaf node. While
>>>> descending, keep track of the height at which we encounter a node with
>>>> available space. This is done by checking if mas->end is less than the
>>>> number of slots a given node type can fit.
>>>>
>>>> Now that the height of the vacant node is tracked, we can use the
>>>> difference between the height of the tree and the height of the vacant
>>>> node to know how many levels we will have to propagate creating new
>>>> nodes. Update mas_prealloc_calc() to consider the vacant height and
>>>> reduce the number of worst allocations.
>>>>
>>>> Rebalancing stores are not supported and fall back to using the full
>>>> height of the tree for allocations.
>>>>
>>>> Update preallocation testing assertions to take into account vacant
>>>> height.
>>>>
>>>> Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
>>>> ---
>>>> include/linux/maple_tree.h | 2 +
>>>> lib/maple_tree.c | 13 +++--
>>>> tools/testing/radix-tree/maple.c | 97 +++++++++++++++++++++++++++++---
>>>> 3 files changed, 100 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>>>> index cbbcd18d4186..7d777aa2d9ed 100644
>>>> --- a/include/linux/maple_tree.h
>>>> +++ b/include/linux/maple_tree.h
>>>> @@ -463,6 +463,7 @@ struct ma_wr_state {
>>>> void __rcu **slots; /* mas->node->slots pointer */
>>>> void *entry; /* The entry to write */
>>>> void *content; /* The existing entry that is being overwritten */
>>>> + unsigned char vacant_height; /* Depth of lowest node with free space */
>>> ^^^ ^^^
>>>
>>> Would this be a little misleading?
>>>
>>
>> Could you elaborate on how its misleading?
>>
>
> As you mentioned in previous patch, depth and height has different meaning.
>
> Root node has depth of 0 and height of 1. So I may wandering whether this is
> depth or height.
>
>>>> };
>>>>
>>>> #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
>>>> @@ -498,6 +499,7 @@ struct ma_wr_state {
>>>> .mas = ma_state, \
>>>> .content = NULL, \
>>>> .entry = wr_entry, \
>>>> + .vacant_height = 0 \
>>>> }
>>>>
>>>> #define MA_TOPIARY(name, tree) \
>>>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>>>> index 21289e350382..f14d70c171c2 100644
>>>> --- a/lib/maple_tree.c
>>>> +++ b/lib/maple_tree.c
>>>> @@ -3545,6 +3545,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>>>> if (ma_is_leaf(wr_mas->type))
>>>> return true;
>>>>
>>>> + if (mas->end < mt_slots[wr_mas->type] - 1)
>>>> + wr_mas->vacant_height = mas->depth + 1;
>>>
>>> For some cases in rebalance, we may split data into three parts, which means
>>> we need 2 extra vacant slot.
>>>
>>> Maybe this check is not accurate?
>>>
>>
>> The triple split scenario which you are describing comes from the spanning
>> store case not on the wr_rebalance case. There is a check before we set
>> vacant height to return if is_span_wr() so I believe this is correct still.
>>
>
> Hmm... I come up with a case in which vacant_height may not high enough.
>
> Here is the subtree where spanning write locates. The first level is the
> parent node of the second level nodes.
>
> vacant node
> +--------+-+-+-------+
> | |l|r| |
> +--------+-+-+-------+
>
> l r
> +------+ +----+-------+ +---------+--+ +------+
> | | | | | | | | | |
> +------+ +----+-------+ +---------+--+ +------+
> ^ ^
> | |
> index last
>
> When mas_wr_walk_descend() to node l, mas_is_span_wr() return true since last
> is in the right sibling, node r. Let's say the parent is vacant and l/r is
> leaf. So the request number is (1 * 3) + 1.
>
> Let's assume:
>
> * vacant node is just sufficient
> * l's left part + r's right part is sufficient but not overflow
>
> Then the "new vacant node" would be deficient and needs another round
> rebalance.
>
> For this case, I am afraid it doesn't allocate enough nodes. Or I
> misunderstand this?
I think you are correct, and we need to use the sufficient height
(introduced in patch 5) in the spanning store case, similar to how
patch 5 uses it for the rebalance store case:

	case wr_rebalance:
		if (wr_mas->sufficient_height < wr_mas->vacant_height) {
			ret = (height - wr_mas->sufficient_height) * 2 + 1;
			break;
		}
		ret = delta * 2 + 1;
		break;
>
^ permalink raw reply [flat|nested] 16+ messages in thread
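Sid's proposed wr_rebalance calculation above can be sketched as a small
model (Python; a reading of the quoted snippet, not the final kernel code):

```python
# Model of the proposed wr_rebalance preallocation count (a sketch of the
# snippet quoted above; not the kernel implementation). Smaller height
# values mean closer to the root.
def rebalance_prealloc(height, vacant_height, sufficient_height):
    # If the lowest node with spare entries sits higher in the tree than
    # the lowest vacant node, insufficiency can propagate further up, so
    # the worst case must span the larger number of levels.
    if sufficient_height < vacant_height:
        return (height - sufficient_height) * 2 + 1
    return (height - vacant_height) * 2 + 1
```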
* Re: [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-18 16:36 ` Sidhartha Kumar
@ 2024-11-19 2:30 ` Wei Yang
2024-11-19 14:15 ` Liam R. Howlett
0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2024-11-19 2:30 UTC (permalink / raw)
To: Sidhartha Kumar
Cc: Wei Yang, linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On Mon, Nov 18, 2024 at 11:36:18AM -0500, Sidhartha Kumar wrote:
>On 11/15/24 8:41 PM, Wei Yang wrote:
>> On Fri, Nov 15, 2024 at 03:34:55PM -0500, Sidhartha Kumar wrote:
>> > On 11/15/24 2:52 AM, Wei Yang wrote:
>> > > On Thu, Nov 14, 2024 at 12:05:22PM -0500, Sidhartha Kumar wrote:
>> > > > In order to determine the store type for a maple tree operation, a walk
>> > > > of the tree is done through mas_wr_walk(). This function descends the
>> > > > tree until a spanning write is detected or we reach a leaf node. While
>> > > > descending, keep track of the height at which we encounter a node with
>> > > > available space. This is done by checking if mas->end is less than the
>> > > > number of slots a given node type can fit.
>> > > >
>> > > > Now that the height of the vacant node is tracked, we can use the
>> > > > difference between the height of the tree and the height of the vacant
>> > > > node to know how many levels we will have to propagate creating new
>> > > > nodes. Update mas_prealloc_calc() to consider the vacant height and
>> > > > reduce the number of worst allocations.
>> > > >
>> > > > Rebalancing stores are not supported and fall back to using the full
>> > > > height of the tree for allocations.
>> > > >
>> > > > Update preallocation testing assertions to take into account vacant
>> > > > height.
>> > > >
>> > > > Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
>> > > > ---
>> > > > include/linux/maple_tree.h | 2 +
>> > > > lib/maple_tree.c | 13 +++--
>> > > > tools/testing/radix-tree/maple.c | 97 +++++++++++++++++++++++++++++---
>> > > > 3 files changed, 100 insertions(+), 12 deletions(-)
>> > > >
>> > > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> > > > index cbbcd18d4186..7d777aa2d9ed 100644
>> > > > --- a/include/linux/maple_tree.h
>> > > > +++ b/include/linux/maple_tree.h
>> > > > @@ -463,6 +463,7 @@ struct ma_wr_state {
>> > > > void __rcu **slots; /* mas->node->slots pointer */
>> > > > void *entry; /* The entry to write */
>> > > > void *content; /* The existing entry that is being overwritten */
>> > > > + unsigned char vacant_height; /* Depth of lowest node with free space */
>> > > ^^^ ^^^
>> > >
>> > > Would this be a little misleading?
>> > >
>> >
>> > Could you elaborate on how its misleading?
>> >
>>
>> As you mentioned in previous patch, depth and height has different meaning.
>>
>> Root node has depth of 0 and height of 1. So I may wandering whether this is
>> depth or height.
>>
>> > > > };
>> > > >
>> > > > #define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
>> > > > @@ -498,6 +499,7 @@ struct ma_wr_state {
>> > > > .mas = ma_state, \
>> > > > .content = NULL, \
>> > > > .entry = wr_entry, \
>> > > > + .vacant_height = 0 \
>> > > > }
>> > > >
>> > > > #define MA_TOPIARY(name, tree) \
>> > > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> > > > index 21289e350382..f14d70c171c2 100644
>> > > > --- a/lib/maple_tree.c
>> > > > +++ b/lib/maple_tree.c
>> > > > @@ -3545,6 +3545,9 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>> > > > if (ma_is_leaf(wr_mas->type))
>> > > > return true;
>> > > >
>> > > > + if (mas->end < mt_slots[wr_mas->type] - 1)
>> > > > + wr_mas->vacant_height = mas->depth + 1;
>> > >
>> > > For some cases in rebalance, we may split data into three parts, which means
>> > > we need 2 extra vacant slot.
>> > >
>> > > Maybe this check is not accurate?
>> > >
>> >
>> > The triple split scenario which you are describing comes from the spanning
>> > store case not on the wr_rebalance case. There is a check before we set
>> > vacant height to return if is_span_wr() so I believe this is correct still.
>> >
>>
>> Hmm... I come up with a case in which vacant_height may not high enough.
>>
>> Here is the subtree where spanning write locates. The first level is the
>> parent node of the second level nodes.
>>
>> vacant node
>> +--------+-+-+-------+
>> | |l|r| |
>> +--------+-+-+-------+
>>
>> l r
>> +------+ +----+-------+ +---------+--+ +------+
>> | | | | | | | | | |
>> +------+ +----+-------+ +---------+--+ +------+
>> ^ ^
>> | |
>> index last
>>
>> When mas_wr_walk_descend() to node l, mas_is_span_wr() return true since last
>> is in the right sibling, node r. Let's say the parent is vacant and l/r is
>> leaf. So the request number is (1 * 3) + 1.
>>
>> Let's assume:
>>
>> * vacant node is just sufficient
>> * l's left part + r's right part is sufficient but not overflow
>>
>> Then the "new vacant node" would be deficient and needs another round
>> rebalance.
>>
>> For this case, I am afraid it doesn't allocate enough nodes. Or I
>> misunderstand this?
>
>I think you are correct and we need to use the sufficient height which is
>introduced in patch 5 in the spanning store case similar to how patch 5 uses
>it for the rebalance store case.
>
> case wr_rebalance:
> if (wr_mas->sufficient_height < wr_mas->vacant_height) {
> ret = (height - wr_mas->sufficient_height)*2 +1;
> break;
> }
> ret = delta * 2 + 1;
> break;
>
I have thought about this.
But the spanning write case seems to be a little different from
splitting/rebalance.
It could both require one more extra slot and need to be rebalanced.
We are not sure of the exact behavior at the convergence node, so checking
only one aspect seems not enough.
Do you think so?
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/5] Track node vacancy to reduce worst case allocation counts
2024-11-14 21:39 ` [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sid Kumar
@ 2024-11-19 9:59 ` Wei Yang
2024-11-26 19:32 ` Sidhartha Kumar
0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2024-11-19 9:59 UTC (permalink / raw)
To: Sid Kumar; +Cc: linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On Thu, Nov 14, 2024 at 04:39:00PM -0500, Sid Kumar wrote:
>
>On 11/14/24 12:05 PM, Sidhartha Kumar wrote:
[...]
>> ================ results =========================
>> Bpftrace was used to profile the allocation path for requesting new maple
>> nodes while running the ./mmap1_processes test from mmtests. The two paths
>> for allocation are requests for a single node and the bulk allocation path.
>> The histogram represents the number of calls to these paths and shows the
>> distribution of the number of nodes requested for the bulk allocation path.
>>
>>
>> mm-unstable 11/13/24
>> @bulk_alloc_req:
>> [2, 4) 10 |@@@@@@@@@@@@@ |
>> [4, 8) 38 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>> [8, 16) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
>>
>>
>> mm-unstable 11/13/24 + this series
>> @bulk_alloc_req:
>> [2, 4) 9 |@@@@@@@@@@ |
>> [4, 8) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>> [8, 16) 15 |@@@@@@@@@@@@@@@@@@ |
>>
>> We can see the worst case bulk allocations of [8,16) nodes are reduced after
>> this series.
>
>From running the ./malloc1_threads test case we eliminate almost all bulk
>allocation requests that fall between 8 and 16 nodes
>
>./malloc1_threads -t 8 -s 100
>mm-unstable + this series
>@bulk_alloc_req:
>[2, 4)     2 |                                                    |
>[4, 8)  3381 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>[8, 16)    2 |                                                    |
>
This is impressive. But one thing is still not clear to me.
For mmap related code, we usually have the following usage:
vma_iter_prealloc(vmi, vma);
  mas_preallocate(vmi->mas, vma);
    MA_WR_STATE(wr_mas, );
    mas_wr_store_type(&wr_mas); --- (1)
vma_iter_store(vmi, vma);
Location (1) is where we try to get a better estimation of allocations.
The estimation is based on walking down the tree and trying to find a
proper node.
In mmap related code, we usually have already walked down the tree to a
leaf, via vma_find() or a related iteration operation, and mas.status is
set to ma_active. So I don't expect mas_preallocate() to traverse the
tree again.
But from your result, it seems most cases do traverse the tree again to get a
more precise height.
Which part do you think I have missed?
>
>mm-unstable
>@bulk_alloc_req:
>[2, 4)     1 |                                                    |
>[4, 8)  1427 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
>[8, 16) 2790 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
>
>>
>> Sidhartha Kumar (5):
>> maple_tree: convert mas_prealloc_calc() to take in a maple write state
>> maple_tree: use height and depth consistently
>> maple_tree: use vacant nodes to reduce worst case allocations
>> maple_tree: break on convergence in mas_spanning_rebalance()
>> maple_tree: add sufficient height
>>
>> include/linux/maple_tree.h | 4 +
>> lib/maple_tree.c | 89 +++++++++++++---------
>> tools/testing/radix-tree/maple.c | 125 +++++++++++++++++++++++++++++--
>> 3 files changed, 176 insertions(+), 42 deletions(-)
>>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations
2024-11-19 2:30 ` Wei Yang
@ 2024-11-19 14:15 ` Liam R. Howlett
0 siblings, 0 replies; 16+ messages in thread
From: Liam R. Howlett @ 2024-11-19 14:15 UTC (permalink / raw)
To: Wei Yang; +Cc: Sidhartha Kumar, linux-kernel, maple-tree, linux-mm, akpm
* Wei Yang <richard.weiyang@gmail.com> [241118 21:31]:
> On Mon, Nov 18, 2024 at 11:36:18AM -0500, Sidhartha Kumar wrote:
> >On 11/15/24 8:41 PM, Wei Yang wrote:
> >> On Fri, Nov 15, 2024 at 03:34:55PM -0500, Sidhartha Kumar wrote:
> >> > On 11/15/24 2:52 AM, Wei Yang wrote:
...
> >>
> >> Hmm... I come up with a case in which vacant_height may not high enough.
> >>
> >> Here is the subtree where spanning write locates. The first level is the
> >> parent node of the second level nodes.
> >>
> >> vacant node
> >> +--------+-+-+-------+
> >> | |l|r| |
> >> +--------+-+-+-------+
> >>
> >> l r
> >> +------+ +----+-------+ +---------+--+ +------+
> >> | | | | | | | | | |
> >> +------+ +----+-------+ +---------+--+ +------+
> >> ^ ^
> >> | |
> >> index last
> >>
> >> When mas_wr_walk_descend() to node l, mas_is_span_wr() return true since last
> >> is in the right sibling, node r. Let's say the parent is vacant and l/r is
> >> leaf. So the request number is (1 * 3) + 1.
> >>
> >> Let's assume:
> >>
> >> * vacant node is just sufficient
> >> * l's left part + r's right part is sufficient but not overflow
> >>
> >> Then the "new vacant node" would be deficient and needs another round
> >> rebalance.
> >>
> >> For this case, I am afraid it doesn't allocate enough nodes. Or I
> >> misunderstand this?
> >
> >I think you are correct and we need to use the sufficient height which is
> >introduced in patch 5 in the spanning store case similar to how patch 5 uses
> >it for the rebalance store case.
> >
> > case wr_rebalance:
> > if (wr_mas->sufficient_height < wr_mas->vacant_height) {
> > ret = (height - wr_mas->sufficient_height)*2 +1;
> > break;
> > }
> > ret = delta * 2 + 1;
> > break;
> >
>
> I have thought about this.
>
> But the spanning write case seems to be a little special to
> splitting/rebalance.
>
> It could be both require one more extra slot and may need to be rebalanced.
> We are not sure the exact behavior on converge node. Only check one aspect
> seems not enough.
>
> Do you think so ?
I'm pretty sure the calculation will need to be different from the
rebalance case. He was just saying that it's going to need to be like
rebalance, but not the same code.
Let's see what Sid comes up with.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/5] Track node vacancy to reduce worst case allocation counts
2024-11-19 9:59 ` Wei Yang
@ 2024-11-26 19:32 ` Sidhartha Kumar
0 siblings, 0 replies; 16+ messages in thread
From: Sidhartha Kumar @ 2024-11-26 19:32 UTC (permalink / raw)
To: Wei Yang; +Cc: linux-kernel, maple-tree, linux-mm, akpm, liam.howlett
On 11/19/24 3:59 AM, Wei Yang wrote:
> On Thu, Nov 14, 2024 at 04:39:00PM -0500, Sid Kumar wrote:
>>
>> On 11/14/24 12:05 PM, Sidhartha Kumar wrote:
> [...]
>>> ================ results =========================
>>> Bpftrace was used to profile the allocation path for requesting new maple
>>> nodes while running the ./mmap1_processes test from mmtests. The two paths
>>> for allocation are requests for a single node and the bulk allocation path.
>>> The histogram represents the number of calls to these paths and shows the
>>> distribution of the number of nodes requested for the bulk allocation path.
>>>
>>>
>>> mm-unstable 11/13/24
>>> @bulk_alloc_req:
>>> [2, 4) 10 |@@@@@@@@@@@@@ |
>>> [4, 8) 38 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>>> [8, 16) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
>>>
>>>
>>> mm-unstable 11/13/24 + this series
>>> @bulk_alloc_req:
>>> [2, 4) 9 |@@@@@@@@@@ |
>>> [4, 8) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>>> [8, 16) 15 |@@@@@@@@@@@@@@@@@@ |
>>>
>>> We can see the worst case bulk allocations of [8,16) nodes are reduced after
>>> this series.
>>
>> From running the ./malloc1_threads test case, we eliminate almost all
>> bulk allocation requests that fall between 8 and 16 nodes.
>>
>> ./malloc1_threads -t 8 -s 100
>> mm-unstable + this series
>> @bulk_alloc_req:
>> [2, 4)                 2 |                                                    |
>> [4, 8)              3381 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>> [8, 16)                2 |                                                    |
>>
>
> This is impressive. But one thing is not clear to me.
>
> For mmap related code, we usually have the following usage:
>
>   vma_iter_prealloc(vmi, vma);
>     mas_preallocate(vmi->mas, vma);
>       MA_WR_STATE(wr_mas, );
>       mas_wr_store_type(&wr_mas);   --- (1)
>   vma_iter_store(vmi, vma);
>
> Location (1) is where we try to get a better estimate of the allocations.
> The estimate is based on walking down the tree and looking for a
> suitable node.
>
> In mmap related code, we usually have already walked down the tree to a
> leaf, via vma_find() or a related iteration operation, and mas.status
> is set to ma_active. So I would not expect mas_preallocate() to
> traverse the tree again.
>
> But from your results, it seems most cases do traverse the tree again
> to get a more precise height.
>
Hello,

From looking at mas_wr_prealloc_setup(), when mas_is_active() we reset
(and therefore re-walk the tree) in two scenarios:

	if (mas->last > mas->max)
		goto reset;
	if (wr_mas->entry)
		goto set_content;
	if (mte_is_leaf(mas->node) && mas->last == mas->max)
		goto reset;

It could be that this test case specifically hits those two reset cases.
In testing brk() I did not see the same gains that this malloc test had,
so in that case we are probably not traversing the tree again, as you say.
Thanks,
Sid
> Which part do you think I have missed?
>
>>
>> mm-unstable
>> @bulk_alloc_req:
>> [2, 4)                 1 |                                                    |
>> [4, 8)              1427 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
>> [8, 16)             2790 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>>
>>
>>>
>>> Sidhartha Kumar (5):
>>> maple_tree: convert mas_prealloc_calc() to take in a maple write state
>>> maple_tree: use height and depth consistently
>>> maple_tree: use vacant nodes to reduce worst case allocations
>>> maple_tree: break on convergence in mas_spanning_rebalance()
>>> maple_tree: add sufficient height
>>>
>>> include/linux/maple_tree.h | 4 +
>>> lib/maple_tree.c | 89 +++++++++++++---------
>>> tools/testing/radix-tree/maple.c | 125 +++++++++++++++++++++++++++++--
>>> 3 files changed, 176 insertions(+), 42 deletions(-)
>>>
>
Thread overview: 16+ messages
2024-11-14 17:05 [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 1/5] maple_tree: convert mas_prealloc_calc() to take in a maple write state Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 2/5] maple_tree: use height and depth consistently Sidhartha Kumar
2024-11-14 17:05 ` [PATCH 3/5] maple_tree: use vacant nodes to reduce worst case allocations Sidhartha Kumar
2024-11-15 7:52 ` Wei Yang
2024-11-15 20:34 ` Sidhartha Kumar
2024-11-16 1:41 ` Wei Yang
2024-11-18 16:36 ` Sidhartha Kumar
2024-11-19 2:30 ` Wei Yang
2024-11-19 14:15 ` Liam R. Howlett
2024-11-14 17:05 ` [PATCH 4/5] maple_tree: break on convergence in mas_spanning_rebalance() Sidhartha Kumar
2024-11-15 7:14 ` Wei Yang
2024-11-14 17:05 ` [PATCH 5/5] maple_tree: add sufficient height Sidhartha Kumar
2024-11-14 21:39 ` [PATCH 0/5] Track node vacancy to reduce worst case allocation counts Sid Kumar
2024-11-19 9:59 ` Wei Yang
2024-11-26 19:32 ` Sidhartha Kumar