* [PATCH 0/4] maple_tree: current split may result in deficient node
@ 2024-10-20 2:46 Wei Yang
2024-10-20 2:46 ` [PATCH 1/4] " Wei Yang
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: Wei Yang @ 2024-10-20 2:46 UTC (permalink / raw)
To: akpm, Liam.Howlett; +Cc: maple-tree, linux-mm, Wei Yang
Here are 4 patches related to correctly split node.
Patch 1: adjust the calculation of split
Patch 2: add a test case to check deficient split
Patch 3: the min value for mab_calc_split() seems not correct
Patch 4: during validation, we skip right most node on each level
Wei Yang (4):
maple_tree: current split may result in deficient node
maple_tree: add a test check deficient node
maple_tree: use the correct min to calculate split
maple_tree: only root node could be deficient
lib/maple_tree.c | 8 ++++----
lib/test_maple_tree.c | 28 ++++++++++++++++++++++++++++
2 files changed, 32 insertions(+), 4 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] maple_tree: current split may result in deficient node
2024-10-20 2:46 [PATCH 0/4] maple_tree: current split may result in deficient node Wei Yang
@ 2024-10-20 2:46 ` Wei Yang
2024-10-20 21:55 ` Liam R. Howlett
2024-10-20 2:46 ` [PATCH 2/4] maple_tree: add a test check " Wei Yang
` (3 subsequent siblings)
4 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2024-10-20 2:46 UTC (permalink / raw)
To: akpm, Liam.Howlett
Cc: maple-tree, linux-mm, Wei Yang, Liam R . Howlett,
Sidhartha Kumar, Lorenzo Stoakes
Current split would result in deficient node in rare case.
For example:
mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE)
for (count = 0; count < 10; count++) {
mas_set(&mas, count);
mas_store(&mas, xa_mk_value(count));
}
for (count = 20; count < 39; count++) {
mas_set(&mas, count);
mas_store(&mas, xa_mk_value(count));
}
for (count = 10; count < 12; count++) {
mas_set(&mas, count);
mas_store(&mas, xa_mk_value(count));
}
mt_validate(mt);
The validation would report deficient node.
The reason is we don't leave enough room for the right node.
The deficient check is (end < mt_min_slot[]), which means a node must
have at least (mt_min_slot[] + 1) number of data. The right node's index
range is [split + 1, b_end], which means the number of data in right
node is (b_end - (split + 1) + 1) = (b_end - split).
Then the check between the number of data and (mt_min_slot[] + 1) is
(b_end - split) > (mt_min_slot[] + 1), which leads to
(b_end - split - 1) > (mt_min_slot[]).
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
lib/maple_tree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 1205a5208cfe..894dc5e9436e 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1831,7 +1831,7 @@ static inline int mab_no_null_split(struct maple_big_node *b_node,
* still be sufficient, then increment the split on NULL.
*/
if ((split < slot_count - 1) &&
- (b_node->b_end - split) > (mt_min_slots[b_node->type]))
+ (b_node->b_end - split - 1) > (mt_min_slots[b_node->type]))
split++;
else
split--;
@@ -1896,7 +1896,7 @@ static inline int mab_calc_split(struct ma_state *mas,
*/
while ((split < slot_count - 1) &&
((bn->pivot[split] - min) < slot_count - 1) &&
- (b_end - split > slot_min))
+ (b_end - split - 1 > slot_min))
split++;
}
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/4] maple_tree: add a test check deficient node
2024-10-20 2:46 [PATCH 0/4] maple_tree: current split may result in deficient node Wei Yang
2024-10-20 2:46 ` [PATCH 1/4] " Wei Yang
@ 2024-10-20 2:46 ` Wei Yang
2024-10-20 2:46 ` [PATCH 3/4] maple_tree: use the correct min to calculate split Wei Yang
` (2 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-10-20 2:46 UTC (permalink / raw)
To: akpm, Liam.Howlett
Cc: maple-tree, linux-mm, Wei Yang, Liam R . Howlett,
Sidhartha Kumar, Lorenzo Stoakes
Add a test to assert when resulting a deficient node on splitting.
We can achieve this by build a tree with two nodes. With the left
node with consecutive data from 0 and leave some room for the final
insert to locate in left node. And the right node a full node to force
the split happens on the left node.
Another simple way to trigger this is to inserting from 0 to 16 till the
first split happens. The new right node is deficient. But current
validation skip the right node. This will be fixed in next patch.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
lib/test_maple_tree.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 2ef72f6c6d1b..7cd38c884869 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -1563,6 +1563,30 @@ static noinline void __init check_root_expand(struct maple_tree *mt)
mas_unlock(&mas);
}
+static noinline void __init check_deficient_node(struct maple_tree *mt)
+{
+ MA_STATE(mas, mt, 0, 0);
+ int count;
+
+ mas_lock(&mas);
+ for (count = 0; count < 10; count++) {
+ mas_set(&mas, count);
+ mas_store_gfp(&mas, xa_mk_value(count), GFP_KERNEL);
+ }
+
+ for (count = 20; count < 39; count++) {
+ mas_set(&mas, count);
+ mas_store_gfp(&mas, xa_mk_value(count), GFP_KERNEL);
+ }
+
+ for (count = 10; count < 12; count++) {
+ mas_set(&mas, count);
+ mas_store_gfp(&mas, xa_mk_value(count), GFP_KERNEL);
+ }
+ mas_unlock(&mas);
+ mt_validate(mt);
+}
+
static noinline void __init check_gap_combining(struct maple_tree *mt)
{
struct maple_enode *mn1, *mn2;
@@ -3804,6 +3828,10 @@ static int __init maple_tree_seed(void)
check_root_expand(&tree);
mtree_destroy(&tree);
+ mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+ check_deficient_node(&tree);
+ mtree_destroy(&tree);
+
mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
check_iteration(&tree);
mtree_destroy(&tree);
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-10-20 2:46 [PATCH 0/4] maple_tree: current split may result in deficient node Wei Yang
2024-10-20 2:46 ` [PATCH 1/4] " Wei Yang
2024-10-20 2:46 ` [PATCH 2/4] maple_tree: add a test check " Wei Yang
@ 2024-10-20 2:46 ` Wei Yang
2024-10-20 22:00 ` Liam R. Howlett
2024-10-20 2:46 ` [PATCH 4/4] maple_tree: only root node could be deficient Wei Yang
2024-10-20 20:34 ` [PATCH 0/4] maple_tree: current split may result in deficient node Liam R. Howlett
4 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2024-10-20 2:46 UTC (permalink / raw)
To: akpm, Liam.Howlett
Cc: maple-tree, linux-mm, Wei Yang, Liam R . Howlett,
Sidhartha Kumar, Lorenzo Stoakes
The check in mab_calc_split() is to make sure the gap between
[0, split] won't be too small. But we don't pass the correct min.
First we may encounter a pivot[split] smaller than min. For example:
mt_init_flags(mt, 0);
for (count = 0; count <= 240; count++) {
mas_set(&mas, count);
mas_store(&mas, xa_mk_value(count));
}
On splitting root for storing 240, the pivot[split] is smaller than min.
This result a huge (pivot[split] - min).
Second prev_l_mas.min is not initialized for the first iteration. This
means on splitting leaf node, this value is mostly taking no effect.
Since we are now calculating the split of mas->node, we should use the
mas->min instead of prev_l_mas.min.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 894dc5e9436e..c2d4b188646c 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
if (mas_push_data(mas, height, &mast, false))
break;
- split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
+ split = mab_calc_split(mas, b_node, &mid_split, mas->min);
mast_split_data(&mast, mas, split);
/*
* Usually correct, mab_mas_cp in the above call overwrites
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 4/4] maple_tree: only root node could be deficient
2024-10-20 2:46 [PATCH 0/4] maple_tree: current split may result in deficient node Wei Yang
` (2 preceding siblings ...)
2024-10-20 2:46 ` [PATCH 3/4] maple_tree: use the correct min to calculate split Wei Yang
@ 2024-10-20 2:46 ` Wei Yang
2024-10-20 21:56 ` Liam R. Howlett
2024-10-20 20:34 ` [PATCH 0/4] maple_tree: current split may result in deficient node Liam R. Howlett
4 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2024-10-20 2:46 UTC (permalink / raw)
To: akpm, Liam.Howlett
Cc: maple-tree, linux-mm, Wei Yang, Liam R . Howlett,
Sidhartha Kumar, Lorenzo Stoakes
Each level's right most node could have (max == ULONG_MAX).
Only root node could be deficient.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index c2d4b188646c..e6b2ab5e27b0 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -7610,7 +7610,7 @@ void mt_validate(struct maple_tree *mt)
MAS_WARN_ON(&mas, mte_dead_node(mas.node));
end = mas_data_end(&mas);
if (MAS_WARN_ON(&mas, (end < mt_min_slot_count(mas.node)) &&
- (mas.max != ULONG_MAX))) {
+ (!mte_is_root(mas.node)))) {
pr_err("Invalid size %u of %p\n", end, mas_mn(&mas));
}
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] maple_tree: current split may result in deficient node
2024-10-20 2:46 [PATCH 0/4] maple_tree: current split may result in deficient node Wei Yang
` (3 preceding siblings ...)
2024-10-20 2:46 ` [PATCH 4/4] maple_tree: only root node could be deficient Wei Yang
@ 2024-10-20 20:34 ` Liam R. Howlett
2024-10-21 1:30 ` Wei Yang
4 siblings, 1 reply; 22+ messages in thread
From: Liam R. Howlett @ 2024-10-20 20:34 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm
* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> Here are 4 patches related to correctly split node.
So I haven't worked through your last bunch of patches and I had not
fully responded to v3 before v4 came out.
These things take time to go through because you are changing fiddly
things that may break things or cause regressions, but you keep sending
a new series while another is in flight.
>
> Patch 1: adjust the calculation of split
Split calculations exist with +/-1 to avoid jitter, but I will have to
look at this in minute detail.
If you screw this up, then we may lose data.
> Patch 2: add a test case to check deficient split
> Patch 3: the min value for mab_calc_split() seems not correct
> Patch 4: during validation, we skip right most node on each level
>
> Wei Yang (4):
> maple_tree: current split may result in deficient node
> maple_tree: add a test check deficient node
> maple_tree: use the correct min to calculate split
> maple_tree: only root node could be deficient
>
> lib/maple_tree.c | 8 ++++----
> lib/test_maple_tree.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 32 insertions(+), 4 deletions(-)
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] maple_tree: current split may result in deficient node
2024-10-20 2:46 ` [PATCH 1/4] " Wei Yang
@ 2024-10-20 21:55 ` Liam R. Howlett
2024-11-03 23:47 ` Wei Yang
2024-11-08 2:30 ` Wei Yang
0 siblings, 2 replies; 22+ messages in thread
From: Liam R. Howlett @ 2024-10-20 21:55 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> Current split would result in deficient node in rare case.
>
> For example:
>
> mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE)
> for (count = 0; count < 10; count++) {
> mas_set(&mas, count);
> mas_store(&mas, xa_mk_value(count));
> }
>
> for (count = 20; count < 39; count++) {
> mas_set(&mas, count);
> mas_store(&mas, xa_mk_value(count));
> }
>
> for (count = 10; count < 12; count++) {
> mas_set(&mas, count);
> mas_store(&mas, xa_mk_value(count));
> }
> mt_validate(mt);
>
> The validation would report deficient node.
I think you can explain it better than this?
If we fill a left node with the right node being already full, the split
of the left node will result in the new middle node being insufficient.
>
> The reason is we don't leave enough room for the right node.
The reason is we don't leave enough data for the node on the right of
the split. The node on the right has too much room from what I see?
>
> The deficient check is (end < mt_min_slot[]), which means a node must
> have at least (mt_min_slot[] + 1) number of data. The right node's index
> range is [split + 1, b_end], which means the number of data in right
> node is (b_end - (split + 1) + 1) = (b_end - split).
>
> Then the check between the number of data and (mt_min_slot[] + 1) is
> (b_end - split) > (mt_min_slot[] + 1), which leads to
> (b_end - split - 1) > (mt_min_slot[]).
Don't state the code, it's stated below.
I am still concerned about jitter that this patch set may cause.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> lib/maple_tree.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 1205a5208cfe..894dc5e9436e 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -1831,7 +1831,7 @@ static inline int mab_no_null_split(struct maple_big_node *b_node,
> * still be sufficient, then increment the split on NULL.
> */
> if ((split < slot_count - 1) &&
> - (b_node->b_end - split) > (mt_min_slots[b_node->type]))
> + (b_node->b_end - split - 1) > (mt_min_slots[b_node->type]))
> split++;
> else
> split--;
> @@ -1896,7 +1896,7 @@ static inline int mab_calc_split(struct ma_state *mas,
> */
> while ((split < slot_count - 1) &&
> ((bn->pivot[split] - min) < slot_count - 1) &&
> - (b_end - split > slot_min))
> + (b_end - split - 1 > slot_min))
> split++;
> }
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] maple_tree: only root node could be deficient
2024-10-20 2:46 ` [PATCH 4/4] maple_tree: only root node could be deficient Wei Yang
@ 2024-10-20 21:56 ` Liam R. Howlett
2024-11-03 23:15 ` Wei Yang
0 siblings, 1 reply; 22+ messages in thread
From: Liam R. Howlett @ 2024-10-20 21:56 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> Each level's right most node could have (max == ULONG_MAX).
I think each levels right most node MUST have max == ULONX_MAX.
>
> Only root node could be deficient.
No, root node deficient is defined as having 1 entry. Only the root
node may be below the minimum data threshold.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> lib/maple_tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index c2d4b188646c..e6b2ab5e27b0 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -7610,7 +7610,7 @@ void mt_validate(struct maple_tree *mt)
> MAS_WARN_ON(&mas, mte_dead_node(mas.node));
> end = mas_data_end(&mas);
> if (MAS_WARN_ON(&mas, (end < mt_min_slot_count(mas.node)) &&
> - (mas.max != ULONG_MAX))) {
> + (!mte_is_root(mas.node)))) {
> pr_err("Invalid size %u of %p\n", end, mas_mn(&mas));
> }
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-10-20 2:46 ` [PATCH 3/4] maple_tree: use the correct min to calculate split Wei Yang
@ 2024-10-20 22:00 ` Liam R. Howlett
2024-10-29 17:49 ` Liam R. Howlett
2024-11-08 2:55 ` Wei Yang
0 siblings, 2 replies; 22+ messages in thread
From: Liam R. Howlett @ 2024-10-20 22:00 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> The check in mab_calc_split() is to make sure the gap between
> [0, split] won't be too small. But we don't pass the correct min.
>
> First we may encounter a pivot[split] smaller than min. For example:
>
> mt_init_flags(mt, 0);
> for (count = 0; count <= 240; count++) {
> mas_set(&mas, count);
> mas_store(&mas, xa_mk_value(count));
> }
>
> On splitting root for storing 240, the pivot[split] is smaller than min.
> This result a huge (pivot[split] - min).
This is fine.
There is an open work item to make it more accurate at higher levels,
but it's not a problem as it is.
Each level upwards needs a better 'minimum span', meaning that the node
should have at least mas.min - mas.min + level * something.
It works today for leaves, somewhat.
>
> Second prev_l_mas.min is not initialized for the first iteration. This
> means on splitting leaf node, this value is mostly taking no effect.
No, it is set to 0. Not initialized would cause random data loss.
See MA_STATE() in the header.
>
> Since we are now calculating the split of mas->node, we should use the
> mas->min instead of prev_l_mas.min.
This sounds reasonable, but considering what this number is used for, I
don't see how it is working as well as it is today. I will need to look
deeper into this.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> lib/maple_tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 894dc5e9436e..c2d4b188646c 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
> if (mas_push_data(mas, height, &mast, false))
> break;
>
> - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
> + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
> mast_split_data(&mast, mas, split);
> /*
> * Usually correct, mab_mas_cp in the above call overwrites
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] maple_tree: current split may result in deficient node
2024-10-20 20:34 ` [PATCH 0/4] maple_tree: current split may result in deficient node Liam R. Howlett
@ 2024-10-21 1:30 ` Wei Yang
0 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-10-21 1:30 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm
On Sun, Oct 20, 2024 at 04:34:03PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> Here are 4 patches related to correctly split node.
>
>So I haven't worked through your last bunch of patches and I had not
>fully responded to v3 before v4 came out.
>
Sorry, I thought you have finished v3.
I will check with you in later work.
>These things take time to go through because you are changing fiddly
>things that may break things or cause regressions, but you keep sending
>a new series while another is in flight.
>
Thanks for you time on reviewing these. This really took efforts.
I will keep those in turn later. Is this what you preferred?
>>
>> Patch 1: adjust the calculation of split
>
>Split calculations exist with +/-1 to avoid jitter, but I will have to
>look at this in minute detail.
>
>If you screw this up, then we may lose data.
>
>> Patch 2: add a test case to check deficient split
>> Patch 3: the min value for mab_calc_split() seems not correct
>> Patch 4: during validation, we skip right most node on each level
>>
>> Wei Yang (4):
>> maple_tree: current split may result in deficient node
>> maple_tree: add a test check deficient node
>> maple_tree: use the correct min to calculate split
>> maple_tree: only root node could be deficient
>>
>> lib/maple_tree.c | 8 ++++----
>> lib/test_maple_tree.c | 28 ++++++++++++++++++++++++++++
>> 2 files changed, 32 insertions(+), 4 deletions(-)
>>
>> --
>> 2.34.1
>>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-10-20 22:00 ` Liam R. Howlett
@ 2024-10-29 17:49 ` Liam R. Howlett
2024-11-08 3:02 ` Wei Yang
2024-11-09 1:24 ` Wei Yang
2024-11-08 2:55 ` Wei Yang
1 sibling, 2 replies; 22+ messages in thread
From: Liam R. Howlett @ 2024-10-29 17:49 UTC (permalink / raw)
To: Wei Yang, akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Liam R. Howlett <Liam.Howlett@oracle.com> [241020 18:00]:
> * Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> > The check in mab_calc_split() is to make sure the gap between
> > [0, split] won't be too small. But we don't pass the correct min.
> >
> > First we may encounter a pivot[split] smaller than min. For example:
> >
> > mt_init_flags(mt, 0);
> > for (count = 0; count <= 240; count++) {
> > mas_set(&mas, count);
> > mas_store(&mas, xa_mk_value(count));
> > }
> >
> > On splitting root for storing 240, the pivot[split] is smaller than min.
> > This result a huge (pivot[split] - min).
>
> This is fine.
>
> There is an open work item to make it more accurate at higher levels,
> but it's not a problem as it is.
>
> Each level upwards needs a better 'minimum span', meaning that the node
> should have at least mas.min - mas.min + level * something.
>
> It works today for leaves, somewhat.
>
> >
> > Second prev_l_mas.min is not initialized for the first iteration. This
> > means on splitting leaf node, this value is mostly taking no effect.
>
> No, it is set to 0. Not initialized would cause random data loss.
>
> See MA_STATE() in the header.
>
> >
> > Since we are now calculating the split of mas->node, we should use the
> > mas->min instead of prev_l_mas.min.
>
> This sounds reasonable, but considering what this number is used for, I
> don't see how it is working as well as it is today. I will need to look
> deeper into this.
On examination of what is happening here, the only way this makes a
difference during the testcases is if we have a node with 16 entries,
we'll put 8 in the left today and 9 in the left after this change.
This only matters if the range is less than the slot count, so the real
world implications of the change will be negligible, if anything.
I honestly think I'm trying to be too smart here, especially at the
leaves. We should just set mid_split = 0; in that complex statement and
drop the min argument all together. It hasn't made a difference besides
the number of instructions executed.
>
> >
> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> > CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > lib/maple_tree.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > index 894dc5e9436e..c2d4b188646c 100644
> > --- a/lib/maple_tree.c
> > +++ b/lib/maple_tree.c
> > @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
> > if (mas_push_data(mas, height, &mast, false))
> > break;
> >
> > - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
> > + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
> > mast_split_data(&mast, mas, split);
> > /*
> > * Usually correct, mab_mas_cp in the above call overwrites
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] maple_tree: only root node could be deficient
2024-10-20 21:56 ` Liam R. Howlett
@ 2024-11-03 23:15 ` Wei Yang
0 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-11-03 23:15 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Sun, Oct 20, 2024 at 05:56:36PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> Each level's right most node could have (max == ULONG_MAX).
>
>I think each levels right most node MUST have max == ULONX_MAX.
>
Yes, I should be more accurate.
Will adjust the message.
>>
>> Only root node could be deficient.
>
>No, root node deficient is defined as having 1 entry. Only the root
>node may be below the minimum data threshold.
>
Thanks, would rephrase it to
"Only the root node may be below the minimum data threshold.".
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
>> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> ---
>> lib/maple_tree.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index c2d4b188646c..e6b2ab5e27b0 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -7610,7 +7610,7 @@ void mt_validate(struct maple_tree *mt)
>> MAS_WARN_ON(&mas, mte_dead_node(mas.node));
>> end = mas_data_end(&mas);
>> if (MAS_WARN_ON(&mas, (end < mt_min_slot_count(mas.node)) &&
>> - (mas.max != ULONG_MAX))) {
>> + (!mte_is_root(mas.node)))) {
The change here looks good?
>> pr_err("Invalid size %u of %p\n", end, mas_mn(&mas));
BTW, this patch could apply on top of current mm-unstable, since this line is
changed. Will rebase it in next spin.
>> }
>>
>> --
>> 2.34.1
>>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] maple_tree: current split may result in deficient node
2024-10-20 21:55 ` Liam R. Howlett
@ 2024-11-03 23:47 ` Wei Yang
2024-11-08 2:30 ` Wei Yang
1 sibling, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-11-03 23:47 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Sun, Oct 20, 2024 at 05:55:10PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> Current split would result in deficient node in rare case.
>>
>> For example:
>>
>> mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE)
>> for (count = 0; count < 10; count++) {
>> mas_set(&mas, count);
>> mas_store(&mas, xa_mk_value(count));
>> }
>>
>> for (count = 20; count < 39; count++) {
>> mas_set(&mas, count);
>> mas_store(&mas, xa_mk_value(count));
>> }
>>
>> for (count = 10; count < 12; count++) {
>> mas_set(&mas, count);
>> mas_store(&mas, xa_mk_value(count));
>> }
>> mt_validate(mt);
>>
>> The validation would report deficient node.
>
>I think you can explain it better than this?
>
>If we fill a left node with the right node being already full, the split
>of the left node will result in the new middle node being insufficient.
>
Thanks, I will put the explanation later.
>>
>> The reason is we don't leave enough room for the right node.
>
>The reason is we don't leave enough data for the node on the right of
>the split. The node on the right has too much room from what I see?
>
Too much room? Or too much empty room?
I may missed here.
>>
>> The deficient check is (end < mt_min_slot[]), which means a node must
>> have at least (mt_min_slot[] + 1) number of data. The right node's index
>> range is [split + 1, b_end], which means the number of data in right
>> node is (b_end - (split + 1) + 1) = (b_end - split).
>>
>> Then the check between the number of data and (mt_min_slot[] + 1) is
>> (b_end - split) > (mt_min_slot[] + 1), which leads to
>> (b_end - split - 1) > (mt_min_slot[]).
>
>Don't state the code, it's stated below.
Sure, will remote this part.
>
>I am still concerned about jitter that this patch set may cause.
Per my understanding, a jitter is a removal which cause a deficient node which
is just created from a split, right?
>
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
>> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> ---
>> lib/maple_tree.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 1205a5208cfe..894dc5e9436e 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -1831,7 +1831,7 @@ static inline int mab_no_null_split(struct maple_big_node *b_node,
>> * still be sufficient, then increment the split on NULL.
>> */
>> if ((split < slot_count - 1) &&
>> - (b_node->b_end - split) > (mt_min_slots[b_node->type]))
>> + (b_node->b_end - split - 1) > (mt_min_slots[b_node->type]))
>> split++;
>> else
>> split--;
>> @@ -1896,7 +1896,7 @@ static inline int mab_calc_split(struct ma_state *mas,
>> */
>> while ((split < slot_count - 1) &&
>> ((bn->pivot[split] - min) < slot_count - 1) &&
>> - (b_end - split > slot_min))
>> + (b_end - split - 1 > slot_min))
>> split++;
>> }
>>
>> --
>> 2.34.1
>>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] maple_tree: current split may result in deficient node
2024-10-20 21:55 ` Liam R. Howlett
2024-11-03 23:47 ` Wei Yang
@ 2024-11-08 2:30 ` Wei Yang
2024-11-08 2:49 ` Liam R. Howlett
1 sibling, 1 reply; 22+ messages in thread
From: Wei Yang @ 2024-11-08 2:30 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Sun, Oct 20, 2024 at 05:55:10PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
[...]
>
>Don't state the code, it's stated below.
>
>I am still concerned about jitter that this patch set may cause.
>
If my understanding is correct, the change here won't cause worse jitter.
Below is the dump result when this case happens.
Let's name those nodes: n1, n2, n3.
maple_tree(0x5611079e1680) flags 9, height 2 root 0x615000001c1e
0-18446744073709551615: node 0x615000001c00 depth 0 type 3 parent 0x5611079e1681 contents: 0 8 18446744073709551577 0 0 0 0 0 0 0 | 02 02| 0x61500000210c 10 0x615000001f0c 23 0x61500000120c 18446744073709551615 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil)
0-10: node 0x615000002100 depth 1 type 1 parent 0x615000001c06 contents: 0x1 0 0x3 1 0x5 2 0x7 3 0x9 4 0xb 5 0xd 6 0xf 7 0x11 8 0x13 9 0x15 10 (nil) 0 (nil) 0 (nil) 0 (nil) 0 0xa
0: value 0 (0x0) [0x1]
1: value 1 (0x1) [0x3]
2: value 2 (0x2) [0x5]
3: value 3 (0x3) [0x7]
4: value 4 (0x4) [0x9]
5: value 5 (0x5) [0xb]
6: value 6 (0x6) [0xd]
7: value 7 (0x7) [0xf]
8: value 8 (0x8) [0x11]
9: value 9 (0x9) [0x13]
10: value 10 (0xa) [0x15]
11-23: node 0x615000001f00 depth 1 type 1 parent 0x615000001c0e contents: 0x17 11 (nil) 19 0x29 20 0x2b 21 0x2d 22 0x2f 23 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 0x5
11: value 11 (0xb) [0x17]
12-19: (nil)
20: value 20 (0x14) [0x29]
21: value 21 (0x15) [0x2b]
22: value 22 (0x16) [0x2d]
23: value 23 (0x17) [0x2f]
24-18446744073709551615: node 0x615000001200 depth 1 type 1 parent 0x615000001c16 contents: 0x31 24 0x33 25 0x35 26 0x37 27 0x39 28 0x3b 29 0x3d 30 0x3f 31 0x41 32 0x43 33 0x45 34 0x47 35 0x49 36 0x4b 37 0x4d 38 (nil)
24: value 24 (0x18) [0x31]
25: value 25 (0x19) [0x33]
26: value 26 (0x1a) [0x35]
27: value 27 (0x1b) [0x37]
28: value 28 (0x1c) [0x39]
29: value 29 (0x1d) [0x3b]
30: value 30 (0x1e) [0x3d]
31: value 31 (0x1f) [0x3f]
32: value 32 (0x20) [0x41]
33: value 33 (0x21) [0x43]
34: value 34 (0x22) [0x45]
35: value 35 (0x23) [0x47]
36: value 36 (0x24) [0x49]
37: value 37 (0x25) [0x4b]
38: value 38 (0x26) [0x4d]
39-18446744073709551615: (nil)
Since n2 is already deficient, removal a data from n2 would need rebalance,
which is a jitter if my understanding is correct.
After this change, removal a data from n2 would also result in a deficient
node. So this is not worse than current behavior.
Do you have other cases in concern?
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] maple_tree: current split may result in deficient node
2024-11-08 2:30 ` Wei Yang
@ 2024-11-08 2:49 ` Liam R. Howlett
0 siblings, 0 replies; 22+ messages in thread
From: Liam R. Howlett @ 2024-11-08 2:49 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Wei Yang <richard.weiyang@gmail.com> [241107 21:31]:
> On Sun, Oct 20, 2024 at 05:55:10PM -0400, Liam R. Howlett wrote:
> >* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> [...]
> >
> >Don't state the code, it's stated below.
> >
> >I am still concerned about jitter that this patch set may cause.
> >
>
> If my understanding is correct, the change here won't cause worse jitter.
>
> Below is the dump result when this case happens.
> Let's name those nodes: n1, n2, n3.
>
> maple_tree(0x5611079e1680) flags 9, height 2 root 0x615000001c1e
> 0-18446744073709551615: node 0x615000001c00 depth 0 type 3 parent 0x5611079e1681 contents: 0 8 18446744073709551577 0 0 0 0 0 0 0 | 02 02| 0x61500000210c 10 0x615000001f0c 23 0x61500000120c 18446744073709551615 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil)
> 0-10: node 0x615000002100 depth 1 type 1 parent 0x615000001c06 contents: 0x1 0 0x3 1 0x5 2 0x7 3 0x9 4 0xb 5 0xd 6 0xf 7 0x11 8 0x13 9 0x15 10 (nil) 0 (nil) 0 (nil) 0 (nil) 0 0xa
> 0: value 0 (0x0) [0x1]
> 1: value 1 (0x1) [0x3]
> 2: value 2 (0x2) [0x5]
> 3: value 3 (0x3) [0x7]
> 4: value 4 (0x4) [0x9]
> 5: value 5 (0x5) [0xb]
> 6: value 6 (0x6) [0xd]
> 7: value 7 (0x7) [0xf]
> 8: value 8 (0x8) [0x11]
> 9: value 9 (0x9) [0x13]
> 10: value 10 (0xa) [0x15]
> 11-23: node 0x615000001f00 depth 1 type 1 parent 0x615000001c0e contents: 0x17 11 (nil) 19 0x29 20 0x2b 21 0x2d 22 0x2f 23 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 0x5
> 11: value 11 (0xb) [0x17]
> 12-19: (nil)
> 20: value 20 (0x14) [0x29]
> 21: value 21 (0x15) [0x2b]
> 22: value 22 (0x16) [0x2d]
> 23: value 23 (0x17) [0x2f]
> 24-18446744073709551615: node 0x615000001200 depth 1 type 1 parent 0x615000001c16 contents: 0x31 24 0x33 25 0x35 26 0x37 27 0x39 28 0x3b 29 0x3d 30 0x3f 31 0x41 32 0x43 33 0x45 34 0x47 35 0x49 36 0x4b 37 0x4d 38 (nil)
> 24: value 24 (0x18) [0x31]
> 25: value 25 (0x19) [0x33]
> 26: value 26 (0x1a) [0x35]
> 27: value 27 (0x1b) [0x37]
> 28: value 28 (0x1c) [0x39]
> 29: value 29 (0x1d) [0x3b]
> 30: value 30 (0x1e) [0x3d]
> 31: value 31 (0x1f) [0x3f]
> 32: value 32 (0x20) [0x41]
> 33: value 33 (0x21) [0x43]
> 34: value 34 (0x22) [0x45]
> 35: value 35 (0x23) [0x47]
> 36: value 36 (0x24) [0x49]
> 37: value 37 (0x25) [0x4b]
> 38: value 38 (0x26) [0x4d]
> 39-18446744073709551615: (nil)
>
> Since n2 is already deficient, removal a data from n2 would need rebalance,
> which is a jitter if my understanding is correct.
>
> After this change, removal a data from n2 would also result in a deficient
> node. So this is not worse than current behavior.
>
> Do you have other cases in concern?
My concern comes when we have a node with 16 entries, then add one then
delete one, then add one, then delete one, then add one, then delete
one...
I am not worried that you fixed the insufficient node, I am worried that
the splitting and rebalancing target will become too strict and cause us
to allocate nodes over and over - this can be even worse if it causes a
tree expansion/contraction further up.
There are a few benchmarks that do this more than it happens in the real
world, but it is still important to handle this case for real world
cases.
Thanks,
Liam
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-10-20 22:00 ` Liam R. Howlett
2024-10-29 17:49 ` Liam R. Howlett
@ 2024-11-08 2:55 ` Wei Yang
2024-11-08 3:19 ` Liam R. Howlett
1 sibling, 1 reply; 22+ messages in thread
From: Wei Yang @ 2024-11-08 2:55 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> The check in mab_calc_split() is to make sure the gap between
>> [0, split] won't be too small. But we don't pass the correct min.
>>
>> First we may encounter a pivot[split] smaller than min. For example:
>>
>> mt_init_flags(mt, 0);
>> for (count = 0; count <= 240; count++) {
>> mas_set(&mas, count);
>> mas_store(&mas, xa_mk_value(count));
>> }
>>
>> On splitting root for storing 240, the pivot[split] is smaller than min.
>> This result a huge (pivot[split] - min).
>
>This is fine.
>
>There is an open work item to make it more accurate at higher levels,
>but it's not a problem as it is.
>
>Each level upwards needs a better 'minimum span', meaning that the node
>should have at least mas.min - mas.min + level * something.
Don't follow this.
First, I guess it is mas.max - mas.min?
Then there is sth related to level. But I don't see level is used for
calculation. Would you mind sharing more your original thought?
>
>It works today for leaves, somewhat.
>
To me it works by coincidence.
The condition check is:
(bn->pivot[split] - min) < (slot_count - 1)
while slot_count is a constent, 16 for leaf node.
And we always pass 0 as min, the condition for leaf is:
bn->pivot[split] < 15
For the mmap world, per my understanding, we won't expect an address is smaller
than 15.
>>
>> Second prev_l_mas.min is not initialized for the first iteration. This
>> means on splitting leaf node, this value is mostly taking no effect.
>
>No, it is set to 0. Not initialized would cause random data loss.
>
>See MA_STATE() in the header.
>
Right, I want to say not initialized to the value we want.
Will be more careful next time.
>>
>> Since we are now calculating the split of mas->node, we should use the
>> mas->min instead of prev_l_mas.min.
>
>This sounds reasonable, but considering what this number is used for, I
>don't see how it is working as well as it is today. I will need to look
>deeper into this.
>
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
>> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> ---
>> lib/maple_tree.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 894dc5e9436e..c2d4b188646c 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>> if (mas_push_data(mas, height, &mast, false))
>> break;
>>
>> - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
>> + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
>> mast_split_data(&mast, mas, split);
>> /*
>> * Usually correct, mab_mas_cp in the above call overwrites
>> --
>> 2.34.1
>>
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-10-29 17:49 ` Liam R. Howlett
@ 2024-11-08 3:02 ` Wei Yang
2024-11-09 1:24 ` Wei Yang
1 sibling, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-11-08 3:02 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Tue, Oct 29, 2024 at 01:49:28PM -0400, Liam R. Howlett wrote:
>* Liam R. Howlett <Liam.Howlett@oracle.com> [241020 18:00]:
>> * Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> > The check in mab_calc_split() is to make sure the gap between
>> > [0, split] won't be too small. But we don't pass the correct min.
>> >
>> > First we may encounter a pivot[split] smaller than min. For example:
>> >
>> > mt_init_flags(mt, 0);
>> > for (count = 0; count <= 240; count++) {
>> > mas_set(&mas, count);
>> > mas_store(&mas, xa_mk_value(count));
>> > }
>> >
>> > On splitting root for storing 240, the pivot[split] is smaller than min.
>> > This result a huge (pivot[split] - min).
>>
>> This is fine.
>>
>> There is an open work item to make it more accurate at higher levels,
>> but it's not a problem as it is.
>>
>> Each level upwards needs a better 'minimum span', meaning that the node
>> should have at least mas.min - mas.min + level * something.
>>
>> It works today for leaves, somewhat.
>>
>> >
>> > Second prev_l_mas.min is not initialized for the first iteration. This
>> > means on splitting leaf node, this value is mostly taking no effect.
>>
>> No, it is set to 0. Not initialized would cause random data loss.
>>
>> See MA_STATE() in the header.
>>
>> >
>> > Since we are now calculating the split of mas->node, we should use the
>> > mas->min instead of prev_l_mas.min.
>>
>> This sounds reasonable, but considering what this number is used for, I
>> don't see how it is working as well as it is today. I will need to look
>> deeper into this.
>
>
>On examination of what is happening here, the only way this makes a
>difference during the testcases is if we have a node with 16 entries,
>we'll put 8 in the left today and 9 in the left after this change.
>
>This only matters if the range is less than the slot count, so the real
>world implications of the change will be negligible, if anything.
>
>I honestly think I'm trying to be too smart here, especially at the
>leaves. We should just set mid_split = 0; in that complex statement and
>drop the min argument all together. It hasn't made a difference besides
>the number of instructions executed.
>
This looks good to me. And it looks we won't face jitter problem after this.
I need to take a through look.
>>
>> >
>> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> > CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
>> > CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> > CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> > ---
>> > lib/maple_tree.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> > index 894dc5e9436e..c2d4b188646c 100644
>> > --- a/lib/maple_tree.c
>> > +++ b/lib/maple_tree.c
>> > @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>> > if (mas_push_data(mas, height, &mast, false))
>> > break;
>> >
>> > - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
>> > + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
>> > mast_split_data(&mast, mas, split);
>> > /*
>> > * Usually correct, mab_mas_cp in the above call overwrites
>> > --
>> > 2.34.1
>> >
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-11-08 2:55 ` Wei Yang
@ 2024-11-08 3:19 ` Liam R. Howlett
2024-11-09 1:40 ` Wei Yang
0 siblings, 1 reply; 22+ messages in thread
From: Liam R. Howlett @ 2024-11-08 3:19 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Wei Yang <richard.weiyang@gmail.com> [241107 21:55]:
> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
> >* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> >> The check in mab_calc_split() is to make sure the gap between
> >> [0, split] won't be too small. But we don't pass the correct min.
> >>
> >> First we may encounter a pivot[split] smaller than min. For example:
> >>
> >> mt_init_flags(mt, 0);
> >> for (count = 0; count <= 240; count++) {
> >> mas_set(&mas, count);
> >> mas_store(&mas, xa_mk_value(count));
> >> }
> >>
> >> On splitting root for storing 240, the pivot[split] is smaller than min.
> >> This result a huge (pivot[split] - min).
> >
> >This is fine.
> >
> >There is an open work item to make it more accurate at higher levels,
> >but it's not a problem as it is.
> >
> >Each level upwards needs a better 'minimum span', meaning that the node
> >should have at least mas.min - mas.min + level * something.
>
> Don't follow this.
>
> First, I guess it is mas.max - mas.min?
>
No, the idea is that we could do better if we could keep the data more
densely packed.
If we have singletons (all range of 1), and we have 10,000 singletons,
then a parent node with 0 - 128 and a sibling of 129-257 makes no sense,
there is nothing that can be inserting into the parents last two slots
- we can't split singletons smaller so we should have done a better job
at splitting.
If you think of how this grows today, we split then rebalance until the
left node is full and we have to split again. This could be better
optimised to avoid multiple rebalances by doing something to push more
data to the left based on the span of the children and the nodes. But
we'd want to keep enough so some modifications to each node doesn't
cause a rebalance in the opposite direction.
So a minimum span of parent nodes one level up from the leaves would be
16 * 10 = 160, so when we split we can push more to the left to try and
get to 160 while keeping the right node sufficient. The math of the
span would change based on how far up the tree we go, but would be
easier to maintain.
I'd probably still want to split it 50/50 the first time, but it might
make sense to push more on the next rebalance. so 10 => 6|5, 6|10 =>
9|5, but maybe it's not worth the effort. I'm starting to think so,
considering it's been ineffective to date. There is a chance that the
active area gets moved to the more-full node and ends up splitting
sooner anyways..
All of this will probably change when dense nodes come anyways.
> Then there is sth related to level. But I don't see level is used for
> calculation. Would you mind sharing more your original thought?
>
> >
> >It works today for leaves, somewhat.
> >
>
> To me it works by coincidence.
Seems like it.
>
> The condition check is:
>
> (bn->pivot[split] - min) < (slot_count - 1)
>
> while slot_count is a constent, 16 for leaf node.
>
> And we always pass 0 as min, the condition for leaf is:
>
> bn->pivot[split] < 15
>
> For the mmap world, per my understanding, we won't expect an address is smaller
> than 15.
There are other users, but I think we should just drop this attempt at
fancy stuff.
>
> >>
> >> Second prev_l_mas.min is not initialized for the first iteration. This
> >> means on splitting leaf node, this value is mostly taking no effect.
> >
> >No, it is set to 0. Not initialized would cause random data loss.
> >
> >See MA_STATE() in the header.
> >
>
> Right, I want to say not initialized to the value we want.
>
> Will be more careful next time.
>
> >>
> >> Since we are now calculating the split of mas->node, we should use the
> >> mas->min instead of prev_l_mas.min.
> >
> >This sounds reasonable, but considering what this number is used for, I
> >don't see how it is working as well as it is today. I will need to look
> >deeper into this.
> >
> >>
> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> >> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> >> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >> ---
> >> lib/maple_tree.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> >> index 894dc5e9436e..c2d4b188646c 100644
> >> --- a/lib/maple_tree.c
> >> +++ b/lib/maple_tree.c
> >> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
> >> if (mas_push_data(mas, height, &mast, false))
> >> break;
> >>
> >> - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
> >> + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
> >> mast_split_data(&mast, mas, split);
> >> /*
> >> * Usually correct, mab_mas_cp in the above call overwrites
> >> --
> >> 2.34.1
> >>
>
> --
> Wei Yang
> Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-10-29 17:49 ` Liam R. Howlett
2024-11-08 3:02 ` Wei Yang
@ 2024-11-09 1:24 ` Wei Yang
1 sibling, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-11-09 1:24 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Tue, Oct 29, 2024 at 01:49:28PM -0400, Liam R. Howlett wrote:
>* Liam R. Howlett <Liam.Howlett@oracle.com> [241020 18:00]:
>> * Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> > The check in mab_calc_split() is to make sure the gap between
>> > [0, split] won't be too small. But we don't pass the correct min.
>> >
>> > First we may encounter a pivot[split] smaller than min. For example:
>> >
>> > mt_init_flags(mt, 0);
>> > for (count = 0; count <= 240; count++) {
>> > mas_set(&mas, count);
>> > mas_store(&mas, xa_mk_value(count));
>> > }
>> >
>> > On splitting root for storing 240, the pivot[split] is smaller than min.
>> > This result a huge (pivot[split] - min).
>>
>> This is fine.
>>
>> There is an open work item to make it more accurate at higher levels,
>> but it's not a problem as it is.
>>
>> Each level upwards needs a better 'minimum span', meaning that the node
>> should have at least mas.min - mas.min + level * something.
>>
>> It works today for leaves, somewhat.
>>
>> >
>> > Second prev_l_mas.min is not initialized for the first iteration. This
>> > means on splitting leaf node, this value is mostly taking no effect.
>>
>> No, it is set to 0. Not initialized would cause random data loss.
>>
>> See MA_STATE() in the header.
>>
>> >
>> > Since we are now calculating the split of mas->node, we should use the
>> > mas->min instead of prev_l_mas.min.
>>
>> This sounds reasonable, but considering what this number is used for, I
>> don't see how it is working as well as it is today. I will need to look
>> deeper into this.
>
>
>On examination of what is happening here, the only way this makes a
>difference during the testcases is if we have a node with 16 entries,
>we'll put 8 in the left today and 9 in the left after this change.
>
>This only matters if the range is less than the slot count, so the real
>world implications of the change will be negligible, if anything.
>
Yes, maybe it only effect when there are all singleton.
>I honestly think I'm trying to be too smart here, especially at the
>leaves. We should just set mid_split = 0; in that complex statement and
>drop the min argument all together. It hasn't made a difference besides
>the number of instructions executed.
>
I have tried to remove those lines, so we got split = b_end / 2 and
mid_split = 0. Tests all passed and kernel runs. (I guess in the real world,
it behaves the same, since (bn->pivot[split] - min) is always bigger than 15.)
Since we call mab_calc_split() when b_end >= slot_count == 16, it means we
get split == 16/2 == 8. So the new left|right node has data 9|8, this is
friendly to jitter problem.
>>
>> >
>> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> > CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
>> > CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> > CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> > ---
>> > lib/maple_tree.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> > index 894dc5e9436e..c2d4b188646c 100644
>> > --- a/lib/maple_tree.c
>> > +++ b/lib/maple_tree.c
>> > @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>> > if (mas_push_data(mas, height, &mast, false))
>> > break;
>> >
>> > - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
>> > + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
>> > mast_split_data(&mast, mas, split);
>> > /*
>> > * Usually correct, mab_mas_cp in the above call overwrites
>> > --
>> > 2.34.1
>> >
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-11-08 3:19 ` Liam R. Howlett
@ 2024-11-09 1:40 ` Wei Yang
2024-11-09 4:01 ` Liam R. Howlett
0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2024-11-09 1:40 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Thu, Nov 07, 2024 at 10:19:37PM -0500, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241107 21:55]:
>> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
>> >* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> >> The check in mab_calc_split() is to make sure the gap between
>> >> [0, split] won't be too small. But we don't pass the correct min.
>> >>
>> >> First we may encounter a pivot[split] smaller than min. For example:
>> >>
>> >> mt_init_flags(mt, 0);
>> >> for (count = 0; count <= 240; count++) {
>> >> mas_set(&mas, count);
>> >> mas_store(&mas, xa_mk_value(count));
>> >> }
>> >>
>> >> On splitting root for storing 240, the pivot[split] is smaller than min.
>> >> This result a huge (pivot[split] - min).
>> >
>> >This is fine.
>> >
>> >There is an open work item to make it more accurate at higher levels,
>> >but it's not a problem as it is.
>> >
>> >Each level upwards needs a better 'minimum span', meaning that the node
>> >should have at least mas.min - mas.min + level * something.
>>
>> Don't follow this.
>>
>> First, I guess it is mas.max - mas.min?
>>
>
>No, the idea is that we could do better if we could keep the data more
>densely packed.
>
>If we have singletons (all range of 1), and we have 10,000 singletons,
>then a parent node with 0 - 128 and a sibling of 129-257 makes no sense,
>there is nothing that can be inserting into the parents last two slots
>- we can't split singletons smaller so we should have done a better job
>at splitting.
>
>If you think of how this grows today, we split then rebalance until the
>left node is full and we have to split again. This could be better
>optimised to avoid multiple rebalances by doing something to push more
>data to the left based on the span of the children and the nodes. But
>we'd want to keep enough so some modifications to each node doesn't
>cause a rebalance in the opposite direction.
>
>So a minimum span of parent nodes one level up from the leaves would be
>16 * 10 = 160, so when we split we can push more to the left to try and
>get to 160 while keeping the right node sufficient. The math of the
>span would change based on how far up the tree we go, but would be
>easier to maintain.
>
>I'd probably still want to split it 50/50 the first time, but it might
>make sense to push more on the next rebalance. so 10 => 6|5, 6|10 =>
>9|5, but maybe it's not worth the effort. I'm starting to think so,
>considering it's been ineffective to date. There is a chance that the
>active area gets moved to the more-full node and ends up splitting
>sooner anyways..
>
>All of this will probably change when dense nodes come anyways.
>
>> Then there is sth related to level. But I don't see level is used for
>> calculation. Would you mind sharing more your original thought?
>>
>> >
>> >It works today for leaves, somewhat.
>> >
>>
>> To me it works by coincidence.
>
>Seems like it.
>
>>
>> The condition check is:
>>
>> (bn->pivot[split] - min) < (slot_count - 1)
>>
>> while slot_count is a constent, 16 for leaf node.
>>
>> And we always pass 0 as min, the condition for leaf is:
>>
>> bn->pivot[split] < 15
>>
>> For the mmap world, per my understanding, we won't expect an address is smaller
>> than 15.
>
>There are other users, but I think we should just drop this attempt at
>fancy stuff.
>
You mean "set mid_split = 0 and drop the min argument all together" mentioned
in another mail?
>>
>> >>
>> >> Second prev_l_mas.min is not initialized for the first iteration. This
>> >> means on splitting leaf node, this value is mostly taking no effect.
>> >
>> >No, it is set to 0. Not initialized would cause random data loss.
>> >
>> >See MA_STATE() in the header.
>> >
>>
>> Right, I want to say not initialized to the value we want.
>>
>> Will be more careful next time.
>>
>> >>
>> >> Since we are now calculating the split of mas->node, we should use the
>> >> mas->min instead of prev_l_mas.min.
>> >
>> >This sounds reasonable, but considering what this number is used for, I
>> >don't see how it is working as well as it is today. I will need to look
>> >deeper into this.
>> >
>> >>
>> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> >> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
>> >> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> >> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> >> ---
>> >> lib/maple_tree.c | 2 +-
>> >> 1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> >> index 894dc5e9436e..c2d4b188646c 100644
>> >> --- a/lib/maple_tree.c
>> >> +++ b/lib/maple_tree.c
>> >> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>> >> if (mas_push_data(mas, height, &mast, false))
>> >> break;
>> >>
>> >> - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
>> >> + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
>> >> mast_split_data(&mast, mas, split);
>> >> /*
>> >> * Usually correct, mab_mas_cp in the above call overwrites
>> >> --
>> >> 2.34.1
>> >>
>>
>> --
>> Wei Yang
>> Help you, Help me
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-11-09 1:40 ` Wei Yang
@ 2024-11-09 4:01 ` Liam R. Howlett
2024-11-09 12:22 ` Wei Yang
0 siblings, 1 reply; 22+ messages in thread
From: Liam R. Howlett @ 2024-11-09 4:01 UTC (permalink / raw)
To: Wei Yang; +Cc: akpm, maple-tree, linux-mm, Sidhartha Kumar, Lorenzo Stoakes
* Wei Yang <richard.weiyang@gmail.com> [241108 20:40]:
> On Thu, Nov 07, 2024 at 10:19:37PM -0500, Liam R. Howlett wrote:
> >* Wei Yang <richard.weiyang@gmail.com> [241107 21:55]:
> >> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
> >> >* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> >> >> The check in mab_calc_split() is to make sure the gap between
> >> >> [0, split] won't be too small. But we don't pass the correct min.
> >> >>
> >> >> First we may encounter a pivot[split] smaller than min. For example:
> >> >>
> >> >> mt_init_flags(mt, 0);
> >> >> for (count = 0; count <= 240; count++) {
> >> >> mas_set(&mas, count);
> >> >> mas_store(&mas, xa_mk_value(count));
> >> >> }
> >> >>
> >> >> On splitting root for storing 240, the pivot[split] is smaller than min.
> >> >> This result a huge (pivot[split] - min).
> >> >
> >> >This is fine.
> >> >
> >> >There is an open work item to make it more accurate at higher levels,
> >> >but it's not a problem as it is.
> >> >
> >> >Each level upwards needs a better 'minimum span', meaning that the node
> >> >should have at least mas.min - mas.min + level * something.
> >>
> >> Don't follow this.
> >>
> >> First, I guess it is mas.max - mas.min?
> >>
> >
> >No, the idea is that we could do better if we could keep the data more
> >densely packed.
> >
> >If we have singletons (all range of 1), and we have 10,000 singletons,
> >then a parent node with 0 - 128 and a sibling of 129-257 makes no sense,
> >there is nothing that can be inserting into the parents last two slots
> >- we can't split singletons smaller so we should have done a better job
> >at splitting.
> >
> >If you think of how this grows today, we split then rebalance until the
> >left node is full and we have to split again. This could be better
> >optimised to avoid multiple rebalances by doing something to push more
> >data to the left based on the span of the children and the nodes. But
> >we'd want to keep enough so some modifications to each node doesn't
> >cause a rebalance in the opposite direction.
> >
> >So a minimum span of parent nodes one level up from the leaves would be
> >16 * 10 = 160, so when we split we can push more to the left to try and
> >get to 160 while keeping the right node sufficient. The math of the
> >span would change based on how far up the tree we go, but would be
> >easier to maintain.
> >
> >I'd probably still want to split it 50/50 the first time, but it might
> >make sense to push more on the next rebalance. so 10 => 6|5, 6|10 =>
> >9|5, but maybe it's not worth the effort. I'm starting to think so,
> >considering it's been ineffective to date. There is a chance that the
> >active area gets moved to the more-full node and ends up splitting
> >sooner anyways..
> >
> >All of this will probably change when dense nodes come anyways.
> >
> >> Then there is sth related to level. But I don't see level is used for
> >> calculation. Would you mind sharing more your original thought?
> >>
> >> >
> >> >It works today for leaves, somewhat.
> >> >
> >>
> >> To me it works by coincidence.
> >
> >Seems like it.
> >
> >>
> >> The condition check is:
> >>
> >> (bn->pivot[split] - min) < (slot_count - 1)
> >>
> >> while slot_count is a constent, 16 for leaf node.
> >>
> >> And we always pass 0 as min, the condition for leaf is:
> >>
> >> bn->pivot[split] < 15
> >>
> >> For the mmap world, per my understanding, we won't expect an address is smaller
> >> than 15.
> >
> >There are other users, but I think we should just drop this attempt at
> >fancy stuff.
> >
>
> You mean "set mid_split = 0 and drop the min argument all together" mentioned
> in another mail?
Yes, please. The more I think about this, the more I think we're
wasting time trying to predict the future. And, although fun, it
usually doesn't work out.
>
> >>
> >> >>
> >> >> Second prev_l_mas.min is not initialized for the first iteration. This
> >> >> means on splitting leaf node, this value is mostly taking no effect.
> >> >
> >> >No, it is set to 0. Not initialized would cause random data loss.
> >> >
> >> >See MA_STATE() in the header.
> >> >
> >>
> >> Right, I want to say not initialized to the value we want.
> >>
> >> Will be more careful next time.
> >>
> >> >>
> >> >> Since we are now calculating the split of mas->node, we should use the
> >> >> mas->min instead of prev_l_mas.min.
> >> >
> >> >This sounds reasonable, but considering what this number is used for, I
> >> >don't see how it is working as well as it is today. I will need to look
> >> >deeper into this.
> >> >
> >> >>
> >> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> >> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> >> >> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> >> >> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >> >> ---
> >> >> lib/maple_tree.c | 2 +-
> >> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> >> >> index 894dc5e9436e..c2d4b188646c 100644
> >> >> --- a/lib/maple_tree.c
> >> >> +++ b/lib/maple_tree.c
> >> >> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
> >> >> if (mas_push_data(mas, height, &mast, false))
> >> >> break;
> >> >>
> >> >> - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
> >> >> + split = mab_calc_split(mas, b_node, &mid_split, mas->min);
> >> >> mast_split_data(&mast, mas, split);
> >> >> /*
> >> >> * Usually correct, mab_mas_cp in the above call overwrites
> >> >> --
> >> >> 2.34.1
> >> >>
> >>
> >> --
> >> Wei Yang
> >> Help you, Help me
>
> --
> Wei Yang
> Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] maple_tree: use the correct min to calculate split
2024-11-09 4:01 ` Liam R. Howlett
@ 2024-11-09 12:22 ` Wei Yang
0 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2024-11-09 12:22 UTC (permalink / raw)
To: Liam R. Howlett, Wei Yang, akpm, maple-tree, linux-mm,
Sidhartha Kumar, Lorenzo Stoakes
On Fri, Nov 08, 2024 at 11:01:21PM -0500, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241108 20:40]:
>> On Thu, Nov 07, 2024 at 10:19:37PM -0500, Liam R. Howlett wrote:
>> >* Wei Yang <richard.weiyang@gmail.com> [241107 21:55]:
>> >> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
>> >> >* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> >> >> The check in mab_calc_split() is to make sure the gap between
>> >> >> [0, split] won't be too small. But we don't pass the correct min.
>> >> >>
>> >> >> First we may encounter a pivot[split] smaller than min. For example:
>> >> >>
>> >> >> mt_init_flags(mt, 0);
>> >> >> for (count = 0; count <= 240; count++) {
>> >> >> mas_set(&mas, count);
>> >> >> mas_store(&mas, xa_mk_value(count));
>> >> >> }
>> >> >>
>> >> >> On splitting root for storing 240, the pivot[split] is smaller than min.
>> >> >> This result a huge (pivot[split] - min).
>> >> >
>> >> >This is fine.
>> >> >
>> >> >There is an open work item to make it more accurate at higher levels,
>> >> >but it's not a problem as it is.
>> >> >
>> >> >Each level upwards needs a better 'minimum span', meaning that the node
>> >> >should have at least mas.min - mas.min + level * something.
>> >>
>> >> Don't follow this.
>> >>
>> >> First, I guess it is mas.max - mas.min?
>> >>
>> >
>> >No, the idea is that we could do better if we could keep the data more
>> >densely packed.
>> >
>> >If we have singletons (all range of 1), and we have 10,000 singletons,
>> >then a parent node with 0 - 128 and a sibling of 129-257 makes no sense,
>> >there is nothing that can be inserting into the parents last two slots
>> >- we can't split singletons smaller so we should have done a better job
>> >at splitting.
>> >
>> >If you think of how this grows today, we split then rebalance until the
>> >left node is full and we have to split again. This could be better
>> >optimised to avoid multiple rebalances by doing something to push more
>> >data to the left based on the span of the children and the nodes. But
>> >we'd want to keep enough so some modifications to each node doesn't
>> >cause a rebalance in the opposite direction.
>> >
>> >So a minimum span of parent nodes one level up from the leaves would be
>> >16 * 10 = 160, so when we split we can push more to the left to try and
>> >get to 160 while keeping the right node sufficient. The math of the
>> >span would change based on how far up the tree we go, but would be
>> >easier to maintain.
>> >
>> >I'd probably still want to split it 50/50 the first time, but it might
>> >make sense to push more on the next rebalance. so 10 => 6|5, 6|10 =>
>> >9|5, but maybe it's not worth the effort. I'm starting to think so,
>> >considering it's been ineffective to date. There is a chance that the
>> >active area gets moved to the more-full node and ends up splitting
>> >sooner anyways..
>> >
>> >All of this will probably change when dense nodes come anyways.
>> >
>> >> Then there is sth related to level. But I don't see level is used for
>> >> calculation. Would you mind sharing more your original thought?
>> >>
>> >> >
>> >> >It works today for leaves, somewhat.
>> >> >
>> >>
>> >> To me it works by coincidence.
>> >
>> >Seems like it.
>> >
>> >>
>> >> The condition check is:
>> >>
>> >> (bn->pivot[split] - min) < (slot_count - 1)
>> >>
>> >> while slot_count is a constent, 16 for leaf node.
>> >>
>> >> And we always pass 0 as min, the condition for leaf is:
>> >>
>> >> bn->pivot[split] < 15
>> >>
>> >> For the mmap world, per my understanding, we won't expect an address is smaller
>> >> than 15.
>> >
>> >There are other users, but I think we should just drop this attempt at
>> >fancy stuff.
>> >
>>
>> You mean "set mid_split = 0 and drop the min argument all together" mentioned
>> in another mail?
>
>Yes, please. The more I think about this, the more I think we're
>wasting time trying to predict the future. And, although fun, it
>usually doesn't work out.
>
Agree, will prepare another version with this.
Thanks
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2024-11-09 12:22 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-20 2:46 [PATCH 0/4] maple_tree: current split may result in deficient node Wei Yang
2024-10-20 2:46 ` [PATCH 1/4] " Wei Yang
2024-10-20 21:55 ` Liam R. Howlett
2024-11-03 23:47 ` Wei Yang
2024-11-08 2:30 ` Wei Yang
2024-11-08 2:49 ` Liam R. Howlett
2024-10-20 2:46 ` [PATCH 2/4] maple_tree: add a test check " Wei Yang
2024-10-20 2:46 ` [PATCH 3/4] maple_tree: use the correct min to calculate split Wei Yang
2024-10-20 22:00 ` Liam R. Howlett
2024-10-29 17:49 ` Liam R. Howlett
2024-11-08 3:02 ` Wei Yang
2024-11-09 1:24 ` Wei Yang
2024-11-08 2:55 ` Wei Yang
2024-11-08 3:19 ` Liam R. Howlett
2024-11-09 1:40 ` Wei Yang
2024-11-09 4:01 ` Liam R. Howlett
2024-11-09 12:22 ` Wei Yang
2024-10-20 2:46 ` [PATCH 4/4] maple_tree: only root node could be deficient Wei Yang
2024-10-20 21:56 ` Liam R. Howlett
2024-11-03 23:15 ` Wei Yang
2024-10-20 20:34 ` [PATCH 0/4] maple_tree: current split may result in deficient node Liam R. Howlett
2024-10-21 1:30 ` Wei Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox