* [RFC PATCH v6.6] maple_tree: Fix mas_prealloc() reset
@ 2025-04-28 18:40 Liam R. Howlett
2025-04-28 21:08 ` Suren Baghdasaryan
0 siblings, 1 reply; 3+ messages in thread
From: Liam R. Howlett @ 2025-04-28 18:40 UTC (permalink / raw)
To: Liam R . Howlett
Cc: sidhartha.kumar, maple-tree, linux-mm, linux-kernel,
Zhaoyang Huang, Hailong Liu, Lorenzo Stoakes, Suren Baghdasaryan,
zhangpeng . 00 @ bytedance . com, Steve Kang, Matthew Wilcox
A previously used maple state may not be in the correct state for the
preallocation to correctly calculate the number of nodes necessary for the
configured store operation.
The user visible effect of which is warning that there are no nodes
allocated followed by a crash when there is a null pointer dereference
shortly after.
The NULL pointer dereference has been reported to happen when a vma
iterator is used to preallocate after walking to a leaf node but then
requesting to preallocate for a store across node boundaries (in v6.6.
of the kernel). The altered range means that there may not been enough
nodes as the maple state has been incorrectly configured. A critical
step is that the vma iterator then detects the misconfigured maple state
and resets, thus ensuring the tree is not corrupted - but ultimately
causes a failure when there are no nodes left.
Detect a misconfigured maple state in the mas_preallocate() code by
examining the current location and planned write to ensure a reset is
done if required. The performance impacts are minimal and within the
noise in initial testing.
Reported-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reported-by: Hailong Liu <hailong.liu@oppo.com>
Fixes: fec29364348fe ("maple_tree: reduce resets during store setup")
Link: https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: zhangpeng.00@bytedance.com <zhangpeng.00@bytedance.com>
Cc: Steve Kang <Steve.Kang@unisoc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
lib/maple_tree.c | 35 +++++++++++++++++++++++++++++++----
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4eda949063602..17af9073494f5 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5350,6 +5350,8 @@ static inline void mte_destroy_walk(struct maple_enode *enode,
static void mas_wr_store_setup(struct ma_wr_state *wr_mas)
{
+ unsigned char node_size;
+
if (!mas_is_active(wr_mas->mas)) {
if (mas_is_start(wr_mas->mas))
return;
@@ -5372,17 +5374,42 @@ static void mas_wr_store_setup(struct ma_wr_state *wr_mas)
* writes within this node. This is to stop partial walks in
* mas_prealloc() from being reset.
*/
+
+ /* Leaf root node is safe */
+ if (mte_is_root(wr_mas->mas->node))
+ return;
+
+ /* Cannot span beyond this node */
if (wr_mas->mas->last > wr_mas->mas->max)
goto reset;
- if (wr_mas->entry)
+ /* Cannot span before this node */
+ if (wr_mas->mas->index < wr_mas->mas->min)
+ goto reset;
+
+ wr_mas->type = mte_node_type(wr_mas->mas->node);
+ /* unfinished walk is okay */
+ if (!ma_is_leaf(wr_mas->type))
return;
- if (mte_is_leaf(wr_mas->mas->node) &&
- wr_mas->mas->last == wr_mas->mas->max)
+ /* Leaf node that ends in 0 means a spanning store */
+ if (!wr_mas->entry &&
+ (wr_mas->mas->last == wr_mas->mas->max))
goto reset;
- return;
+ mas_wr_node_walk(wr_mas);
+ if (wr_mas->r_min == wr_mas->mas->index &&
+ wr_mas->r_max == wr_mas->mas->last)
+ return; /* exact fit, no allocations */
+
+ wr_mas->slots = ma_slots(wr_mas->node, wr_mas->type);
+ mas_wr_end_piv(wr_mas);
+ node_size = mas_wr_new_end(wr_mas);
+ if (node_size >= mt_slots[wr_mas->type])
+ goto reset; /* Not going to fit */
+
+ if (node_size - 1 > mt_min_slots[wr_mas->type])
+ return; /* sufficient and will fit */
reset:
mas_reset(wr_mas->mas);
--
2.47.2
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [RFC PATCH v6.6] maple_tree: Fix mas_prealloc() reset
2025-04-28 18:40 [RFC PATCH v6.6] maple_tree: Fix mas_prealloc() reset Liam R. Howlett
@ 2025-04-28 21:08 ` Suren Baghdasaryan
2025-04-29 0:19 ` Liam R. Howlett
0 siblings, 1 reply; 3+ messages in thread
From: Suren Baghdasaryan @ 2025-04-28 21:08 UTC (permalink / raw)
To: Liam R. Howlett
Cc: sidhartha.kumar, maple-tree, linux-mm, linux-kernel,
Zhaoyang Huang, Hailong Liu, Lorenzo Stoakes,
zhangpeng . 00 @ bytedance . com, Steve Kang, Matthew Wilcox
On Mon, Apr 28, 2025 at 12:02 PM Liam R. Howlett
<Liam.Howlett@oracle.com> wrote:
>
> A previously used maple state may not be in the correct state for the
> preallocation to correctly calculate the number of nodes necessary for the
> configured store operation.
>
> The user visible effect of which is warning that there are no nodes
> allocated followed by a crash when there is a null pointer dereference
> shortly after.
>
> The NULL pointer dereference has been reported to happen when a vma
> iterator is used to preallocate after walking to a leaf node but then
> requesting to preallocate for a store across node boundaries (in v6.6.
> of the kernel). The altered range means that there may not been enough
> nodes as the maple state has been incorrectly configured. A critical
> step is that the vma iterator then detects the misconfigured maple state
> and resets, thus ensuring the tree is not corrupted - but ultimately
> causes a failure when there are no nodes left.
>
> Detect a misconfigured maple state in the mas_preallocate() code by
> examining the current location and planned write to ensure a reset is
> done if required. The performance impacts are minimal and within the
> noise in initial testing.
With this fix applied I can still see the issue using Hailong's
reproducers, both the userspace one that he shared over the weekend
and the one posted at
https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/
>
> Reported-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> Reported-by: Hailong Liu <hailong.liu@oppo.com>
> Fixes: fec29364348fe ("maple_tree: reduce resets during store setup")
> Link: https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Hailong Liu <hailong.liu@oppo.com>
> Cc: zhangpeng.00@bytedance.com <zhangpeng.00@bytedance.com>
> Cc: Steve Kang <Steve.Kang@unisoc.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> lib/maple_tree.c | 35 +++++++++++++++++++++++++++++++----
> 1 file changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 4eda949063602..17af9073494f5 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -5350,6 +5350,8 @@ static inline void mte_destroy_walk(struct maple_enode *enode,
>
> static void mas_wr_store_setup(struct ma_wr_state *wr_mas)
> {
> + unsigned char node_size;
> +
> if (!mas_is_active(wr_mas->mas)) {
> if (mas_is_start(wr_mas->mas))
> return;
> @@ -5372,17 +5374,42 @@ static void mas_wr_store_setup(struct ma_wr_state *wr_mas)
> * writes within this node. This is to stop partial walks in
> * mas_prealloc() from being reset.
> */
> +
> + /* Leaf root node is safe */
> + if (mte_is_root(wr_mas->mas->node))
> + return;
> +
> + /* Cannot span beyond this node */
> if (wr_mas->mas->last > wr_mas->mas->max)
> goto reset;
>
> - if (wr_mas->entry)
> + /* Cannot span before this node */
> + if (wr_mas->mas->index < wr_mas->mas->min)
> + goto reset;
> +
> + wr_mas->type = mte_node_type(wr_mas->mas->node);
> + /* unfinished walk is okay */
> + if (!ma_is_leaf(wr_mas->type))
> return;
>
> - if (mte_is_leaf(wr_mas->mas->node) &&
> - wr_mas->mas->last == wr_mas->mas->max)
> + /* Leaf node that ends in 0 means a spanning store */
> + if (!wr_mas->entry &&
> + (wr_mas->mas->last == wr_mas->mas->max))
> goto reset;
>
> - return;
> + mas_wr_node_walk(wr_mas);
> + if (wr_mas->r_min == wr_mas->mas->index &&
> + wr_mas->r_max == wr_mas->mas->last)
> + return; /* exact fit, no allocations */
> +
> + wr_mas->slots = ma_slots(wr_mas->node, wr_mas->type);
> + mas_wr_end_piv(wr_mas);
> + node_size = mas_wr_new_end(wr_mas);
> + if (node_size >= mt_slots[wr_mas->type])
> + goto reset; /* Not going to fit */
> +
> + if (node_size - 1 > mt_min_slots[wr_mas->type])
> + return; /* sufficient and will fit */
>
> reset:
> mas_reset(wr_mas->mas);
> --
> 2.47.2
>
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [RFC PATCH v6.6] maple_tree: Fix mas_prealloc() reset
2025-04-28 21:08 ` Suren Baghdasaryan
@ 2025-04-29 0:19 ` Liam R. Howlett
0 siblings, 0 replies; 3+ messages in thread
From: Liam R. Howlett @ 2025-04-29 0:19 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: sidhartha.kumar, maple-tree, linux-mm, linux-kernel,
Zhaoyang Huang, Hailong Liu, Lorenzo Stoakes,
zhangpeng . 00 @ bytedance . com, Steve Kang, Matthew Wilcox
* Suren Baghdasaryan <surenb@google.com> [250428 17:08]:
> On Mon, Apr 28, 2025 at 12:02 PM Liam R. Howlett
> <Liam.Howlett@oracle.com> wrote:
> >
> > A previously used maple state may not be in the correct state for the
> > preallocation to correctly calculate the number of nodes necessary for the
> > configured store operation.
> >
> > The user visible effect of which is warning that there are no nodes
> > allocated followed by a crash when there is a null pointer dereference
> > shortly after.
> >
> > The NULL pointer dereference has been reported to happen when a vma
> > iterator is used to preallocate after walking to a leaf node but then
> > requesting to preallocate for a store across node boundaries (in v6.6.
> > of the kernel). The altered range means that there may not been enough
> > nodes as the maple state has been incorrectly configured. A critical
> > step is that the vma iterator then detects the misconfigured maple state
> > and resets, thus ensuring the tree is not corrupted - but ultimately
> > causes a failure when there are no nodes left.
> >
> > Detect a misconfigured maple state in the mas_preallocate() code by
> > examining the current location and planned write to ensure a reset is
> > done if required. The performance impacts are minimal and within the
> > noise in initial testing.
>
> With this fix applied I can still see the issue using Hailong's
> reproducers, both the userspace one that he shared over the weekend
> and the one posted at
> https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/
Thanks.
The test patch also has a few issues which would affect the allocations
(such as not setting the rcu flag on the tree).
Besides that, there are a few bugs here. One is the incorrect
calculation due to the moving vma iterator - which my patch doesn't
quite address as I've come up with another scenario, and test case. I
suspect the rule of the vma iterator pointing at the 'correct slot'
(documented but not enforced) saved us a lot of issues here.
I'm undecided if it's worth detecting this being false and resetting,
like in this patch.
The other is to do with MA_STATE_PREALLOC, which will stop
preallocations in bulk insert mode. If it's left set from the first
mas_preallocate() call, then we won't allocate more nodes and simply
return. That is, the bulk allocation is causing issues with chaining
mas_preallocate() calls. There is another issue with this flag not
always being set, and so my testing passed... With the way this month
has been going, I expect more fun yet.
I'll send out another version of this RFC with two (or, I guess more..)
patches soon.
Annoyingly, not all of these bugs exist in upstream - and so the vma
test code is less useful than it will be in the future. I have
recreated the scenario using the test code, but it didn't create the
same issues.
Thanks,
Liam
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-04-29 0:20 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-28 18:40 [RFC PATCH v6.6] maple_tree: Fix mas_prealloc() reset Liam R. Howlett
2025-04-28 21:08 ` Suren Baghdasaryan
2025-04-29 0:19 ` Liam R. Howlett
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox