From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Bert Karwatzki <spasswolf@web.de>
Cc: "Liam R . Howlett" <Liam.Howlett@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
spassowlf@web.de
Subject: Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
Date: Sat, 5 Oct 2024 07:21:17 +0100
Message-ID: <9e1f326d-7740-4f4c-baf5-45f9eae0048d@lucifer.local>
In-Reply-To: <088a3541b85b783ef68337bd4bb790d62f200dfa.camel@web.de>
On Sat, Oct 05, 2024 at 02:56:01AM +0200, Bert Karwatzki wrote:
> On Friday, 04.10.2024 at 23:41 +0100, Lorenzo Stoakes wrote:
> > On Fri, Oct 04, 2024 at 11:35:44AM +0200, Bert Karwatzki wrote:
> > > Here's the log produced by this kernel:
> > >
> > > c9e7f76815d3 (HEAD -> maple_tree_debug_4) hack: set of info stuff v5
> > > 7e3bb072761a mm: correct error handling in mmap_region()
> > > 77df9e4bb222 (tag: next-20241001, origin/master, origin/HEAD, master) Add linux-next specific files for 20241001
> > >
> > > Again it took two attempts to trigger the bug.
> > >
> > > Bert Karwatzki
> > >
> >
> > Sending an updated, cleaned up version of the patch with a lot of
> > explanation. This is functionally identical to the v3 fix I already sent so
> > you can try that or this to confirm it resolves your issue.
> >
> > If you are able to do so, I can submit this to upstream for a hotfix. If
> > not, well then back to the drawing board and I'd be very shocked :)
> >
> > I have been able to reproduce the issue locally in our userland testing
> > suite entirely consistently, and this patch resolves the issue and also
> > continues to pass all maple tree unit tests.
> >
> > Again thank you so much for all your help - I hope you are able to find a
> > spare moment to quickly give this one a try and confirm whether it does
> > indeed address the problem you've reported.
> >
> > Thanks, Lorenzo
> >
> > ----8<----
> > From 126d65bd9839cd3ec941007872b357e27fd56066 Mon Sep 17 00:00:00 2001
> > From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Date: Fri, 4 Oct 2024 15:18:58 +0100
> > Subject: [PATCH] maple_tree: correct tree corruption on spanning store
> >
> > Writing a data range into a maple tree may involve overwriting a number of
> > existing entries that span across more than one node. Doing so invokes a
> > 'spanning' store.
> >
> > Performing a spanning store across two leaf nodes in a maple tree in which
> > entries are overwritten is achieved by first initialising a 'big' node,
> > which will store the coalesced entries between the two nodes comprising
> > entries prior to the newly stored entry, the newly stored entry, and
> > subsequent entries.
> >
> > This 'big node' is then merged back into the tree and the tree is
> > rebalanced, replacing the entries across the spanned nodes with those
> > contained in the big node.
> >
> > The operation is performed in mas_wr_spanning_store() which starts by
> > establishing two maple tree state objects ('mas' objects) on the left of
> > the range and on the right (l_mas and r_mas respectively).
> >
> > l_mas traverses to the beginning of the range to be stored in order to copy
> > the data BEFORE the requested store into the big node.
> >
> > We then insert our new entry immediately afterwards (both the left copy and
> > the storing of the new entry are combined and performed by
> > mas_store_b_node()).
> >
> > r_mas traverses to the populated slot immediately after, in order to copy
> > the data AFTER the requested store into the big node.
> >
> > This copy of the right-hand node is performed by mas_mab_cp() as long as
> > r_mas indicates that there's data to copy, i.e. r_mas.offset <= r_mas.end.
> >
> > We traverse r_mas to this position in mas_wr_node_walk() using a simple
> > loop:
> >
> > > 	while (offset < count && mas->index > wr_mas->pivots[offset])
> > > 		offset++;
> >
> > > Note here that count is determined to be the (inclusive) index of the last
> > > slot containing data in the node as determined by ma_data_end().
> >
> > This means that even in searching for mas->index, which will have been set
> > to one plus the end of the target range in order to traverse to the next
> > slot in mas_wr_spanning_store(), we will terminate the iteration at the end
> > of the node range even if this condition is not met due to the offset <
> > count condition.
> >
> > The fact this right hand node contains the end of the range being stored is
> > why we are traversing it, and this loop is why we appear to discover a
> > viable range within the right node to copy to the big one.
> >
> > However, if the node that r_mas traverses contains a pivot EQUAL to the end
> > of the range being stored, and this is the LAST pivot contained within the
> > node, something unexpected happens:
> >
> > 1. The l_mas traversal copy and insertion of the new entry in the big node
> > is performed via mas_store_b_node() correctly.
> >
> > 2. The traversal performed by mas_wr_node_walk() means our r_mas.offset is
> > set to the offset of the entry equal to the end of the range we store.
> >
> > 3. We therefore copy this DUPLICATE of the final pivot into the big node,
> > and insert this DUPLICATE entry, alongside its invalid slot entry
> > immediately after the newly inserted entry.
> >
> > > 4. The big node containing this duplicate entry is inserted into the tree,
> > > which is rebalanced, and therefore the maple tree becomes corrupted.
> >
> > Note that if the right hand node had one or more entries with pivots of
> > greater value than the end of the stored range, this would not happen. If
> > it contained entries with pivots of lesser value it would not be the right
> > node in this spanning store.
> >
> > > This appears to have been at risk of happening throughout the maple tree's
> > > history; however, it seemed significantly less likely to occur until
> > > recently.
> >
> > The balancing of the tree seems to have made it unlikely that you would
> > happen to perform a store that both spans two nodes AND would overwrite
> > precisely the entry with the largest pivot in the right-hand node which
> > contains no further larger pivots.
> >
> > The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree
> > in mmap_region()") seems to have made the probability of this event much
> > more likely.
> >
> > > Prior to this change, MAP_FIXED mappings which were overwritten would
> > > first be cleared before any subsequent store or - importantly - merge of
> > > surrounding entries would be performed.
> >
> > After this change, this is no longer the case, and this means that, in the
> > worst case, a number of entries might be overwritten in combination with a
> > merge (and subsequent overwriting expansion) between both the prior entry
> > AND a subsequent entry.
> >
> > The motivation for this change arose from Bert Karwatzki's report of
> > encountering mm instability after the release of kernel v6.12-rc1 which,
> > after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
> > options, was identified as maple tree corruption.
> >
> > After Bert very generously provided his time and ability to reproduce this
> > event consistently, I was able to finally identify that the issue discussed
> > in this commit message was occurring for him.
> >
> > The solution implemented in this patch is:
> >
> > 1. Adjust mas_wr_walk_index() to return a boolean value indicating whether
> > the containing node is actually populated with entries possessing pivots
> > equal to or greater than mas->index.
> >
> > 2. When traversing the right node in mas_wr_spanning_store(), use this
> > value to determine whether to try to copy from the right node - if it is
> > not populated, then do not do so.
> >
> > This passes all maple tree unit tests and resolves the reported bug.
> > ---
> > lib/maple_tree.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > index 37abf0fe380b..e6f0da908ba7 100644
> > --- a/lib/maple_tree.c
> > +++ b/lib/maple_tree.c
> > @@ -2194,6 +2194,8 @@ static inline void mas_node_or_none(struct ma_state *mas,
> >
> > /*
> > * mas_wr_node_walk() - Find the correct offset for the index in the @mas.
> > + * If @mas->index cannot be found within the containing
> > + * node, we traverse to the last entry in the node.
> > * @wr_mas: The maple write state
> > *
> > * Uses mas_slot_locked() and does not need to worry about dead nodes.
> > @@ -3527,6 +3529,12 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
> > return true;
> > }
> >
> > +/*
> > + * Traverse the maple tree until the offset of mas->index is reached.
> > + *
> > + * Return: Is this node actually populated with entries possessing pivots equal
> > + * to or greater than mas->index?
> > + */
> > static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
> > {
> > struct ma_state *mas = wr_mas->mas;
> > @@ -3535,8 +3543,11 @@ static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
> > mas_wr_walk_descend(wr_mas);
> > wr_mas->content = mas_slot_locked(mas, wr_mas->slots,
> > mas->offset);
> > - if (ma_is_leaf(wr_mas->type))
> > - return true;
> > + if (ma_is_leaf(wr_mas->type)) {
> > + unsigned long pivot = wr_mas->pivots[mas->offset];
> > +
> > + return pivot == 0 || mas->index <= pivot;
> > + }
> > mas_wr_walk_traverse(wr_mas);
> >
> > }
> > @@ -3696,6 +3707,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
> > struct maple_big_node b_node;
> > struct ma_state *mas;
> > unsigned char height;
> > + bool r_populated;
> >
> > /* Left and Right side of spanning store */
> > MA_STATE(l_mas, NULL, 0, 0);
> > @@ -3737,7 +3749,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
> > r_mas.last++;
> >
> > r_mas.index = r_mas.last;
> > - mas_wr_walk_index(&r_wr_mas);
> > + r_populated = mas_wr_walk_index(&r_wr_mas);
> > r_mas.last = r_mas.index = mas->last;
> >
> > /* Set up left side. */
> > @@ -3761,7 +3773,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
> > /* Copy l_mas and store the value in b_node. */
> > mas_store_b_node(&l_wr_mas, &b_node, l_mas.end);
> > /* Copy r_mas into b_node. */
> > - if (r_mas.offset <= r_mas.end)
> > + if (r_populated && r_mas.offset <= r_mas.end)
> > mas_mab_cp(&r_mas, r_mas.offset, r_mas.end,
> > &b_node, b_node.b_end + 1);
> > else
> > --
> > 2.46.2
>
> I just tested this and it passed ten tests (i.e. upgrading the Proton version in
> Steam) in a row.
>
> Bert Karwatzki
Perfect :) will send the fix upstream then as a hotfix for 6.12! Thanks
very much for helping out with this, your help has been absolutely
invaluable and HUGELY appreciated.
Cheers, Lorenzo