From: Jann Horn <jannh@google.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Jann Horn <jannh@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 15/15] mm/mmap: Change vma iteration order in do_vmi_align_munmap()
Date: Tue, 15 Aug 2023 16:19:38 +0200 [thread overview]
Message-ID: <CAG48ez2OTwhdbN2NsYEoU4mayfdCBT+4NirdxMQ=5fZvKFjq6w@mail.gmail.com> (raw)
In-Reply-To: <20230815072907.fsvetn4dzohgt2z5@revolver>
On Tue, Aug 15, 2023 at 9:29 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Jann Horn <jannh@google.com> [230814 17:22]:
> > On Mon, Aug 14, 2023 at 10:32 PM Liam R. Howlett
> > <Liam.Howlett@oracle.com> wrote:
> > > * Jann Horn <jannh@google.com> [230814 11:44]:
> > > > @akpm
> > > >
> > > > On Mon, Jul 24, 2023 at 8:31 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > > > > Since prev will be set later in the function, it is better to reverse
> > > > > the splitting direction of the start VMA (modify the new_below argument
> > > > > to __split_vma).
> > > >
> > > > It might be a good idea to reorder "mm: always lock new vma before
> > > > inserting into vma tree" before this patch.
> > > >
> > > > If you apply this patch without "mm: always lock new vma before
> > > > inserting into vma tree", I think move_vma(), when called with a start
> > > > address in the middle of a VMA, will behave like this:
> > > >
> > > > - vma_start_write() [lock the VMA to be moved]
> > > > - move_page_tables() [moves page table entries]
> > > > - do_vmi_munmap()
> > > > - do_vmi_align_munmap()
> > > > - __split_vma()
> > > > - creates a new VMA **covering the moved range** that is **not locked**
> > > > - stores the new VMA in the VMA tree **without locking it** [1]
> > > > - new VMA is locked and removed again [2]
> > > > [...]
> > > >
> > > > So after the page tables in the region have already been moved, I
> > > > believe there will be a brief window (between [1] and [2]) where page
> > > > faults in the region can happen again, which could probably cause new
> > > > page tables and PTEs to be created in the region again in that window.
> > > > (This can't happen in Linus' current tree because the new VMA created
> > > > by __split_vma() only covers the range that is not being moved.)
> > >
> > > Ah, so my reversing of which VMA to keep to the first split call opens a
> > > window where the VMA being removed is not locked. Good catch.
>
> Looking at this again, I think it exists in Linus' tree and my change
> actually removes this window:
>
> - error = __split_vma(vmi, vma, start, 0);
> + error = __split_vma(vmi, vma, start, 1);
> if (error)
> goto start_split_failed;
>
> The last argument is "new_below", which means the new VMA will be at the
> lower address.  I don't love the int type or the argument's name, and the
> documentation for the split function is lacking.
>
> So, once we split at "start", the new VMA has vm_end = "start" while the
> old VMA has vm_start = "start".  I then lock the old VMA to be removed
> (again) and add it to the detached maple tree.
>
> Before my patch, we split the VMA and took the new, unlocked VMA for
> removal, until I locked that new VMA for removal and added it to the
> detached maple tree.  So there was a window where we wrote the new split
> VMA into the tree prior to locking it, but it was locked before removal.
>
> This change actually aligns the splitting with the other callers who use
> the split_vma() wrapper.
Oooh, you're right. Sorry, I misread that.
> > >
> > > >
> > > > Though I guess that's not going to lead to anything bad, since
> > > > do_vmi_munmap() anyway cleans up PTEs and page tables in the region?
> > > > So maybe it's not that important.
> > >
> > > do_vmi_munmap() will clean up PTEs from the end of the previous VMA to
> > > the start of the next
> >
> > Alright, I guess no action is needed here then.
>
> I don't see a difference between this and the existing race where a page
> fault completes and another task unmaps the area before the faulting
> task uses the faulted range.
Yeah, you're right. I was thinking about it in terms of "this is a
weird situation and it would be dangerous if something relied on there
not being any non-empty PTEs in the range", but there's nothing that
relies on it, so that's fine.
Thread overview: 22+ messages
2023-07-24 18:31 [PATCH v3 00/15] Reduce preallocations for maple tree Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 01/15] maple_tree: Add benchmarking for mas_for_each Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 02/15] maple_tree: Add benchmarking for mas_prev() Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 03/15] mm: Change do_vmi_align_munmap() tracking of VMAs to remove Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 04/15] mm: Remove prev check from do_vmi_align_munmap() Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 05/15] maple_tree: Introduce __mas_set_range() Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 06/15] mm: Remove re-walk from mmap_region() Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 07/15] maple_tree: Re-introduce entry to mas_preallocate() arguments Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 08/15] maple_tree: Adjust node allocation on mas_rebalance() Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 09/15] mm: Use vma_iter_clear_gfp() in nommu Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 10/15] mm: Set up vma iterator for vma_iter_prealloc() calls Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 11/15] maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null() Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 12/15] maple_tree: Update mas_preallocate() testing Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 13/15] maple_tree: Refine mas_preallocate() node calculations Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 14/15] maple_tree: Reduce resets during store setup Liam R. Howlett
2023-07-24 18:31 ` [PATCH v3 15/15] mm/mmap: Change vma iteration order in do_vmi_align_munmap() Liam R. Howlett
2023-08-14 15:43 ` Jann Horn
2023-08-14 19:10 ` Andrew Morton
2023-08-14 19:18 ` Liam R. Howlett
2023-08-14 21:22 ` Jann Horn
2023-08-15 7:29 ` Liam R. Howlett
2023-08-15 14:19 ` Jann Horn [this message]