From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 15/15] mm/mmap: Change vma iteration order in do_vmi_align_munmap()
Date: Tue, 15 Aug 2023 03:29:07 -0400	[thread overview]
Message-ID: <20230815072907.fsvetn4dzohgt2z5@revolver> (raw)
In-Reply-To: <CAG48ez2UbpFb41gfcwyoA73ado=+YEiRtU2KmKt560_M_B7JUw@mail.gmail.com>

* Jann Horn <jannh@google.com> [230814 17:22]:
> On Mon, Aug 14, 2023 at 10:32 PM Liam R. Howlett
> <Liam.Howlett@oracle.com> wrote:
> > * Jann Horn <jannh@google.com> [230814 11:44]:
> > > @akpm
> > >
> > > On Mon, Jul 24, 2023 at 8:31 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > > > Since prev will be set later in the function, it is better to reverse
> > > > the splitting direction of the start VMA (modify the new_below argument
> > > > to __split_vma).
> > >
> > > It might be a good idea to reorder "mm: always lock new vma before
> > > inserting into vma tree" before this patch.
> > >
> > > If you apply this patch without "mm: always lock new vma before
> > > inserting into vma tree", I think move_vma(), when called with a start
> > > address in the middle of a VMA, will behave like this:
> > >
> > >  - vma_start_write() [lock the VMA to be moved]
> > >  - move_page_tables() [moves page table entries]
> > >  - do_vmi_munmap()
> > >    - do_vmi_align_munmap()
> > >      - __split_vma()
> > >        - creates a new VMA **covering the moved range** that is **not locked**
> > >        - stores the new VMA in the VMA tree **without locking it** [1]
> > >      - new VMA is locked and removed again [2]
> > > [...]
> > >
> > > So after the page tables in the region have already been moved, I
> > > believe there will be a brief window (between [1] and [2]) where page
> > > faults in the region can happen again, which could probably cause new
> > > page tables and PTEs to be created in the region again in that window.
> > > (This can't happen in Linus' current tree because the new VMA created
> > > by __split_vma() only covers the range that is not being moved.)
> >
> > Ah, so my reversing of which VMA to keep to the first split call opens a
> > window where the VMA being removed is not locked.  Good catch.

Looking at this again, I think the window already exists in Linus' tree
and my change actually removes it:

-               error = __split_vma(vmi, vma, start, 0);
+               error = __split_vma(vmi, vma, start, 1);
                if (error)
                        goto start_split_failed;

The last argument is "new_below", which means the new VMA will be at the
lower address.  I don't love that the argument is an int, or its name,
and the documentation for the split function is lacking.
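
To spell out what that argument selects, here is a hypothetical helper
that only illustrates the resulting ranges (this is not code from
mm/mmap.c):

	/* Illustration only: which range the new VMA gets when splitting at addr. */
	static void split_ranges(unsigned long vm_start, unsigned long vm_end,
				 unsigned long addr, int new_below,
				 unsigned long *new_start, unsigned long *new_end)
	{
		if (new_below) {
			/* New VMA sits below addr; the old VMA keeps [addr, vm_end). */
			*new_start = vm_start;
			*new_end = addr;
		} else {
			/* New VMA sits above addr; the old VMA keeps [vm_start, addr). */
			*new_start = addr;
			*new_end = vm_end;
		}
	}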

So, once we split at "start", the new VMA has vm_end = start, while the
old VMA now begins at start.  I then lock the old VMA to be removed
(again) and add it to the detached maple tree.

Before my patch, we split the VMA and took the new, unlocked VMA for
removal; it remained unlocked until I locked it and added it to the
detached maple tree.  So there was a window where the new split VMA was
written into the tree before being locked, but it was locked before
removal.

This change also aligns the splitting with the other callers that use
the split_vma() wrapper.
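
In other words, with new_below = 1 the detach path looks roughly like
this.  This is a simplified sketch using names from this series (vmi,
mas_detach, count); the real do_vmi_align_munmap() differs in detail,
e.g. in how the detached tree is indexed:

	error = __split_vma(vmi, vma, start, 1);
	if (error)
		goto start_split_failed;
	/*
	 * The new VMA covers [vma->vm_start, start) and stays mapped,
	 * while "vma" now covers [start, vma->vm_end) and is the part
	 * being removed.  A VMA that the caller (e.g. move_vma())
	 * already write-locked is therefore never in the tree unlocked.
	 */
	vma_start_write(vma);		/* no-op if already write-locked */
	mas_set(&mas_detach, count);	/* track in the detached tree */
	error = mas_store_gfp(&mas_detach, vma, GFP_KERNEL);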

> >
> > >
> > > Though I guess that's not going to lead to anything bad, since
> > > do_vmi_munmap() anyway cleans up PTEs and page tables in the region?
> > > So maybe it's not that important.
> >
> > do_vmi_munmap() will clean up PTEs from the end of the previous VMA to
> > the start of the next
> 
> Alright, I guess no action is needed here then.

I don't see a difference between this and the race that already exists
when a page fault completes and another task unmaps the area before the
faulting task touches the newly faulted addresses.

> 
> > I don't have any objections in the ordering or see an issue resulting
> > from having it this way... Except for maybe lockdep, so maybe we should
> > change the ordering of the patch sets just to be safe?
> >
> > In fact, should we add another check somewhere to ensure we do generate
> > the warning?  Perhaps to remove_mt() to avoid the exit path hitting it?
> 
> I'm not sure which lockdep check you mean. do_vmi_align_munmap() is
> going to lock the VMAs again before it operates on them; I guess the
> only checks that would catch this would be the page table validation
> logic or the RSS counter checks on exit?
> 

I was trying to add a lockdep check to detect this potential window in
the future, but it won't work for the reason you pointed out: the VMA
will be locked before removal.  I'm not sure it's worth it anyway, since
Suren added more lockdep checks in his series.
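
For the record, the check I was considering was along these lines (a
sketch only; vma_assert_write_locked() is from Suren's series, and
remove_mt() was the spot I had in mind, so the helper name here is
hypothetical):

	static inline void remove_mt_check_locked(struct vm_area_struct *vma)
	{
		/*
		 * Warn (with CONFIG_PER_VMA_LOCK) if a VMA is being torn
		 * down without ever having been write-locked, which would
		 * indicate a window like the one discussed above.
		 */
		vma_assert_write_locked(vma);
	}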

I appreciate you really looking at these changes and thinking them
through.

Regards,
Liam

