Re: [PATCH] mremap: enforce rmap src/dst vma ordering in case of vma_merge succeeding in copy_vma


From: Hugh Dickins <hughd@google.com>
To: Nai Xia <nai.xia@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, Pawel Sikora <pluto@agmk.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, jpiszcz@lucidpixels.com, arekm@pld-linux.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mremap: enforce rmap src/dst vma ordering in case of vma_merge succeeding in copy_vma
Date: Fri, 4 Nov 2011 19:21:28 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.00.1111041856530.22199@sister.anvils> (raw)
In-Reply-To: <CAPQyPG5RgPnN-kVc1Oy+78mAa9vevLiZWCwx2pEkeHKY1t6V1A@mail.gmail.com>


On Sat, 5 Nov 2011, Nai Xia wrote:
> On Sat, Nov 5, 2011 at 4:54 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > On Fri, Nov 04, 2011 at 12:16:03PM -0700, Hugh Dickins wrote:
> >> On Fri, 4 Nov 2011, Nai Xia wrote:
> >> > On Fri, Nov 4, 2011 at 3:31 PM, Hugh Dickins <hughd@google.com> wrote:
> >> > > On Mon, 31 Oct 2011, Andrea Arcangeli wrote:
> >> > >> @@ -2339,7 +2339,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> >> > >>                */
> >> > >>               if (vma_start >= new_vma->vm_start &&
> >> > >>                   vma_start < new_vma->vm_end)
> >> > >> +                     /*
> >> > >> +                      * No need to call anon_vma_order_tail() in
> >> > >> +                      * this case because the same PT lock will
> >> > >> +                      * serialize the rmap_walk against both src
> >> > >> +                      * and dst vmas.
> >> > >> +                      */
> >> > >
> >> > > Really?  Please convince me: I just do not see what ensures that
> >> > > the same pt lock covers both src and dst areas in this case.
> >> >
> >> > At first glance, rmap_walk does traverse this merged VMA
> >> > once...
> >> > But now, wait... I am actually really puzzled that this case can
> >> > happen at all: you see that vma_merge() does not break the validity
> >> > of the relation between page->index and its VMA. So if this can
> >> > really happen, a page->index should be valid in both areas of the
> >> > same VMA. It's strange to imagine that a PTE is copied inside the
> >> > _same_ VMA and page->index is valid at both the old and new places.
> >>
> >> Yes, I think you are right, thank you for elucidating it.
> >>
> >> That was a real case when we wrote copy_vma(), when rmap was using
> >> pte_chains; but once anon_vma came in, and imposed vm_pgoff matching
> >> on anonymous mappings too, it became dead code.  With linear vm_pgoff
> >> matching, you cannot fit a range in two places within the same vma.
> >> (And even the non-linear case relies upon vm_pgoff defaults.)
> >>
> >> So we could simplify the copy_vma() interface a little now (get rid of
> >> that nasty **vmap): I'm not quite sure whether we ought to do that,
> >> but certainly Andrea's comment there should be updated (if he also
> >> agrees with your analysis).
> >
> > The vmap should only trigger when the prev vma (prev relative to src
> > vma) is extended at the end to make space for the dst range. And by
> > extending it, we filled the hole between the prev vma and "src"
> > vma. So then the prev vma becomes the "src vma" and also the "dst
> > vma". So we can't keep working with the old "vma" pointer after that.
> >
> > I doubt it can be removed without crashing in the above case.
> 
> Yes, this line itself should not be removed. As I explained,
> pgoff adjustment at the top of the copy_vma() for non-faulted
> vma will lead to this case.

Ah, thank you, that's what I was asking you to point me to, the place
I was missing that recalculates pgoff: at the head of copy_vma() itself.

Yes, if that adjustment remains (no reason why not), then we cannot
remove the *vmap = new_vma; but that is the only case that nowadays
can need the *vmap = new_vma (agreed?), which does deserve a comment.
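
As a rough illustration of the linear vm_pgoff argument quoted earlier in
this message, here is a toy userspace model (not kernel code). The names
toy_vma, toy_linear_index() and toy_count_addresses_for_index() are
hypothetical, and 4 KiB pages are assumed. The index formula mirrors the
usual linear mapping, vm_pgoff plus the page distance from vm_start:

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the three vma fields that matter here. */
struct toy_vma {
	unsigned long vm_start; /* inclusive, page aligned */
	unsigned long vm_end;   /* exclusive, page aligned */
	unsigned long vm_pgoff; /* page index that vm_start maps to */
};

/* Linear page index of addr within vma (addr must lie inside the vma). */
static unsigned long toy_linear_index(const struct toy_vma *vma,
				      unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> TOY_PAGE_SHIFT);
}

/* Count the page-aligned addresses in vma whose linear index is index. */
static size_t toy_count_addresses_for_index(const struct toy_vma *vma,
					    unsigned long index)
{
	size_t hits = 0;
	unsigned long addr;

	for (addr = vma->vm_start; addr < vma->vm_end;
	     addr += 1UL << TOY_PAGE_SHIFT)
		if (toy_linear_index(vma, addr) == index)
			hits++;
	return hits;
}
```

Because the index grows by one per page, any given index matches at most
one address in the vma, so a faulted page cannot be valid at both an old
and a new place within the same vma. Only a vma with no pages faulted in
yet, whose pgoff is recalculated, escapes that constraint.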


> But we do not need to worry
> about the move_page_tables() that follows when this happens.
> And so no lines need to be added here. But maybe the
> documentation should be changed in your original patch
> to clarify this. Reasoning with PTL locks for this case might
> be somewhat misleading.

Right, there are no ptes there yet, so we cannot miss any.

> 
>  Furthermore, the move_page_tables() call following this case
> might better be totally avoided for code readability and it's
> simple to judge with (vma == new_vma)
> 
> Do you agree? :)

Well, it's true that looking at pagetables in this case is just
a waste of time; but personally I'd prefer to add more comment
than special case handling for this.

> 
> >
> > I thought some more about it and what I missed I think is the
> > anon_vma_merge in vma_adjust. With that anon_vma_merge, rmap_walk will
> > have to complete before we can start moving the ptes. And so rmap_walk,
> > when it starts again from scratch (after anon_vma_merge has run in
> > vma_adjust), will find all ptes even if vma_merge succeeded before.
> >
> > In fact this may also work for fork. Fork will take the anon_vma root
> > lock somehow to queue the child vma in the same_anon_vma. Doing so it
> > will serialize against any running rmap_walk from all other cpus. The
> > ordering has never been an issue for fork anyway, but it would have
> > been an issue for mremap in case vma_merge succeeded and src_vma
> > != dst_vma, if vma_merge didn't act as a serialization point against
> > rmap_walk (which I realized now).
> >
> > What makes it safe is again taking both PT locks simultaneously. So it
> > doesn't matter what rmap_walk searches, as long as the anon_vma_chain
> > list cannot change by the time rmap_walk started.
> >
> > What I thought before was rmap_walk checking vma1 and then vma_merge
> > succeed (where src vma is vma2 and dst vma is vma1, but vma1 is not a
> > new vma queued at the end of same_anon_vma), move_page_tables moves
> > the pte from vma2 to vma1, and then rmap_walk checks vma2. But again
> > vma_merge won't be allowed to complete in the middle of rmap_walk, and
> > so it cannot trigger and we can safely drop the patch. It wasn't
> > immediately obvious to think of the locks taken within vma_adjust, sorry.
> >

I found Andrea's "anon_vma_merge" reply very hard to understand; but
it looks like he now accepts that it was mistaken, or on the wrong
track at least...

> 
> Oh, no, sorry. I think I was trying to clarify in the first reply on
> that thread that
> we all agree that anon_vma chain is 100% stable when doing rmap_walk().
> What is important, I think,  is the relative order of these three events:
> 1.  The time  rmap_walk() scans the src
> 2.  The time rmap_walk() scans the dst
> 3.  The time move_page_tables() move PTE from src vma to dst.

... after you set us straight again with this.

> 
> rmap_walk() scans dst( taking dst PTL) ---> move_page_tables() with
> both PTLs ---> rmap_walk() scans src(taking src PTL)
> 
> will trigger this bug.  The race is there even if rmap_walk() scans src--->dst
> but that race does no harm. I think Mel explained why it's safe with good
> ordering in his first reply to my post.
> 
> vma_merge() is only guilty of giving a wrong order of VMAs before
> move_page_tables() and rmap_walk() begin to race; it does not itself race
> with rmap_walk().
> 
> You see, it seems this game might be really puzzling. Indeed, maybe it's time
> to fall back on locks instead of playing with racing. Just like the
> good old time,
> our classic OS text book told us that shared variables deserve locks. :-)

That's my preference, yes: this mail thread seems to cry out for that!
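
A toy model of the ordering Nai Xia lists above may help, assuming each
of the three events is atomic under its own PT lock and the anon_vma
chain does not change during the walk. This is a userspace sketch, not
kernel code; toy_slot and toy_walk_finds_pte() are hypothetical names.
The walk visits one vma, the pte is then moved from src to dst, and the
walk visits the other vma:

```c
#include <stdbool.h>

enum toy_slot { TOY_SRC = 0, TOY_DST = 1 };

/*
 * Simulate one rmap walk that visits `first`, is overtaken by the pte
 * move from src to dst, and then visits the other vma.
 * Returns true if the walk saw the pte at either visit.
 */
static bool toy_walk_finds_pte(enum toy_slot first)
{
	bool pte_present[2] = { [TOY_SRC] = true, [TOY_DST] = false };
	enum toy_slot second = (first == TOY_SRC) ? TOY_DST : TOY_SRC;
	bool found;

	/* First visit, under that vma's PT lock. */
	found = pte_present[first];

	/* The move happens between the two visits, under both PT locks. */
	pte_present[TOY_SRC] = false;
	pte_present[TOY_DST] = true;

	/* Second visit, under the other vma's PT lock. */
	found = found || pte_present[second];

	return found;
}
```

In this model toy_walk_finds_pte(TOY_SRC) finds the pte however the move
interleaves, while toy_walk_finds_pte(TOY_DST) misses it: that is the
dst ---> move ---> src ordering which triggers the bug.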

Hugh
