Re: [patch] mm: fix anon_vma races

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Nick Piggin <npiggin@suse.de>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [patch] mm: fix anon_vma races
Date: Tue, 21 Oct 2008 13:56:12 +1100	[thread overview]
Message-ID: <200810211356.13191.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <alpine.LFD.2.00.0810200742300.3518@nehalem.linux-foundation.org>

On Tuesday 21 October 2008 02:17, Linus Torvalds wrote:
> On Mon, 20 Oct 2008, Hugh Dickins wrote:

> > But in the case of anon pages, what page->mapping points to may be
> > volatile, in the sense that that memory might at some point get reused
> > for a different anon_vma, or the slab page below it get freed and
> > reused for a different purpose completely: that's what we have to
> > careful of in the case of anon pages, and it's RCU and the
> > page_mapped() test which guard that.
>
> .. and I'm not worried about the slab page. It's stable, since we hold the
> RCU read-lock. No worries about that one either.
>
> It's the "struct page" itself - the one we use for lookup in
> page_lock_anon_vma(). And I'm worried about the need for *re-doing* the
> page_mapped() test.
>
> The problem that makes "page_lock_anon_vma()" such a total disaster is
> that yes, it does locking, but it does locking _after_ the lookup, and the
> lock doesn't actually protect any of the data that it is using for the
> lookup itself.
>
> And yes, we have various tricks to try to make the data "safe" even if we
> race with the lookup, like the RCU stability of the anon_vma allocation,
> so that even if we race, we don't do anything bad. And I don't worry about
> the anon_vma, that part looks fine.
>
> But page_remove_rmap() is run totally unlocked wrt page_lock_anon_vma(),
> it looks like. page_remove_rmap() is run under the pte lock, and
> page_lock_anon_vma() is run under no lock at all that I can see that is
> reliable.
>
> So yes, we have the same kind of "delay the destroy" wrt page->mapping (ie
> page_remove_rmap() doesn't actually clear page->mapping at all), but none
> of this runs under any lock at all.
>
> So as far as I can tell, the only reason "page_lock_anon_vma()" is safe is
> because we hopefully always hold an elevated page count when we call it,
> so at least the struct page itself will never get freed, so page->mapping
> is safe just because it's not cleared.
>
> But assuming all that is true, it boils down to this:
>
>  - the anon_vma we get may be the wrong one or a stale one (since
>    page_remove_rmap ran concurrently and we ended up freeing the
>    anon_vma), but it's always a "valid" anon_vma in the sense that it's
>    initialized and the list is always pointing to *some* stable set of
>    vm_area_struct's.
>
>  - if we raced, and the anon_vma is stale, we're going to walk over
>    some bogus list that pertains to a totally different page, but WHO
>    REALLY CARES? If it is about another page that got that anon_vma
>    reallocated to it, all the code that traverses the list of vma's still
>    has to check the page tables _anyway_. And that will never trigger, and
>    that will get the pte lock for checking anyway, so at _that_ point do
>    we correctly finally synchronize with a lock that is actually relevant.
>
>  - the "anon_vma->lock" is totally irrelevant wrt "page_mapped()", and I'm
>    not seeing what it could possibly help to re-check after getting that
>    lock.
>
> So what I'm trying to figure out is why Nick wanted to add another check
> for page_mapped(). I'm not seeing what it is supposed to protect against.

It's not supposed to protect against anything that would be a problem
in the existing code (well, I initially thought it might be, but Hugh
explained why its not needed). I'd still like to put the check in, in
order to constrain this peculiarity of SLAB_DESTROY_BY_RCU to those
couple of functions which allocate or take a reference.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-10-21  2:56 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16  4:10 Nick Piggin
2008-10-17 22:14 ` Hugh Dickins
2008-10-17 23:05   ` Linus Torvalds
2008-10-18  0:13     ` Hugh Dickins
2008-10-18  0:25       ` Linus Torvalds
2008-10-18  1:53       ` Nick Piggin
2008-10-18  2:50         ` Paul Mackerras
2008-10-18  2:57           ` Linus Torvalds
2008-10-18  5:49           ` Nick Piggin
2008-10-18 10:49             ` Paul Mackerras
2008-10-18 17:00             ` Linus Torvalds
2008-10-18 18:44               ` Matthew Wilcox
2008-10-19  2:54                 ` Nick Piggin
2008-10-19  2:53               ` Nick Piggin
2008-10-17 23:13 ` Peter Zijlstra
2008-10-17 23:53   ` Linus Torvalds
2008-10-18  0:42     ` Linus Torvalds
2008-10-18  1:08       ` Linus Torvalds
2008-10-18  1:32         ` Nick Piggin
2008-10-18  2:11           ` Linus Torvalds
2008-10-18  2:25             ` Nick Piggin
2008-10-18  2:35               ` Nick Piggin
2008-10-18  2:53               ` Linus Torvalds
2008-10-18  5:20                 ` Nick Piggin
2008-10-18 10:38                   ` Peter Zijlstra
2008-10-19  9:52                     ` Hugh Dickins
2008-10-19 10:51                       ` Peter Zijlstra
2008-10-19 12:39                         ` Hugh Dickins
2008-10-19 18:25                         ` Linus Torvalds
2008-10-19 18:45                           ` Peter Zijlstra
2008-10-19 19:00                           ` Hugh Dickins
2008-10-20  4:03                           ` Hugh Dickins
2008-10-20 15:17                             ` Linus Torvalds
2008-10-20 18:21                               ` Hugh Dickins
2008-10-21  2:56                               ` Nick Piggin [this message]
2008-10-21  3:25                                 ` Linus Torvalds
2008-10-21  4:33                                   ` Nick Piggin
2008-10-21 12:58                                     ` Hugh Dickins
2008-10-21 15:59                                     ` Christoph Lameter
2008-10-22  9:29                                       ` Nick Piggin
2008-10-21  4:34                                   ` Nick Piggin
2008-10-21 13:55                                     ` Hugh Dickins
2008-10-21  2:44                           ` Nick Piggin
2008-10-18 19:14               ` Hugh Dickins
2008-10-19  3:03                 ` Nick Piggin
2008-10-19  7:07                   ` Hugh Dickins
2008-10-20  3:26                     ` Hugh Dickins
2008-10-21  2:45                       ` Nick Piggin
2008-10-19  1:13       ` Hugh Dickins
2008-10-19  2:41         ` Nick Piggin
2008-10-19  9:45           ` Hugh Dickins
2008-10-21  3:59             ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200810211356.13191.nickpiggin@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=a.p.zijlstra@chello.nl \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox