linux-mm.kvack.org archive mirror
From: Hugh Dickins <hugh@veritas.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [patch] mm: fix anon_vma races
Date: Tue, 21 Oct 2008 13:58:19 +0100 (BST)
Message-ID: <Pine.LNX.4.64.0810211338580.4529@blonde.site>
In-Reply-To: <20081021043338.GA5694@wotan.suse.de>

On Tue, 21 Oct 2008, Nick Piggin wrote:
> On Mon, Oct 20, 2008 at 08:25:54PM -0700, Linus Torvalds wrote:
> > On Tue, 21 Oct 2008, Nick Piggin wrote:
> > > >
> > > > So what I'm trying to figure out is why Nick wanted to add another check
> > > > for page_mapped(). I'm not seeing what it is supposed to protect against.
> > > 
> > > It's not supposed to protect against anything that would be a problem
> > > in the existing code (well, I initially thought it might be, but Hugh
> > > explained why it's not needed). I'd still like to put the check in, in
> > > order to constrain this peculiarity of SLAB_DESTROY_BY_RCU to those
> > > couple of functions which allocate or take a reference.
> > 
> > Hmm.  Ok, as long as I understand what it is for (and if it's not a 
> > bug-fix but a "like to drop the stale anon_vma early"), I'm ok.
> > 
> > So I won't mind, and Hugh seems to prefer it. So if you send that patch 

(I'd prefer just a comment myself, but I do see Nick's point of view.)

> > along with a good explanation for a changelog entry, I'll apply it.
> 
> Well something like this, then. Hugh?

Yes, that's good.  I'm not a huge fan of comments that dwarf the
code, but I've rather disqualified myself by going to the opposite
extreme; it's hard to get the point across in fewer words, and I
think all you've said is correct and relevant.

Would it be better without the VM_BUG_ON on page->mapping?  I find
that a bit distracting and of no interest, though I guess it's a
further way of clarifying the assumptions.  You could as well add
a VM_BUG_ON(!page_count(page)), but I don't really want that.

Thanks,
Hugh

> 
> --
> 
> With the existing SLAB_DESTROY_BY_RCU scheme for anon_vma, page_lock_anon_vma
> might take the lock of the anon_vma at a point where it has already been freed
> then re-allocated and reused for something else.
> 
> This is OK (with the exception of the now-fixed case where newly allocated
> anon_vma had its list manipulated without holding the lock), because in order
> to get to the pte, the page tables must be walked and the pte confirmed to
> point to this page anyway. So technically it should work.
> 
> The problem with it is that it is quite subtle, and it means that we have to
> keep this stale-anon_vma problem in the back of our minds, when reviewing or
> modifying any part of the anonymous rmap code. It *could* be that it would
> break some otherwise legitimate change to the code.
> 
> Add another page_mapped check to weed out these anon_vmas. Comment the
> existing page_mapped check a little bit.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c
> +++ linux-2.6/mm/rmap.c
> @@ -200,11 +200,47 @@ struct anon_vma *page_lock_anon_vma(stru
>  	anon_mapping = (unsigned long) page->mapping;
>  	if (!(anon_mapping & PAGE_MAPPING_ANON))
>  		goto out;
> +
> +	/*
> +	 * The page_mapped check is required in order to ensure anon_vma is
> +	 * protected under this RCU critical section before we touch it.
> +	 *
> +	 * If page_mapped was not checked, page->mapping may refer to an
> +	 * anon_vma that has since been freed (see the page_remove_rmap comment
> +	 * about not resetting PageAnon). Hence it would not be protected by
> +	 * RCU and could be freed and reused at any time.
> +	 */
>  	if (!page_mapped(page))
>  		goto out;
>  
>  	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
>  	spin_lock(&anon_vma->lock);
> +
> +	/*
> +	 * If the page is no longer mapped, we have no way to keep the anon_vma
> +	 * stable. It may be freed and even re-allocated for some other set of
> +	 * anonymous mappings at any point. Technically this should be OK, as
> +	 * we hold the spinlock, and should be able to tolerate finding
> +	 * unrelated vmas on our list. However we'd rather nip these in the bud
> +	 * here, for simplicity.
> +	 *
> +	 * If the page is mapped while we have the lock on the anon_vma, then
> +	 * we know anon_vma_unlink can't run and garbage collect the anon_vma:
> +	 * unmapping the page and decrementing its mapcount happens before
> +	 * unlinking the anon_vma; unlinking the anon_vma requires the
> +	 * anon_vma lock to be held. So this check ensures we have a stable
> +	 * anon_vma.
> +	 *
> +	 * Note: the page can still become unmapped, and the !page_mapped
> +	 * condition become true at any point. This check is definitely not
> +	 * preventing any such thing.
> +	 */
> +	if (unlikely(!page_mapped(page))) {
> +		spin_unlock(&anon_vma->lock);
> +		goto out;
> +	}
> +	VM_BUG_ON(anon_mapping != (unsigned long)page->mapping);
> +
>  	return anon_vma;
>  out:
>  	rcu_read_unlock();
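
For anyone following along, the check / lock / recheck ordering that the
patch adds can be sketched in userspace. This is only an illustration
under loud assumptions: every sim_* name below is hypothetical, the
spinlock is a C11 atomic_flag rather than the kernel's spinlock, RCU and
the slab allocator are not modelled at all, and race_hook is an invented
test seam standing in for a concurrent unmap that lands between the
first check and taking the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's structures. */
struct sim_anon_vma {
	atomic_flag lock;		/* stands in for anon_vma->lock */
};

struct sim_page {
	struct sim_anon_vma *mapping;	/* may be stale once mapcount is 0 */
	atomic_int mapcount;		/* > 0 while the page is mapped */
	/* Invented test seam: runs between the first check and the lock. */
	void (*race_hook)(struct sim_page *page);
};

static bool sim_page_mapped(struct sim_page *page)
{
	return atomic_load(&page->mapcount) > 0;
}

/* Models an unmap dropping the last mapping of the page. */
static void sim_unmap(struct sim_page *page)
{
	atomic_store(&page->mapcount, 0);
}

static void sim_lock(struct sim_anon_vma *av)
{
	while (atomic_flag_test_and_set_explicit(&av->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void sim_unlock(struct sim_anon_vma *av)
{
	atomic_flag_clear_explicit(&av->lock, memory_order_release);
}

/*
 * Returns the anon_vma with its lock held, or NULL with no lock held.
 * Mirrors the ordering in the patch: check mapped, lock, check again.
 */
static struct sim_anon_vma *sim_page_lock_anon_vma(struct sim_page *page)
{
	struct sim_anon_vma *av = page->mapping;

	/* First check: do not touch an anon_vma of an unmapped page. */
	if (!av || !sim_page_mapped(page))
		return NULL;

	if (page->race_hook)
		page->race_hook(page);	/* simulated concurrent unmap */

	sim_lock(av);

	/* Second check: if the page got unmapped, back out. */
	if (!sim_page_mapped(page)) {
		sim_unlock(av);
		return NULL;
	}
	return av;
}
```

With race_hook left NULL and mapcount at 1, the function returns the
anon_vma with the lock held. Pointing race_hook at sim_unmap makes the
second check fire, so the caller gets NULL and the lock is released
again. As the comment in the patch says, a successful return does not
stop the page from being unmapped afterwards; the sketch makes no such
claim either.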

