linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Linus Torvalds <torvalds@linux-foundation.org>, stable@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Hugh Dickins <hugh@veritas.com>, Nick Piggin <npiggin@suse.de>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [patch] mm: fix anon_vma races
Date: Tue, 21 Oct 2008 13:44:45 +1100	[thread overview]
Message-ID: <200810211344.45410.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <alpine.LFD.2.00.0810191105090.4386@nehalem.linux-foundation.org>

On Monday 20 October 2008 05:25, Linus Torvalds wrote:
> On Sun, 19 Oct 2008, Peter Zijlstra wrote:
> > Part of the confusion is that we don't clear those pointers at the end
> > of their lifetimes (page_remove_rmap and anon_vma_unlink).
> >
> > I guess the !page_mapping() test in page_lock_anon_vma() is meant to
> > deal with this
>
> Hmm. So that part I'm still not entirely convinced about.
>
> The thing is, we have two issues on anon_vma usage, and the
> page_lock_anon_vma() usage in particular:
>
>  - the integrity of the list itself
>
>    Here it should be sufficient to just always get the lock, to the point
>    where we don't need to care about anything else. So getting the lock
>    properly on new allocation makes all the other races irrelevant.
>
>  - the integrity of the _result_ of traversing the list
>
>    This is what the !page_mapping() thing is supposedly protecting
>    against, I think.
>
>    But as far as I can tell, there's really two different use cases here:
>    (a) people who care deeply about the result and (b) people who don't.
>
>    And the difference between the two cases is whether they had the page
>    locked or not. The "try_to_unmap()" callers care deeply, and lock the
>    page. In contrast, some "page_referenced()" callers (really just
>    shrink_active_list) don't care deeply, and to them the return value is
>    really just a heuristic.
>
> As far as I can tell, all the people who care deeply will lock the page
> (and _have_ to lock the page), and thus 'page->mapping' should be stable
> for those cases.
>
> And then we have the other cases, who just want a heuristic, and they
> don't hold the page lock, but if we look at the wrong active_vma that has
> gotten reallocated to something else, they don't even really care.
>
> So I'm not seeing the reason for that check for page_mapped() at the end.
> Does it actually protect against anything relevant?
>
> Anyway, I _think_ the part that everybody agrees about is the initial
> locking of the anon_vma. Whether we then even need any memory barriers
> and/or the page_mapped() check is an independent question. Yes? No?
>
> So I'm suggesting this commit as the part we at least all agree on. But I
> haven't pushed it out yet, so you can still holler.. But I think all the
> discussion is about other issues, and we all agree on at least this part?
>
> 		Linus
>
> ---
> From f422f2ec50872331820f15711f48b9ffc9cbb64e Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Sun, 19 Oct 2008 10:32:20 -0700
> Subject: [PATCH] anon_vma_prepare: properly lock even newly allocated
> entries
>
> The anon_vma code is very subtle, and we end up doing optimistic lookups
> of anon_vmas under RCU in page_lock_anon_vma() with no locking.  Other
> CPU's can also see the newly allocated entry immediately after we've
> exposed it by setting "vma->anon_vma" to the new value.
>
> We protect against the anon_vma being destroyed by having the SLAB
> marked as SLAB_DESTROY_BY_RCU, so the RCU lookup can depend on the
> allocation not being destroyed - but it might still be free'd and
> re-allocated here to a new vma.
>
> As a result, we should not do the anon_vma list ops on a newly allocated
> vma without proper locking.

Thanks, this is exactly what I proposed. I preferred to add more
explicit comments about why it's OK to expose anon_vma to vma->anon_vma
without ordering initialisation etc.

But those comments were mainly for the benefit of others. If you and
Hugh now agree with me on that, then maybe no comments are required.

>
> Acked-by: Nick Piggin <npiggin@suse.de>
> Acked-by: Hugh Dickins <hugh@veritas.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  mm/rmap.c |   42 ++++++++++++++++++++++++++++++++----------
>  1 files changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0383acf..e8d639b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -55,7 +55,33 @@
>
>  struct kmem_cache *anon_vma_cachep;
>
> -/* This must be called under the mmap_sem. */
> +/**
> + * anon_vma_prepare - attach an anon_vma to a memory region
> + * @vma: the memory region in question
> + *
> + * This makes sure the memory mapping described by 'vma' has
> + * an 'anon_vma' attached to it, so that we can associate the
> + * anonymous pages mapped into it with that anon_vma.
> + *
> + * The common case will be that we already have one, but if
> + * if not we either need to find an adjacent mapping that we
> + * can re-use the anon_vma from (very common when the only
> + * reason for splitting a vma has been mprotect()), or we
> + * allocate a new one.
> + *
> + * Anon-vma allocations are very subtle, because we may have
> + * optimistically looked up an anon_vma in page_lock_anon_vma()
> + * and that may actually touch the spinlock even in the newly
> + * allocated vma (it depends on RCU to make sure that the
> + * anon_vma isn't actually destroyed).
> + *
> + * As a result, we need to do proper anon_vma locking even
> + * for the new allocation. At the same time, we do not want
> + * to do any locking for the common case of already having
> + * an anon_vma.
> + *
> + * This must be called with the mmap_sem held for reading.
> + */
>  int anon_vma_prepare(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
> @@ -63,20 +89,17 @@ int anon_vma_prepare(struct vm_area_struct *vma)
>  	might_sleep();
>  	if (unlikely(!anon_vma)) {
>  		struct mm_struct *mm = vma->vm_mm;
> -		struct anon_vma *allocated, *locked;
> +		struct anon_vma *allocated;
>
>  		anon_vma = find_mergeable_anon_vma(vma);
> -		if (anon_vma) {
> -			allocated = NULL;
> -			locked = anon_vma;
> -			spin_lock(&locked->lock);
> -		} else {
> +		allocated = NULL;
> +		if (!anon_vma) {
>  			anon_vma = anon_vma_alloc();
>  			if (unlikely(!anon_vma))
>  				return -ENOMEM;
>  			allocated = anon_vma;
> -			locked = NULL;
>  		}
> +		spin_lock(&anon_vma->lock);
>
>  		/* page_table_lock to protect against threads */
>  		spin_lock(&mm->page_table_lock);
> @@ -87,8 +110,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
>  		}
>  		spin_unlock(&mm->page_table_lock);
>
> -		if (locked)
> -			spin_unlock(&locked->lock);
> +		spin_unlock(&anon_vma->lock);
>  		if (unlikely(allocated))
>  			anon_vma_free(allocated);
>  	}
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2008-10-21  2:44 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16  4:10 Nick Piggin
2008-10-17 22:14 ` Hugh Dickins
2008-10-17 23:05   ` Linus Torvalds
2008-10-18  0:13     ` Hugh Dickins
2008-10-18  0:25       ` Linus Torvalds
2008-10-18  1:53       ` Nick Piggin
2008-10-18  2:50         ` Paul Mackerras
2008-10-18  2:57           ` Linus Torvalds
2008-10-18  5:49           ` Nick Piggin
2008-10-18 10:49             ` Paul Mackerras
2008-10-18 17:00             ` Linus Torvalds
2008-10-18 18:44               ` Matthew Wilcox
2008-10-19  2:54                 ` Nick Piggin
2008-10-19  2:53               ` Nick Piggin
2008-10-17 23:13 ` Peter Zijlstra
2008-10-17 23:53   ` Linus Torvalds
2008-10-18  0:42     ` Linus Torvalds
2008-10-18  1:08       ` Linus Torvalds
2008-10-18  1:32         ` Nick Piggin
2008-10-18  2:11           ` Linus Torvalds
2008-10-18  2:25             ` Nick Piggin
2008-10-18  2:35               ` Nick Piggin
2008-10-18  2:53               ` Linus Torvalds
2008-10-18  5:20                 ` Nick Piggin
2008-10-18 10:38                   ` Peter Zijlstra
2008-10-19  9:52                     ` Hugh Dickins
2008-10-19 10:51                       ` Peter Zijlstra
2008-10-19 12:39                         ` Hugh Dickins
2008-10-19 18:25                         ` Linus Torvalds
2008-10-19 18:45                           ` Peter Zijlstra
2008-10-19 19:00                           ` Hugh Dickins
2008-10-20  4:03                           ` Hugh Dickins
2008-10-20 15:17                             ` Linus Torvalds
2008-10-20 18:21                               ` Hugh Dickins
2008-10-21  2:56                               ` Nick Piggin
2008-10-21  3:25                                 ` Linus Torvalds
2008-10-21  4:33                                   ` Nick Piggin
2008-10-21 12:58                                     ` Hugh Dickins
2008-10-21 15:59                                     ` Christoph Lameter
2008-10-22  9:29                                       ` Nick Piggin
2008-10-21  4:34                                   ` Nick Piggin
2008-10-21 13:55                                     ` Hugh Dickins
2008-10-21  2:44                           ` Nick Piggin [this message]
2008-10-18 19:14               ` Hugh Dickins
2008-10-19  3:03                 ` Nick Piggin
2008-10-19  7:07                   ` Hugh Dickins
2008-10-20  3:26                     ` Hugh Dickins
2008-10-21  2:45                       ` Nick Piggin
2008-10-19  1:13       ` Hugh Dickins
2008-10-19  2:41         ` Nick Piggin
2008-10-19  9:45           ` Hugh Dickins
2008-10-21  3:59             ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200810211344.45410.nickpiggin@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=a.p.zijlstra@chello.nl \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox