From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Hugh Dickins <hugh@veritas.com>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [patch] mm: fix anon_vma races
Date: Sat, 18 Oct 2008 07:20:46 +0200
Message-ID: <20081018052046.GA26472@wotan.suse.de>
In-Reply-To: <alpine.LFD.2.00.0810171949010.3438@nehalem.linux-foundation.org>
On Fri, Oct 17, 2008 at 07:53:49PM -0700, Linus Torvalds wrote:
>
>
> On Sat, 18 Oct 2008, Nick Piggin wrote:
> > @@ -171,6 +181,10 @@ static struct anon_vma *page_lock_anon_v
> >
> > anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> > spin_lock(&anon_vma->lock);
> > +
> > + if (anon_mapping != (unsigned long)page->mapping)
> > + goto out;
> > +
> > return anon_vma;
> > out:
> > rcu_read_unlock();
>
> I see why you'd like to try to do this, but look a bit closer, and you'll
> realize that this is *really* wrong.
>
> So there's the brown-paper-bag-reason why it's wrong: you need to unlock
> in this case,
Check.
> but there's a subtler reason why I doubt the whole approach
> works: I don't think we actually hold the anon_vma lock when we set
> page->mapping.
No, we don't, but I think that's OK because we do an atomic assignment
to page->mapping. I can't see any bugs there (the change to always take
the anon_vma lock when inserting a new anon_vma into vma->anon_vma
should ensure the vma->anon_vma assignment happens after the busy lock
is visible, and the fact that anyone looking into the anon_vma should
hold the lock takes care of everything else).
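
To make the "atomic assignment" point concrete, here is a minimal userspace
sketch of a single-word store and load of a tagged mapping value, with the
reader decoding it the same way the quoted hunk does (subtracting the ANON
tag). All model_* names and MODEL_MAPPING_ANON are made up for illustration
and are not the kernel's definitions. Relaxed ordering is this sketch's
choice, on the assumption stated above that anon_vma->lock provides the
ordering for the anon_vma fields themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-bit tag, modelled on PAGE_MAPPING_ANON. */
#define MODEL_MAPPING_ANON ((uintptr_t)1)

struct model_anon_vma {
	int dummy;			/* contents are irrelevant here */
};

struct model_page {
	_Atomic uintptr_t mapping;	/* stands in for page->mapping */
};

/* Writer: one word-sized store, so a reader never sees a torn value. */
static void model_set_anon(struct model_page *page, struct model_anon_vma *av)
{
	uintptr_t raw = (uintptr_t)av;

	/* The tag lives in the low bit, so the pointer must be aligned. */
	assert((raw & MODEL_MAPPING_ANON) == 0);
	atomic_store_explicit(&page->mapping, raw + MODEL_MAPPING_ANON,
			      memory_order_relaxed);
}

/* Reader: load the word once, then test and strip the tag on that copy. */
static struct model_anon_vma *model_get_anon(struct model_page *page)
{
	uintptr_t raw = atomic_load_explicit(&page->mapping,
					     memory_order_relaxed);

	if (!(raw & MODEL_MAPPING_ANON))
		return NULL;		/* not an anonymous mapping */
	return (struct model_anon_vma *)(raw - MODEL_MAPPING_ANON);
}
```

The reader works on its private copy of the word, so a concurrent store can
change which anon_vma it gets but cannot hand it a half-written pointer.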
> So I don't think you really fixed the race that you want to fix, and I
> don't think that does what you wanted to do.
>
> But I might have missed something.
No, I think this race is different. It's because it is "hard" to get a
reference on anon_vma from the lru->page path, because unmapping a vma
doesn't take any of the same locks (in particular it doesn't take the
page lock, which would be the big hammer solution).
So we can have a thread in reclaim that has a locked page, and is just
about to call page_lock_anon_vma.
At this point, another thread might unmap the whole vma. If this is
the last vma in the anon_vma, then it garbage collects it in anon_vma_unlink.
page->mapping does not get NULLed out or anything.
So the first thread picks up the anon_vma under RCU, sees page_mapped is
still true (let's say this part runs just before the unmapper decrements
the last ->mapcount, then the anon_vma gets garbage collected).
Then we take the anon_vma lock. Still OK because we are under SLAB_DESTROY_BY_RCU.
Then we return the anon_vma and start using it. But when we took the anon_vma
lock, we don't actually know that the anon_vma hasn't been re-allocated and used
for something else entirely.
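
The interleaving above can be replayed deterministically in a single-threaded
userspace model. This is only a sketch under assumptions: the one-slot
"slab", the model_* names and the owner_id field are invented for
illustration, and the real code uses RCU and spinlocks rather than plain
flags. It shows that the stale pointer stays valid memory, yet after re-use
it describes somebody else's mappings unless the mapcount is re-checked once
the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's structures. */
struct model_anon_vma {
	bool locked;	/* stands in for anon_vma->lock */
	int owner_id;	/* which set of mappings uses this slot right now */
};

struct model_page {
	struct model_anon_vma *mapping;	/* not cleared on unmap */
	int mapcount;
};

/* One-slot "slab": a freed object stays valid memory and can be re-used. */
static struct model_anon_vma slot;

static struct model_anon_vma *model_alloc(int owner_id)
{
	slot.owner_id = owner_id;	/* lock state is left alone on re-use */
	return &slot;
}

/*
 * Replays the race. Returns the owner_id whose vmas the reclaim side
 * would walk, or -1 if it backs out.
 */
static int replay_race(bool recheck_mapped)
{
	struct model_anon_vma *av = model_alloc(1);
	struct model_page page = { .mapping = av, .mapcount = 1 };
	struct model_anon_vma *seen;
	int walked;

	/* Reclaim: picks up the anon_vma, page still looks mapped. */
	seen = page.mapping;
	if (page.mapcount == 0)
		return -1;

	/* Unmapper: drops the last mapcount and frees the anon_vma. */
	page.mapcount = 0;

	/* Unrelated task: gets the same slot for different mappings. */
	model_alloc(2);

	/* Reclaim: takes the lock; the memory is still a valid object. */
	assert(!seen->locked);
	seen->locked = true;

	if (recheck_mapped && page.mapcount == 0) {
		seen->locked = false;	/* unlock before backing out */
		return -1;
	}

	walked = seen->owner_id;
	seen->locked = false;
	return walked;
}
```

Without the re-check the model ends up walking owner 2's list; with it, the
reclaim side unlocks and backs out.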
Taking the anon_vma->lock in anon_vma_prepare of a new anon_vma closes the
obvious list corruption problems that could occur if we tried to walk it
at the same time a new vma was being put on there. But AFAIKS, even then we
have a problem where we might be trying to walk over completely the wrong
vmas now.
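
As a rough illustration of the list-corruption half of this, here is a
userspace sketch where both the linker and the walker take the same
per-anon_vma lock, and the vma's anon_vma pointer is stored only after the
list is consistent. The model_* names, the singly linked list and the
atomic_flag spinlock are stand-ins of my own, and page_table_lock is left
out entirely, so treat it as a sketch rather than the real locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_vma;

struct model_anon_vma {
	atomic_flag lock;		/* stands in for anon_vma->lock */
	struct model_vma *head, *tail;	/* simplified singly linked list */
};

struct model_vma {
	struct model_anon_vma *anon_vma;
	struct model_vma *next;
};

static void model_lock(struct model_anon_vma *av)
{
	while (atomic_flag_test_and_set_explicit(&av->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void model_unlock(struct model_anon_vma *av)
{
	atomic_flag_clear_explicit(&av->lock, memory_order_release);
}

/* Link the vma and publish vma->anon_vma, all under the lock. */
static void model_prepare(struct model_vma *vma, struct model_anon_vma *av)
{
	assert(vma != NULL && av != NULL);

	model_lock(av);
	if (!vma->anon_vma) {
		vma->next = NULL;
		if (av->tail)
			av->tail->next = vma;
		else
			av->head = vma;
		av->tail = vma;
		vma->anon_vma = av;	/* stored after the list is linked */
	}
	model_unlock(av);	/* must be an unlock, or the next user spins */
}

/* A walker takes the same lock, so it cannot see a half-linked list. */
static int model_walk(struct model_anon_vma *av)
{
	int n = 0;

	model_lock(av);
	for (struct model_vma *v = av->head; v; v = v->next)
		n++;
	model_unlock(av);
	return n;
}
```

This only covers the consistency of the list itself. It does nothing about
the re-use problem, where the walker holds a perfectly consistent list that
belongs to the wrong mappings.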
Slight improvement attached.
---
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -63,32 +63,42 @@ int anon_vma_prepare(struct vm_area_stru
might_sleep();
if (unlikely(!anon_vma)) {
struct mm_struct *mm = vma->vm_mm;
- struct anon_vma *allocated, *locked;
+ struct anon_vma *allocated;
anon_vma = find_mergeable_anon_vma(vma);
if (anon_vma) {
allocated = NULL;
- locked = anon_vma;
- spin_lock(&locked->lock);
} else {
anon_vma = anon_vma_alloc();
if (unlikely(!anon_vma))
return -ENOMEM;
allocated = anon_vma;
- locked = NULL;
}
+ /*
+ * The lock is required even for new anon_vmas, because as
+ * soon as we store vma->anon_vma = anon_vma, then the
+ * anon_vma becomes visible via the vma. This means another
+ * CPU can find the anon_vma, then store it into the struct
+ * page with page_add_anon_rmap. At this point, anon_vma can
+ * be loaded from the page with page_lock_anon_vma.
+ *
+ * So long as the anon_vma->lock is taken before looking at
+ * any fields in the anon_vma, the lock should take care of
+ * races and memory ordering issues WRT anon_vma fields.
+ */
+ spin_lock(&anon_vma->lock);
+
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
- vma->anon_vma = anon_vma;
list_add_tail(&vma->anon_vma_node, &anon_vma->head);
+ vma->anon_vma = anon_vma;
allocated = NULL;
}
spin_unlock(&mm->page_table_lock);
+ spin_unlock(&anon_vma->lock);
- if (locked)
- spin_unlock(&locked->lock);
if (unlikely(allocated))
anon_vma_free(allocated);
}
@@ -171,6 +181,21 @@ static struct anon_vma *page_lock_anon_v
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
spin_lock(&anon_vma->lock);
+
+ /*
+ * If the page is no longer mapped, we have no way to keep the
+ * anon_vma stable. It may be freed and even re-allocated for some
+ * other set of anonymous mappings at any point. If the page is
+ * mapped while we have the lock on the anon_vma, then we know
+ * anon_vma_unlink can't run and garbage collect the anon_vma
+ * (because unmapping the page happens before unlinking the anon_vma).
+ */
+ if (unlikely(!page_mapped(page))) {
+ spin_unlock(&anon_vma->lock);
+ goto out;
+ }
+ BUG_ON((unsigned long)page->mapping != anon_mapping);
+
return anon_vma;
out:
rcu_read_unlock();