From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>, Hugh Dickins <hugh@veritas.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [rfc] SLOB memory ordering issue
Date: Fri, 17 Oct 2008 13:29:04 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.2.00.0810171235310.3438@nehalem.linux-foundation.org> (raw)
In-Reply-To: <200810160506.14261.nickpiggin@yahoo.com.au>
Ok, finally looked at this.
There is indeed a locking bug there. "anon_vma_prepare()" optimistically
looks at vma->anon_vma without taking the &mm->page_table_lock. That's not
right.
Of course, we could just take the lock, but in this case it's probably ok
to just admit that we have a lockless algorithm. But that implies that we
need to do the right memory ordering.
And that, in turn, doesn't just imply a "smp_wmb()" - if you do memory
ordering, you need to do it on *both* sides, so now the other side needs
to also do a matching smp_rmb(). Or, in this case, smp_rmb_depends(), I
guess.
That, btw, is an important part of memory ordering. You can never do
ordering on just one side. A "smp_wmb()" on its own is always nonsensical.
It always needs to be paired with a "smp_rmb()" variant.
Something like the appended may fix it.
But I do think we have a potential independent issue with the new page
table lookup code now that it's lock-free. We have the smp_rmb() calls in
gup_get_pte() (at least on x86), when we look things up, but we don't
actually have a lot of smp_wmb()'s there when we insert the page.
For the anonymous page case, we end up doing a
page_add_new_anon_rmap();
before we do the set_pte_at() that actually exposes it, and that does the
whole
page->mapping = (struct address_space *) anon_vma;
page->index = linear_page_index(vma, address);
thing, but there is no write barrier between those and the actual write to
the page tables, so when GUP looks up the page, it can't actually depend
on page->mappign or anything else!
Now, this really isn't an issue on x86, since smp_wmb() is a no-op, and
the compiler won't be re-ordering the writes, but in general I do think
that now that we do lockless lookup of pages from the page tables, we
probably do need smp_wmb()'s there just in front of the "set_pte_at()"
calls.
NOTE NOTE! The patch below is only about "page->anon_vma", not about the
GUP lookup and page->mapping/index fields. That's an independent issue.
And notice? This has _nothing_ to do with constructors or allocators.
And of course - this patch is totally untested, and may well need some
thinking about.
Linus
---
mm/rmap.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 0383acf..21d09bb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -81,6 +81,13 @@ int anon_vma_prepare(struct vm_area_struct *vma)
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
+ /*
+ * We hold the mm->page_table_lock, but another
+ * CPU may be doing an optimistic load (the one
+ * at the top), and we want to make sure that
+ * the anon_vma changes are visible.
+ */
+ smp_wmb();
vma->anon_vma = anon_vma;
list_add_tail(&vma->anon_vma_node, &anon_vma->head);
allocated = NULL;
@@ -92,6 +99,17 @@ int anon_vma_prepare(struct vm_area_struct *vma)
if (unlikely(allocated))
anon_vma_free(allocated);
}
+ /*
+ * Subtle: we looked up anon_vma without any locking
+ * (in the comon case), and are going to look at the
+ * spinlock etc behind it. In order to know that it's
+ * initialized, we need to do a read barrier here.
+ *
+ * We can use the cheaper "depends" version, since we
+ * are following a pointer, and only on alpha may that
+ * give a stale value.
+ */
+ smp_read_barrier_depends();
return 0;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2008-10-17 20:29 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-15 16:34 Nick Piggin
2008-10-15 16:46 ` Nick Piggin
2008-10-15 16:54 ` Matt Mackall
2008-10-15 17:10 ` Nick Piggin
2008-10-15 17:33 ` Linus Torvalds
2008-10-15 17:36 ` Linus Torvalds
2008-10-15 17:58 ` Matt Mackall
2008-10-15 17:45 ` Nick Piggin
2008-10-15 18:03 ` Linus Torvalds
2008-10-15 18:12 ` Nick Piggin
2008-10-15 18:19 ` Matt Mackall
2008-10-15 18:35 ` Nick Piggin
2008-10-15 18:43 ` Linus Torvalds
2008-10-15 19:19 ` Nick Piggin
2008-10-15 19:47 ` Linus Torvalds
2008-10-15 18:29 ` Linus Torvalds
2008-10-15 18:06 ` Nick Piggin
2008-10-15 18:26 ` Linus Torvalds
2008-10-15 18:50 ` Nick Piggin
2008-10-17 20:29 ` Linus Torvalds [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.00.0810171235310.3438@nehalem.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mpm@selenic.com \
--cc=nickpiggin@yahoo.com.au \
--cc=penberg@cs.helsinki.fi \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox