From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm: fix anon_vma races
Date: Fri, 17 Oct 2008 16:05:03 -0700 (PDT)
Message-ID: <alpine.LFD.2.00.0810171549310.3438@nehalem.linux-foundation.org>
In-Reply-To: <Pine.LNX.4.64.0810172300280.30871@blonde.site>
On Fri, 17 Oct 2008, Hugh Dickins wrote:
>
> My problem is really with the smp_read_barrier_depends() you each
> have in anon_vma_prepare(). But the only thing which its CPU
> does with the anon_vma is put its address into a struct page
> (or am I forgetting more?). Wouldn't the smp_read_barrier_depends()
> need to be, not there in anon_vma_prepare(), but over on the third
> CPU, perhaps in page_lock_anon_vma()?
I thought about it, but it's a disaster from a maintenance standpoint to
put it there, rather than make it all clear in the _one_ function that
actually does things optimistically.
I agree that it's a bit subtle the way I did it (haven't seen Nick's
patch, I assume he was upset at me for shouting at him), but that's part
of why I put that comment in there and said things are subtle.
Anyway, technically you're right: the smp_read_barrier_depends() really
would be more obvious in the place where we actually fetch that "anon_vma"
pointer again and actually dereference it.
HOWEVER:

 - there are potentially multiple places that do that, and putting it in
   the anon_vma_prepare() thing not only matches things with the
   smp_wmb(), making that whole pairing much more obvious, but it also
   means that we're guaranteed that any anon_vma user will have done the
   smp_read_barrier_depends(), since they all have to do that prepare
   thing anyway.

   So putting it there is simpler and gives better guarantees, and pairs
   up the barriers better.

 - Now, "simpler" (etc) is no help if it doesn't work, so now I have to
   convince you that it's _sufficient_ to do that "read_barrier_depends()"
   early, even if we then end up re-doing the first read and thus the
   "depends" part doesn't work any more. So "simpler" is all good, but not
   if it's incorrect.

   And I admit it, here my argument is one of implementation. The fact is,
   the only architecture where "read_barrier_depends()" exists at all as
   anything but a no-op is alpha, and there it's a full read barrier. On
   all other architectures, causality implies a read barrier anyway, so
   for them, placement (or non-placement) of the smp_read_barrier_depends()
   is a total non-issue.

   And so, since on the only architecture where it could possibly matter,
   that _depends thing turns into a full read barrier, and since
   "anon_vma" is actually stable since written, and since the only
   ordering constraint is that initial ordering of seeing the "anon_vma"
   turn non-NULL, you may as well think of that "read_barrier_depends()"
   as a full read barrier between the _original_ read of the anon_vma
   pointer and then the read of the lock data we want to protect.

   Which it is, on alpha. And that is sufficient. IOW, think of it as a
   real read_barrier(), with no dependency thing, but that only happens
   when an architecture doesn't already guarantee the causality barrier.

   And once you think of it as a "smp_rmb() for alpha", you realize that
   it's perfectly ok for it to be where it is.
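
The two barrier placements discussed above can be sketched in userspace
C11 atomics. This is only an illustration under stated assumptions, not
the kernel code: struct fake_anon_vma, published, publish(), prepare(),
later_user() and user_with_own_barrier() are all hypothetical names, the
release store stands in for smp_wmb() plus the pointer assignment, and
the acquire fence stands in for the "smp_rmb() for alpha" reading of
smp_read_barrier_depends(). It models only a reader that re-reads the
pointer on the same thread after the prepare step; it does not settle
the third-CPU case raised in the quoted text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct anon_vma: one field that must be
 * seen as initialized by anyone who sees the pointer. */
struct fake_anon_vma {
    int lock_initialized;
};

/* Hypothetical stand-in for the published vma->anon_vma pointer. */
static _Atomic(struct fake_anon_vma *) published = NULL;

/* Writer: initialize first, then publish. The release store plays the
 * role of smp_wmb() followed by the pointer assignment. */
static void publish(struct fake_anon_vma *av)
{
    assert(av != NULL);
    av->lock_initialized = 1;
    atomic_store_explicit(&published, av, memory_order_release);
}

/* Optimistic reader, in the spirit of anon_vma_prepare(): a plain read
 * of the pointer, then one full read barrier if it was non-NULL. */
static struct fake_anon_vma *prepare(void)
{
    struct fake_anon_vma *av =
        atomic_load_explicit(&published, memory_order_relaxed);
    if (av)
        atomic_thread_fence(memory_order_acquire);
    return av;
}

/* Later user on the same thread: re-reads the pointer with no barrier
 * of its own. Because the pointer is stable once written, the fence in
 * prepare() already orders this read of the pointee after the writer's
 * initialization. */
static int later_user(void)
{
    struct fake_anon_vma *av =
        atomic_load_explicit(&published, memory_order_relaxed);
    return av ? av->lock_initialized : -1;
}

/* The alternative placement: the barrier sits at the site that fetches
 * and dereferences the pointer, so every such site needs its own. */
static int user_with_own_barrier(void)
{
    struct fake_anon_vma *av =
        atomic_load_explicit(&published, memory_order_acquire);
    return av ? av->lock_initialized : -1;
}
```

In this sketch, calling prepare() before later_user() on the same thread
is what makes the barrier-free re-read safe, while
user_with_own_barrier() is safe on its own at the cost of repeating the
barrier at every dereference site.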
Anyway, lockless is bad. It would certainly be a *lot* simpler to just
take the page_table_lock around the whole thing, except I think we really
*really* don't want to do that. That thing is solidly in a couple of
*very* timing-critical routines. Doing another lock there is just not an
option.
Linus