linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm: fix anon_vma races
Date: Fri, 17 Oct 2008 16:05:03 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LFD.2.00.0810171549310.3438@nehalem.linux-foundation.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0810172300280.30871@blonde.site>


On Fri, 17 Oct 2008, Hugh Dickins wrote:
> 
> My problem is really with the smp_read_barrier_depends() you each
> have in anon_vma_prepare().  But the only thing which its CPU
> does with the anon_vma is put its address into a struct page
> (or am I forgetting more?).  Wouldn't the smp_read_barrier_depends()
> need to be, not there in anon_vma_prepare(), but over on the third
> CPU, perhaps in page_lock_anon_vma()?

I thought about it, but it's a disaster from a maintenance standpoint to 
put it there, rather than make it all clear in the _one_ function that 
actually does things optimistically.

I agree that it's a bit subtle the way I did it (haven't seen Nick's 
patch, I assume he was upset at me for shouting at him), but that's part 
of why I put that comment in there and said things are subtle.

Anyway, technically you're right: the smp_read_barrier_depends() really 
would be more obvious in the place where we actually fetch that "anon_vma" 
pointer again and actually derefernce it.

HOWEVER:

 - there are potentially multiple places that do that, and putting it in 
   the anon_vma_prepare() thing not only matches things with the 
   smp_wmb(), making that whole pairing much more obvious, but it also 
   means that we're guaranteed that any anon_vma user will have done the 
   smp_read_barrier_depends(), since they all have to do that prepare 
   thing anyway.

   So putting it there is simpler and gives better guarantees, and pairs 
   up the barriers better.

 - Now, "simpler" (etc) is no help if it doesn't work, so now I have to 
   convince you that it's _sufficient_ to do that "read_barrier_depends()" 
   early, even if we then end up re-doing the first read and thus the 
   "depends" part doesn't work any more. So "simpler" is all good, but not 
   if it's incorrect.

   And I admit it, here my argument is one of implementation. The fact is, 
   the only architecture where "read_barrier_depends()" exists at all as 
   anything but a no-op is alpha, and there it's a full read barrier. On 
   all other architectures, causality implies a read barrier anyway, so 
   for them, placement (or non-placement) of the smp_read_barrier_depends 
   is a total non-issue.

   And so, since on the only architecture where it could possibly matter, 
   that _depends thing turns into a full read barrier, and since 
   "anon_vma" is actually stable since written, and since the only 
   ordering constrain is that initial ordering of seeing the "anon_vma" 
   turn non-NULL, you may as well think of that "read_barrier_depends()" 
   as a full read barrier between the _original_ read of the anon_vma 
   pointer and then the read of the lock data we want to protect.

   Which it is, on alpha. And that is sufficient. IOW, think of it as a 
   real read_barrier(), with no dependency thing, but that only happens 
   when an architecture doesn't already guarantee the causality barrier.

   And once you think of it as a "smp_rmb() for alpha", you realize that 
   it's perfectly ok for it to be where it is.

Anyway, lockless is bad. It would certainly be a *lot* simpler to just 
take the page_table_lock around the whole thing, except I think we really 
*really* don't want to do that. That thing is solidly in a couple of 
*very* timing-critical routines. Doing another lock there is just not an 
option.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-10-17 23:05 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16  4:10 Nick Piggin
2008-10-17 22:14 ` Hugh Dickins
2008-10-17 23:05   ` Linus Torvalds [this message]
2008-10-18  0:13     ` Hugh Dickins
2008-10-18  0:25       ` Linus Torvalds
2008-10-18  1:53       ` Nick Piggin
2008-10-18  2:50         ` Paul Mackerras
2008-10-18  2:57           ` Linus Torvalds
2008-10-18  5:49           ` Nick Piggin
2008-10-18 10:49             ` Paul Mackerras
2008-10-18 17:00             ` Linus Torvalds
2008-10-18 18:44               ` Matthew Wilcox
2008-10-19  2:54                 ` Nick Piggin
2008-10-19  2:53               ` Nick Piggin
2008-10-17 23:13 ` Peter Zijlstra
2008-10-17 23:53   ` Linus Torvalds
2008-10-18  0:42     ` Linus Torvalds
2008-10-18  1:08       ` Linus Torvalds
2008-10-18  1:32         ` Nick Piggin
2008-10-18  2:11           ` Linus Torvalds
2008-10-18  2:25             ` Nick Piggin
2008-10-18  2:35               ` Nick Piggin
2008-10-18  2:53               ` Linus Torvalds
2008-10-18  5:20                 ` Nick Piggin
2008-10-18 10:38                   ` Peter Zijlstra
2008-10-19  9:52                     ` Hugh Dickins
2008-10-19 10:51                       ` Peter Zijlstra
2008-10-19 12:39                         ` Hugh Dickins
2008-10-19 18:25                         ` Linus Torvalds
2008-10-19 18:45                           ` Peter Zijlstra
2008-10-19 19:00                           ` Hugh Dickins
2008-10-20  4:03                           ` Hugh Dickins
2008-10-20 15:17                             ` Linus Torvalds
2008-10-20 18:21                               ` Hugh Dickins
2008-10-21  2:56                               ` Nick Piggin
2008-10-21  3:25                                 ` Linus Torvalds
2008-10-21  4:33                                   ` Nick Piggin
2008-10-21 12:58                                     ` Hugh Dickins
2008-10-21 15:59                                     ` Christoph Lameter
2008-10-22  9:29                                       ` Nick Piggin
2008-10-21  4:34                                   ` Nick Piggin
2008-10-21 13:55                                     ` Hugh Dickins
2008-10-21  2:44                           ` Nick Piggin
2008-10-18 19:14               ` Hugh Dickins
2008-10-19  3:03                 ` Nick Piggin
2008-10-19  7:07                   ` Hugh Dickins
2008-10-20  3:26                     ` Hugh Dickins
2008-10-21  2:45                       ` Nick Piggin
2008-10-19  1:13       ` Hugh Dickins
2008-10-19  2:41         ` Nick Piggin
2008-10-19  9:45           ` Hugh Dickins
2008-10-21  3:59             ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.0810171549310.3438@nehalem.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox