linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch] mm: fix race in COW logic
Date: Mon, 23 Jun 2008 14:18:31 +0200	[thread overview]
Message-ID: <20080623121831.GA26555@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0806231015140.3513@blonde.site>

On Mon, Jun 23, 2008 at 11:04:31AM +0100, Hugh Dickins wrote:
> On Mon, 23 Jun 2008, Nick Piggin wrote:
> > On Sun, Jun 22, 2008 at 07:10:41PM +0100, Hugh Dickins wrote:
> > > On Sun, 22 Jun 2008, Linus Torvalds wrote:
> > > > On Sun, 22 Jun 2008, Hugh Dickins wrote:
> > > > 
> > > > > One thing though, in moving the page_remove_rmap in that way, aren't
> > > > > you assuming that there's an appropriate wmbarrier between the two
> > > > > locations?  If that is necessarily so (there's plenty happening in
> > > > > between), it may deserve a comment to say just where that barrier is.
> ...
> > I was initially thinking an smp_wmb might have been in order (excuse the pun),
> > but then I rethought it and added the 2d paragraph to my comment.
> 
> First off, I've a bewildering and shameful confession and apology
> to make: I made that suggestion that you add a comment without even
> reading through the comment you had made!  Ugh (at myself)!  Sorry
> for that.  I've a suspicion that the longer a comment is, the more
> likely my eye will skip over it...

No need for that :) You miss less than most, I think...

 
> > But I may
> > still have been wrong. Let's ignore the barriers implicit in the rmap
> > functions for now, and if we find they are required we can add a nice
> > /* smp_wmb() for ..., provided by ...! */
> 
> Well, okay, but we're discussing a minuscule likelihood on top of the
> minuscule likelihood of the race you astutely identified.  So I don't
> want to waste anyone's time with an academic exercise; but if any of
> us learn something from it, then it may be worth it.

No, I don't think it's a waste of time at all. Although we've already
established the atomic operation in the rmap operation should be
enough, understanding and properly commenting this will I think help
maintain this and other code too.

BTW. Yes, the race identified is pretty slim, requiring a lot to happen
on other CPUs between a small non-preemptible window of instructions,
it *might* actually happen in practice on larger systems where cachelines
might be highly contended or take a long time to get from one place to
another... and the -rt patchset where I assume ptl is preemptible too.


> > Now. The critical memory operations AFAIKS are:
> > 
> > 			dec page->mapcount
> > load page->mapcount (== 1)
> > store pte (RW)
> > store via pte
> > 							load via pte
> > 			ptep_clear_flush
> > 			store new pte
> > 
> > Note that I don't believe the page_add_new_anon_rmap is part of the critical
> > ordering. Unless that is for a different issue?
> 
> Agreed, the page_add_new_anon_rmap is irrelevant to this, it only got
> mentioned when I was trying to make sense of the mail Linus retracted.

OK, good.


> > Now if we move the decrement of page->mapcount to below the ptep_clear_flush,
> > then our TLB shootdown protocol *should* guarantee that nothing may load via
> > that pte after page->mapcount has been decremented, right?
> 
> Via that old pte, yes.  But page->_mapcount is not held at a
> virtual address affected by the TLB shootdown, so I don't see how the
> ptep_clear_flush would give a guarantee on the visibility of the mapcount
> change.  Besides which, the shootdown hits the right hand CPU not the left.
 
Right. Yes, I am just assuming that the TLB shootdown guarantees this,
and it may even be true for all implementations, but you're probably
right that it is wrong to rely on it...


> I think it likely that any implementation of ptep_clear_flush would
> include instructions which give that guarantee (particularly since
> it has to do something inter-CPU to handle the CPU on the right);
> but I don't see how ptep_clear_flush gives that guarantee itself.
> 
> > 
> > Now we only have pairwise barrier semantics, so if the leftmost process is
> > not part of the TLB shootdown (which it is not), then it is possible that
> > it may see the store to decrement the mapcount before the store to clear the
> > pte. Maybe. I was hoping causality would not allow a subsequent store through
> > the pte to be seen by the rightmost guy before the TLB flush. But maybe I
> > was wrong?
> 
> If you're amidst the maybes, imagine the darkness I'm in!
> And I'm not adding much to the argument with that remark,
> so please don't feel obliged to respond.
> 
> > (at any rate, page_remove_rmap gives us smp_wmb if required, so
> > the code is not technically wrong)
> 
> Originally I came at the question because it seemed to me that if
> moving the page_remove_rmap down was to be fully effective, it needed
> to move through a suitable barrier; it hadn't occurred to me that it
> was carrying the suitable barrier with it.  But if that is indeed
> correct, I think it would be better to rely upon that, than resort
> to more difficult arguments.

No I actually think you make a good point, and I'll resubmit the
patch with a replacement comment to say we've got the ordering
covered if nothing else then by the atomic op in rmap.
 

> I would love to find a sure-footed way to think about memory barriers,
> but I don't think I'll ever get the knack; particularly when even you
> and Paul and Linus can sometimes be caught in doubt.

Actually it can get pretty hard if you try to handwave about some
abstract higher level operation like "TLB flushing" (like I was).
It's a good idea to first distil out all the critical load and
store instructions before trying to reason with ordering issues.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-06-23 12:18 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-22 15:30 Nick Piggin
2008-06-22 17:11 ` Hugh Dickins
2008-06-22 17:35   ` Linus Torvalds
2008-06-22 18:10     ` Hugh Dickins
2008-06-22 18:18       ` Linus Torvalds
2008-06-23  1:49       ` Nick Piggin
2008-06-23 10:04         ` Hugh Dickins
2008-06-23 12:18           ` Nick Piggin [this message]
2008-06-23 12:30             ` Nick Piggin
2008-06-23 15:39               ` Hugh Dickins
2008-06-27  9:19               ` Peter Zijlstra
2008-06-27  9:13             ` Peter Zijlstra
2008-06-23  1:52     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080623121831.GA26555@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox