From: Hugh Dickins <hugh@veritas.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch] mm: fix race in COW logic
Date: Mon, 23 Jun 2008 11:04:31 +0100 (BST) [thread overview]
Message-ID: <Pine.LNX.4.64.0806231015140.3513@blonde.site> (raw)
In-Reply-To: <20080623014940.GA29413@wotan.suse.de>
On Mon, 23 Jun 2008, Nick Piggin wrote:
> On Sun, Jun 22, 2008 at 07:10:41PM +0100, Hugh Dickins wrote:
> > On Sun, 22 Jun 2008, Linus Torvalds wrote:
> > > On Sun, 22 Jun 2008, Hugh Dickins wrote:
> > >
> > > > One thing though, in moving the page_remove_rmap in that way, aren't
> > > > you assuming that there's an appropriate wmbarrier between the two
> > > > locations? If that is necessarily so (there's plenty happening in
> > > > between), it may deserve a comment to say just where that barrier is.
...
> I was initially thinking an smp_wmb might have been in order (excuse the pun),
> but then I rethought it and added the 2d paragraph to my comment.
First off, I've a bewildering and shameful confession and apology
to make: I made that suggestion that you add a comment without even
reading through the comment you had made! Ugh (at myself)! Sorry
for that. I've a suspicion that the longer a comment is, the more
likely my eye will skip over it...
> But I may
> still have been wrong. Let's ignore the barriers implicit in the rmap
> functions for now, and if we find they are required we can add a nice
> /* smp_wmb() for ..., provided by ...! */
Well, okay, but we're discussing a minuscule likelihood on top of the
minuscule likelihood of the race you astutely identified. So I don't
want to waste anyone's time with an academic exercise; but if any of
us learn something from it, then it may be worth it.
>
> Now. The critical memory operations AFAIKS are:
>
> dec page->mapcount
> load page->mapcount (== 1)
> store pte (RW)
> store via pte
> load via pte
> ptep_clear_flush
> store new pte
>
> Note that I don't believe the page_add_new_anon_rmap is part of the critical
> ordering. Unless that is for a different issue?
Agreed, the page_add_new_anon_rmap is irrelevant to this, it only got
mentioned when I was trying to make sense of the mail Linus retracted.
>
> Now if we move the decrement of page->mapcount to below the ptep_clear_flush,
> then our TLB shootdown protocol *should* guarantee that nothing may load via
> that pte after page->mapcount has been decremented, right?
Via that old pte, yes. But page->_mapcount is not held at a
virtual address affected by the TLB shootdown, so I don't see how the
ptep_clear_flush would give a guarantee on the visibility of the mapcount
change. Besides which, the shootdown hits the right hand CPU not the left.
I think it likely that any implementation of ptep_clear_flush would
include instructions which give that guarantee (particularly since
it has to do something inter-CPU to handle the CPU on the right);
but I don't see how ptep_clear_flush gives that guarantee itself.
>
> Now we only have pairwise barrier semantics, so if the leftmost process is
> not part of the TLB shootdown (which it is not), then it is possible that
> it may see the store to decrement the mapcount before the store to clear the
> pte. Maybe. I was hoping causality would not allow a subsequent store through
> the pte to be seen by the rightmost guy before the TLB flush. But maybe I
> was wrong?
If you're amidst the maybes, imagine the darkness I'm in!
And I'm not adding much to the argument with that remark,
so please don't feel obliged to respond.
> (at any rate, page_remove_rmap gives us smp_wmb if required, so
> the code is not technically wrong)
Originally I came at the question because it seemed to me that if
moving the page_remove_rmap down was to be fully effective, it needed
to move through a suitable barrier; it hadn't occurred to me that it
was carrying the suitable barrier with it. But if that is indeed
correct, I think it would be better to rely upon that, than resort
to more difficult arguments.
I would love to find a sure-footed way to think about memory barriers,
but I don't think I'll ever get the knack; particularly when even you
and Paul and Linus can sometimes be caught in doubt.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-06-23 10:04 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-22 15:30 Nick Piggin
2008-06-22 17:11 ` Hugh Dickins
2008-06-22 17:35 ` Linus Torvalds
2008-06-22 18:10 ` Hugh Dickins
2008-06-22 18:18 ` Linus Torvalds
2008-06-23 1:49 ` Nick Piggin
2008-06-23 10:04 ` Hugh Dickins [this message]
2008-06-23 12:18 ` Nick Piggin
2008-06-23 12:30 ` Nick Piggin
2008-06-23 15:39 ` Hugh Dickins
2008-06-27 9:19 ` Peter Zijlstra
2008-06-27 9:13 ` Peter Zijlstra
2008-06-23 1:52 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0806231015140.3513@blonde.site \
--to=hugh@veritas.com \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox