From: Linus Torvalds <torvalds@linux-foundation.org>
To: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Will Deacon <will.deacon@arm.com>, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Waiman Long <waiman.long@hp.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Alex Shi <alex.shi@linaro.org>, Andi Kleen <andi@firstfloor.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	Matthew R Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	Peter Hurley <peter@hurleysoftware.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	George Spelvin <linux@horizon.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Arnd Bergmann <arnd@arndb.de>,
	Aswin Chandramouleeswaran <aswin@hp.com>,
	Scott J Norton <scott.norton@hp.com>,
	"Figo.zhang" <figo1802@gmail.com>
Subject: Re: [PATCH v6 4/5] MCS Lock: Barrier corrections
Date: Sat, 23 Nov 2013 12:21:13 -0800
Message-ID: <CA+55aFxQy8afgf6geqJOEHmsJ=ME-6CXrrPfj=aggH7u_jEEZA@mail.gmail.com>
In-Reply-To: <20131123013654.GG4138@linux.vnet.ibm.com>

On Fri, Nov 22, 2013 at 5:36 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> But this does -not- guarantee that some other non-lock-holding CPU 2 will
> see CS0 and CS1 in order.  To see this, let's fill in the two critical
> sections, where variables X and Y are both initially zero:
>
>         CPU 0 (releasing)       CPU 1 (acquiring)
>         -----                   -----
>         ACCESS_ONCE(X) = 1;     while (ACCESS_ONCE(lock) == 1)
>         lwsync                          continue;
>         ACCESS_ONCE(lock) = 0;  lwsync
>                                 r1 = ACCESS_ONCE(Y);
>
> Then let's add in the observer CPU 2:
>
>         CPU 2
>         -----
>         ACCESS_ONCE(Y) = 1;
>         sync
>         r2 = ACCESS_ONCE(X);
>
> If unlock+lock act as a full memory barrier, it would be impossible to
> end up with (r1 == 0 && r2 == 0).  After all, (r1 == 0) implies that
> CPU 2's store to Y happened after CPU 1's load from Y, and (r2 == 0)
> implies that CPU 0's store to X happened after CPU 2's load from X.
> If CPU 0's unlock combined with CPU 1's lock really acted like a full
> memory barrier, we would end up with CPU 0's store to X happening
> before CPU 1's load from Y, which happens before CPU 2's store to Y,
> which happens before CPU 2's load from X, which happens before CPU 0's
> store to X, which is a cycle.
>
> However, the outcome (r1 == 0 && r2 == 0) really does happen both
> in theory and on real hardware.

Ok, so I have ruminated.

But even after having ruminated, the one thing I cannot find is
support for your "(r1 == 0 && r2 == 0) really does happen on
hardware".

Ignore theory, and assume just cache coherency, ie the notion that in
order to write to a cacheline, you have to have that cacheline in some
exclusive state. We have three cachelines, X, Y and lock (and our
cachelines only have a single bit, starting out as 0,0,1
respectively).

CPU0:
   write X = 1;
   lwsync
   write lock = 0;

doesn't even really require any cache accesses at all per se, but it
*does* require that the two stores be ordered in the store buffer on
CPU0 in such a way that cacheline X gets updated (ie is in
exclusive/dirty state in CPU0 with the value 1) before cacheline
'lock' gets released from its exclusive/dirty state after having
itself been updated to 0.

So basically we know that *within* CPU0, by the time the lock actually
makes it out of the CPU, the cacheline containing X will have been in
dirty mode with the value "1". The core might actually have written
'lock' first, but it can't release that cacheline from exclusive state
(so that it is visible anywhere else) until it has _also_ gotten 'X'
into exclusive state (once both cachelines are exclusive within CPU0,
it can do any ordering, because the ordering won't be externally
visible).

And this is all just looking at CPU0, nothing else. But if it is
exclusive/dirty on CPU0, then it cannot be shared in any other CPU
(although a previous stale value may obviously still be "in flight"
somewhere else outside of the coherency domain).

So let's look at CPU2, which is similar, but now the second access is
a read (of zero), not a write:

CPU2:
   write Y = 1;
   sync
   read X as zero

So if 'sync' is a true memory barrier between the write and the read,
then we know that the following is true: CPU2 must have gotten
cacheline 'Y' into exclusive state and acquired (or held on to, which
is equivalent) cacheline 'X' in shared state _after_ it got that Y
into exclusive state. It can't rely on some "in flight" previously
read value of 'X' until after it got Y into exclusive state. Otherwise
it wouldn't be an ordering between the write and the read, agreed?

The pattern on CPU1, meanwhile, is somewhat different. But I'm going
to ignore the "while" part, and just concentrate on the last iteration
of the loop, and it turns into:

CPU1:
   read lock as zero
   lwsync
   read Y as zero

It only does reads, so it is happy with a shared cacheline, but in
order for the lwsync to actually act as an acquire, it does mean that
the cacheline 'Y' needs to be in some shared state within the cache
coherency domain after (or, again, across) cacheline 'lock' having
been in a shared state with value == 0 on CPU1. No "in flight" values.

Agreed?

So the above isn't really about lwsync/sync/memory ordering any more;
it is basically a rewrite purely in terms of cacheline states as seen
by individual processors. And I know the POWER cache coherency
is really complex (iirc cachelines can have 11+ states - we're not
talking about MOESI any more), but even there you have to have a
notion of "exclusive access" to be able to write in the end. So my
states are certainly simplified, but I don't see how that basic rule
can be violated (and still be called cache coherent).
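
To make that one rule concrete, here is a toy model of it in C. The
states and functions are made up for illustration (real POWER
coherence has many more states and is distributed, not a single
array), but the invariant being leaned on is just this:

#include <assert.h>

enum state { INVALID, SHARED, EXCLUSIVE };

#define NCPUS 3

struct line {
        enum state st[NCPUS];
        int val;
};

/* To write, a CPU must first make every other copy invalid. */
static void acquire_exclusive(struct line *l, int cpu)
{
        for (int i = 0; i < NCPUS; i++)
                l->st[i] = (i == cpu) ? EXCLUSIVE : INVALID;
}

static void write_line(struct line *l, int cpu, int v)
{
        assert(l->st[cpu] == EXCLUSIVE);  /* no write without exclusivity */
        l->val = v;
}

/* A read demotes any exclusive holder to shared and gives the reader a
 * shared copy: from then on everyone sees the same value for the line. */
static int read_line(struct line *l, int cpu)
{
        for (int i = 0; i < NCPUS; i++)
                if (l->st[i] == EXCLUSIVE)
                        l->st[i] = SHARED;
        if (l->st[cpu] == INVALID)
                l->st[cpu] = SHARED;
        return l->val;
}

int main(void)
{
        struct line x = { { INVALID, INVALID, INVALID }, 0 };

        acquire_exclusive(&x, 0);       /* CPU0 gets the line exclusive... */
        write_line(&x, 0, 1);           /* ...and only then may write 1    */
        return read_line(&x, 2) != 1;   /* any later read must see the 1   */
}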

And let's just look at the individual "events" on these CPU's:

 - A = CPU0 acquires exclusive access to cacheline X (in order to
write 1 into it)
 - B = CPU0 releases its exclusive access to cacheline lock (after
having written 0 into it)
 - C = CPU1 reads shared cacheline lock as being zero
 - D = CPU1 reads shared cacheline Y as being zero
 - E = CPU2 acquires exclusive access to cacheline Y (in order to
write 1 into it)
 - F = CPU2 reads shared cacheline X as being zero

and you cannot order these events arbitrarily, but there *ARE* certain
orderings you can rely on:

 - within a CPU due to the barriers. This gives you

    A < B, C < D and E < F

 - with regard to a particular cacheline, because of that cacheline
becoming exclusive (or, in the case of 'B', moving out of exclusive
state) and the value it has at that point:

    B < C, F < A and D < E

And as far as I can tell, the above gives you: A < B < C < D < E < F <
A. Which doesn't look possible.
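
Mechanically, that is just a six-node graph with the six claimed
edges, and no way to lay the events out on a timeline. A throwaway
check (the names and structure are invented here, but the edges are
exactly the ones listed above):

#include <stdio.h>

enum { A, B, C, D, E, F, NEV };

static const int edge[][2] = {
        { A, B }, { C, D }, { E, F },   /* per-CPU barrier orderings */
        { B, C }, { F, A }, { D, E },   /* per-cacheline orderings   */
};

int main(void)
{
        int pos[NEV] = { 0 };
        int changed = 1, pass = 0;

        /* Keep pushing each event later than its predecessor.  A set of
         * constraints that fits on a timeline settles down within NEV
         * passes; a cycle keeps moving forever. */
        while (changed && pass++ <= NEV) {
                changed = 0;
                for (unsigned i = 0; i < sizeof(edge) / sizeof(edge[0]); i++) {
                        if (pos[edge[i][1]] <= pos[edge[i][0]]) {
                                pos[edge[i][1]] = pos[edge[i][0]] + 1;
                                changed = 1;
                        }
                }
        }
        printf(changed ? "cyclic: no consistent order exists\n"
                       : "consistent order exists\n");
        return 0;
}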

So which step in my rumination here is actually wrong? Because I
really don't see how you can get that "(r1 == 0 && r2 == 0)" on real
hardware using cache coherency.

*SOME* assumption of mine must be wrong. But I don't see which one.

                   Linus
