2013/11/6 Tim Chen:
> On Tue, 2013-11-05 at 18:37 +0000, Will Deacon wrote:
> > On Tue, Nov 05, 2013 at 05:42:36PM +0000, Tim Chen wrote:
> > > This patch corrects the way memory barriers are used in the MCS lock
> > > and removes ones that are not needed. Also add comments on all barriers.
> >
> > Hmm, I see that you're fixing up the barriers, but I still don't completely
> > understand how what you have is correct. Hopefully you can help me out :)
> >
> > > Reviewed-by: Paul E. McKenney
> > > Reviewed-by: Tim Chen
> > > Signed-off-by: Jason Low
> > > ---
> > >  include/linux/mcs_spinlock.h |   13 +++++++++++--
> > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h
> > > index 96f14299..93d445d 100644
> > > --- a/include/linux/mcs_spinlock.h
> > > +++ b/include/linux/mcs_spinlock.h
> > > @@ -36,16 +36,19 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
> > >  	node->locked = 0;
> > >  	node->next = NULL;
> > >
> > > +	/* xchg() provides a memory barrier */
> > >  	prev = xchg(lock, node);
> > >  	if (likely(prev == NULL)) {
> > >  		/* Lock acquired */
> > >  		return;
> > >  	}
> > >  	ACCESS_ONCE(prev->next) = node;
> > > -	smp_wmb();
> > >  	/* Wait until the lock holder passes the lock down */
> > >  	while (!ACCESS_ONCE(node->locked))
> > >  		arch_mutex_cpu_relax();
> > > +
> > > +	/* Make sure subsequent operations happen after the lock is acquired */
> > > +	smp_rmb();
> >
> > Ok, so this is an smp_rmb() because we assume that stores aren't speculated,
> > right? (i.e. the control dependency above is enough for stores to be ordered
> > with respect to taking the lock)...
> >
> > >  }
> > >
> > >  /*
> > > @@ -58,6 +61,7 @@ static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *nod
> > >
> > >  	if (likely(!next)) {
> > >  		/*
> > > +		 * cmpxchg() provides a memory barrier.
> > >  		 * Release the lock by setting it to NULL
> > >  		 */
> > >  		if (likely(cmpxchg(lock, node, NULL) == node))
> > > @@ -65,9 +69,14 @@ static void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *nod
> > >  		/* Wait until the next pointer is set */
> > >  		while (!(next = ACCESS_ONCE(node->next)))
> > >  			arch_mutex_cpu_relax();
> > > +	} else {
> > > +		/*
> > > +		 * Make sure all operations within the critical section
> > > +		 * happen before the lock is released.
> > > +		 */
> > > +		smp_wmb();
> >
> > ...but I don't see what prevents reads inside the critical section from
> > moving across the smp_wmb() here.
>
> This is to prevent any read in the next critical section from
> creeping up before a write in the previous critical section
> has completed, e.g.
>
> CPU 1 executes:
>
> 	mcs_lock
> 	x = 1;
> 	...
> 	x = 2;
> 	mcs_unlock
>
> and CPU 2 executes:
>
> 	mcs_lock
> 	y = x;
> 	...
> 	mcs_unlock
>
> We expect y to be 2 after the "y = x" assignment. Without the proper
> rmb in lock and wmb in unlock, y could be 1 for CPU 2 with a
> speculative read (i.e. before the x = 2 assignment is completed).

Is this really a good example? Why would CPU 2 wait for "x" to be set to 2
at all? Maybe the "y = x" assignment simply gets executed before CPU 1's
stores because of out-of-order execution. E.g.:

CPU 1 executes:

	mcs_lock
	x = 1;
	...
	x = 2;
	flags = true;
	mcs_unlock

and CPU 2 executes:

	while (flags) {
		mcs_lock
		y = x;
		...
		mcs_unlock
	}
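
To make my question more concrete, here is the second scenario written
against the mcs_spin_lock()/mcs_spin_unlock() API from the patch. This is
only a rough sketch for discussion, not kernel code that exists anywhere;
the cpu1_work()/cpu2_work() functions, the global variables and the
includes are made-up names for illustration:

	#include <linux/types.h>
	#include <linux/mcs_spinlock.h>

	static struct mcs_spinlock *mcs_lock;	/* NULL means unlocked */
	static int x, y;
	static bool flags;

	/* runs on CPU 1 */
	static void cpu1_work(void)
	{
		struct mcs_spinlock node;

		/* xchg() orders the fast path; smp_rmb() the contended path */
		mcs_spin_lock(&mcs_lock, &node);
		x = 1;
		/* ... */
		x = 2;
		flags = true;
		/* smp_wmb() (or cmpxchg()) orders the stores before release */
		mcs_spin_unlock(&mcs_lock, &node);
	}

	/* runs on CPU 2 */
	static void cpu2_work(void)
	{
		struct mcs_spinlock node;

		while (flags) {
			mcs_spin_lock(&mcs_lock, &node);
			y = x;	/* my question: can this still observe x == 1? */
			/* ... */
			mcs_spin_unlock(&mcs_lock, &node);
		}
	}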