On Nov 29, 2013 8:18 AM, "Will Deacon" wrote:
>
> To get some sort of idea, I tried adding a dmb to the start of spin_unlock
> on ARMv7 and I saw a 3% performance hit in hackbench on my dual-cluster
> board.

Don't do a dmb. Just do a dummy release. You just said that on arm64 an
unlock+lock is a memory barrier, so just make the mb__before_spinlock() be a
dummy store with release to the stack. That should be noticeably cheaper
than a full dmb.

Linus
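
A minimal sketch of the "dummy store with release to the stack" idea, written as userspace C11 rather than kernel code. All names here (mb_before_spinlock_sketch, toy_spinlock, toy_lock, toy_unlock, guarded_increment) are hypothetical and are not kernel APIs. It assumes, per the thread, that on arm64 a release followed by a lock acquire acts as a barrier; portable C11 does not promise a full barrier from this pairing, and a compiler may in principle drop a store to a local that never escapes, so a real implementation would likely use inline assembly.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the barrier discussed above: a store-release
 * to a dummy slot on the stack. Not the kernel's implementation. */
static inline void mb_before_spinlock_sketch(void)
{
    atomic_int dummy = 0;   /* dummy slot with automatic (stack) storage */
    atomic_store_explicit(&dummy, 1, memory_order_release);
}

/* Toy spinlock built on atomic_flag, only to give the sketch a lock. */
typedef struct {
    atomic_flag locked;
} toy_spinlock;

static inline void toy_lock(toy_spinlock *l)
{
    /* Spin until the flag was previously clear; acquire on success. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
}

static inline void toy_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Dummy release first, then take the lock, mirroring "unlock+lock". */
static int guarded_increment(toy_spinlock *l, int *counter)
{
    mb_before_spinlock_sketch();
    toy_lock(l);
    int v = ++*counter;
    toy_unlock(l);
    return v;
}
```

The point of the sketch is only the shape: the release store stands in for the "unlock" half, and the acquire in toy_lock() supplies the "lock" half, so no explicit dmb appears in the path.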