From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from neon.transmeta.com (neon-best.transmeta.com [206.184.214.10]) by kvack.org (8.8.7/8.8.7) with ESMTP id SAA19075 for ; Mon, 23 Mar 1998 18:20:31 -0500 Date: Mon, 23 Mar 1998 15:20:11 -0800 (PST) From: Linus Torvalds Subject: Re: Lazy page reclamation on SMP machines: memory barriers In-Reply-To: <199803232249.WAA02431@dax.dcs.ed.ac.uk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: "Stephen C. Tweedie" Cc: linux-mm@kvack.org, linux-smp@vger.rutgers.edu List-ID: On Mon, 23 Mar 1998, Stephen C. Tweedie wrote: > > Are there barrier constructs available to do this? I believe the answer > to be no, based on the recent thread concerning the use of inline asm > cpuid instructions as a barrier on Intel machines. Alternatively, does > Intel provide any ordering guarantees which may help? Intel only gives you total ordering across certain instructions (cpuid being one of them, and the only one that is easily usable under all circumstances). > Finally, I looked quickly at the kernel's spinlock primitives, and they > also seem unprotected by memory barriers on Intel. Is this really safe? Yes. Intel guarantees total ordering around any locked instruction, so the spinlocks themselves act as the barriers. This is why "unlock" is a slow lock ; btrl $0,(mem) instead of the much faster movl $0,(mem) because the latter doesn't imply any ordering, and there are no faster ways to do it (cpuid is fairly slow, so trying to do a "movl + cpuid" doesn't help either). The intel ordering is really nasty, because there is no good fast synchronization. "cpuid" trashes half the register set, and all the other synchronizing instructions have other even nastier side effects. And there is nothing like the alpha (and others) "write memory barrier" instruction that does only a one-way barrier. (To be fair, the alpha for example has very nice primitives for SMP, but sometimes the implementation of them is horribly slow. For example, the "load-and-protect" thing always seems to go to the bus even when the CPU has exclusive ownership, which makes atomic sequences much more expensive than they should be. I think DEC fixed this in their later alpha's, but the point being that even when you have the right concepts you can mess up with having a bad implementation ;) Linus