From: Waiman Long
To: Linus Torvalds
Cc: Tim Chen, Arnd Bergmann, "Figo. zhang", Aswin Chandramouleeswaran, Rik van Riel, Raghavendra K T, "Paul E.McKenney", linux-arch@vger.kernel.org, Andi Kleen, Peter Zijlstra, George Spelvin, Michel Lespinasse, Ingo Molnar, Peter Hurley, "H. Peter Anvin", Andrew Morton, linux-mm, Alex Shi, Andrea Arcangeli, Scott J Norton, linux-kernel@vger.kernel.org, Thomas Gleixner, Dave Hansen, Matthew R Wilcox, Will Deacon, Davidlohr Bueso
Date: Wed, 06 Nov 2013 23:29:54 -0500
Subject: Re: [PATCH v3 3/5] MCS Lock: Barrier corrections
Message-ID: <527B1742.60400@hp.com>

On 11/06/2013 08:39 PM, Linus Torvalds wrote:
>
> Sorry about the HTML crap, the internet connection is too slow for my
> normal email habits, so I'm using my phone.
>
> I think the barriers are still totally wrong for the locking functions.
>
> Adding an smp_rmb after waiting for the lock is pure BS. Writes in the
> locked region could percolate out of the locked region.
>
> The thing is, you cannot do the memory ordering for locks in any sane
> generic way. Not using our current barrier system. On x86 (and many
> others) the smp_rmb will work fine, because writes are never moved
> earlier. But on other architectures you really need an acquire to get
> a lock efficiently. No separate barriers. An acquire needs to be on
> the instruction that does the lock.
>
> Same goes for unlock. On x86 any store is a fine unlock, but on other
> architectures you need a store with a release marker.
>
> So no amount of barriers will ever do this correctly. Sure, you can
> add full memory barriers and it will be "correct" but it will be
> unbearably slow, and add totally unnecessary serialization. So
> *correct* locking will require architecture support.
>

Yes, we realized that we can't do this in a generic way without
introducing unwanted overhead. So I sent out another patch that does it
in an architecture-specific way, letting each architecture choose its
own memory barriers. It came at the end of both the v3 and v4 patch
series.

-Longman
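The acquire/release point above can be illustrated outside the kernel with C11 atomics. This is a minimal sketch under stated assumptions: it is a plain test-and-set spinlock, not the MCS lock from this series and not a kernel API; `struct demo_lock`, `demo_lock_acquire()`, `demo_lock_try()`, `demo_lock_release()` and `demo_run()` are hypothetical names, and it assumes a C11 compiler with `<stdatomic.h>` and POSIX threads. The acquire ordering sits on the atomic exchange that actually takes the lock, and the release ordering sits on the store that drops it, rather than spinning first and issuing a separate `smp_rmb()`, which orders only loads against loads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock (illustration only, not a kernel API). */
struct demo_lock {
    atomic_bool locked;
};

/* Acquire: the ordering is attached to the same atomic RMW that takes the
 * lock, so critical-section accesses cannot be reordered before it. */
static void demo_lock_acquire(struct demo_lock *l)
{
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
        /* Spin on a cheap relaxed load until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

/* Single attempt; returns true if the lock was taken. */
static bool demo_lock_try(struct demo_lock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Release: a store tagged with release ordering, so critical-section
 * accesses cannot be reordered after it. */
static void demo_lock_release(struct demo_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Hypothetical demo: threads bump a plain (non-atomic) counter under the lock. */
enum { DEMO_THREADS = 4, DEMO_ITERS = 10000 };

static struct demo_lock demo_l = { false };
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < DEMO_ITERS; i++) {
        demo_lock_acquire(&demo_l);
        demo_counter++; /* protected by demo_l */
        demo_lock_release(&demo_l);
    }
    return NULL;
}

/* Runs the demo and returns the final counter value. */
static long demo_run(void)
{
    pthread_t t[DEMO_THREADS];
    int started = 0;

    demo_counter = 0;
    for (; started < DEMO_THREADS; started++)
        if (pthread_create(&t[started], NULL, demo_worker, NULL) != 0)
            break; /* join only the threads that actually started */
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Calling `demo_run()` should return `DEMO_THREADS * DEMO_ITERS`. On x86 the release store typically compiles to an ordinary `mov`, consistent with the remark that any store is a fine unlock there, while on AArch64 it typically becomes a store-release instruction such as `stlrb`; the ordering cost is paid only on the lock and unlock operations themselves, with no extra full barriers.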