Date: Tue, 26 Jul 2016 15:40:35 -0700
From: "Paul E. McKenney"
To: Hannes Reinecke
Reply-To: paulmck@linux.vnet.ibm.com
Cc: jakub@redhat.com, parri.andrea@gmail.com,
 ksummit-discuss@lists.linuxfoundation.org, peterz@infradead.org,
 Alan Stern, ramana.radhakrishnan@arm.com, luc.maranget@inria.fr,
 j.alglave@ucl.ac.uk
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
Message-Id: <20160726224035.GD7094@linux.vnet.ibm.com>

On Tue, Jul 26, 2016 at 05:23:36PM +0200, Hannes Reinecke wrote:
> On 07/26/2016 03:10 PM, Alan Stern wrote:
> >On Tue, 26 Jul 2016, Hannes Reinecke wrote:
> >
> >>I have been playing around with RCUs and memory barriers quite a lot
> >>recently, and found some really 'odd' use-cases in the kernel which
> >>would benefit from improvements here.
> >
> >Could you post one or two examples? It would be interesting to see
> >what they involve.
> >
> I have been working on a performance regression when calling
> 'dm_suspend/dm_resume' repeatedly for several (hundreds) devices.
> That boiled down to the patch introducing srcu in the device mapper core
> with commit 83d5e5b0af907 (dm: optimize use SRCU and RCU).
> Looking at it the code they do things like:
>
>     set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
>     if (map)
>         synchronize_srcu(&md->io_barrier);
>
> where the srcu is used to ensure the code has left the critical
> sections. However, if the memory pointed to by the srcu isn't
> actually freed why we could easily drop the 'synchronize_srcu' call.
> But that would require that
> a) the set_bit() above is indeed atomic
> and
> b) there's no need to call 'synchronize_rcu' if you're not actually
> freeing memory but rather fiddle pointers.
> Both are somewhat shady areas where the documentation nor usage
> reveals some obvious insights.

The set_bit() function is guaranteed to execute atomically, but it does
not guarantee any ordering against prior accesses. The synchronize_srcu()
does provide ordering against the set_bit(), so subsequent accesses are
covered, but only in the case where "map" is non-zero.

For RCU, it does depend on the use case. For example, there are some
rare but real cases where synchronize_rcu() is required even if you are
not freeing memory:

	https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf

> On another example I've been doing performance patches to the lpfc
> driver (cf my talk at VAULT this year), where I've replaced most
> spinlocks with atomics and bitops.
> Which should work as well, only that it's still a bit unclear to me
> if an when you need barriers in addition to atomic resp bitops.

If the bitop returns a value, you don't need additional barriers.
Otherwise ...

> And if you need barriers, which variant would be most appropriate?
> The __before or the __after variant?

... you need smp_mb__before_atomic() to order prior accesses against the
bitop and smp_mb__after_atomic() to order subsequent accesses against
the bitop. If you need the bitop to be ordered against both prior and
subsequent accesses, then you need both smp_mb__before_atomic() and
smp_mb__after_atomic().

> Also, what happens to bitops on bitfields longer than an unsigned long?
> Are they still atomic?

From what I can see, yes, sort of. The "sort of" part is due to the
fact that bitops on widely separated bits would avoid interfering with
each other, but on the other hand, there would be no cause-and-effect
relationship between them, either.
Furthermore, processes reading the bits set might disagree on the order
in which they were set.

All that aside, please note that the initial memory model is limited to
memory references, barriers, and RCU. We do not yet have locking or
read-modify-write atomic operations. We have to start somewhere!

							Thanx, Paul
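
The barrier rules for bitops discussed above can be sketched in userspace,
assuming C11 <stdatomic.h> as a stand-in for the kernel primitives. This is
only an illustration: the sketch_* helpers, publish(), FLAG_READY, and the
flags/payload variables are hypothetical names, and mapping
smp_mb__before_atomic() and smp_mb__after_atomic() onto seq_cst fences is an
assumption made for the example, not how the kernel defines them.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NBITS 128
#define SKETCH_NLONGS \
	((SKETCH_NBITS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)
#define FLAG_READY 0	/* hypothetical flag number */

/* The bitmap deliberately spans more than one unsigned long. */
static_assert(SKETCH_NLONGS >= 2, "bitmap must span more than one word");

static _Atomic unsigned long flags[SKETCH_NLONGS];
static _Atomic int payload;

/* Assumed userspace analogue of a full memory barrier. */
void sketch_mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Like set_bit(): atomic, returns nothing, guarantees no ordering. */
void sketch_set_bit(size_t nr, _Atomic unsigned long *addr)
{
	atomic_fetch_or_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
				 1UL << (nr % SKETCH_BITS_PER_LONG),
				 memory_order_relaxed);
}

/* Value-returning bitop, modelled with a full barrier on each side. */
bool sketch_test_and_set_bit(size_t nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	unsigned long old;

	sketch_mb();
	old = atomic_fetch_or_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
				       mask, memory_order_relaxed);
	sketch_mb();
	return (old & mask) != 0;
}

/* Unordered read of a single bit. */
bool sketch_test_bit(size_t nr, _Atomic unsigned long *addr)
{
	unsigned long word = atomic_load_explicit(
		&addr[nr / SKETCH_BITS_PER_LONG], memory_order_relaxed);

	return (word >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Order the payload store before the bitop and later accesses after it. */
void publish(int value)
{
	atomic_store_explicit(&payload, value, memory_order_relaxed);
	sketch_mb();	/* plays the role of smp_mb__before_atomic() */
	sketch_set_bit(FLAG_READY, flags);
	sketch_mb();	/* plays the role of smp_mb__after_atomic() */
}
```

In this sketch, a reader that observes FLAG_READY and then calls sketch_mb()
before loading payload is guaranteed to see the value passed to publish().
Dropping the first sketch_mb() in publish() removes that guarantee. Each bit
operation touches only the one word holding that bit, so operations on bits
in different words of flags[] are individually atomic but are not ordered
against each other by the bitops themselves.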