[Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops

* [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
@ 2016-07-22 10:34 David Howells
  2016-07-22 16:44 ` Paul E. McKenney
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: David Howells @ 2016-07-22 10:34 UTC
  To: ksummit-discuss; +Cc: jakub, peterz, ramana.radhakrishnan

Earlier this year we had a discussion of the possibilities of using ISO C++11
atomic operations inside the kernel to implement kernel atomic ops of sorts
various, posted here:

	Subject: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
	Date: Wed, 18 May 2016 16:10:37 +0100

Is it worth getting together to discuss these in person in one of the tech
slots - especially if there are some gcc or llvm people available at plumbers
who could join in?

Further, Paul McKenney and others are assembling a memory model description.
Do we want to consider loosening up the kernel memory model?

Currently, for example, locks imply semi-permeable barriers that are not tied
to the lock variable - but, as I understand it, this may not hold true on all
arches, and on those arches an extra memory barrier would be required.  For
example, { LOCK(A), UNLOCK(A), LOCK(B), UNLOCK(B) } would not imply a full
memory barrier in the middle.

Also, we have read and write memory barrier constructs - but not all CPUs have
such things, some instead have acquire and release.  Would it be worth having
higher level constructs that encapsulate what you're trying to do and select
the most appropriate barrier from the available choices?  For instance, we
have a fair few circular buffers, with linux/circ_buf.h providing some useful
bits.  Should we provide circular buffering barrier constructs?
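
As a rough illustration of what such a construct might encapsulate, here
is a minimal userspace sketch in ISO C11 of a single-producer,
single-consumer ring that uses acquire/release ordering instead of
separate read and write barriers.  The names struct ring, ring_put(),
ring_get() and RING_SIZE are hypothetical; they are not taken from
linux/circ_buf.h or from the RFC patches.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical capacity; must be a power of two */

struct ring {
	int slots[RING_SIZE];
	atomic_size_t head;	/* written only by the producer */
	atomic_size_t tail;	/* written only by the consumer */
};

/* Producer side: returns false if the ring is full. */
static bool ring_put(struct ring *r, int value)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to tail, so the
	 * slot has been read before it is overwritten. */
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;
	r->slots[head & (RING_SIZE - 1)] = value;
	/* Release publishes the slot contents before the new head index. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_get(struct ring *r, int *value)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store to head. */
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*value = r->slots[tail & (RING_SIZE - 1)];
	/* Release hands the now-free slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Each side writes only its own index, so one acquire load and one release
store per operation is enough in this sketch; a ring with several
producers or consumers would need more than this.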

If we do this, Will Deacon, Peter Zijlstra and Paul McKenney should definitely
be there.  I would suggest Jakub Jelinek and Ramana Radhakrishnan as gcc
representatives.  I don't know anyone from LLVM.

David

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-22 10:34 [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops David Howells
@ 2016-07-22 16:44 ` Paul E. McKenney
  2016-07-25 17:14   ` Luis R. Rodriguez
  2016-07-23 20:21 ` Benjamin Herrenschmidt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2016-07-22 16:44 UTC
  To: David Howells
  Cc: jakub, parri.andrea, ksummit-discuss, peterz, stern,
	ramana.radhakrishnan, luc.maranget, j.alglave

On Fri, Jul 22, 2016 at 11:34:35AM +0100, David Howells wrote:
> Earlier this year we had a discussion of the possibilities of using ISO C++11
> atomic operations inside the kernel to implement kernel atomic ops of sorts
> various, posted here:
> 
> 	Subject: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
> 	Date: Wed, 18 May 2016 16:10:37 +0100
> 
> Is it worth getting together to discuss these in person in one of the tech
> slots - especially if there are some gcc or llvm people available at plumbers
> who could join in?
> 
> 
> Further, Paul McKenney and others are assembling a memory model description.

We are attempting to automate memory-barriers.txt, so that you could
provide fragments of C code, and the tool would tell you whether a given
outcome happens always, sometimes, or never.  The current prototype
handles memory accesses, memory barriers, and RCU, but not yet locking
or read-modify-write atomics (though there is some vestigial support for
RMW atomics).  We are currently playing whack-a-mole with odd corner cases
of various architectures' memory models.  We are therefore also working
on ways of handling the resulting uncertainty.  Good clean fun!  ;-)
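
For readers unfamiliar with this kind of question, the fragment below is
a minimal sketch of a "message passing" test written with ISO C11
atomics in userspace.  It is not input for the prototype tool, whose
format is not described here, and the names data, flag, mp_writer() and
mp_reader() are hypothetical.  The question is whether a reader that
sees the flag can still see the old payload; with the release/acquire
pairing shown, the C11 answer is "never", which the assert() encodes.

```c
#include <assert.h>
#include <stdatomic.h>

static int data;		/* plain payload */
static atomic_int flag;		/* publication flag */

/* Thread 0: write the payload, then publish it with release ordering. */
static void mp_writer(void)
{
	data = 1;
	atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Thread 1: returns 1 if the flag was observed, 0 otherwise. */
static int mp_reader(void)
{
	if (atomic_load_explicit(&flag, memory_order_acquire)) {
		/* Having seen the flag, the payload must be visible too. */
		assert(data == 1);
		return 1;
	}
	return 0;	/* flag not yet set: nothing can be concluded */
}
```

Weakening both accesses to memory_order_relaxed would turn the answer
into "sometimes" (and make the plain access to data a data race), which
is the sort of distinction an automated model is meant to report.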

> Do we want to consider loosening up the kernel memory model?
> 
> Currently, for example, locks imply semi-permeable barriers that are not tied
> to the lock variable - but, as I understand it, this may not hold true on all
> arches, and on those arches an extra memory barrier would be required.  For
> example, { LOCK(A), UNLOCK(A), LOCK(B), UNLOCK(B) } would not imply a full
> memory barrier in the middle.

Agreed, the current version of memory-barriers.txt documents the fact
that an unlock/lock sequence does not imply a full memory barrier:

	Similarly, the reverse case of a RELEASE followed by an ACQUIRE
	does not imply a full memory barrier.

If we are going to talk about changing this, we should most definitely
include a powerpc arch maintainer.
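
To make the RELEASE-then-ACQUIRE point concrete, here is a minimal
userspace sketch of the "store buffering" pattern in ISO C11.  It uses
C11 release stores and acquire loads as stand-ins for UNLOCK and LOCK,
which is an assumption made for illustration; the names x, y,
sb_thread0() and sb_thread1() are hypothetical.  When the two functions
run concurrently without the fence, C11 permits both to return 0,
because a release store followed by an acquire load of a different
variable may still be reordered.  With the seq_cst fence in both
threads, at least one of them must return 1.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int x, y;

/* Thread 0: RELEASE on x followed by ACQUIRE on y. */
static int sb_thread0(bool full_fence)
{
	atomic_store_explicit(&x, 1, memory_order_release);
	if (full_fence)
		atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	return atomic_load_explicit(&y, memory_order_acquire);
}

/* Thread 1: the mirror image, RELEASE on y followed by ACQUIRE on x. */
static int sb_thread1(bool full_fence)
{
	atomic_store_explicit(&y, 1, memory_order_release);
	if (full_fence)
		atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	return atomic_load_explicit(&x, memory_order_acquire);
}
```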

> Also, we have read and write memory barrier constructs - but not all CPUs have
> such things, some instead have acquire and release.  Would it be worth having
> higher level constructs that encapsulate what you're trying to do and select
> the most appropriate barrier from the available choices?  For instance, we
> have a fair few circular buffers, with linux/circ_buf.h providing some useful
> bits.  Should we provide circular buffering barrier constructs?
> 
> 
> If we do this, Will Deacon, Peter Zijlstra and Paul McKenney should definitely
> be there.  I would suggest Jakub Jelinek and Ramana Radhakrishnan as gcc
> representatives.  I don't know anyone from LLVM.

Alan Stern should be included, as he is the author of the most recent
version of the prototype formalized memory model (including the current
extremely nice formal model of RCU!), Luc Maranget as the author of
the previous version, Jade Alglave as author of the first version and
founder of this project, and Andrea Parri for his many contributions to
recent models.

Shouldn't we also include experts on other SMP architectures?

							Thanx, Paul

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-22 16:44 ` Paul E. McKenney
@ 2016-07-25 17:14   ` Luis R. Rodriguez
  2016-07-26  6:09     ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Luis R. Rodriguez @ 2016-07-25 17:14 UTC
  To: Paul E. McKenney
  Cc: jakub, parri.andrea, ksummit-discuss, peterz, stern,
	ramana.radhakrishnan, luc.maranget, j.alglave

On Fri, Jul 22, 2016 at 09:44:11AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 22, 2016 at 11:34:35AM +0100, David Howells wrote:
> > Earlier this year we had a discussion of the possibilities of using ISO C++11
> > atomic operations inside the kernel to implement kernel atomic ops of sorts
> > various, posted here:
> > 
> > 	Subject: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
> > 	Date: Wed, 18 May 2016 16:10:37 +0100
> > 
> > Is it worth getting together to discuss these in person in one of the tech
> > slots - especially if there are some gcc or llvm people available at plumbers
> > who could join in?
> > 
> > 
> > Further, Paul McKenney and others are assembling a memory model description.
> 
> We are attempting to automate memory-barriers.txt, so that you could
> provide fragments of C code, and the tool would tell you whether a given
> outcome happens always, sometimes, or never.  The current prototype
> handles memory accesses, memory barriers, and RCU, but not yet locking
> or read-modify-write atomics (though there is some vestigial support for
> RMW atomics).  We are currently playing whack-a-mole with odd corner cases
> of various architectures' memory models.  We are therefore also working
> on ways of handling the resulting uncertainty.  Good clean fun!  ;-)

Consider me interested in this discussion, patches, etc.

  Luis

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-25 17:14   ` Luis R. Rodriguez
@ 2016-07-26  6:09     ` Hannes Reinecke
  2016-07-26 13:10       ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2016-07-26  6:09 UTC
  To: Luis R. Rodriguez, Paul E. McKenney
  Cc: jakub, parri.andrea, ksummit-discuss, peterz, stern,
	ramana.radhakrishnan, luc.maranget, j.alglave

On 07/25/2016 07:14 PM, Luis R. Rodriguez wrote:
> On Fri, Jul 22, 2016 at 09:44:11AM -0700, Paul E. McKenney wrote:
>> On Fri, Jul 22, 2016 at 11:34:35AM +0100, David Howells wrote:
>>> Earlier this year we had a discussion of the possibilities of using ISO C++11
>>> atomic operations inside the kernel to implement kernel atomic ops of sorts
>>> various, posted here:
>>>
>>> 	Subject: [RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
>>> 	Date: Wed, 18 May 2016 16:10:37 +0100
>>>
>>> Is it worth getting together to discuss these in person in one of the tech
>>> slots - especially if there are some gcc or llvm people available at plumbers
>>> who could join in?
>>>
>>>
>>> Further, Paul McKenney and others are assembling a memory model description.
>>
>> We are attempting to automate memory-barriers.txt, so that you could
>> provide fragments of C code, and the tool would tell you whether a given
>> outcome happens always, sometimes, or never.  The current prototype
>> handles memory accesses, memory barriers, and RCU, but not yet locking
>> or read-modify-write atomics (though there is some vestigial support for
>> RMW atomics).  We are currently playing whack-a-mole with odd corner cases
>> of various architectures' memory models.  We are therefore also working
>> on ways of handling the resulting uncertainty.  Good clean fun!  ;-)
>
> Consider me interested in this discussion, patches, etc.
>
Same here.
I have been playing around with RCUs and memory barriers quite a lot 
recently, and found some really 'odd' use-cases in the kernel which 
would benefit from improvements here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26  6:09     ` Hannes Reinecke
@ 2016-07-26 13:10       ` Alan Stern
  2016-07-26 13:35         ` Paul E. McKenney
  2016-07-26 15:23         ` Hannes Reinecke
  0 siblings, 2 replies; 15+ messages in thread
From: Alan Stern @ 2016-07-26 13:10 UTC
  To: Hannes Reinecke
  Cc: jakub, parri.andrea, j.alglave, ksummit-discuss, peterz,
	ramana.radhakrishnan, luc.maranget

On Tue, 26 Jul 2016, Hannes Reinecke wrote:

> I have been playing around with RCUs and memory barriers quite a lot 
> recently, and found some really 'odd' use-cases in the kernel which 
> would benefit from improvements here.

Could you post one or two examples?  It would be interesting to see 
what they involve.

Alan Stern

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26 13:10       ` Alan Stern
@ 2016-07-26 13:35         ` Paul E. McKenney
  2016-07-29  1:06           ` Steven Rostedt
  2016-07-26 15:23         ` Hannes Reinecke
  1 sibling, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2016-07-26 13:35 UTC
  To: Alan Stern
  Cc: jakub, parri.andrea, ksummit-discuss, peterz,
	ramana.radhakrishnan, luc.maranget, j.alglave

On Tue, Jul 26, 2016 at 09:10:32AM -0400, Alan Stern wrote:
> On Tue, 26 Jul 2016, Hannes Reinecke wrote:
> 
> > I have been playing around with RCUs and memory barriers quite a lot 
> > recently, and found some really 'odd' use-cases in the kernel which 
> > would benefit from improvements here.
> 
> Could you post one or two examples?  It would be interesting to see 
> what they involve.

I would be interested as well!

							Thanx, Paul

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26 13:35         ` Paul E. McKenney
@ 2016-07-29  1:06           ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2016-07-29  1:06 UTC
  To: Paul E. McKenney
  Cc: jakub, parri.andrea, ksummit-discuss, peterz, Alan Stern,
	ramana.radhakrishnan, luc.maranget, j.alglave

On Tue, 26 Jul 2016 06:35:51 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, Jul 26, 2016 at 09:10:32AM -0400, Alan Stern wrote:
> > On Tue, 26 Jul 2016, Hannes Reinecke wrote:
> >   
> > > I have been playing around with RCUs and memory barriers quite a lot 
> > > recently, and found some really 'odd' use-cases in the kernel which 
> > > would benefit from improvements here.  
> > 
> > Could you post one or two examples?  It would be interesting to see 
> > what they involve.  
> 
> I would be interested as well!
> 

Me too, as tracing plays a bit of magic with memory models as well.

-- Steve

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26 13:10       ` Alan Stern
  2016-07-26 13:35         ` Paul E. McKenney
@ 2016-07-26 15:23         ` Hannes Reinecke
  2016-07-26 22:40           ` Paul E. McKenney
  1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2016-07-26 15:23 UTC
  To: Alan Stern
  Cc: jakub, parri.andrea, j.alglave, ksummit-discuss, peterz,
	ramana.radhakrishnan, luc.maranget

On 07/26/2016 03:10 PM, Alan Stern wrote:
> On Tue, 26 Jul 2016, Hannes Reinecke wrote:
>
>> I have been playing around with RCUs and memory barriers quite a lot
>> recently, and found some really 'odd' use-cases in the kernel which
>> would benefit from improvements here.
>
> Could you post one or two examples?  It would be interesting to see
> what they involve.
>
I have been working on a performance regression when calling 
'dm_suspend/dm_resume' repeatedly for several (hundreds) devices.
That boiled down to the patch introducing srcu in the device mapper core
with commit 83d5e5b0af907 (dm: optimize use SRCU and RCU).
Looking at the code, they do things like:

	set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
	if (map)
		synchronize_srcu(&md->io_barrier);

where the srcu is used to ensure the code has left the critical 
sections. However, if the memory pointed to by the srcu isn't actually 
freed, we could easily drop the 'synchronize_srcu' call.
But that would require that
a) the set_bit() above is indeed atomic
and
b) there's no need to call 'synchronize_rcu' if you're not actually
freeing memory but rather fiddling with pointers.
Both are somewhat shady areas where neither the documentation nor usage
reveals any obvious insights.

As another example, I've been doing performance patches to the lpfc
driver (cf. my talk at VAULT this year), where I've replaced most
spinlocks with atomics and bitops.
Which should work as well, only that it's still a bit unclear to me
if and when you need barriers in addition to atomics or bitops.
And if you need barriers, which variant would be most appropriate?
The __before or the __after variant?
Also, what happens to bitops on bitfields longer than an unsigned long?
Are they still atomic?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26 15:23         ` Hannes Reinecke
@ 2016-07-26 22:40           ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2016-07-26 22:40 UTC
  To: Hannes Reinecke
  Cc: jakub, parri.andrea, ksummit-discuss, peterz, Alan Stern,
	ramana.radhakrishnan, luc.maranget, j.alglave

On Tue, Jul 26, 2016 at 05:23:36PM +0200, Hannes Reinecke wrote:
> On 07/26/2016 03:10 PM, Alan Stern wrote:
> >On Tue, 26 Jul 2016, Hannes Reinecke wrote:
> >
> >>I have been playing around with RCUs and memory barriers quite a lot
> >>recently, and found some really 'odd' use-cases in the kernel which
> >>would benefit from improvements here.
> >
> >Could you post one or two examples?  It would be interesting to see
> >what they involve.
> >
> I have been working on a performance regression when calling
> 'dm_suspend/dm_resume' repeatedly for several (hundreds) devices.
> That boiled down to the patch introducing srcu in the device mapper core
> with commit 83d5e5b0af907 (dm: optimize use SRCU and RCU).
> Looking at the code, they do things like:
> 
> 	set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
> 	if (map)
> 		synchronize_srcu(&md->io_barrier);
> 
> where the srcu is used to ensure the code has left the critical
> sections. However, if the memory pointed to by the srcu isn't
> actually freed, we could easily drop the 'synchronize_srcu' call.
> But that would require that
> a) the set_bit() above is indeed atomic
> and
> b) there's no need to call 'synchronize_rcu' if you're not actually
> freeing memory but rather fiddling with pointers.
> Both are somewhat shady areas where neither the documentation nor usage
> reveals any obvious insights.

The set_bit() function is guaranteed to execute atomically, but it does
not guarantee any ordering against prior accesses.  The synchronize_srcu()
does provide ordering against the set_bit(), so subsequent accesses are
covered, but only in the case where "map" is non-zero.

For RCU, it does depend on the use case.  For example, there are some
rare but real cases where synchronize_rcu() is required even if you are
not freeing memory:

https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf

> As another example, I've been doing performance patches to the lpfc
> driver (cf. my talk at VAULT this year), where I've replaced most
> spinlocks with atomics and bitops.
> Which should work as well, only that it's still a bit unclear to me
> if and when you need barriers in addition to atomics or bitops.

If the bitop returns a value, you don't need additional barriers.
Otherwise ...
 ... you need smp_mb__before_atomic() to order prior accesses
against the bitop and smp_mb__after_atomic() to order subsequent
accesses against the bitop.  If you need the bitop to be ordered
against both prior and subsequent accesses, then you need both
smp_mb__before_atomic() and smp_mb__after_atomic().
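
The rule above can be sketched in userspace ISO C11 as follows.  This
is an analogue, not kernel code: the my_* names are hypothetical
stand-ins, and mapping the kernel barriers onto seq_cst fences is an
assumption made only so the example compiles outside the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Analogue of set_bit(): atomic, returns nothing, implies no ordering. */
static void my_set_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_or_explicit(word, 1UL << nr, memory_order_relaxed);
}

/* Analogue of test_and_set_bit(): returns the old bit, ordered itself. */
static bool my_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long old = atomic_fetch_or_explicit(word, 1UL << nr,
						     memory_order_seq_cst);
	return (old >> nr) & 1UL;
}

/* Stand-ins for smp_mb__before_atomic() and smp_mb__after_atomic(). */
static void my_mb_before_atomic(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static void my_mb_after_atomic(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static int payload;

/* Order the payload write before the bitop, and later accesses after it. */
static void publish(atomic_ulong *flags)
{
	payload = 42;
	my_mb_before_atomic();	/* prior accesses ordered against the bitop */
	my_set_bit(0, flags);
	my_mb_after_atomic();	/* subsequent accesses ordered against it */
}
```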

> And if you need barriers, which variant would be most appropriate?
> The __before or the __after variant?

 ... you need smp_mb__before_atomic() to order prior accesses
against the bitop and smp_mb__after_atomic() to order subsequent
accesses against the bitop.  If you need the bitop to be ordered
against both prior and subsequent accesses, then you need both
smp_mb__before_atomic() and smp_mb__after_atomic().

> Also, what happens to bitops on bitfields longer than an unsigned long?
> Are they still atomic?

From what I can see, yes, sort of.  The "sort of" part is due to the fact
that bitops on widely separated bits would avoid interfering
with each other, but on the other hand, there would be no cause-and-effect
relationship between them, either.  Furthermore, processes reading the
bits set might disagree on the order in which they were set.
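
A minimal userspace sketch in ISO C11 of why this is so, assuming the
bitmap is stored as an array of unsigned long words: each bitop is an
atomic read-modify-write on exactly one word, so two bits in different
words are updated by unrelated atomic operations.  The names
struct wide_bitmap, wide_set_bit(), wide_test_bit() and NBITS are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NBITS		256u	/* hypothetical bitmap size */
#define NWORDS		((NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct wide_bitmap {
	atomic_ulong words[NWORDS];
};

/* Atomically set one bit; only the word containing it is touched. */
static void wide_set_bit(struct wide_bitmap *b, unsigned int nr)
{
	assert(nr < NBITS);
	atomic_fetch_or_explicit(&b->words[nr / BITS_PER_WORD],
				 1UL << (nr % BITS_PER_WORD),
				 memory_order_relaxed);
}

/* Read one bit; relaxed, so no ordering against bits in other words. */
static bool wide_test_bit(struct wide_bitmap *b, unsigned int nr)
{
	unsigned long w;

	assert(nr < NBITS);
	w = atomic_load_explicit(&b->words[nr / BITS_PER_WORD],
				 memory_order_relaxed);
	return (w >> (nr % BITS_PER_WORD)) & 1UL;
}
```

Because bits 3 and 200 live in different words, nothing in this sketch
orders one update against the other, so two readers may observe them
becoming set in different orders.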

All that aside, please note that the initial memory model is limited
to memory references, barriers, and RCU.  We do not yet have locking or
read-modify-write atomic operations.  We have to start somewhere!

							Thanx, Paul

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-22 10:34 [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops David Howells
  2016-07-22 16:44 ` Paul E. McKenney
@ 2016-07-23 20:21 ` Benjamin Herrenschmidt
  2016-07-26 15:11 ` David Woodhouse
  2016-07-26 15:20 ` David Howells
  3 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-23 20:21 UTC
  To: David Howells, ksummit-discuss; +Cc: jakub, peterz, ramana.radhakrishnan

On Fri, 2016-07-22 at 11:34 +0100, David Howells wrote:
> If we do this, Will Deacon, Peter Zijlstra and Paul McKenney should definitely
> be there.  I would suggest Jakub Jelinek and Ramana Radhakrishnan as gcc
> representatives.  I don't know anyone from LLVM.

I'd like to be there too.

Cheers,
Ben.

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-22 10:34 [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops David Howells
  2016-07-22 16:44 ` Paul E. McKenney
  2016-07-23 20:21 ` Benjamin Herrenschmidt
@ 2016-07-26 15:11 ` David Woodhouse
  2016-07-28 10:41   ` Will Deacon
  2016-08-02 13:42   ` Peter Zijlstra
  2016-07-26 15:20 ` David Howells
  3 siblings, 2 replies; 15+ messages in thread
From: David Woodhouse @ 2016-07-26 15:11 UTC
  To: David Howells, ksummit-discuss; +Cc: jakub, peterz, ramana.radhakrishnan

On Fri, 2016-07-22 at 11:34 +0100, David Howells wrote:
> Further, Paul McKenney and others are assembling a memory model description.
> Do we want to consider loosening up the kernel memory model?

It's not clear that 'loosening up' is what we're after.

In Seoul last year, weren't we looking at things like readl_relaxed()
and lamenting the fact that they do actually still have strong enough
requirements that they can't *really* be very relaxed on Power and
ARM64 at all, because they're basically being used with the assumption
of Intel-like semantics.

The cheap answer is "well, it sucks to be on POWER or ARM64 because
then readl_relaxed() has to be as slow as readl() is".

But it would be good to follow up on that properly, and maybe introduce
a variant which *can* be implemented across more architectures.

Is that what Paul is working on, that you mention above?

-- 
dwmw2

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26 15:11 ` David Woodhouse
@ 2016-07-28 10:41   ` Will Deacon
  2016-08-02 13:42   ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Will Deacon @ 2016-07-28 10:41 UTC
  To: David Woodhouse; +Cc: jakub, peterz, ksummit-discuss, ramana.radhakrishnan

On Tue, Jul 26, 2016 at 04:11:21PM +0100, David Woodhouse wrote:
> On Fri, 2016-07-22 at 11:34 +0100, David Howells wrote:
> > Further, Paul McKenney and others are assembling a memory model description.
> > Do we want to consider loosening up the kernel memory model?
> 
> It's not clear that 'loosening up' is what we're after.
> 
> In Seoul last year, weren't we looking at things like readl_relaxed()
> and lamenting the fact that they do actually still have strong enough
> requirements that they can't *really* be very relaxed on Power and
> ARM64 at all, because they're basically being used with the assumption
> of Intel-like semantics.
> 
> The cheap answer is "well, it sucks to be on POWER or ARM64 because
> then readl_relaxed() has to be as slow as readl() is".

I wasn't in Seoul, but I think some people got the wrong end of the stick
about the relaxed accessors and I'd be interested in trying to address
some of that, at least from the arm64 point-of-view. Paul has been busy
writing up a summary for LWN, but I don't think it's quite ready
yet.

Having said that, the memory model work that I'm aware of focusses
completely on SMP synchronisation and I don't think it's particularly
helpful to throw I/O into the mix just yet.

Will

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-26 15:11 ` David Woodhouse
  2016-07-28 10:41   ` Will Deacon
@ 2016-08-02 13:42   ` Peter Zijlstra
  2016-08-03  8:49     ` Will Deacon
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2016-08-02 13:42 UTC
  To: David Woodhouse; +Cc: jakub, ksummit-discuss, ramana.radhakrishnan

On Tue, Jul 26, 2016 at 04:11:21PM +0100, David Woodhouse wrote:
> On Fri, 2016-07-22 at 11:34 +0100, David Howells wrote:
> > Further, Paul McKenney and others are assembling a memory model description.
> > Do we want to consider loosening up the kernel memory model?
> 
> It's not clear that 'loosening up' is what we're after.

Linus (who should also very much be present for this) always argues
against relaxing ordering.

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-08-02 13:42   ` Peter Zijlstra
@ 2016-08-03  8:49     ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2016-08-03  8:49 UTC
  To: Peter Zijlstra; +Cc: jakub, ksummit-discuss, ramana.radhakrishnan

On Tue, Aug 02, 2016 at 03:42:23PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 26, 2016 at 04:11:21PM +0100, David Woodhouse wrote:
> > On Fri, 2016-07-22 at 11:34 +0100, David Howells wrote:
> > > Further, Paul McKenney and others are assembling a memory model description.
> > > Do we want to consider loosening up the kernel memory model?
> > 
> > It's not clear that 'loosening up' is what we're after.
> 
> Linus (who should also very much be present for this) always argues
> against relaxing ordering.

Even if he had an inexplicable change of heart, how on Earth do you go
about validating the existing codebase against a new memory model? It's
one thing to show that the relaxation is strictly a relaxation (and
therefore the existing backend implementations remain sound), but quite
another to show that locking implementations don't fall apart, or the
guarantees that no longer hold aren't relied upon someplace.

One might consider adding more atomic operations. For example, the
release/acquire stuff we grew recently is "weaker" than full fences but,
unlike the C11 atomics, the release/acquire primitives complement full
fences. Conversely, having two sets of fences (C11 and kernel) or two
sets of release/acquire, each with subtly different semantics sounds
like an awful maintenance burden and not somewhere we should be going.

That's why I'd be interested in building kernel-compatible operations
using C11 atomics (relaxed accesses and fences) for asm/generic, but not
more than that, at least outside of arch/*.
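
A minimal userspace sketch of what "relaxed accesses and fences" could
look like in ISO C11.  The my_* names are hypothetical, and whether
these fence placements give exactly the kernel's guarantees on every
architecture is an assumption here, not something this thread settles.

```c
#include <stdatomic.h>

/* Stand-in for a fully ordered, value-returning atomic_add_return():
 * a relaxed read-modify-write bracketed by full fences. */
static int my_atomic_add_return(int i, atomic_int *v)
{
	int old;

	atomic_thread_fence(memory_order_seq_cst);
	old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return old + i;
}

/* Stand-in for smp_store_release(): release fence, then relaxed store. */
static void my_store_release(atomic_int *p, int val)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(p, val, memory_order_relaxed);
}

/* Stand-in for smp_load_acquire(): relaxed load, then acquire fence. */
static int my_load_acquire(atomic_int *p)
{
	int val = atomic_load_explicit(p, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return val;
}
```

Building everything from relaxed accesses plus explicit fences keeps a
single set of fence semantics in play, at the cost of possibly stronger
code than a native acquire or release instruction would give.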

Will

* Re: [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops
  2016-07-22 10:34 [Ksummit-discuss] [TECH TOPIC] Memory model, using ISO C++11 atomic ops David Howells
                   ` (2 preceding siblings ...)
  2016-07-26 15:11 ` David Woodhouse
@ 2016-07-26 15:20 ` David Howells
  3 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2016-07-26 15:20 UTC
  To: David Woodhouse; +Cc: jakub, peterz, ksummit-discuss, ramana.radhakrishnan

David Woodhouse <dwmw2@infradead.org> wrote:

> In Seoul last year, weren't we looking at things like readl_relaxed()
> and lamenting the fact that they do actually still have strong enough
> requirements that they can't *really* be very relaxed on Power and
> ARM64 at all, because they're basically being used with the assumption
> of Intel-like semantics.

I don't recall that.  Possibly that was a track I wasn't in.

> The cheap answer is "well, it sucks to be on POWER or ARM64 because
> then readl_relaxed() has to be as slow as readl() is".

Does the memory model for CPU/device interactions have to be the same as that
for CPU/CPU interactions?  I guess with respect to locks, it does so that two
processors who both want to access a device don't trample over each other.

> Is that what Paul is working on, that you mention above?

Paul is working on a general overall description.

David
