linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [rfc] SLOB memory ordering issue
@ 2008-10-15 16:34 Nick Piggin
  2008-10-15 16:46 ` Nick Piggin
  2008-10-15 16:54 ` Matt Mackall
  0 siblings, 2 replies; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 16:34 UTC (permalink / raw)
  To: torvalds, Pekka Enberg, Matt Mackall, linux-mm, linux-kernel

I think I see a possible memory ordering problem with SLOB:
In slab caches with constructors, the constructor is run
before returning the object to caller, with no memory barrier
afterwards.

Now there is nothing that indicates the _exact_ behaviour
required here. Is it at all reasonable to expect ->ctor() to
be visible to all CPUs and not just the allocating CPU?

SLAB and SLUB don't appear to have this problem. Of course,
they have per-CPU fastpath queues, so _can_ have effectively
exactly the same ordering issue if the object was brought
back into the "initialized" state before being freed, rather
than by ->ctor(). However in that case, it is at least
kind of visible to the caller.

Anyone care or think it is a problem? Should we just document
that ->ctor doesn't imply any barriers? Better ideas?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 16:34 [rfc] SLOB memory ordering issue Nick Piggin
@ 2008-10-15 16:46 ` Nick Piggin
  2008-10-15 16:54 ` Matt Mackall
  1 sibling, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 16:46 UTC (permalink / raw)
  To: torvalds; +Cc: Pekka Enberg, Matt Mackall, linux-mm, linux-kernel

On Thursday 16 October 2008 03:34, Nick Piggin wrote:
> I think I see a possible memory ordering problem with SLOB:
> In slab caches with constructors, the constructor is run
> before returning the object to caller, with no memory barrier
> afterwards.
>
> Now there is nothing that indicates the _exact_ behaviour
> required here. Is it at all reasonable to expect ->ctor() to
> be visible to all CPUs and not just the allocating CPU?
>
> SLAB and SLUB don't appear to have this problem. Of course,
> they have per-CPU fastpath queues, so _can_ have effectively
> exactly the same ordering issue if the object was brought
> back into the "initialized" state before being freed, rather
> than by ->ctor(). However in that case, it is at least
> kind of visible to the caller.

Although I guess it's just as much of a SLAB implementation
detail as the lack of ->ctor() barrier... And I really doubt
_any_ of the callers would have ever thought about either
possible problem.

I'd really hate to add a branch to the slab fastpath for this
though. Maybe we just have to document it, assume there are
no problems, and maybe take a look at some of the core users
of this.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 16:34 [rfc] SLOB memory ordering issue Nick Piggin
  2008-10-15 16:46 ` Nick Piggin
@ 2008-10-15 16:54 ` Matt Mackall
  2008-10-15 17:10   ` Nick Piggin
  1 sibling, 1 reply; 20+ messages in thread
From: Matt Mackall @ 2008-10-15 16:54 UTC (permalink / raw)
  To: Nick Piggin; +Cc: torvalds, Pekka Enberg, linux-mm, linux-kernel

On Thu, 2008-10-16 at 03:34 +1100, Nick Piggin wrote:
> I think I see a possible memory ordering problem with SLOB:
> In slab caches with constructors, the constructor is run
> before returning the object to caller, with no memory barrier
> afterwards.
> 
> Now there is nothing that indicates the _exact_ behaviour
> required here. Is it at all reasonable to expect ->ctor() to
> be visible to all CPUs and not just the allocating CPU?

Do you have a failure scenario in mind?

First, it's a categorical mistake for another CPU to be looking at the
contents of an object unless it knows that it's in an allocated state.

For another CPU to receive that knowledge (by reading a causally-valid
pointer to it in memory), a memory barrier has to occur, no?

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 16:54 ` Matt Mackall
@ 2008-10-15 17:10   ` Nick Piggin
  2008-10-15 17:33     ` Linus Torvalds
  2008-10-15 18:06     ` Nick Piggin
  0 siblings, 2 replies; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 17:10 UTC (permalink / raw)
  To: Matt Mackall; +Cc: torvalds, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 03:54, Matt Mackall wrote:
> On Thu, 2008-10-16 at 03:34 +1100, Nick Piggin wrote:
> > I think I see a possible memory ordering problem with SLOB:
> > In slab caches with constructors, the constructor is run
> > before returning the object to caller, with no memory barrier
> > afterwards.
> >
> > Now there is nothing that indicates the _exact_ behaviour
> > required here. Is it at all reasonable to expect ->ctor() to
> > be visible to all CPUs and not just the allocating CPU?
>
> Do you have a failure scenario in mind?
>
> First, it's a categorical mistake for another CPU to be looking at the
> contents of an object unless it knows that it's in an allocated state.
>
> For another CPU to receive that knowledge (by reading a causally-valid
> pointer to it in memory), a memory barrier has to occur, no?

No.

For (slightly bad) example. Some architectures have a ctor for their
page table page slabs. Not a bad thing to do.

Now they allocate these guys, take a lock, then insert them into the
page tables. The lock is only an acquire barrier, so it can leak past
stores.

The read side is all lockless and in some cases even done by hardware.

Now in _practice_, this is not a problem because some architectures
don't have ctors, and I spotted the issue and put proper barriers in
there. But it was a known fact that ctors were always used, and if I
had assumed ctors were barriers so didn't need the wmb, then there
would be a bug.

Especially the fact that a lock doesn't order the stores makes me
worried that a lockless read-side algorithm might have a bug here.
Fortunately, most of them use RCU and probably use rcu_assign_pointer
even if they do have ctors. But I can't be sure...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 17:10   ` Nick Piggin
@ 2008-10-15 17:33     ` Linus Torvalds
  2008-10-15 17:36       ` Linus Torvalds
  2008-10-15 17:45       ` Nick Piggin
  2008-10-15 18:06     ` Nick Piggin
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 17:33 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel


On Thu, 16 Oct 2008, Nick Piggin wrote:
> 
> Now they allocate these guys, take a lock, then insert them into the
> page tables. The lock is only an acquire barrier, so it can leak past
> stores.

I think that Matt's point was that the code is buggy regardless of any 
ctor or not.

If you make an allocation visible to other CPU's, you would need to make 
sure that allocation is stable with a smp_wmb() before you update the 
pointer to that allocation.

So the code that makes a page visible should just always do that 
synchronization.

And it has nothing to do with ctors or not. It's true whether you do the 
initialization by hand, or whether you use a ctor.

And more importantly, putting the write barrier in the ctor or in the 
memory allocator is simply broken. It's not a ctor/allocator issue. Why? 
Because even if you have a ctor, there is absolutely *nothing* that says 
that the ctor will be sufficient to initialize everything. Most ctors, in 
fact, are just initializing the basic fields - the person that does the 
allocation should finish things up.

The fact that _some_ people using an allocator with a ctor may not do 
anything but the ctor to the page is immaterial.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 17:33     ` Linus Torvalds
@ 2008-10-15 17:36       ` Linus Torvalds
  2008-10-15 17:58         ` Matt Mackall
  2008-10-15 17:45       ` Nick Piggin
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 17:36 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel


On Wed, 15 Oct 2008, Linus Torvalds wrote:
> 
> If you make an allocation visible to other CPU's, you would need to make 
> sure that allocation is stable with a smp_wmb() before you update the 
> pointer to that allocation.

Just to clarify a hopefully obvious issue..

The assumption here is that you don't protect things with locking. Of 
course, if all people accessing the new pointer always have the 
appropriate lock, then memory ordering never matters, since the locks take 
care of it.

So _most_ allocators obviously don't need to do any smp_wmb() at all. But 
the ones that expose things locklessly (where page tables are just one 
example) need to worry.

Again, this is yet another reason to not put things in the allocator. The 
allocator cannot know, and shouldn't care. For all the exact same reasons 
that the allocator cannot know and shouldn't care whether the ctor results 
in a 'final' version or whether the allocator will do some final fixups.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 17:33     ` Linus Torvalds
  2008-10-15 17:36       ` Linus Torvalds
@ 2008-10-15 17:45       ` Nick Piggin
  2008-10-15 18:03         ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 17:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 04:33, Linus Torvalds wrote:
> On Thu, 16 Oct 2008, Nick Piggin wrote:
> > Now they allocate these guys, take a lock, then insert them into the
> > page tables. The lock is only an acquire barrier, so it can leak past
> > stores.
>
> I think that Matt's point was that the code is buggy regardless of any
> ctor or not.
>
> If you make an allocation visible to other CPU's, you would need to make
> sure that allocation is stable with a smp_wmb() before you update the
> pointer to that allocation.

What do you mean by the allocation is stable? Let's just talk in loads and
stores and order. You need to make sure previous stores to initialise the
object become visible before subsequent store to make the object visible.
No questions about that (I think that's what you meant by make the alloc
stable).

1. However, if the object is already fully initialised at the point the caller
gets it out of the allocator, then the caller doesn't need to make any
stores to initialise it obviously.

2. I think it could be easy to assume that the allocated object that was
initialised with a ctor for us already will have its initializing stores
ordered when we get it from slab.

So in my page table almost-example, by combining 1 and 2, one might think
it is OK to leave out those smp_wmb()s. And it would be valid code if all
those assumptions _were_ true.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 17:36       ` Linus Torvalds
@ 2008-10-15 17:58         ` Matt Mackall
  0 siblings, 0 replies; 20+ messages in thread
From: Matt Mackall @ 2008-10-15 17:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nick Piggin, Pekka Enberg, linux-mm, linux-kernel

On Wed, 2008-10-15 at 10:36 -0700, Linus Torvalds wrote:
> 
> On Wed, 15 Oct 2008, Linus Torvalds wrote:
> > 
> > If you make an allocation visible to other CPU's, you would need to make 
> > sure that allocation is stable with a smp_wmb() before you update the 
> > pointer to that allocation.
> 
> Just to clarify a hopefully obvious issue..
> 
> The assumption here is that you don't protect things with locking. Of 
> course, if all people accessing the new pointer always have the 
> appropriate lock, then memory ordering never matters, since the locks take 
> care of it.

Right. This is the 99.9% case and is why we should definitely not put an
additional barrier in the allocator.

Lockless users are already on their own with regard to memory ordering
issues, so it makes sense for them to absorb the (often nil) incremental
cost of ensuring object initialization gets flushed.

I feel like we ought to document this in the SLAB API, but at the same
time, I think we'll scare more people than we'll enlighten.

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 17:45       ` Nick Piggin
@ 2008-10-15 18:03         ` Linus Torvalds
  2008-10-15 18:12           ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 18:03 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel


On Thu, 16 Oct 2008, Nick Piggin wrote:
> 
> What do you mean by the allocation is stable?

"all writes done to it before it's exposed".

> 2. I think it could be easy to assume that the allocated object that was
> initialised with a ctor for us already will have its initializing stores
> ordered when we get it from slab.

You make tons of assumptions.

You assume that
 (a) unlocked accesses are the normal case and should be something the 
     allocator should prioritize/care about.
 (b) that if you have a ctor, it's the only thing the allocator will do.

I don't think either of those assumptions are at all relevant or 
interesting. Quite the reverse - I'd expect them to be in a very small 
minority.

Now, obviously, on pretty much all machines out there (ie x86[-64] and UP 
ARM), smp_wmb() is a no-op, so in that sense we could certainly say that 
"sure, this is a total special case, but we can add a smp_wmb() anyway 
since it won't cost us anything".

On the other hand, on the machines where it doesn't cost us anything, it 
obviously doesn't _do_ anything either, so that argument is pretty 
dubious. 

And on machines where the memory ordering _can_ matter, it's going to add 
cost to the wrong point.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 17:10   ` Nick Piggin
  2008-10-15 17:33     ` Linus Torvalds
@ 2008-10-15 18:06     ` Nick Piggin
  2008-10-15 18:26       ` Linus Torvalds
  2008-10-17 20:29       ` Linus Torvalds
  1 sibling, 2 replies; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 18:06 UTC (permalink / raw)
  To: Matt Mackall, Hugh Dickins; +Cc: torvalds, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 04:10, Nick Piggin wrote:
> On Thursday 16 October 2008 03:54, Matt Mackall wrote:
> > On Thu, 2008-10-16 at 03:34 +1100, Nick Piggin wrote:
> > > I think I see a possible memory ordering problem with SLOB:
> > > In slab caches with constructors, the constructor is run
> > > before returning the object to caller, with no memory barrier
> > > afterwards.
> > >
> > > Now there is nothing that indicates the _exact_ behaviour
> > > required here. Is it at all reasonable to expect ->ctor() to
> > > be visible to all CPUs and not just the allocating CPU?
> >
> > Do you have a failure scenario in mind?
> >
> > First, it's a categorical mistake for another CPU to be looking at the
> > contents of an object unless it knows that it's in an allocated state.
> >
> > For another CPU to receive that knowledge (by reading a causally-valid
> > pointer to it in memory), a memory barrier has to occur, no?
>
> No.
>
> For (slightly bad) example. Some architectures have a ctor for their
> page table page slabs. Not a bad thing to do.
>
> Now they allocate these guys, take a lock, then insert them into the
> page tables. The lock is only an acquire barrier, so it can leak past
> stores.
>
> The read side is all lockless and in some cases even done by hardware.
>
> Now in _practice_, this is not a problem because some architectures
> don't have ctors, and I spotted the issue and put proper barriers in
> there. But it was a known fact that ctors were always used, and if I
> had assumed ctors were barriers so didn't need the wmb, then there
> would be a bug.
>
> Especially the fact that a lock doesn't order the stores makes me
> worried that a lockless read-side algorithm might have a bug here.
> Fortunately, most of them use RCU and probably use rcu_assign_pointer
> even if they do have ctors. But I can't be sure...

OK, now I have something that'll blow your fuckin mind.

anon_vma_cachep.

P0
do_anonymous_page()
 anon_vma_prepare()
  ctor(anon_vma)
  [sets vma->anon_vma = anon_vma]

P1
do_anonymous_page()
 anon_vma_prepare()
  [sees P0 already allocated it]
 lru_cache_add(page)
  [flushes page to lru]
 page_add_anon_rmap (increments mapcount)
  page_set_anon_rmap
   [sets page->anon_vma = anon_vma]

P2
find page from lru
page_referenced()
 page_referenced_anon()
  page_lock_anon_vma()
   [loads anon_vma from page->anon_vma]
   spin_lock(&anon_vma->lock)


Who was it that said memory ordering was self-evident?

For everyone else:

P1 sees P0's store to vma->anon_vma, then P2 sees P1's store
to page->anon_vma (among others), but P2 does not see P0's ctor
store to initialise anon_vma->lock.

And there seems like another bug there too, but just a plain control
race rather than strictly[*] a data race, P0 is executing list_add_tail
of vma to anon_vma->head at some point here too, so even assuming
we're running on a machine with transitive store ordering, then the
above race can't hit, then P2 subsequently wants to run a
list_for_each_entry over anon_vma->head while P0 is in the process of
modifying it.

Am I the one who's bamboozled, or can anyone confirm?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:03         ` Linus Torvalds
@ 2008-10-15 18:12           ` Nick Piggin
  2008-10-15 18:19             ` Matt Mackall
  2008-10-15 18:29             ` Linus Torvalds
  0 siblings, 2 replies; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 05:03, Linus Torvalds wrote:
> On Thu, 16 Oct 2008, Nick Piggin wrote:
> > What do you mean by the allocation is stable?
>
> "all writes done to it before it's exposed".
>
> > 2. I think it could be easy to assume that the allocated object that was
> > initialised with a ctor for us already will have its initializing stores
> > ordered when we get it from slab.
>
> You make tons of assumptions.
>
> You assume that
>  (a) unlocked accesses are the normal case and should be something the
>      allocator should prioritize/care about.
>  (b) that if you have a ctor, it's the only thing the allocator will do.

Yes, as I said, I do not want to add a branch and/or barrier to the
allocator for this. I just want to flag the issue and discuss whether
there is anything that can be done about it.


> I don't think either of those assumptions are at all relevant or
> interesting. Quite the reverse - I'd expect them to be in a very small
> minority.

They will be in the minority or non-existant, but obviously there only
need be one "counterexample" bug to disprove a claim that it never
matters.


> Now, obviously, on pretty much all machines out there (ie x86[-64] and UP
> ARM), smp_wmb() is a no-op, so in that sense we could certainly say that
> "sure, this is a total special case, but we can add a smp_wmb() anyway
> since it won't cost us anything".
>
> On the other hand, on the machines where it doesn't cost us anything, it
> obviously doesn't _do_ anything either, so that argument is pretty
> dubious.
>
> And on machines where the memory ordering _can_ matter, it's going to add
> cost to the wrong point.

When I said "I'd really hate to add a branch to the slab fastpath", it
wasn't a tacit acknowlegement that the barrier is the only way to go,
if it sounded that way.

I meant: I'd *really* hate to add a branch to the slab fastpath :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:12           ` Nick Piggin
@ 2008-10-15 18:19             ` Matt Mackall
  2008-10-15 18:35               ` Nick Piggin
  2008-10-15 18:29             ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Matt Mackall @ 2008-10-15 18:19 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Linus Torvalds, Pekka Enberg, linux-mm, linux-kernel

On Thu, 2008-10-16 at 05:12 +1100, Nick Piggin wrote:
> On Thursday 16 October 2008 05:03, Linus Torvalds wrote:
> > On Thu, 16 Oct 2008, Nick Piggin wrote:
> > > What do you mean by the allocation is stable?
> >
> > "all writes done to it before it's exposed".
> >
> > > 2. I think it could be easy to assume that the allocated object that was
> > > initialised with a ctor for us already will have its initializing stores
> > > ordered when we get it from slab.
> >
> > You make tons of assumptions.
> >
> > You assume that
> >  (a) unlocked accesses are the normal case and should be something the
> >      allocator should prioritize/care about.
> >  (b) that if you have a ctor, it's the only thing the allocator will do.
> 
> Yes, as I said, I do not want to add a branch and/or barrier to the
> allocator for this. I just want to flag the issue and discuss whether
> there is anything that can be done about it.

Well the alternative is to have someone really smart investigate all the
lockless users of ctors and add appropriate barriers. I suspect that's a
fairly small set and that you're already familiar with most of them.

But yes, I think you may be on to a real problem. It might also be worth
devoting a few neurons to thinking about zeroed allocations.

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:06     ` Nick Piggin
@ 2008-10-15 18:26       ` Linus Torvalds
  2008-10-15 18:50         ` Nick Piggin
  2008-10-17 20:29       ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 18:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matt Mackall, Hugh Dickins, Pekka Enberg, linux-mm, linux-kernel


On Thu, 16 Oct 2008, Nick Piggin wrote:
> 
> Who was it that said memory ordering was self-evident?

Nobody has _ever_ said that memory ordering is self-evident. Quite the 
reverse.

What we've said is that it's not a ctor issue. This has nothing 
what-so-ever to do with ctors, and everything to do with the fact that 
lockless is hard.

And the general rule is: to find a page (or any other data structures) on 
another CPU, you need to insert it into the right data structures. And 
that insertion event needs to either be locked, or it needs to be ordered.

But notice that it's the _insertion_ event. Not the ctor. Not the 
allocator. It's the person _doing_ the allocation that needs to order 
things.

See?

And no, I didn't look at your exact case. But for pages in page tables, 
we'd need to have the right smp_wmb() at the "set_pte[_at]()" stage, 
either inside that macro or in the caller.

We used to only care about the page _contents_ (because the only unlocked 
access was the one that was done by hardware), but now that we do unlocked 
lookups in software too, we need to make sure the "struct page" itself is 
also valid.

For non-page-table lookups (LRU, radix trees, etc etc), the rules are 
different. Again, it's not an _allocator_ (or ctor) issue, it's about the 
point where you insert the thing. If you insert the page using a lock, you 
need not worry about memory ordering at all. And if you insert it using 
RCU, you do.

This is *all* we have argued about. The argument is simple: this has 
NOTHING to do with the allocator, and has NOTHING to do with constructors.

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:12           ` Nick Piggin
  2008-10-15 18:19             ` Matt Mackall
@ 2008-10-15 18:29             ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 18:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel


On Thu, 16 Oct 2008, Nick Piggin wrote:
> 
> When I said "I'd really hate to add a branch to the slab fastpath", it
> wasn't a tacit acknowlegement that the barrier is the only way to go,
> if it sounded that way.
> 
> I meant: I'd *really* hate to add a branch to the slab fastpath :)

Well, quite frankly, your choice of subject line and whole point of 
argument may have confused me.

You started out - and continue to - make this sound like it's a 
SLAB/SLOB/SLUB issue. It's not. 

I agree there is quite likely memory ordering issues - possibly old ones, 
but quite possibly also ones that have just happened fairly recently as 
we've done more unlocked lookups - and all I've ever disagreed with is 
how you seem to have mixed this up with the allocator.

And I still don't understand why you even _mention_ the slab fastpath. It 
seems totally immaterial.

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:19             ` Matt Mackall
@ 2008-10-15 18:35               ` Nick Piggin
  2008-10-15 18:43                 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 18:35 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Linus Torvalds, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 05:19, Matt Mackall wrote:
> On Thu, 2008-10-16 at 05:12 +1100, Nick Piggin wrote:
> > On Thursday 16 October 2008 05:03, Linus Torvalds wrote:
> > > On Thu, 16 Oct 2008, Nick Piggin wrote:
> > > > What do you mean by the allocation is stable?
> > >
> > > "all writes done to it before it's exposed".
> > >
> > > > 2. I think it could be easy to assume that the allocated object that
> > > > was initialised with a ctor for us already will have its initializing
> > > > stores ordered when we get it from slab.
> > >
> > > You make tons of assumptions.
> > >
> > > You assume that
> > >  (a) unlocked accesses are the normal case and should be something the
> > >      allocator should prioritize/care about.
> > >  (b) that if you have a ctor, it's the only thing the allocator will
> > > do.
> >
> > Yes, as I said, I do not want to add a branch and/or barrier to the
> > allocator for this. I just want to flag the issue and discuss whether
> > there is anything that can be done about it.
>
> Well the alternative is to have someone really smart investigate all the
> lockless users of ctors and add appropriate barriers. I suspect that's a
> fairly small set and that you're already familiar with most of them.

I thought someone might volunteer me for that job :)

Actually, there are surprisingly huge number of them. What I would be
most comfortable doing, if I was making a kernel to run my life support
system on an SMP powerpc box, would be to spend zero time on all the
drivers and whacky things with ctors and just add smp_wmb() after them
if they are not _totally_ obvious.

There is only one user in mm/. Which I had a quick look at. There might
be other issues with it, but maybe with a bit more locking or an
explicit barrier or two, it will become more obviously correct.

fs/, inode, dentries, seem to be a very big user. I'd _guess_ they should
be OK because the inode and dentry caches are pretty serialised as it is.
I think unless a fs is doing something really crazy, it would be hard for
one CPU to get to a new dentry, say, before it has been locked up and into
the dcache. Casting a quick eye over things wouldn't hurt, though.


> But yes, I think you may be on to a real problem. It might also be worth
> devoting a few neurons to thinking about zeroed allocations.

Hmm, that's a point. The page nor slab allocators don't order those before
we get them either. I might have a sniff around mm/ or so, but no way I
could audit all kzalloc etc. users in the kernel. Maybe that case is a bit
more obvious though, because we're not conceptually thinking about pulling
out an already-set-up-object living happily in ctor land somewhere. We
explicitly say: allocate me something, and zero it out.

It's 5am and I'm rambling at this point. Probably either one is about as
likely to have bugs, and maybe only _very slightly_ more likely to have an
ordering bug than any other kernel code.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:35               ` Nick Piggin
@ 2008-10-15 18:43                 ` Linus Torvalds
  2008-10-15 19:19                   ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 18:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel


On Thu, 16 Oct 2008, Nick Piggin wrote:
>
> Actually, there are surprisingly huge number of them. What I would be
> most comfortable doing, if I was making a kernel to run my life support
> system on an SMP powerpc box, would be to spend zero time on all the
> drivers and whacky things with ctors and just add smp_wmb() after them
> if they are not _totally_ obvious.

WHY?

THIS HAS NOTHING TO DO WITH CONSTRUCTORS!

If the driver is using locking, there is no memory ordering issues 
what-so-ever.

And if the driver isn't using locking, IT IS BROKEN.

It's that simple. Why do you keep bringing up non-issues?

What matters is not constructors. Never has been. Constructors are 
actually very rare, it's much more common to do

	ptr = kmalloc(..)
	.. initialize it by hand ..

and why do you think constructors are somehow different? They're not.

What matter is how you look things up on the other CPU's. If you don't use 
locking, you use some lockless thing, and then you need to be careful 
about memory ordering.

And quite frankly, if you're a driver, and you're trying to do lockless 
algorithms, you're just being crazy. You're going to have much worse bugs, 
and again, whether you use constructors or pink elephants is going to be 
totally irrelevant. 

So why do you bring up these totally pointless things? Why do you bring up 
drivers? Why do you bring up constructors? Why, why, why?

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:26       ` Linus Torvalds
@ 2008-10-15 18:50         ` Nick Piggin
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Hugh Dickins, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 05:26, Linus Torvalds wrote:
> On Thu, 16 Oct 2008, Nick Piggin wrote:
> > Who was it that said memory ordering was self-evident?
>
> Nobody has _ever_ said that memory ordering is self-evident. Quite the
> reverse.

Heh, no that was a dig at someone who actually did say that the
other day. Not you or Matt.


> What we've said is that it's not a ctor issue. This has nothing
> what-so-ever to do with ctors, and everything to do with the fact that
> lockless is hard.
>
> And the general rule is: to find a page (or any other data structures) on
> another CPU, you need to insert it into the right data structures. And
> that insertion event needs to either be locked, or it needs to be ordered.

Sure. And "ordered" doesn't come by itself, but specifically with preceeding
stores that initialise the object.


> But notice that it's the _insertion_ event. Not the ctor. Not the
> allocator. It's the person _doing_ the allocation that needs to order
> things.
>
> See?

I see it as a joint effort between the code initialising the object, and
the code making it visible. But whatever. Whether or not we agree on the
exact obviousness or whatever, I don't want to argue. I think the solution
is agreed more or less "very unlikely to be bugs, and anyway the only way
to fix them is to document it and read code".


> And no, I didn't look at your exact case. But for pages in page tables,
> we'd need to have the right smp_wmb() at the "set_pte[_at]()" stage,
> either inside that macro or in the caller.

The example was actually talking about _page table pages_ in page tables.


> We used to only care about the page _contents_ (because the only unlocked
> access was the one that was done by hardware), but now that we do unlocked
> lookups in software too, we need to make sure the "struct page" itself is
> also valid.

I added those. Actually we _also_ need them for lockless hardware and
software walks, and not just for struct page, but also page table page
contents (which is what some architectures initialise with ctor).

Because otherwise you can insert your page table page, only to have that
store pass an earlier store to invalidate one of its ptes, and boom!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:43                 ` Linus Torvalds
@ 2008-10-15 19:19                   ` Nick Piggin
  2008-10-15 19:47                     ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2008-10-15 19:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel

On Thursday 16 October 2008 05:43, Linus Torvalds wrote:
> On Thu, 16 Oct 2008, Nick Piggin wrote:
> > Actually, there are surprisingly huge number of them. What I would be
> > most comfortable doing, if I was making a kernel to run my life support
> > system on an SMP powerpc box, would be to spend zero time on all the
> > drivers and whacky things with ctors and just add smp_wmb() after them
> > if they are not _totally_ obvious.
>
> WHY?

I guess I wouldn't bother with your kernel. I was being hypothetical.
Can you _prove_ no code has a bug due specifically to this issue?


> THIS HAS NOTHING TO DO WITH CONSTRUCTORS!
>
> If the driver is using locking, there is no memory ordering issues
> what-so-ever.
>
> And if the driver isn't using locking, IT IS BROKEN.

Did you read the anon_vma example? It's broken if it assumes the objects
coming out of its slab are always "stable".


> It's that simple. Why do you keep bringing up non-issues?

Why are you being antagonistic and assuming I'm wrong instead of
considering you mistunderstand me, maybe I'm not a retard? I am bad
at explaining myself, but I'll try once more.


> What matters is not constructors. Never has been. Constructors are
> actually very rare, it's much more common to do
>
> 	ptr = kmalloc(..)
> 	.. initialize it by hand ..
>
> and why do you think constructors are somehow different? They're not.

I think they might be interpreted or viewed by the caller as giving
a "stable" object. It is rather more obvious to a caller that it has
previous unordered stores if it is doing this
 	ptr = kmalloc(..)
 	.. initialize it by hand ..

I haven't dealt much with constructors myself so I haven't really
had to think about it. But I'm sure I could have missed it and been
fooled.

If you still don't agree, then fine; if I find a bug I'll send a patch.
I don't want to keep arguing.


> What matter is how you look things up on the other CPU's. If you don't use
> locking, you use some lockless thing, and then you need to be careful
> about memory ordering.
>
> And quite frankly, if you're a driver, and you're trying to do lockless
> algorithms, you're just being crazy. You're going to have much worse bugs,
> and again, whether you use constructors or pink elephants is going to be
> totally irrelevant.
>
> So why do you bring up these totally pointless things? Why do you bring up
> drivers? Why do you bring up constructors? Why, why, why?

I'll try to keep them to myself in future.

Thanks,
Nick

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 19:19                   ` Nick Piggin
@ 2008-10-15 19:47                     ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-10-15 19:47 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel


On Thu, 16 Oct 2008, Nick Piggin wrote:
> 
> I guess I wouldn't bother with your kernel. I was being hypothetical.
> Can you _prove_ no code has a bug due specifically to this issue?

Nick, nobody can prove anything but the most trivial programs.

> Did you read the anon_vma example? It's broken if it assumes the objects
> coming out of its slab are always "stable".

So why do you blame SLOB/SLAB?

This is my whole and only point - you're pointing at all the wrong things, 
and then I get upset when I point out to you - over and over again - that 
you point to the wrong thing, and you just keep on (once more) pointing to 
it.

Can you see my frustration? You keep on claiming this is somehow an 
issue of the slab allocator, and seem to refuse to just read what I write.

But let me try it again:

 - The object you get from kmalloc (or *any* allocator) is a per-CPU 
   object. It's your _local_ memory area. And it has to be that way, 
   because no allocator can ever know the difference between objects that 
   are going to have global visibility and objects that don't. So the 
   allocator has to basically assume the cheap case.

 - Yes, we could add a "smp_wmb()" at the end of all allocators, but that 
   would be a pointless no-op on architectures where it doesn't matter, 
   and it would be potentially expensive on architectures where it _does_ 
   matter. In other words, in neither case is it the right thing to do.

 - Most allocations by _far_ (at least in the static sense of "there's a 
   lot of kmalloc/kmem_cache_alloc's in the kernel") are going to be used 
   for things that are either thread-local (think temporary data 
   structures like "__getname()" for path allocators) or are going to be 
   used with proper locking.

   NONE OF THOSE CASES WANT THE OVERHEAD! And they are the *common* ones.

 - constructiors really have absolutely nothing to do with anything. What 
   about kzalloc()? That's an implicit "constructor" too. Do you want the 
   smp_wmb() there for that too? Do you realize that 99.9% of all such 
   users will fill in a few bytes/fields in _addition_ to clearing the 
   structure they just allocated? You do realize that almost nobody wants 
   a really empty data structure? You _do_ realize that the "smp_wmb()" in 
   the allocator IS TOTALLY USELESS if the code that did the allocation 
   then updates a few other fields too?

   ADDING the smp_wmb() at an allocation point WOULD BE ACTIVELY 
   MISLEADING. Anybody who thinks that it helps is just fooling himself. 

   We're *much* better off just telling everybody that if they think they 
   can do lockless data structures, they have to do the memory ordering at 
   the _insertion_ point, and stop believing in fairies and wizards and in 
   allocators doing it for them!

 - For _all_ of these reasons, any time you say that this is an allocator 
   issue, or a constructor issue, I don't need to even bother reading any 
   more. Because you've just shown yourself to not read what I wrote, nor 
   understand the issue.

So if you want to have a constructive discussion, you need to

 - *read* what I wrote. UNDERSTAND that memory ordering is a non-issue 
   when there is locking involved, and that locking is still the default 
   approach for any normal data structure.

 - *stop* talking about "constructors" and "SLOB allocators". Because as 
   long as you do, you're not making sense.

and if you can do that, I can treat you like you're worth talking to. But 
as long as you cannot accept that, what's the point?

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [rfc] SLOB memory ordering issue
  2008-10-15 18:06     ` Nick Piggin
  2008-10-15 18:26       ` Linus Torvalds
@ 2008-10-17 20:29       ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-10-17 20:29 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matt Mackall, Hugh Dickins, Pekka Enberg, linux-mm, linux-kernel

Ok, finally looked at this.

There is indeed a locking bug there. "anon_vma_prepare()" optimistically 
looks at vma->anon_vma without taking the &mm->page_table_lock. That's not 
right.

Of course, we could just take the lock, but in this case it's probably ok 
to just admit that we have a lockless algorithm. But that implies that we 
need to do the right memory ordering.

And that, in turn, doesn't just imply a "smp_wmb()" - if you do memory 
ordering, you need to do it on *both* sides, so now the other side needs 
to also do a matching smp_rmb(). Or, in this case, smp_rmb_depends(), I 
guess.

That, btw, is an important part of memory ordering. You can never do 
ordering on just one side. A "smp_wmb()" on its own is always nonsensical. 
It always needs to be paired with a "smp_rmb()" variant.

Something like the appended may fix it.

But I do think we have a potential independent issue with the new page 
table lookup code now that it's lock-free. We have the smp_rmb() calls in 
gup_get_pte() (at least on x86), when we look things up, but we don't 
actually have a lot of smp_wmb()'s there when we insert the page.

For the anonymous page case, we end up doing a

	page_add_new_anon_rmap();

before we do the set_pte_at() that actually exposes it, and that does the 
whole

	page->mapping = (struct address_space *) anon_vma;
	page->index = linear_page_index(vma, address);

thing, but there is no write barrier between those and the actual write to 
the page tables, so when GUP looks up the page, it can't actually depend 
on page->mappign or anything else!

Now, this really isn't an issue on x86, since smp_wmb() is a no-op, and 
the compiler won't be re-ordering the writes, but in general I do think 
that now that we do lockless lookup of pages from the page tables, we 
probably do need smp_wmb()'s there just in front of the "set_pte_at()" 
calls.

NOTE NOTE! The patch below is only about "page->anon_vma", not about the 
GUP lookup and page->mapping/index fields. That's an independent issue.

And notice? This has _nothing_ to do with constructors or allocators.

And of course - this patch is totally untested, and may well need some 
thinking about.

			Linus

---
 mm/rmap.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 0383acf..21d09bb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -81,6 +81,13 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 		/* page_table_lock to protect against threads */
 		spin_lock(&mm->page_table_lock);
 		if (likely(!vma->anon_vma)) {
+			/*
+			 * We hold the mm->page_table_lock, but another
+			 * CPU may be doing an optimistic load (the one
+			 * at the top), and we want to make sure that
+			 * the anon_vma changes are visible.
+			 */
+			smp_wmb();
 			vma->anon_vma = anon_vma;
 			list_add_tail(&vma->anon_vma_node, &anon_vma->head);
 			allocated = NULL;
@@ -92,6 +99,17 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 		if (unlikely(allocated))
 			anon_vma_free(allocated);
 	}
+	/*
+	 * Subtle: we looked up anon_vma without any locking
+	 * (in the comon case), and are going to look at the
+	 * spinlock etc behind it. In order to know that it's
+	 * initialized, we need to do a read barrier here.
+	 *
+	 * We can use the cheaper "depends" version, since we
+	 * are following a pointer, and only on alpha may that
+	 * give a stale value.
+	 */
+	smp_read_barrier_depends();
 	return 0;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-10-17 20:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-15 16:34 [rfc] SLOB memory ordering issue Nick Piggin
2008-10-15 16:46 ` Nick Piggin
2008-10-15 16:54 ` Matt Mackall
2008-10-15 17:10   ` Nick Piggin
2008-10-15 17:33     ` Linus Torvalds
2008-10-15 17:36       ` Linus Torvalds
2008-10-15 17:58         ` Matt Mackall
2008-10-15 17:45       ` Nick Piggin
2008-10-15 18:03         ` Linus Torvalds
2008-10-15 18:12           ` Nick Piggin
2008-10-15 18:19             ` Matt Mackall
2008-10-15 18:35               ` Nick Piggin
2008-10-15 18:43                 ` Linus Torvalds
2008-10-15 19:19                   ` Nick Piggin
2008-10-15 19:47                     ` Linus Torvalds
2008-10-15 18:29             ` Linus Torvalds
2008-10-15 18:06     ` Nick Piggin
2008-10-15 18:26       ` Linus Torvalds
2008-10-15 18:50         ` Nick Piggin
2008-10-17 20:29       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox