* [rfc] SLOB memory ordering issue @ 2008-10-15 16:34 Nick Piggin 2008-10-15 16:46 ` Nick Piggin 2008-10-15 16:54 ` Matt Mackall 0 siblings, 2 replies; 20+ messages in thread From: Nick Piggin @ 2008-10-15 16:34 UTC (permalink / raw) To: torvalds, Pekka Enberg, Matt Mackall, linux-mm, linux-kernel I think I see a possible memory ordering problem with SLOB: In slab caches with constructors, the constructor is run before returning the object to caller, with no memory barrier afterwards. Now there is nothing that indicates the _exact_ behaviour required here. Is it at all reasonable to expect ->ctor() to be visible to all CPUs and not just the allocating CPU? SLAB and SLUB don't appear to have this problem. Of course, they have per-CPU fastpath queues, so _can_ have effectively exactly the same ordering issue if the object was brought back into the "initialized" state before being freed, rather than by ->ctor(). However in that case, it is at least kind of visible to the caller. Anyone care or think it is a problem? Should we just document that ->ctor doesn't imply any barriers? Better ideas? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 16:34 [rfc] SLOB memory ordering issue Nick Piggin @ 2008-10-15 16:46 ` Nick Piggin 2008-10-15 16:54 ` Matt Mackall 1 sibling, 0 replies; 20+ messages in thread From: Nick Piggin @ 2008-10-15 16:46 UTC (permalink / raw) To: torvalds; +Cc: Pekka Enberg, Matt Mackall, linux-mm, linux-kernel On Thursday 16 October 2008 03:34, Nick Piggin wrote: > I think I see a possible memory ordering problem with SLOB: > In slab caches with constructors, the constructor is run > before returning the object to caller, with no memory barrier > afterwards. > > Now there is nothing that indicates the _exact_ behaviour > required here. Is it at all reasonable to expect ->ctor() to > be visible to all CPUs and not just the allocating CPU? > > SLAB and SLUB don't appear to have this problem. Of course, > they have per-CPU fastpath queues, so _can_ have effectively > exactly the same ordering issue if the object was brought > back into the "initialized" state before being freed, rather > than by ->ctor(). However in that case, it is at least > kind of visible to the caller. Although I guess it's just as much of a SLAB implementation detail as the lack of ->ctor() barrier... And I really doubt _any_ of the callers would have ever thought about either possible problem. I'd really hate to add a branch to the slab fastpath for this though. Maybe we just have to document it, assume there are no problems, and maybe take a look at some of the core users of this. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 16:34 [rfc] SLOB memory ordering issue Nick Piggin 2008-10-15 16:46 ` Nick Piggin @ 2008-10-15 16:54 ` Matt Mackall 2008-10-15 17:10 ` Nick Piggin 1 sibling, 1 reply; 20+ messages in thread From: Matt Mackall @ 2008-10-15 16:54 UTC (permalink / raw) To: Nick Piggin; +Cc: torvalds, Pekka Enberg, linux-mm, linux-kernel On Thu, 2008-10-16 at 03:34 +1100, Nick Piggin wrote: > I think I see a possible memory ordering problem with SLOB: > In slab caches with constructors, the constructor is run > before returning the object to caller, with no memory barrier > afterwards. > > Now there is nothing that indicates the _exact_ behaviour > required here. Is it at all reasonable to expect ->ctor() to > be visible to all CPUs and not just the allocating CPU? Do you have a failure scenario in mind? First, it's a categorical mistake for another CPU to be looking at the contents of an object unless it knows that it's in an allocated state. For another CPU to receive that knowledge (by reading a causally-valid pointer to it in memory), a memory barrier has to occur, no? -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 16:54 ` Matt Mackall @ 2008-10-15 17:10 ` Nick Piggin 2008-10-15 17:33 ` Linus Torvalds 2008-10-15 18:06 ` Nick Piggin 0 siblings, 2 replies; 20+ messages in thread From: Nick Piggin @ 2008-10-15 17:10 UTC (permalink / raw) To: Matt Mackall; +Cc: torvalds, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 03:54, Matt Mackall wrote: > On Thu, 2008-10-16 at 03:34 +1100, Nick Piggin wrote: > > I think I see a possible memory ordering problem with SLOB: > > In slab caches with constructors, the constructor is run > > before returning the object to caller, with no memory barrier > > afterwards. > > > > Now there is nothing that indicates the _exact_ behaviour > > required here. Is it at all reasonable to expect ->ctor() to > > be visible to all CPUs and not just the allocating CPU? > > Do you have a failure scenario in mind? > > First, it's a categorical mistake for another CPU to be looking at the > contents of an object unless it knows that it's in an allocated state. > > For another CPU to receive that knowledge (by reading a causally-valid > pointer to it in memory), a memory barrier has to occur, no? No. For (slightly bad) example. Some architectures have a ctor for their page table page slabs. Not a bad thing to do. Now they allocate these guys, take a lock, then insert them into the page tables. The lock is only an acquire barrier, so it can leak past stores. The read side is all lockless and in some cases even done by hardware. Now in _practice_, this is not a problem because some architectures don't have ctors, and I spotted the issue and put proper barriers in there. But it was a known fact that ctors were always used, and if I had assumed ctors were barriers so didn't need the wmb, then there would be a bug. Especially the fact that a lock doesn't order the stores makes me worried that a lockless read-side algorithm might have a bug here. Fortunately, most of them use RCU and probably use rcu_assign_pointer even if they do have ctors. But I can't be sure... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 17:10 ` Nick Piggin @ 2008-10-15 17:33 ` Linus Torvalds 2008-10-15 17:36 ` Linus Torvalds 2008-10-15 17:45 ` Nick Piggin 2008-10-15 18:06 ` Nick Piggin 1 sibling, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 17:33 UTC (permalink / raw) To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thu, 16 Oct 2008, Nick Piggin wrote: > > Now they allocate these guys, take a lock, then insert them into the > page tables. The lock is only an acquire barrier, so it can leak past > stores. I think that Matt's point was that the code is buggy regardless of any ctor or not. If you make an allocation visible to other CPU's, you would need to make sure that allocation is stable with a smp_wmb() before you update the pointer to that allocation. So the code that makes a page visible should just always do that synchronization. And it has nothing to do with ctors or not. It's true whether you do the initialization by hand, or whether you use a ctor. And more importantly, putting the write barrier in the ctor or in the memory allocator is simply broken. It's not a ctor/allocator issue. Why? Because even if you have a ctor, there is absolutely *nothing* that says that the ctor will be sufficient to initialize everything. Most ctors, in fact, are just initializing the basic fields - the person that does the allocation should finish things up. The fact that _some_ people using an allocator with a ctor may not do anything but the ctor to the page is immaterial. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 17:33 ` Linus Torvalds @ 2008-10-15 17:36 ` Linus Torvalds 2008-10-15 17:58 ` Matt Mackall 2008-10-15 17:45 ` Nick Piggin 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 17:36 UTC (permalink / raw) To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Wed, 15 Oct 2008, Linus Torvalds wrote: > > If you make an allocation visible to other CPU's, you would need to make > sure that allocation is stable with a smp_wmb() before you update the > pointer to that allocation. Just to clarify a hopefully obvious issue.. The assumption here is that you don't protect things with locking. Of course, if all people accessing the new pointer always have the appropriate lock, then memory ordering never matters, since the locks take care of it. So _most_ allocators obviously don't need to do any smp_wmb() at all. But the ones that expose things locklessly (where page tables are just one example) need to worry. Again, this is yet another reason to not put things in the allocator. The allocator cannot know, and shouldn't care. For all the exact same reasons that the allocator cannot know and shouldn't care whether the ctor results in a 'final' version or whether the allocator will do some final fixups. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 17:36 ` Linus Torvalds @ 2008-10-15 17:58 ` Matt Mackall 0 siblings, 0 replies; 20+ messages in thread From: Matt Mackall @ 2008-10-15 17:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nick Piggin, Pekka Enberg, linux-mm, linux-kernel On Wed, 2008-10-15 at 10:36 -0700, Linus Torvalds wrote: > > On Wed, 15 Oct 2008, Linus Torvalds wrote: > > > > If you make an allocation visible to other CPU's, you would need to make > > sure that allocation is stable with a smp_wmb() before you update the > > pointer to that allocation. > > Just to clarify a hopefully obvious issue.. > > The assumption here is that you don't protect things with locking. Of > course, if all people accessing the new pointer always have the > appropriate lock, then memory ordering never matters, since the locks take > care of it. Right. This is the 99.9% case and is why we should definitely not put an additional barrier in the allocator. Lockless users are already on their own with regard to memory ordering issues, so it makes sense for them to absorb the (often nil) incremental cost of ensuring object initialization gets flushed. I feel like we ought to document this in the SLAB API, but at the same time, I think we'll scare more people than we'll enlighten. -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 17:33 ` Linus Torvalds 2008-10-15 17:36 ` Linus Torvalds @ 2008-10-15 17:45 ` Nick Piggin 2008-10-15 18:03 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Nick Piggin @ 2008-10-15 17:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 04:33, Linus Torvalds wrote: > On Thu, 16 Oct 2008, Nick Piggin wrote: > > Now they allocate these guys, take a lock, then insert them into the > > page tables. The lock is only an acquire barrier, so it can leak past > > stores. > > I think that Matt's point was that the code is buggy regardless of any > ctor or not. > > If you make an allocation visible to other CPU's, you would need to make > sure that allocation is stable with a smp_wmb() before you update the > pointer to that allocation. What do you mean by the allocation is stable? Let's just talk in loads and stores and order. You need to make sure previous stores to initialise the object become visible before subsequent store to make the object visible. No questions about that (I think that's what you meant by make the alloc stable). 1. However, if the object is already fully initialised at the point the caller gets it out of the allocator, then the caller doesn't need to make any stores to initialise it obviously. 2. I think it could be easy to assume that the allocated object that was initialised with a ctor for us already will have its initializing stores ordered when we get it from slab. So in my page table almost-example, by combining 1 and 2, one might think it is OK to leave out those smp_wmb()s. And it would be valid code if all those assumptions _were_ true. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 17:45 ` Nick Piggin @ 2008-10-15 18:03 ` Linus Torvalds 2008-10-15 18:12 ` Nick Piggin 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 18:03 UTC (permalink / raw) To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thu, 16 Oct 2008, Nick Piggin wrote: > > What do you mean by the allocation is stable? "all writes done to it before it's exposed". > 2. I think it could be easy to assume that the allocated object that was > initialised with a ctor for us already will have its initializing stores > ordered when we get it from slab. You make tons of assumptions. You assume that (a) unlocked accesses are the normal case and should be something the allocator should prioritize/care about. (b) that if you have a ctor, it's the only thing the allocator will do. I don't think either of those assumptions are at all relevant or interesting. Quite the reverse - I'd expect them to be in a very small minority. Now, obviously, on pretty much all machines out there (ie x86[-64] and UP ARM), smp_wmb() is a no-op, so in that sense we could certainly say that "sure, this is a total special case, but we can add a smp_wmb() anyway since it won't cost us anything". On the other hand, on the machines where it doesn't cost us anything, it obviously doesn't _do_ anything either, so that argument is pretty dubious. And on machines where the memory ordering _can_ matter, it's going to add cost to the wrong point. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:03 ` Linus Torvalds @ 2008-10-15 18:12 ` Nick Piggin 2008-10-15 18:19 ` Matt Mackall 2008-10-15 18:29 ` Linus Torvalds 0 siblings, 2 replies; 20+ messages in thread From: Nick Piggin @ 2008-10-15 18:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 05:03, Linus Torvalds wrote: > On Thu, 16 Oct 2008, Nick Piggin wrote: > > What do you mean by the allocation is stable? > > "all writes done to it before it's exposed". > > > 2. I think it could be easy to assume that the allocated object that was > > initialised with a ctor for us already will have its initializing stores > > ordered when we get it from slab. > > You make tons of assumptions. > > You assume that > (a) unlocked accesses are the normal case and should be something the > allocator should prioritize/care about. > (b) that if you have a ctor, it's the only thing the allocator will do. Yes, as I said, I do not want to add a branch and/or barrier to the allocator for this. I just want to flag the issue and discuss whether there is anything that can be done about it. > I don't think either of those assumptions are at all relevant or > interesting. Quite the reverse - I'd expect them to be in a very small > minority. They will be in the minority or non-existant, but obviously there only need be one "counterexample" bug to disprove a claim that it never matters. > Now, obviously, on pretty much all machines out there (ie x86[-64] and UP > ARM), smp_wmb() is a no-op, so in that sense we could certainly say that > "sure, this is a total special case, but we can add a smp_wmb() anyway > since it won't cost us anything". > > On the other hand, on the machines where it doesn't cost us anything, it > obviously doesn't _do_ anything either, so that argument is pretty > dubious. > > And on machines where the memory ordering _can_ matter, it's going to add > cost to the wrong point. When I said "I'd really hate to add a branch to the slab fastpath", it wasn't a tacit acknowlegement that the barrier is the only way to go, if it sounded that way. I meant: I'd *really* hate to add a branch to the slab fastpath :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:12 ` Nick Piggin @ 2008-10-15 18:19 ` Matt Mackall 2008-10-15 18:35 ` Nick Piggin 2008-10-15 18:29 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Matt Mackall @ 2008-10-15 18:19 UTC (permalink / raw) To: Nick Piggin; +Cc: Linus Torvalds, Pekka Enberg, linux-mm, linux-kernel On Thu, 2008-10-16 at 05:12 +1100, Nick Piggin wrote: > On Thursday 16 October 2008 05:03, Linus Torvalds wrote: > > On Thu, 16 Oct 2008, Nick Piggin wrote: > > > What do you mean by the allocation is stable? > > > > "all writes done to it before it's exposed". > > > > > 2. I think it could be easy to assume that the allocated object that was > > > initialised with a ctor for us already will have its initializing stores > > > ordered when we get it from slab. > > > > You make tons of assumptions. > > > > You assume that > > (a) unlocked accesses are the normal case and should be something the > > allocator should prioritize/care about. > > (b) that if you have a ctor, it's the only thing the allocator will do. > > Yes, as I said, I do not want to add a branch and/or barrier to the > allocator for this. I just want to flag the issue and discuss whether > there is anything that can be done about it. Well the alternative is to have someone really smart investigate all the lockless users of ctors and add appropriate barriers. I suspect that's a fairly small set and that you're already familiar with most of them. But yes, I think you may be on to a real problem. It might also be worth devoting a few neurons to thinking about zeroed allocations. -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:19 ` Matt Mackall @ 2008-10-15 18:35 ` Nick Piggin 2008-10-15 18:43 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Nick Piggin @ 2008-10-15 18:35 UTC (permalink / raw) To: Matt Mackall; +Cc: Linus Torvalds, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 05:19, Matt Mackall wrote: > On Thu, 2008-10-16 at 05:12 +1100, Nick Piggin wrote: > > On Thursday 16 October 2008 05:03, Linus Torvalds wrote: > > > On Thu, 16 Oct 2008, Nick Piggin wrote: > > > > What do you mean by the allocation is stable? > > > > > > "all writes done to it before it's exposed". > > > > > > > 2. I think it could be easy to assume that the allocated object that > > > > was initialised with a ctor for us already will have its initializing > > > > stores ordered when we get it from slab. > > > > > > You make tons of assumptions. > > > > > > You assume that > > > (a) unlocked accesses are the normal case and should be something the > > > allocator should prioritize/care about. > > > (b) that if you have a ctor, it's the only thing the allocator will > > > do. > > > > Yes, as I said, I do not want to add a branch and/or barrier to the > > allocator for this. I just want to flag the issue and discuss whether > > there is anything that can be done about it. > > Well the alternative is to have someone really smart investigate all the > lockless users of ctors and add appropriate barriers. I suspect that's a > fairly small set and that you're already familiar with most of them. I thought someone might volunteer me for that job :) Actually, there are surprisingly huge number of them. What I would be most comfortable doing, if I was making a kernel to run my life support system on an SMP powerpc box, would be to spend zero time on all the drivers and whacky things with ctors and just add smp_wmb() after them if they are not _totally_ obvious. There is only one user in mm/. Which I had a quick look at. There might be other issues with it, but maybe with a bit more locking or an explicit barrier or two, it will become more obviously correct. fs/, inode, dentries, seem to be a very big user. I'd _guess_ they should be OK because the inode and dentry caches are pretty serialised as it is. I think unless a fs is doing something really crazy, it would be hard for one CPU to get to a new dentry, say, before it has been locked up and into the dcache. Casting a quick eye over things wouldn't hurt, though. > But yes, I think you may be on to a real problem. It might also be worth > devoting a few neurons to thinking about zeroed allocations. Hmm, that's a point. The page nor slab allocators don't order those before we get them either. I might have a sniff around mm/ or so, but no way I could audit all kzalloc etc. users in the kernel. Maybe that case is a bit more obvious though, because we're not conceptually thinking about pulling out an already-set-up-object living happily in ctor land somewhere. We explicitly say: allocate me something, and zero it out. It's 5am and I'm rambling at this point. Probably either one is about as likely to have bugs, and maybe only _very slightly_ more likely to have an ordering bug than any other kernel code. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:35 ` Nick Piggin @ 2008-10-15 18:43 ` Linus Torvalds 2008-10-15 19:19 ` Nick Piggin 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 18:43 UTC (permalink / raw) To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thu, 16 Oct 2008, Nick Piggin wrote: > > Actually, there are surprisingly huge number of them. What I would be > most comfortable doing, if I was making a kernel to run my life support > system on an SMP powerpc box, would be to spend zero time on all the > drivers and whacky things with ctors and just add smp_wmb() after them > if they are not _totally_ obvious. WHY? THIS HAS NOTHING TO DO WITH CONSTRUCTORS! If the driver is using locking, there is no memory ordering issues what-so-ever. And if the driver isn't using locking, IT IS BROKEN. It's that simple. Why do you keep bringing up non-issues? What matters is not constructors. Never has been. Constructors are actually very rare, it's much more common to do ptr = kmalloc(..) .. initialize it by hand .. and why do you think constructors are somehow different? They're not. What matter is how you look things up on the other CPU's. If you don't use locking, you use some lockless thing, and then you need to be careful about memory ordering. And quite frankly, if you're a driver, and you're trying to do lockless algorithms, you're just being crazy. You're going to have much worse bugs, and again, whether you use constructors or pink elephants is going to be totally irrelevant. So why do you bring up these totally pointless things? Why do you bring up drivers? Why do you bring up constructors? Why, why, why? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:43 ` Linus Torvalds @ 2008-10-15 19:19 ` Nick Piggin 2008-10-15 19:47 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Nick Piggin @ 2008-10-15 19:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 05:43, Linus Torvalds wrote: > On Thu, 16 Oct 2008, Nick Piggin wrote: > > Actually, there are surprisingly huge number of them. What I would be > > most comfortable doing, if I was making a kernel to run my life support > > system on an SMP powerpc box, would be to spend zero time on all the > > drivers and whacky things with ctors and just add smp_wmb() after them > > if they are not _totally_ obvious. > > WHY? I guess I wouldn't bother with your kernel. I was being hypothetical. Can you _prove_ no code has a bug due specifically to this issue? > THIS HAS NOTHING TO DO WITH CONSTRUCTORS! > > If the driver is using locking, there is no memory ordering issues > what-so-ever. > > And if the driver isn't using locking, IT IS BROKEN. Did you read the anon_vma example? It's broken if it assumes the objects coming out of its slab are always "stable". > It's that simple. Why do you keep bringing up non-issues? Why are you being antagonistic and assuming I'm wrong instead of considering you mistunderstand me, maybe I'm not a retard? I am bad at explaining myself, but I'll try once more. > What matters is not constructors. Never has been. Constructors are > actually very rare, it's much more common to do > > ptr = kmalloc(..) > .. initialize it by hand .. > > and why do you think constructors are somehow different? They're not. I think they might be interpreted or viewed by the caller as giving a "stable" object. It is rather more obvious to a caller that it has previous unordered stores if it is doing this ptr = kmalloc(..) .. initialize it by hand .. I haven't dealt much with constructors myself so I haven't really had to think about it. But I'm sure I could have missed it and been fooled. If you still don't agree, then fine; if I find a bug I'll send a patch. I don't want to keep arguing. > What matter is how you look things up on the other CPU's. If you don't use > locking, you use some lockless thing, and then you need to be careful > about memory ordering. > > And quite frankly, if you're a driver, and you're trying to do lockless > algorithms, you're just being crazy. You're going to have much worse bugs, > and again, whether you use constructors or pink elephants is going to be > totally irrelevant. > > So why do you bring up these totally pointless things? Why do you bring up > drivers? Why do you bring up constructors? Why, why, why? I'll try to keep them to myself in future. Thanks, Nick -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 19:19 ` Nick Piggin @ 2008-10-15 19:47 ` Linus Torvalds 0 siblings, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 19:47 UTC (permalink / raw) To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thu, 16 Oct 2008, Nick Piggin wrote: > > I guess I wouldn't bother with your kernel. I was being hypothetical. > Can you _prove_ no code has a bug due specifically to this issue? Nick, nobody can prove anything but the most trivial programs. > Did you read the anon_vma example? It's broken if it assumes the objects > coming out of its slab are always "stable". So why do you blame SLOB/SLAB? This is my whole and only point - you're pointing at all the wrong things, and then I get upset when I point out to you - over and over again - that you point to the wrong thing, and you just keep on (once more) pointing to it. Can you see my frustration? You keep on claiming this is somehow an issue of the slab allocator, and seem to refuse to just read what I write. But let me try it again: - The object you get from kmalloc (or *any* allocator) is a per-CPU object. It's your _local_ memory area. And it has to be that way, because no allocator can ever know the difference between objects that are going to have global visibility and objects that don't. So the allocator has to basically assume the cheap case. - Yes, we could add a "smp_wmb()" at the end of all allocators, but that would be a pointless no-op on architectures where it doesn't matter, and it would be potentially expensive on architectures where it _does_ matter. In other words, in neither case is it the right thing to do. - Most allocations by _far_ (at least in the static sense of "there's a lot of kmalloc/kmem_cache_alloc's in the kernel") are going to be used for things that are either thread-local (think temporary data structures like "__getname()" for path allocators) or are going to be used with proper locking. NONE OF THOSE CASES WANT THE OVERHEAD! And they are the *common* ones. - constructiors really have absolutely nothing to do with anything. What about kzalloc()? That's an implicit "constructor" too. Do you want the smp_wmb() there for that too? Do you realize that 99.9% of all such users will fill in a few bytes/fields in _addition_ to clearing the structure they just allocated? You do realize that almost nobody wants a really empty data structure? You _do_ realize that the "smp_wmb()" in the allocator IS TOTALLY USELESS if the code that did the allocation then updates a few other fields too? ADDING the smp_wmb() at an allocation point WOULD BE ACTIVELY MISLEADING. Anybody who thinks that it helps is just fooling himself. We're *much* better off just telling everybody that if they think they can do lockless data structures, they have to do the memory ordering at the _insertion_ point, and stop believing in fairies and wizards and in allocators doing it for them! - For _all_ of these reasons, any time you say that this is an allocator issue, or a constructor issue, I don't need to even bother reading any more. Because you've just shown yourself to not read what I wrote, nor understand the issue. So if you want to have a constructive discussion, you need to - *read* what I wrote. UNDERSTAND that memory ordering is a non-issue when there is locking involved, and that locking is still the default approach for any normal data structure. - *stop* talking about "constructors" and "SLOB allocators". Because as long as you do, you're not making sense. and if you can do that, I can treat you like you're worth talking to. But as long as you cannot accept that, what's the point? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:12 ` Nick Piggin 2008-10-15 18:19 ` Matt Mackall @ 2008-10-15 18:29 ` Linus Torvalds 1 sibling, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 18:29 UTC (permalink / raw) To: Nick Piggin; +Cc: Matt Mackall, Pekka Enberg, linux-mm, linux-kernel On Thu, 16 Oct 2008, Nick Piggin wrote: > > When I said "I'd really hate to add a branch to the slab fastpath", it > wasn't a tacit acknowlegement that the barrier is the only way to go, > if it sounded that way. > > I meant: I'd *really* hate to add a branch to the slab fastpath :) Well, quite frankly, your choice of subject line and whole point of argument may have confused me. You started out - and continue to - make this sound like it's a SLAB/SLOB/SLUB issue. It's not. I agree there is quite likely memory ordering issues - possibly old ones, but quite possibly also ones that have just happened fairly recently as we've done more unlocked lookups - and all I've ever disagreed with is how you seem to have mixed this up with the allocator. And I still don't understand why you even _mention_ the slab fastpath. It seems totally immaterial. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 17:10 ` Nick Piggin 2008-10-15 17:33 ` Linus Torvalds @ 2008-10-15 18:06 ` Nick Piggin 2008-10-15 18:26 ` Linus Torvalds 2008-10-17 20:29 ` Linus Torvalds 1 sibling, 2 replies; 20+ messages in thread From: Nick Piggin @ 2008-10-15 18:06 UTC (permalink / raw) To: Matt Mackall, Hugh Dickins; +Cc: torvalds, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 04:10, Nick Piggin wrote: > On Thursday 16 October 2008 03:54, Matt Mackall wrote: > > On Thu, 2008-10-16 at 03:34 +1100, Nick Piggin wrote: > > > I think I see a possible memory ordering problem with SLOB: > > > In slab caches with constructors, the constructor is run > > > before returning the object to caller, with no memory barrier > > > afterwards. > > > > > > Now there is nothing that indicates the _exact_ behaviour > > > required here. Is it at all reasonable to expect ->ctor() to > > > be visible to all CPUs and not just the allocating CPU? > > > > Do you have a failure scenario in mind? > > > > First, it's a categorical mistake for another CPU to be looking at the > > contents of an object unless it knows that it's in an allocated state. > > > > For another CPU to receive that knowledge (by reading a causally-valid > > pointer to it in memory), a memory barrier has to occur, no? > > No. > > For (slightly bad) example. Some architectures have a ctor for their > page table page slabs. Not a bad thing to do. > > Now they allocate these guys, take a lock, then insert them into the > page tables. The lock is only an acquire barrier, so it can leak past > stores. > > The read side is all lockless and in some cases even done by hardware. > > Now in _practice_, this is not a problem because some architectures > don't have ctors, and I spotted the issue and put proper barriers in > there. But it was a known fact that ctors were always used, and if I > had assumed ctors were barriers so didn't need the wmb, then there > would be a bug. > > Especially the fact that a lock doesn't order the stores makes me > worried that a lockless read-side algorithm might have a bug here. > Fortunately, most of them use RCU and probably use rcu_assign_pointer > even if they do have ctors. But I can't be sure... OK, now I have something that'll blow your fuckin mind. anon_vma_cachep. P0 do_anonymous_page() anon_vma_prepare() ctor(anon_vma) [sets vma->anon_vma = anon_vma] P1 do_anonymous_page() anon_vma_prepare() [sees P0 already allocated it] lru_cache_add(page) [flushes page to lru] page_add_anon_rmap (increments mapcount) page_set_anon_rmap [sets page->anon_vma = anon_vma] P2 find page from lru page_referenced() page_referenced_anon() page_lock_anon_vma() [loads anon_vma from page->anon_vma] spin_lock(&anon_vma->lock) Who was it that said memory ordering was self-evident? For everyone else: P1 sees P0's store to vma->anon_vma, then P2 sees P1's store to page->anon_vma (among others), but P2 does not see P0's ctor store to initialise anon_vma->lock. And there seems like another bug there too, but just a plain control race rather than strictly[*] a data race, P0 is executing list_add_tail of vma to anon_vma->head at some point here too, so even assuming we're running on a machine with transitive store ordering, then the above race can't hit, then P2 subsequently wants to run a list_for_each_entry over anon_vma->head while P0 is in the process of modifying it. Am I the one who's bamboozled, or can anyone confirm? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:06 ` Nick Piggin @ 2008-10-15 18:26 ` Linus Torvalds 2008-10-15 18:50 ` Nick Piggin 2008-10-17 20:29 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2008-10-15 18:26 UTC (permalink / raw) To: Nick Piggin Cc: Matt Mackall, Hugh Dickins, Pekka Enberg, linux-mm, linux-kernel On Thu, 16 Oct 2008, Nick Piggin wrote: > > Who was it that said memory ordering was self-evident? Nobody has _ever_ said that memory ordering is self-evident. Quite the reverse. What we've said is that it's not a ctor issue. This has nothing what-so-ever to do with ctors, and everything to do with the fact that lockless is hard. And the general rule is: to find a page (or any other data structures) on another CPU, you need to insert it into the right data structures. And that insertion event needs to either be locked, or it needs to be ordered. But notice that it's the _insertion_ event. Not the ctor. Not the allocator. It's the person _doing_ the allocation that needs to order things. See? And no, I didn't look at your exact case. But for pages in page tables, we'd need to have the right smp_wmb() at the "set_pte[_at]()" stage, either inside that macro or in the caller. We used to only care about the page _contents_ (because the only unlocked access was the one that was done by hardware), but now that we do unlocked lookups in software too, we need to make sure the "struct page" itself is also valid. For non-page-table lookups (LRU, radix trees, etc etc), the rules are different. Again, it's not an _allocator_ (or ctor) issue, it's about the point where you insert the thing. If you insert the page using a lock, you need not worry about memory ordering at all. And if you insert it using RCU, you do. This is *all* we have argued about. The argument is simple: this has NOTHING to do with the allocator, and has NOTHING to do with constructors. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:26 ` Linus Torvalds @ 2008-10-15 18:50 ` Nick Piggin 0 siblings, 0 replies; 20+ messages in thread From: Nick Piggin @ 2008-10-15 18:50 UTC (permalink / raw) To: Linus Torvalds Cc: Matt Mackall, Hugh Dickins, Pekka Enberg, linux-mm, linux-kernel On Thursday 16 October 2008 05:26, Linus Torvalds wrote: > On Thu, 16 Oct 2008, Nick Piggin wrote: > > Who was it that said memory ordering was self-evident? > > Nobody has _ever_ said that memory ordering is self-evident. Quite the > reverse. Heh, no that was a dig at someone who actually did say that the other day. Not you or Matt. > What we've said is that it's not a ctor issue. This has nothing > what-so-ever to do with ctors, and everything to do with the fact that > lockless is hard. > > And the general rule is: to find a page (or any other data structures) on > another CPU, you need to insert it into the right data structures. And > that insertion event needs to either be locked, or it needs to be ordered. Sure. And "ordered" doesn't come by itself, but specifically with preceeding stores that initialise the object. > But notice that it's the _insertion_ event. Not the ctor. Not the > allocator. It's the person _doing_ the allocation that needs to order > things. > > See? I see it as a joint effort between the code initialising the object, and the code making it visible. But whatever. Whether or not we agree on the exact obviousness or whatever, I don't want to argue. I think the solution is agreed more or less "very unlikely to be bugs, and anyway the only way to fix them is to document it and read code". > And no, I didn't look at your exact case. But for pages in page tables, > we'd need to have the right smp_wmb() at the "set_pte[_at]()" stage, > either inside that macro or in the caller. The example was actually talking about _page table pages_ in page tables. > We used to only care about the page _contents_ (because the only unlocked > access was the one that was done by hardware), but now that we do unlocked > lookups in software too, we need to make sure the "struct page" itself is > also valid. I added those. Actually we _also_ need them for lockless hardware and software walks, and not just for struct page, but also page table page contents (which is what some architectures initialise with ctor). Because otherwise you can insert your page table page, only to have that store pass an earlier store to invalidate one of its ptes, and boom! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [rfc] SLOB memory ordering issue 2008-10-15 18:06 ` Nick Piggin 2008-10-15 18:26 ` Linus Torvalds @ 2008-10-17 20:29 ` Linus Torvalds 1 sibling, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2008-10-17 20:29 UTC (permalink / raw) To: Nick Piggin Cc: Matt Mackall, Hugh Dickins, Pekka Enberg, linux-mm, linux-kernel Ok, finally looked at this. There is indeed a locking bug there. "anon_vma_prepare()" optimistically looks at vma->anon_vma without taking the &mm->page_table_lock. That's not right. Of course, we could just take the lock, but in this case it's probably ok to just admit that we have a lockless algorithm. But that implies that we need to do the right memory ordering. And that, in turn, doesn't just imply a "smp_wmb()" - if you do memory ordering, you need to do it on *both* sides, so now the other side needs to also do a matching smp_rmb(). Or, in this case, smp_rmb_depends(), I guess. That, btw, is an important part of memory ordering. You can never do ordering on just one side. A "smp_wmb()" on its own is always nonsensical. It always needs to be paired with a "smp_rmb()" variant. Something like the appended may fix it. But I do think we have a potential independent issue with the new page table lookup code now that it's lock-free. We have the smp_rmb() calls in gup_get_pte() (at least on x86), when we look things up, but we don't actually have a lot of smp_wmb()'s there when we insert the page. For the anonymous page case, we end up doing a page_add_new_anon_rmap(); before we do the set_pte_at() that actually exposes it, and that does the whole page->mapping = (struct address_space *) anon_vma; page->index = linear_page_index(vma, address); thing, but there is no write barrier between those and the actual write to the page tables, so when GUP looks up the page, it can't actually depend on page->mappign or anything else! Now, this really isn't an issue on x86, since smp_wmb() is a no-op, and the compiler won't be re-ordering the writes, but in general I do think that now that we do lockless lookup of pages from the page tables, we probably do need smp_wmb()'s there just in front of the "set_pte_at()" calls. NOTE NOTE! The patch below is only about "page->anon_vma", not about the GUP lookup and page->mapping/index fields. That's an independent issue. And notice? This has _nothing_ to do with constructors or allocators. And of course - this patch is totally untested, and may well need some thinking about. Linus --- mm/rmap.c | 18 ++++++++++++++++++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 0383acf..21d09bb 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -81,6 +81,13 @@ int anon_vma_prepare(struct vm_area_struct *vma) /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + /* + * We hold the mm->page_table_lock, but another + * CPU may be doing an optimistic load (the one + * at the top), and we want to make sure that + * the anon_vma changes are visible. + */ + smp_wmb(); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -92,6 +99,17 @@ int anon_vma_prepare(struct vm_area_struct *vma) if (unlikely(allocated)) anon_vma_free(allocated); } + /* + * Subtle: we looked up anon_vma without any locking + * (in the comon case), and are going to look at the + * spinlock etc behind it. In order to know that it's + * initialized, we need to do a read barrier here. + * + * We can use the cheaper "depends" version, since we + * are following a pointer, and only on alpha may that + * give a stale value. + */ + smp_read_barrier_depends(); return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-10-17 20:29 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-10-15 16:34 [rfc] SLOB memory ordering issue Nick Piggin 2008-10-15 16:46 ` Nick Piggin 2008-10-15 16:54 ` Matt Mackall 2008-10-15 17:10 ` Nick Piggin 2008-10-15 17:33 ` Linus Torvalds 2008-10-15 17:36 ` Linus Torvalds 2008-10-15 17:58 ` Matt Mackall 2008-10-15 17:45 ` Nick Piggin 2008-10-15 18:03 ` Linus Torvalds 2008-10-15 18:12 ` Nick Piggin 2008-10-15 18:19 ` Matt Mackall 2008-10-15 18:35 ` Nick Piggin 2008-10-15 18:43 ` Linus Torvalds 2008-10-15 19:19 ` Nick Piggin 2008-10-15 19:47 ` Linus Torvalds 2008-10-15 18:29 ` Linus Torvalds 2008-10-15 18:06 ` Nick Piggin 2008-10-15 18:26 ` Linus Torvalds 2008-10-15 18:50 ` Nick Piggin 2008-10-17 20:29 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox