* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Andrew Morton @ 2007-08-10  7:40 UTC
To: Christoph Lameter; +Cc: linux-mm

On Fri, 10 Aug 2007 05:59:03 GMT Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:

> Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1ceef40249f21eceabf8633934d94962e7d8e1d7
> Commit:     1ceef40249f21eceabf8633934d94962e7d8e1d7
> Parent:     fcda3d89bf1366f6801447eab2d8a75ac5b9c4ce
> Author:     Christoph Lameter <clameter@sgi.com>
> AuthorDate: Tue Aug 7 15:11:48 2007 -0700
> Committer:  Christoph Lameter <clameter@sgi.com>
> CommitDate: Thu Aug 9 21:57:16 2007 -0700
>
>     SLUB: Fix dynamic dma kmalloc cache creation
>
>     The dynamic dma kmalloc creation can run into trouble if a
>     GFP_ATOMIC allocation is the first one performed for a certain size
>     of dma kmalloc slab.
>
>     - Move the adding of the slab to sysfs into a workqueue
>       (sysfs does GFP_KERNEL allocations)
>     - Do not call kmem_cache_destroy() (uses slub_lock)
>     - Only acquire the slub_lock once and--if we cannot wait--do a trylock.
>
>     This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
>     for a range of sizes failing due to another process holding the slub_lock.
>     However, we only need to acquire the lock once in order to establish
>     each power-of-two DMA kmalloc cache. The possible conflict is with the
>     slub_lock taken during slab management actions (create / remove a slab
>     cache).
>
>     It is rather typical that a driver will first fill its buffers using
>     GFP_KERNEL allocations, which will wait until the slub_lock can be
>     acquired. Drivers will also create their slab caches first, outside of
>     an atomic context, before starting to use atomic kmalloc from an
>     interrupt context.
>
>     If there are any failures then they will occur early after boot or when
>     multiple drivers are loaded concurrently. Drivers can already
>     accommodate failures of GFP_ATOMIC for other reasons. Retries will then
>     create the slab.

Well that was fairly foul.  What was wrong with turning slub_lock into a
spinlock?
> static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
> {
> 	struct kmem_cache *s;
> -	struct kmem_cache *x;
> 	char *text;
> 	size_t realsize;
>
> @@ -2289,22 +2306,36 @@ static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
> 		return s;
>
> 	/* Dynamically create dma cache */
> -	x = kmalloc(kmem_size, flags & ~SLUB_DMA);
> -	if (!x)
> -		panic("Unable to allocate memory for dma cache\n");
> +	if (flags & __GFP_WAIT)
> +		down_write(&slub_lock);
> +	else {
> +		if (!down_write_trylock(&slub_lock))
> +			goto out;
> +	}
> +
> +	if (kmalloc_caches_dma[index])
> +		goto unlock_out;
>
> 	realsize = kmalloc_caches[index].objsize;
> -	text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d",
> -			(unsigned int)realsize);
> -	s = create_kmalloc_cache(x, text, realsize, flags);
> -	down_write(&slub_lock);
> -	if (!kmalloc_caches_dma[index]) {
> -		kmalloc_caches_dma[index] = s;
> -		up_write(&slub_lock);
> -		return s;
> +	text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d", (unsigned int)realsize),
> +	s = kmalloc(kmem_size, flags & ~SLUB_DMA);
> +
> +	if (!s || !text || !kmem_cache_open(s, flags, text,
> +			realsize, ARCH_KMALLOC_MINALIGN,
> +			SLAB_CACHE_DMA|__SYSFS_ADD_DEFERRED, NULL)) {
> +		kfree(s);
> +		kfree(text);
> +		goto unlock_out;
> 	}
> +
> +	list_add(&s->list, &slab_caches);
> +	kmalloc_caches_dma[index] = s;
> +
> +	schedule_work(&sysfs_add_work);

sysfs_add_work could be already pending, or running.  boom.

> +unlock_out:
> 	up_write(&slub_lock);
> -	kmem_cache_destroy(s);
> +out:
> 	return kmalloc_caches_dma[index];
> }
* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Christoph Lameter @ 2007-08-10 17:40 UTC
To: Andrew Morton; +Cc: linux-mm

On Fri, 10 Aug 2007, Andrew Morton wrote:

> Well that was fairly foul.  What was wrong with turning slub_lock into a
> spinlock?

It would make things even worse because we would then always have to do
atomic allocations while holding the lock. Or allocate beforehand and then
take the lock to check whether someone else has created it. If so we would
need to fall back, meaning we cannot avoid kmem_cache_destroy() during
dynamic cache creation. The trylock avoids the kmem_cache_destroy() and is
minimally invasive.

> > +	schedule_work(&sysfs_add_work);
>
> sysfs_add_work could be already pending, or running.  boom.

sysfs_add_work takes the slub_lock. It cannot be running.
* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Andrew Morton @ 2007-08-10 18:40 UTC
To: Christoph Lameter; +Cc: linux-mm

On Fri, 10 Aug 2007 10:40:15 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 10 Aug 2007, Andrew Morton wrote:
>
> > Well that was fairly foul.  What was wrong with turning slub_lock into a
> > spinlock?
>
> It would make things even worse because we would then always have to do
> atomic allocations while holding the lock.

That would be dumb.

> Or allocate beforehand and then take the lock to check whether someone
> else has created it.

Obviously better.

> If so we would need to fall back, meaning we cannot avoid
> kmem_cache_destroy() during dynamic cache creation.

I think you meant kmem_cache_close().  There's nothing wrong with running
kmem_cache_close() synchronously, inside a spinlock.

> The trylock avoids the kmem_cache_destroy() and is minimally invasive.

I see no need for a kmem_cache_destroy() call.  The sysfs stuff hasn't
been created yet.

The trylock is *revolting*.  They always are.  They introduce
rarely-occurring special cases which get little runtime testing, and which
often only get exercised with certain configs.

As an example, look at what this patch did.  There are crufty old drivers
out there which do GFP_ATOMIC allocations at init-time because they got
themselves into a locking mess.  Old scsi drivers come to mind, and they
often use GFP_DMA too.  We've now gone and increased the probability of
those allocations failing.  Contrary to the assertions in the changelog,
those allocations will not be retried.  Generally, if an allocation fails
at initialisation time a crufty old driver will either a) fail the
initialisation (ie: boot failure), b) panic or c) oops the kernel.

> > > +	schedule_work(&sysfs_add_work);
> >
> > sysfs_add_work could be already pending, or running.  boom.
>
> sysfs_add_work takes the slub_lock. It cannot be running.

It _can_ be running and it _can_ be pending.  But yes, given
schedule_work()'s behaviour when an already-pending work is rescheduled,
and given the locking which you have in there, that part of the code
appears to be reliable.
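In code, the allocate-beforehand alternative both sides are circling
around would look roughly like this (a hypothetical sketch only: the
name dma_lock and the fail label are invented here, the other
identifiers come from the patch above, and the real version appears in
Christoph's patch later in this thread):

	/* Do all allocation work before taking the (hypothetical) lock,
	   so nothing inside the critical section can sleep or fail. */
	s = kmalloc(kmem_size, flags & ~SLUB_DMA);
	if (!s || !kmem_cache_open(s, flags, text, realsize,
			ARCH_KMALLOC_MINALIGN, SLAB_CACHE_DMA, NULL))
		goto fail;

	spin_lock(&dma_lock);
	if (kmalloc_caches_dma[index]) {
		/* Another CPU won the race.  The sysfs entry was never
		   created, so kmem_cache_close() suffices and no
		   kmem_cache_destroy() is needed. */
		spin_unlock(&dma_lock);
		kmem_cache_close(s);
		kfree(s);
	} else {
		kmalloc_caches_dma[index] = s;
		spin_unlock(&dma_lock);
	}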
* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Christoph Lameter @ 2007-08-10 18:52 UTC
To: Andrew Morton; +Cc: linux-mm

On Fri, 10 Aug 2007, Andrew Morton wrote:

> > If so we would need to fall back, meaning we cannot avoid
> > kmem_cache_destroy() during dynamic cache creation.
>
> I think you meant kmem_cache_close().  There's nothing wrong with running
> kmem_cache_close() synchronously, inside a spinlock.

Using kmem_cache_close() would be possible if we defer the sysfs add as in
the existing patch and keep the unrolling of the create_kmalloc_cache()
function.

> > The trylock avoids the kmem_cache_destroy() and is minimally invasive.
>
> I see no need for a kmem_cache_destroy() call.  The sysfs stuff hasn't
> been created yet.

Yes, after my patch that is true. It's not true before the patch; before,
we called create_kmalloc_cache().

> The trylock is *revolting*.  They always are.  They introduce
> rarely-occurring special cases which get little runtime testing, and which
> often only get exercised with certain configs.

This is an extremely rare condition for rarely used functionality that we
are dealing with here. The lock only needs to be acquired a very small
number of times. Changing to a spinlock is a significant change in slub,
certainly not fit for 2.6.23. A spinlock would also make later allocations
of per-node and per-cpu structures (cpu hotplug / node hotplug) while
looping over the list of all slabs very difficult.

> As an example, look at what this patch did.  There are crufty old drivers
> out there which do GFP_ATOMIC allocations at init-time because they got
> themselves into a locking mess.  Old scsi drivers come to mind, and they
> often use GFP_DMA too.  We've now gone and increased the probability of
> those allocations failing.

Given that SLUB has been out there for 6 months with this issue and we
have not heard from them, I think that is very unlikely. The issue only
surfaced when someone started tinkering with a new driver that needed
GFP_DMA for allocs < 1GB.

> > sysfs_add_work takes the slub_lock. It cannot be running.
>
> It _can_ be running and it _can_ be pending.  But yes, given
> schedule_work()'s behaviour when an already-pending work is rescheduled,
> and given the locking which you have in there, that part of the code
> appears to be reliable.

Well, we both reversed position on that one. See the other message.

After the changes for SLUB (page allocator pass-through) in mm we have
reduced the number of kmalloc slabs to PAGE_SHIFT. We could create all
kmalloc dma slabs at init time without wasting too much memory for the mm
version of SLUB.
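For illustration, that init-time idea could look something like the
sketch below. This is hypothetical, untested code: dma_caches[] and the
helper name are invented, the index bounds are simplified, and it reuses
the pre-patch create_kmalloc_cache() calling convention (cache, name,
size, gfp flags) quoted at the top of the thread:

	/* Statically reserve the cache structures so nothing needs to
	   be kmalloc'ed when a DMA kmalloc size is first used. */
	static struct kmem_cache dma_caches[KMALLOC_SHIFT_HIGH + 1];

	static void __init create_all_dma_kmalloc_caches(void)
	{
		int i;

		for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
			/* GFP_KERNEL is fine: early init, never atomic */
			char *name = kasprintf(GFP_KERNEL, "kmalloc_dma-%d",
				(unsigned int)kmalloc_caches[i].objsize);

			kmalloc_caches_dma[i] = create_kmalloc_cache(
					&dma_caches[i], name,
					kmalloc_caches[i].objsize,
					GFP_KERNEL | SLUB_DMA);
		}
	}

With that in place dma_kmalloc_cache() would reduce to an array lookup
and the locking question disappears entirely, at the cost of a little
memory for sizes that are never used.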
* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Christoph Lameter @ 2007-08-10 18:37 UTC
To: Andrew Morton; +Cc: linux-mm

> > +	schedule_work(&sysfs_add_work);

On Fri, 10 Aug 2007, Andrew Morton wrote:

> sysfs_add_work could be already pending, or running.  boom.

Ok, so queue_work serializes with run_workqueue but does not check that
the entry is already inserted?

static void __queue_work(struct cpu_workqueue_struct *cwq,
			 struct work_struct *work)
{
	unsigned long flags;

	spin_lock_irqsave(&cwq->lock, flags);
	insert_work(cwq, work, 1);
	spin_unlock_irqrestore(&cwq->lock, flags);
}

static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
	spin_lock_irq(&cwq->lock);
	cwq->run_depth++;
	if (cwq->run_depth > 3) {
	...

Then we need this patch?


SLUB dynamic kmalloc cache create: Prevent scheduling sysfs_add_work twice.

If another dynamic slab creation is done shortly after an earlier one, and
sysfs_add_func has not run yet, then we may corrupt the workqueue list
since we schedule the work structure twice. Avoid that by setting a flag
indicating that the sysfs add work has already been scheduled.
sysfs_add_func can handle adding multiple dma kmalloc slabs in one go, so
we do not need to schedule it again.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-10 11:14:28.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-10 11:25:13.000000000 -0700
@@ -2279,6 +2279,8 @@ panic:
 
 #ifdef CONFIG_ZONE_DMA
 
+static int sysfs_add_scheduled = 0;
+
 static void sysfs_add_func(struct work_struct *w)
 {
 	struct kmem_cache *s;
@@ -2290,6 +2292,7 @@ static void sysfs_add_func(struct work_s
 			sysfs_slab_add(s);
 		}
 	}
+	sysfs_add_scheduled = 0;
 	up_write(&slub_lock);
 }
 
@@ -2331,7 +2334,10 @@ static noinline struct kmem_cache *dma_k
 	list_add(&s->list, &slab_caches);
 	kmalloc_caches_dma[index] = s;
 
-	schedule_work(&sysfs_add_work);
+	if (!sysfs_add_scheduled) {
+		schedule_work(&sysfs_add_work);
+		sysfs_add_scheduled = 1;
+	}
 
 unlock_out:
 	up_write(&slub_lock);
* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Andrew Morton @ 2007-08-10 18:53 UTC
To: Christoph Lameter; +Cc: linux-mm

On Fri, 10 Aug 2007 11:37:13 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> Then we need this patch?
>
> SLUB dynamic kmalloc cache create: Prevent scheduling sysfs_add_work twice.

No, that's OK - see the test_and_set_bit() in queue_work().
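For context, the guard Andrew is pointing at sits one level above the
__queue_work() excerpt quoted earlier. Roughly as it looked around
2.6.23 (paraphrased from kernel/workqueue.c; helper names and details
vary between versions):

	int queue_work(struct workqueue_struct *wq, struct work_struct *work)
	{
		int ret = 0;

		/* WORK_STRUCT_PENDING stays set from the moment a work
		   item is queued until it is about to run.  A second
		   schedule_work() on a still-pending item fails this
		   test and becomes a no-op: no double list insertion,
		   no corruption. */
		if (!test_and_set_bit(WORK_STRUCT_PENDING,
				      work_data_bits(work))) {
			BUG_ON(!list_empty(&work->entry));
			__queue_work(cwq, work);  /* per-cpu queue selection elided */
			ret = 1;
		}
		return ret;
	}

So the sysfs_add_scheduled flag in the previous message is redundant:
queue_work() already refuses to queue a work item that is still pending.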
* Re: SLUB: Fix dynamic dma kmalloc cache creation
From: Christoph Lameter @ 2007-08-10 20:55 UTC
To: Andrew Morton; +Cc: linux-mm

Just to show that we could do it (I can never resist a challenge. Sigh).
I think the approach is overkill, and it may create some new races vs.
hotplug since we may now be operating on slabs that are not on the slab
list. Not tested.


SLUB: Dynamic dma kmalloc slab allocation: Make it reliable

We can avoid the slight chance of failing on the first GFP_ATOMIC|GFP_DMA
allocation through a new spin lock in the ZONE_DMA section: do not take
the slub_lock in dma_kmalloc_cache() but speculatively allocate the
kmem_cache structure and related entities. Then take the dma cache lock
and check whether the cache was already installed. If so, just call
kmem_cache_close() (I moved the flushing from kmem_cache_close() into
kmem_cache_destroy() to make that work, and added checking so that
kmem_cache_close() works on a kmem_cache structure that has no nodes
allocated) and then free up the space we allocated. If we are successful
then we schedule dma_cache_add_func().

That function now scans over the dma kmalloc caches instead of over all
the slab caches. If it finds a dma kmalloc cache whose addition to the
list was deferred then it will add the kmalloc cache to the slab list in
addition to performing the sysfs add.

This means that during short periods we may have active slab caches that
are not on the slab lists. We create races with cpu and node hotplug by
doing so. But maybe they are negligible.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-10 13:13:35.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-10 13:50:58.000000000 -0700
@@ -212,7 +212,7 @@ static inline void ClearSlabDebug(struct
 
 /* Internal SLUB flags */
 #define __OBJECT_POISON		0x80000000 /* Poison object */
-#define __SYSFS_ADD_DEFERRED	0x40000000 /* Not yet visible via sysfs */
+#define __SLAB_ADD_DEFERRED	0x40000000 /* Not yet added to list */
 
 /* Not all arches define cache_line_size */
 #ifndef cache_line_size
@@ -2174,15 +2174,15 @@ static inline int kmem_cache_close(struc
 {
 	int node;
 
-	flush_all(s);
-
 	/* Attempt to free all objects */
 	for_each_online_node(node) {
 		struct kmem_cache_node *n = get_node(s, node);
 
-		n->nr_partial -= free_list(s, n, &n->partial);
-		if (atomic_long_read(&n->nr_slabs))
-			return 1;
+		if (n) {
+			n->nr_partial -= free_list(s, n, &n->partial);
+			if (atomic_long_read(&n->nr_slabs))
+				return 1;
+		}
 	}
 	free_kmem_cache_nodes(s);
 	return 0;
@@ -2194,6 +2194,7 @@ static inline int kmem_cache_close(struc
  */
 void kmem_cache_destroy(struct kmem_cache *s)
 {
+	flush_all(s);
 	down_write(&slub_lock);
 	s->refcount--;
 	if (!s->refcount) {
@@ -2215,10 +2216,6 @@ EXPORT_SYMBOL(kmem_cache_destroy);
 struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __cacheline_aligned;
 EXPORT_SYMBOL(kmalloc_caches);
 
-#ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
-#endif
-
 static int __init setup_slub_min_order(char *str)
 {
 	get_option (&str, &slub_min_order);
@@ -2278,22 +2275,35 @@ panic:
 }
 
 #ifdef CONFIG_ZONE_DMA
+static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
 
-static void sysfs_add_func(struct work_struct *w)
+static spinlock_t dma_cache_lock;
+
+static void dma_cache_add_func(struct work_struct *w)
 {
 	struct kmem_cache *s;
+	struct kmem_cache **p;
 
-	down_write(&slub_lock);
-	list_for_each_entry(s, &slab_caches, list) {
-		if (s->flags & __SYSFS_ADD_DEFERRED) {
-			s->flags &= ~__SYSFS_ADD_DEFERRED;
+redo:
+	spin_lock(&dma_cache_lock);
+	for (p = kmalloc_caches_dma;
+		p < kmalloc_caches_dma + KMALLOC_SHIFT_HIGH + 1; p++) {
+		s = *p;
+
+		if (s->flags & __SLAB_ADD_DEFERRED) {
+			spin_unlock(&dma_cache_lock);
+			down_write(&slub_lock);
+			s->flags &= ~__SLAB_ADD_DEFERRED;
+			list_add(&s->list, &slab_caches);
 			sysfs_slab_add(s);
+			up_write(&slub_lock);
+			goto redo;
 		}
 	}
-	up_write(&slub_lock);
+	spin_unlock(&dma_cache_lock);
 }
 
-static DECLARE_WORK(sysfs_add_work, sysfs_add_func);
+static DECLARE_WORK(dma_cache_add_work, dma_cache_add_func);
 
 static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
 {
@@ -2306,36 +2316,30 @@ static noinline struct kmem_cache *dma_k
 		return s;
 
 	/* Dynamically create dma cache */
-	if (flags & __GFP_WAIT)
-		down_write(&slub_lock);
-	else {
-		if (!down_write_trylock(&slub_lock))
-			goto out;
-	}
-
-	if (kmalloc_caches_dma[index])
-		goto unlock_out;
-
 	realsize = kmalloc_caches[index].objsize;
 	text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d", (unsigned int)realsize),
 	s = kmalloc(kmem_size, flags & ~SLUB_DMA);
 
 	if (!s || !text || !kmem_cache_open(s, flags, text,
 			realsize, ARCH_KMALLOC_MINALIGN,
-			SLAB_CACHE_DMA|__SYSFS_ADD_DEFERRED, NULL)) {
-		kfree(s);
-		kfree(text);
-		goto unlock_out;
-	}
+			SLAB_CACHE_DMA|__SLAB_ADD_DEFERRED, NULL))
+		goto out;
 
-	list_add(&s->list, &slab_caches);
+	spin_lock(&dma_cache_lock);
+	if (kmalloc_caches_dma[index]) {
+		spin_unlock(&dma_cache_lock);
+		goto out;
+	}
 	kmalloc_caches_dma[index] = s;
+	spin_unlock(&dma_cache_lock);
 
-	schedule_work(&sysfs_add_work);
+	schedule_work(&dma_cache_add_work);
+	return kmalloc_caches_dma[index];
 
-unlock_out:
-	up_write(&slub_lock);
 out:
+	kmem_cache_close(s);
+	kfree(s);
+	kfree(text);
 	return kmalloc_caches_dma[index];
 }
 #endif