* Re: SLUB: Fix dynamic dma kmalloc cache creation
[not found] <200708100559.l7A5x3r2019930@hera.kernel.org>
@ 2007-08-10 7:40 ` Andrew Morton
2007-08-10 17:40 ` Christoph Lameter
2007-08-10 18:37 ` Christoph Lameter
0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2007-08-10 7:40 UTC
To: Christoph Lameter; +Cc: linux-mm
On Fri, 10 Aug 2007 05:59:03 GMT Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:
> Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1ceef40249f21eceabf8633934d94962e7d8e1d7
> Commit: 1ceef40249f21eceabf8633934d94962e7d8e1d7
> Parent: fcda3d89bf1366f6801447eab2d8a75ac5b9c4ce
> Author: Christoph Lameter <clameter@sgi.com>
> AuthorDate: Tue Aug 7 15:11:48 2007 -0700
> Committer: Christoph Lameter <clameter@sgi.com>
> CommitDate: Thu Aug 9 21:57:16 2007 -0700
>
> SLUB: Fix dynamic dma kmalloc cache creation
>
> The dynamic dma kmalloc creation can run into trouble if a
> GFP_ATOMIC allocation is the first one performed for a certain size
> of dma kmalloc slab.
>
> - Move the adding of the slab to sysfs into a workqueue
> (sysfs does GFP_KERNEL allocations)
> - Do not call kmem_cache_destroy() (uses slub_lock)
> - Only acquire the slub_lock once and--if we cannot wait--do a trylock.
>
> This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
> for a range of sizes failing due to another process holding the slub_lock.
> However, we only need to acquire the lock once in order to establish
> each power-of-two DMA kmalloc cache. The possible conflict is with the
> slub_lock taken during slab management actions (create / remove a slab cache).
>
> It is rather typical that a driver will first fill its buffers using
> GFP_KERNEL allocations which will wait until the slub_lock can be acquired.
> Drivers will also create their slab caches first outside of an atomic
> context before starting to use atomic kmalloc from an interrupt context.
>
> If there are any failures then they will occur early after boot or when
> loading multiple drivers concurrently. Drivers can already accommodate
> failures of GFP_ATOMIC for other reasons. Retries will then create the slab.
>
Well that was fairly foul. What was wrong with turning slub_lock into a
spinlock?
> static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
> {
> struct kmem_cache *s;
> - struct kmem_cache *x;
> char *text;
> size_t realsize;
>
> @@ -2289,22 +2306,36 @@ static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
> return s;
>
> /* Dynamically create dma cache */
> - x = kmalloc(kmem_size, flags & ~SLUB_DMA);
> - if (!x)
> - panic("Unable to allocate memory for dma cache\n");
> + if (flags & __GFP_WAIT)
> + down_write(&slub_lock);
> + else {
> + if (!down_write_trylock(&slub_lock))
> + goto out;
> + }
> +
> + if (kmalloc_caches_dma[index])
> + goto unlock_out;
>
> realsize = kmalloc_caches[index].objsize;
> - text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d",
> - (unsigned int)realsize);
> - s = create_kmalloc_cache(x, text, realsize, flags);
> - down_write(&slub_lock);
> - if (!kmalloc_caches_dma[index]) {
> - kmalloc_caches_dma[index] = s;
> - up_write(&slub_lock);
> - return s;
> + text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d", (unsigned int)realsize),
> + s = kmalloc(kmem_size, flags & ~SLUB_DMA);
> +
> + if (!s || !text || !kmem_cache_open(s, flags, text,
> + realsize, ARCH_KMALLOC_MINALIGN,
> + SLAB_CACHE_DMA|__SYSFS_ADD_DEFERRED, NULL)) {
> + kfree(s);
> + kfree(text);
> + goto unlock_out;
> }
> +
> + list_add(&s->list, &slab_caches);
> + kmalloc_caches_dma[index] = s;
> +
> + schedule_work(&sysfs_add_work);
sysfs_add_work could be already pending, or running. boom.
> +unlock_out:
> up_write(&slub_lock);
> - kmem_cache_destroy(s);
> +out:
> return kmalloc_caches_dma[index];
> }
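For context, the path being patched is hit the first time anyone asks for a
DMA buffer of a size whose dma kmalloc cache does not exist yet. A minimal
sketch of that trigger (the driver functions and the size are hypothetical):

	/* Hypothetical driver sketch: the first kmalloc of a given size
	 * with GFP_DMA sends SLUB into dma_kmalloc_cache(). */
	void *foo_alloc_buffer(void)
	{
		/* Process context: __GFP_WAIT is set, so the new code may
		 * sleep on slub_lock while creating kmalloc_dma-512. */
		return kmalloc(512, GFP_KERNEL | GFP_DMA);
	}

	void *foo_alloc_buffer_atomic(void)
	{
		/* Interrupt context: only the trylock path is available, so
		 * the very first allocation of this size can fail if some
		 * other thread happens to hold slub_lock. */
		return kmalloc(512, GFP_ATOMIC | GFP_DMA);
	}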
* Re: SLUB: Fix dynamic dma kmalloc cache creation
2007-08-10 7:40 ` SLUB: Fix dynamic dma kmalloc cache creation Andrew Morton
@ 2007-08-10 17:40 ` Christoph Lameter
2007-08-10 18:40 ` Andrew Morton
2007-08-10 18:37 ` Christoph Lameter
1 sibling, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2007-08-10 17:40 UTC
To: Andrew Morton; +Cc: linux-mm
On Fri, 10 Aug 2007, Andrew Morton wrote:
> Well that was fairly foul. What was wrong with turning slub_lock into a
> spinlock?
It would make things even worse because we would always have to do atomic
allocs while holding the lock. Or allocate before and then take the
lock to check if someone else has created it. If so, we would need to fall
back, meaning we cannot avoid kmem_cache_destroy() from dynamic cache
creation. The trylock avoids the kmem_cache_destroy() and is minimally
invasive.
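The allocate-first alternative would look roughly like this (a sketch, not a
patch; the spinlock is hypothetical, kmem_cache_open() of s is elided, and
this is essentially the structure the speculative patch later in this thread
adopts):

	/* Allocate everything outside the lock; the lock only publishes. */
	s = kmalloc(kmem_size, flags & ~SLUB_DMA);
	text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d",
					(unsigned int)realsize);
	if (!s || !text)
		goto fail;

	spin_lock(&dma_cache_lock);
	if (kmalloc_caches_dma[index]) {
		/* Lost the race: tear down the speculative copy. This
		 * fallback is the destroy/close path that the trylock
		 * version avoids. */
		spin_unlock(&dma_cache_lock);
		kfree(s);
		kfree(text);
		return kmalloc_caches_dma[index];
	}
	kmalloc_caches_dma[index] = s;
	spin_unlock(&dma_cache_lock);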
> > + schedule_work(&sysfs_add_work);
>
> sysfs_add_work could be already pending, or running. boom.
sysfs_add_work takes the slub_lock. It cannot be running.
* Re: SLUB: Fix dynamic dma kmalloc cache creation
2007-08-10 7:40 ` SLUB: Fix dynamic dma kmalloc cache creation Andrew Morton
2007-08-10 17:40 ` Christoph Lameter
@ 2007-08-10 18:37 ` Christoph Lameter
2007-08-10 18:53 ` Andrew Morton
1 sibling, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2007-08-10 18:37 UTC
To: Andrew Morton; +Cc: linux-mm
> > + schedule_work(&sysfs_add_work);
On Fri, 10 Aug 2007, Andrew Morton wrote:
> sysfs_add_work could be already pending, or running. boom.
Ok, so queue_work() serializes with run_workqueue() but does not check
whether the entry is already inserted?
static void __queue_work(struct cpu_workqueue_struct *cwq,
struct work_struct *work)
{
unsigned long flags;
spin_lock_irqsave(&cwq->lock, flags);
insert_work(cwq, work, 1);
spin_unlock_irqrestore(&cwq->lock, flags);
}
And run_workqueue():
static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
spin_lock_irq(&cwq->lock);
cwq->run_depth++;
if (cwq->run_depth > 3) {
...
Then we need this patch?
SLUB dynamic kmalloc cache create: Prevent scheduling the sysfs_add work twice.
If another dynamic slab creation is done shortly after an earlier one and
sysfs_add_func has not run yet, then we may corrupt the
workqueue list since we schedule the same work structure twice.
Avoid that by setting a flag indicating that the sysfs add work has
already been scheduled. sysfs_add_func can handle adding multiple
dma kmalloc slabs in one go, so we do not need to schedule it again.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-08-10 11:14:28.000000000 -0700
+++ linux-2.6/mm/slub.c 2007-08-10 11:25:13.000000000 -0700
@@ -2279,6 +2279,8 @@ panic:
#ifdef CONFIG_ZONE_DMA
+static int sysfs_add_scheduled = 0;
+
static void sysfs_add_func(struct work_struct *w)
{
struct kmem_cache *s;
@@ -2290,6 +2292,7 @@ static void sysfs_add_func(struct work_s
sysfs_slab_add(s);
}
}
+ sysfs_add_scheduled = 0;
up_write(&slub_lock);
}
@@ -2331,7 +2334,10 @@ static noinline struct kmem_cache *dma_k
list_add(&s->list, &slab_caches);
kmalloc_caches_dma[index] = s;
- schedule_work(&sysfs_add_work);
+ if (!sysfs_add_scheduled) {
+ schedule_work(&sysfs_add_work);
+ sysfs_add_scheduled = 1;
+ }
unlock_out:
up_write(&slub_lock);
* Re: SLUB: Fix dynamic dma kmalloc cache creation
2007-08-10 17:40 ` Christoph Lameter
@ 2007-08-10 18:40 ` Andrew Morton
2007-08-10 18:52 ` Christoph Lameter
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2007-08-10 18:40 UTC
To: Christoph Lameter; +Cc: linux-mm
On Fri, 10 Aug 2007 10:40:15 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Fri, 10 Aug 2007, Andrew Morton wrote:
>
> > Well that was fairly foul. What was wrong with turning slub_lock into a
> > spinlock?
>
> It would make things even worse because we would always have to do atomic
> allocs while holding the lock.
That would be dumb.
> Or allocate before and then take the
> lock to check if someone else has created it.
Obviously better.
> If so we would need to fall
> back meaning we cannot avoid kmem_cache_destroy() from dynamic cache
> creation.
I think you meant kmem_cache_close(). There's nothing wrong with running
kmem_cache_close() synchronously, inside a spinlock.
> The trylock avoids the kmem_cache_destroy() and is minimally
> invasive.
I see no need for a kmem_cache_destroy() call. The sysfs stuff hasn't
been created yet.
The trylock is *revolting*. They always are. They introduce
rarely-occurring special cases which get little runtime testing and
which often only get exercised with certain configs.
As an example, look at what this patch did. There are crufty old drivers
out there which do GFP_ATOMIC allocations at init-time because they got
themselves in a locking mess. Old scsi drivers come to mind. Old scsi
drivers often use GFP_DMA too. We've now gone and increased the
probability of those allocations failing.
Contrary to the assertions in the changelog, those allocations will not be
retried. Generally, if an allocation fails at initialisation time, a crufty
old driver will either a) fail the initialisation (ie: boot failure), b)
panic, or c) oops the kernel.
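The pattern in question, as a hypothetical sketch (the driver name, lock and
size are invented):

	/* Crufty old-driver init: GFP_ATOMIC because the allocation happens
	 * under the driver's own spinlock, GFP_DMA because the hardware
	 * needs low memory. Failure mode (b): panic, no retry. */
	static int __init olddrv_init(void)
	{
		spin_lock_irq(&olddrv_lock);
		olddrv_buf = kmalloc(4096, GFP_ATOMIC | GFP_DMA);
		spin_unlock_irq(&olddrv_lock);
		if (!olddrv_buf)
			panic("olddrv: cannot allocate DMA buffer");
		return 0;
	}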
> > > + schedule_work(&sysfs_add_work);
> >
> > sysfs_add_work could be already pending, or running. boom.
>
> sysfs_add_work takes the slub_lock. It cannot be running.
It _can_ be running and it _can_ be pending. But yes, given
schedule_work()'s behaviour when an already-pending work is rescheduled and
given the locking which you have in there, that part of the code appears to
be reliable.
* Re: SLUB: Fix dynamic dma kmalloc cache creation
2007-08-10 18:40 ` Andrew Morton
@ 2007-08-10 18:52 ` Christoph Lameter
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2007-08-10 18:52 UTC
To: Andrew Morton; +Cc: linux-mm
On Fri, 10 Aug 2007, Andrew Morton wrote:
> > back meaning we cannot avoid kmem_cache_destroy() from dynamic cache
> > creation.
>
> I think you meant kmem_cache_close(). There's nothing wrong with running
> kmem_cache_close() synchronously, inside spinlock.
Using kmem_cache_close() would be possible if we defer the sysfs add as
in the existing patch and keep the unrolling of the
create_kmalloc_cache() function.
> > The trylock avoids the kmem_cache_destroy() and is minimally
> > invasive.
>
> I see no need for a kmem_cache_destroy() call. The sysfs stuff hasn't
> been created yet.
Yes, after my patch that is true. It's not true before the patch: before,
we called create_kmalloc_cache().
> The trylock is *revolting*. They always are. They introduce
> rarely-occurring special cases which get little runtime testing and
> which often only get exercised with certain configs.
This is an extremely rare condition for rarely used functionality that we
are dealing with here. The lock only needs to be acquired a very small
number of times.
Changing to a spinlock is a significant change in slub, certainly not fit for
2.6.23. A spinlock would make later allocations of per-node and per-cpu
structures (cpu hotplug, node hotplug) while looping over the list of all
slabs very difficult.
> As an example, look at what this patch did. There are crufty old drivers
> out there which do GFP_ATOMIC allocations at init-time because they got
> themselves in a locking mess. Old scsi drivers come to mind. Old scsi
> drivers often use GFP_DMA too. We've now gone and increased the
> probability of those allocations failing.
Given that SLUB has been out there for 6 months with this issue and we
have not heard from such drivers, I think that is very unlikely. The issue
only surfaced when someone started tinkering with a new driver that needed
GFP_DMA for allocations below 1GB.
> > sysfs_add_work takes the slub_lock. It cannot be running.
>
> It _can_ be running and it _can_ be pending. But yes, given
> schedule_work()'s behaviour when an already-pending work is rescheduled and
> given the locking which you have in there, that part of the code appears to
> be reliable.
Well we both reversed position on that one. See the other
message.
After the changes for SLUB (page allocator pass-through) in -mm we have
reduced the number of kmalloc slabs to PAGE_SHIFT. We could create all
dma kmalloc slabs at init time without wasting too much memory in the -mm
version of SLUB.
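That init-time alternative would be something like the following sketch
(untested; the loop bounds are assumed, and the SLUB_DMA-marking behaviour
of create_kmalloc_cache() is as in the code quoted earlier in this thread):

	/* Create every DMA kmalloc cache up front during slab init, so
	 * dma_kmalloc_cache() never has to run in atomic context. */
	static void __init create_dma_kmalloc_caches(void)
	{
		int i;

		for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
			struct kmem_cache *s = kmalloc(kmem_size, GFP_KERNEL);
			size_t realsize = kmalloc_caches[i].objsize;
			char *text = kasprintf(GFP_KERNEL, "kmalloc_dma-%d",
						(unsigned int)realsize);

			if (!s || !text)
				panic("Cannot create dma kmalloc caches\n");
			/* create_kmalloc_cache() sets SLAB_CACHE_DMA when
			 * the gfp flags include SLUB_DMA. */
			kmalloc_caches_dma[i] = create_kmalloc_cache(s, text,
					realsize, GFP_KERNEL | SLUB_DMA);
		}
	}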
* Re: SLUB: Fix dynamic dma kmalloc cache creation
2007-08-10 18:37 ` Christoph Lameter
@ 2007-08-10 18:53 ` Andrew Morton
2007-08-10 20:55 ` Christoph Lameter
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2007-08-10 18:53 UTC
To: Christoph Lameter; +Cc: linux-mm
On Fri, 10 Aug 2007 11:37:13 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> Then we need this patch?
>
> SLUB dynamic kmalloc cache create: Prevent scheduling the sysfs_add work twice.
No, that's OK - see the test_and_set_bit() in queue_work().
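For reference, queue_work() in kernels of this vintage looks roughly like
this (a simplified sketch from memory, so treat the details as approximate):

	int fastcall queue_work(struct workqueue_struct *wq,
					struct work_struct *work)
	{
		int ret = 0;

		/* The PENDING bit makes a second schedule_work() on an
		 * already-queued work a no-op, so the double-schedule
		 * cannot corrupt the workqueue list. */
		if (!test_and_set_bit(WORK_STRUCT_PENDING,
					work_data_bits(work))) {
			BUG_ON(!list_empty(&work->entry));
			__queue_work(wq_per_cpu(wq, get_cpu()), work);
			put_cpu();
			ret = 1;
		}
		return ret;
	}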
* Re: SLUB: Fix dynamic dma kmalloc cache creation
2007-08-10 18:53 ` Andrew Morton
@ 2007-08-10 20:55 ` Christoph Lameter
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2007-08-10 20:55 UTC
To: Andrew Morton; +Cc: linux-mm
Just to show that we could do it (I can never resist a challenge. Sigh).
I think the approach is overkill and it may create some new races
vs. hotplug since we may now be operating on slabs that are not
on the slab list. Not tested.
SLUB: Dynamic dma kmalloc slab allocation: Make it reliable
We can avoid the slight chance of failing on the first GFP_ATOMIC|GFP_DMA
allocation through a new spinlock in the ZONE_DMA section: do not
take the slub_lock in dma_kmalloc_cache() but speculatively allocate the
kmem_cache structure and related entities. Then take the dma cache
lock and check if the cache was already installed. If so, just call
kmem_cache_close() (I moved the flushing from kmem_cache_close() into
kmem_cache_destroy() to make that work and added checks so that
kmem_cache_close() works on a kmem_cache structure that has no nodes
allocated) and then free up the space we allocated.
If we are successful then we schedule the dma_cache_add_func().
The function now scans over the dma kmalloc caches instead of over all the
slab caches. If it finds a dma kmalloc cache whose addition to the
list was deferred, it adds that cache to the slab list
in addition to performing the sysfs add.
This means that for short periods we may have active slab caches
that are not on the slab list. We create races with cpu and node hotplug
by doing so, but maybe they are negligible.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-08-10 13:13:35.000000000 -0700
+++ linux-2.6/mm/slub.c 2007-08-10 13:50:58.000000000 -0700
@@ -212,7 +212,7 @@ static inline void ClearSlabDebug(struct
/* Internal SLUB flags */
#define __OBJECT_POISON 0x80000000 /* Poison object */
-#define __SYSFS_ADD_DEFERRED 0x40000000 /* Not yet visible via sysfs */
+#define __SLAB_ADD_DEFERRED 0x40000000 /* Not yet added to list */
/* Not all arches define cache_line_size */
#ifndef cache_line_size
@@ -2174,15 +2174,15 @@ static inline int kmem_cache_close(struc
{
int node;
- flush_all(s);
-
/* Attempt to free all objects */
for_each_online_node(node) {
struct kmem_cache_node *n = get_node(s, node);
- n->nr_partial -= free_list(s, n, &n->partial);
- if (atomic_long_read(&n->nr_slabs))
- return 1;
+ if (n) {
+ n->nr_partial -= free_list(s, n, &n->partial);
+ if (atomic_long_read(&n->nr_slabs))
+ return 1;
+ }
}
free_kmem_cache_nodes(s);
return 0;
@@ -2194,6 +2194,7 @@ static inline int kmem_cache_close(struc
*/
void kmem_cache_destroy(struct kmem_cache *s)
{
+ flush_all(s);
down_write(&slub_lock);
s->refcount--;
if (!s->refcount) {
@@ -2215,10 +2216,6 @@ EXPORT_SYMBOL(kmem_cache_destroy);
struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __cacheline_aligned;
EXPORT_SYMBOL(kmalloc_caches);
-#ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
-#endif
-
static int __init setup_slub_min_order(char *str)
{
get_option (&str, &slub_min_order);
@@ -2278,22 +2275,35 @@ panic:
}
#ifdef CONFIG_ZONE_DMA
+static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
-static void sysfs_add_func(struct work_struct *w)
+static spinlock_t dma_cache_lock;
+
+static void dma_cache_add_func(struct work_struct *w)
{
struct kmem_cache *s;
+ struct kmem_cache **p;
- down_write(&slub_lock);
- list_for_each_entry(s, &slab_caches, list) {
- if (s->flags & __SYSFS_ADD_DEFERRED) {
- s->flags &= ~__SYSFS_ADD_DEFERRED;
+redo:
+ spin_lock(&dma_cache_lock);
+ for (p = kmalloc_caches_dma;
+ p < kmalloc_caches_dma + KMALLOC_SHIFT_HIGH + 1; p++) {
+ s = *p;
+
+ if (s->flags & __SLAB_ADD_DEFERRED) {
+ spin_unlock(&dma_cache_lock);
+ down_write(&slub_lock);
+ s->flags &= ~__SLAB_ADD_DEFERRED;
+ list_add(&s->list, &slab_caches);
sysfs_slab_add(s);
+ up_write(&slub_lock);
+ goto redo;
}
}
- up_write(&slub_lock);
+ spin_unlock(&dma_cache_lock);
}
-static DECLARE_WORK(sysfs_add_work, sysfs_add_func);
+static DECLARE_WORK(dma_cache_add_work, dma_cache_add_func);
static noinline struct kmem_cache *dma_kmalloc_cache(int index, gfp_t flags)
{
@@ -2306,36 +2316,30 @@ static noinline struct kmem_cache *dma_k
return s;
/* Dynamically create dma cache */
- if (flags & __GFP_WAIT)
- down_write(&slub_lock);
- else {
- if (!down_write_trylock(&slub_lock))
- goto out;
- }
-
- if (kmalloc_caches_dma[index])
- goto unlock_out;
-
realsize = kmalloc_caches[index].objsize;
text = kasprintf(flags & ~SLUB_DMA, "kmalloc_dma-%d", (unsigned int)realsize),
s = kmalloc(kmem_size, flags & ~SLUB_DMA);
if (!s || !text || !kmem_cache_open(s, flags, text,
realsize, ARCH_KMALLOC_MINALIGN,
- SLAB_CACHE_DMA|__SYSFS_ADD_DEFERRED, NULL)) {
- kfree(s);
- kfree(text);
- goto unlock_out;
- }
+ SLAB_CACHE_DMA|__SLAB_ADD_DEFERRED, NULL))
+ goto out;
- list_add(&s->list, &slab_caches);
+ spin_lock(&dma_cache_lock);
+ if (kmalloc_caches_dma[index]) {
+ spin_unlock(&dma_cache_lock);
+ goto out;
+ }
kmalloc_caches_dma[index] = s;
+ spin_unlock(&dma_cache_lock);
- schedule_work(&sysfs_add_work);
+ schedule_work(&dma_cache_add_work);
+ return kmalloc_caches_dma[index];
-unlock_out:
- up_write(&slub_lock);
out:
+ kmem_cache_close(s);
+ kfree(s);
+ kfree(text);
return kmalloc_caches_dma[index];
}
#endif