Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-13 11:33 Heiko Carstens
  2007-08-15 22:56 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Heiko Carstens @ 2007-08-13 11:33 UTC (permalink / raw)
  To: linux-mm, dm-devel
  Cc: Daniel Kobras, Alasdair G Kergon, Stefan Weinhuber, Stefan Bader,
	Andrew Morton, Linus Torvalds

Hi,

the patch below went into 2.6.18. Now my question is: why doesn't it check
if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
Did I miss anything that guarentees that this will always succeed or is it
just a bug?

commit c06aad854fdb9da38fcc22dccfe9d72919453e43
Author: Daniel Kobras <kobras@linux.de>
Date:   Sun Aug 27 01:23:24 2006 -0700

    [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
    
    On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
    we experienced frequent deadlock of the system under high i/o load.  'cat
    /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
    after a few GB, 'cp' would be left in 'D' state along with kjournald and
    kmirrord.  The functions cp and kjournald were blocked in did vary, but
    kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
    on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
    that this problem has been around even before.
    
    So much for the facts, here's my interpretation: mempool_alloc() first tries
    to atomically allocate the requested memory, or falls back to hand out
    preallocated chunks from the mempool.  If both fail, it puts the calling
    process (kmirrord in this case) on a private waitqueue until somebody refills
    the pool.  Where the only 'somebody' is kmirrord itself, so we have a
    deadlock.
    
    I worked around this problem by falling back to a (blocking) kmalloc when
    before kmirrord would have ended up on the waitqueue.  This defeats part of
    the benefits of using the mempool, but at least keeps the system running.  And
    it could be done with a two-line change.  Note that mempool_alloc() clears the
    GFP_NOIO flag internally, and only uses it to decide whether to wait or return
    an error if immediate allocation fails, so the attached patch doesn't change
    behaviour in the non-deadlocking case.  Path is against current git
    (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
    2.6.15, where this patch makes the difference between random lockup and a
    stable system.
    
    Signed-off-by: Daniel Kobras <kobras@linux.de>
    Acked-by: Alasdair G Kergon <agk@redhat.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index be48ced..c54de98 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
 	struct region *reg, *nreg;
 
 	read_unlock(&rh->hash_lock);
-	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
+	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
+	if (unlikely(!nreg))
+		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
 	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
 		RH_CLEAN : RH_NOSYNC;
 	nreg->rh = rh;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-13 11:33 [PATCH] dm: Fix deadlock under high i/o load in raid1 setup Heiko Carstens
@ 2007-08-15 22:56 ` Andrew Morton
  2007-08-15 23:59   ` Heiko Carstens
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-08-15 22:56 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-mm, dm-devel, Daniel Kobras, Alasdair G Kergon,
	Stefan Weinhuber, Stefan Bader, Linus Torvalds

On Mon, 13 Aug 2007 13:33:40 +0200
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> Hi,
> 
> the patch below went into 2.6.18. Now my question is: why doesn't it check
> if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
> Did I miss anything that guarentees that this will always succeed or is it
> just a bug?

How come my computer is the only one with a reply button?

Sigh.

> commit c06aad854fdb9da38fcc22dccfe9d72919453e43
> Author: Daniel Kobras <kobras@linux.de>
> Date:   Sun Aug 27 01:23:24 2006 -0700
> 
>     [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
>     
>     On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
>     we experienced frequent deadlock of the system under high i/o load.  'cat
>     /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
>     after a few GB, 'cp' would be left in 'D' state along with kjournald and
>     kmirrord.  The functions cp and kjournald were blocked in did vary, but
>     kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
>     on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
>     that this problem has been around even before.
>     
>     So much for the facts, here's my interpretation: mempool_alloc() first tries
>     to atomically allocate the requested memory, or falls back to hand out
>     preallocated chunks from the mempool.  If both fail, it puts the calling
>     process (kmirrord in this case) on a private waitqueue until somebody refills
>     the pool.  Where the only 'somebody' is kmirrord itself, so we have a
>     deadlock.
>     
>     I worked around this problem by falling back to a (blocking) kmalloc when
>     before kmirrord would have ended up on the waitqueue.  This defeats part of
>     the benefits of using the mempool, but at least keeps the system running.  And
>     it could be done with a two-line change.  Note that mempool_alloc() clears the
>     GFP_NOIO flag internally, and only uses it to decide whether to wait or return
>     an error if immediate allocation fails, so the attached patch doesn't change
>     behaviour in the non-deadlocking case.  Path is against current git
>     (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
>     2.6.15, where this patch makes the difference between random lockup and a
>     stable system.
>     
>     Signed-off-by: Daniel Kobras <kobras@linux.de>
>     Acked-by: Alasdair G Kergon <agk@redhat.com>
>     Cc: <stable@kernel.org>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> 
> diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
> index be48ced..c54de98 100644
> --- a/drivers/md/dm-raid1.c
> +++ b/drivers/md/dm-raid1.c
> @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
>  	struct region *reg, *nreg;
>  
>  	read_unlock(&rh->hash_lock);
> -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> +	if (unlikely(!nreg))
> +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
>  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
>  		RH_CLEAN : RH_NOSYNC;
>  	nreg->rh = rh;
> 

Yeah, that's a bug.

kmalloc(small_amount, GFP_NOIO) can fail if the calling process gets
oom-killed, and it can fail if the system is using fault-injection.

One could say "don't use fault injection" and, perhaps, "this is only
ever called by a kernel thread and kernel threads don't get oom-killed". 
But the former is lame and the latter assumes current implementation
details which could change (and indeed have in the past).


So yes, I'd say this is a bug in DM.

Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
__rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
under read_lock(), which is deadlockable and will generate might_sleep()
warnings


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-15 22:56 ` Andrew Morton
@ 2007-08-15 23:59   ` Heiko Carstens
  2007-08-16  3:10     ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Heiko Carstens @ 2007-08-15 23:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, dm-devel, Daniel Kobras, Alasdair G Kergon,
	Stefan Weinhuber, Stefan Bader, Linus Torvalds

On Wed, Aug 15, 2007 at 03:56:04PM -0700, Andrew Morton wrote:
> On Mon, 13 Aug 2007 13:33:40 +0200
> Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > the patch below went into 2.6.18. Now my question is: why doesn't it check
> > if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
> > Did I miss anything that guarentees that this will always succeed or is it
> > just a bug?
> > --- a/drivers/md/dm-raid1.c
> > +++ b/drivers/md/dm-raid1.c
> > @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
> >  	struct region *reg, *nreg;
> >  
> >  	read_unlock(&rh->hash_lock);
> > -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> > +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> > +	if (unlikely(!nreg))
> > +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
> >  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
> >  		RH_CLEAN : RH_NOSYNC;
> >  	nreg->rh = rh;
> > 
> 
> Yeah, that's a bug.
> 
> kmalloc(small_amount, GFP_NOIO) can fail if the calling process gets
> oom-killed, and it can fail if the system is using fault-injection.
> 
> One could say "don't use fault injection" and, perhaps, "this is only
> ever called by a kernel thread and kernel threads don't get oom-killed". 
> But the former is lame and the latter assumes current implementation
> details which could change (and indeed have in the past).

Thanks for clarifying!

> So yes, I'd say this is a bug in DM.
> 
> Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> under read_lock(), which is deadlockable and will generate might_sleep()
> warnings

The read_lock() is unlocked at the beginning of the function. Unless
you're talking of a different lock, but I couldn't find any.

So at least _currently_ this should work unless somebody uses fault
injection. Would it make sense then to add the __GFP_NOFAIL flag to
the kmalloc call?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-15 23:59   ` Heiko Carstens
@ 2007-08-16  3:10     ` Andrew Morton
  2007-08-16  7:09       ` [dm-devel] " Stefan Bader
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-08-16  3:10 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-mm, dm-devel, Daniel Kobras, Alasdair G Kergon,
	Stefan Weinhuber, Stefan Bader, Linus Torvalds

On Thu, 16 Aug 2007 01:59:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > So yes, I'd say this is a bug in DM.
> > 
> > Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> > __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> > under read_lock(), which is deadlockable and will generate might_sleep()
> > warnings
> 
> The read_lock() is unlocked at the beginning of the function.

Oh, OK.  Looks odd, but whatever.

> Unless
> you're talking of a different lock, but I couldn't find any.
> 
> So at least _currently_ this should work unless somebody uses fault
> injection. Would it make sense then to add the __GFP_NOFAIL flag to
> the kmalloc call?

It would best to avoid that.  __GFP_NOFAIL was added as a way of
consolidating a number of callsites which were performing open-coded
infinite retries and it is also used as a "this is lame and needs to be
fixed" indicator.

It'd be better to fix the kmirrord design so that it can use mempools
properly.  One possible way of doing that might be to notice when mempool
exhaustion happens, submit whatever IO is thus-far buffered up and then do
a sleeping mempool allocation, to wait for that memory to come free (via IO
completion).

That would be a bit abusive of the mempool intent though.  A more idiomatic
fix would be to change kmirrord so that it no longer can consume all of the
mempool's reserves without having submitted any I/O (which is what I assume
it is doing).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dm-devel] Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-16  3:10     ` Andrew Morton
@ 2007-08-16  7:09       ` Stefan Bader
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Bader @ 2007-08-16  7:09 UTC (permalink / raw)
  To: device-mapper development
  Cc: Heiko Carstens, Stefan Weinhuber, linux-mm, Daniel Kobras,
	Linus Torvalds, Alasdair G Kergon

>>> How come my computer is the only one with a reply button?

Hey, I've got one. ;-)

2007/8/16, Andrew Morton <akpm@linux-foundation.org>:
> On Thu, 16 Aug 2007 01:59:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > > So yes, I'd say this is a bug in DM.
> > >
> > > Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> > > __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> > > under read_lock(), which is deadlockable and will generate might_sleep()
> > > warnings
> >
> > The read_lock() is unlocked at the beginning of the function.
>
> Oh, OK.  Looks odd, but whatever.
>

The major trick, if I am not wrong, is to use GFP_ATOMIC on that
mempool_alloc(). This prevents the sleeping allocation but fails, if
memory as well as the pool is exhausted.

>
> It'd be better to fix the kmirrord design so that it can use mempools
> properly.  One possible way of doing that might be to notice when mempool
> exhaustion happens, submit whatever IO is thus-far buffered up and then do
> a sleeping mempool allocation, to wait for that memory to come free (via IO
> completion).
>
> That would be a bit abusive of the mempool intent though.  A more idiomatic
> fix would be to change kmirrord so that it no longer can consume all of the
> mempool's reserves without having submitted any I/O (which is what I assume
> it is doing).
>

The problem is, that only the same thread, that allocates from the
pool would return memory back. This would be done before the new
allocations. But, if there is very high memory pressure, the pool
might get drained in the allocation cycle. Then mempool_alloc() waits
to be woken from mempool_free(). And this never happens, since the
thread will be stuck. So I guess the fix would be to somehow separate
the allocation and freeing functionality. If I remember correctly
back, the patch was always seen as "not quite correctly, but seems to
work". However, due to lack of time, nobody ever came up with a better
solution.

Stefan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-08-16  7:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-13 11:33 [PATCH] dm: Fix deadlock under high i/o load in raid1 setup Heiko Carstens
2007-08-15 22:56 ` Andrew Morton
2007-08-15 23:59   ` Heiko Carstens
2007-08-16  3:10     ` Andrew Morton
2007-08-16  7:09       ` [dm-devel] " Stefan Bader

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox