linux-mm.kvack.org archive mirror
* Why does kmem_cache_free occupy the CPU for more than 10 seconds?
@ 2007-04-11  6:17 Zhao Forrest
  2007-04-11  6:53 ` Pekka Enberg
  2007-04-11 22:30 ` Andrew Morton
  0 siblings, 2 replies; 17+ messages in thread
From: Zhao Forrest @ 2007-04-11  6:17 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

Hi,

We're using RHEL5 with kernel version 2.6.18-8.el5.
When doing a stress test on a raw device for about 3-4 hours, we found
a soft-lockup message in dmesg.
I know we're not reporting the bug against the latest kernel, but does
anyone know whether this is a known issue in older kernels? And why
does kmem_cache_free occupy the CPU for more than 10 seconds?

Please let me know if you need any information.

Thanks,
Forrest
--------------------------------------------------------------
BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ>  [<ffffffff800b2c93>] softlockup_tick+0xdb/0xed
 [<ffffffff800933df>] update_process_times+0x42/0x68
 [<ffffffff80073d97>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074459>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80007660>] kmem_cache_free+0x1c0/0x1cb
 [<ffffffff800262ee>] free_buffer_head+0x2a/0x43
 [<ffffffff80027110>] try_to_free_buffers+0x89/0x9d
 [<ffffffff80043041>] invalidate_mapping_pages+0x90/0x15f
 [<ffffffff800d4a77>] kill_bdev+0xe/0x21
 [<ffffffff800d4f9d>] __blkdev_put+0x4f/0x169
 [<ffffffff80012281>] __fput+0xae/0x198
 [<ffffffff80023647>] filp_close+0x5c/0x64
 [<ffffffff800384f9>] put_files_struct+0x6c/0xc3
 [<ffffffff80014f01>] do_exit+0x2d2/0x8b1
 [<ffffffff80046eb6>] cpuset_exit+0x0/0x6c
 [<ffffffff8002abd7>] get_signal_to_deliver+0x427/0x456
 [<ffffffff80059122>] do_notify_resume+0x9c/0x7a9
 [<ffffffff80086c6d>] default_wake_function+0x0/0xe
 [<ffffffff800b1fd8>] audit_syscall_exit+0x2cd/0x2ec
 [<ffffffff8005b362>] int_signal+0x12/0x17
--------------------------------------------------------------
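(An aside on where the 10-second figure comes from: the trace above is
produced by the soft-lockup detector, not by a crash. A minimal sketch
of the 2.6.18-era check in kernel/softlockup.c -- simplified, and the
exact code may differ:

	/* softlockup_tick(), called from the timer interrupt: a per-CPU
	 * timestamp is refreshed whenever the high-priority watchdog
	 * thread gets to run; if this CPU hasn't scheduled it for
	 * ~10 seconds, the "lockup" is reported: */
	if (time_after(jiffies, touch_timestamp + 10 * HZ)) {
		printk(KERN_ERR "BUG: soft lockup detected on CPU#%d!\n",
			this_cpu);
		dump_stack();
	}

So the report means CPU#1 went 10 s without rescheduling, with the
interrupted code -- kmem_cache_free() here -- at the top of the trace.)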


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  6:17 Why does kmem_cache_free occupy the CPU for more than 10 seconds? Zhao Forrest
@ 2007-04-11  6:53 ` Pekka Enberg
  2007-04-11  9:14   ` Zhao Forrest
  2007-04-11  9:53   ` Zhao Forrest
  2007-04-11 22:30 ` Andrew Morton
  1 sibling, 2 replies; 17+ messages in thread
From: Pekka Enberg @ 2007-04-11  6:53 UTC (permalink / raw)
  To: Zhao Forrest; +Cc: linux-mm, linux-kernel

On 4/11/07, Zhao Forrest <forrest.zhao@gmail.com> wrote:
> We're using RHEL5 with kernel version 2.6.18-8.el5.
> When doing a stress test on a raw device for about 3-4 hours, we found
> a soft-lockup message in dmesg.
> I know we're not reporting the bug against the latest kernel, but does
> anyone know whether this is a known issue in older kernels? And why
> does kmem_cache_free occupy the CPU for more than 10 seconds?

Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
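(On a 2.6.18 SLAB kernel this is a compile-time switch, so it means a
rebuild; a minimal .config sketch -- the LEAK option is an assumption
and may not exist in every tree:

CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

With redzoning and poisoning enabled, a corrupted or double-freed
object is normally reported at allocation or free time rather than
failing silently.)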


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  6:53 ` Pekka Enberg
@ 2007-04-11  9:14   ` Zhao Forrest
  2007-04-11  9:53     ` Paul Jackson
  2007-04-11  9:53   ` Zhao Forrest
  1 sibling, 1 reply; 17+ messages in thread
From: Zhao Forrest @ 2007-04-11  9:14 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm, linux-kernel

I enabled CONFIG_DEBUG_SLAB but didn't get any extra debug messages
related to slab.
Is there another switch that I need to turn on?

Thanks,
Forrest

BUG: soft lockup detected on CPU#6!

Call Trace:
 <IRQ>  [<ffffffff800b3834>] softlockup_tick+0xdb/0xed
 [<ffffffff80093edf>] update_process_times+0x42/0x68
 [<ffffffff80074897>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074f59>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005c7c2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80043b09>] invalidate_mapping_pages+0xe1/0x15f
 [<ffffffff80043afa>] invalidate_mapping_pages+0xd2/0x15f
 [<ffffffff800d5bea>] kill_bdev+0xe/0x21
 [<ffffffff800d6110>] __blkdev_put+0x4f/0x169
 [<ffffffff80012785>] __fput+0xae/0x198
 [<ffffffff80023ca6>] filp_close+0x5c/0x64
 [<ffffffff80038e33>] put_files_struct+0x6c/0xc3
 [<ffffffff8001543d>] do_exit+0x2d2/0x8b1
 [<ffffffff80047932>] cpuset_exit+0x0/0x6c
 [<ffffffff8002b30f>] get_signal_to_deliver+0x427/0x456
 [<ffffffff80059b9e>] do_notify_resume+0x9c/0x7a9
 [<ffffffff8008776d>] default_wake_function+0x0/0xe
 [<ffffffff800b2b79>] audit_syscall_exit+0x2cd/0x2ec
 [<ffffffff8005be62>] int_signal+0x12/0x17


On 4/11/07, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On 4/11/07, Zhao Forrest <forrest.zhao@gmail.com> wrote:
> > We're using RHEL5 with kernel version 2.6.18-8.el5.
> > When doing a stress test on a raw device for about 3-4 hours, we found
> > a soft-lockup message in dmesg.
> > I know we're not reporting the bug against the latest kernel, but does
> > anyone know whether this is a known issue in older kernels? And why
> > does kmem_cache_free occupy the CPU for more than 10 seconds?
>
> Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
>


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  9:14   ` Zhao Forrest
@ 2007-04-11  9:53     ` Paul Jackson
  2007-04-11 10:06       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Jackson @ 2007-04-11  9:53 UTC (permalink / raw)
  To: Zhao Forrest; +Cc: penberg, linux-mm, linux-kernel

I'm confused - which end of this stack is up?

cpuset_exit doesn't call do_exit; rather, it's the other
way around.  And put_files_struct doesn't call do_exit;
rather, do_exit calls __exit_files, which calls put_files_struct.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  6:53 ` Pekka Enberg
  2007-04-11  9:14   ` Zhao Forrest
@ 2007-04-11  9:53   ` Zhao Forrest
  2007-04-11 10:38     ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Zhao Forrest @ 2007-04-11  9:53 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm, linux-kernel

I got some new information:
Before the soft-lockup message appears, we have:
[root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
buffer_head       10927942 10942560    120   32    1 : tunables   32   16    8 : slabdata 341955 341955      6 : globalstat 37602996 11589379 1174373 6 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286

Then, after the buffer_heads have been freed, we have:
[root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
buffer_head         9542  36384    120   32    1 : tunables   32   16    8 : slabdata   1137   1137    245 : globalstat 37602996 11589379 1174373 6 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367

Could this huge number of buffer_heads be causing the soft lockup?
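(For scale, reading the slabinfo columns above as <active_objs>
<num_objs> <objsize> <objperslab> <pagesperslab>:

	10,927,942 objects x 120 bytes      ~= 1.3 GB of buffer_heads
	341,955 slabs x 1 page (4 KB each)  ~= 1.4 GB of slab pages

so the close path has on the order of ten million objects to hand
back, one kmem_cache_free() at a time.)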

Thanks,
Forrest

On 4/11/07, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On 4/11/07, Zhao Forrest <forrest.zhao@gmail.com> wrote:
> > We're using RHEL5 with kernel version 2.6.18-8.el5.
> > When doing a stress test on a raw device for about 3-4 hours, we found
> > a soft-lockup message in dmesg.
> > I know we're not reporting the bug against the latest kernel, but does
> > anyone know whether this is a known issue in older kernels? And why
> > does kmem_cache_free occupy the CPU for more than 10 seconds?
>
> Sounds like slab corruption. CONFIG_DEBUG_SLAB should tell you more.
>


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  9:53     ` Paul Jackson
@ 2007-04-11 10:06       ` Peter Zijlstra
  2007-04-11 10:10         ` Zhao Forrest
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2007-04-11 10:06 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Zhao Forrest, penberg, linux-mm, linux-kernel

On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> I'm confused - which end of this stack is up?
> 
> cpuset_exit doesn't call do_exit; rather, it's the other
> way around.  And put_files_struct doesn't call do_exit;
> rather, do_exit calls __exit_files, which calls put_files_struct.

I'm guessing it's x86_64, which generates crap traces.


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 10:06       ` Peter Zijlstra
@ 2007-04-11 10:10         ` Zhao Forrest
  2007-04-11 10:16           ` Peter Zijlstra
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Zhao Forrest @ 2007-04-11 10:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Paul Jackson, penberg, linux-mm, linux-kernel

On 4/11/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > I'm confused - which end of this stack is up?
> >
> > cpuset_exit doesn't call do_exit; rather, it's the other
> > way around.  And put_files_struct doesn't call do_exit;
> > rather, do_exit calls __exit_files, which calls put_files_struct.
>
> I'm guessing it's x86_64, which generates crap traces.
>
Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64?
Can enabling "[ ] Compile the kernel with frame pointers" help?


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 10:10         ` Zhao Forrest
@ 2007-04-11 10:16           ` Peter Zijlstra
  2007-04-11 16:23           ` Chuck Ebbert
  2007-04-11 16:44           ` Badari Pulavarty
  2 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2007-04-11 10:16 UTC (permalink / raw)
  To: Zhao Forrest; +Cc: Paul Jackson, penberg, linux-mm, linux-kernel

On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of this stack is up?
> > >
> > > cpuset_exit doesn't call do_exit; rather, it's the other
> > > way around.  And put_files_struct doesn't call do_exit;
> > > rather, do_exit calls __exit_files, which calls put_files_struct.
> >
> > I'm guessing it's x86_64, which generates crap traces.
> >
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64?
> Can enabling "[ ] Compile the kernel with frame pointers" help?

Sometimes; the best option is to reinstate the dwarf-based stack
unwinder that was backed out.

But as you said, that _huge_ number of buffers might be the issue. I'm
looking through the code paths from __blkdev_put() on downwards
(sketched below); I suspect we hold a single lock somewhere...

So hold off on patching the unwinder back in (unless of course you
fancy doing so :-)
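For reference, the suspect path as the traces lay it out -- a sketch,
with intermediate helpers omitted and locking per a reading of
2.6.18-era sources:

	__blkdev_put(bdev)                /* takes lock_kernel() and bd_mutex */
	  -> kill_bdev(bdev)
	    -> invalidate_mapping_pages() /* walks every cached page */
	      -> try_to_free_buffers()    /* under the page lock */
	        -> free_buffer_head()     /* one kmem_cache_free() per bh */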




* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  9:53   ` Zhao Forrest
@ 2007-04-11 10:38     ` Peter Zijlstra
  2007-04-11 16:35       ` Ken Chen
  2007-04-12  6:17       ` Zhao Forrest
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2007-04-11 10:38 UTC (permalink / raw)
  To: Zhao Forrest; +Cc: Pekka Enberg, linux-mm, linux-kernel

On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information:
> Before the soft-lockup message appears, we have:
> [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> buffer_head       10927942 10942560    120   32    1 : tunables   32   16    8 : slabdata 341955 341955      6 : globalstat 37602996 11589379 1174373 6 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
> 
> Then, after the buffer_heads have been freed, we have:
> [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> buffer_head         9542  36384    120   32    1 : tunables   32   16    8 : slabdata   1137   1137    245 : globalstat 37602996 11589379 1174373 6 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
> 
> Could this huge number of buffer_heads be causing the soft lockup?


__blkdev_put() takes the BKL and bd_mutex;
invalidate_mapping_pages() tries to take the page lock.

But no other locks seem to be held while free_buffer_head() is called.

All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not
hog the CPU like that. What preemption mode have you got selected?
(CONFIG_PREEMPT_VOLUNTARY=y?)

Does this fix it?

--- fs/buffer.c~	2007-02-01 12:00:34.000000000 +0100
+++ fs/buffer.c	2007-04-11 12:35:48.000000000 +0200
@@ -3029,6 +3029,8 @@ out:
 			struct buffer_head *next = bh->b_this_page;
 			free_buffer_head(bh);
 			bh = next;
+
+			cond_resched();
 		} while (bh != buffers_to_free);
 	}
 	return ret;





* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 10:10         ` Zhao Forrest
  2007-04-11 10:16           ` Peter Zijlstra
@ 2007-04-11 16:23           ` Chuck Ebbert
  2007-04-11 16:44           ` Badari Pulavarty
  2 siblings, 0 replies; 17+ messages in thread
From: Chuck Ebbert @ 2007-04-11 16:23 UTC (permalink / raw)
  To: Zhao Forrest
  Cc: Peter Zijlstra, Paul Jackson, penberg, linux-mm, linux-kernel

Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
>> > I'm confused - which end of this stack is up?
>> >
>> > cpuset_exit doesn't call do_exit; rather, it's the other
>> > way around.  And put_files_struct doesn't call do_exit;
>> > rather, do_exit calls __exit_files, which calls put_files_struct.
>>
>> I'm guessing it's x86_64, which generates crap traces.
>>
> Yes, it's x86_64. Is there a reliable way to generate stack traces under
> x86_64?
> Can enabling "[ ] Compile the kernel with frame pointers" help?

It will help a little, but not much, because the stack backtrace
code ignores the frame pointers. But it will prevent tail calls,
making it somewhat easier to make sense of the data.
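A tiny illustration of the tail-call point, with hypothetical
functions a() and b(): at -O2, gcc may compile the call below into a
plain jmp, so a() never appears as b()'s caller in a backtrace; per
the point above, building with frame pointers keeps the frame -- and
the evidence -- around:

	int b(int x);

	int a(int x)
	{
		return b(x + 1);	/* tail call: may become "jmp b" */
	}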

This should be a FAQ item somewhere...


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 10:38     ` Peter Zijlstra
@ 2007-04-11 16:35       ` Ken Chen
  2007-04-12  6:17       ` Zhao Forrest
  1 sibling, 0 replies; 17+ messages in thread
From: Ken Chen @ 2007-04-11 16:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Zhao Forrest, Pekka Enberg, linux-mm, linux-kernel

On 4/11/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> > I got some new information:
> > Before the soft-lockup message appears, we have:
> > [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> > buffer_head       10927942 10942560    120   32    1 : tunables   32   16    8 : slabdata 341955 341955      6 : globalstat 37602996 11589379 1174373 6 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
> >
> > Then, after the buffer_heads have been freed, we have:
> > [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> > buffer_head         9542  36384    120   32    1 : tunables   32   16    8 : slabdata   1137   1137    245 : globalstat 37602996 11589379 1174373 6 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
> >
> > Could this huge number of buffer_heads be causing the soft lockup?
>
> __blkdev_put() takes the BKL and bd_mutex;
> invalidate_mapping_pages() tries to take the page lock.
>
> But no other locks seem to be held while free_buffer_head() is called.
>
> All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not
> hog the CPU like that. What preemption mode have you got selected?
> (CONFIG_PREEMPT_VOLUNTARY=y?)

Also, you can try this patch:

http://groups.google.com/group/linux.kernel/browse_thread/thread/7086e4b9d5504dc9/c608bfea4614b07e?lnk=gst&q=+Big+kernel+lock+contention+in+do_open&rnum=1#c608bfea4614b07e


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 10:10         ` Zhao Forrest
  2007-04-11 10:16           ` Peter Zijlstra
  2007-04-11 16:23           ` Chuck Ebbert
@ 2007-04-11 16:44           ` Badari Pulavarty
  2 siblings, 0 replies; 17+ messages in thread
From: Badari Pulavarty @ 2007-04-11 16:44 UTC (permalink / raw)
  To: Zhao Forrest; +Cc: Peter Zijlstra, Paul Jackson, penberg, linux-mm, lkml

On Wed, 2007-04-11 at 18:10 +0800, Zhao Forrest wrote:
> On 4/11/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote:
> > > I'm confused - which end of this stack is up?
> > >
> > > cpuset_exit doesn't call do_exit; rather, it's the other
> > > way around.  And put_files_struct doesn't call do_exit;
> > > rather, do_exit calls __exit_files, which calls put_files_struct.
> >
> > I'm guessing it's x86_64, which generates crap traces.
> >
> Yes, it's x86_64. Is there a reliable way to generate stack traces under x86_64?
> Can enabling "[ ] Compile the kernel with frame pointers" help?

CONFIG_UNWIND_INFO=y
CONFIG_STACK_UNWIND=y

should help.

Thanks,
Badari




* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11  6:17 Why does kmem_cache_free occupy the CPU for more than 10 seconds? Zhao Forrest
  2007-04-11  6:53 ` Pekka Enberg
@ 2007-04-11 22:30 ` Andrew Morton
  2007-04-12  7:39   ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2007-04-11 22:30 UTC (permalink / raw)
  To: Zhao Forrest; +Cc: linux-mm, linux-kernel

On Wed, 11 Apr 2007 14:17:04 +0800
"Zhao Forrest" <forrest.zhao@gmail.com> wrote:

> We're using RHEL5 with kernel version 2.6.18-8.el5.
> When doing a stress test on a raw device for about 3-4 hours, we found
> a soft-lockup message in dmesg.
> I know we're not reporting the bug against the latest kernel, but does
> anyone know whether this is a known issue in older kernels? And why
> does kmem_cache_free occupy the CPU for more than 10 seconds?
> 
> Please let me know if you need any information.
> 
> Thanks,
> Forrest
> --------------------------------------------------------------
> BUG: soft lockup detected on CPU#1!
> 
> Call Trace:
>  <IRQ>  [<ffffffff800b2c93>] softlockup_tick+0xdb/0xed
>  [<ffffffff800933df>] update_process_times+0x42/0x68
>  [<ffffffff80073d97>] smp_local_timer_interrupt+0x23/0x47
>  [<ffffffff80074459>] smp_apic_timer_interrupt+0x41/0x47
>  [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
>  <EOI>  [<ffffffff80007660>] kmem_cache_free+0x1c0/0x1cb
>  [<ffffffff800262ee>] free_buffer_head+0x2a/0x43
>  [<ffffffff80027110>] try_to_free_buffers+0x89/0x9d
>  [<ffffffff80043041>] invalidate_mapping_pages+0x90/0x15f
>  [<ffffffff800d4a77>] kill_bdev+0xe/0x21
>  [<ffffffff800d4f9d>] __blkdev_put+0x4f/0x169
>  [<ffffffff80012281>] __fput+0xae/0x198
>  [<ffffffff80023647>] filp_close+0x5c/0x64
>  [<ffffffff800384f9>] put_files_struct+0x6c/0xc3
>  [<ffffffff80014f01>] do_exit+0x2d2/0x8b1
>  [<ffffffff80046eb6>] cpuset_exit+0x0/0x6c
>  [<ffffffff8002abd7>] get_signal_to_deliver+0x427/0x456
>  [<ffffffff80059122>] do_notify_resume+0x9c/0x7a9
>  [<ffffffff80086c6d>] default_wake_function+0x0/0xe
>  [<ffffffff800b1fd8>] audit_syscall_exit+0x2cd/0x2ec
>  [<ffffffff8005b362>] int_signal+0x12/0x17

I think there's nothing unusual happening here - you closed the device and
the kernel has to remove a tremendous number of pagecache pages, and that
simply takes a long time.
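(For a rough sense of scale, with a hypothetical per-object cost: the
earlier slabinfo output showed ~10.9 million buffer_heads, and at even
~1 microsecond of invalidate/free work per object that is ~11 seconds
of uninterrupted kernel time -- enough on its own to trip the 10 s
softlockup threshold.)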

How much memory does the machine have?

There used to be a cond_resched() in invalidate_mapping_pages() which would
have prevented this, but I rudely removed it to support
/proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
under spinlock).

We could resurrect that cond_resched() by passing in some flag, I guess. 
Or change the code to poke the softlockup detector.  The former would be
better.
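(The "poke" would presumably be touch_softlockup_watchdog(), which
2.6.18 already exports: it refreshes the per-CPU timestamp the
detector checks, so a long-running but legitimate loop can do this
instead of actually rescheduling:)

	touch_softlockup_watchdog();	/* "still alive, not hung" */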


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 10:38     ` Peter Zijlstra
  2007-04-11 16:35       ` Ken Chen
@ 2007-04-12  6:17       ` Zhao Forrest
  1 sibling, 0 replies; 17+ messages in thread
From: Zhao Forrest @ 2007-04-12  6:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Pekka Enberg, linux-mm, linux-kernel

On 4/11/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> > I got some new information:
> > Before the soft-lockup message appears, we have:
> > [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> > buffer_head       10927942 10942560    120   32    1 : tunables   32   16    8 : slabdata 341955 341955      6 : globalstat 37602996 11589379 1174373 6 0 1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
> >
> > Then, after the buffer_heads have been freed, we have:
> > [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> > buffer_head         9542  36384    120   32    1 : tunables   32   16    8 : slabdata   1137   1137    245 : globalstat 37602996 11589379 1174373 6 0 1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
> >
> > Could this huge number of buffer_heads be causing the soft lockup?
>
>
> __blkdev_put() takes the BKL and bd_mutex;
> invalidate_mapping_pages() tries to take the page lock.
>
> But no other locks seem to be held while free_buffer_head() is called.
>
> All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not
> hog the CPU like that. What preemption mode have you got selected?
> (CONFIG_PREEMPT_VOLUNTARY=y?)
These two kernel options are turned on by default in my kernel. Here's
a snippet from .config:
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y

>
> Does this fix it?
>
> --- fs/buffer.c~        2007-02-01 12:00:34.000000000 +0100
> +++ fs/buffer.c 2007-04-11 12:35:48.000000000 +0200
> @@ -3029,6 +3029,8 @@ out:
>                         struct buffer_head *next = bh->b_this_page;
>                         free_buffer_head(bh);
>                         bh = next;
> +
> +                       cond_resched();
>                 } while (bh != buffers_to_free);
>         }
>         return ret;
>
So far I have run the test with the patched kernel for 6 rounds and
haven't seen the soft lockup, so I think this patch fixes the problem.
But what still confuses me is why we need to invoke cond_resched()
explicitly when CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_BKL are
both turned on. From my understanding, these two options should make
scheduling happen even when the CPU is under heavy load...
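(The likely answer, sketched from 2.6.18-era include/linux/kernel.h:
without CONFIG_PREEMPT, kernel code is never preempted and only
reschedules at explicit points, and CONFIG_PREEMPT_VOLUNTARY merely
turns the existing might_sleep() annotations into such points:

#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int cond_resched(void);
# define might_resched() cond_resched()
#else
# define might_resched() do { } while (0)
#endif

The free_buffer_head() loop has neither a might_sleep() nor a
cond_resched() in it, so nothing there ever yields the CPU.)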

Thanks,
Forrest


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-11 22:30 ` Andrew Morton
@ 2007-04-12  7:39   ` Peter Zijlstra
  2007-04-12  7:55     ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2007-04-12  7:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Zhao Forrest, linux-mm, linux-kernel

On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:

> There used to be a cond_resched() in invalidate_mapping_pages() which would
> have prevented this, but I rudely removed it to support
> /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
> under spinlock).
> 
> We could resurrect that cond_resched() by passing in some flag, I guess. 
> Or change the code to poke the softlockup detector.  The former would be
> better.

cond_resched() is conditional on __resched_legal(0), which should take
care of being called under a spinlock.
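(For reference, a sketch of that helper as it appeared around 2.6.18,
reconstructed from memory -- the exact body may differ:

static inline int __resched_legal(int expected_preempt_count)
{
	if (unlikely(preempt_count() != expected_preempt_count))
		return 0;
	if (unlikely(system_state != SYSTEM_RUNNING))
		return 0;
	return 1;
}
)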

so I guess we can just reinstate the call in invalidate_mapping_pages()

(still waiting on the compile to finish...)
---
invalidate_mapping_pages() is called under locks (usually preemptible)
but can do a _lot_ of work; stick in a voluntary preemption point to
avoid excessive latencies (over 10 seconds was reported by softlockup).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 mm/truncate.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6-mm/mm/truncate.c
===================================================================
--- linux-2.6-mm.orig/mm/truncate.c
+++ linux-2.6-mm/mm/truncate.c
@@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
 			pgoff_t index;
 			int lock_failed;
 
+			cond_resched();
+
 			lock_failed = TestSetPageLocked(page);
 
 			/*



* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-12  7:39   ` Peter Zijlstra
@ 2007-04-12  7:55     ` Andrew Morton
  2007-04-12  9:19       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2007-04-12  7:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Zhao Forrest, linux-mm, linux-kernel

On Thu, 12 Apr 2007 09:39:25 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> 
> > There used to be a cond_resched() in invalidate_mapping_pages() which would
> > have prevented this, but I rudely removed it to support
> > /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
> > under spinlock).
> > 
> > We could resurrect that cond_resched() by passing in some flag, I guess. 
> > Or change the code to poke the softlockup detector.  The former would be
> > better.
> 
> cond_resched() is conditional on __resched_legal(0)

What's __resched_legal()?

> which should take
> care of being called under a spinlock.

We only increment preempt_count() in spin_lock() if CONFIG_PREEMPT.
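Which is the crux; a sketch of the 2.6.18-era include/linux/preempt.h:

#ifdef CONFIG_PREEMPT
#define preempt_disable() \
do { \
	inc_preempt_count(); \
	barrier(); \
} while (0)
#else
#define preempt_disable()	do { } while (0)
#endif

spin_lock() funnels through preempt_disable(), so with !CONFIG_PREEMPT
a preempt_count()-based test such as __resched_legal() reads 0 even
inside the critical section, and cond_resched() will happily
schedule() with the spinlock held.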

> so I guess we can just reinstate the call in invalidate_mapping_pages()
> 
> (still waiting on the compile to finish...)
> ---
> invalidate_mapping_pages() is called under locks (usually preemptible)
> but can do a _lot_ of work; stick in a voluntary preemption point to
> avoid excessive latencies (over 10 seconds was reported by softlockup).
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  mm/truncate.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6-mm/mm/truncate.c
> ===================================================================
> --- linux-2.6-mm.orig/mm/truncate.c
> +++ linux-2.6-mm/mm/truncate.c
> @@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
>  			pgoff_t index;
>  			int lock_failed;
>  
> +			cond_resched();
> +
>  			lock_failed = TestSetPageLocked(page);

This is deadlocky on the drop_caches path, and with CONFIG_PREEMPT we'll
get scheduling-in-spinlock warnings.

For the blkdev_close() path the change is unneeded if CONFIG_PREEMPT and
will fix things if !CONFIG_PREEMPT.

We can presumably just remove the invalidate_mapping_pages() call from the
kill_bdev() path (at least) - kill_bdev()'s truncate_inode_pages() will do
the same thing.

It might be time to remove that unused-for-six-years destroy_dirty_buffers
too.


* Re: Why does kmem_cache_free occupy the CPU for more than 10 seconds?
  2007-04-12  7:55     ` Andrew Morton
@ 2007-04-12  9:19       ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2007-04-12  9:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Zhao Forrest, linux-mm, linux-kernel

On Thu, 2007-04-12 at 00:55 -0700, Andrew Morton wrote:
> On Thu, 12 Apr 2007 09:39:25 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > On Wed, 2007-04-11 at 15:30 -0700, Andrew Morton wrote:
> > 
> > > There used to be a cond_resched() in invalidate_mapping_pages() which would
> > > have prevented this, but I rudely removed it to support
> > > /proc/sys/vm/drop_caches (which needs to call invalidate_inode_pages()
> > > under spinlock).
> > > 
> > > We could resurrect that cond_resched() by passing in some flag, I guess. 
> > > Or change the code to poke the softlockup detector.  The former would be
> > > better.
> > 
> > cond_resched() is conditional on __resched_legal(0)
> 
> What's __resched_legal()?

Hmm, I was looking at the RHEL-5 sources; it seems this function was
removed later (commit 9414232fa0cc28e2f51b8c76d260f2748f7953fc).

> > which should take
> > care of being called under a spinlock.
> 
> We only increment preempt_count() in spin_lock() if CONFIG_PREEMPT.

Right, ok, that makes the whole thing fall apart indeed.

> > Index: linux-2.6-mm/mm/truncate.c
> > ===================================================================
> > --- linux-2.6-mm.orig/mm/truncate.c
> > +++ linux-2.6-mm/mm/truncate.c
> > @@ -292,6 +292,8 @@ unsigned long invalidate_mapping_pages(s
> >  			pgoff_t index;
> >  			int lock_failed;
> >  
> > +			cond_resched();
> > +
> >  			lock_failed = TestSetPageLocked(page);
> 
> This is deadlocky on the drop_caches path, and with CONFIG_PREEMPT we'll
> get scheduling-in-spinlock warnings.

*nod*

> For the blkdev_close() path the change is unneeded if CONFIG_PREEMPT and
> will fix things if !CONFIG_PREEMPT.
> 
> We can presumably just remove the invalidate_mapping_pages() call from the
> kill_bdev() path (at least) - kill_bdev()'s truncate_inode_pages() will do
> the same thing.

Certainly looks that way, I'll see if I can spot a hole in that.

> It might be time to remove that unused-for-six-years destroy_dirty_buffers
> too.

ok.



Thread overview: 17+ messages
2007-04-11  6:17 Why does kmem_cache_free occupy the CPU for more than 10 seconds? Zhao Forrest
2007-04-11  6:53 ` Pekka Enberg
2007-04-11  9:14   ` Zhao Forrest
2007-04-11  9:53     ` Paul Jackson
2007-04-11 10:06       ` Peter Zijlstra
2007-04-11 10:10         ` Zhao Forrest
2007-04-11 10:16           ` Peter Zijlstra
2007-04-11 16:23           ` Chuck Ebbert
2007-04-11 16:44           ` Badari Pulavarty
2007-04-11  9:53   ` Zhao Forrest
2007-04-11 10:38     ` Peter Zijlstra
2007-04-11 16:35       ` Ken Chen
2007-04-12  6:17       ` Zhao Forrest
2007-04-11 22:30 ` Andrew Morton
2007-04-12  7:39   ` Peter Zijlstra
2007-04-12  7:55     ` Andrew Morton
2007-04-12  9:19       ` Peter Zijlstra
