From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: dchinner@redhat.com
Cc: mhocko@suse.cz, linux-mm@kvack.org, rientjes@google.com,
	oleg@redhat.com, david@fromorbit.com
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Sat, 20 Dec 2014 21:41:22 +0900
Message-ID: <201412202141.ADF87596.tOSLJHFFOOFMVQ@I-love.SAKURA.ne.jp>
In-Reply-To: <20141220020331.GM1942@devil.localdomain>

Dave Chinner wrote:
> On Fri, Dec 19, 2014 at 09:22:49PM +0900, Tetsuo Handa wrote:
> > > > The global OOM killer will try to kill this program because this program
> > > > will be using 400MB+ of RAM by the time the global OOM killer is triggered.
> > > > But sometimes this program cannot be terminated by the global OOM killer
> > > > due to XFS lock dependency.
> > > >
> > > > You can see what is happening from OOM traces after uptime > 320 seconds of
> > > > http://I-love.SAKURA.ne.jp/tmp/serial-20141213.txt.xz though memcg is not
> > > > configured on this program.
> > >
> > > This is clearly a separate issue. It is a lock dependency and that alone
> > > _cannot_ be handled from OOM killer as it doesn't understand lock
> > > dependencies. This should be addressed from the xfs point of view IMHO
> > > but I am not familiar with this filesystem to tell you how or whether it
> > > is possible.
> 
> What XFS lock dependency? I see nothing in that output file that indicates a
> lock dependency problem - can you point out what the issue is here?

This is a problem which lockdep cannot report.

The problem is that an OOM-victim task is unable to terminate because it is
blocked waiting for one of the locks used by XFS (I don't know which one).

----------
[  320.788387] Kill process 10732 (a.out) sharing same memory
(...snipped...)
[  398.641724] a.out           D ffff880077e42638     0 10732      1 0x00000084
[  398.643705]  ffff8800770ebcb8 0000000000000082 ffff8800770ebc88 ffff880077e42210
[  398.645819]  0000000000012500 ffff8800770ebfd8 0000000000012500 ffff880077e42210
[  398.647917]  ffff8800770ebcb8 ffff88007b4a2a48 ffff88007b4a2a4c ffff880077e42210
[  398.650009] Call Trace:
[  398.651094]  [<ffffffff8159f954>] schedule_preempt_disabled+0x24/0x70
[  398.652913]  [<ffffffff815a1705>] __mutex_lock_slowpath+0xb5/0x120
[  398.654679]  [<ffffffff815a178e>] mutex_lock+0x1e/0x32
[  398.656262]  [<ffffffffa023b58a>] xfs_file_buffered_aio_write.isra.15+0x6a/0x200 [xfs]
[  398.658350]  [<ffffffffa023b79e>] xfs_file_write_iter+0x7e/0x120 [xfs]
[  398.660191]  [<ffffffff8117edd9>] new_sync_write+0x89/0xd0
[  398.661829]  [<ffffffff8117f742>] vfs_write+0xb2/0x1f0
[  398.663397]  [<ffffffff8101a9f4>] ? do_audit_syscall_entry+0x64/0x70
[  398.665190]  [<ffffffff81180200>] SyS_write+0x50/0xc0
[  398.666745]  [<ffffffff810f729e>] ? __audit_syscall_exit+0x22e/0x2d0
[  398.668539]  [<ffffffff815a38e9>] system_call_fastpath+0x12/0x17
(...snipped...)
[  897.190487] Out of memory: Kill process 10732 (a.out) score 898 or sacrifice child
[  897.192236] Killed process 10732 (a.out) total-vm:2166864kB, anon-rss:1727976kB, file-rss:0kB
(...snipped...)
[  904.819053] a.out           D ffff880077e42638     0 10732      1 0x00100084
[  904.820967]  ffff8800770ebcb8 0000000000000082 ffff8800770ebc88 ffff880077e42210
[  904.823011]  0000000000012500 ffff8800770ebfd8 0000000000012500 ffff880077e42210
[  904.825054]  ffff8800770ebcb8 ffff88007b4a2a48 ffff88007b4a2a4c ffff880077e42210
[  904.827137] Call Trace:
[  904.828174]  [<ffffffff8159f954>] schedule_preempt_disabled+0x24/0x70
[  904.829924]  [<ffffffff815a1705>] __mutex_lock_slowpath+0xb5/0x120
[  904.831634]  [<ffffffff815a178e>] mutex_lock+0x1e/0x32
[  904.833148]  [<ffffffffa023b58a>] xfs_file_buffered_aio_write.isra.15+0x6a/0x200 [xfs]
[  904.835178]  [<ffffffffa023b79e>] xfs_file_write_iter+0x7e/0x120 [xfs]
[  904.836980]  [<ffffffff8117edd9>] new_sync_write+0x89/0xd0
[  904.838561]  [<ffffffff8117f742>] vfs_write+0xb2/0x1f0
[  904.840094]  [<ffffffff8101a9f4>] ? do_audit_syscall_entry+0x64/0x70
[  904.841846]  [<ffffffff81180200>] SyS_write+0x50/0xc0
[  904.844026]  [<ffffffff810f729e>] ? __audit_syscall_exit+0x22e/0x2d0
[  904.845826]  [<ffffffff815a38e9>] system_call_fastpath+0x12/0x17
----------
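
As a rough sketch, the check that PID=10732 shows the same stack at both
timestamps could be automated as below. The regular expressions assume
SysRq-t style lines in the format shown above, and parse_snapshots() and
unchanged() are hypothetical helper names used only for illustration. Frames
marked with "?" are skipped because they are unreliable.

```python
import re
from collections import defaultdict

# Task header line: "[ time] comm state <addr | running task> 0 pid ppid flags"
HEADER = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+(\S+)\s+([A-Z])\s+"
    r"(?:[0-9a-f]{16}|running task)\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+"
)
# Stack frame line: "[ time]  [<addr>] [? ]symbol+off/len"
FRAME = re.compile(r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\? )?(\S+)")


def parse_snapshots(lines):
    """Group task dumps by PID; keep only reliable (non-'?') frame names."""
    snaps = defaultdict(list)
    current = None
    for line in lines:
        if not line.startswith("["):
            current = None  # e.g. "(...snipped...)" ends the current dump
            continue
        m = HEADER.match(line)
        if m:
            current = {"time": float(m.group(1)), "comm": m.group(2),
                       "state": m.group(3), "frames": []}
            snaps[int(m.group(4))].append(current)
            continue
        m = FRAME.match(line)
        if m and current is not None and not m.group(1):
            current["frames"].append(m.group(2).split("+")[0])
    return snaps


def unchanged(snapshots):
    """True if there are at least two dumps and all show the same frames."""
    stacks = {tuple(s["frames"]) for s in snapshots}
    return len(snapshots) >= 2 and len(stacks) == 1


# Shortened excerpt of the two dumps of PID=10732 quoted above.
SAMPLE = """\
[  398.641724] a.out           D ffff880077e42638     0 10732      1 0x00000084
[  398.650009] Call Trace:
[  398.651094]  [<ffffffff8159f954>] schedule_preempt_disabled+0x24/0x70
[  398.652913]  [<ffffffff815a1705>] __mutex_lock_slowpath+0xb5/0x120
[  398.654679]  [<ffffffff815a178e>] mutex_lock+0x1e/0x32
[  398.663397]  [<ffffffff8101a9f4>] ? do_audit_syscall_entry+0x64/0x70
(...snipped...)
[  904.819053] a.out           D ffff880077e42638     0 10732      1 0x00100084
[  904.827137] Call Trace:
[  904.828174]  [<ffffffff8159f954>] schedule_preempt_disabled+0x24/0x70
[  904.829924]  [<ffffffff815a1705>] __mutex_lock_slowpath+0xb5/0x120
[  904.831634]  [<ffffffff815a178e>] mutex_lock+0x1e/0x32
"""

if __name__ == "__main__":
    for pid, dumps in parse_snapshots(SAMPLE.splitlines()).items():
        times = ", ".join(f"{d['time']:.0f}s" for d in dumps)
        print(pid, "same stack" if unchanged(dumps) else "stack changed", times)
```

Feeding the whole serial console log to parse_snapshots() instead of SAMPLE
would list every task that shows an identical reliable stack in each dump,
which only suggests (it does not prove) that the task made no progress.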

I don't know how block layer requests are issued by the filesystem layer's
activities, but PID=10832 has been blocked for a long time in
blk_rq_map_kern(), doing a __GFP_WAIT allocation. I'm sure that this
blk_rq_map_kern() call is issued by the XFS filesystem's activities because
this system has only /dev/sda1, formatted as XFS, and there is no swap.

----------
[  393.696527] kworker/1:1     R  running task        0    43      2 0x00000000
[  393.698561] Workqueue: events_freezable_power_ disk_events_workfn
[  393.700339]  ffff88007c5437d8 0000000000000046 ffff88007c5438a0 ffff88007c4b4cc0
[  393.702513]  0000000000012500 ffff88007c543fd8 0000000000012500 ffff88007c4b4cc0
[  393.704631]  0000000000000020 ffff88007c5438b0 0000000000000002 ffffffff81848408
[  393.706748] Call Trace:
[  393.707924]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  393.709572]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  393.711206]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  393.713001]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  393.714679]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  393.716538]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  393.718262]  [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[  393.719959]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  393.721628]  [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[  393.723240]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  393.725043]  [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[  393.726695]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  393.728407]  [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[  393.730021]  [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[  393.731776]  [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[  393.733561]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  393.735235]  [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[  393.737027]  [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[  393.738918]  [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[  393.740602]  [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[  393.742254]  [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[  393.743898]  [<ffffffff8107de99>] worker_thread+0x119/0x500
[  393.745495]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  393.747152]  [<ffffffff81082f7c>] kthread+0xdc/0x100
[  393.748637]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  393.750438]  [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[  393.752004]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
(...snipped...)
[  525.157216] kworker/1:0     R  running task        0 10832      2 0x00000080
[  525.159187] Workqueue: events_freezable_power_ disk_events_workfn
[  525.160907]  ffff88007c8ab7d8 0000000000000046 ffff88007c8ab8a0 ffff88007c894190
[  525.162956]  0000000000012500 ffff88007c8abfd8 0000000000012500 ffff88007c894190
[  525.165010]  0000000000000020 ffff88007c8ab8b0 0000000000000002 ffffffff81848408
[  525.167068] Call Trace:
[  525.168100]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  525.169679]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  525.171241]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  525.172960]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  525.174580]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  525.176302]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  525.177982]  [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[  525.179631]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  525.181215]  [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[  525.182785]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  525.184545]  [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[  525.186156]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  525.187831]  [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[  525.189418]  [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[  525.191148]  [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[  525.192969]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  525.194688]  [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[  525.196455]  [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[  525.198291]  [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[  525.199984]  [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[  525.201616]  [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[  525.203264]  [<ffffffff8107de99>] worker_thread+0x119/0x500
[  525.204799]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  525.206436]  [<ffffffff81082f7c>] kthread+0xdc/0x100
[  525.207902]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  525.209655]  [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[  525.211206]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
(...snipped...)
[  619.934144] kworker/1:0     R  running task        0 10832      2 0x00000080
[  619.936060] Workqueue: events_freezable_power_ disk_events_workfn
[  619.937833]  ffff88007c8ab7d8 0000000000000046 ffff88007c8ab8a0 ffff88007c894190
[  619.939912]  0000000000012500 ffff88007c8abfd8 0000000000012500 ffff88007c894190
[  619.942010]  0000000000000020 ffff88007c8ab8b0 0000000000000002 ffffffff81848408
[  619.944123] Call Trace:
[  619.945168]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  619.946697]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  619.948271]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  619.949968]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  619.951576]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  619.953387]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  619.955062]  [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[  619.956726]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  619.958289]  [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[  619.959886]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  619.961641]  [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[  619.963229]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  619.964904]  [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[  619.966499]  [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[  619.968182]  [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[  619.969936]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  619.971583]  [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[  619.973346]  [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[  619.975213]  [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[  619.976865]  [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[  619.978497]  [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[  619.980179]  [<ffffffff8107de99>] worker_thread+0x119/0x500
[  619.981793]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  619.983468]  [<ffffffff81082f7c>] kthread+0xdc/0x100
[  619.984939]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  619.986684]  [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[  619.988231]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
(...snipped...)
[  715.930998] kworker/1:0     R  running task        0 10832      2 0x00000080
[  715.932930] Workqueue: events_freezable_power_ disk_events_workfn
[  715.934670]  ffff880076fb9b40 0000000000000400 ffff88007c8ab8a0 0000000000000000
[  715.936814]  ffff88007c8ab7e8 ffff88007c8abfd8 0000000000012500 ffff88007c894190
[  715.938869]  0000000000000020 ffff88007c8ab8b0 0000000000000002 ffffffff81848408
[  715.940909] Call Trace:
[  715.942017]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  715.943638]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  715.945256]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  715.947001]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  715.948603]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  715.950298]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  715.952010]  [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[  715.953658]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  715.955324]  [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[  715.956929]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  715.958693]  [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[  715.960722]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  715.962488]  [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[  715.964142]  [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[  715.965870]  [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[  715.967615]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  715.969255]  [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[  715.971061]  [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[  715.972981]  [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[  715.974692]  [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[  715.976330]  [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[  715.978090]  [<ffffffff8107de99>] worker_thread+0x119/0x500
[  715.979723]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  715.981361]  [<ffffffff81082f7c>] kthread+0xdc/0x100
[  715.982794]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  715.984554]  [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[  715.986116]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
(...snipped...)
[  798.788405] kworker/1:0     R  running task        0 10832      2 0x00000088
[  798.790344] Workqueue: events_freezable_power_ disk_events_workfn
[  798.792191]  ffff880035e3f340 0000000000000400 ffff88007c8ab8a0 0000000000000000
[  798.794328]  ffff88007c8ab7e8 ffffffff8112132a ffff88007c8ab908 ffff88007cfee800
[  798.796395]  0000000000000020 0000000000000000 ffff88007c8ab838 ffff88007c8ab8b0
[  798.798458] Call Trace:
[  798.799525]  [<ffffffff8112132a>] ? shrink_slab_node+0x3a/0x1b0
[  798.801229]  [<ffffffff81122063>] ? shrink_slab+0x83/0x150
[  798.802809]  [<ffffffff811252bf>] ? do_try_to_free_pages+0x35f/0x4d0
[  798.804586]  [<ffffffff811254c4>] ? try_to_free_pages+0x94/0xc0
[  798.806250]  [<ffffffff8111a793>] ? __alloc_pages_nodemask+0x4e3/0xa40
[  798.808050]  [<ffffffff8115a8ce>] ? alloc_pages_current+0x8e/0x100
[  798.809759]  [<ffffffff8125bed6>] ? bio_copy_user_iov+0x1d6/0x380
[  798.811500]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  798.813053]  [<ffffffff8125c119>] ? bio_copy_kern+0x49/0x100
[  798.814699]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  798.816494]  [<ffffffff81265e6f>] ? blk_rq_map_kern+0x6f/0x130
[  798.818421]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  798.820083]  [<ffffffff813a66cf>] ? scsi_execute+0x12f/0x160
[  798.821733]  [<ffffffff813a7f14>] ? scsi_execute_req_flags+0x84/0xf0
[  798.823454]  [<ffffffffa01e29cc>] ? sr_check_events+0xbc/0x2e0 [sr_mod]
[  798.825312]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  798.826930]  [<ffffffffa01d6177>] ? cdrom_check_events+0x17/0x30 [cdrom]
[  798.828733]  [<ffffffffa01e2e5d>] ? sr_block_check_events+0x2d/0x30 [sr_mod]
[  798.830594]  [<ffffffff812701c6>] ? disk_check_events+0x56/0x1b0
[  798.832338]  [<ffffffff81270331>] ? disk_events_workfn+0x11/0x20
[  798.834013]  [<ffffffff8107ceaf>] ? process_one_work+0x13f/0x370
[  798.835682]  [<ffffffff8107de99>] ? worker_thread+0x119/0x500
[  798.837350]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  798.838990]  [<ffffffff81082f7c>] ? kthread+0xdc/0x100
[  798.840489]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  798.842258]  [<ffffffff815a383c>] ? ret_from_fork+0x7c/0xb0
[  798.843837]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
(...snipped...)
[  850.354473] kworker/1:0     R  running task        0 10832      2 0x00000080
[  850.356549] Workqueue: events_freezable_power_ disk_events_workfn
[  850.358273]  ffff88007c8ab7d8 0000000000000046 ffff88007c8ab8a0 ffff88007c894190
[  850.360359]  0000000000012500 ffff88007c8abfd8 0000000000012500 ffff88007c894190
[  850.362427]  0000000000000020 ffff88007c8ab8b0 0000000000000002 ffffffff81848408
[  850.364505] Call Trace:
[  850.365504]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  850.369185]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  850.371553]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  850.373384]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  850.375503]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  850.377333]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  850.379100]  [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[  850.380763]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  850.382362]  [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[  850.384008]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  850.385799]  [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[  850.387572]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  850.389995]  [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[  850.391575]  [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[  850.393298]  [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[  850.395050]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  850.396696]  [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[  850.398459]  [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[  850.400321]  [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[  850.401986]  [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[  850.403621]  [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[  850.405618]  [<ffffffff8107de99>] worker_thread+0x119/0x500
[  850.407336]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  850.411190]  [<ffffffff81082f7c>] kthread+0xdc/0x100
[  850.412677]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  850.414454]  [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[  850.416010]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
(...snipped...)
[  907.302050] kworker/1:0     R  running task        0 10832      2 0x00000080
[  907.303961] Workqueue: events_freezable_power_ disk_events_workfn
[  907.305706]  ffff88007c8ab7d8 0000000000000046 ffff88007c8ab8a0 ffff88007c894190
[  907.307761]  0000000000012500 ffff88007c8abfd8 0000000000012500 ffff88007c894190
[  907.309894]  0000000000000020 ffff88007c8ab8b0 0000000000000002 ffffffff81848408
[  907.311949] Call Trace:
[  907.312989]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  907.314578]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  907.316182]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  907.317889]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  907.319535]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  907.321259]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  907.322945]  [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[  907.324606]  [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[  907.326196]  [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[  907.327788]  [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[  907.329549]  [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[  907.331184]  [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[  907.332877]  [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[  907.334452]  [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[  907.336156]  [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[  907.337893]  [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[  907.339539]  [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[  907.341289]  [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[  907.343115]  [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[  907.344771]  [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[  907.346421]  [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[  907.348057]  [<ffffffff8107de99>] worker_thread+0x119/0x500
[  907.349650]  [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[  907.351295]  [<ffffffff81082f7c>] kthread+0xdc/0x100
[  907.352765]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[  907.354520]  [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[  907.356097]  [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
----------

I don't know which process is holding the mutex that PID=10732 is waiting
for, but I suspect that the holder is itself waiting for completion of disk
I/O which is processed by PID=10832.

If my suspicion is correct, it's an AB-BA livelock, because the OOM killer is
waiting for PID=10732 to terminate whereas PID=10832 cannot complete disk
I/O because it is waiting for the OOM killer. Unfortunately I'm not familiar
with XFS, so I can't find which process is holding the mutex.
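
The suspected dependency can be written down as a small wait-for graph. This
is only an illustration of the hypothesis above, not a confirmed diagnosis:
"mutex holder" stands for the unidentified task, and find_cycle() is a
hypothetical helper, not a kernel facility.

```python
# Each key is suspected to be waiting for its value.
WAITS_FOR = {
    "OOM killer": "PID 10732",    # waits for the victim to terminate
    "PID 10732": "mutex holder",  # blocked in mutex_lock() in the XFS write path
    "mutex holder": "PID 10832",  # suspected: waits for disk I/O completion
    "PID 10832": "OOM killer",    # allocation stalls until memory is freed
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle found, or None."""
    seen = []
    node = start
    while node in graph:
        if node in seen:
            return seen[seen.index(node):] + [node]
        seen.append(node)
        node = graph[node]
    return None  # the chain ends at a node that waits for nothing


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "OOM killer")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

If any one of these edges turns out to be wrong (for example, if the mutex
holder is not waiting for PID=10832), the chain no longer closes and the
stall needs a different explanation.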

Or maybe it is PID=10802 rather than PID=10832? But then, why are both
PID=10802 and PID=10832 blocked in memory allocation?

----------
[  715.162520] a.out           R  running task        0 10802      1 0x00000084
[  715.164482]  ffff88007b877898 0000000000000082 ffff88007b877960 ffff8800751bc050
[  715.166574]  0000000000012500 ffff88007b877fd8 0000000000012500 ffff8800751bc050
[  715.169036]  0000000000000020 ffff88007b877970 0000000000000003 ffffffff81848408
[  715.171125] Call Trace:
[  715.172185]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  715.173773]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  715.175356]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  715.177088]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  715.178721]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  715.180583]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  715.182203]  [<ffffffff81111b27>] __page_cache_alloc+0xa7/0xc0
[  715.183864]  [<ffffffff8111263b>] pagecache_get_page+0x6b/0x1e0
[  715.185533]  [<ffffffffa02522ae>] ? xfs_trans_commit+0x13e/0x230 [xfs]
[  715.187314]  [<ffffffff811127de>] grab_cache_page_write_begin+0x2e/0x50
[  715.189108]  [<ffffffffa02301cf>] xfs_vm_write_begin+0x2f/0xe0 [xfs]
[  715.190876]  [<ffffffff8111188c>] generic_perform_write+0xcc/0x1d0
[  715.192610]  [<ffffffffa023b50f>] ? xfs_file_aio_write_checks+0xdf/0xf0 [xfs]
[  715.194526]  [<ffffffffa023b5ef>] xfs_file_buffered_aio_write.isra.15+0xcf/0x200 [xfs]
[  715.196580]  [<ffffffffa023b79e>] xfs_file_write_iter+0x7e/0x120 [xfs]
[  715.198368]  [<ffffffff8117edd9>] new_sync_write+0x89/0xd0
[  715.200029]  [<ffffffff8117f742>] vfs_write+0xb2/0x1f0
[  715.201576]  [<ffffffff8101a9f4>] ? do_audit_syscall_entry+0x64/0x70
[  715.203309]  [<ffffffff81180200>] SyS_write+0x50/0xc0
[  715.204866]  [<ffffffff810f729e>] ? __audit_syscall_exit+0x22e/0x2d0
[  715.206613]  [<ffffffff815a38e9>] system_call_fastpath+0x12/0x17
(...snipped...)
[  906.533722] a.out           R  running task        0 10802      1 0x00000084
[  906.535671]  ffff88007b877898 0000000000000082 ffff88007b877960 ffff8800751bc050
[  906.537699]  0000000000012500 ffff88007b877fd8 0000000000012500 ffff8800751bc050
[  906.539838]  0000000000000020 ffff88007b877970 0000000000000003 ffffffff81848408
[  906.541916] Call Trace:
[  906.543075]  [<ffffffff8159f814>] _cond_resched+0x24/0x40
[  906.544610]  [<ffffffff81122119>] shrink_slab+0x139/0x150
[  906.546223]  [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[  906.547941]  [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[  906.549622]  [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[  906.551357]  [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[  906.553070]  [<ffffffff81111b27>] __page_cache_alloc+0xa7/0xc0
[  906.554748]  [<ffffffff8111263b>] pagecache_get_page+0x6b/0x1e0
[  906.556409]  [<ffffffffa02522ae>] ? xfs_trans_commit+0x13e/0x230 [xfs]
[  906.558180]  [<ffffffff811127de>] grab_cache_page_write_begin+0x2e/0x50
[  906.560242]  [<ffffffffa02301cf>] xfs_vm_write_begin+0x2f/0xe0 [xfs]
[  906.562027]  [<ffffffff8111188c>] generic_perform_write+0xcc/0x1d0
[  906.563851]  [<ffffffffa023b50f>] ? xfs_file_aio_write_checks+0xdf/0xf0 [xfs]
[  906.565838]  [<ffffffffa023b5ef>] xfs_file_buffered_aio_write.isra.15+0xcf/0x200 [xfs]
[  906.567892]  [<ffffffffa023b79e>] xfs_file_write_iter+0x7e/0x120 [xfs]
[  906.569719]  [<ffffffff8117edd9>] new_sync_write+0x89/0xd0
[  906.571300]  [<ffffffff8117f742>] vfs_write+0xb2/0x1f0
[  906.572836]  [<ffffffff8101a9f4>] ? do_audit_syscall_entry+0x64/0x70
[  906.574578]  [<ffffffff81180200>] SyS_write+0x50/0xc0
[  906.576198]  [<ffffffff810f729e>] ? __audit_syscall_exit+0x22e/0x2d0
[  906.577929]  [<ffffffff815a38e9>] system_call_fastpath+0x12/0x17
----------

Anyway, a stall of 10 minutes upon OOM (which cannot be resolved even with
SysRq-f) makes the system unusable for me.


2015-02-28 16:29                                                 ` Johannes Weiner
2015-02-28 16:41                                                   ` Theodore Ts'o
2015-02-28 22:15                                                     ` Johannes Weiner
2015-03-01 11:17                                                       ` Tetsuo Handa
2015-03-06 11:53                                                         ` Tetsuo Handa
2015-03-01 13:43                                                       ` Theodore Ts'o
2015-03-01 16:15                                                         ` Johannes Weiner
2015-03-01 19:36                                                           ` Theodore Ts'o
2015-03-01 20:44                                                             ` Johannes Weiner
2015-03-01 20:17                                                         ` Johannes Weiner
2015-03-01 21:48                                                       ` Dave Chinner
2015-03-02  0:17                                                         ` Dave Chinner
2015-03-02 12:46                                                           ` Brian Foster
2015-02-28 18:36                                                 ` Vlastimil Babka
2015-03-02 15:18                                                 ` Michal Hocko
2015-03-02 16:05                                                   ` Johannes Weiner
2015-03-02 17:10                                                     ` Michal Hocko
2015-03-02 17:27                                                       ` Johannes Weiner
2015-03-02 16:39                                                   ` Theodore Ts'o
2015-03-02 16:58                                                     ` Michal Hocko
2015-03-04 12:52                                                       ` Dave Chinner
2015-02-17 14:59                                     ` Michal Hocko
2015-02-17 14:50                                 ` Michal Hocko
2015-02-17 14:37                             ` Michal Hocko
2015-02-17 14:44                               ` Michal Hocko
2015-02-16 11:23                           ` Tetsuo Handa
2015-02-16 15:42                             ` Johannes Weiner
2015-02-17 11:57                               ` Tetsuo Handa
2015-02-17 13:16                                 ` Johannes Weiner
2015-02-17 16:50                                   ` Michal Hocko
2015-02-17 23:25                                     ` Dave Chinner
2015-02-18  8:48                                       ` Michal Hocko
2015-02-18 11:23                                         ` Tetsuo Handa
2015-02-18 12:29                                           ` Michal Hocko
2015-02-18 14:06                                             ` Tetsuo Handa
2015-02-18 14:25                                               ` Michal Hocko
2015-02-19 10:48                                                 ` Tetsuo Handa
2015-02-20  8:26                                                   ` Michal Hocko
2015-02-23 22:08                                 ` David Rientjes
2015-02-24 11:20                                   ` Tetsuo Handa
2015-02-24 15:20                                     ` Theodore Ts'o
2015-02-24 21:02                                       ` Dave Chinner
2015-02-25 14:31                                         ` Tetsuo Handa
2015-02-27  7:39                                           ` Dave Chinner
2015-02-27 12:42                                             ` Tetsuo Handa
2015-02-27 13:12                                               ` Dave Chinner
2015-03-04 12:41                                                 ` Tetsuo Handa
2015-03-04 13:25                                                   ` Dave Chinner
2015-03-04 14:11                                                     ` Tetsuo Handa
2015-03-05  1:36                                                       ` Dave Chinner
2015-02-17 16:33                             ` Michal Hocko
2014-12-29 17:40                   ` [PATCH] mm: get rid of radix tree gfp mask for pagecache_get_page (was: Re: How to handle TIF_MEMDIE stalls?) Michal Hocko
2014-12-29 18:45                     ` Linus Torvalds
2014-12-29 19:33                       ` Michal Hocko
2014-12-30 13:42                         ` Michal Hocko
2014-12-30 21:45                           ` Linus Torvalds
