Date: Sun, 26 Oct 2008 15:20:26 +1100
From: Dave Chinner
Subject: Re: deadlock with latest xfs
Message-ID: <20081026042026.GM18495@disturbed>
References: <4900412A.2050802@sgi.com>
 <20081023205727.GA28490@infradead.org>
 <49013C47.4090601@sgi.com>
 <20081024052418.GO25906@disturbed>
 <20081024064804.GQ25906@disturbed>
 <20081026005351.GK18495@disturbed>
 <20081026025013.GL18495@disturbed>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20081026025013.GL18495@disturbed>
Sender: owner-linux-mm@kvack.org
To: Lachlan McIlroy, Christoph Hellwig, xfs-oss, linux-mm@kvack.org

On Sun, Oct 26, 2008 at 01:50:13PM +1100, Dave Chinner wrote:
> On Sun, Oct 26, 2008 at 11:53:51AM +1100, Dave Chinner wrote:
> > On Fri, Oct 24, 2008 at 05:48:04PM +1100, Dave Chinner wrote:
> > > OK, I just hung a single-threaded rm -rf after this completed:
> > >
> > > # fsstress -p 1024 -n 100 -d /mnt/xfs2/fsstress
> > >
> > > It has hung with this trace:
> > >
> > > # echo w > /proc/sysrq-trigger
> > ....
> > > [42954211.590000] 794877f8: [<6002e40a>] update_curr+0x3a/0x50
> > > [42954211.590000] 79487818: [<60014f0d>] _switch_to+0x6d/0xe0
> > > [42954211.590000] 79487858: [<60324b21>] schedule+0x171/0x2c0
> > > [42954211.590000] 794878a8: [<60324e6d>] schedule_timeout+0xad/0xf0
> > > [42954211.590000] 794878c8: [<60326e98>] _spin_unlock_irqrestore+0x18/0x20
> > > [42954211.590000] 79487908: [<60195455>] xlog_grant_log_space+0x245/0x470
> > > [42954211.590000] 79487920: [<60030ba0>] default_wake_function+0x0/0x10
> > > [42954211.590000] 79487978: [<601957a2>] xfs_log_reserve+0x122/0x140
> > > [42954211.590000] 794879c8: [<601a36e7>] xfs_trans_reserve+0x147/0x2e0
> > > [42954211.590000] 794879f8: [<60087374>] kmem_cache_alloc+0x84/0x100
> > > [42954211.590000] 79487a38: [<601ab01f>] xfs_inactive_symlink_rmt+0x9f/0x450
> > > [42954211.590000] 79487a88: [<601ada94>] kmem_zone_zalloc+0x34/0x50
> > > [42954211.590000] 79487aa8: [<601a3a6d>] _xfs_trans_alloc+0x2d/0x70
> > ....
> >
> > I came back to the system, and found that the hang had gone away - the
> > rm -rf had finished sometime in the ~36 hours between triggering the
> > problem and coming back to look at the corpse....
> >
> > So nothing to report yet.
>
> Got it now. I can reproduce this in a couple of minutes now that both
> the test fs and the fs hosting the UML fs images are using lazy-count=1
> (and the frequent 10s long host system freezes have gone away, too).
>
> Looks like *another* new memory allocation problem [1]:

[snip]

And having fixed that, I'm now seeing the log reservation hang:

[42950307.350000] xfsdatad/0    D 00000000407219f0     0    51      2
[42950307.350000]  7bd1acd8 7bd1a838 60498c40 81074000 81077b40 60014f0d 81044780 81074000
[42950307.350000]  81074000 7e15f808 7bd1a800 81044780 81077b90 60324bc1 81074000 00000250
[42950307.350000]  81074000 81074000 7fffffffffffffff 6646a168 80b6dd28 80b6ddf8 81077bf0 60324f0d
<6>Call Trace:
[42950307.350000] 81077b08: [<60014f0d>] _switch_to+0x6d/0xe0
[42950307.350000] 81077b48: [<60324bc1>] schedule+0x171/0x2c0
[42950307.350000] 81077b98: [<60324f0d>] schedule_timeout+0xad/0xf0
[42950307.350000] 81077bb8: [<60326f38>] _spin_unlock_irqrestore+0x18/0x20
[42950307.350000] 81077bf8: [<601953e9>] xlog_grant_log_space+0x169/0x470
[42950307.350000] 81077c10: [<60030ba0>] default_wake_function+0x0/0x10
[42950307.350000] 81077c68: [<60195812>] xfs_log_reserve+0x122/0x140
[42950307.350000] 81077cb8: [<601a3757>] xfs_trans_reserve+0x147/0x2e0
[42950307.350000] 81077ce8: [<601adb14>] kmem_zone_zalloc+0x34/0x50
[42950307.350000] 81077d28: [<6018f985>] xfs_iomap_write_unwritten+0xa5/0x2d0
[42950307.350000] 81077d38: [<60326f38>] _spin_unlock_irqrestore+0x18/0x20
[42950307.350000] 81077d48: [<60085750>] cache_free_debugcheck+0x150/0x2e0
[42950307.350000] 81077d50: [<60063d12>] mempool_free_slab+0x12/0x20
[42950307.350000] 81077d88: [<60085e02>] kmem_cache_free+0x72/0xb0
[42950307.350000] 81077dc8: [<60063dbf>] mempool_free+0x4f/0x90
[42950307.350000] 81077e08: [<601af66d>] xfs_end_bio_unwritten+0x6d/0xa0
[42950307.350000] 81077e38: [<60048574>] run_workqueue+0xa4/0x180
[42950307.350000] 81077e50: [<601af600>] xfs_end_bio_unwritten+0x0/0xa0
[42950307.350000] 81077e58: [<6004c791>] prepare_to_wait+0x51/0x80
[42950307.350000] 81077e98: [<600488e0>] worker_thread+0x70/0xd0
[42950307.350000] 81077eb0: [<6004c5b0>] autoremove_wake_function+0x0/0x40
[42950307.350000] 81077ee8: [<60048870>] worker_thread+0x0/0xd0
[42950307.350000] 81077f08: [<6004c204>] kthread+0x64/0xb0
[42950307.350000] 81077f48: [<60026285>] run_kernel_thread+0x35/0x60
[42950307.350000] 81077f58: [<6004c1a0>] kthread+0x0/0xb0
[42950307.350000] 81077f98: [<60026278>] run_kernel_thread+0x28/0x60
[42950307.350000] 81077fc8: [<60014e71>] new_thread_handler+0x71/0xa0

Basically, the log is too small to fit the number of transaction
reservations currently being attempted (roughly 1000 parallel
transactions), so xlog_grant_log_space() is sleeping. Because it is
sleeping in I/O completion, further I/O completion is blocked and the
log tail can never move forward to free up reservation space.

I think that at this point we need a separate workqueue for unwritten
extent conversion to prevent it from blocking normal data and metadata
I/O completion. That way we can allow it to recurse on allocation and
transaction reservation without introducing I/O completion
deadlocks.... Something along the lines of the (untested) sketch below
my sig.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
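
Rough sketch of the idea - completely untested; xfsconvertd, the
io_type check and xfs_ioend_init_workqueues() are just placeholders
hung off the existing ioend/xfsdatad code, not a real patch:

static struct workqueue_struct *xfsdatad_workqueue;	/* existing I/O completion wq */
static struct workqueue_struct *xfsconvertd_workqueue;	/* unwritten conversion only */

STATIC int
xfs_ioend_init_workqueues(void)
{
	xfsdatad_workqueue = create_workqueue("xfsdatad");
	if (!xfsdatad_workqueue)
		return -ENOMEM;

	xfsconvertd_workqueue = create_workqueue("xfsconvertd");
	if (!xfsconvertd_workqueue) {
		destroy_workqueue(xfsdatad_workqueue);
		return -ENOMEM;
	}
	return 0;
}

/*
 * Route unwritten extent conversion to its own workqueue so that it
 * can block on transaction reservation without holding up the rest of
 * I/O completion - and hence the log tail - behind it.
 */
STATIC void
xfs_finish_ioend(
	xfs_ioend_t	*ioend,
	int		wait)
{
	if (atomic_dec_and_test(&ioend->io_remaining)) {
		struct workqueue_struct	*wq;

		wq = (ioend->io_type == IOMAP_UNWRITTEN) ?
				xfsconvertd_workqueue : xfsdatad_workqueue;
		queue_work(wq, &ioend->io_work);
		if (wait)
			flush_workqueue(wq);
	}
}

With that split, xfs_end_bio_unwritten() only ever waits behind other
unwritten conversions; data and metadata I/O completion keeps running
on xfsdatad, so the log tail can keep moving while the conversions
wait for log space.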