From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51])
    by kanga.kvack.org (Postfix) with ESMTP id 399975F0001
    for ; Mon, 2 Feb 2009 06:44:52 -0500 (EST)
Subject: Re: [BUG??] Deadlock between kswapd and sys_inotify_add_watch(lockdep report)
From: Peter Zijlstra
In-Reply-To: <20090202112721.GA13532@barrios-desktop>
References: <20090202101735.GA12757@barrios-desktop>
    <28c262360902020225w6419089ft2dda30da9dfb32a9@mail.gmail.com>
    <1233571202.4787.124.camel@laptop>
    <20090202112721.GA13532@barrios-desktop>
Content-Type: text/plain
Date: Mon, 02 Feb 2009 12:44:45 +0100
Message-Id: <1233575085.4787.140.camel@laptop>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: MinChan Kim
Cc: Nick Piggin, linux kernel, linux mm
List-ID:

On Mon, 2009-02-02 at 20:27 +0900, MinChan Kim wrote:
> On Mon, Feb 02, 2009 at 11:40:02AM +0100, Peter Zijlstra wrote:
> > On Mon, 2009-02-02 at 19:25 +0900, MinChan Kim wrote:
> > > But, I am not sure whether it's real bug or not.
> >
> > Me neither, inode life-times are tricky, but on first sight it looks
> > real enough.
> >
> > > I always suffer from reading lockdep report's result. :(
> > > It would be better to have a document about lockdep report analysis.
> >
> > I've never found them hard to read, so I'm afraid you'll have to be more
> > explicit about what is unclear to you.
>
> It's because not lockdep humble report but my poor knowledge. :(
> Could you elaborate please ?
>
> >[ 331.718120] [ INFO: inconsistent lock state ]
> >[ 331.718124] 2.6.28-rc2-mm1-lockdep #6
> >[ 331.718126] ---------------------------------
> >[ 331.718129] inconsistent {ov-reclaim-W} -> {in-reclaim-W} usage.
>                                         ^                 ^
>                                      write ?           write ?

Correct, we track states for read and write; for single-state locks we
map everything onto the exclusive (write) state.

> >
> >[ 331.718133] kswapd0/218 [HC0[0]:SC0[0]:HE0:SE1] takes:
>                            ^^^^^^^^^^^^^^^^^^^^^^
>                            what means ? HC,SC,HE,SE

Ah, yes, that's a bit obscure, but usually not needed.

  Hardirq Context -- irq state tracking [preempt_count tracking]
  Softirq Context -- idem
  Hardirq Enabled
  Softirq Enabled

It allows you to see whether the irq state tracking matches up, and what
the call context is.

> >
> >[ 331.718136] (&inode->inotify_mutex){--..+.}, at: [] inotify_inode_is_dead+0x20/0x90
> >
>
> Is it related to recursive lock of inotify_mutex ?

Yes.

> but, Subject means 'inconsistent {ov-reclaim-W} -> {in-reclaim-W}',
> IOW, it's related to reclaim of GFP_FS.
> What's relation inotify_mutex and reclaim of GFP_FS?

The lockdep report states the following:

While holding inotify_mutex, we do a __GFP_FS allocation.
But __GFP_FS allocations can end up locking inotify_mutex.

> I think if reclaim context which have GFP_FS already have lock A and then
> do pageout, if writepage need the lock A, we have to catch such a case.
> I thought Nick's patch's goal catches such a case.

Correct, that is exactly what it does.

> One more question is that what's difference between lock inversion and
> circular locking dependency ?

I'm not sure if there's a difference. I suspect they are two ways of
saying the same thing.

> >[ 331.718148] {ov-reclaim-W} state was registered at:
> >[ 331.718150] [] mark_held_locks+0x3e/0x90
> >[ 331.718157] [] lockdep_trace_alloc+0x4e/0x80
> >[ 331.718162] [] kmem_cache_alloc+0x26/0xf0
> >[ 331.718166] [] idr_pre_get+0x50/0x70
> >[ 331.718172] [] inotify_handle_get_wd+0x21/0x60
> >[ 331.718176] [] inotify_add_watch+0x52/0xe0
> >[ 331.718181] [] sys_inotify_add_watch+0x148/0x170
> >[ 331.718185] [] syscall_call+0x7/0xb
> >[ 331.718190] [] 0xffffffff

This bit states that we saw inotify_mutex being held over a __GFP_FS
allocation, which can enter reclaim.

> >[ 331.718205] irq event stamp: 1288446
> >[ 331.718207] hardirqs last enabled at (1288445): [] call_rcu+0x75/0x90
> >[ 331.718213] hardirqs last disabled at (1288446): [] mutex_lock_nested+0x53/0x2f0
> >[ 331.718221] softirqs last enabled at (1284622): [] __do_softirq+0x132/0x180
> >[ 331.718226] softirqs last disabled at (1284617): [] do_softirq+0x89/0x90
> >[ 331.718231]
> >[ 331.718232] other info that might help us debug this:
> >[ 331.718236] 2 locks held by kswapd0/218:
> >[ 331.718238] #0: (shrinker_rwsem){----..}, at: [] shrink_slab+0x25/0x1a0
> >[ 331.718248] #1: (&type->s_umount_key#4){-----.}, at: [] shrink_dcache_memory+0xfb/0x1a0
> >[ 331.718259]
> >[ 331.718260] stack backtrace:
> >[ 331.718263] Pid: 218, comm: kswapd0 Not tainted 2.6.28-rc2-mm1-lockdep #6
> >[ 331.718266] Call Trace:
> >[ 331.718272] [] print_usage_bug+0x176/0x1c0
> >[ 331.718276] [] mark_lock+0xb05/0x10b0
> >[ 331.718282] [] ? __free_pages_ok+0x349/0x450
> >[ 331.718287] [] __lock_acquire+0x602/0xa80
> >[ 331.718291] [] ? validate_chain+0x3ef/0x1050
> >[ 331.718296] [] lock_acquire+0x71/0xa0
> >[ 331.718300] [] ? inotify_inode_is_dead+0x20/0x90
> >[ 331.718305] [] mutex_lock_nested+0x9d/0x2f0
> >[ 331.718310] [] ? inotify_inode_is_dead+0x20/0x90
> >[ 331.718314] [] ? inotify_inode_is_dead+0x20/0x90
> >[ 331.718318] [] inotify_inode_is_dead+0x20/0x90
> >[ 331.718323] [] ? _raw_spin_unlock+0x46/0x80
> >[ 331.718328] [] dentry_iput+0xa4/0xc0
> >[ 331.718333] [] d_kill+0x3b/0x60
> >[ 331.718337] [] __shrink_dcache_sb+0x1c6/0x2c0
> >[ 331.718342] [] shrink_dcache_memory+0x18d/0x1a0
> >[ 331.718347] [] shrink_slab+0x12b/0x1a0
> >[ 331.718351] [] kswapd+0x3af/0x5c0
> >[ 331.718356] [] ? isolate_pages_global+0x0/0x220
> >[ 331.718362] [] ? autoremove_wake_function+0x0/0x40
> >[ 331.718366] [] ? kswapd+0x0/0x5c0
> >[ 331.718371] [] kthread+0x47/0x80
> >[ 331.718375] [] ? kthread+0x0/0x80
> >[ 331.718380] [] kernel_thread_helper+0x7/0x10

This trace shows the current situation, which is reported to violate the
previously registered state. IOW, here we take inotify_mutex during a
__GFP_FS reclaim.
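
To make the two halves of the report concrete, here is a toy userspace
model of the check. It is only a sketch and not how lockdep is
implemented: ToyLockdep, LockClass and alloc_gfp_fs() are names made up
for this illustration, and it models a single task with no read/write
distinction. A lock class remembers which usage states it has been seen
in, and seeing the second of the two conflicting states produces the
"inconsistent" line.

```python
from dataclasses import dataclass, field

# State names borrowed from the report; everything else here is hypothetical.
HELD_OVER_RECLAIM = "ov-reclaim-W"  # lock was held while a __GFP_FS allocation ran
USED_IN_RECLAIM = "in-reclaim-W"    # lock was acquired from within reclaim

# Each state conflicts with the other one.
CONFLICTS = {
    HELD_OVER_RECLAIM: USED_IN_RECLAIM,
    USED_IN_RECLAIM: HELD_OVER_RECLAIM,
}


@dataclass
class LockClass:
    name: str
    usage: set = field(default_factory=set)


class ToyLockdep:
    """Toy model: tracks usage states per lock class for one task."""

    def __init__(self):
        self.held = []           # lock classes currently held
        self.in_reclaim = False  # True while we pretend to run reclaim
        self.reports = []

    def _mark(self, lock, state):
        other = CONFLICTS[state]
        # Report only when the conflicting state was recorded earlier and
        # this state is new for the lock class.
        if other in lock.usage and state not in lock.usage:
            self.reports.append(
                f"inconsistent {{{other}}} -> {{{state}}} usage: {lock.name}"
            )
        lock.usage.add(state)

    def acquire(self, lock):
        if self.in_reclaim:
            self._mark(lock, USED_IN_RECLAIM)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)

    def alloc_gfp_fs(self):
        # An allocation that may enter FS reclaim: every lock held right
        # now is marked as "held over reclaim".
        if self.in_reclaim:
            return
        for lock in self.held:
            self._mark(lock, HELD_OVER_RECLAIM)


dep = ToyLockdep()
inotify_mutex = LockClass("&inode->inotify_mutex")

# First trace: sys_inotify_add_watch takes the mutex, then allocates.
dep.acquire(inotify_mutex)
dep.alloc_gfp_fs()
dep.release(inotify_mutex)

# Second trace: kswapd reclaim reaches inotify_inode_is_dead(),
# which takes the same mutex.
dep.in_reclaim = True
dep.acquire(inotify_mutex)
dep.release(inotify_mutex)
dep.in_reclaim = False

for line in dep.reports:
    print(line)
```

Neither path deadlocks by itself in this model. The complaint comes
purely from having recorded both states for the same lock class, which
is why the report can show up without an actual hang.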