Date: Wed, 19 Oct 2016 10:33:04 +0200
From: Peter Zijlstra
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Message-ID: <20161019083304.GD3102@twins.programming.kicks-ass.net>
References: <20160516130519.GJ23146@dhcp22.suse.cz>
 <20160516132541.GP3193@twins.programming.kicks-ass.net>
 <20160516231056.GE18496@dastard>
 <20160517144912.GZ3193@twins.programming.kicks-ass.net>
 <20160517223549.GV26977@dastard>
 <20160519081146.GS3193@twins.programming.kicks-ass.net>
 <20160520001714.GC26977@dastard>
 <20160601131758.GO26601@dhcp22.suse.cz>
 <20160601181617.GV3190@twins.programming.kicks-ass.net>
 <20161006130454.GI10570@dhcp22.suse.cz>
In-Reply-To: <20161006130454.GI10570@dhcp22.suse.cz>
To: Michal Hocko
Cc: Dave Chinner, "Darrick J. Wong", Qu Wenruo, xfs@oss.sgi.com, linux-mm@kvack.org, Ingo Molnar

On Thu, Oct 06, 2016 at 03:04:54PM +0200, Michal Hocko wrote:
> From 04b3923e5b12f0eb3859f0718881fa0f40e60164 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Fri, 13 May 2016 17:47:31 +0200
> Subject: [PATCH] lockdep: allow to disable reclaim lockup detection
>
> The current implementation of the reclaim lockup detection can lead to
> false positives. Those do happen, and they usually lead to tweaking the
> code to silence lockdep by using GFP_NOFS even though the context
> can use __GFP_FS just fine. See
> http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example.
>
> =================================
> [ INFO: inconsistent lock state ]
> 4.5.0-rc2+ #4 Tainted: G O
> ---------------------------------
> inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
> kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
>
> (&xfs_nondir_ilock_class){++++-+}, at: [] xfs_ilock+0x177/0x200 [xfs]
>
> {RECLAIM_FS-ON-R} state was registered at:
> [] mark_held_locks+0x79/0xa0
> [] lockdep_trace_alloc+0xb3/0x100
> [] kmem_cache_alloc+0x33/0x230
> [] kmem_zone_alloc+0x81/0x120 [xfs]
> [] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
> [] __xfs_refcount_find_shared+0x75/0x580 [xfs]
> [] xfs_refcount_find_shared+0x84/0xb0 [xfs]
> [] xfs_getbmap+0x608/0x8c0 [xfs]
> [] xfs_vn_fiemap+0xab/0xc0 [xfs]
> [] do_vfs_ioctl+0x498/0x670
> [] SyS_ioctl+0x79/0x90
> [] entry_SYSCALL_64_fastpath+0x12/0x6f
>
>        CPU0
>        ----
>   lock(&xfs_nondir_ilock_class);
>   <Interrupt>
>     lock(&xfs_nondir_ilock_class);
>
>  *** DEADLOCK ***
>
> 3 locks held by kswapd0/543:
>
> stack backtrace:
> CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4
>
> Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>
> ffffffff82a34f10 ffff88003aa078d0 ffffffff813a14f9 ffff88003d8551c0
> ffff88003aa07920 ffffffff8110ec65 0000000000000000 0000000000000001
> ffff880000000001 000000000000000b 0000000000000008 ffff88003d855aa0
> Call Trace:
> [] dump_stack+0x4b/0x72
> [] print_usage_bug+0x215/0x240
> [] mark_lock+0x1f5/0x660
> [] ? print_shortest_lock_dependencies+0x1a0/0x1a0
> [] __lock_acquire+0xa80/0x1e50
> [] ? kmem_cache_alloc+0x15e/0x230
> [] ? kmem_zone_alloc+0x81/0x120 [xfs]
> [] lock_acquire+0xd8/0x1e0
> [] ? xfs_ilock+0x177/0x200 [xfs]
> [] ? xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
> [] down_write_nested+0x5e/0xc0
> [] ? xfs_ilock+0x177/0x200 [xfs]
> [] xfs_ilock+0x177/0x200 [xfs]
> [] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
> [] xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
> [] evict+0xc5/0x190
> [] dispose_list+0x39/0x60
> [] prune_icache_sb+0x4b/0x60
> [] super_cache_scan+0x14f/0x1a0
> [] shrink_slab.part.63.constprop.79+0x1e9/0x4e0
> [] shrink_zone+0x15e/0x170
> [] kswapd+0x4f1/0xa80
> [] ? zone_reclaim+0x230/0x230
> [] kthread+0xf2/0x110
> [] ? kthread_create_on_node+0x220/0x220
> [] ret_from_fork+0x3f/0x70
> [] ? kthread_create_on_node+0x220/0x220
>
> To quote Dave:
> "
> Ignoring whether reflink should be doing anything or not, that's a
> "xfs_refcountbt_init_cursor() gets called both outside and inside
> transactions" lockdep false positive case. The problem here is
> lockdep has seen this allocation from within a transaction, hence a
> GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context.
> Also note that we have an active reference to this inode.
>
> So, because the reclaim annotations overload the interrupt level
> detections and it's seen the inode ilock being taken in reclaim
> ("interrupt") context, this triggers a reclaim context warning where
> it thinks it is unsafe to do this allocation in GFP_KERNEL context
> holding the inode ilock...
> "
>
> This sounds like a fundamental problem of the reclaim lock detection.
> It is really impossible to annotate such a special use case IMHO unless
> the reclaim lockup detection is reworked completely. Until then it
> is much better to provide an "I know what I am doing" flag and use it
> to mark problematic places.
> This would prevent abusing the GFP_NOFS flag,
> which has a runtime effect even on configurations that have
> lockdep disabled.
>
> Introduce a __GFP_NOLOCKDEP flag, which tells the lockdep gfp tracking to
> skip the current allocation request.
>
> While we are at it, also make sure that the radix tree doesn't
> accidentally override the tags stored in the upper part of the gfp_mask.
>
> Suggested-by: Peter Zijlstra
> Signed-off-by: Michal Hocko

So I'm all for this if this works for Dave.

Acked-by: Peter Zijlstra (Intel)

Please take it through the XFS tree, which would introduce its first
user, etc.
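The reclaim tracking argued about in this thread can be illustrated with a small userspace model. This is a simplified sketch under stated assumptions, not lockdep's code or the patch itself: the SKETCH_GFP_* bit values, struct lock_class_model, trace_alloc() and acquire_in_reclaim() are hypothetical names invented here; each lock class carries only two usage bits (loosely matching RECLAIM_FS-ON and IN-RECLAIM_FS, ignoring the read/write distinction); and an allocation marks only a single held lock, whereas lockdep marks every held lock. It shows why a __GFP_FS allocation under a lock that reclaim also takes gets reported, and how an opt-out bit in the spirit of __GFP_NOLOCKDEP skips only the bookkeeping, unlike GFP_NOFS, which also changes what the allocation may do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical flag bits for this sketch only; they are NOT the
 * kernel's gfp values.
 */
#define SKETCH_GFP_FS        (1u << 0) /* may recurse into fs reclaim, like __GFP_FS */
#define SKETCH_GFP_NOLOCKDEP (1u << 1) /* skip tracking, in the spirit of __GFP_NOLOCKDEP */

/*
 * Two usage bits per lock class, loosely modelled on lockdep's
 * RECLAIM_FS-ON and IN-RECLAIM_FS states (read/write ignored).
 */
struct lock_class_model {
	const char *name;
	bool held_over_fs_alloc; /* RECLAIM_FS-ON: held across an fs-reclaim-capable allocation */
	bool taken_in_reclaim;   /* IN-RECLAIM_FS: acquired from reclaim context */
};

/* A class seen in both states could deadlock if reclaim recursed into it. */
bool usage_inconsistent(const struct lock_class_model *c)
{
	return c->held_over_fs_alloc && c->taken_in_reclaim;
}

/*
 * Record an allocation made while holding `held` (this model tracks a
 * single held lock). Returns true if a report would be printed.
 */
bool trace_alloc(struct lock_class_model *held, unsigned int gfp)
{
	assert(held != NULL);
	if (!(gfp & SKETCH_GFP_FS))
		return false; /* NOFS-style allocation: cannot enter fs reclaim */
	if (gfp & SKETCH_GFP_NOLOCKDEP)
		return false; /* caller opted out: skip the bookkeeping only */
	held->held_over_fs_alloc = true;
	return usage_inconsistent(held);
}

/* Record that reclaim (e.g. kswapd) acquired a lock of class `c`. */
bool acquire_in_reclaim(struct lock_class_model *c)
{
	assert(c != NULL);
	c->taken_in_reclaim = true;
	return usage_inconsistent(c);
}
```

Mirroring the splat quoted above: calling trace_alloc() with SKETCH_GFP_FS on a class named after xfs_nondir_ilock_class returns false, and a later acquire_in_reclaim() on the same class returns true, the model's version of the {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} report. Passing SKETCH_GFP_FS | SKETCH_GFP_NOLOCKDEP instead leaves held_over_fs_alloc unset, so nothing is reported.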