Re: Xfs lockdep warning with for-dave-for-4.6 branch

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Qu Wenruo <quwenruo@cn.fujitsu.com>,
	xfs@oss.sgi.com, linux-mm@kvack.org,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Date: Wed, 18 May 2016 09:20:05 +0200	[thread overview]
Message-ID: <20160518072005.GA3193@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160517223549.GV26977@dastard>

On Wed, May 18, 2016 at 08:35:49AM +1000, Dave Chinner wrote:
> On Tue, May 17, 2016 at 04:49:12PM +0200, Peter Zijlstra wrote:
> > 
> > Thanks for writing all that down Dave!
> > 
> > On Tue, May 17, 2016 at 09:10:56AM +1000, Dave Chinner wrote:
> > 
> > > The reason we don't have lock clases for the ilock is that we aren't
> > > supposed to call memory reclaim with that lock held in exclusive
> > > mode. This is because reclaim can run transactions, and that may
> > > need to flush dirty inodes to make progress. Flushing dirty inode
> > > requires taking the ilock in shared mode.
> > > 
> > > In the code path that was reported, we hold the ilock in /shared/
> > > mode with no transaction context (we are doing a read-only
> > > operation). This means we can run transactions in memory reclaim
> > > because a) we can't deadlock on the inode we hold locks on, and b)
> > > transaction reservations will be able to make progress as we don't
> > > hold any locks it can block on.
> > 
> > Just to clarify; I read the above as that we cannot block on recursive
> > shared locks, is this correct?
> > 
> > Because we can in fact block on down_read()+down_read() just fine, so if
> > you're assuming that, then something's busted.
> 
> The transaction reservation path will run down_read_trylock() on the
> inode, not down_read(). Hence if there are no pending writers, it
> will happily take the lock twice and make progress, otherwise it
> will skip the inode and there's no deadlock.  If there's a pending
> writer, then we have another context that is already in a
> transaction context and has already pushed the item, hence it is
> only in the scope of the current push because IO hasn't completed
> yet and removed it from the list.
> 
> > Otherwise, I'm not quite reading it right, which is, given the
> > complexity of that stuff, entirely possible.
> 
> There's a maze of dark, grue-filled twisty passages here...

I'm sure. Thanks for the extra detail; yes the down_read_trylock() thing
will work.

> > In any case; would something like this work for you? Its entirely
> > untested, but the idea is to mark an entire class to skip reclaim
> > validation, instead of marking individual sites.
> 
> Probably would, but it seems like swatting a fly with runaway
> train. I'd much prefer a per-site annotation (e.g. as a GFP_ flag)
> so that we don't turn off something that will tell us we've made a
> mistake while developing new code...

Fair enough; if the mm folks don't object to 'wasting' a GFP flag on
this the below ought to do I think.

---
 include/linux/gfp.h      | 9 ++++++++-
 kernel/locking/lockdep.c | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..7b5b9db4c821 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -41,6 +41,7 @@ struct vm_area_struct;
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
 #define ___GFP_KSWAPD_RECLAIM	0x2000000u
+#define ___GFP_NOVALIDATE	0x4000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -146,6 +147,11 @@ struct vm_area_struct;
  *   return NULL when direct reclaim and memory compaction have failed to allow
  *   the allocation to succeed.  The OOM killer is not called with the current
  *   implementation.
+ *
+ * __GFP_NOVALIDATE: lockdep annotation; do not use this allocation to set
+ *   LOCK_USED_IN_RECLAIM_FS and thereby avoid possible false positives.
+ *   Use of this flag should have a comment explaining the false positive
+ *   in detail.
  */
 #define __GFP_IO	((__force gfp_t)___GFP_IO)
 #define __GFP_FS	((__force gfp_t)___GFP_FS)
@@ -155,6 +161,7 @@ struct vm_area_struct;
 #define __GFP_REPEAT	((__force gfp_t)___GFP_REPEAT)
 #define __GFP_NOFAIL	((__force gfp_t)___GFP_NOFAIL)
 #define __GFP_NORETRY	((__force gfp_t)___GFP_NORETRY)
+#define __GFP_NOVALIDATE ((__force gfp_t)___GFP_NOVALIDATE)
 
 /*
  * Action modifiers
@@ -188,7 +195,7 @@ struct vm_area_struct;
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 27
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81f1a7107c0e..88bf43c3bf76 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2853,6 +2853,9 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
 	if (unlikely(!debug_locks))
 		return;
 
+	if (gfp_mask & __GFP_NOVALIDATE)
+		return;
+
 	/* no reclaim without waiting on it */
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
 		return;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2016-05-18  7:20 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <94cea603-2782-1c5a-e2df-42db4459a8ce@cn.fujitsu.com>
     [not found] ` <20160512055756.GE6648@birch.djwong.org>
     [not found]   ` <20160512080321.GA18496@dastard>
2016-05-13 16:03     ` Michal Hocko
2016-05-16 10:41       ` Peter Zijlstra
2016-05-16 13:05         ` Michal Hocko
2016-05-16 13:25           ` Peter Zijlstra
2016-05-16 23:10             ` Dave Chinner
2016-05-17 14:49               ` Peter Zijlstra
2016-05-17 22:35                 ` Dave Chinner
2016-05-18  7:20                   ` Peter Zijlstra [this message]
2016-05-18  8:25                     ` Michal Hocko
2016-05-18  9:49                       ` Peter Zijlstra
2016-05-18 11:31                         ` Michal Hocko
2016-05-19  8:11                   ` Peter Zijlstra
2016-05-20  0:17                     ` Dave Chinner
2016-06-01 13:17                       ` Michal Hocko
2016-06-01 18:16                         ` Peter Zijlstra
2016-06-02 14:50                           ` Michal Hocko
2016-06-02 15:11                             ` Peter Zijlstra
2016-06-02 15:46                               ` Michal Hocko
2016-06-02 23:22                                 ` Dave Chinner
2016-06-06 12:20                                   ` Michal Hocko
2016-06-15  7:21                                     ` Dave Chinner
2016-06-21 14:26                                       ` Michal Hocko
2016-06-22  1:03                                         ` Dave Chinner
2016-06-22 12:38                                           ` Michal Hocko
2016-06-22 22:58                                             ` Dave Chinner
2016-06-23 11:35                                               ` Michal Hocko
2016-10-06 13:04                           ` Michal Hocko
2016-10-17 13:49                             ` Michal Hocko
2016-10-19  0:33                             ` Dave Chinner
2016-10-19  5:30                               ` Dave Chinner
2016-10-19  8:33                             ` Peter Zijlstra
2016-10-19 12:06                               ` Michal Hocko
2016-10-19 21:49                                 ` Dave Chinner
2016-10-20  7:15                                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160518072005.GA3193@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox