From: Michal Hocko <mhocko@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>,
Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
Qu Wenruo <quwenruo@cn.fujitsu.com>,
xfs@oss.sgi.com, linux-mm@kvack.org,
Ingo Molnar <mingo@kernel.org>
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Date: Wed, 1 Jun 2016 15:17:58 +0200 [thread overview]
Message-ID: <20160601131758.GO26601@dhcp22.suse.cz> (raw)
In-Reply-To: <20160520001714.GC26977@dastard>
Thanks Dave for your detailed explanation again! Peter do you have any
other idea how to deal with these situations other than opt out from
lockdep reclaim machinery?
If not I would rather go with an annotation than a gfp flag to be honest
but if you absolutely hate that approach then I will try to check wheter
a CONFIG_LOCKDEP GFP_FOO doesn't break something else. Otherwise I would
steal the description from Dave's email and repost my patch.
I plan to repost my scope gfp patches in few days and it would be good
to have some mechanism to drop those GFP_NOFS to paper over lockdep
false positives for that.
[keeping Dave's explanation for reference]
On Fri 20-05-16 10:17:14, Dave Chinner wrote:
> On Thu, May 19, 2016 at 10:11:46AM +0200, Peter Zijlstra wrote:
> > On Wed, May 18, 2016 at 08:35:49AM +1000, Dave Chinner wrote:
[...]
> > > There's a maze of dark, grue-filled twisty passages here...
> >
> > OK; I might need a bit more again.
> >
> > So now the code does something like:
> >
> > down_read(&i_lock); -- lockdep marks lock as held
> > kmalloc(GFP_KERNEL); -- lockdep marks held locks as ENABLED_RECLAIM_FS
> > --> reclaim()
> > down_read_trylock(&i_lock); -- lockdep does _NOT_ mark as USED_IN_RECLAIM_FS
> >
> > Right?
>
> In the path that can deadlock the log, yes. It's actually way more
> complex than the above, because the down_read_trylock(&i_lock) that
> matters is run in a completely separate, async kthread that
> xfs_trans_reserve() will block waiting for.
>
> process context xfsaild kthread(*)
> --------------- ------------------
> down_read(&i_lock); -- lockdep marks lock as held
> kmalloc(GFP_KERNEL); -- lockdep marks held locks as ENABLED_RECLAIM_FS
> --> reclaim()
> xfs_trans_reserve()
> ....
> xfs_trans_push_ail() ---- called if no space in the log to kick the xfsaild into action
> ....
> xlog_grant_head_wait() ---- blocks waiting for log space
> .....
>
> xfsaild_push() ----- iterates AIL
> grabs log item
> lock log item
> >>>>>>>>>>>>>>>>>>>>> down_read_trylock(&i_lock);
> format item into buffer
> add to dirty buffer list
> ....
> submit dirty buffer list for IO
> buffer IO started
> .....
> <async IO completion context>
> buffer callbacks
> mark inode clean
> remove inode from AIL
> move tail of log forwards
> wake grant head waiters
> <woken by log tail moving>
> <log space available>
> transaction reservation granted
> .....
> down_write(some other inode ilock)
> <modify some other inode>
> xfs_trans_commit
> .....
>
> (*) xfsaild runs with PF_MEMALLOC context.
>
> The problem is that if the ilock is held exclusively at GFP_KERNEL
> time, the xfsaild cannot lock the inode to flush it, so if that
> inode pins the tail of the log then we can't make space available
> for xfs_trans_reserve and there is the deadlock.
>
> Once xfs_trans_reserve completes, however, we'll take the ilock on
> *some other inode*, and that's where the "it can't be the inode we
> currently hold locked because we have references to it" and
> henceit's safe to have a pattern like:
>
> down_read(&i_lock); -- lockdep marks lock as held
> kmalloc(GFP_KERNEL); -- lockdep marks held locks as ENABLED_RECLAIM_FS
> --> reclaim()
> down_write(&ilock)
>
> because the lock within reclaim context is completely unrelated to
> the lock we already hold.
>
> Lockdep can't possibly know about this because the deadlock involves
> locking contexts that *aren't doing anything wrong within their own
> contexts*. It's only when you add the dependency of log space
> reservation requirements needed to make forwards progress that
> there's then an issue with locking and reclaim.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-06-01 13:18 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <94cea603-2782-1c5a-e2df-42db4459a8ce@cn.fujitsu.com>
[not found] ` <20160512055756.GE6648@birch.djwong.org>
[not found] ` <20160512080321.GA18496@dastard>
2016-05-13 16:03 ` Michal Hocko
2016-05-16 10:41 ` Peter Zijlstra
2016-05-16 13:05 ` Michal Hocko
2016-05-16 13:25 ` Peter Zijlstra
2016-05-16 23:10 ` Dave Chinner
2016-05-17 14:49 ` Peter Zijlstra
2016-05-17 22:35 ` Dave Chinner
2016-05-18 7:20 ` Peter Zijlstra
2016-05-18 8:25 ` Michal Hocko
2016-05-18 9:49 ` Peter Zijlstra
2016-05-18 11:31 ` Michal Hocko
2016-05-19 8:11 ` Peter Zijlstra
2016-05-20 0:17 ` Dave Chinner
2016-06-01 13:17 ` Michal Hocko [this message]
2016-06-01 18:16 ` Peter Zijlstra
2016-06-02 14:50 ` Michal Hocko
2016-06-02 15:11 ` Peter Zijlstra
2016-06-02 15:46 ` Michal Hocko
2016-06-02 23:22 ` Dave Chinner
2016-06-06 12:20 ` Michal Hocko
2016-06-15 7:21 ` Dave Chinner
2016-06-21 14:26 ` Michal Hocko
2016-06-22 1:03 ` Dave Chinner
2016-06-22 12:38 ` Michal Hocko
2016-06-22 22:58 ` Dave Chinner
2016-06-23 11:35 ` Michal Hocko
2016-10-06 13:04 ` Michal Hocko
2016-10-17 13:49 ` Michal Hocko
2016-10-19 0:33 ` Dave Chinner
2016-10-19 5:30 ` Dave Chinner
2016-10-19 8:33 ` Peter Zijlstra
2016-10-19 12:06 ` Michal Hocko
2016-10-19 21:49 ` Dave Chinner
2016-10-20 7:15 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160601131758.GO26601@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-mm@kvack.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=quwenruo@cn.fujitsu.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox