From: Michal Hocko <mhocko@kernel.org>
To: NeilBrown <mr@neil.brown.name>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
Chris Mason <clm@fb.com>, Jan Kara <jack@suse.cz>,
ceph-devel@vger.kernel.org, cluster-devel@redhat.com,
linux-nfs@vger.kernel.org, logfs@logfs.org, xfs@oss.sgi.com,
linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-mtd@lists.infradead.org, reiserfs-devel@vger.kernel.org,
linux-ntfs-dev@lists.sourceforge.net,
linux-f2fs-devel@lists.sourceforge.net,
linux-afs@lists.infradead.org,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] scop GFP_NOFS api
Date: Fri, 29 Apr 2016 14:04:18 +0200 [thread overview]
Message-ID: <20160429120418.GK21977@dhcp22.suse.cz> (raw)
In-Reply-To: <8737q5ugcx.fsf@notabene.neil.brown.name>
On Fri 29-04-16 15:35:42, NeilBrown wrote:
> On Tue, Apr 26 2016, Michal Hocko wrote:
>
> > Hi,
> > we have discussed this topic at LSF/MM this year. There was a general
> > interest in the scope GFP_NOFS allocation context among some FS
> > developers. For those who are not aware of the discussion or the issue
> > I am trying to sort out (or at least start in that direction) please
> > have a look at patch 1 which adds memalloc_nofs_{save,restore} api
> > which basically copies what we have for the scope GFP_NOIO allocation
> > context. I haven't converted any of the FS myself because that is way
> > beyond my area of expertise but I would be happy to help with further
> > changes on the MM front as well as in some more generic code paths.
> >
> > Dave had an idea on how to further improve the reclaim context to be
> > less all-or-nothing wrt. GFP_NOFS. In short he was suggesting an opaque
> > and FS specific cookie set in the FS allocation context and consumed
> > by the FS reclaim context to allow doing some provably save actions
> > that would be skipped due to GFP_NOFS normally. I like this idea and
> > I believe we can go that direction regardless of the approach taken here.
> > Many filesystems simply need to cleanup their NOFS usage first before
> > diving into a more complex changes.>
>
> This strikes me as over-engineering to work around an unnecessarily
> burdensome interface.... but without details it is hard to be certain.
>
> Exactly what things happen in "FS reclaim context" which may, or may
> not, be safe depending on the specific FS allocation context? Do they
> need to happen at all?
Let me quote Dave Chinner from one of the emails discussed at LSFMM
mailing list:
: IMO, making GFP_NOFS "better" cannot be done with context-less flags
: being passed through reclaim. If we want to prevent the recursive
: self-deadlock case in an optimal manner, then we need to be able to
: pass state down to reclaim so that page writeback and the shrinkers
: can determine if they are likely to deadlock.
:
: IOWs, I think we should stop thinking of GFP_NOFS as a *global*
: directive to avoid recursion under any circumstance and instead
: start thinking about it as a mechanism to avoid recursion in
: specific reclaim contexts.
:
: Something as simple as adding an opaque cookie (e.g. can hold a
: superblock or inode pointer) to check against in writeback and
: subsystem shrinkers would result in the vast majority of GFP_NOFS
: contexts being able to reclaim from everything but the one context
: that we *might* deadlock against.
:
: e.g, if we then also check the PF_FSTRANS flag in XFS, we'll
: still be able to reclaim clean inodes, buffers and write back
: dirty pages that don't require transactions to complete under "don't
: recurse" situations because we know it's transactions that we could
: deadlock on in the direct reclaim context.
:
: Note that this information could be added to the writeback_control
: for page writeback, and it could be passed directly to shrinkers
: in the shrink_control structures. The allocation paths might be a
: little harder, but I suspect using the task struct for passing this
: information into direct reclaim might be the easiest approach...
> My research suggests that for most filesystems the only thing that
> happens in reclaim context that is at all troublesome is the final
> 'evict()' on an inode. This needs to flush out dirty pages and sync the
> inode to storage. Some time ago we moved most dirty-page writeout out
> of the reclaim context and into kswapd. I think this was an excellent
> advance in simplicity.
> If we could similarly move evict() into kswapd (and I believe we can)
> then most file systems would do nothing in reclaim context that
> interferes with allocation context.
>
> The exceptions include:
> - nfs and any filesystem using fscache can block for up to 1 second
> in ->releasepage(). They used to block waiting for some IO, but that
> caused deadlocks and wasn't really needed. I left the timeout because
> it seemed likely that some throttling would help. I suspect that a
> careful analysis will show that there is sufficient throttling
> elsewhere.
>
> - xfs_qm_shrink_scan is nearly unique among shrinkers in that it waits
> for IO so it can free some quotainfo things. If it could be changed
> to just schedule the IO without waiting for it then I think this
> would be safe to be called in any FS allocation context. It already
> uses a 'trylock' in xfs_dqlock_nowait() to avoid deadlocking
> if the lock is held.
>
> I think you/we would end up with a much simpler system if instead of
> focussing on the places where GFP_NOFS is used, we focus on places where
> __GFP_FS is tested, and try to remove them.
One think I have learned is that shrinkers can be really complex and
getting rid of GFP_NOFS will be really hard so I would really like to
start the easiest way possible and remove the direct usage and replace
it by scope one which would at least _explain_ why it is needed. I think
this is a reasonable _first_ step and a large step ahead because we have
a good chance to get rid of a large number of those which were used
"just because I wasn't sure and this should be safe, right?". I wouldn't
be surprised if we end up with a very small number of both scope and
direct usage in the end.
I would also like to revisit generic inode/dentry shrinker and see
whether it could be more __GFP_FS friendly. As you say many FS might
even not depend on some FS internal locks so pushing GFP_FS check down
the layers might make a lot of sense and allow to clean some [id]cache
even for __GFP_FS context.
> If we get rid of enough of them the remainder could just use __GFP_IO.
>
> > The patch 2 is a debugging aid which warns about explicit allocation
> > requests from the scope context. This is should help to reduce the
> > direct usage of the NOFS flags to bare minimum in favor of the scope
> > API. It is not aimed to be merged upstream. I would hope Andrew took it
> > into mmotm tree to give it linux-next exposure and allow developers to
> > do further cleanups. There is a new kernel command line parameter which
> > has to be used for the debugging to be enabled.
> >
> > I think the GFP_NOIO should be seeing the same clean up.
>
> I think you are suggesting that use of GFP_NOIO should (largely) be
> deprecated in favour of memalloc_noio_save(). I think I agree.
Yes that was the idea.
> Could we go a step further and deprecate GFP_ATOMIC in favour of some
> in_atomic() test? Maybe that is going too far.
I am not really sure we need that and some GFP_NOWAIT usage is deliberate
to perform an optimistic allocation with another fallback (e.g. higher order
for performance reasons with single page fallback). So I think that nowait
is a slightly different thing.
Thanks!
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-04-29 12:04 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-26 11:56 Michal Hocko
2016-04-26 11:56 ` [PATCH 1/2] mm: add PF_MEMALLOC_NOFS Michal Hocko
2016-04-26 23:07 ` Dave Chinner
2016-04-27 7:51 ` Michal Hocko
2016-04-27 10:53 ` Tetsuo Handa
2016-04-27 11:15 ` Michal Hocko
2016-04-27 14:44 ` Tetsuo Handa
2016-04-27 20:05 ` Michal Hocko
2016-04-27 11:54 ` [PATCH 1.1/2] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Michal Hocko
2016-04-27 11:54 ` [PATCH 1.2/2] mm: introduce memalloc_nofs_{save,restore} API Michal Hocko
2016-04-27 13:07 ` Michal Hocko
2016-04-27 20:09 ` Michal Hocko
2016-04-27 20:30 ` Michal Hocko
2016-04-27 21:14 ` Michal Hocko
2016-04-27 17:41 ` [PATCH 1.1/2] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Andreas Dilger
2016-04-27 19:43 ` Michal Hocko
2016-04-26 11:56 ` [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context Michal Hocko
2016-04-26 22:58 ` Dave Chinner
2016-04-27 8:03 ` Michal Hocko
2016-04-27 22:55 ` Dave Chinner
2016-04-28 8:17 ` Michal Hocko
2016-04-28 21:51 ` Dave Chinner
2016-04-29 12:12 ` Michal Hocko
2016-04-29 23:40 ` Dave Chinner
2016-05-03 15:38 ` Michal Hocko
2016-05-04 0:07 ` Dave Chinner
2016-04-29 5:35 ` [PATCH 0/2] scop GFP_NOFS api NeilBrown
2016-04-29 10:20 ` [Cluster-devel] " Steven Whitehouse
2016-04-30 21:17 ` NeilBrown
2016-04-29 12:04 ` Michal Hocko [this message]
2016-04-30 0:24 ` Dave Chinner
2016-04-30 21:55 ` NeilBrown
2016-05-03 15:13 ` Michal Hocko
2016-05-03 23:26 ` NeilBrown
2016-04-30 0:11 ` Dave Chinner
2016-04-30 22:19 ` NeilBrown
2016-05-04 1:00 ` Dave Chinner
2016-05-06 3:20 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160429120418.GK21977@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=ceph-devel@vger.kernel.org \
--cc=clm@fb.com \
--cc=cluster-devel@redhat.com \
--cc=david@fromorbit.com \
--cc=jack@suse.cz \
--cc=linux-afs@lists.infradead.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-mtd@lists.infradead.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-ntfs-dev@lists.sourceforge.net \
--cc=logfs@logfs.org \
--cc=mr@neil.brown.name \
--cc=reiserfs-devel@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox