linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: NeilBrown <mr@neil.brown.name>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
	Chris Mason <clm@fb.com>, Jan Kara <jack@suse.cz>,
	ceph-devel@vger.kernel.org, cluster-devel@redhat.com,
	linux-nfs@vger.kernel.org, logfs@logfs.org, xfs@oss.sgi.com,
	linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-mtd@lists.infradead.org, reiserfs-devel@vger.kernel.org,
	linux-ntfs-dev@lists.sourceforge.net,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-afs@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] scop GFP_NOFS api
Date: Fri, 29 Apr 2016 14:04:18 +0200	[thread overview]
Message-ID: <20160429120418.GK21977@dhcp22.suse.cz> (raw)
In-Reply-To: <8737q5ugcx.fsf@notabene.neil.brown.name>

On Fri 29-04-16 15:35:42, NeilBrown wrote:
> On Tue, Apr 26 2016, Michal Hocko wrote:
> 
> > Hi,
> > we have discussed this topic at LSF/MM this year. There was a general
> > interest in the scope GFP_NOFS allocation context among some FS
> > developers. For those who are not aware of the discussion or the issue
> > I am trying to sort out (or at least start in that direction) please
> > have a look at patch 1 which adds memalloc_nofs_{save,restore} api
> > which basically copies what we have for the scope GFP_NOIO allocation
> > context. I haven't converted any of the FS myself because that is way
> > beyond my area of expertise but I would be happy to help with further
> > changes on the MM front as well as in some more generic code paths.
> >
> > Dave had an idea on how to further improve the reclaim context to be
> > less all-or-nothing wrt. GFP_NOFS. In short he was suggesting an opaque
> > and FS specific cookie set in the FS allocation context and consumed
> > by the FS reclaim context to allow doing some provably save actions
> > that would be skipped due to GFP_NOFS normally.  I like this idea and
> > I believe we can go that direction regardless of the approach taken here.
> > Many filesystems simply need to cleanup their NOFS usage first before
> > diving into a more complex changes.>
> 
> This strikes me as over-engineering to work around an unnecessarily
> burdensome interface.... but without details it is hard to be certain.
>
> Exactly what things happen in "FS reclaim context" which may, or may
> not, be safe depending on the specific FS allocation context?  Do they
> need to happen at all?

Let me quote Dave Chinner from one of the emails discussed at LSFMM
mailing list:
: IMO, making GFP_NOFS "better" cannot be done with context-less flags
: being passed through reclaim. If we want to prevent the recursive
: self-deadlock case in an optimal manner, then we need to be able to
: pass state down to reclaim so that page writeback and the shrinkers
: can determine if they are likely to deadlock.
: 
: IOWs, I think we should stop thinking of GFP_NOFS as a *global*
: directive to avoid recursion under any circumstance and instead
: start thinking about it as a mechanism to avoid recursion in
: specific reclaim contexts.
: 
: Something as simple as adding an opaque cookie (e.g. can hold a
: superblock or inode pointer) to check against in writeback and
: subsystem shrinkers would result in the vast majority of GFP_NOFS
: contexts being able to reclaim from everything but the one context
: that we *might* deadlock against.
: 
: e.g, if we then also check the PF_FSTRANS flag in XFS, we'll
: still be able to reclaim clean inodes, buffers and write back
: dirty pages that don't require transactions to complete under "don't
: recurse" situations because we know it's transactions that we could
: deadlock on in the direct reclaim context.
: 
: Note that this information could be added to the writeback_control
: for page writeback, and it could be passed directly to shrinkers
: in the shrink_control structures. The allocation paths might be a
: little harder, but I suspect using the task struct for passing this
: information into direct reclaim might be the easiest approach...

> My research suggests that for most filesystems the only thing that
> happens in reclaim context that is at all troublesome is the final
> 'evict()' on an inode.  This needs to flush out dirty pages and sync the
> inode to storage.  Some time ago we moved most dirty-page writeout out
> of the reclaim context and into kswapd.  I think this was an excellent
> advance in simplicity.
> If we could similarly move evict() into kswapd (and I believe we can)
> then most file systems would do nothing in reclaim context that
> interferes with allocation context.
> 
> The exceptions include:
>  - nfs and any filesystem using fscache can block for up to 1 second
>    in ->releasepage().  They used to block waiting for some IO, but that
>    caused deadlocks and wasn't really needed.  I left the timeout because
>    it seemed likely that some throttling would help.  I suspect that a
>    careful analysis will show that there is sufficient throttling
>    elsewhere.
> 
>  - xfs_qm_shrink_scan is nearly unique among shrinkers in that it waits
>    for IO so it can free some quotainfo things.  If it could be changed
>    to just schedule the IO without waiting for it then I think this
>    would be safe to be called in any FS allocation context.  It already
>    uses a 'trylock' in xfs_dqlock_nowait() to avoid deadlocking
>    if the lock is held.
> 
> I think you/we would end up with a much simpler system if instead of
> focussing on the places where GFP_NOFS is used, we focus on places where
> __GFP_FS is tested, and try to remove them.

One think I have learned is that shrinkers can be really complex and
getting rid of GFP_NOFS will be really hard so I would really like to
start the easiest way possible and remove the direct usage and replace
it by scope one which would at least _explain_ why it is needed. I think
this is a reasonable _first_ step and a large step ahead because we have
a good chance to get rid of a large number of those which were used
"just because I wasn't sure and this should be safe, right?". I wouldn't
be surprised if we end up with a very small number of both scope and
direct usage in the end.

I would also like to revisit generic inode/dentry shrinker and see
whether it could be more __GFP_FS friendly. As you say many FS might
even not depend on some FS internal locks so pushing GFP_FS check down
the layers might make a lot of sense and allow to clean some [id]cache
even for __GFP_FS context.

> If we get rid of enough of them the remainder could just use __GFP_IO.
> 
> > The patch 2 is a debugging aid which warns about explicit allocation
> > requests from the scope context. This is should help to reduce the
> > direct usage of the NOFS flags to bare minimum in favor of the scope
> > API. It is not aimed to be merged upstream. I would hope Andrew took it
> > into mmotm tree to give it linux-next exposure and allow developers to
> > do further cleanups.  There is a new kernel command line parameter which
> > has to be used for the debugging to be enabled.
> >
> > I think the GFP_NOIO should be seeing the same clean up.
> 
> I think you are suggesting that use of GFP_NOIO should (largely) be
> deprecated in favour of memalloc_noio_save().  I think I agree.

Yes that was the idea.

> Could we go a step further and deprecate GFP_ATOMIC in favour of some
> in_atomic() test?  Maybe that is going too far.

I am not really sure we need that and some GFP_NOWAIT usage is deliberate
to perform an optimistic allocation with another fallback (e.g. higher order
for performance reasons with single page fallback). So I think that nowait
is a slightly different thing.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-04-29 12:04 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-26 11:56 Michal Hocko
2016-04-26 11:56 ` [PATCH 1/2] mm: add PF_MEMALLOC_NOFS Michal Hocko
2016-04-26 23:07   ` Dave Chinner
2016-04-27  7:51     ` Michal Hocko
2016-04-27 10:53   ` Tetsuo Handa
2016-04-27 11:15     ` Michal Hocko
2016-04-27 14:44       ` Tetsuo Handa
2016-04-27 20:05         ` Michal Hocko
2016-04-27 11:54   ` [PATCH 1.1/2] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Michal Hocko
2016-04-27 11:54     ` [PATCH 1.2/2] mm: introduce memalloc_nofs_{save,restore} API Michal Hocko
2016-04-27 13:07       ` Michal Hocko
2016-04-27 20:09       ` Michal Hocko
2016-04-27 20:30         ` Michal Hocko
2016-04-27 21:14       ` Michal Hocko
2016-04-27 17:41     ` [PATCH 1.1/2] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Andreas Dilger
2016-04-27 19:43       ` Michal Hocko
2016-04-26 11:56 ` [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context Michal Hocko
2016-04-26 22:58   ` Dave Chinner
2016-04-27  8:03     ` Michal Hocko
2016-04-27 22:55       ` Dave Chinner
2016-04-28  8:17     ` Michal Hocko
2016-04-28 21:51       ` Dave Chinner
2016-04-29 12:12         ` Michal Hocko
2016-04-29 23:40           ` Dave Chinner
2016-05-03 15:38             ` Michal Hocko
2016-05-04  0:07               ` Dave Chinner
2016-04-29  5:35 ` [PATCH 0/2] scop GFP_NOFS api NeilBrown
2016-04-29 10:20   ` [Cluster-devel] " Steven Whitehouse
2016-04-30 21:17     ` NeilBrown
2016-04-29 12:04   ` Michal Hocko [this message]
2016-04-30  0:24     ` Dave Chinner
2016-04-30 21:55     ` NeilBrown
2016-05-03 15:13       ` Michal Hocko
2016-05-03 23:26         ` NeilBrown
2016-04-30  0:11   ` Dave Chinner
2016-04-30 22:19     ` NeilBrown
2016-05-04  1:00       ` Dave Chinner
2016-05-06  3:20         ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160429120418.GK21977@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=clm@fb.com \
    --cc=cluster-devel@redhat.com \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-ntfs-dev@lists.sourceforge.net \
    --cc=logfs@logfs.org \
    --cc=mr@neil.brown.name \
    --cc=reiserfs-devel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox