From: Matthew Wilcox <willy@infradead.org>
To: NeilBrown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jeff Layton <jlayton@kernel.org>,
Ilya Dryomov <idryomov@gmail.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna.schumaker@netapp.com>,
linux-mm@kvack.org, linux-nfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] fuse: remove reliance on bdi congestion
Date: Tue, 1 Feb 2022 02:01:12 +0000 [thread overview]
Message-ID: <YfiUaJ59A3px+DqP@casper.infradead.org> (raw)
In-Reply-To: <164367002370.18996.7242801209611375112@noble.neil.brown.name>
On Tue, Feb 01, 2022 at 10:00:23AM +1100, NeilBrown wrote:
> On Tue, 01 Feb 2022, Matthew Wilcox wrote:
> > On Mon, Jan 31, 2022 at 03:47:41PM +1100, NeilBrown wrote:
> > > On Mon, 31 Jan 2022, Matthew Wilcox wrote:
> > > > > +++ b/fs/fuse/file.c
> > > > > @@ -958,6 +958,8 @@ static void fuse_readahead(struct readahead_control *rac)
> > > > >
> > > > > if (fuse_is_bad(inode))
> > > > > return;
> > > > > + if (fc->num_background >= fc->congestion_threshold)
> > > > > + return;
> > > >
> > > > This seems like a bad idea to me. If we don't even start reads on
> > > > readahead pages, they'll get ->readpage called on them one at a time
> > > > and the reading thread will block. It's going to lead to some nasty
> > > > performance problems, exactly when you don't want them. Better to
> > > > queue the reads internally and wait for congestion to ease before
> > > > submitting the read.
> > > >
> > >
> > > Isn't that exactly what happens now? page_cache_async_ra() sees that
> > > inode_read_congested() returns true, so it doesn't start readahead.
> > > ???
> >
> > It's rather different. Imagine the readahead window has expanded to
> > 256kB (64 pages). Today, we see congestion and don't do anything.
> > That means we miss the async readahed opportunity, find a missing
> > page and end up calling into page_cache_sync_ra(), by which time
> > we may or may not be congested.
> >
> > If the inode_read_congested() in page_cache_async_ra() is removed and
> > the patch above is added to replace it, we'll allocate those 64 pages and
> > add them to the page cache. But then we'll return without starting IO.
> > When we hit one of those !uptodate pages, we'll call ->readpage on it,
> > but we won't do anything to the other 63 pages. So we'll go through a
> > protracted slow period of sending 64 reads, one at a time, whether or
> > not congestion has eased. Then we'll hit a missing page and proceed
> > to the sync ra case as above.
>
> Hmmm... where is all this documented?
> The entry for readahead in vfs.rst says:
>
> If the filesystem decides to stop attempting I/O before reaching the
> end of the readahead window, it can simply return.
>
> but you are saying that if it simply returns, it'll most likely just get
> called again. So maybe it shouldn't say that?
That's not what I'm saying at all. I'm saying that if ->readahead fails
to read the page, ->readpage will be called to read the page (if it's
actually accessed).
> What do other filesystems do?
> ext4 sets REQ_RAHEAD, but otherwise just pushes ahead and submits all
> requests. btrfs seems much the same.
> xfs uses iomp_readahead .. which again sets REQ_RAHEAD but otherwise
> just does a normal read.
>
> The effect of REQ_RAHEAD seems to be primarily to avoid retries on
> failure.
>
> So it seems that core read-ahead code it not set up to expect readahead
> to fail, though it is (begrudgingly) permitted.
Well, yes. The vast majority of reads don't fail.
> The current inode_read_congested() test in page_cache_async_ra() seems
> to be just delaying the inevitable (and in fairness, the comment does
> say "Defer...."). Maybe just blocking on the congestion is an equally
> good way to delay it...
I don't think we should _block_ for an async read request. We're in the
context of a process which has read a different page. Maybe what we
need is a readahead_abandon() call that removes the just-added pages
from the page cache, so we fall back to doing a sync readahead?
> I note that ->readahead isn't told if the read-ahead is async or not, so
> my patch will drop sync read-ahead on congestion, which the current code
> doesn't do.
Now that we have a readahead_control, it's simple to add that
information to it.
> So maybe this congestion tracking really is useful, and we really want
> to keep it.
>
> I really would like to see that high-level documentation!!
I've done my best to add documentation. There's more than before
I started.
next prev parent reply other threads:[~2022-02-01 2:01 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-31 4:03 [PATCH 0/3] remove dependence of inode_congested() NeilBrown
2022-01-31 4:03 ` [PATCH 1/3] fuse: remove reliance on bdi congestion NeilBrown
2022-01-31 4:28 ` Matthew Wilcox
2022-01-31 4:47 ` NeilBrown
2022-01-31 10:21 ` Miklos Szeredi
2022-01-31 13:12 ` Matthew Wilcox
2022-01-31 23:00 ` NeilBrown
2022-02-01 2:01 ` Matthew Wilcox [this message]
2022-02-01 3:28 ` NeilBrown
2022-02-01 4:06 ` Matthew Wilcox
2022-02-07 0:47 ` NeilBrown
2022-01-31 4:03 ` [PATCH 3/3] ceph: " NeilBrown
2022-01-31 4:03 ` [PATCH 2/3] nfs: " NeilBrown
2022-01-31 4:22 ` Matthew Wilcox
2022-01-31 4:55 ` NeilBrown
2022-01-31 13:15 ` Matthew Wilcox
2022-01-31 21:38 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YfiUaJ59A3px+DqP@casper.infradead.org \
--to=willy@infradead.org \
--cc=akpm@linux-foundation.org \
--cc=anna.schumaker@netapp.com \
--cc=ceph-devel@vger.kernel.org \
--cc=idryomov@gmail.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=neilb@suse.de \
--cc=trond.myklebust@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox