From: Matthew Wilcox <willy@infradead.org>
To: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Steve French <stfrench@microsoft.com>,
Vishal Moola <vishal.moola@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>, Paulo Alcantara <pc@cjr.nz>,
Huang Ying <ying.huang@intel.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Xin Hao <xhao@linux.alibaba.com>,
linux-mm@kvack.org, mm-commits@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] cifs: Fix cifs_writepages_region()
Date: Fri, 24 Feb 2023 18:44:46 +0000
Message-ID: <Y/kFnhUM5hjWM2Ae@casper.infradead.org>
In-Reply-To: <2385089.1677258941@warthog.procyon.org.uk>

On Fri, Feb 24, 2023 at 05:15:41PM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
>
> > Why are you doing it this way? What's wrong with using
> > write_cache_pages() to push all the contiguous dirty folios into a single
> > I/O object, submitting it when the folios turn out not to be contiguous,
> > or when we run out of a batch?
> >
> > You've written an awful lot of code here and it's a different model from
> > every other filesystem. Why is it better?
>
> Because write_cache_pages():
>
> (1) Takes no account of fscache. I can't just build knowledge of PG_fscache
> into it because PG_private_2 may be used differently by other filesystems
>     (btrfs, for example).  (I'm also trying to phase out the use of
>     PG_private_2 and instead use PG_writeback to cover both, with the
>     difference recorded elsewhere - but that's not there yet.)

I don't understand why waiting for fscache here is even the right thing
to do. The folio is dirty and needs to be written back to the
server. Why should beginning that write wait for the current write
to the fscache to complete?
> (2) Calls ->writepage() individually for each folio - which is excessive. In
> AFS's implementation, we locate the first folio, then race through the
> following folios without ever waiting until we hit something that's
> locked or a gap and then stop and submit.
>
> write_cache_pages(), otoh, calls us with the next folio already undirtied
> and set for writeback when we find out that we don't want it yet.
I think you're so convinced that your way is better that you don't see
what write_cache_pages() is actually doing. Yes, the writepage callback
is called once for each folio, but that doesn't actually submit a write.
Instead, they're accumulated into the equivalent of a wdata and the
wdata is submitted when there's a discontiguity or we've accumulated
enough to satisfy the constraints of the wbc.
> (3) Skips over holes, but at some point in the future we're going to need to
> schedule adjacent clean pages (before and after) for writeback too to
> handle transport compression and fscache updates if the granule size for
> either is larger than the folio size.
Then we'll need it for other filesystems too, so should enhance
write_cache_pages().
> It might be better to take what's in cifs, generalise it and replace
> write_cache_pages() with it, then have a "->submit_write()" aop that takes an
> ITER_XARRAY iterator to write from.
->writepages _is_ ->submit_write. Should write_cache_pages() be better?
Maybe!  But it works a whole lot better than what AFS was doing and
you've now ladled into cifs.