From: Joanne Koong <joannelkoong@gmail.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: miklos@szeredi.hu, akpm@linux-foundation.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
jefflexu@linux.alibaba.com, shakeel.butt@linux.dev,
david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com,
kernel-team@meta.com
Subject: Re: [PATCH v7 0/3] fuse: remove temp page copies in writeback
Date: Mon, 14 Apr 2025 13:28:47 -0700 [thread overview]
Message-ID: <CAJnrk1a6QoLWPOvoi4vakGOWTp9xDU=SCiPHx+cEg_Kdf6rYLA@mail.gmail.com> (raw)
In-Reply-To: <0e00e8b306620c781868f375a462127d72b26289.camel@kernel.org>
On Mon, Apr 14, 2025 at 9:21 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Fri, 2025-04-04 at 11:14 -0700, Joanne Koong wrote:
> > The purpose of this patchset is to help make writeback in FUSE filesystems as
> > fast as possible.
> >
> > In the current FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap"))), a temp page is allocated for every dirty
> > page to be written back, the contents of the dirty page are copied over to the
> > temp page, and the temp page gets handed to the server to write back. This is
> > done so that writeback may be immediately cleared on the dirty page, and this
> > in turn is done in order to mitigate the following deadlock scenario that may
> > arise if reclaim waits on writeback on the dirty page to complete (more details
> > can be found in this thread [1]):
> > * single-threaded FUSE server is in the middle of handling a request
> > that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> > direct reclaim
> >
> > Allocating and copying dirty pages to temp pages is the biggest performance
> > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > that is needed to keep track of writeback status on the temp pages).
> > Benchmarks show approximately a 20% improvement in throughput for 4k
> > block-size writes and a 45% improvement for 1M block-size writes.
> >
> > In the current reclaim code, there is one scenario where writeback is waited
> > on, which is the case where the system is running legacy cgroupv1 and reclaim
> > encounters a folio that already has the reclaim flag set and the caller did
> > not have __GFP_FS (or __GFP_IO if swap) set.
> >
> > This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> > filesystems may set on its inode mappings to indicate that writeback
> > operations may take an indeterminate amount of time to complete. FUSE will set
> > this flag on its mappings. Reclaim for the legacy cgroup v1 case described
> > above will skip reclaim of folios with that flag set.
> >
> > With this change, writeback state is now only cleared on the dirty page after
> > the server has written it back to disk. If the server is deliberately
> > malicious or well-intentioned but buggy, this may stall sync(2) and page
> > migration, but for sync(2), a malicious server may already stall this by not
> > replying to the FUSE_SYNCFS request and for page migration, there are already
> > many easier ways to stall this by having FUSE permanently hold the folio lock.
> > A fuller discussion on this can be found in [2]. Long-term, there needs to be
> > a more comprehensive solution for addressing migration of FUSE pages that
> > handles all scenarios where FUSE may permanently hold the lock, but that is
> > outside the scope of this patchset and will be done as future work. Please
> > also note that this change also now ensures that when sync(2) returns, FUSE
> > filesystems will have persisted writeback changes.
> >
> > [1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/
> > [2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
> >
> > Changelog
> > ---------
> > v6:
> > https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
> > Changes from v6 -> v7:
> > * Drop migration and sync patches, as they are useless if a server is
> > determined to be malicious
> >
> > v5:
> > https://lore.kernel.org/linux-fsdevel/20241115224459.427610-1-joannelkoong@gmail.com/
> > Changes from v5 -> v6:
> > * Add Shakeel and Jingbo's reviewed-bys
> > * Move folio_end_writeback() to fuse_writepage_finish() (Jingbo)
> > * Embed fuse_writepage_finish_stat() logic inline (Jingbo)
> > * Remove node_stat NR_WRITEBACK inc/sub (Jingbo)
> >
> > v4:
> > https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@gmail.com/
> > Changes from v4 -> v5:
> > * AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
> > * Drop memory hotplug patch (David and Shakeel)
> > * Remove some more kunnecessary writeback waits in fuse code (Jingbo)
> > * Make commit message for reclaim patch more concise - drop part about
> > deadlock and just focus on how it may stall waits
> >
> > v3:
> > https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
> > Changes from v3 -> v4:
> > * Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in
> > readahead
> >
> > v2:
> > https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
> > Changes from v2 -> v3:
> > * Account for sync and page migration cases as well (Miklos)
> > * Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
> > * For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache
> > is enabled
> >
> > v1:
> > https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
> > Changes from v1 -> v2:
> > * Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
> > * Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)
> >
> > Joanne Koong (3):
> > mm: add AS_WRITEBACK_INDETERMINATE mapping flag
> > mm: skip reclaiming folios in legacy memcg writeback indeterminate
> > contexts
> > fuse: remove tmp folio for writebacks and internal rb tree
> >
> > fs/fuse/file.c | 360 ++++------------------------------------
> > fs/fuse/fuse_i.h | 3 -
> > include/linux/pagemap.h | 11 ++
> > mm/vmscan.c | 10 +-
> > 4 files changed, 46 insertions(+), 338 deletions(-)
> >
>
> This looks sane, and I love that diffstat.
>
> I also agree with David about changing the flag name to something more
> specific. As a kernel engineer, anything with "INDETERMINATE" in the
> name gives me the ick.
>
> Assuming that the only real change in v8 will be the flag name change,
> you can add:
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>
> Assuming others are ok with this, how do you see this going in? Maybe
> Andrew could pick up the mm bits and Miklos could take the FUSE patch?
Thanks for the review. The only thing I plan to change for v8 is the
flag name and removing the unneeded fuse_sync_writes() call in
fuse_flush() that Jingbo pointed out.
With v8, I'm hoping the mm bits (first 2 patches) could be picked up
by Andrew and that the 3rd patch (the one with FUSE changes) could be
taken by Miklos, as the FUSE large folios patchset [1] I will be
resending will depend on patch 3.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20241213221818.322371-1-joannelkoong@gmail.com/
prev parent reply other threads:[~2025-04-14 20:29 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-04 18:14 Joanne Koong
2025-04-04 18:14 ` [PATCH v7 1/3] mm: add AS_WRITEBACK_INDETERMINATE mapping flag Joanne Koong
2025-04-04 19:13 ` David Hildenbrand
2025-04-04 20:09 ` Joanne Koong
2025-04-04 20:13 ` David Hildenbrand
2025-04-09 22:05 ` Shakeel Butt
2025-04-09 23:48 ` Joanne Koong
2025-04-04 18:14 ` [PATCH v7 2/3] mm: skip reclaiming folios in legacy memcg writeback indeterminate contexts Joanne Koong
2025-04-04 18:14 ` [PATCH v7 3/3] fuse: remove tmp folio for writebacks and internal rb tree Joanne Koong
2025-04-09 2:43 ` Jingbo Xu
2025-04-09 23:47 ` Joanne Koong
2025-04-10 2:12 ` Jingbo Xu
2025-04-10 15:07 ` Joanne Koong
2025-04-10 15:11 ` Jingbo Xu
2025-04-10 16:11 ` Joanne Koong
2025-04-14 20:24 ` Joanne Koong
2025-04-15 7:49 ` Jingbo Xu
2025-04-15 15:59 ` Joanne Koong
2025-04-16 1:40 ` Jingbo Xu
2025-04-16 16:43 ` Joanne Koong
2025-04-16 18:05 ` Bernd Schubert
2025-04-14 16:21 ` [PATCH v7 0/3] fuse: remove temp page copies in writeback Jeff Layton
2025-04-14 20:28 ` Joanne Koong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJnrk1a6QoLWPOvoi4vakGOWTp9xDU=SCiPHx+cEg_Kdf6rYLA@mail.gmail.com' \
--to=joannelkoong@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bernd.schubert@fastmail.fm \
--cc=david@redhat.com \
--cc=jefflexu@linux.alibaba.com \
--cc=jlayton@kernel.org \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=miklos@szeredi.hu \
--cc=shakeel.butt@linux.dev \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox