From: Shakeel Butt <shakeel.butt@linux.dev>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, akpm@linux-foundation.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
jefflexu@linux.alibaba.com, david@redhat.com,
bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v8 0/2] fuse: remove temp page copies in writeback
Date: Mon, 14 Apr 2025 17:06:57 -0700
Message-ID: <57pojgb4bsesfvbbeit3ohjre5sorcafqs62zszrdgfeyp3qaz@k732xugk53lm>
In-Reply-To: <CAJnrk1bOJYFTAybYHL9HW=Ex7rs3DgYU10W=7wsuu8t1OoMx8Q@mail.gmail.com>
On Mon, Apr 14, 2025 at 04:36:58PM -0700, Joanne Koong wrote:
> On Mon, Apr 14, 2025 at 3:47 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Mon, Apr 14, 2025 at 03:22:08PM -0700, Joanne Koong wrote:
> > > The purpose of this patchset is to help make writeback in FUSE filesystems as
> > > fast as possible.
> > >
> > > In the current FUSE writeback design (see commit 3be5a52b30aa
> > > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > > page to be written back, the contents of the dirty page are copied over to the
> > > temp page, and the temp page gets handed to the server to write back. This is
> > > done so that writeback may be immediately cleared on the dirty page, and this
> > > in turn is done in order to mitigate the following deadlock scenario that may
> > > arise if reclaim waits on writeback on the dirty page to complete (more
> > > details can be found in this thread [1]):
> > > * single-threaded FUSE server is in the middle of handling a request
> > >   that needs a memory allocation
> > > * memory allocation triggers direct reclaim
> > > * direct reclaim waits on a folio under writeback
> > > * the FUSE server can't write back the folio since it's stuck in
> > >   direct reclaim
> > >
> > > Allocating and copying dirty pages to temp pages is the biggest performance
> > > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > > that is needed to keep track of writeback status on the temp pages).
> > > Benchmarks show approximately a 20% improvement in throughput for 4k
> > > block-size writes and a 45% improvement for 1M block-size writes.
> > >
> > > In the current reclaim code, there is one scenario where writeback is waited
> > > on, which is the case where the system is running legacy cgroupv1 and reclaim
> > > encounters a folio that already has the reclaim flag set and the caller did
> > > not have __GFP_FS (or __GFP_IO if swap) set.
> > >
> > > This patchset adds a new mapping flag, AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM,
> > > which filesystems may set on their inode mappings to indicate that reclaim
> > > should not wait on writeback. FUSE will set this flag on its mappings. Reclaim
> > > for the legacy cgroup v1 case described above will skip reclaim of folios with
> > > that flag set. With this flag set, FUSE can now remove temp pages altogether.
> > >
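The reclaim behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the code in the patch: every type and function name below (mapping_model, folio_model, reclaim_ctx_model, decide_reclaim) is hypothetical, and the __GFP_FS / __GFP_IO condition from the cover letter is folded into a single caller-supplied boolean rather than modelled in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct mapping_model {
    /* Models the AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM mapping flag. */
    bool writeback_may_deadlock_on_reclaim;
};

struct folio_model {
    bool under_writeback;
    bool reclaim_flag;
    const struct mapping_model *mapping; /* NULL if there is no mapping */
};

struct reclaim_ctx_model {
    bool legacy_cgroup_v1;
    /* Assumption: the GFP condition from the cover letter, as one bool. */
    bool gfp_condition_met;
};

enum reclaim_action {
    RECLAIM_OTHER_PATH,        /* not the wait-on-writeback case */
    RECLAIM_WAIT_ON_WRITEBACK, /* reclaim blocks until writeback ends */
    RECLAIM_SKIP_FOLIO,        /* folio is left alone, no waiting */
};

enum reclaim_action decide_reclaim(const struct reclaim_ctx_model *ctx,
                                   const struct folio_model *folio)
{
    /* Only the one scenario named in the cover letter ever waits. */
    bool wait_case = folio->under_writeback &&
                     ctx->legacy_cgroup_v1 &&
                     folio->reclaim_flag &&
                     ctx->gfp_condition_met;

    if (!wait_case)
        return RECLAIM_OTHER_PATH;

    /* Mappings that flag a possible deadlock are skipped, not waited on. */
    if (folio->mapping &&
        folio->mapping->writeback_may_deadlock_on_reclaim)
        return RECLAIM_SKIP_FOLIO;

    return RECLAIM_WAIT_ON_WRITEBACK;
}
```

In this model a folio whose mapping has the flag set never yields RECLAIM_WAIT_ON_WRITEBACK, which is the property that lets a single-threaded server allocate memory without blocking on its own writeback.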
> > > With this change, writeback state is now only cleared on the dirty page after
> > > the server has written it back to disk. If the server is deliberately
> > > malicious or well-intentioned but buggy, this may stall sync(2) and page
> > > migration, but for sync(2), a malicious server may already stall this by not
> > > replying to the FUSE_SYNCFS request and for page migration, there are already
> > > many easier ways to stall this by having FUSE permanently hold the folio lock.
> > > A fuller discussion on this can be found in [2]. Long-term, there needs to be
> > > a more comprehensive solution for addressing migration of FUSE pages that
> > > handles all scenarios where FUSE may permanently hold the lock, but that is
> > > outside the scope of this patchset and will be done as future work. Please
> > > note that this change also ensures that when sync(2) returns, FUSE
> > > filesystems will have persisted writeback changes.
> > >
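For contrast, the two writeback designs discussed in the cover letter can be modelled in userspace roughly as follows. This is a sketch under assumptions, not FUSE code: page_model, PAGE_SZ and the three functions are hypothetical names, and "handing a page to the server" is reduced to returning a pointer.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed page size for the model */

/* Hypothetical model of a dirty page and its writeback state. */
struct page_model {
    unsigned char data[PAGE_SZ];
    bool writeback;
};

/*
 * Temp-page design: allocate a copy, hand the copy to the server, and
 * clear writeback on the original page right away.
 * Returns the copy (caller frees it), or NULL if allocation fails.
 */
unsigned char *writeback_with_temp_copy(struct page_model *page)
{
    unsigned char *tmp = malloc(PAGE_SZ);

    if (!tmp)
        return NULL;
    page->writeback = true;
    memcpy(tmp, page->data, PAGE_SZ); /* the per-page copy cost */
    page->writeback = false;          /* cleared before the server replies */
    return tmp;
}

/*
 * Direct design: hand the page itself to the server. Writeback stays
 * set until the server reports completion.
 */
const unsigned char *writeback_direct(struct page_model *page)
{
    page->writeback = true;
    return page->data;
}

/* Called when the server has finished writing the page back. */
void writeback_direct_complete(struct page_model *page)
{
    page->writeback = false;
}
```

The direct variant avoids the allocation and the copy, at the cost that the page stays under writeback for as long as the server takes to reply, which is why reclaim must not wait on such pages.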
> > > For this patchset, it would be ideal if the first patch could be taken by
> > > Andrew to the mm tree and the second patch could be taken by Miklos into the
> > > fuse tree, as the fuse large folios patchset [3] depends on the second patch.
> >
> > Why not take both patches through FUSE tree? Second patch has dependency
> > on first patch, so there is no need to keep them separate.
>
> If that's possible, that sounds great to me too. The patchset went
> through Andrew's mm tree last time, so I'm not sure if the protocol is
> that any/all mm changes need to go through Andrew's tree.
This series can go through either the mm tree or the fuse tree, but it
seems you plan to do follow-up fuse work that depends on it, so I would
suggest going through the fuse tree. Just let Andrew know; he is usually
fine with that.