From: Joanne Koong <joannelkoong@gmail.com>
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com,
josef@toxicpanda.com, linux-mm@kvack.org,
bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v4 0/6] fuse: remove temp page copies in writeback
Date: Thu, 7 Nov 2024 15:56:08 -0800
Message-ID: <20241107235614.3637221-1-joannelkoong@gmail.com>
The purpose of this patchset is to help make writeback-cache write
performance in FUSE filesystems as fast as possible.
In the current FUSE writeback design (see commit 3be5a52b30aa
("fuse: support writable mmap")), a temp page is allocated for every dirty
page to be written back, the contents of the dirty page are copied over to the
temp page, and the temp page gets handed to the server to write back. This is
done so that writeback may be immediately cleared on the dirty page, which is
done in order to mitigate the following deadlock scenario that may arise if
reclaim waits on writeback on the dirty page to complete (more details can be
found in this thread [1]):
* single-threaded FUSE server is in the middle of handling a request
that needs a memory allocation
* memory allocation triggers direct reclaim
* direct reclaim waits on a folio under writeback
* the FUSE server can't write back the folio since it's stuck in
direct reclaim
Allocating and copying dirty pages to temp pages is the biggest performance
bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
altogether (which will also allow us to get rid of the internal FUSE rb tree
that is needed to keep track of writeback status on the temp pages).
Benchmarks show approximately a 20% improvement in throughput for 4k
block-size writes and a 45% improvement for 1M block-size writes.
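
To make the difference between the two designs concrete, here is a minimal
user-space sketch. It is not kernel code: struct model_page and every model_*
function are hypothetical stand-ins invented for illustration, and the real
implementation works on folios in fs/fuse/file.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a page cache page; not a kernel structure. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool dirty;
	bool writeback;
};

/*
 * Current design: allocate a temp page, copy the dirty contents into it,
 * and clear writeback on the original page right away. The returned temp
 * page stays under writeback until the server finishes with it.
 */
static struct model_page *model_writeback_with_temp(struct model_page *page)
{
	struct model_page *tmp;

	assert(page != NULL);
	tmp = malloc(sizeof(*tmp));
	if (!tmp)
		return NULL;
	memcpy(tmp->data, page->data, MODEL_PAGE_SIZE); /* the extra copy */
	tmp->dirty = false;
	tmp->writeback = true;
	page->dirty = false;
	page->writeback = false; /* nobody can block waiting on this page */
	return tmp;
}

/*
 * Proposed design: no allocation and no copy. The page itself is handed
 * to the server and remains under writeback until the server completes.
 */
static void model_writeback_in_place(struct model_page *page)
{
	assert(page != NULL);
	page->dirty = false;
	page->writeback = true;
}

/* Called when the server reports that it has written the page out. */
static void model_server_complete(struct model_page *page)
{
	assert(page != NULL);
	page->writeback = false;
}
```

In this model the cost being removed is the malloc() plus memcpy() per dirty
page. The price is that the original page now stays under writeback for as
long as the server takes.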
With the temp page removed, writeback state is only cleared on the dirty
page after the server has written it back to disk. This may take an
indeterminate amount of time. There is also the possibility of malicious or
well-intentioned but buggy servers where writeback may, in the worst case,
never complete. This means that any folio_wait_writeback() on a dirty page
belonging to a FUSE filesystem needs to be carefully audited.
In particular, these are the cases that need to be accounted for:
* potentially deadlocking in reclaim, as mentioned above
* potentially stalling sync(2)
* potentially stalling page migration / compaction
This patchset adds a new mapping flag, AS_WRITEBACK_MAY_BLOCK, which a
filesystem may set on its inode mappings to indicate that writeback operations
may take an indeterminate amount of time to complete. FUSE will set this flag
on its mappings. This patchset adds checks for the flag to the critical parts
of reclaim, sync, and page migration logic where writeback may be waited on.
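
The shape of such a check can be sketched in user space as follows. This is
only a model under stated assumptions: the model_* structs and functions are
hypothetical, the bit position is arbitrary, and the real helpers and call
sites are the ones in the patches themselves (include/linux/pagemap.h,
mm/vmscan.c, mm/migrate.c, fs/fs-writeback.c).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real flag lives in enum mapping_flags. */
#define MODEL_AS_WRITEBACK_MAY_BLOCK (1UL << 0)

/* Hypothetical stand-ins for struct address_space and struct folio. */
struct model_mapping {
	unsigned long flags;
};

struct model_folio {
	const struct model_mapping *mapping;
	bool under_writeback;
};

/* A filesystem opts in by setting the flag on its inode mapping. */
static void model_mapping_set_writeback_may_block(struct model_mapping *mapping)
{
	assert(mapping != NULL);
	mapping->flags |= MODEL_AS_WRITEBACK_MAY_BLOCK;
}

static bool model_mapping_writeback_may_block(const struct model_mapping *mapping)
{
	assert(mapping != NULL);
	return (mapping->flags & MODEL_AS_WRITEBACK_MAY_BLOCK) != 0;
}

/*
 * A caller that must not stall (reclaim, sync, migration in this model)
 * skips the folio instead of waiting when the folio is under writeback
 * and its mapping says that writeback may block indefinitely.
 */
static bool model_should_skip_wait(const struct model_folio *folio)
{
	assert(folio != NULL && folio->mapping != NULL);
	return folio->under_writeback &&
	       model_mapping_writeback_may_block(folio->mapping);
}
```

Folios that are not under writeback, or whose mapping does not set the flag,
are handled exactly as before in this model.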
Please note the following:
* For sync(2), waiting on writeback will be skipped for FUSE, but this has no
effect on existing behavior. Dirty FUSE pages are already not guaranteed to
be written to disk by the time sync(2) returns (e.g., writeback is cleared on
the dirty page but the server may not have written out the temp page to disk
yet). If the caller wishes to ensure the data has actually been synced to
disk, they should use fsync(2)/fdatasync(2) instead.
* AS_WRITEBACK_MAY_BLOCK does not indicate that the folios should never be
waited on when in writeback. There are some cases where the wait is
desirable. For example, for the sync_file_range() syscall, it is fine to
wait on the writeback since the caller passes in an fd for the operation.
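
For callers that do need the data on disk, the fsync(2) route mentioned above
looks roughly like this from user space. This is a minimal sketch:
write_durably() is a hypothetical helper name, and whether the data actually
reaches stable storage on a FUSE mount still depends on how the server
handles the fsync request.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path, then fsync(2)
 * the file descriptor. Returns 0 on success and -1 on any failure.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int ret = -1;
	int fd;

	assert(path != NULL && (buf != NULL || len == 0));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* write(2) may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto out;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Unlike sync(2), fsync(2) targets this file and reports errors. */
	if (fsync(fd) == 0)
		ret = 0;
out:
	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

A caller would use it as write_durably("/path/to/file", data, size) and treat
a nonzero return as "not known to be on disk".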
[1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/
Changelog
---------
v3:
https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
* Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in
readahead
v2:
https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
* Account for sync and page migration cases as well (Miklos)
* Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
* For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache
is enabled
v1:
https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
Changes from v1 -> v2:
* Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
* Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)
Joanne Koong (6):
mm: add AS_WRITEBACK_MAY_BLOCK mapping flag
mm: skip reclaiming folios in legacy memcg writeback contexts that may
block
fs/writeback: in wait_sb_inodes(), skip wait for
AS_WRITEBACK_MAY_BLOCK mappings
mm/memory-hotplug: add finite retries in offline_pages() if migration
fails
mm/migrate: skip migrating folios under writeback with
AS_WRITEBACK_MAY_BLOCK mappings
fuse: remove tmp folio for writebacks and internal rb tree
fs/fs-writeback.c | 3 +
fs/fuse/file.c | 339 ++++------------------------------------
include/linux/pagemap.h | 11 ++
mm/memory_hotplug.c | 13 +-
mm/migrate.c | 5 +-
mm/vmscan.c | 10 +-
6 files changed, 60 insertions(+), 321 deletions(-)
--
2.43.5