From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: Christian Brauner <brauner@kernel.org>,
djwong@kernel.org, ritesh.list@gmail.com,
john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org,
dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com,
martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk,
linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Date: Wed, 12 Nov 2025 16:36:03 +0530
Message-ID: <cover.1762945505.git.ojaswin@linux.ibm.com>
This series adds support for single block RWF_ATOMIC writes to
iomap/xfs buffered IO. It builds upon the initial RFC shared by John
Garry last year [1]. Most of the details are in the respective
commit messages, but I'd like to mention some of the design points below:
1. The first 4 patches introduce the statx and iomap plumbing and page
flags to add basic atomic write support to buffered IO. However, there
are still 2 key restrictions that apply:
FIRST: If the user buffer of an atomic write crosses a page boundary, there's a
possibility of a short write, for example if one user page could not be faulted
in or got reclaimed before the copy operation. For now, we don't allow such a
scenario: we require the user buffer to be page aligned. This way either the
full write goes through or nothing does. This is also discussed in Matthew
Wilcox's comment here [2].
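
To illustrate why page alignment is enough for a single block write, here
is a small userspace sketch (not code from this series). The helper names
are made up for this example, and it assumes the page size is a power of
two and the write is no larger than one page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* True if addr starts exactly on a page boundary. */
static bool is_page_aligned(uintptr_t addr, size_t page_size)
{
	assert(is_pow2(page_size));
	return (addr & (page_size - 1)) == 0;
}

/* True if the byte range [addr, addr + len) touches more than one page. */
static bool crosses_page_boundary(uintptr_t addr, size_t len, size_t page_size)
{
	assert(is_pow2(page_size) && len > 0);
	return (addr / page_size) != ((addr + len - 1) / page_size);
}
```

A page-aligned buffer of at most one page never crosses a page boundary,
so the copy touches exactly one user page and cannot be split.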
This restriction is lifted in patch 5. The approach we took was to:
1. Pin the user pages.
2. Create a BVEC out of the struct page to pass to
copy_folio_from_iter_atomic() rather than the USER backed iter. We
don't use the user iter directly because the pinned user page could
still get unmapped from the process, leading to short writes.
This approach allows us to proceed only if we are sure we will not have a short
copy.
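
The "check everything first, then copy" idea can be sketched in userspace
as below. This is only a toy model, not the kernel implementation:
struct model_page, its pinned flag and copy_all_or_nothing() are all
hypothetical names, and a fixed 4096-byte page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for one source page of the user buffer. */
struct model_page {
	const unsigned char *data;	/* MODEL_PAGE_SIZE bytes */
	bool pinned;			/* false models a failed pin */
};

/*
 * Copy len bytes, starting at byte offset off inside pages[0], into dst.
 * Returns len if every page touched is pinned, otherwise 0 with dst left
 * untouched. It never returns a partial count.
 */
static size_t copy_all_or_nothing(unsigned char *dst,
				  const struct model_page *pages,
				  size_t npages, size_t off, size_t len)
{
	size_t last, i, done = 0;

	assert(len > 0 && off < MODEL_PAGE_SIZE);
	last = (off + len - 1) / MODEL_PAGE_SIZE;
	if (last >= npages)
		return 0;

	/* Step 1: verify every source page up front, before copying. */
	for (i = 0; i <= last; i++)
		if (!pages[i].pinned || !pages[i].data)
			return 0;

	/* Step 2: copy page by page; nothing in this loop can fail. */
	for (i = 0; i <= last; i++) {
		size_t start = (i == 0) ? off : 0;
		size_t chunk = MODEL_PAGE_SIZE - start;

		if (chunk > len - done)
			chunk = len - done;
		memcpy(dst + done, pages[i].data + start, chunk);
		done += chunk;
	}
	return done;
}
```

The point of the two-step structure is that every failure is detected
before the first byte lands in the destination.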
SECOND: We only support block size == page size buffered IO atomic writes.
This is to avoid the following scenario:
1. A 4kb block atomic write marks the complete 64kb folio as
atomic.
2. Other writes dirty the whole 64kb folio.
3. Writeback sees the whole folio dirty and atomic and tries
to send a 64kb atomic write, which might exceed the
allowed atomic write size and fail.
Patch 7 adds support for sub-page atomic write tracking to remove this
restriction. We do this by adding 2 more bitmaps to the ifs (struct
iomap_folio_state) to track atomic write start and end.
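
As a rough illustration of start/end tracking, here is a userspace toy
model of a 64kb folio with 4kb blocks. It is not the patch 7 code:
struct folio_model, the 16-bit bitmaps and the helper names are
assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE        4096u
#define BLOCKS_PER_FOLIO  16u	/* 64 KiB folio / 4 KiB blocks */

/* Hypothetical per-folio state: one bit per block in each bitmap. */
struct folio_model {
	uint16_t dirty;
	uint16_t atomic_start;	/* block where an atomic write begins */
	uint16_t atomic_end;	/* block where an atomic write ends */
};

/* Mark blocks first..last dirty, as any buffered write would. */
static void mark_dirty(struct folio_model *f, unsigned first, unsigned last)
{
	assert(first <= last && last < BLOCKS_PER_FOLIO);
	for (unsigned b = first; b <= last; b++)
		f->dirty |= (uint16_t)(1u << b);
}

/* Record an atomic write covering blocks first..last. */
static void mark_atomic(struct folio_model *f, unsigned first, unsigned last)
{
	mark_dirty(f, first, last);
	f->atomic_start |= (uint16_t)(1u << first);
	f->atomic_end |= (uint16_t)(1u << last);
}

/* Find the next atomic range at or after 'from'; false if there is none. */
static bool next_atomic_range(const struct folio_model *f, unsigned from,
			      unsigned *first, unsigned *last)
{
	for (unsigned s = from; s < BLOCKS_PER_FOLIO; s++) {
		if (!(f->atomic_start & (1u << s)))
			continue;
		for (unsigned e = s; e < BLOCKS_PER_FOLIO; e++) {
			if (f->atomic_end & (1u << e)) {
				*first = s;
				*last = e;
				return true;
			}
		}
	}
	return false;
}
```

In this model, an atomic write to block 3 followed by other writes that
dirty all 16 blocks still yields a single 4kb atomic range (block 3 to
block 3) rather than a 64kb one.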
Lastly, a non-atomic write over an atomic write will remove the atomic
guarantee. Userspace is expected to sync the data to disk after an
atomic write before performing any overwrites.
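
From userspace, the expected pattern might look like the sketch below.
It uses pwritev2(2) with RWF_ATOMIC followed by fdatasync(2); the helper
name atomic_write_then_sync() is made up for this example, and whether a
given kernel and filesystem accept RWF_ATOMIC on a buffered fd depends
on this series (or something like it) being applied.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040	/* value from include/uapi/linux/fs.h */
#endif

/*
 * Write one block with RWF_ATOMIC, then flush it to disk so that a
 * later non-atomic overwrite cannot race with the pending writeback.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int atomic_write_then_sync(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret = pwritev2(fd, &iov, 1, off, RWF_ATOMIC);

	if (ret < 0)
		return -1;
	if ((size_t)ret != len) {
		/* A short write is not expected for RWF_ATOMIC. */
		errno = EIO;
		return -1;
	}
	return fdatasync(fd);
}
```

With only the first 4 patches applied, buf would additionally need to be
page aligned (for example, allocated with posix_memalign()).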
This series has survived -g quick xfstests and I'll be continuing to
test it. Just wanted to put out the RFC to get some reviews on the
design and suggestions on any better approaches.
[1] https://lore.kernel.org/all/20240422143923.3927601-1-john.g.garry@oracle.com/
[2] https://lore.kernel.org/all/ZiZ8XGZz46D3PRKr@casper.infradead.org/
Thanks,
Ojaswin
John Garry (2):
fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO
mm: Add PG_atomic
Ojaswin Mujoo (6):
fs: Add initial buffered atomic write support info to statx
iomap: buffered atomic write support
iomap: pin pages for RWF_ATOMIC buffered write
xfs: Report atomic write min and max for buf io as well
iomap: Add bs<ps buffered atomic writes support
xfs: Lift the bs == ps restriction for HW buffered atomic writes
.../filesystems/ext4/atomic_writes.rst | 4 +-
block/bdev.c | 7 +-
fs/ext4/inode.c | 9 +-
fs/iomap/buffered-io.c | 395 ++++++++++++++++--
fs/iomap/ioend.c | 21 +-
fs/iomap/trace.h | 12 +-
fs/read_write.c | 3 -
fs/stat.c | 33 +-
fs/xfs/xfs_file.c | 9 +-
fs/xfs/xfs_iops.c | 127 +++---
fs/xfs/xfs_iops.h | 6 +-
include/linux/fs.h | 3 +-
include/linux/iomap.h | 3 +
include/linux/page-flags.h | 5 +
include/trace/events/mmflags.h | 3 +-
include/trace/misc/fs.h | 3 +-
include/uapi/linux/stat.h | 10 +-
tools/include/uapi/linux/stat.h | 10 +-
.../trace/beauty/include/uapi/linux/stat.h | 10 +-
19 files changed, 551 insertions(+), 122 deletions(-)
--
2.51.0
Thread overview: 28+ messages
2025-11-12 11:06 Ojaswin Mujoo [this message]
2025-11-12 11:06 ` [RFC PATCH 1/8] fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO Ojaswin Mujoo
2025-11-12 11:06 ` [RFC PATCH 2/8] mm: Add PG_atomic Ojaswin Mujoo
2025-11-12 15:56 ` Matthew Wilcox
2025-11-13 12:34 ` David Hildenbrand (Red Hat)
2025-11-14 5:00 ` Ritesh Harjani
2025-11-14 13:16 ` Matthew Wilcox
2025-11-18 16:17 ` Ritesh Harjani
2025-11-18 23:30 ` Dave Chinner
2025-11-12 11:06 ` [RFC PATCH 3/8] fs: Add initial buffered atomic write support info to statx Ojaswin Mujoo
2025-11-12 11:06 ` [RFC PATCH 4/8] iomap: buffered atomic write support Ojaswin Mujoo
2025-11-12 11:06 ` [RFC PATCH 5/8] iomap: pin pages for RWF_ATOMIC buffered write Ojaswin Mujoo
2025-11-12 11:06 ` [RFC PATCH 6/8] xfs: Report atomic write min and max for buf io as well Ojaswin Mujoo
2025-11-12 11:06 ` [RFC PATCH 7/8] iomap: Add bs<ps buffered atomic writes support Ojaswin Mujoo
2025-11-12 11:06 ` [RFC PATCH 8/8] xfs: Lift the bs == ps restriction for HW buffered atomic writes Ojaswin Mujoo
2025-11-12 15:50 ` [syzbot ci] Re: xfs: single block atomic writes for buffered IO syzbot ci
2025-11-12 21:56 ` [RFC PATCH 0/8] " Dave Chinner
2025-11-13 5:23 ` Christoph Hellwig
2025-11-13 5:42 ` Ritesh Harjani
2025-11-13 5:57 ` Christoph Hellwig
2025-11-13 10:32 ` Dave Chinner
2025-11-14 9:20 ` Ojaswin Mujoo
2025-11-14 13:18 ` Matthew Wilcox
2025-11-16 8:11 ` Dave Chinner
2025-11-17 10:59 ` John Garry
2025-11-17 20:51 ` Dave Chinner
2025-11-20 10:37 ` Ojaswin Mujoo
2025-11-20 12:14 ` Ojaswin Mujoo