From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm <linux-mm@kvack.org>,
Daniel Gomez <da.gomez@samsung.com>,
Pankaj Raghav <p.raghav@samsung.com>,
Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
Chris Mason <clm@fb.com>, Johannes Weiner <hannes@cmpxchg.org>,
Matthew Wilcox <willy@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Date: Wed, 28 Feb 2024 09:42:01 +1100 [thread overview]
Message-ID: <Zd5lORiOCUsARPWq@dread.disaster.area> (raw)
In-Reply-To: <d2zbdldh5l6flfwzcwo6pnhjpoihfiaafl7lqeqmxdbpgoj77y@fjpx3tcc4oev>
On Tue, Feb 27, 2024 at 05:21:20PM -0500, Kent Overstreet wrote:
> On Wed, Feb 28, 2024 at 09:13:05AM +1100, Dave Chinner wrote:
> > On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote:
> > > AFAIK every filesystem allows concurrent direct writes, not just xfs,
> > > it's _buffered_ writes that we care about here.
> >
> > We could do concurrent buffered writes in XFS - we would just use
> > the same locking strategy as direct IO and fall back on folio locks
> > for copy-in exclusion like ext4 does.
>
> ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> just like everyone else.
Uhuh. ext4 does allow concurrent DIO writes. It's just much more
constrained than XFS. See ext4_dio_write_checks().
> > The real question is how much of userspace will that break, because
> > of implicit assumptions that the kernel has always serialised
> > buffered writes?
>
> What would break?
Good question. If you don't know the answer, then you've got the
same problem as I have. i.e. we don't know if concurrent
applications that use buffered IO extensively (eg. postgres) assume
data coherency because of the implicit serialisation occurring
during buffered IO writes?
> > > If we do a short write because of a page fault (despite previously
> > > faulting in the userspace buffer), there is no way to completely prevent
> > > torn writes an atomicity breakage; we could at least try a trylock on
> > > the inode lock, I didn't do that here.
> >
> > As soon as we go for concurrent writes, we give up on any concept of
> > atomicity of buffered writes (esp. w.r.t reads), so this really
> > doesn't matter at all.
>
> We've already given up buffered write vs. read atomicity, have for a
> long time - buffered read path takes no locks.
We still have explicit buffered read() vs buffered write() atomicity
in XFS via buffered reads taking the inode lock shared (see
xfs_file_buffered_read()) because that's what POSIX says we should
have.
Essentially, we need to explicitly give POSIX the big finger and
state that there are no atomicity guarantees given for write() calls
of any size, nor are there any guarantees for data coherency for
any overlapping concurrent buffered IO operations.
Those are things we haven't completely given up yet w.r.t. buffered
IO, and enabling concurrent buffered writes will expose to users.
So we need to have explicit policies for this and document them
clearly in all the places that application developers might look
for behavioural hints.
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2024-02-27 22:42 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-23 23:59 Luis Chamberlain
2024-02-24 4:12 ` Matthew Wilcox
2024-02-24 17:31 ` Linus Torvalds
2024-02-24 18:13 ` Matthew Wilcox
2024-02-24 18:24 ` Linus Torvalds
2024-02-24 18:20 ` Linus Torvalds
2024-02-24 19:11 ` Linus Torvalds
2024-02-24 21:42 ` Theodore Ts'o
2024-02-24 22:57 ` Chris Mason
2024-02-24 23:40 ` Linus Torvalds
2024-05-10 23:57 ` Luis Chamberlain
2024-02-25 5:18 ` Kent Overstreet
2024-02-25 6:04 ` Kent Overstreet
2024-02-25 13:10 ` Matthew Wilcox
2024-02-25 17:03 ` Linus Torvalds
2024-02-25 21:14 ` Matthew Wilcox
2024-02-25 23:45 ` Linus Torvalds
2024-02-26 1:02 ` Kent Overstreet
2024-02-26 1:32 ` Linus Torvalds
2024-02-26 1:58 ` Kent Overstreet
2024-02-26 2:06 ` Kent Overstreet
2024-02-26 2:34 ` Linus Torvalds
2024-02-26 2:50 ` Al Viro
2024-02-26 17:17 ` Linus Torvalds
2024-02-26 21:07 ` Matthew Wilcox
2024-02-26 21:17 ` Kent Overstreet
2024-02-26 21:19 ` Kent Overstreet
2024-02-26 21:55 ` Paul E. McKenney
2024-02-26 23:29 ` Kent Overstreet
2024-02-27 0:05 ` Paul E. McKenney
2024-02-27 0:29 ` Kent Overstreet
2024-02-27 0:55 ` Paul E. McKenney
2024-02-27 1:08 ` Kent Overstreet
2024-02-27 5:17 ` Paul E. McKenney
2024-02-27 6:21 ` Kent Overstreet
2024-02-27 15:32 ` Paul E. McKenney
2024-02-27 15:52 ` Kent Overstreet
2024-02-27 16:06 ` Paul E. McKenney
2024-02-27 15:54 ` Matthew Wilcox
2024-02-27 16:21 ` Paul E. McKenney
2024-02-27 16:34 ` Kent Overstreet
2024-02-27 17:58 ` Paul E. McKenney
2024-02-28 23:55 ` Kent Overstreet
2024-02-29 19:42 ` Paul E. McKenney
2024-02-29 20:51 ` Kent Overstreet
2024-03-05 2:19 ` Paul E. McKenney
2024-02-27 0:43 ` Dave Chinner
2024-02-26 22:46 ` Linus Torvalds
2024-02-26 23:48 ` Linus Torvalds
2024-02-27 7:21 ` Kent Overstreet
2024-02-27 15:39 ` Matthew Wilcox
2024-02-27 15:54 ` Kent Overstreet
2024-02-27 16:34 ` Linus Torvalds
2024-02-27 16:47 ` Kent Overstreet
2024-02-27 17:07 ` Linus Torvalds
2024-02-27 17:20 ` Kent Overstreet
2024-02-27 18:02 ` Linus Torvalds
2024-05-14 11:52 ` Luis Chamberlain
2024-05-14 16:04 ` Linus Torvalds
2024-11-15 19:43 ` Linus Torvalds
2024-11-15 20:42 ` Matthew Wilcox
2024-11-15 21:52 ` Linus Torvalds
2024-02-25 21:29 ` Kent Overstreet
2024-02-25 17:32 ` Kent Overstreet
2024-02-24 17:55 ` Luis Chamberlain
2024-02-25 5:24 ` Kent Overstreet
2024-02-26 12:22 ` Dave Chinner
2024-02-27 10:07 ` Kent Overstreet
2024-02-27 14:08 ` Luis Chamberlain
2024-02-27 14:57 ` Kent Overstreet
2024-02-27 22:13 ` Dave Chinner
2024-02-27 22:21 ` Kent Overstreet
2024-02-27 22:42 ` Dave Chinner [this message]
2024-02-28 7:48 ` [Lsf-pc] " Amir Goldstein
2024-02-28 14:01 ` Chris Mason
2024-02-29 0:25 ` Dave Chinner
2024-02-29 0:57 ` Kent Overstreet
2024-03-04 0:46 ` Dave Chinner
2024-02-27 22:46 ` Linus Torvalds
2024-02-27 23:00 ` Linus Torvalds
2024-02-28 2:22 ` Kent Overstreet
2024-02-28 3:00 ` Matthew Wilcox
2024-02-28 4:22 ` Matthew Wilcox
2024-02-28 17:34 ` Kent Overstreet
2024-02-28 18:04 ` Matthew Wilcox
2024-02-28 18:18 ` Kent Overstreet
2024-02-28 19:09 ` Linus Torvalds
2024-02-28 19:29 ` Kent Overstreet
2024-02-28 20:17 ` Linus Torvalds
2024-02-28 23:21 ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zd5lORiOCUsARPWq@dread.disaster.area \
--to=david@fromorbit.com \
--cc=axboe@kernel.dk \
--cc=clm@fb.com \
--cc=da.gomez@samsung.com \
--cc=hannes@cmpxchg.org \
--cc=hch@lst.de \
--cc=kent.overstreet@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lsf-pc@lists.linux-foundation.org \
--cc=mcgrof@kernel.org \
--cc=p.raghav@samsung.com \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox