From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Baokun Li <libaokun1@huawei.com>, Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, linux-ext4@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, willy@infradead.org,
akpm@linux-foundation.org, ritesh.list@gmail.com,
linux-kernel@vger.kernel.org, yi.zhang@huawei.com,
yangerkun@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read
Date: Wed, 6 Dec 2023 21:34:49 +1100
Message-ID: <ZXBOSRhm11DtGO+K@dread.disaster.area>
In-Reply-To: <ZXA4swgzsHbkm/uB@infradead.org>
On Wed, Dec 06, 2023 at 01:02:43AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 06, 2023 at 07:35:35PM +1100, Dave Chinner wrote:
> > Mixing overlapping buffered read with direct writes - especially partial block
> > extending DIO writes - is a recipe for data corruption. It's not a
> > matter of if, it's a matter of when.
> >
> > Fundamentally, when you have overlapping write IO involving DIO, the
> > result of the overlapping IOs is undefined. One cannot control
> > submission order, the order that the overlapping IO hit the
> > media, or completion ordering that might clear flags like unwritten
> > extents. The only guarantee that we give in this case is that we
> > won't expose stale data from the disk to the user read.
>
> Btw, one thing we could do to kill these races forever is to track if
> there are any buffered openers for an inode and just fall back to
> buffered I/O for that case. With that and an inode_dio_wait when
> opening for buffered I/O, we'd avoid the races and various crazy
> workarounds entirely.

That's basically what Solaris did 20-25 years ago. The inode held a
flag that indicated what IO was being done, and if the "buffered"
flag was set (either through mmap() based access or buffered
read/write syscalls) then direct IO would also do buffered IO
until the flag was cleared and the cache cleaned and invalidated.

That had .... problems.

Largely they were performance problems - unpredictable IO latency
and CPU overhead for IO meant applications would randomly miss SLAs.
The application would see IO suddenly lose all concurrency, go real
slow and/or burn lots more CPU when the inode switched to buffered
mode.

I'm not sure that's a particularly viable model given that the raw IO
throughput of even cheap modern SSDs largely exceeds the capability of
buffered IO through the page cache. The differences in concurrency,
latency and throughput between buffered and DIO modes will be even
more stark today than they were 20 years ago....

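
To make the mode switch described above concrete, here is a minimal
userspace sketch of the idea. It is not Solaris or Linux code: every
name in it (model_inode, note_buffered_access, choose_dio_path and so
on) is hypothetical, and the assumption that the flag clears as soon
as the last buffered user goes away is mine. It only shows that a
direct IO request keeps taking the buffered path until the flag is
clear and the cache has been cleaned and invalidated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in any kernel. */
enum io_path { IO_PATH_DIRECT, IO_PATH_BUFFERED };

struct model_inode {
	bool buffered_flag;	/* set by buffered read/write or mmap() access */
	bool cache_populated;	/* page cache still holds data for this inode */
};

/* A buffered syscall or mmap() access marks the inode and fills the cache. */
static void note_buffered_access(struct model_inode *ip)
{
	ip->buffered_flag = true;
	ip->cache_populated = true;
}

/* Assumption: the flag is dropped once no buffered users remain. */
static void clear_buffered_flag(struct model_inode *ip)
{
	assert(ip->buffered_flag);
	ip->buffered_flag = false;
}

/* Write back and drop cached pages; only meaningful once the flag is clear. */
static void clean_and_invalidate_cache(struct model_inode *ip)
{
	assert(!ip->buffered_flag);
	ip->cache_populated = false;
}

/*
 * Decide how a direct IO request is serviced. It silently degrades to
 * buffered IO while the flag is set or the cache is still populated.
 */
static enum io_path choose_dio_path(const struct model_inode *ip)
{
	if (ip->buffered_flag || ip->cache_populated)
		return IO_PATH_BUFFERED;
	return IO_PATH_DIRECT;
}
```

Note that a single buffered access flips every subsequent direct IO
onto the buffered path, and the application issuing the direct IO has
no say in when it gets switched back. That is where the unpredictable
latency comes from.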
-Dave.
--
Dave Chinner
david@fromorbit.com