From: Jan Kara <jack@suse.cz>
To: Baokun Li <libaokun1@huawei.com>
Cc: linux-mm@kvack.org, linux-ext4@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, jack@suse.cz, willy@infradead.org,
akpm@linux-foundation.org, ritesh.list@gmail.com,
linux-kernel@vger.kernel.org, yi.zhang@huawei.com,
yangerkun@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read
Date: Mon, 4 Dec 2023 13:11:20 +0100 [thread overview]
Message-ID: <20231204121120.mpxntey47rluhcfi@quack3> (raw)
In-Reply-To: <20231202091432.8349-1-libaokun1@huawei.com>
Hello!
On Sat 02-12-23 17:14:30, Baokun Li wrote:
> Recently, while running some pressure tests on MYSQL, noticed that
> occasionally a "corrupted data in log event" error would be reported.
> After analyzing the error, I found that extending DIO write and buffered
> read were competing, resulting in some zero-filled page end being read.
> Since ext4 buffered read doesn't hold an inode lock, and there is no
> field in the page to indicate the valid data size, it seems to me that
> it is impossible to solve this problem perfectly without changing these
> two things.
Yes, combining buffered reads with direct IO writes is a recipe for
problems and pretty much in the "don't do it" territory. So honestly I'd
consider this a MYSQL bug. Were you able to identify why does MYSQL use
buffered read in this case? It is just something specific to the test
you're doing?
> In this series, the first patch reads the inode size twice, and takes the
> smaller of the two values as the copyout limit to avoid copying data that
> was not actually read (0-padding) into the user buffer and causing data
> corruption. This greatly reduces the probability of problems under 4k
> page. However, the problem is still easily triggered under 64k page.
>
> The second patch waits for the existing dio write to complete and
> invalidate the stale page cache before performing a new buffered read
> in ext4, avoiding data corruption by copying the stale page cache to
> the user buffer. This makes it much less likely that the problem will
> be triggered in a 64k page.
>
> Do we have a plan to add a lock to the ext4 buffered read or a field in
> the page that indicates the size of the valid data in the page? Or does
> anyone have a better idea?
No, there are no plans to address this AFAIK. Because such locking will
slow down all the well behaved applications to fix a corner case for
application doing unsupported things. Sure we must not crash the kernel,
corrupt the filesystem or leak sensitive (e.g. uninitialized) data if app
combines buffered and direct IO but returning zeros instead of valid data
is in my opinion fully within the range of acceptable behavior for such
case.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2023-12-04 12:11 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-02 9:14 Baokun Li
2023-12-02 9:14 ` [PATCH -RFC 1/2] mm: " Baokun Li
2023-12-02 9:14 ` [PATCH -RFC 2/2] ext4: " Baokun Li
2023-12-04 12:11 ` Jan Kara [this message]
2023-12-04 13:50 ` [PATCH -RFC 0/2] mm/ext4: " Baokun Li
2023-12-04 14:41 ` Jan Kara
2023-12-05 12:50 ` Baokun Li
2023-12-06 19:37 ` Jan Kara
2023-12-07 3:01 ` Baokun Li
2023-12-07 14:15 ` Baokun Li
2023-12-11 17:49 ` Jan Kara
2023-12-12 2:15 ` Baokun Li
2023-12-12 4:36 ` Matthew Wilcox
2023-12-12 14:25 ` Jan Kara
2023-12-05 4:17 ` Theodore Ts'o
2023-12-05 13:19 ` Baokun Li
2023-12-06 21:55 ` Theodore Ts'o
2023-12-07 6:41 ` Baokun Li
2023-12-06 8:35 ` Dave Chinner
2023-12-06 9:02 ` Christoph Hellwig
2023-12-06 10:34 ` Dave Chinner
2023-12-06 12:20 ` Christoph Hellwig
2023-12-06 11:57 ` Baokun Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231204121120.mpxntey47rluhcfi@quack3 \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=libaokun1@huawei.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ritesh.list@gmail.com \
--cc=tytso@mit.edu \
--cc=willy@infradead.org \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox