linux-mm.kvack.org archive mirror
From: Baokun Li <libaokun1@huawei.com>
To: <linux-mm@kvack.org>, <linux-ext4@vger.kernel.org>
Cc: <tytso@mit.edu>, <adilger.kernel@dilger.ca>, <jack@suse.cz>,
	<willy@infradead.org>, <akpm@linux-foundation.org>,
	<ritesh.list@gmail.com>, <linux-kernel@vger.kernel.org>,
	<yi.zhang@huawei.com>, <yangerkun@huawei.com>,
	<yukuai3@huawei.com>, <libaokun1@huawei.com>
Subject: [PATCH -RFC 2/2] ext4: avoid data corruption when extending DIO write race with buffered read
Date: Sat, 2 Dec 2023 17:14:32 +0800
Message-ID: <20231202091432.8349-3-libaokun1@huawei.com>
In-Reply-To: <20231202091432.8349-1-libaokun1@huawei.com>

The following race between an extending DIO write and a buffered read may
result in reading stale data from the page cache:

          cpu1                             cpu2
------------------------------|-----------------------------
// Direct write 1024 from 4096
                              // Buffered read 8192 from 0
...                           ...
 ext4_file_write_iter
  ext4_dio_write_iter
   iomap_dio_rw
   ...
                               ext4_file_read_iter
                                generic_file_read_iter
                                 filemap_read
                                  i_size_read(inode) // 4096
                                  filemap_get_pages
                                   ...
                                    ext4_mpage_readpages
                                     ext4_readpage_limit(inode)
                                      i_size_read(inode) // 4096
                                     // read 4096, zero-filled 4096
    ext4_dio_write_end_io
     i_size_write(inode, 5120)
                                   i_size_read(inode) // 5120
                                   copyout 4096

                              // new read 4096 from 4096
                              ext4_file_read_iter
                               generic_file_read_iter
                                filemap_read
                                 i_size_read(inode) // 5120
                                 filemap_get_pages
                                  // stale page is uptodate
                                 i_size_read(inode) // 5120
                                 copyout 5120
    dio invalidate stale page cache

In the above race, the DIO write updates the inode size before it
invalidates the stale page cache. A buffered read that runs in between
finds the last page it previously read still marked uptodate, so it
copies that page directly to user space instead of re-reading it from
disk. As a result, the trailing 1024 bytes it returns are the zeros
filled in by the earlier read, not the data on disk.

To get around this, wait for any in-flight DIO write to finish
invalidating the stale page cache before starting each buffered read.
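
The waiting relies on the inode's in-flight DIO count: a DIO holds the
count until its completion work is done, and inode_dio_wait() blocks
until the count drops to zero. Below is a user-space sketch of that
begin/end/wait pattern using a mutex and a condition variable. The names
dio_counter, dio_begin(), dio_end(), dio_wait(), dio_writer() and
buffered_read_after_wait() are hypothetical; the kernel builds
inode_dio_begin()/inode_dio_end()/inode_dio_wait() on the atomic
i_dio_count field rather than on pthreads. The sketch assumes, as the
diagram implies, that the stale-page invalidation finishes before the
DIO drops its count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space analogue of the per-inode DIO counter. */
struct dio_counter {
	pthread_mutex_t lock;
	pthread_cond_t zero;
	int count;
};

static void dio_begin(struct dio_counter *c)
{
	pthread_mutex_lock(&c->lock);
	c->count++;
	pthread_mutex_unlock(&c->lock);
}

static void dio_end(struct dio_counter *c)
{
	pthread_mutex_lock(&c->lock);
	if (--c->count == 0)
		pthread_cond_broadcast(&c->zero);
	pthread_mutex_unlock(&c->lock);
}

/* Block until no DIO is in flight. */
static void dio_wait(struct dio_counter *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->count > 0)
		pthread_cond_wait(&c->zero, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static struct dio_counter dio = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};
/* Published by dio_end()'s unlock, observed after dio_wait()'s lock. */
static bool page_invalidated;

/* DIO completion: i_size would be extended here, then the stale page
 * is invalidated, and only then is the in-flight count dropped. */
static void *dio_writer(void *arg)
{
	(void)arg;
	page_invalidated = true;
	dio_end(&dio);
	return NULL;
}

/* Reader path with the wait the patch adds; returns whether the stale
 * page had already been invalidated when the read proceeded. */
static bool buffered_read_after_wait(void)
{
	pthread_t t;
	bool seen;

	page_invalidated = false;
	dio_begin(&dio);	/* DIO is in flight before the reader starts */
	pthread_create(&t, NULL, dio_writer, NULL);
	dio_wait(&dio);
	seen = page_invalidated;
	pthread_join(t, NULL);
	return seen;
}
```

Because the writer sets the flag before releasing the count, a reader
that waits for the count to reach zero always observes the invalidation.
The cost is that every buffered read may block behind unrelated DIO
writes to the same inode.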

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 0166bb9ca160..99e92ddef97d 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -144,6 +144,9 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (iocb->ki_flags & IOCB_DIRECT)
 		return ext4_dio_read_iter(iocb, to);
 
+	/* wait for stale page cache to be invalidated */
+	inode_dio_wait(inode);
+
 	return generic_file_read_iter(iocb, to);
 }
 
-- 
2.31.1
