From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Neil Brown <neilb@suse.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Peter Linich <plinich@cse.unsw.edu.au>
Subject: Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file that never had nuls in it?
Date: Tue, 29 May 2007 16:39:37 +1000 [thread overview]
Message-ID: <465BCAA9.3070707@yahoo.com.au> (raw)
In-Reply-To: <18011.51290.257450.26100@notabene.brown>
[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]
Neil Brown wrote:
> [resending with correct To address - please reply to this one]
>
> It appears that there is a race when reading from a file that is
> concurrently being truncated. It is possible to read a number of
> bytes that matches the size of the file before the truncate, but the
> actual bytes are all nuls - values that had never been in the file.
>
> Below is a simple C program to demonstrate this, and a patch that
> might fix it (initial testing is positive, but it might just make the
> window smaller).
> To trial the program run two instances, naming the same file as the
> only argument. Every '!' indicates a read that found a nul. I get
> one every few minutes.
> e.g. cc -o race race.c ; ./race /tmp/testfile & ./race /tmp/tracefile
>
> My exploration suggests the problem is in do_generic_mapping_read in
> mm/filemap.c.
> This code:
> gets the size of the file
> triggers readahead
> gets the appropriate page
> If the page is up-to-date, return data.
>
> If a truncate happens just before readahead is triggered, then
> the size will be the pre-truncate size of the file, while the page
> could have been read by the readahead and so will be up-to-date and
> full of nuls.
>
> Note that if do_generic_mapping_read calls readpage explicitly, it
> samples the value of inode->i_size again after the read. However if
> the readpage is called by the readahead code, i_size is not
> re-sampled.
>
> I am not 100% confident of every aspect of this explanation (I haven't
> traced all the way through the read-ahead code) but it seems to fit
> the available data including the fact that if I disable read-ahead
> (blockdev --setra 0) then the apparent problem goes away.
>
> The patch below moves the code for re-sampling i_size from after the
> readpage call to before the "actor" call.
>
> Questions:
> - Is this a problem, and should it be fixed (I think "yes").
I think you are right.
> - Is the patch appropriate, and does it have no negative
> consequences?.
> (Obviously some comments should be tidied up to reflect the new
> reality).
Would it be better (and closer to following the existing logic) if
we sampled i_size before testing each page for uptodateness? It might
also cost a little less in the fastpath case of finding an uptodate
page.
--
SUSE Labs, Novell Inc.
[-- Attachment #2: mm-ra-zerofill-fix.patch --]
[-- Type: text/plain, Size: 1235 bytes --]
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c 2007-05-29 16:34:38.000000000 +1000
+++ linux-2.6/mm/filemap.c 2007-05-29 16:35:42.000000000 +1000
@@ -863,13 +863,11 @@ void do_generic_mapping_read(struct addr
{
struct inode *inode = mapping->host;
unsigned long index;
- unsigned long end_index;
unsigned long offset;
unsigned long last_index;
unsigned long next_index;
unsigned long prev_index;
unsigned int prev_offset;
- loff_t isize;
struct page *cached_page;
int error;
struct file_ra_state ra = *_ra;
@@ -882,15 +880,17 @@ void do_generic_mapping_read(struct addr
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
- isize = i_size_read(inode);
- if (!isize)
- goto out;
-
- end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
for (;;) {
+ loff_t isize;
struct page *page;
+ unsigned long end_index;
unsigned long nr, ret;
+ isize = i_size_read(inode);
+ if (!isize)
+ goto out;
+ end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+
/* nr is the maximum number of bytes to copy from this page */
nr = PAGE_CACHE_SIZE;
if (index >= end_index) {
next prev parent reply other threads:[~2007-05-29 6:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-29 6:29 Neil Brown
2007-05-29 6:39 ` Nick Piggin [this message]
2007-05-29 7:00 ` Neil Brown
2007-05-29 7:28 ` Nick Piggin
2007-05-31 6:51 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=465BCAA9.3070707@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=neilb@suse.de \
--cc=plinich@cse.unsw.edu.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox