Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file that never had nuls in it?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Neil Brown <neilb@suse.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Peter Linich <plinich@cse.unsw.edu.au>
Subject: Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file   that never had nuls in it?
Date: Tue, 29 May 2007 16:39:37 +1000	[thread overview]
Message-ID: <465BCAA9.3070707@yahoo.com.au> (raw)
In-Reply-To: <18011.51290.257450.26100@notabene.brown>

[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]

Neil Brown wrote:
> [resending with correct To address - please reply to this one]
> 
> It appears that there is a race when reading from a file that is
> concurrently being truncated.  It is possible to read a number of
> bytes that matches the size of the file before the truncate, but the
> actual bytes are all nuls - values that had never been in the file.
> 
> Below is a simple C program to demonstrate this, and a patch that
> might fix it (initial testing is positive, but it might just make the
> window smaller).
> To trial the program run two instances, naming the same file as the
> only argument.  Every '!' indicates a read that found a nul.  I get
> one every few minutes.
> e.g.  cc -o race race.c ; ./race /tmp/testfile & ./race /tmp/tracefile
> 
> My exploration suggests the problem is in do_generic_mapping_read in
> mm/filemap.c. 
> This code:
>    gets the size of the file
>    triggers readahead
>    gets the appropriate page
>    If the page is up-to-date, return data.
> 
> If a truncate happens just before readahead is triggered, then
> the size will be the pre-truncate size of the file, while the page
> could have been read by the readahead and so will be up-to-date and
> full of nuls.
> 
> Note that if do_generic_mapping_read calls readpage explicitly, it
> samples the value of inode->i_size again after the read.  However if
> the readpage is called by the readahead code, i_size is not
> re-sampled.
> 
> I am not 100% confident of every aspect of this explanation (I haven't
> traced all the way through the read-ahead code) but it seems to fit
> the available data including the fact that if I disable read-ahead
> (blockdev --setra 0) then the apparent problem goes away.
> 
> The patch below moves the code for re-sampling i_size from after the
> readpage call to before the "actor" call.
> 
> Questions:
>   - Is this a problem, and should it be fixed (I think "yes").

I think you are right.

>   - Is the patch appropriate, and does it have no negative
>     consequences?.
>     (Obviously some comments should be tidied up to reflect the new
>     reality).

Would it be better (and closer to following the existing logic) if
we sampled i_size before testing each page for uptodateness? It might
also cost a little less in the fastpath case of finding an uptodate
page.

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: mm-ra-zerofill-fix.patch --]
[-- Type: text/plain, Size: 1235 bytes --]

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2007-05-29 16:34:38.000000000 +1000
+++ linux-2.6/mm/filemap.c	2007-05-29 16:35:42.000000000 +1000
@@ -863,13 +863,11 @@ void do_generic_mapping_read(struct addr
 {
 	struct inode *inode = mapping->host;
 	unsigned long index;
-	unsigned long end_index;
 	unsigned long offset;
 	unsigned long last_index;
 	unsigned long next_index;
 	unsigned long prev_index;
 	unsigned int prev_offset;
-	loff_t isize;
 	struct page *cached_page;
 	int error;
 	struct file_ra_state ra = *_ra;
@@ -882,15 +880,17 @@ void do_generic_mapping_read(struct addr
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
-	isize = i_size_read(inode);
-	if (!isize)
-		goto out;
-
-	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
 	for (;;) {
+		loff_t isize;
 		struct page *page;
+		unsigned long end_index;
 		unsigned long nr, ret;
 
+		isize = i_size_read(inode);
+		if (!isize)
+			goto out;
+		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = PAGE_CACHE_SIZE;
 		if (index >= end_index) {

next prev parent reply	other threads:[~2007-05-29  6:39 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-29  6:29 Neil Brown
2007-05-29  6:39 ` Nick Piggin [this message]
2007-05-29  7:00   ` Neil Brown
2007-05-29  7:28     ` Nick Piggin
2007-05-31  6:51       ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=465BCAA9.3070707@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=neilb@suse.de \
    --cc=plinich@cse.unsw.edu.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox