Neil Brown wrote: > [resending with correct To address - please reply to this one] > > It appears that there is a race when reading from a file that is > concurrently being truncated. It is possible to read a number of > bytes that matches the size of the file before the truncate, but the > actual bytes are all nuls - values that had never been in the file. > > Below is a simple C program to demonstrate this, and a patch that > might fix it (initial testing is positive, but it might just make the > window smaller). > To trial the program run two instances, naming the same file as the > only argument. Every '!' indicates a read that found a nul. I get > one every few minutes. > e.g. cc -o race race.c ; ./race /tmp/testfile & ./race /tmp/tracefile > > My exploration suggests the problem is in do_generic_mapping_read in > mm/filemap.c. > This code: > gets the size of the file > triggers readahead > gets the appropriate page > If the page is up-to-date, return data. > > If a truncate happens just before readahead is triggered, then > the size will be the pre-truncate size of the file, while the page > could have been read by the readahead and so will be up-to-date and > full of nuls. > > Note that if do_generic_mapping_read calls readpage explicitly, it > samples the value of inode->i_size again after the read. However if > the readpage is called by the readahead code, i_size is not > re-sampled. > > I am not 100% confident of every aspect of this explanation (I haven't > traced all the way through the read-ahead code) but it seems to fit > the available data including the fact that if I disable read-ahead > (blockdev --setra 0) then the apparent problem goes away. > > The patch below moves the code for re-sampling i_size from after the > readpage call to before the "actor" call. > > Questions: > - Is this a problem, and should it be fixed (I think "yes"). I think you are right. > - Is the patch appropriate, and does it have no negative > consequences?. > (Obviously some comments should be tidied up to reflect the new > reality). Would it be better (and closer to following the existing logic) if we sampled i_size before testing each page for uptodateness? It might also cost a little less in the fastpath case of finding an uptodate page. -- SUSE Labs, Novell Inc.