linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* msync() behaviour broken for MS_ASYNC, revert patch?
@ 2004-03-31 22:16 Stephen C. Tweedie
  2004-03-31 22:37 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-03-31 22:16 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: linux-kernel, Stephen Tweedie, Ulrich Drepper

[-- Attachment #1: Type: text/plain, Size: 3170 bytes --]

Hi,

I've been looking at a discrepancy between msync() behaviour on 2.4.9
and newer 2.4 kernels, and it looks like things changed again in
2.5.68.  From the ChangeLog:

ChangeSet 1.971.76.156 2003/04/09 11:31:36 akpm@digeo.com
  [PATCH] Make msync(MS_ASYNC) no longer start the I/O
  
  MS_ASYNC will currently wait on previously-submitted I/O, then start new I/O
  and not wait on it.  This can cause undesirable blocking if msync is called
  rapidly against the same memory.
  
  So instead, change msync(MS_ASYNC) to not start any IO at all.  Just flush
  the pte dirty bits into the pageframe and leave it at that.
  
  The IO _will_ happen within a kupdate period.  And the application can use
  fsync() or fadvise(FADV_DONTNEED) if it actually wants to schedule the IO
  immediately.

Unfortunately, this seems to contradict SingleUnix requirements, which
state:

        When MS_ASYNC is specified, msync() shall return immediately
        once all the write operations are initiated or queued for
        servicing
        
although I can't find an unambiguous definition of "queued for service"
in the online standard.  I'm reading it as requiring that the I/O has
reached the block device layer, not simply that it has been marked dirty
for some future writeback pass to catch; Uli agrees with that
interpretation.

The comment that was added with this change in 2.5.68 states:

 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).  Instead, it just
 * marks the relevant pages dirty.  The application may now run fsync() to
 * write out the dirty pages and wait on the writeout and check the result.
 * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
 * async writeout immediately.
 * So my _not_ starting I/O in MS_ASYNC we provide complete flexibility to
 * applications.

but that's actually misleading --- in every kernel I've looked at as far
back as 2.4.9, the "new" behaviour of simply queueing the pages for
kupdated writeback is already available by specifying flags==0 for
msync(), so there's no additional flexibility added by making MS_ASYNC
do the same thing.  And FADV_DONTNEED doesn't make any guarantees that
anything gets queued for writeback: it only does a filemap writeback if
there's no congestion on the backing device.

So we're actually _more_ flexible, as well as more compliant, under the
old behaviour --- flags==0 gives the "defer to kupdated" behaviour,
MS_ASYNC guarantees that IOs have been queued, and MS_SYNC waits for
synchronised completion.

All that's really missing is documentation to define how Linux deals
with flags==0 as an extension to SingleUnix (which requires either
MS_SYNC, MS_ASYNC or MS_INVALIDATE to be set, and which does not define
a behaviour for flags==0.)
  
The 2.5.68 changeset also includes the comment:

  (This has triggered an ext3 bug - the page's buffers get dirtied so fast
  that kjournald keeps writing the buffers over and over for 10-20 seconds
  before deciding to give up for some reason)

Was that ever resolved?  If it's still there, I should have a look at it
if we're restoring the old trigger.

Patch below reverts the behaviour in 2.6.

--Stephen


[-- Attachment #2: msync-async-writeout.patch --]
[-- Type: text/plain, Size: 1935 bytes --]

--- linux-2.6-tmp/mm/msync.c.=K0000=.orig
+++ linux-2.6-tmp/mm/msync.c
@@ -127,13 +127,10 @@ static int filemap_sync(struct vm_area_s
 /*
  * MS_SYNC syncs the entire file - including mappings.
  *
- * MS_ASYNC does not start I/O (it used to, up to 2.5.67).  Instead, it just
- * marks the relevant pages dirty.  The application may now run fsync() to
- * write out the dirty pages and wait on the writeout and check the result.
- * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
- * async writeout immediately.
- * So my _not_ starting I/O in MS_ASYNC we provide complete flexibility to
- * applications.
+ * MS_ASYNC once again starts I/O (it did not between 2.5.68 and 2.6.4.)
+ * SingleUnix requires it.  If an application wants to queue dirty pages
+ * for normal asychronous writeback, msync with flags==0 should achieve
+ * that on all kernels at least as far back as 2.4.
  */
 static int msync_interval(struct vm_area_struct * vma,
 	unsigned long start, unsigned long end, int flags)
@@ -147,20 +144,22 @@ static int msync_interval(struct vm_area
 	if (file && (vma->vm_flags & VM_SHARED)) {
 		ret = filemap_sync(vma, start, end-start, flags);
 
-		if (!ret && (flags & MS_SYNC)) {
+		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {
 			struct address_space *mapping = file->f_mapping;
 			int err;
 
 			down(&mapping->host->i_sem);
 			ret = filemap_fdatawrite(mapping);
-			if (file->f_op && file->f_op->fsync) {
-				err = file->f_op->fsync(file,file->f_dentry,1);
-				if (err && !ret)
+			if (flags & MS_SYNC) {
+				if (file->f_op && file->f_op->fsync) {
+					err = file->f_op->fsync(file, file->f_dentry, 1);
+					if (err && !ret)
+						ret = err;
+				}
+				err = filemap_fdatawait(mapping);
+				if (!ret)
 					ret = err;
 			}
-			err = filemap_fdatawait(mapping);
-			if (!ret)
-				ret = err;
 			up(&mapping->host->i_sem);
 		}
 	}

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-04-21  9:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-31 22:16 msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
2004-03-31 22:37 ` Linus Torvalds
2004-03-31 23:41   ` Stephen C. Tweedie
2004-04-01  0:08     ` Linus Torvalds
2004-04-01  0:30       ` Andrew Morton
2004-04-01 15:40       ` Stephen C. Tweedie
2004-04-01 16:02         ` Linus Torvalds
2004-04-01 16:33           ` Stephen C. Tweedie
2004-04-01 16:19         ` Jamie Lokier
2004-04-01 16:56           ` s390 storage key inconsistency? [was Re: msync() behaviour broken for MS_ASYNC, revert patch?] Stephen C. Tweedie
2004-04-01 16:57           ` msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
2004-04-01 18:51         ` Andrew Morton
2004-03-31 22:53 ` Andrew Morton
2004-03-31 23:20   ` Stephen C. Tweedie
2004-04-16 22:35 ` Jamie Lokier
2004-04-19 21:54   ` Stephen C. Tweedie
2004-04-21  2:10     ` Jamie Lokier
2004-04-21  9:52       ` Stephen C. Tweedie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox