Hi, I've been looking at a discrepancy between msync() behaviour on 2.4.9 and newer 2.4 kernels, and it looks like things changed again in 2.5.68. From the ChangeLog: ChangeSet 1.971.76.156 2003/04/09 11:31:36 akpm@digeo.com [PATCH] Make msync(MS_ASYNC) no longer start the I/O MS_ASYNC will currently wait on previously-submitted I/O, then start new I/O and not wait on it. This can cause undesirable blocking if msync is called rapidly against the same memory. So instead, change msync(MS_ASYNC) to not start any IO at all. Just flush the pte dirty bits into the pageframe and leave it at that. The IO _will_ happen within a kupdate period. And the application can use fsync() or fadvise(FADV_DONTNEED) if it actually wants to schedule the IO immediately. Unfortunately, this seems to contradict SingleUnix requirements, which state: When MS_ASYNC is specified, msync() shall return immediately once all the write operations are initiated or queued for servicing although I can't find an unambiguous definition of "queued for service" in the online standard. I'm reading it as requiring that the I/O has reached the block device layer, not simply that it has been marked dirty for some future writeback pass to catch; Uli agrees with that interpretation. The comment that was added with this change in 2.5.68 states: * MS_ASYNC does not start I/O (it used to, up to 2.5.67). Instead, it just * marks the relevant pages dirty. The application may now run fsync() to * write out the dirty pages and wait on the writeout and check the result. * Or the application may run fadvise(FADV_DONTNEED) against the fd to start * async writeout immediately. * So my _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. but that's actually misleading --- in every kernel I've looked at as far back as 2.4.9, the "new" behaviour of simply queueing the pages for kupdated writeback is already available by specifying flags==0 for msync(), so there's no additional flexibility added by making MS_ASYNC do the same thing. And FADV_DONTNEED doesn't make any guarantees that anything gets queued for writeback: it only does a filemap writeback if there's no congestion on the backing device. So we're actually _more_ flexible, as well as more compliant, under the old behaviour --- flags==0 gives the "defer to kupdated" behaviour, MS_ASYNC guarantees that IOs have been queued, and MS_SYNC waits for synchronised completion. All that's really missing is documentation to define how Linux deals with flags==0 as an extension to SingleUnix (which requires either MS_SYNC, MS_ASYNC or MS_INVALIDATE to be set, and which does not define a behaviour for flags==0.) The 2.5.68 changeset also includes the comment: (This has triggered an ext3 bug - the page's buffers get dirtied so fast that kjournald keeps writing the buffers over and over for 10-20 seconds before deciding to give up for some reason) Was that ever resolved? If it's still there, I should have a look at it if we're restoring the old trigger. Patch below reverts the behaviour in 2.6. --Stephen