msync() behaviour broken for MS_ASYNC, revert patch?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* msync() behaviour broken for MS_ASYNC, revert patch?
@ 2004-03-31 22:16 Stephen C. Tweedie
  2004-03-31 22:37 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-03-31 22:16 UTC (permalink / raw)
  To: linux-mm, Andrew Morton; +Cc: linux-kernel, Stephen Tweedie, Ulrich Drepper

[-- Attachment #1: Type: text/plain, Size: 3170 bytes --]

Hi,

I've been looking at a discrepancy between msync() behaviour on 2.4.9
and newer 2.4 kernels, and it looks like things changed again in
2.5.68.  From the ChangeLog:

ChangeSet 1.971.76.156 2003/04/09 11:31:36 akpm@digeo.com
  [PATCH] Make msync(MS_ASYNC) no longer start the I/O

  MS_ASYNC will currently wait on previously-submitted I/O, then start new I/O
  and not wait on it.  This can cause undesirable blocking if msync is called
  rapidly against the same memory.

  So instead, change msync(MS_ASYNC) to not start any IO at all.  Just flush
  the pte dirty bits into the pageframe and leave it at that.

  The IO _will_ happen within a kupdate period.  And the application can use
  fsync() or fadvise(FADV_DONTNEED) if it actually wants to schedule the IO
  immediately.

Unfortunately, this seems to contradict SingleUnix requirements, which
state:

        When MS_ASYNC is specified, msync() shall return immediately
        once all the write operations are initiated or queued for
        servicing

although I can't find an unambiguous definition of "queued for service"
in the online standard.  I'm reading it as requiring that the I/O has
reached the block device layer, not simply that it has been marked dirty
for some future writeback pass to catch; Uli agrees with that
interpretation.

The comment that was added with this change in 2.5.68 states:

 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).  Instead, it just
 * marks the relevant pages dirty.  The application may now run fsync() to
 * write out the dirty pages and wait on the writeout and check the result.
 * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
 * async writeout immediately.
 * So my _not_ starting I/O in MS_ASYNC we provide complete flexibility to
 * applications.

but that's actually misleading --- in every kernel I've looked at as far
back as 2.4.9, the "new" behaviour of simply queueing the pages for
kupdated writeback is already available by specifying flags==0 for
msync(), so there's no additional flexibility added by making MS_ASYNC
do the same thing.  And FADV_DONTNEED doesn't make any guarantees that
anything gets queued for writeback: it only does a filemap writeback if
there's no congestion on the backing device.

So we're actually _more_ flexible, as well as more compliant, under the
old behaviour --- flags==0 gives the "defer to kupdated" behaviour,
MS_ASYNC guarantees that IOs have been queued, and MS_SYNC waits for
synchronised completion.

All that's really missing is documentation to define how Linux deals
with flags==0 as an extension to SingleUnix (which requires either
MS_SYNC, MS_ASYNC or MS_INVALIDATE to be set, and which does not define
a behaviour for flags==0.)

The 2.5.68 changeset also includes the comment:

  (This has triggered an ext3 bug - the page's buffers get dirtied so fast
  that kjournald keeps writing the buffers over and over for 10-20 seconds
  before deciding to give up for some reason)

Was that ever resolved?  If it's still there, I should have a look at it
if we're restoring the old trigger.

Patch below reverts the behaviour in 2.6.

--Stephen

[-- Attachment #2: msync-async-writeout.patch --]
[-- Type: text/plain, Size: 1935 bytes --]

--- linux-2.6-tmp/mm/msync.c.=K0000=.orig
+++ linux-2.6-tmp/mm/msync.c
@@ -127,13 +127,10 @@ static int filemap_sync(struct vm_area_s
 /*
  * MS_SYNC syncs the entire file - including mappings.
  *
- * MS_ASYNC does not start I/O (it used to, up to 2.5.67).  Instead, it just
- * marks the relevant pages dirty.  The application may now run fsync() to
- * write out the dirty pages and wait on the writeout and check the result.
- * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
- * async writeout immediately.
- * So my _not_ starting I/O in MS_ASYNC we provide complete flexibility to
- * applications.
+ * MS_ASYNC once again starts I/O (it did not between 2.5.68 and 2.6.4.)
+ * SingleUnix requires it.  If an application wants to queue dirty pages
+ * for normal asychronous writeback, msync with flags==0 should achieve
+ * that on all kernels at least as far back as 2.4.
  */
 static int msync_interval(struct vm_area_struct * vma,
 	unsigned long start, unsigned long end, int flags)
@@ -147,20 +144,22 @@ static int msync_interval(struct vm_area
 	if (file && (vma->vm_flags & VM_SHARED)) {
 		ret = filemap_sync(vma, start, end-start, flags);

-		if (!ret && (flags & MS_SYNC)) {
+		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {
 			struct address_space *mapping = file->f_mapping;
 			int err;

 			down(&mapping->host->i_sem);
 			ret = filemap_fdatawrite(mapping);
-			if (file->f_op && file->f_op->fsync) {
-				err = file->f_op->fsync(file,file->f_dentry,1);
-				if (err && !ret)
+			if (flags & MS_SYNC) {
+				if (file->f_op && file->f_op->fsync) {
+					err = file->f_op->fsync(file, file->f_dentry, 1);
+					if (err && !ret)
+						ret = err;
+				}
+				err = filemap_fdatawait(mapping);
+				if (!ret)
 					ret = err;
 			}
-			err = filemap_fdatawait(mapping);
-			if (!ret)
-				ret = err;
 			up(&mapping->host->i_sem);
 		}
 	}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-03-31 22:16 msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
@ 2004-03-31 22:37 ` Linus Torvalds
  2004-03-31 23:41   ` Stephen C. Tweedie
  2004-03-31 22:53 ` Andrew Morton
  2004-04-16 22:35 ` Jamie Lokier
  2 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2004-03-31 22:37 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

On Wed, 31 Mar 2004, Stephen C. Tweedie wrote:
>         
> although I can't find an unambiguous definition of "queued for service"
> in the online standard.  I'm reading it as requiring that the I/O has
> reached the block device layer, not simply that it has been marked dirty
> for some future writeback pass to catch; Uli agrees with that
> interpretation.

That interpretation makes pretty much zero sense.

If you care about the data hitting the disk, you have to use fsync() or 
similar _anyway_, and pretending anything else is just bogus.

As such, just marking the pages dirty is as much of a "queing" them for 
write as actually writing them, since in both cases the guarantees are 
_exactly_ the same: the pages have not hit the disk by the time the system 
call returns, but will hit the disk at some time in the future.

Having the requirement that it is on some sw-only request queue is
nonsensical, since such a queue is totally invisible from a user
perspective.

User space has no idea about "block device layer" vs "VM layer" queues,
and trying to distinguiosh between the two is madness. It's just an 
internal implementation issue that has no meaning to the user.

		Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-03-31 22:16 msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
  2004-03-31 22:37 ` Linus Torvalds
@ 2004-03-31 22:53 ` Andrew Morton
  2004-03-31 23:20   ` Stephen C. Tweedie
  2004-04-16 22:35 ` Jamie Lokier
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2004-03-31 22:53 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, linux-kernel, drepper

"Stephen C. Tweedie" <sct@redhat.com> wrote:
>
> Hi,
> 
> I've been looking at a discrepancy between msync() behaviour on 2.4.9
> and newer 2.4 kernels, and it looks like things changed again in
> 2.5.68.  From the ChangeLog:
> 
> ChangeSet 1.971.76.156 2003/04/09 11:31:36 akpm@digeo.com
>   [PATCH] Make msync(MS_ASYNC) no longer start the I/O
>   
>   MS_ASYNC will currently wait on previously-submitted I/O, then start new I/O
>   and not wait on it.  This can cause undesirable blocking if msync is called
>   rapidly against the same memory.
>   
>   So instead, change msync(MS_ASYNC) to not start any IO at all.  Just flush
>   the pte dirty bits into the pageframe and leave it at that.
>   
>   The IO _will_ happen within a kupdate period.  And the application can use
>   fsync() or fadvise(FADV_DONTNEED) if it actually wants to schedule the IO
>   immediately.
> 
> Unfortunately, this seems to contradict SingleUnix requirements, which
> state:
> 
>         When MS_ASYNC is specified, msync() shall return immediately
>         once all the write operations are initiated or queued for
>         servicing
>         
> although I can't find an unambiguous definition of "queued for service"
> in the online standard.  I'm reading it as requiring that the I/O has
> reached the block device layer, not simply that it has been marked dirty
> for some future writeback pass to catch; Uli agrees with that
> interpretation.

I don't think I agree with that.  If "queued for service" means we've
started the I/O, then what does "initiated" mean, and why did they specify
"initiated" separately?

What triggered all this was a dinky little test app which Linus wrote to
time some aspect of P4 tlb writeback latency.  It sits in a loop dirtying a
page then msyncing it with MS_ASYNC.  It ran very poorly, because MS_ASYNC
ended up waiting on the previously-submitted I/O before starting new I/O.

One approach to improving that would be for MS_ASYNC to say "if the page is
already under writeout then just skip the I/O".  But that's worthless,
really - it makes the MS_ASYNC semantics too vague.

As you point out, Linus's app should have used the "flags=0" linux
extension.  Didn't think of that.

Your reversion patch would mean that current applications which use
MS_ASYNC will again suffer large latencies if the pages are under writeout.
Sure, users could switch apps to using flags=0 to avoid that, but people
don't know to do that.

So given that SUS is ambiguous about this, I'd suggest that we be able to
demonstrate some real-world reason why this matters.  Why are you concerned
about this?

> The 2.5.68 changeset also includes the comment:
> 
>   (This has triggered an ext3 bug - the page's buffers get dirtied so fast
>   that kjournald keeps writing the buffers over and over for 10-20 seconds
>   before deciding to give up for some reason)
> 
> Was that ever resolved?  If it's still there, I should have a look at it
> if we're restoring the old trigger.

(These changelog thingies are useful, aren't they?)

I don't recall checking since that time.  I expect that Linus's test app
will still livelock kjournals in the current -linus tree - kjournald sits
there trying to write out the dirty buffers but the dang things just keep
on getting dirtied.

If so, I'm sure this patch (queued for 2.6.6) will fix it:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5-rc3/2.6.5-rc3-mm3/broken-out/jbd-move-locked-buffers.patch

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-03-31 22:53 ` Andrew Morton
@ 2004-03-31 23:20   ` Stephen C. Tweedie
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-03-31 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Ulrich Drepper, Linus Torvalds, Stephen Tweedie

Hi,

On Wed, 2004-03-31 at 23:53, Andrew Morton wrote:
> "Stephen C. Tweedie" <sct@redhat.com> wrote:

> > Unfortunately, this seems to contradict SingleUnix requirements, which
> > state:
> >         When MS_ASYNC is specified, msync() shall return immediately
> >         once all the write operations are initiated or queued for
> >         servicing
> > although I can't find an unambiguous definition of "queued for service"
> > in the online standard.  I'm reading it as requiring that the I/O has
> > reached the block device layer

> I don't think I agree with that.  If "queued for service" means we've
> started the I/O, then what does "initiated" mean, and why did they specify
> "initiated" separately?

I'd interpret "initiated" as having reached hardware.  "Queued for
service" is much more open to interpretation: Uli came up with "the data
must be actively put in a stage where I/O is initiated", which still
doesn't really address what sort of queueing is allowed.

> What triggered all this was a dinky little test app which Linus wrote to
> time some aspect of P4 tlb writeback latency.  It sits in a loop dirtying a
> page then msyncing it with MS_ASYNC.  It ran very poorly, because MS_ASYNC
> ended up waiting on the previously-submitted I/O before starting new I/O.

Sure.  There are lots of ways an interface can be misused, though: you
only know if one use is valid or not once you've determined what the
_correct_ use is.  I'm much more concerned about getting a correct
interpretation of the spec than of making IO fast for the sake of a
memory benchmark. :-)

> One approach to improving that would be for MS_ASYNC to say "if the page is
> already under writeout then just skip the I/O".  But that's worthless,
> really - it makes the MS_ASYNC semantics too vague.

Agreed.

> Your reversion patch would mean that current applications which use
> MS_ASYNC will again suffer large latencies if the pages are under writeout.

Well, this whole issue came up precisely because somebody was seeing
exactly such a latency hit going from 2.4.9 to a later kernel.  We've
not really been consistent about it in the past.

> Sure, users could switch apps to using flags=0 to avoid that, but people
> don't know to do that.

Exactly why we need documentation for that combination, whatever
happens.

> So given that SUS is ambiguous about this, I'd suggest that we be able to
> demonstrate some real-world reason why this matters.  Why are you concerned
> about this?

Just for the reason you mentioned --- a real-world app (in-house, so
flags==0 is actually a valid solution for them) which was seeing
performance degradation when the "MS_ASYNC submits IO" was introduced in
the first place.  But it was internally written, so I've no idea at all
whether or not the app was assuming one behaviour or the other on other
Unixen.

--Stephen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-03-31 22:37 ` Linus Torvalds
@ 2004-03-31 23:41   ` Stephen C. Tweedie
  2004-04-01  0:08     ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-03-31 23:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper, Stephen Tweedie

Hi,

On Wed, 2004-03-31 at 23:37, Linus Torvalds wrote:
> On Wed, 31 Mar 2004, Stephen C. Tweedie wrote:
> >         
> > although I can't find an unambiguous definition of "queued for service"
> > in the online standard.  I'm reading it as requiring that the I/O has
> > reached the block device layer, not simply that it has been marked dirty
> > for some future writeback pass to catch; Uli agrees with that
> > interpretation.
> 
> That interpretation makes pretty much zero sense.
> 
> If you care about the data hitting the disk, you have to use fsync() or 
> similar _anyway_, and pretending anything else is just bogus.

You can make the same argument for either implementation of MS_ASYNC. 
And there's at least one way in which the "submit IO now" version can be
used meaningfully --- if you've got several specific areas of data in
one or more mappings that need flushed to disk, you'd be able to
initiate IO with multiple MS_ASYNC calls and then wait for completion
with either MS_SYNC or fsync().  That gives you an interface that
corresponds somewhat with the region-based filemap_sync();
filemap_fdatawrite(); filemap_datawait() that the kernel itself uses.  

> Having the requirement that it is on some sw-only request queue is
> nonsensical, since such a queue is totally invisible from a user
> perspective.

It's very much visible, just from a performance perspective, if you want
to support "kick off this IO, I'm going to wait for the completion
shortly."  If that's the interpretation of MS_ASYNC, then the app is
basically saying it doesn't want the writeback mechanism to be idle
until the writes have completed, regardless of whether it's a block
device or an NFS file or whatever underneath.

But whether that's a legal use of MS_ASYNC really depends on what the
standard is requiring.  I could be persuaded either way.  Uli?

Does anyone know what other Unixen do here?

--Stephen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-03-31 23:41   ` Stephen C. Tweedie
@ 2004-04-01  0:08     ` Linus Torvalds
  2004-04-01  0:30       ` Andrew Morton
  2004-04-01 15:40       ` Stephen C. Tweedie
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2004-04-01  0:08 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

On Wed, 1 Apr 2004, Stephen C. Tweedie wrote:
> 
> On Wed, 2004-03-31 at 23:37, Linus Torvalds wrote:
>
> > If you care about the data hitting the disk, you have to use fsync() or 
> > similar _anyway_, and pretending anything else is just bogus.
> 
> You can make the same argument for either implementation of MS_ASYNC. 

Exactly.

Which is why I say that the implementation cannot matter, because user 
space would be _buggy_ if it depended on some timing issue.

> And there's at least one way in which the "submit IO now" version can be
> used meaningfully --- if you've got several specific areas of data in
> one or more mappings that need flushed to disk, you'd be able to
> initiate IO with multiple MS_ASYNC calls and then wait for completion
> with either MS_SYNC or fsync().

Why wouldn't you be able to do that with the current one?

Tha advantage of the current MS_ASYNC is absolutely astoundingly HUGE: 
because we don't wait for in-progress IO, it can be used to efficiently 
synchronize multiple different areas, and then after that waiting for them 
with _one_ single fsync().

In contrast, the "wait for queued IO" approach can't sanely do that, 
exactly because it will wait in the middle, depending on other activity at 
the same time. It will always have the worry that it happens to do the 
msync() at the wrong time, and then wait synchronously when it shouldn't.

More importanrtly, the current behaviour makes certain patterns _possible_ 
that your suggested semantics simply cannot do efficiently. If we have 
data records smaller than a page, and want to mark them dirty as they 
happen, the current msync() allows that - it doesn't matter that another 
datum was marked dirty just a moment ago. Then, you do one fsync() only 
when you actually want to _commit_ a series of updates before you change 
the index. 

But if we want to have another flag, with MS_HALF_ASYNC, that's certainly
ok by me. I'm all for choice. It's just that I most definitely want the 
choice of doing it the way we do it now, since I consider that to be the 
_sane_ way.

> It's very much visible, just from a performance perspective, if you want
> to support "kick off this IO, I'm going to wait for the completion
> shortly."

That may well be worth a call of its own. It has nothing to do with memory 
mapping, though - what you're really looking for is fasync(). 

And yes, I agree that _that_ would make sense. Havign some primitives to 
start writeout of an area of a file would likely be a good thing.

I'd be perfectly happy with a set of file cache control operations, 
including

 - start writeback in [a,b]
 - wait for [a,b] stable
 - and maybe "punch hole in [a,b]"

Then you could use these for write() in addition to mmap(), and you can 
first mark multiple regions dirty, and then do a single wait (which is 
clearly more efficient than synchronously waiting for multiple regions).

But none of these have anything to do with what SuS or any other standard 
says about MS_ASYNC. 

> But whether that's a legal use of MS_ASYNC really depends on what the
> standard is requiring.  I could be persuaded either way.  Uli?

My argument was that a standard CANNOT say anything one way or the other, 
because the behaviour IS NOT USER-VISIBLE! A program fundamentally cannot 
care, since the only issue is a pure implementation issue of "which queue" 
the data got queued onto.

Bringing in a standards body is irrelevant. It's like trying to use the 
bible to determine whether protons have a positive charge. 

			Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01  0:08     ` Linus Torvalds
@ 2004-04-01  0:30       ` Andrew Morton
  2004-04-01 15:40       ` Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2004-04-01  0:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sct, linux-mm, linux-kernel, drepper

Linus Torvalds <torvalds@osdl.org> wrote:
>
> I'd be perfectly happy with a set of file cache control operations, 
> including
> 
>  - start writeback in [a,b]
>  - wait for [a,b] stable
>  - and maybe "punch hole in [a,b]"

Yup, there are a number of linux-specific fadvise() extensions we
can/should be adding, including "start writeback on this byte range for
flush" and "start writeback on this byte range for data integrity" and
"wait on writeback of this byte range".

Some of these are needed internally for the fs-AIO implementation, and also
for an O_SYNC which only writes the pages which the writer wrote.  It's
pretty simple, and it'll be happening.

One wrinkle is that we'd need to add the start/end loff_t pair to the
a_ops->writepages() prototype.  But instead I intend to put the start/end
info into struct writeback_control and pass it that way.  It seems sleazy
at first but when you think about it, it isn't.  It provides forward and
backward compatability, it recognises that it's just a hint and that
filesystems can legitimately sync the whole file and it produces
smaller+faster code.

We might need a wait_on_page_writeback_range() a_op though.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01  0:08     ` Linus Torvalds
  2004-04-01  0:30       ` Andrew Morton
@ 2004-04-01 15:40       ` Stephen C. Tweedie
  2004-04-01 16:02         ` Linus Torvalds
                           ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-04-01 15:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

Hi,

On Thu, 2004-04-01 at 01:08, Linus Torvalds wrote:

> > You can make the same argument for either implementation of MS_ASYNC. 
> Exactly.

> Which is why I say that the implementation cannot matter, because user 
> space would be _buggy_ if it depended on some timing issue.

I see it purely as a performance issue.  That's the context in which we
saw the initial complaint about the 2.4 behaviour change.

> > And there's at least one way in which the "submit IO now" version can be
> > used meaningfully --- if you've got several specific areas of data in
> > one or more mappings that need flushed to disk, you'd be able to
> > initiate IO with multiple MS_ASYNC calls and then wait for completion
> > with either MS_SYNC or fsync().
> 
> Why wouldn't you be able to do that with the current one?

You can, but only across one fd.

        A = mmap(..., a);
        B = mmap(..., b);

        msync(A, ..., MS_ASYNC);
        msync(B, ..., MS_ASYNC);
        fsync(a);
        fsync(b);

has rather different performance characteristics according to which way
you go.  Do deferred writeback and the two fsync()s do serialised IO,
with the fs idle in between.  Submit the background IO immediately and
you avoid that.

Anyway, I just tried on a Solaris-2.8 box, and the results are rather
interesting.  Doing a simple (touch-one-char, msync one page) loop on a
mmap'ed file on a local scsi disk, MS_ASYNC gives ~15000 msyncs per
second; MS_SYNC gives ~900.  [A null getpid() loop gives about 250,000
loops a second.]

However, the "iostat" shows *exactly* the same disk throughput in each
case.  MS_ASYNC is causing immediate IO kickoff, but shows only ~900 ios
per second, the same as the ios-per-second for MS_SYNC and the same as
the MS_SYNC loop frequency.

So it appears that on Solaris, MS_ASYNC is kicking off instant IO, but
is not waiting for existing IO to complete first.  So if we have an IO
already in progress, then many msync calls end up queuing the *same*
subsequent IO, and once one new IO is queued, further MS_ASYNC msyncs
don't bother scheduling a new one (on the basis that the
already-scheduled one hasn't started yet so the new data is already
guaranteed to hit disk.)

So Solaris behaviour is indeed to begin IO as soon as possible on
MS_ASYNC, but they are doing it far more efficiently than our current
msync code can do.

> Tha advantage of the current MS_ASYNC is absolutely astoundingly HUGE: 
> because we don't wait for in-progress IO, it can be used to efficiently 
> synchronize multiple different areas, and then after that waiting for them 
> with _one_ single fsync().

The Solaris one manages to preserve those properties while still
scheduling the IO "soon".  I'm not sure how we could do that in the
current VFS, short of having a background thread scheduling deferred
writepage()s as soon as the existing page becomes unlocked.

> More importanrtly, the current behaviour makes certain patterns _possible_ 
> that your suggested semantics simply cannot do efficiently. If we have 
> data records smaller than a page, and want to mark them dirty as they 
> happen, the current msync() allows that - it doesn't matter that another 
> datum was marked dirty just a moment ago. Then, you do one fsync() only 
> when you actually want to _commit_ a series of updates before you change 
> the index. 

> But if we want to have another flag, with MS_HALF_ASYNC, that's certainly
> ok by me. I'm all for choice.

Yes, but we _used_ to have that choice --- call msync() with flags == 0,
and you'd get the deferred kupdated writeback; call it with MS_ASYNC and
you'd get instant IO kickoff; call it with MS_SYNC and you'd get
synchronous completion.  But now we've lost the instant kickoff, async
completion option, and MS_ASYNC behaves just like flags==0.  

So I'm all for adding the choice back, and *documenting* it so that
people know exactly what to expect in all three cases.  Whether the
choice comes from an fadvise option or an msync() doesn't bother me that
much.

In that case, the decision about which version of the behaviour MS_ASYNC
should give is (as it should be) a matter of obeying the standard
correctly, and the other useful behaviours are preserved elsewhere. 
Which brings us back to trying to interpret the vague standard.  Both
Uli's interpretation and the Solaris implementation suggest that we need
to start the writepage sooner rather than later.

> > It's very much visible, just from a performance perspective, if you want
> > to support "kick off this IO, I'm going to wait for the completion
> > shortly."
> 
> That may well be worth a call of its own. It has nothing to do with memory 
> mapping, though - what you're really looking for is fasync(). 

Indeed.  And msync(flags==0) remains as a way of synchronising mmaps
with the inode-dirty-list fasync writeback.

> And yes, I agree that _that_ would make sense. Havign some primitives to 
> start writeout of an area of a file would likely be a good thing.
> 
> I'd be perfectly happy with a set of file cache control operations, 
> including
> 
>  - start writeback in [a,b]

posix_fadvise() seems to do something a little like this already: the
FADV_DONTNEED handler tries

		if (!bdi_write_congested(mapping->backing_dev_info))
			filemap_flush(mapping);

before going into the invalidate_mapping_pages() call.  Having that (a)
limited to the specific file range passed into the fadvise(), and (b)
available as a separate function independent of the DONTNEED page
invalidator, would seem like an entirely sensible extension.  

The obvious implementations would be somewhat inefficient in some cases,
though --- currently __filemap_fdatawrite simply list_splice()s the
inode dirty list into the io list.  Walking a long dirty list to flush
just a few pages from a narrow range could get slow, and walking the
radix tree would be inefficient if there are only a few dirty pages
hidden in a large cache of clean pages.

> My argument was that a standard CANNOT say anything one way or the other, 
> because the behaviour IS NOT USER-VISIBLE! 

Worse, it doesn't seem to be implemented consistently either.  I've been
trying on a few other Unixen while writing this.  First on a Tru64 box,
and it is _not_ kicking off any IO at all for MS_ASYNC, except for the
30-second regular sync.  The same appears to be true on FreeBSD.  And on
HP-UX, things go in the other direction: the performance of MS_ASYNC is
identical to MS_SYNC, both in terms of observed disk IO during the sync
and the overall rate of the msync loop.

So it appears we've got Unix precedent for pretty-much any reasonable
interpretation of MS_ASYNC that we want.  Patch withdrawn!

--Stephen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01 15:40       ` Stephen C. Tweedie
@ 2004-04-01 16:02         ` Linus Torvalds
  2004-04-01 16:33           ` Stephen C. Tweedie
  2004-04-01 16:19         ` Jamie Lokier
  2004-04-01 18:51         ` Andrew Morton
  2 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2004-04-01 16:02 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

On Thu, 1 Apr 2004, Stephen C. Tweedie wrote:
> 
> So it appears that on Solaris, MS_ASYNC is kicking off instant IO, but
> is not waiting for existing IO to complete first.

A much more likely schenario is that Solaris is really doing the same
thing we are, but it _also_ ends up opportunistically trying to put the
resultant pages on the IO queues if possible (ie do a "write-ahead": start
writeout if that doesn't imply blocking).

We could probably do that too, it seems easy enough. A 
"TestSetPageLocked()" along with setting the BIO_RW_AHEAD flag. The only 
problem is that I don't think we really have support for doing write-ahead 
(ie we clear the page "dirty" bit too early, so if the write gets 
cancelled due to the IO queues being full, the dirty bit gets lost.

So we don't want to go there for now, but it's something to keep in mind, 
perhaps. 

> Worse, it doesn't seem to be implemented consistently either.  I've been
> trying on a few other Unixen while writing this.  First on a Tru64 box,
> and it is _not_ kicking off any IO at all for MS_ASYNC, except for the
> 30-second regular sync.  The same appears to be true on FreeBSD.  And on
> HP-UX, things go in the other direction: the performance of MS_ASYNC is
> identical to MS_SYNC, both in terms of observed disk IO during the sync
> and the overall rate of the msync loop.

If you check HP-UX, make sure it's a recent one. HPUX has historically 
been just too broken for words when it comes to mmap() (ie some _really_ 
strange semantics, like not being able to unmap partial mappings etc).

		Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01 15:40       ` Stephen C. Tweedie
  2004-04-01 16:02         ` Linus Torvalds
@ 2004-04-01 16:19         ` Jamie Lokier
  2004-04-01 16:56           ` s390 storage key inconsistency? [was Re: msync() behaviour broken for MS_ASYNC, revert patch?] Stephen C. Tweedie
  2004-04-01 16:57           ` msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
  2004-04-01 18:51         ` Andrew Morton
  2 siblings, 2 replies; 18+ messages in thread
From: Jamie Lokier @ 2004-04-01 16:19 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Linus Torvalds, linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

Stephen C. Tweedie wrote:
> Yes, but we _used_ to have that choice --- call msync() with flags == 0,
> and you'd get the deferred kupdated writeback;

Is that not equivalent to MS_INVALIDATE?  It seems to be equivalent in
2.6.4.

The code in 2.6.4 ignores MS_INVALIDATE except for trivial error
checks, so msync() with flags == MS_INVALIDATE has the same effect as
msync() with flags == 0.

Some documentation I'm looking at says MS_INVALIDATE updates the
mapped page to contain the current contents of the file.  2.6.4 seems
to do the reverse: update the file to contain the current content of
the mapped page.  "man msync" agrees with the the latter.  (I can't
look at SUS right now).

On systems where the CPU caches are fully coherent, the only
difference is that the former is a no-op and the latter does the same
as the new behaviour of MS_ASYNC.

On systems where the CPU caches aren't coherent, some cache
synchronising or flushing operations are implied.

On either type of system, MS_INVALIDATE doesn't seem to be doing what
the documentation I'm looking at says it should do.

-- Jamie
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01 16:02         ` Linus Torvalds
@ 2004-04-01 16:33           ` Stephen C. Tweedie
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-04-01 16:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper, Stephen Tweedie

Hi,

On Thu, 2004-04-01 at 17:02, Linus Torvalds wrote:

> > Worse, it doesn't seem to be implemented consistently either.  I've been
> > trying on a few other Unixen while writing this.  First on a Tru64 box,
> > and it is _not_ kicking off any IO at all for MS_ASYNC, except for the
> > 30-second regular sync.  The same appears to be true on FreeBSD.  And on
> > HP-UX, things go in the other direction: the performance of MS_ASYNC is
> > identical to MS_SYNC, both in terms of observed disk IO during the sync
> > and the overall rate of the msync loop.
> 
> If you check HP-UX, make sure it's a recent one. HPUX has historically 
> been just too broken for words when it comes to mmap() (ie some _really_ 
> strange semantics, like not being able to unmap partial mappings etc).

I'm not sure what counts as "recent" for that, but this was on HP-UX
11.  That's the most recent I've got access to.

--Stephen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* s390 storage key inconsistency?  [was Re: msync() behaviour broken for MS_ASYNC, revert patch?]
  2004-04-01 16:19         ` Jamie Lokier
@ 2004-04-01 16:56           ` Stephen C. Tweedie
  2004-04-01 16:57           ` msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-04-01 16:56 UTC (permalink / raw)
  To: Jamie Lokier, Linux on 390 Port
  Cc: Linus Torvalds, linux-mm, Andrew Morton, linux-kernel,
	Ulrich Drepper, Stephen Tweedie

Hi,

On Thu, 2004-04-01 at 17:19, Jamie Lokier wrote:

> Some documentation I'm looking at says MS_INVALIDATE updates the
> mapped page to contain the current contents of the file.  2.6.4 seems
> to do the reverse: update the file to contain the current content of
> the mapped page.  "man msync" agrees with the the latter.  (I can't
> look at SUS right now).

btw, just looking at the filemap_sync_pte() code for MS_INVALIDATE, I
noticed

	if (!PageReserved(page) &&
	    (ptep_clear_flush_dirty(vma, address, ptep) ||
	     page_test_and_clear_dirty(page)))
		set_page_dirty(page);

I just happened to follow the function and noticed that on s390,
page_test_and_clear_dirty() has the comment:

* Test and clear dirty bit in storage key.
* We can't clear the changed bit atomically. This is a potential
* race against modification of the referenced bit. This function
* should therefore only be called if it is not mapped in any
* address space.

but in this case the page is clearly mapped in the caller's address
space, else we wouldn't have reached this.

Is this a problem?

--Stephen



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01 16:19         ` Jamie Lokier
  2004-04-01 16:56           ` s390 storage key inconsistency? [was Re: msync() behaviour broken for MS_ASYNC, revert patch?] Stephen C. Tweedie
@ 2004-04-01 16:57           ` Stephen C. Tweedie
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-04-01 16:57 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Linus Torvalds, linux-mm, Andrew Morton, linux-kernel,
	Ulrich Drepper, Stephen Tweedie

Hi,

On Thu, 2004-04-01 at 17:19, Jamie Lokier wrote:
> Stephen C. Tweedie wrote:
> > Yes, but we _used_ to have that choice --- call msync() with flags == 0,
> > and you'd get the deferred kupdated writeback;
> 
> Is that not equivalent to MS_INVALIDATE?  It seems to be equivalent in
> 2.6.4.

It is in all the kernels I've looked at, but that's mainly because we
seem to ignore MS_INVALIDATE.

> Some documentation I'm looking at says MS_INVALIDATE updates the
> mapped page to contain the current contents of the file.  2.6.4 seems
> to do the reverse: update the file to contain the current content of
> the mapped page.  "man msync" agrees with the the latter.  (I can't
> look at SUS right now).

SUSv3 says

        When MS_INVALIDATE is specified, msync() shall invalidate all
        cached copies of mapped data that are inconsistent with the
        permanent storage locations such that subsequent references
        shall obtain data that was consistent with the permanent storage
        locations sometime between the call to msync() and the first
        subsequent memory reference to the data.
        
which seems to imply that dirty ptes should simply be cleared, rather
than propagated to the page dirty bits.

That's easy enough --- we already propagate the flags down to
filemap_sync_pte, where the page and pte dirty bits are modified.  Does
anyone know any reason why we don't do MS_INVALIDATE there already?

--Stephen


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-01 15:40       ` Stephen C. Tweedie
  2004-04-01 16:02         ` Linus Torvalds
  2004-04-01 16:19         ` Jamie Lokier
@ 2004-04-01 18:51         ` Andrew Morton
  2 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2004-04-01 18:51 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: torvalds, linux-mm, linux-kernel, drepper

"Stephen C. Tweedie" <sct@redhat.com> wrote:
>
> > Tha advantage of the current MS_ASYNC is absolutely astoundingly HUGE: 
> > because we don't wait for in-progress IO, it can be used to efficiently 
> > synchronize multiple different areas, and then after that waiting for them 
> > with _one_ single fsync().
> 
> The Solaris one manages to preserve those properties while still
> scheduling the IO "soon".  I'm not sure how we could do that in the
> current VFS, short of having a background thread scheduling deferred
> writepage()s as soon as the existing page becomes unlocked.

filemap_flush() will do exactly this.  So if you want the Solaris
semantics, calling filemap_flush() intead of filemap_fdatawrite() should do
it.

> posix_fadvise() seems to do something a little like this already: the
> FADV_DONTNEED handler tries
> 
> 		if (!bdi_write_congested(mapping->backing_dev_info))
> 			filemap_flush(mapping);
> 
> before going into the invalidate_mapping_pages() call.  Having that (a)
> limited to the specific file range passed into the fadvise(), and (b)
> available as a separate function independent of the DONTNEED page
> invalidator, would seem like an entirely sensible extension.  
> 
> The obvious implementations would be somewhat inefficient in some cases,
> though --- currently __filemap_fdatawrite simply list_splice()s the
> inode dirty list into the io list.  Walking a long dirty list to flush
> just a few pages from a narrow range could get slow, and walking the
> radix tree would be inefficient if there are only a few dirty pages
> hidden in a large cache of clean pages.

The patches I have queued in -mm allow us to do this.  We use
find_get_pages_tag() to iterate over only the dirty pages in the tree.

That still has the efficiency problem that when searching for dirty pages
we also visit pages which are both dirty and under writeback (we're not
interested in those pages if it is a non-blocking flush), although I've
only observed that to be a problem when the queue size was bumped up to
10,000 requests and I fixed that up for the common cases by other means.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-03-31 22:16 msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
  2004-03-31 22:37 ` Linus Torvalds
  2004-03-31 22:53 ` Andrew Morton
@ 2004-04-16 22:35 ` Jamie Lokier
  2004-04-19 21:54   ` Stephen C. Tweedie
  2 siblings, 1 reply; 18+ messages in thread
From: Jamie Lokier @ 2004-04-16 22:35 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

Stephen C. Tweedie wrote:
> I've been looking at a discrepancy between msync() behaviour on 2.4.9
> and newer 2.4 kernels, and it looks like things changed again in
> 2.5.68.

When you say a discrepancy between 2.4.9 and newer 2.4 kernels, do you
mean that the msync() behaviour changed during the 2.4 series?

If so, what was the change?

Thanks,
-- Jamie
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-16 22:35 ` Jamie Lokier
@ 2004-04-19 21:54   ` Stephen C. Tweedie
  2004-04-21  2:10     ` Jamie Lokier
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-04-19 21:54 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper, Stephen Tweedie

Hi,

On Fri, 2004-04-16 at 23:35, Jamie Lokier wrote:
> Stephen C. Tweedie wrote:
> > I've been looking at a discrepancy between msync() behaviour on 2.4.9
> > and newer 2.4 kernels, and it looks like things changed again in
> > 2.5.68.
> 
> When you say a discrepancy between 2.4.9 and newer 2.4 kernels, do you
> mean that the msync() behaviour changed during the 2.4 series?

Yes.

> If so, what was the change?

2.4.9 behaved like current 2.6 --- on MS_ASYNC, it did a
set_page_dirty() which means the page will get picked up by the next
5-second bdflush pass.  But later 2.4 kernels were changed so that they
started MS_ASYNC IO immediately with filemap_fdatasync() (which is
asynchronous regarding the new IO, but which blocks synchronously if
there is already old IO in flight on the page.)

That was reverted back to the earlier, 2.4.9 behaviour in the 2.5
series.

Cheers,
 Stephen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-19 21:54   ` Stephen C. Tweedie
@ 2004-04-21  2:10     ` Jamie Lokier
  2004-04-21  9:52       ` Stephen C. Tweedie
  0 siblings, 1 reply; 18+ messages in thread
From: Jamie Lokier @ 2004-04-21  2:10 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper

Stephen C. Tweedie wrote:
> > If so, what was the change?
> 
> 2.4.9 behaved like current 2.6 --- on MS_ASYNC, it did a
> set_page_dirty() which means the page will get picked up by the next
> 5-second bdflush pass.  But later 2.4 kernels were changed so that they
> started MS_ASYNC IO immediately with filemap_fdatasync() (which is
> asynchronous regarding the new IO, but which blocks synchronously if
> there is already old IO in flight on the page.)
> 
> That was reverted back to the earlier, 2.4.9 behaviour in the 2.5
> series.

It was 2.5.68.

Thanks, that's very helpful.

msync(0) has always had behaviour consistent with the <=2.4.9 and
>=2.5.68 MS_ASYNC behaviour, is that right?

If so, programs may as well "#define MS_ASYNC 0" on Linux, to get well
defined and consistent behaviour.  It would be nice to change the
definition in libc to zero, but I don't think it's possible because
msync(MS_SYNC|MS_ASYNC) needs to fail.

-- Jamie
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: msync() behaviour broken for MS_ASYNC, revert patch?
  2004-04-21  2:10     ` Jamie Lokier
@ 2004-04-21  9:52       ` Stephen C. Tweedie
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen C. Tweedie @ 2004-04-21  9:52 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: linux-mm, Andrew Morton, linux-kernel, Ulrich Drepper, Stephen Tweedie

Hi,

On Wed, 2004-04-21 at 03:10, Jamie Lokier wrote:

> msync(0) has always had behaviour consistent with the <=2.4.9 and
> >=2.5.68 MS_ASYNC behaviour, is that right?

Not sure about "always", but it looks like it recently at least.  2.2
msync was implemented very differently but seems, from the source, to
have the same property --- do_write_page() calls f_op->write() on msync,
and MS_SYNC forces an fsync after the writes.  But 2.4 and 2.6 share
much more similar code to each other.  So all since 2.2 seem to do the
fully-async, deferred writeback behaviour for flags==0.

--Stephen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-04-21  9:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-31 22:16 msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
2004-03-31 22:37 ` Linus Torvalds
2004-03-31 23:41   ` Stephen C. Tweedie
2004-04-01  0:08     ` Linus Torvalds
2004-04-01  0:30       ` Andrew Morton
2004-04-01 15:40       ` Stephen C. Tweedie
2004-04-01 16:02         ` Linus Torvalds
2004-04-01 16:33           ` Stephen C. Tweedie
2004-04-01 16:19         ` Jamie Lokier
2004-04-01 16:56           ` s390 storage key inconsistency? [was Re: msync() behaviour broken for MS_ASYNC, revert patch?] Stephen C. Tweedie
2004-04-01 16:57           ` msync() behaviour broken for MS_ASYNC, revert patch? Stephen C. Tweedie
2004-04-01 18:51         ` Andrew Morton
2004-03-31 22:53 ` Andrew Morton
2004-03-31 23:20   ` Stephen C. Tweedie
2004-04-16 22:35 ` Jamie Lokier
2004-04-19 21:54   ` Stephen C. Tweedie
2004-04-21  2:10     ` Jamie Lokier
2004-04-21  9:52       ` Stephen C. Tweedie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox