From: Linus Torvalds <torvalds@osdl.org>
To: "Stephen C. Tweedie" <sct@redhat.com>
Cc: linux-mm <linux-mm@kvack.org>, Andrew Morton <akpm@osdl.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Ulrich Drepper <drepper@redhat.com>
Subject: Re: msync() behaviour broken for MS_ASYNC, revert patch?
Date: Wed, 31 Mar 2004 16:08:05 -0800 (PST)
Message-ID: <Pine.LNX.4.58.0403311550040.1116@ppc970.osdl.org>
In-Reply-To: <1080776487.1991.113.camel@sisko.scot.redhat.com>
On Wed, 1 Apr 2004, Stephen C. Tweedie wrote:
>
> On Wed, 2004-03-31 at 23:37, Linus Torvalds wrote:
>
> > If you care about the data hitting the disk, you have to use fsync() or
> > similar _anyway_, and pretending anything else is just bogus.
>
> You can make the same argument for either implementation of MS_ASYNC.
Exactly.
Which is why I say that the implementation cannot matter, because user
space would be _buggy_ if it depended on some timing issue.
> And there's at least one way in which the "submit IO now" version can be
> used meaningfully --- if you've got several specific areas of data in
> one or more mappings that need flushed to disk, you'd be able to
> initiate IO with multiple MS_ASYNC calls and then wait for completion
> with either MS_SYNC or fsync().
Why wouldn't you be able to do that with the current one?
The advantage of the current MS_ASYNC is absolutely astoundingly HUGE:
because we don't wait for in-progress IO, it can be used to efficiently
synchronize multiple different areas, and then after that waiting for them
with _one_ single fsync().
In contrast, the "wait for queued IO" approach can't sanely do that,
exactly because it will wait in the middle, depending on other activity at
the same time. It will always have the worry that it happens to do the
msync() at the wrong time, and then wait synchronously when it shouldn't.
More importantly, the current behaviour makes certain patterns _possible_
that your suggested semantics simply cannot do efficiently. If we have
data records smaller than a page, and want to mark them dirty as they
happen, the current msync() allows that - it doesn't matter that another
datum was marked dirty just a moment ago. Then, you do one fsync() only
when you actually want to _commit_ a series of updates before you change
the index.
But if we want to have another flag, with MS_HALF_ASYNC, that's certainly
ok by me. I'm all for choice. It's just that I most definitely want the
choice of doing it the way we do it now, since I consider that to be the
_sane_ way.
> It's very much visible, just from a performance perspective, if you want
> to support "kick off this IO, I'm going to wait for the completion
> shortly."
That may well be worth a call of its own. It has nothing to do with memory
mapping, though - what you're really looking for is fasync().
And yes, I agree that _that_ would make sense. Having some primitives to
start writeout of an area of a file would likely be a good thing.
I'd be perfectly happy with a set of file cache control operations,
including
- start writeback in [a,b]
- wait for [a,b] stable
- and maybe "punch hole in [a,b]"
Then you could use these for write() in addition to mmap(), and you can
first mark multiple regions dirty, and then do a single wait (which is
clearly more efficient than synchronously waiting for multiple regions).
But none of these have anything to do with what SuS or any other standard
says about MS_ASYNC.
> But whether that's a legal use of MS_ASYNC really depends on what the
> standard is requiring. I could be persuaded either way. Uli?
My argument was that a standard CANNOT say anything one way or the other,
because the behaviour IS NOT USER-VISIBLE! A program fundamentally cannot
care, since the only issue is a pure implementation issue of "which queue"
the data got queued onto.
Bringing in a standards body is irrelevant. It's like trying to use the
bible to determine whether protons have a positive charge.
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .