From: Andrew Morton <akpm@osdl.org>
To: "Stephen C. Tweedie" <sct@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, drepper@redhat.com
Subject: Re: msync() behaviour broken for MS_ASYNC, revert patch?
Date: Wed, 31 Mar 2004 14:53:52 -0800
Message-ID: <20040331145352.23df0831.akpm@osdl.org>
In-Reply-To: <1080771361.1991.73.camel@sisko.scot.redhat.com>

"Stephen C. Tweedie" <sct@redhat.com> wrote:
>
> Hi,
> 
> I've been looking at a discrepancy between msync() behaviour on 2.4.9
> and newer 2.4 kernels, and it looks like things changed again in
> 2.5.68.  From the ChangeLog:
> 
> ChangeSet 1.971.76.156 2003/04/09 11:31:36 akpm@digeo.com
>   [PATCH] Make msync(MS_ASYNC) no longer start the I/O
>   
>   MS_ASYNC will currently wait on previously-submitted I/O, then start new I/O
>   and not wait on it.  This can cause undesirable blocking if msync is called
>   rapidly against the same memory.
>   
>   So instead, change msync(MS_ASYNC) to not start any IO at all.  Just flush
>   the pte dirty bits into the pageframe and leave it at that.
>   
>   The IO _will_ happen within a kupdate period.  And the application can use
>   fsync() or fadvise(FADV_DONTNEED) if it actually wants to schedule the IO
>   immediately.
> 
> Unfortunately, this seems to contradict SingleUnix requirements, which
> state:
> 
>         When MS_ASYNC is specified, msync() shall return immediately
>         once all the write operations are initiated or queued for
>         servicing
>         
> although I can't find an unambiguous definition of "queued for service"
> in the online standard.  I'm reading it as requiring that the I/O has
> reached the block device layer, not simply that it has been marked dirty
> for some future writeback pass to catch; Uli agrees with that
> interpretation.

I don't think I agree with that.  If "queued for service" means we've
started the I/O, then what does "initiated" mean, and why did they specify
"initiated" separately?


What triggered all this was a dinky little test app which Linus wrote to
time some aspect of P4 tlb writeback latency.  It sits in a loop dirtying a
page then msyncing it with MS_ASYNC.  It ran very poorly, because MS_ASYNC
ended up waiting on the previously-submitted I/O before starting new I/O.
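
For illustration only, the kind of loop described above might look roughly
like the sketch below.  This is not Linus's actual test program, which is not
shown in this thread; the function name dirty_and_msync, the file path
argument and the iteration count are all assumptions made for the example.
It maps one page of a file MAP_SHARED, dirties it, and calls msync() on it
with the given flags, over and over.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original thread): dirty one page of a
 * shared file mapping and msync() it, `iters` times.
 * Returns the number of msync() calls that succeeded, or -1 if setup failed.
 */
static long dirty_and_msync(const char *path, long iters, int flags)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The file must be at least one page long before it is mapped. */
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    long ok = 0;
    for (long i = 0; i < iters; i++) {
        p[0] = (char)i;                       /* dirty the page */
        if (msync(p, (size_t)page, flags) == 0)
            ok++;                             /* count successful calls */
    }

    munmap(p, (size_t)page);
    close(fd);
    unlink(path);                             /* remove the scratch file */
    return ok;
}
```

Calling dirty_and_msync("/tmp/scratch.dat", 100000, MS_ASYNC) from main() and
timing it (for example with clock_gettime()) would give a rough feel for the
per-call latency being discussed.  The absolute numbers depend heavily on the
kernel version and filesystem, so none are suggested here.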

One approach to improving that would be for MS_ASYNC to say "if the page is
already under writeout then just skip the I/O".  But that's worthless,
really - it makes the MS_ASYNC semantics too vague.

As you point out, Linus's app should have used the "flags=0" Linux
extension.  I didn't think of that.

Your reversion patch would mean that current applications which use
MS_ASYNC will again suffer large latencies if the pages are under writeout.
Sure, users could switch apps to using flags=0 to avoid that, but people
don't know to do that.

So given that SUS is ambiguous about this, I'd suggest that we be able to
demonstrate some real-world reason why this matters.  Why are you concerned
about this?


> The 2.5.68 changeset also includes the comment:
> 
>   (This has triggered an ext3 bug - the page's buffers get dirtied so fast
>   that kjournald keeps writing the buffers over and over for 10-20 seconds
>   before deciding to give up for some reason)
> 
> Was that ever resolved?  If it's still there, I should have a look at it
> if we're restoring the old trigger.

(These changelog thingies are useful, aren't they?)

I don't recall checking since that time.  I expect that Linus's test app
will still livelock kjournald in the current -linus tree - kjournald sits
there trying to write out the dirty buffers but the dang things just keep
on getting dirtied.

If so, I'm sure this patch (queued for 2.6.6) will fix it:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5-rc3/2.6.5-rc3-mm3/broken-out/jbd-move-locked-buffers.patch

