linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Wed, 31 May 2006 08:09:04 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0605310755210.24646@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0605310740530.24646@g5.osdl.org>


On Wed, 31 May 2006, Linus Torvalds wrote:
> 
> The reason it's kicked by wait_on_page() is that is when it's needed.

Btw, that's not how it has always been done.

For the longest time, it was actually triggered by scheduler activity. In 
particular, unplugging used to be a workqueue event that was triggered by 
the scheduler (or at any explicit points where you wanted it to be 
triggered earlier).

So whenever you scheduled, _all_ plugs would be unplugged.

It was specialized to wait_on_page() in order to avoid unnecessary 
overhead in scheduling (making it more directed), and to allow you to 
leave the request around for further merging/sorting even if a process had 
to wait for something unrelated.

But in particular, the old non-directed unplug didn't work well in SMP 
environments (because _one_ CPU re-scheduling obviously doesn't mean 
anything for the _other_ CPU that is actually working on setting up the 
request queue).

The point being that we could certainly do it somewhere else. Doing it in 
wait_on_page() (and at least historically, in waiting for bh's) is really 
nothing more than trying to have as few points as possible where it's 
done, and at the same time not missing any.

And yes, I'd _love_ to have better interfaces to let people take advantage 
of this than sys_readahead(). sys_readahead() was a 5-minute hack that 
actually does generate wonderful IO patterns, but it is also not all that 
useful (too specialized, and non-portable).
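
For reference, a minimal sketch of what using readahead(2) from user space
looks like, assuming Linux with glibc. The helper name preread_file() is
made up for illustration and does not come from this thread. It only asks
the kernel to start reading one regular file into the page cache and does
not wait for the data to arrive.

```c
#define _GNU_SOURCE             /* readahead() is a Linux-specific extension */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue read-ahead for the whole of one file.
 * Returns 0 on success (or if there was nothing to read), -1 on error.
 */
int preread_file(const char *path)
{
    struct stat st;
    int rc = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* Only regular, non-empty files have anything to read ahead. */
    if (S_ISREG(st.st_mode) && st.st_size > 0) {
        if (readahead(fd, 0, (size_t)st.st_size) < 0)
            rc = -1;
    }

    close(fd);
    return rc;
}
```

The open() and fstat() calls still have to bring in the inode synchronously,
so only the file data benefits from being queued in the background.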

I tried at one point to make us do directory inode read-ahead (ie put the 
inodes on a read-ahead queue when doing a directory listing), but that 
failed miserably. All the low-level filesystems are very much designed to 
have inode reading be synchronous, and it would have implied major surgery 
to do (and, sadly, my preliminary numbers also seemed to say that it might 
be a huge time waster, with enough users just wanting the filenames and 
not the inodes).

The thing is, right now we have very bad IO patterns for things that 
traverse whole directory trees (like doing a "tar" or a "diff" of a tree 
that is cold in the cache) because we have way too many serialization 
points. We do a good job of prefetching within a file, but if you have 
source trees etc, the median size for files is often smaller than a single 
page, so the prefetching ends up being a non-issue most of the time, and 
we do _zero_ prefetching between files ;/

Now, the reason we don't do it is that it seems to be damn hard to do. No 
question about that. Especially since it's only worth doing (obviously) on 
the cold-cache case, and that's also when we likely have very little 
information about what the access patterns might be.. Oh, well.

Even with sys_readahead(), my simple "pre-read a whole tree" often ended 
up waiting for inode IO (although at least the fact that several inodes 
fit in one block gets _some_ of that).
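
A rough sketch of a "pre-read a whole tree" pass of this kind, assuming
Linux with glibc and using nftw(3) for the walk. This is not the program
referred to above; preread_tree() and queue_one() are hypothetical names.
Note that nftw() has to stat every entry before the callback runs, which is
exactly the kind of synchronous inode IO that the read-ahead cannot hide.

```c
#define _GNU_SOURCE             /* for readahead() and nftw() */
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of files for which read-ahead was queued in the current walk. */
static long files_queued;

/* nftw() callback: queue read-ahead for each non-empty regular file. */
static int queue_one(const char *path, const struct stat *st,
                     int type, struct FTW *ftw)
{
    int fd;

    (void)ftw;
    if (type != FTW_F || !S_ISREG(st->st_mode) || st->st_size == 0)
        return 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;               /* skip unreadable files, keep walking */

    if (readahead(fd, 0, (size_t)st->st_size) == 0)
        files_queued++;

    close(fd);
    return 0;
}

/*
 * Hypothetical helper: walk the tree under root without following
 * symlinks and queue read-ahead for every regular file found.
 * Returns the number of files queued, or -1 if the walk failed.
 */
long preread_tree(const char *root)
{
    assert(root != NULL);

    files_queued = 0;
    if (nftw(root, queue_one, 16, FTW_PHYS) != 0)
        return -1;
    return files_queued;
}
```

The walk itself stays serialized on directory and inode reads; only the file
contents are queued ahead of the later "tar" or "diff" pass.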

			Linus

