linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 17:56:55 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0605301739030.24646@g5.osdl.org>
In-Reply-To: <447CE43A.6030700@yahoo.com.au>


On Wed, 31 May 2006, Nick Piggin wrote:
> 
> The requests can only get merged if contiguous requests from the upper
> layers come down, right?

It has nothing to do with merging. It has to do with IO patterns.

Seeking.

Seeking is damn expensive - much more so than command issue. People forget 
that sometimes.

If you can sort the requests so that you don't have to seek back and 
forth, that's often a HUGE win. 

Yes, the requests will still be small, and yes, the IO might happen in 4kB 
chunks, but it happens a lot faster if you do it in a good elevator 
ordering and if you hit the track cache than if you seek back and forth.

And part of that is that you have to submit multiple requests when you 
start, and allow the elevator to work on it.
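
To put rough numbers on the ordering argument above, here is a minimal sketch. It is not from the original message and not kernel code. It assumes a toy disk model where seek cost is simply the absolute distance between block numbers, and the function names (`total_seek_distance`, `elevator_order`) are hypothetical. It compares servicing requests in arrival order with a simple one-pass elevator sweep (up from the current head position, then back down).

```python
def total_seek_distance(start, blocks):
    """Total head travel when blocks are serviced in the given order.

    Toy model: cost of a seek is the absolute block-number distance.
    """
    pos = start
    total = 0
    for block in blocks:
        total += abs(block - pos)
        pos = block
    return total


def elevator_order(start, blocks):
    """Reorder blocks as a simple elevator sweep.

    Service everything at or above the head in ascending order,
    then come back down through the remaining blocks.
    """
    upward = sorted(b for b in blocks if b >= start)
    downward = sorted((b for b in blocks if b < start), reverse=True)
    return upward + downward


if __name__ == "__main__":
    head = 50
    arrival = [10, 900, 20, 910, 30, 920]  # small requests, alternating far apart
    print(total_seek_distance(head, arrival))                        # 4470
    print(total_seek_distance(head, elevator_order(head, arrival)))  # 1780
```

The requests are the same size and the same count in both cases; only the order differs. The sorted sweep can only do this if several requests are visible at once, which is the point about submitting more than one request before starting.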

Now, of course, if you have tons of requests already in flight, you don't 
care (you already have lots of work for the elevator), but at least in 
desktop loads the "starting from idle" case is pretty common. Getting just 
a few requests to start up with is good.

(Yes, tagged queueing makes it less of an issue, of course. I know, I 
know. But I _think_ a lot of disks will start seeking for an incoming 
command the moment they see it, just to get the best latency, rather than 
wait a millisecond or two to see if they get another request. So even 
with tagged queuing, the elevator can help, _especially_ for the initial 
request).

> Why would plugging help if the requests can't get merged, though?

Why do you think we _have_ an elevator in the first place?

And just how well do you think it works if you submit one entry at a time 
(regardless of how _big_ it is) and start IO on it immediately? Vs trying 
to get several IO's out there, so that we can say "do this one first".
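
The difference between "start IO immediately" and "get several IO's out there first" can be sketched as a toy plugged queue. This is an illustration under stated assumptions, not the kernel's block layer: the class `PluggedQueue` and its `plug`/`submit`/`unplug` methods are hypothetical names, and "dispatch" is just a callback that records the order in which the device would see the requests.

```python
class PluggedQueue:
    """Toy request queue that can hold back requests while plugged."""

    def __init__(self, dispatch):
        self.dispatch = dispatch  # callable taking one block number
        self.pending = []
        self.plugged = False

    def plug(self):
        """Start holding requests instead of issuing them."""
        self.plugged = True

    def submit(self, block):
        """Queue the request if plugged, otherwise issue it right away."""
        if self.plugged:
            self.pending.append(block)
        else:
            self.dispatch(block)

    def unplug(self):
        """Issue everything held so far, in ascending block order."""
        self.plugged = False
        batch, self.pending = sorted(self.pending), []
        for block in batch:
            self.dispatch(block)


if __name__ == "__main__":
    issued = []
    queue = PluggedQueue(issued.append)
    queue.plug()
    for block in [900, 10, 910, 20]:
        queue.submit(block)
    queue.unplug()
    print(issued)  # [10, 20, 900, 910]
```

Without the `plug()` call, the same four submissions would reach the device as 900, 10, 910, 20, and the elevator would have nothing to choose between at the moment each one is issued.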

Sometimes I think harddisks have gotten too quiet - people no longer hear 
it when access patterns are horrible. But the big issue with plugging was 
only partially about request coalescing, and was always about trying to 
get the _order_ right when you start to actually submit the requests to 
the hardware.

And yes, I realize that modern disks do remapping, and that we will never 
do a "perfect" job. But it's still true that the block number has _some_ 
(fairly big, in fact) relationship to the actual disk layout, and that 
avoiding seeking is a big deal.

Rotational latency is often an even bigger issue, of course, but we can't 
do much about that. We really can't estimate where the head is, like 
people used to try to do three decades ago. _That_ time is long past, but 
we can try to avoid long seeks, and it's still true that you can get 
blocks that are _close_ faster (if only because they may end up being on 
the same cylinder and not need a seek).

Even better than "same cylinder" is sometimes "same cache block" - disks 
often do track caching, and they aren't necessarily all that smart about 
it, so even if you don't read one huge contiguous block, it's much better 
to read an area _close_ to another than seek back and forth, because 
you're more likely to hit the disk's own track cache.

And I know, disks aren't as sensitive to long seeks as they used to be (a 
short seek is almost as expensive as a long one, and a lot of it is the 
head settling time), but as another example - I think for CD-ROMs you can 
still have things like the motor spinning faster or slower depending on 
where the read head is, for example, meaning that short seeks are cheaper 
than long ones.

(Maybe constant angular velocity is what people use, though. I dunno).

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

