From: Scott Kaplan <sfkaplan@cs.amherst.edu>
To: Rik van Riel <riel@conectiva.com.br>
Cc: Andrew Morton <akpm@zip.com.au>, Christoph Hellwig <hch@lst.de>,
torvalds@transmeta.com, linux-mm@kvack.org
Subject: Re: [RFC] start_aggressive_readahead
Date: Mon, 29 Jul 2002 11:24:07 -0400
Message-ID: <397881C8-A307-11D6-A4C0-000393829FA4@cs.amherst.edu>
In-Reply-To: <Pine.LNX.4.44L.0207282355130.3086-100000@imladris.surriel.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Sunday, July 28, 2002, at 11:05 PM, Rik van Riel wrote:
> My experience with 300 ftp clients pulling a collective 40 Mbit/s
> suggests otherwise.
>
> About 70% of the clients were on modem speed and the other 30% of
> the clients were on widely variable higher speeds.
>
> Since a disk seek + read is about 10ms, the absolute maximum
> number of seeks that can be done is 100 a second and the minimum
> amount of time between disk seeks for one stream should be about
> 3 seconds.
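The quoted arithmetic can be checked with a short Python sketch. It makes two simplifying assumptions: every seek + read costs about 10 ms no matter how much is read, and the disk's seeks are shared evenly among all streams. The function names are illustrative only. The last figure, bytes per seek, is derived from the same numbers. It is an implication of these assumptions, not a measurement.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (simplified from the quote): every disk seek + read costs
# ~10 ms regardless of how much is read, and seeks are shared evenly
# among all concurrent ftp streams.

SEEK_AND_READ_S = 0.010      # ~10 ms per disk seek + read
STREAMS = 300                # concurrent ftp clients
AGGREGATE_BITS_PER_S = 40e6  # collective 40 Mbit/s


def max_seeks_per_second(seek_s: float) -> float:
    """Upper bound on seeks the disk can service in one second."""
    return 1.0 / seek_s


def seconds_between_seeks(seek_s: float, streams: int) -> float:
    """Interval between seeks for one stream if seeks are shared evenly."""
    return streams / max_seeks_per_second(seek_s)


def bytes_per_seek(aggregate_bits_per_s: float, seek_s: float) -> float:
    """Average data each seek must fetch to sustain the aggregate rate."""
    return (aggregate_bits_per_s / 8) / max_seeks_per_second(seek_s)


if __name__ == "__main__":
    print(f"max seeks/s:            {max_seeks_per_second(SEEK_AND_READ_S):.0f}")
    print(f"s between seeks/stream: {seconds_between_seeks(SEEK_AND_READ_S, STREAMS):.1f}")
    print(f"bytes fetched per seek: {bytes_per_seek(AGGREGATE_BITS_PER_S, SEEK_AND_READ_S):.0f}")
```

Run as a script, it prints 100 seeks/s, 3.0 s between seeks per stream, and 50000 bytes per seek. Under these assumptions, each seek has to bring in roughly 50 KB to keep 40 Mbit/s flowing.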
This is a very interesting example of some real (and important) reference
behavior that must be understood if it is to be handled well. In the context
of this thread of discussion, this case is substantially different from your
original comment on read-ahead for ``linear file I/O''.
Just as a refresher for myself and anyone else who needs it: I claimed
that linear file I/O was a case in which read-ahead blocks should not be
cached for long before they would either be used or evicted from lack of
use. (That is, they should be placed nearer to the least-recently-used end
of the LRU ordering.) The claim was based on the observation that sequential
file traversal is a very good case for read-ahead, where the read-ahead
blocks are very likely to be used very soon.
What's important about this example is that, due to the whole system
workload and the disparate connection speeds of the ftp clients, it is
*NOT* a typical case of linear file I/O. In fact, what's odd about it is
that block `i' of a file will be read, and for slower connections, block
`i+1' will *not* be used for some time, since sending block `i' to the
client will take a while. In other words, the interleaved reference
behavior from all of
these ftp downloads makes the prediction that block `i+1' will be used
soon a weaker prediction. It is very likely to be used, yes, but not so
soon in many cases due to the other files being read and referenced.
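The following toy steady-state model in Python illustrates why a slow stream weakens the "soon" part of the prediction. It assumes equal-sized blocks, one distinct file per stream, and a constant consumption rate for each stream. None of that is measured. Under those assumptions, it counts how many blocks the other streams reference between stream `s` using block `i` and block `i+1`. The 56 kbit/s and 1 Mbit/s rates in the usage lines are hypothetical.

```python
from typing import Sequence


def intervening_blocks(rates: Sequence[float], s: int) -> float:
    """Blocks referenced by other streams between stream s's blocks i and i+1.

    Toy steady-state model: equal-sized blocks, one distinct file per stream,
    and each stream consuming data at a constant rate.
    """
    total = sum(rates)
    # Time between s's consecutive blocks is block_size / rates[s]; in that
    # time the other streams consume (total - rates[s]) * block_size / rates[s]
    # bytes, i.e. (total - rates[s]) / rates[s] blocks. Block size cancels.
    return (total - rates[s]) / rates[s]


if __name__ == "__main__":
    print(intervening_blocks([1.0] * 300, 0))   # 300 equal streams: 299.0
    # Hypothetical mix inside a 40 Mbit/s aggregate: one 56 kbit/s modem,
    # one 1 Mbit/s client, and everyone else lumped into a single entry.
    rates = [56e3, 1e6, 40e6 - 56e3 - 1e6]
    print(round(intervening_blocks(rates, 0)))  # modem: 713
    print(round(intervening_blocks(rates, 1)))  # 1 Mbit/s client: 39
```

In this model, a read-ahead page for the modem client must survive roughly 700 other block references before it is used, versus about 40 for the faster client. This is the sense in which the use is likely but not soon.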
Because the prediction of imminent use is weak, we do indeed want to cache
the read-ahead pages for longer. (That is, I agree that for this example,
read-ahead pages should go into the active list.) Caching read-ahead pages
for longer, though, displaces more referenced pages, forcing them to be
evicted sooner than they would have been without the aggressive read-ahead
caching. Critically, for *this* workload, that's probably just fine.
Assuming that different files are being downloaded by different ftp
clients, once a block has been read and referenced, there is probably
little value in caching it for very long in case of re-use. In other words,
among the referenced pages, those near the least-recently-used end of the
ordering are rarely referenced again. The competition between read-ahead
pages and less recently used referenced pages is therefore lopsided in favor
of the read-ahead pages. But that is only a consequence of the reference
pattern for *this specific workload*; it may not be true for other
workloads.
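Here is a toy Python simulation of the argument above, under explicit assumptions. A single LRU list stands in for the kernel's active/inactive lists. Inserting read-ahead at the cold end roughly models "inactive list", and inserting it at the hot end roughly models "active list". The streams interleave round-robin at equal speeds, unlike the mixed modem/broadband clients in the example. Each reference triggers one block of read-ahead. `ToyLRU` and `simulate` are hypothetical names, and this is not a model of the actual kernel code.

```python
from collections import OrderedDict


class ToyLRU:
    """One LRU list: first key is the coldest (evicted next), last the hottest."""

    def __init__(self, capacity: int, readahead_hot: bool) -> None:
        self.capacity = capacity
        self.readahead_hot = readahead_hot
        self.pages = OrderedDict()  # key -> True while it is unused read-ahead
        self.hits = 0
        self.misses = 0
        self.wasted_readahead = 0   # read-ahead pages evicted before any use

    def _make_room(self) -> None:
        # Evict from the cold end until there is space for one more page.
        while len(self.pages) >= self.capacity:
            _, unused_readahead = self.pages.popitem(last=False)
            if unused_readahead:
                self.wasted_readahead += 1

    def reference(self, key: tuple) -> None:
        if key in self.pages:
            self.hits += 1
            self.pages[key] = False          # now a referenced page
            self.pages.move_to_end(key)      # promote to the hot end
        else:
            self.misses += 1
            self._make_room()
            self.pages[key] = False          # demand fetch lands at the hot end

    def readahead(self, key: tuple) -> None:
        if key in self.pages:
            return
        self._make_room()
        self.pages[key] = True               # appended at the hot end...
        if not self.readahead_hot:
            self.pages.move_to_end(key, last=False)  # ...or moved to the cold end


def simulate(streams: int, steps: int, capacity: int, readahead_hot: bool) -> ToyLRU:
    """Round-robin interleaving of sequential readers, one-block read-ahead each."""
    cache = ToyLRU(capacity, readahead_hot)
    next_block = [0] * streams
    for step in range(steps):
        s = step % streams                   # equal-speed streams take turns
        cache.reference((s, next_block[s]))  # key is (file id, block number)
        cache.readahead((s, next_block[s] + 1))
        next_block[s] += 1
    return cache


if __name__ == "__main__":
    for streams, hot in [(1, False), (4, False), (4, True)]:
        c = simulate(streams, steps=100, capacity=8, readahead_hot=hot)
        where = "hot" if hot else "cold"
        print(f"streams={streams} read-ahead={where}: hits={c.hits} "
              f"misses={c.misses} wasted read-ahead={c.wasted_readahead}")
```

With a single stream, cold insertion wastes nothing: each read-ahead page is used on the very next reference. With four interleaved streams and eight page frames, cold insertion evicts nearly every read-ahead page before its stream returns for it. Hot insertion hits on every block after the first round, at the cost of pushing out old referenced pages. These results follow from the toy assumptions only and are not measurements.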
Incidentally, this is all just mental masturbation until someone actually
records and measures the reference behavior from this kind of workload.
It all sounds about right, but that's neither good science nor good
engineering.
In short, I agree that for this case, inserting read-ahead pages into the
inactive list may not be aggressive enough. I disagree that the reason is
``linear file I/O'', as the reference pattern here is more complex than
that. This is also a wonderful case for getting read-ahead caching
adaptivity right: A system that can weigh read-ahead caching allocations
against less recently used referenced-page allocations will detect and
adjust to this case quickly, while avoiding such aggressive read-ahead
caching for other workloads.
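The adaptivity described above could be sketched in Python by borrowing the ghost-list feedback idea from the ARC replacement policy. This technique is swapped in here and was not proposed in this thread. The sketch keeps only the keys of recently evicted pages in two bounded "ghost" lists:

- One list holds read-ahead pages evicted before use.
- The other holds evicted referenced pages.

A later miss on a read-ahead ghost suggests read-ahead pages should have stayed longer, so the read-ahead share grows. A miss on a referenced ghost suggests the opposite. `ShareController`, the 0.5 starting share, and the fixed step are hypothetical. ARC itself scales each adjustment by the relative ghost-list sizes. How `share` would map onto list placement is left open.

```python
from collections import OrderedDict


class ShareController:
    """Hypothetical ARC-style feedback on how much cache read-ahead may hold.

    `share` is the target fraction of the cache for not-yet-used read-ahead
    pages. The two ghost lists remember only keys of evicted pages, not data.
    """

    def __init__(self, ghost_capacity: int, step: float = 0.05) -> None:
        self.share = 0.5                        # arbitrary starting point
        self.step = step                        # fixed step: a simplification
        self.ghost_capacity = ghost_capacity
        self.readahead_ghosts = OrderedDict()   # read-ahead evicted before use
        self.referenced_ghosts = OrderedDict()  # referenced pages evicted

    def _remember(self, ghosts: OrderedDict, key) -> None:
        ghosts[key] = None
        ghosts.move_to_end(key)
        while len(ghosts) > self.ghost_capacity:
            ghosts.popitem(last=False)          # forget the oldest ghost

    def on_evict(self, key, was_unused_readahead: bool) -> None:
        if was_unused_readahead:
            self._remember(self.readahead_ghosts, key)
        else:
            self._remember(self.referenced_ghosts, key)

    def on_miss(self, key) -> None:
        if key in self.readahead_ghosts:
            # Read ahead, dropped unused, then wanted: keep read-ahead longer.
            del self.readahead_ghosts[key]
            self.share = min(1.0, self.share + self.step)
        elif key in self.referenced_ghosts:
            # Referenced page evicted, then wanted again: it needed the room.
            del self.referenced_ghosts[key]
            self.share = max(0.0, self.share - self.step)


if __name__ == "__main__":
    ctl = ShareController(ghost_capacity=64)
    ctl.on_evict(("file-a", 7), was_unused_readahead=True)
    ctl.on_miss(("file-a", 7))
    print(round(ctl.share, 2))  # 0.55
```

In the ftp workload, read-ahead ghost hits from slow clients would push `share` up, and referenced-page ghost hits would stay rare. A workload with real re-use of referenced pages would push it back down.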
Scott
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (Darwin)
Comment: For info see http://www.gnupg.org
iD8DBQE9RV4a8eFdWQtoOmgRAk6tAKCYX8tHrauHGMaek1oyCJMvEQf5yACgrEgX
pHx2gTsY4HTy9OUmOZjT7I8=
=JTJP
-----END PGP SIGNATURE-----
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/