From: Scott Kaplan <sfkaplan@cs.amherst.edu>
To: linux-mm@kvack.org
Subject: Re: Broad questions about the current design
Date: Mon, 12 Aug 2002 13:58:17 -0400
Message-ID: <147C8BD2-AE1D-11D6-8D07-000393829FA4@cs.amherst.edu>
In-Reply-To: <E17eBGG-0001nL-00@starship>
On Monday, August 12, 2002, at 05:13 AM, Daniel Phillips wrote:
> On Friday 09 August 2002 17:12, Scott Kaplan wrote:
>> 1) What happened to page ages? I found them in 2.4.0, but they're
>> gone by 2.4.19, and remain gone in 2.5.30.
>
> The jury is still out as to whether aging or LRU is the better page
> replacement policy, and to date, no formal comparisons have been done.
Okay, so *that* explains when and how it happened. Thank you. As to
``which is better'': At the least, it's a very hard question to answer,
not only because testing VM policies is difficult, but also because this
policy is managing pages that are being put to very different uses. From
a VM perspective, LRU tends to be the better idea, and frequency
information, while always tempting, is generally a bad idea, with one
notable exception: in those cases where LRU performs poorly, frequency
allows the replacement policy to deviate from LRU. As it happens, just
about *anything* that isn't LRU performs better in those cases, so there's
nothing laudable about using frequency information in such situations, as
other non-LRU approaches work, too. Since page aging is only partly
frequency-based, it may be that its benefits lie exactly in those cases
where just about anything that deviates from LRU will help.
For filesystem caching, the picture is less clear. Some studies have
shown frequency to be a genuinely good idea, as file access patterns
exhibit strong regularities for which LRU performs poorly. While I think
that even those studies are oversimplifying the problem, frequency could
be a decent approach. Since the Linux MM system manages both VM pages and
filesystem cache pages together, it's hard to say how those pools compete,
and which policy is a better choice. I certainly think that something
LRU-like is going to be more stable and predictable, failing in cases that
people understand pretty well.
> This code implements the LRU on the active list:
>
> http://lxr.linux.no/source/mm/vmscan.c?v=2.5.28#L349:
>
> 349 if (page->pte.chain && page_referenced(page)) {
> 350 list_del(&page->lru);
> 351 list_add(&page->lru, &active_list);
> 352 pte_chain_unlock(page);
> 353 continue;
> 354 }
>
> Yes, it was supposed to be LRU but as you point out, it's merely a clock.
> It would be an LRU if the list deletion and reinsertion occurred directly
> in try_to_swap_out, but there the page referenced bit is merely set. I
> asked Andrea why he did not do this and he wasn't sure, but he thought
> that maybe the way he did it was more efficient.
I'm a bit confused by these comments, so maybe you can help me out.
While I agree that it would be possible to move pages to the front of the
active list in try_to_swap_out() rather than setting their reference bits,
I don't think that change would make this an LRU policy. There are only
two ways to achieve a true LRU ordering:
1) Trap into the kernel on every reference, moving the referenced page to
the front immediately. (Obviously, the overhead here would be absurd.)
2) Use hardware that timestamps each page frame with the time of each
reference, allowing you to discover the order of last reference. No chip
does this, of course, since it's not worth the hardware or the cycles to
examine the timestamps.
In other words, by the time try_to_swap_out() runs, it is possible to
discover which pages have been used lately and move them to the front, but
the order of last reference among those pages is already lost. It's not a
true LRU ordering.
Critically, true LRU orderings aren't worth much. That the active list is
managed via CLOCK is totally appropriate and desirable. The entire
purpose of CLOCK is that it is an approximation of LRU -- and, in fact, a
very good one -- that doesn't incur the overhead needed for true LRU. I
can't think of any reason to try to make the active list more LRU-like, as
there's no real benefit to be gained.
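Just to make the comparison concrete, the second-chance idea behind CLOCK is
roughly the following -- a toy sketch with names of my own, not the kernel's
active-list code: a referenced page has its bit cleared and survives one more
revolution, so only pages idle for a full sweep are evicted, and no ordering
work at all is done on ordinary hits.

    /* Minimal CLOCK ("second chance") sketch; hypothetical names,
     * not the kernel's code.  Assumes a non-empty circular list. */
    struct clock_page {
        struct clock_page *next;    /* circular list of frames */
        int referenced;             /* stands in for the hardware bit */
    };

    /* Advance the hand until a frame with a clear referenced bit is
     * found; referenced frames get the bit cleared and survive one
     * more revolution.  This is why CLOCK is so much cheaper than
     * true LRU while approximating it well. */
    static struct clock_page *clock_pick_victim(struct clock_page **hand)
    {
        for (;;) {
            struct clock_page *p = *hand;
            *hand = p->next;
            if (!p->referenced)
                return p;           /* idle for a full sweep: evict */
            p->referenced = 0;      /* second chance */
        }
    }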
> For any page that is explicitly touched, e.g., by file IO, we use
> activate_page, which moves the page to the head of the active list
> regardless of which list the page is currently on. This is a classic LRU.
Okay, that sounds fine, although for the reasons I just mentioned, it's
not clear that it's helping much to move the page to the front if it's
already on the active list, which itself is a CLOCK that yields good
LRU-like behavior.
> The inactive list is a FIFO queue. So you have a (sort-of) LRU feeding
> pages from its cold end into the FIFO, and if the page stays on the FIFO
> long enough to reach the cold end it gets evicted, or at least the
> eviction process starts.
Wait, this doesn't make sense to me. I assume that there is some code
that examines pages on the inactive list and, if they've been referenced,
moves them to the front of the active list. That would make the inactive
list another CLOCK-like queue, not a FIFO queue. (It would be FIFO only
if pages were pulled only from the cold end of the FIFO queue and either
(a) evicted if they've not been used or (b) moved to the front of the
active list if they have been.)
Making this list an LRU queue makes this whole structure a classic
segmented queue (SEGQ) arrangement. The first queue is a FIFO or CLOCK --
a kind of queue where references to pages are not detected immediately,
but only when something from that queue needs replacement for an incoming
page. Pages evicted from the first queue are inserted into the second one.
The purpose of the structure is that pages in the second queue are
referenced far less often, and so incurring the overhead of detecting
their references when they occur -- that is, protecting the pages so that
the reference causes a trap into the kernel -- is a low-cost way to order
the pages near eviction. While keeping the inactive queue as a CLOCK-like
structure is fine, it could be a true LRU queue, and there's little
advantage to making it a FIFO queue. (Again, though, it doesn't sound to
me as though it *is* a FIFO queue -- or am I misunderstanding your
comments?)
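To sketch the SEGQ shape I have in mind (toy code, hypothetical names, a
hand-rolled list rather than the kernel's): references on the first queue
are sampled lazily, while a reference to a page on the second queue is
caught immediately and promotes the page back to the first queue, so
whatever remains at the cold end of the second queue is known to be
genuinely unreferenced.

    /* Toy SEGQ sketch; hypothetical names, not kernel code. */
    struct sq_page {
        struct sq_page *prev, *next;
        int referenced;             /* sampled lazily on the active queue */
    };

    struct sq_list { struct sq_page head; };    /* circular, with sentinel */

    static void sq_init(struct sq_list *l)
    {
        l->head.next = l->head.prev = &l->head;
    }

    static void sq_del(struct sq_page *p)
    {
        p->prev->next = p->next;
        p->next->prev = p->prev;
    }

    static void sq_add_head(struct sq_list *l, struct sq_page *p)
    {
        p->next = l->head.next;
        p->prev = &l->head;
        l->head.next->prev = p;
        l->head.next = p;
    }

    struct segq {
        struct sq_list active;      /* CLOCK-managed; references are cheap */
        struct sq_list inactive;    /* protected; references trap */
    };

    /* Called when a reference to a protected inactive page traps:
     * promote it to the head of the active queue at once.  Pages
     * left near the cold end of the inactive queue are therefore
     * known to be unreferenced since their demotion. */
    static void segq_promote(struct segq *q, struct sq_page *p)
    {
        sq_del(p);
        sq_add_head(&q->active, p);
        /* ...and make the page ordinarily accessible again here. */
    }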
> [...] and needing to find page referenced bits by virtual scanning. The
> latter means that the referenced information at the cold end of the LRU
> and FIFO is unreliable.
The cold end of the CLOCK actually tends to be quite reliable. I agree
that the cold end of a FIFO queue is not, so why not do away with the
scanning and simply mark pages in the inactive queue as not present
(although they are, of course)? If referenced, they are immediately moved
to the front of the active list. It's not likely to happen often, so the
overhead is modest
(and, for many workloads, far less than with the continual scanning).
Leave the first queue as a CLOCK, make the second queue a true LRU.
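A user-space analogue of "mark the page not present and catch the
reference" would look something like the mprotect()-based toy below
(simplified, no error handling, names of my own invention). It shows the
cost model I'm after: one minor fault on the first re-reference, and no
cost at all if the page is never touched again.

    /* Toy user-space analogue of catching references by protection;
     * simplified demo, not kernel code, no error handling. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static size_t page_size;

    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        char *page = (char *)((uintptr_t)si->si_addr & ~(page_size - 1));
        /* This is where the page would be "promoted" to the active
         * list; here we just make it accessible again. */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        struct sigaction sa;
        char *region;

        page_size = (size_t)sysconf(_SC_PAGESIZE);
        memset(&sa, 0, sizeof(sa));
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = on_fault;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        region = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        region[0] = 1;                           /* ordinary access */
        mprotect(region, page_size, PROT_NONE);  /* "move to inactive" */
        region[0] = 2;                           /* one fault, then promoted */
        printf("touched after promotion: %d\n", region[0]);
        return 0;
    }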
> I haven't tried this, because I think it's more important to get the
> reverse mapping work nailed down so that the page referenced information
> is reliable.
Fair enough -- we can't solve all problems at once. But I do have a
proposal, now that I've done all of this nit-picking: For my own purposes,
I want a simpler, non-scanning structure. I want the CLOCK/LRU SEGQ
structure that I described. So I'll just go ahead and do that, as it will
be the basis of some other experiments that I'm trying to do. Once (if?)
I've managed that, we can try some workloads to see what the overhead of
scanning is vs. the overhead of minor (non-I/O) page faults for the
inactive list references. My prediction for the outcome is as follows:
For workloads that are loop-like and require space near the capacity of
memory (that is, workloads that will hit the inactive-list pages often),
my approach will incur more overhead. I think that on other workloads,
eliminating the scanning will be worthwhile, not only in the reduction of
overhead, but in the elimination of one more factor to tune. (Mind you,
scanning will still be somewhat needed to batch together page-write
operations for cleaning purposes, but that's yet another topic.)
Anyone think this is interesting? Or am I just doing this for myself? (I'm
happy to do it for myself, but if others want to know, I'll try to share
the results.) Also, does anyone think I'm nuts, and misunderstanding some
of the issues? (Always a possibility.)
Scott