From: Al Boldi <a1426z@gawab.com>
To: Chris Snook <csnook@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: How can we make page replacement smarter (was: swap-prefetch)
Date: Sat, 28 Jul 2007 07:17:41 +0300 [thread overview]
Message-ID: <200707280717.41250.a1426z@gawab.com> (raw)
In-Reply-To: <46AAA25E.7040301@redhat.com>
Chris Snook wrote:
> Resource size has been outpacing processing latency since the dawn of
> time. Disks get bigger much faster than seek times shrink. Main memory
> and cache keep growing, while single-threaded processing speed has nearly
> ground to a halt.
>
> In the old days, it made lots of sense to manage resource allocation in
> pages and blocks. In the past few years, we started reserving blocks in
> ext3 automatically because it saves more in seek time than it costs in
> disk space. Now we're taking preallocation and antifragmentation to the
> next level with extent-based allocation in ext4.
>
> Well, we're still using bitmap-style allocation for pages, and the
> prefetch-less swap mechanism adheres to this design as well. Maybe it's
> time to start thinking about memory in a somewhat more extent-like
> fashion.
>
> With swap prefetch, we're only optimizing the case when the box isn't
> loaded and there's RAM free, but we're not optimizing the case when the
> box is heavily loaded and we need for RAM to be free. This is a complete
> reversal of sane development priorities. If swap batching is an
> optimization at all (and we have empirical evidence that it is) then it
> should also be an optimization to swap out chunks of pages when we need to
> free memory.
>
> So, how do we go about this grouping? I suggest that if we keep per-VMA
> reference/fault/dirty statistics, we can tell which logically distinct
> chunks of memory are being regularly used. This would also us to apply
> different page replacement policies to chunks of memory that are being
> used in different fashions.
>
> With such statistics, we could then page out VMAs in 2MB chunks when we're
> under memory pressure, also giving us the option of transparently paging
> them back in to hugepages when we have the memory free, once anonymous
> hugepage support is in place.
>
> I'm inclined to view swap prefetch as a successful scientific experiment,
> and use that data to inform a more reasoned engineering effort. If we can
> design something intelligent which happens to behave more or less like
> swap prefetch does under the circumstances where swap prefetch helps, and
> does something else smart under the circumstances where swap prefetch
> makes no discernable difference, it'll be a much bigger improvement.
>
> Because we cannot prove why the existing patch helps, we cannot say what
> impact it will have when things like virtualization and solid state drives
> radically change the coefficients of the equation we have not solved.
> Providing a sysctl to turn off a misbehaving feature is a poor substitute
> for doing it right the first time, and leaving it off by default will
> ensure that it only gets used by the handful of people who know enough to
> rebuild with the patch anyway.
>
> Let's talk about how we can make page replacement smarter, so it naturally
> accomplishes what swap prefetch accomplishes, as part of a design we can
> reason about.
>
> CC-ing linux-mm, since that's where I think we should take this next.
Good idea, but unless we understand the problems involved, we are bound to
repeat it. So my first question would be: Why is swap-in so slow?
As I have posted in other threads, swap-in of consecutive pages suffers a 2x
slowdown wrt swap-out, whereas swap-in of random pages suffers over 6x
slowdown.
Because it is hard to quantify the expected swap-in speed for random pages,
let's first tackle the swap-in of consecutive pages, which should be at
least as fast as swap-out. So again, why is swap-in so slow?
Once we understand this problem, we may be able to suggest a smart
improvement.
Thanks!
--
Al
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-07-28 4:17 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <200707272243.02336.a1426z@gawab.com>
2007-07-28 1:56 ` swap-prefetch: A smart way to make good use of idle resources (was: updatedb) Chris Snook
2007-07-28 4:17 ` Al Boldi [this message]
2007-07-28 7:27 ` How can we make page replacement smarter (was: swap-prefetch) Chris Snook
2007-07-28 11:11 ` Al Boldi
2007-07-29 4:07 ` Rik van Riel
2007-07-29 6:40 ` Erblichs
2007-07-29 1:46 ` How can we make page replacement smarter Rik van Riel
2007-07-29 13:09 ` Alan Cox
2007-07-29 15:01 ` Rik van Riel
2007-07-29 14:55 ` Al Boldi
2007-07-28 4:18 ` swap-prefetch: A smart way to make good use of idle resources (was: updatedb) Al Boldi
[not found] <fa.RQO1FPcnWSV7f0LbL9tuLuh/fYY@ifi.uio.no>
[not found] ` <fa.FI89MRq1q0M+6SmmYNPsXQv2gC8@ifi.uio.no>
[not found] ` <fa./S2LBynIjozRhHfPsYxB9mQDpKE@ifi.uio.no>
[not found] ` <fa.0CL7DLsw6U7akTkW79pdCM5NPRk@ifi.uio.no>
2007-07-28 16:32 ` How can we make page replacement smarter (was: swap-prefetch) Robert Hancock
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200707280717.41250.a1426z@gawab.com \
--to=a1426z@gawab.com \
--cc=csnook@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox