From: Chris Snook <csnook@redhat.com>
To: Al Boldi <a1426z@gawab.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: How can we make page replacement smarter (was: swap-prefetch)
Date: Sat, 28 Jul 2007 03:27:00 -0400
Message-ID: <46AAEFC4.8000006@redhat.com>
In-Reply-To: <200707280717.41250.a1426z@gawab.com>
Al Boldi wrote:
> Chris Snook wrote:
>> Resource size has been outpacing processing latency since the dawn of
>> time. Disks get bigger much faster than seek times shrink. Main memory
>> and cache keep growing, while single-threaded processing speed has nearly
>> ground to a halt.
>>
>> In the old days, it made lots of sense to manage resource allocation in
>> pages and blocks. In the past few years, we started reserving blocks in
>> ext3 automatically because it saves more in seek time than it costs in
>> disk space. Now we're taking preallocation and antifragmentation to the
>> next level with extent-based allocation in ext4.
>>
>> Well, we're still using bitmap-style allocation for pages, and the
>> prefetch-less swap mechanism adheres to this design as well. Maybe it's
>> time to start thinking about memory in a somewhat more extent-like
>> fashion.
>>
>> With swap prefetch, we're only optimizing the case when the box isn't
>> loaded and there's RAM free, but we're not optimizing the case when the
>> box is heavily loaded and we need RAM to be free. This is a complete
>> reversal of sane development priorities. If swap batching is an
>> optimization at all (and we have empirical evidence that it is) then it
>> should also be an optimization to swap out chunks of pages when we need to
>> free memory.
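
To make chunked swap-out a little more concrete, here is a minimal Python sketch. It assumes a toy swap area modeled as a list of free/used slot flags. The name alloc_swap_run and the first-fit policy are illustrative assumptions, not the kernel's swap allocator. The sketch looks for a single run of consecutive free slots large enough for a whole batch of victim pages, so the batch can be written with one sequential request.

```python
from typing import List, Optional

def alloc_swap_run(free: List[bool], npages: int) -> Optional[int]:
    """First-fit search for `npages` consecutive free swap slots.

    free[i] is True when slot i is unused (a toy model, not the kernel's
    swap map). On success the run is marked in use and its first slot is
    returned, so the whole batch can be written with one sequential
    request. Returns None (and changes nothing) if no free run is long
    enough.
    """
    if npages <= 0:
        raise ValueError("npages must be positive")
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if not is_free:
            run_len = 0              # a used slot breaks the current run
            continue
        if run_len == 0:
            run_start = i            # a new run of free slots begins here
        run_len += 1
        if run_len == npages:
            for j in range(run_start, run_start + npages):
                free[j] = False      # claim the whole run for this batch
            return run_start
    return None

# 4 used slots followed by 30 free ones: a 20-page batch fits in one run.
swap = [False] * 4 + [True] * 30
print(alloc_swap_run(swap, 20))   # 4
print(alloc_swap_run(swap, 20))   # None: only 10 free slots remain
```

If only scattered slots are left, the function returns None, and a real allocator would have to split the batch into smaller writes.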
>>
>> So, how do we go about this grouping? I suggest that if we keep per-VMA
>> reference/fault/dirty statistics, we can tell which logically distinct
>> chunks of memory are being regularly used. This would also allow us to apply
>> different page replacement policies to chunks of memory that are being
>> used in different fashions.
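
Here is a rough sketch of what per-VMA statistics might look like, written in Python for brevity. The VmaStats class, its fields, and the thresholds in classify() are all hypothetical. They only show how reference, fault and dirty counts gathered per VMA could sort regions into groups that then get different replacement treatment.

```python
from dataclasses import dataclass

@dataclass
class VmaStats:
    """Hypothetical counters for one VMA, gathered over one scan interval."""
    pages: int            # resident pages in the VMA
    referenced: int = 0   # pages seen with the accessed bit set
    faults: int = 0       # page faults taken in the VMA
    dirtied: int = 0      # pages seen dirty; unused below, kept for a fuller policy

    def classify(self) -> str:
        """Toy heuristic; every threshold here is an illustrative guess."""
        if self.pages == 0:
            return "empty"
        if self.faults > self.pages // 2:
            return "thrashing"   # many faults for its size: evictions keep being undone
        ref_ratio = self.referenced / self.pages
        if ref_ratio >= 0.5:
            return "hot"         # mostly in use: keep resident
        if ref_ratio == 0:
            return "cold"        # untouched: candidate for chunked swap-out
        return "warm"

print(VmaStats(pages=512, referenced=400).classify())   # hot
print(VmaStats(pages=512).classify())                   # cold
```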
>>
>> With such statistics, we could then page out VMAs in 2MB chunks when we're
>> under memory pressure, also giving us the option of transparently paging
>> them back in to hugepages when we have the memory free, once anonymous
>> hugepage support is in place.
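
On x86-64 with 4KB base pages, 2MB is the huge page size, which works out to 512 base pages. Below is a minimal sketch, assuming 4KB pages, of splitting a VMA's address range into the 2MB-aligned chunks that could be paged out (and later back in) as units. The helper name huge_chunks is made up for illustration. The head and tail of a VMA that do not cover a full aligned chunk would still have to be handled page by page.

```python
from typing import List, Tuple

PAGE_SIZE = 4 * 1024                         # assumed 4 KiB base pages
HUGE_SIZE = 2 * 1024 * 1024                  # 2 MiB huge page size on x86-64
PAGES_PER_CHUNK = HUGE_SIZE // PAGE_SIZE     # 512 base pages per chunk

def huge_chunks(vm_start: int, vm_end: int) -> List[Tuple[int, int]]:
    """Return the [start, end) ranges of every 2 MiB-aligned chunk that lies
    entirely inside the VMA [vm_start, vm_end)."""
    mask = HUGE_SIZE - 1
    first = (vm_start + mask) & ~mask    # round the start up to a 2 MiB boundary
    last = vm_end & ~mask                # round the end down to a 2 MiB boundary
    return [(addr, addr + HUGE_SIZE) for addr in range(first, last, HUGE_SIZE)]

print([(hex(s), hex(e)) for s, e in huge_chunks(0x1000, 0x600000)])
# [('0x200000', '0x400000'), ('0x400000', '0x600000')]
```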
>>
>> I'm inclined to view swap prefetch as a successful scientific experiment,
>> and use that data to inform a more reasoned engineering effort. If we can
>> design something intelligent which happens to behave more or less like
>> swap prefetch does under the circumstances where swap prefetch helps, and
>> does something else smart under the circumstances where swap prefetch
>> makes no discernible difference, it'll be a much bigger improvement.
>>
>> Because we cannot prove why the existing patch helps, we cannot say what
>> impact it will have when things like virtualization and solid state drives
>> radically change the coefficients of the equation we have not solved.
>> Providing a sysctl to turn off a misbehaving feature is a poor substitute
>> for doing it right the first time, and leaving it off by default will
>> ensure that it only gets used by the handful of people who know enough to
>> rebuild with the patch anyway.
>>
>> Let's talk about how we can make page replacement smarter, so it naturally
>> accomplishes what swap prefetch accomplishes, as part of a design we can
>> reason about.
>>
>> CC-ing linux-mm, since that's where I think we should take this next.
>
> Good idea, but unless we understand the problems involved, we are bound to
> repeat it. So my first question would be: Why is swap-in so slow?
>
> As I have posted in other threads, swap-in of consecutive pages suffers a 2x
> slowdown wrt swap-out, whereas swap-in of random pages suffers over 6x
> slowdown.
>
> Because it is hard to quantify the expected swap-in speed for random pages,
> let's first tackle the swap-in of consecutive pages, which should be at
> least as fast as swap-out. So again, why is swap-in so slow?

If I'm writing 20 pages to swap, I can find a suitable chunk of swap and
write them all in one place. If I'm reading 20 pages from swap, they
could be anywhere. Also, writes get buffered at one or more layers of
hardware. At best, reads can be read-ahead and cached, which is why
sequential swap-in sucks less. On-demand reads are as expensive as I/O
can get.
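
To put a rough number on "they could be anywhere", here is a small sketch that counts how many I/O requests a set of swap slots needs when only adjacent slots can be merged into one request. It is a deliberate simplification: it ignores readahead, caching and actual seek distance, and io_requests is a hypothetical helper.

```python
from typing import Iterable

def io_requests(slots: Iterable[int]) -> int:
    """Count read requests for a set of swap slots, merging only slots that
    are adjacent on the swap device. Ignores readahead, caching and seek
    distance; it simply counts contiguous runs."""
    requests = 0
    prev = None
    for slot in sorted(set(slots)):
        if prev is None or slot != prev + 1:
            requests += 1            # a gap: new request, and likely a seek
        prev = slot
    return requests

# 20 pages written out as one batch occupy consecutive slots...
print(io_requests(range(100, 120)))           # 1
# ...but 20 pages faulted back in on demand may come from anywhere.
print(io_requests(range(0, 20 * 37, 37)))     # 20
```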

> Once we understand this problem, we may be able to suggest a smart
> improvement.

There are lots of page replacement schemes that optimize for different
access patterns, and they all suck at certain other access patterns. We
tweak our behavior slightly based on fadvise and madvise hints, but most
of the memory we're managing is an opaque mass. With more statistics,
we could do a better job of managing chunks of unhinted memory with
disparate access patterns. Of course, this imposes overhead. I
suggested VMA granularity because a VMA represents a logically distinct
piece of address space, though this may not be suitable for shared mappings.
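
As a toy illustration of how each scheme loses on some pattern, here is a small simulation comparing LRU with MRU (evict the most recently used page) on two made-up access traces. Neither policy is what Linux actually implements, and the traces and capacities are arbitrary assumptions chosen only to show the effect.

```python
from collections import OrderedDict
from typing import Hashable, Sequence

def count_hits(trace: Sequence[Hashable], capacity: int,
               evict_most_recent: bool = False) -> int:
    """Simulate a fully associative page cache and return the hit count.

    evict_most_recent=False gives LRU; True gives MRU.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    cache = OrderedDict()   # keys ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)          # now the most recently used
        else:
            if len(cache) >= capacity:
                # last=False pops the LRU end, last=True pops the MRU end
                cache.popitem(last=evict_most_recent)
            cache[page] = True
    return hits

loop = [0, 1, 2, 3] * 3            # cyclic scan slightly larger than memory
hot = [0, 1, 0, 2, 0, 3, 0, 4]     # one hot page plus one-off pages

print("loop: LRU", count_hits(loop, 3), "MRU", count_hits(loop, 3, True))
print("hot:  LRU", count_hits(hot, 2), "MRU", count_hits(hot, 2, True))
```

A loop over four pages with room for three gets zero hits under LRU (each page is evicted just before it is needed again) and six hits out of twelve under MRU, while the trace that keeps returning to page 0 favors LRU, three hits versus one.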
-- Chris