* PATCH: vm/kswapd in linux-2.4.0-test2
@ 2000-07-05 0:02 ludovic fernandez
2000-07-05 20:53 ` Andi Kleen
0 siblings, 1 reply; 3+ messages in thread
From: ludovic fernandez @ 2000-07-05 0:02 UTC (permalink / raw)
To: Linux MM mailing list
[-- Attachment #1: Type: text/plain, Size: 7598 bytes --]
Hello guys,
I'd like to submit a patch against linux-2.4.0-test2 regarding
the vm/kswapd. The patch is attached to this email. Sorry I don't
have access to a web or ftp server where I can put it.
The following paragraph tries to explain what this patch is supposed to do
by describing how the swap works. I'm sure the first part will sound obvious
to most of you, in which case you can just skip it and go directly to
the idea section.
Linux, like any modern OS, tries to cache almost everything in a common cache
with the hope that what is cached will be used/reused/shared soon. The more
a system caches, the better the throughput is and Linux is good at that game.
In particular, this cache contains:
. I/O buffers from read/write.
. shared pages.
. potentially shared pages (as an example, when a page is accessed for
writing, the system keeps a copy of the original page in the cache).
. read-ahead pages.
. potentially re-usable pages (pages from a process that dies stay in
the cache in the hope that the same process will be executed again soon).
. pending swap pages (dirty pages that are/will be swapped out).
Hopefully, this cache grows as long as there is some memory available. The main
reason is that we want the system to use all the resources of the machine and
not just a subset of them. Now, when memory becomes too low, it is time to
remove some [old] stuff from the cache, and this mechanism is called swapping.
In this sense, the system swaps more often than one might think. Writing an
old dirty page to a swap device in order to free it is only one part of the
problem. The swap algorithm may have started well before that, but the pages
removed from the cache happened to be clean and simply not used anymore.
Another interesting behaviour of the swap is that as soon as it is activated,
it never ends. The system keeps the available memory between two water marks.
The low limit activates the swap while the high limit forces it to stop.
The range between the low and high mark depends on how much memory the system
has at boot time, but it is usually pretty small; the goal here is not to throw
away all the content of the memory but rather to remove from the cache what
seems to be irrelevant (ok, the less important stuff).
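The two water marks described above amount to a hysteresis loop. The following is only a minimal sketch of that idea, not code from the patch or from the kernel: the structure `reclaim_state`, its fields and the function `reclaim_should_run` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for illustration; not a kernel data structure. */
struct reclaim_state {
    unsigned long low_mark;   /* start reclaiming below this many free pages */
    unsigned long high_mark;  /* stop reclaiming at or above this many */
    bool active;              /* is reclaim currently running? */
};

/* Update the state for the given free page count and report
 * whether reclaim should run. */
static bool reclaim_should_run(struct reclaim_state *s, unsigned long free_pages)
{
    assert(s->low_mark < s->high_mark);

    if (free_pages < s->low_mark)
        s->active = true;        /* low mark crossed: start */
    else if (free_pages >= s->high_mark)
        s->active = false;       /* high mark reached: stop */
    /* Between the marks the previous decision is kept (hysteresis). */
    return s->active;
}
```

With marks of 100 and 200 pages, a count of 150 leaves reclaim idle if it was idle and running if it was running, which is why a small gap between the marks keeps the mechanism switching on and off for as long as the system is under memory pressure.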
To recap things, the points I wanted to make are the following: the swap
algorithm is basically a cache replacement problem; by design, the system
does eventually swap; and finally, once the swap mechanism is activated,
it never ends until the system shuts down.
Idea
----
The main problem with kswapd comes from the fact that it actually handles
2 jobs that are, from my point of view, completely different. The first one is
to actually free some memory by removing pages from the cache and/or starting
a disk I/O for a dirty page (and even waiting for it if the disk queue becomes
dangerously flooded). The second job is to figure out which pages can be
"safely" freed, i.e. which pages are the least recently used. I don't think
the 2 jobs are compatible in terms of when to start, what to do and how to stop.
So, the idea of this patch is actually simple: do the same thing but do it
a little bit differently.
. A new thread (kpaged) ages the physical pages and tries to keep a set
of LRU pages per virtual mapping. The execution model of this thread is
based as much as possible on the idle thread. There are two reasons for
that. First, I believe there are enough spare cycles in the system to do
the job in the background (especially during pageout activity, where I/Os
are important). Second, if there is not enough idle time, it probably means
that the system is entering an "overload" situation and kpaged won't have
the time to find the LRU pages correctly anyway.
. The kswapd thread, as usual, wakes up when the memory becomes low and
checks that it is relatively easy to remove/get a page from the page
cache. If it's not, it starts "flushing" the cache by swapping out the
LRU pages computed by kpaged. If kpaged didn't cope, kswapd falls back
to the original algorithm and swaps out pages based on the RSS usage.
. Finally, an allocation request does not try to swap out anything; it just
requests a page from the page cache.
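The division of labour in the list above can be sketched as a small decision function for kswapd. This is an illustration under assumptions, not the patch's code: the enum, the function `kswapd_pick_action` and its parameters are hypothetical, and "kpaged didn't cope" is modelled here simply as kpaged having no aged pages to offer.

```c
#include <assert.h>
#include <stddef.h>

enum kswapd_action {
    KSWAPD_NOTHING,      /* the page cache can still give pages back easily */
    KSWAPD_FLUSH_LRU,    /* swap out the LRU pages collected by kpaged */
    KSWAPD_FALLBACK_RSS  /* kpaged did not keep up: RSS-based swap-out */
};

/* easy_pages:   pages that can be taken from the page cache right now
 * needed_pages: pages kswapd wants to make available on this wakeup
 * aged_pages:   LRU candidates kpaged has prepared (assumed measure) */
static enum kswapd_action kswapd_pick_action(size_t easy_pages,
                                             size_t needed_pages,
                                             size_t aged_pages)
{
    assert(needed_pages > 0);

    if (easy_pages >= needed_pages)
        return KSWAPD_NOTHING;
    if (aged_pages > 0)
        return KSWAPD_FLUSH_LRU;
    return KSWAPD_FALLBACK_RSS;
}
```

The point of the split is visible in the signature: kswapd only consumes the candidate list, while building it is left to kpaged running in idle time.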
Other improvements?/modifications
---------------------------------
The following is a list of modifications I made to the vm/swapout in addition
to the algorithm described above. They are, in a sense, minor but I believe
still important:
. An allocation request cannot fail because the pageout mechanism didn't
keep up. The only way a normal (i.e. non-atomic) memory allocation should
fail is if the system is out of swap or if an error occurred during the
swap. If kswapd is too slow, the allocation will wait for kswapd to
catch up.
. The swap doesn't deal with processes but rather with virtual mappings.
Processes can share a virtual mapping because of fork() or because of
multithreaded applications. The problem of swapping is to deal with the
currently allocated memory; swapping processes doesn't seem to be fair
or really efficient.
. A read-ahead memory allocation can be discarded if the available memory
is too low. Read-ahead is very important in the system. However when the
swap is active, a read-ahead page can be removed from the cache before
being hit and in this case we just overload the system for nothing.
. The swap defers a small amount of dirty pages that need to be written to
the swap device (this is a patch I found on the Linux-MM web page, coming
from Eric W. Biederman I believe). Some measurements show that a small
percentage of LRU pages put in the cache by kswapd are actually reused
before being freed. Well, I believe this proves that trying to predict
the future by looking at the past doesn't work all the time.
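The allocation rules in the first and third items of the list above can be summarised as a small policy function. This is a hedged sketch only: `alloc_policy`, the two enums and the boolean inputs are hypothetical names, and the handling of atomic requests (failing rather than waiting) is an assumption drawn from the remark that only non-atomic allocations are expected not to fail.

```c
#include <assert.h>
#include <stdbool.h>

enum alloc_kind { ALLOC_NORMAL, ALLOC_ATOMIC, ALLOC_READAHEAD };

enum alloc_outcome {
    ALLOC_GRANTED,      /* a page is handed out */
    ALLOC_WAIT_KSWAPD,  /* caller sleeps until kswapd catches up */
    ALLOC_SKIPPED,      /* read-ahead request dropped */
    ALLOC_FAILED        /* genuine failure */
};

static enum alloc_outcome alloc_policy(enum alloc_kind kind, bool memory_low,
                                       bool page_available, bool swap_usable)
{
    assert(kind == ALLOC_NORMAL || kind == ALLOC_ATOMIC ||
           kind == ALLOC_READAHEAD);

    /* Read-ahead is optional work: drop it under memory pressure. */
    if (kind == ALLOC_READAHEAD && (memory_low || !page_available))
        return ALLOC_SKIPPED;
    if (page_available)
        return ALLOC_GRANTED;
    /* Assumption: an atomic request cannot sleep, so it may fail. */
    if (kind == ALLOC_ATOMIC)
        return ALLOC_FAILED;
    /* A normal request fails only if swap is exhausted or broken. */
    return swap_usable ? ALLOC_WAIT_KSWAPD : ALLOC_FAILED;
}
```

A normal request with no page available and working swap therefore ends in `ALLOC_WAIT_KSWAPD`, which corresponds to the "out of sync" counter in the statistics.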
This patch seems to work well for me. But I validated/tested it on my own
computer, using my own environment, so it's obviously a rather subjective
opinion. In particular, I didn't check it on an SMP machine, so I don't know
how it behaves or even whether it works on SMP.
I modified the Alt-SysRq-M key to have a better understanding of what's going
on in the system:
Swap cache: add {A} [{B}-{C}], del {D}, find {E}/{F} [{G}] {H}%
kswapd: total {I} overload {J} out of sync {K}
kswapd: wakeup {L} [g {M} y {N} o {O} r {P}] free {Q} io {R}
kswapd: aged pages {S} dirty pages {T}
A: total number of pages added to the cache by the swap mechanism.
B: number of swap pages added because of the read-ahead.
C: number of swap pages added because of kswapd.
D: total number of swap pages deleted from the cache.
E: number of pages found in the cache during a swap page fault.
F: total number of swap page faults.
G: number of pages marked for swapout found in the cache during a
swap page fault.
H: average percentage of hits in the page swap cache.
I: total number of pages marked for swapout by kswapd.
J: number of times kswapd fell back to the RSS usage algorithm.
K: number of times a memory allocation had to wait for kswapd.
L: number of times kswapd has been woken up.
M, N, O, P: number of times kpaged ran in green, yellow, orange and
red mode respectively.
Q: number of times kswapd tried to free something.
R: number of times kswapd tried to swapout a virtual mapping.
S: current view of the total number of LRU pages in the system.
T: number of pending dirty pages in the cache.
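As a rough illustration of how the H field relates to E and F, the percentage can be computed as below. This is a sketch under assumptions: the function name is hypothetical, and truncating integer division is assumed rather than taken from the patch. It is at least consistent with the sample output posted later in this thread, where "find 134795/143195" is reported with 94%.

```c
#include <assert.h>

/* found:  pages found in the swap cache during a swap page fault (E)
 * faults: total number of swap page faults (F)
 * Returns the hit percentage, truncated toward zero (assumed rounding). */
static unsigned int swap_cache_hit_percent(unsigned long found,
                                           unsigned long faults)
{
    assert(found <= faults);

    if (faults == 0)
        return 0;   /* nothing measured yet: avoid dividing by zero */
    return (unsigned int)((unsigned long long)found * 100ULL / faults);
}
```

The widening to `unsigned long long` before multiplying keeps the intermediate product from overflowing on machines where `unsigned long` is 32 bits.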
Ludo.
[-- Attachment #2: patch_kswapd.gz --]
[-- Type: application/x-gzip, Size: 19229 bytes --]
* Re: PATCH: vm/kswapd in linux-2.4.0-test2
2000-07-05 0:02 PATCH: vm/kswapd in linux-2.4.0-test2 ludovic fernandez
@ 2000-07-05 20:53 ` Andi Kleen
2000-07-06 2:57 ` ludovic fernandez
0 siblings, 1 reply; 3+ messages in thread
From: Andi Kleen @ 2000-07-05 20:53 UTC (permalink / raw)
To: ludovic fernandez; +Cc: Linux MM mailing list
On Wed, Jul 05, 2000 at 01:59:59AM +0200, ludovic fernandez wrote:
> Hello guys,
>
> I'd like to submit a patch against linux-2.4.0-test2 regarding
> the vm/kswapd. The patch is attached to this email. Sorry I don't
> have access to a web or ftp server where I can put it.
[...]
Nice work. As a datapoint it runs fine on my UP machine with various loads
and feels ``snappy''.
-Andi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: PATCH: vm/kswapd in linux-2.4.0-test2
2000-07-05 20:53 ` Andi Kleen
@ 2000-07-06 2:57 ` ludovic fernandez
0 siblings, 0 replies; 3+ messages in thread
From: ludovic fernandez @ 2000-07-06 2:57 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux MM mailing list
Hello Andi,
Thanks for trying this patch... but I still believe it needs some tuning.
I would be really interested to get some stats. Could you send me the
report logged by Alt-SysRq-M after some [normal] swap utilization?
Also, adding your CPU and hard drive type would help a lot (this way
I can have an idea about the ratio between the memory/CPU access
and the I/O throughput).
Since I'm asking for something, I believe it's fair to do the same.
Here is what I got after a working day using this patch.
CPU: AMD k6 3D 550Mhz (1097 bogomips)
HD: IDE WDC ATA66
SysRq: Show Memory
Mem-info:
Free pages: 3116kB ( 0kB HighMem)
( Free: 779, lru_cache: 14837 (239 478 717 956) )
DMA: 119*4kB 22*8kB 1*16kB 1*32kB 3*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB = 892kB)
Normal: 24*4kB 18*8kB 22*16kB 1*32kB 1*64kB 2*128kB 3*256kB 1*512kB 0*1024kB
0*2048kB = 2224kB)
HighMem: = 0kB)
Swap cache: add 59950 [32651-27299], del 57064, find 134795/143195 [4413] 94%
kswapd: total 73264 overload 0 out of sync 0
kswapd: wakeup 649 [g 40857 y 11 o 2 r 0] free 22767 io 4579
kswapd: aged pages 5516 dirty pages 12
Free swap: 202316kB
30704 pages of RAM
0 pages of HIGHMEM
1089 reserved pages
19835 pages shared
2886 pages swap cached
0 pages in page table cache
Buffer memory: 2412kB
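The per-zone lines in the dump above can be cross-checked with a few lines of C. This is only a worked-arithmetic sketch: `free_list_kb` and the two arrays are hypothetical names, the counts are copied from the DMA and Normal lines, and a 4kB page size is assumed from the "119*4kB" notation.

```c
#include <assert.h>
#include <stddef.h>

/* Free block counts per order, 4kB up to 2048kB, copied from the dump. */
static const unsigned long dma_counts[10]    = {119, 22, 1, 1, 3, 0, 0, 0, 0, 0};
static const unsigned long normal_counts[10] = {24, 18, 22, 1, 1, 2, 3, 1, 0, 0};

/* Sum a per-order free list: counts[i] blocks of (base_kb << i) kB each. */
static unsigned long free_list_kb(const unsigned long *counts, size_t orders,
                                  unsigned long base_kb)
{
    unsigned long total = 0;

    assert(counts != NULL);
    for (size_t i = 0; i < orders; i++)
        total += counts[i] * (base_kb << i);
    return total;
}
```

Summing the two zones this way gives 892kB and 2224kB, which add up to the 3116kB on the "Free pages" line, i.e. the 779 free pages of 4kB each.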
Thanks !
Ludo.
Andi Kleen wrote:
> On Wed, Jul 05, 2000 at 01:59:59AM +0200, ludovic fernandez wrote:
> > Hello guys,
> >
> > I'd like to submit a patch against linux-2.4.0-test2 regarding
> > the vm/kswapd. The patch is attached to this email. Sorry I don't
> > have access to a web or ftp server where I can put it.
>
> [...]
>
> Nice work. As a datapoint it runs fine on my UP machine with various loads
> and feels ``snappy''.
>
> -Andi