* [PATCH] kswapd fix & logic improvement
From: Rik van Riel @ 1998-03-03 0:35 UTC
To: Linus Torvalds; +Cc: Stephen C. Tweedie, linux-mm, linux-kernel
Hi there,

here's the final patch to improve kswapd behaviour
and improve the performance of the readahead code.

It was diffed against 2.1.89pre2, but since the VM
code hasn't changed up to pre5, it can be applied
easily.

I'm currently running a kernel with those changes,
and it works better than before.

To Linus: this code is either so trivial or so
well-tested that it _can_ be safely merged into
pre6 or .89-final...

Rik.
+-----------------------------+------------------------------+
| For Linux mm-patches, go to | "I'm busy managing memory.." |
| my homepage (via LinuxHQ). | H.H.vanRiel@fys.ruu.nl |
| ...submissions welcome... | http://www.fys.ruu.nl/~riel/ |
+-----------------------------+------------------------------+
--- linux/mm/filemap.c.orig Thu Feb 26 21:10:44 1998
+++ linux/mm/filemap.c Thu Feb 26 21:19:52 1998
@@ -25,6 +25,7 @@
#include <linux/smp.h>
#include <linux/smp_lock.h>
#include <linux/blkdev.h>
+#include <linux/swapctl.h>
#include <asm/system.h>
#include <asm/pgtable.h>
@@ -158,12 +159,15 @@
switch (atomic_read(&page->count)) {
case 1:
- /* If it has been referenced recently, don't free it */
- if (test_and_clear_bit(PG_referenced, &page->flags))
- break;
-
/* is it a swap-cache or page-cache page? */
if (page->inode) {
+ if (test_and_clear_bit(PG_referenced, &page->flags)) {
+ touch_page(page);
+ break;
+ }
+ age_page(page);
+ if (page->age)
+ break;
if (PageSwapCache(page)) {
delete_from_swap_cache(page);
return 1;
@@ -173,6 +177,10 @@
__free_page(page);
return 1;
}
+ /* It's not a cache page, so we don't do aging.
+ * If it has been referenced recently, don't free it */
+ if (test_and_clear_bit(PG_referenced, &page->flags))
+ break;
/* is it a buffer cache page? */
if ((gfp_mask & __GFP_IO) && bh && try_to_free_buffer(bh, &bh, 6))
--- linux/mm/page_alloc.c.orig Mon Mar 2 23:32:16 1998
+++ linux/mm/page_alloc.c Tue Mar 3 00:03:48 1998
@@ -108,22 +108,51 @@
* but this had better return false if any reasonable "get_free_page()"
* allocation could currently fail..
*
- * Right now we just require that the highest memory order should
- * have at least two entries. Whether this makes sense or not
- * under real load is to be tested, but it also gives us some
- * guarantee about memory fragmentation (essentially, it means
- * that there should be at least two large areas available).
+ * Currently we approve of the following situations:
+ * - the highest memory order has two entries
+ * - the highest memory order has one free entry and:
+ * - the next-highest memory order has two free entries
+ * - the highest memory order has one free entry and:
+ * - the next-highest memory order has one free entry
+ * - the next-next-highest memory order has two free entries
+ *
+ * [previously, there had to be two entries of the highest memory
+ * order, but this led to problems on large-memory machines.]
*/
int free_memory_available(void)
{
- int retval;
+ int retval = 0;
unsigned long flags;
- struct free_area_struct * last = free_area + NR_MEM_LISTS - 1;
+ struct free_area_struct * biggest = free_area + NR_MEM_LISTS - 1;
+ struct free_area_struct * bigger = free_area + NR_MEM_LISTS - 2;
+ struct free_area_struct * big = free_area + NR_MEM_LISTS - 3;
spin_lock_irqsave(&page_alloc_lock, flags);
- retval = (last->next != memory_head(last)) && (last->next->next != memory_head(last));
+ if (biggest->next != memory_head(biggest)) {
+ retval = 4;
+ if (biggest->next->next != memory_head(biggest))
+ retval += 4;
+ } else {
+ /* we want at least one free area of the 'biggest' size */
+ goto out;
+ }
+ if (bigger->next != memory_head(bigger)) {
+ retval += 2;
+ if (bigger->next->next != memory_head(bigger))
+ retval += 2;
+ } else {
+ /* if we have only one free area of the 'biggest' size, we also
+ * want one of the 'bigger' size */
+ goto out;
+ }
+ if (big->next != memory_head(big)) {
+ retval += 1;
+ if (big->next->next != memory_head(big))
+ retval += 1;
+ }
+out:
spin_unlock_irqrestore(&page_alloc_lock, flags);
- return retval;
+ return retval > 7;
}
static inline void free_pages_ok(unsigned long map_nr, unsigned long order)
--- linux/mm/vmscan.c.orig Thu Feb 26 21:10:33 1998
+++ linux/mm/vmscan.c Thu Feb 26 21:57:53 1998
@@ -539,7 +539,7 @@
init_swap_timer();
add_wait_queue(&kswapd_wait, &wait);
while (1) {
- int async;
+ int tries;
kswapd_awake = 0;
flush_signals(current);
@@ -549,32 +549,45 @@
kswapd_awake = 1;
swapstats.wakeups++;
/* Do the background pageout:
- * We now only swap out as many pages as needed.
- * When we are truly low on memory, we swap out
- * synchronously (WAIT == 1). -- Rik.
- * If we've had too many consecutive failures,
- * go back to sleep to let other tasks run.
+ * When we've got loads of memory, we try
+ * (free_pages_high - nr_free_pages) times to
+ * free memory. As memory gets tighter, kswapd
+ * gets more and more aggressive. -- Rik.
*/
- async = 1;
- for (;;) {
+ tries = free_pages_high - nr_free_pages;
+ if (tries < min_free_pages) {
+ tries = min_free_pages;
+ }
+ else if (nr_free_pages < (free_pages_high + free_pages_low) / 2) {
+ tries <<= 1;
+ if (nr_free_pages < free_pages_low) {
+ tries <<= 1;
+ if (nr_free_pages <= min_free_pages) {
+ tries <<= 1;
+ }
+ }
+ }
+ while (tries--) {
int gfp_mask;
if (free_memory_available())
break;
gfp_mask = __GFP_IO;
- if (!async)
- gfp_mask |= __GFP_WAIT;
- async = try_to_free_page(gfp_mask);
- if (!(gfp_mask & __GFP_WAIT) || async)
- continue;
-
+ try_to_free_page(gfp_mask);
/*
- * Not good. We failed to free a page even though
- * we were synchronous. Complain and give up..
+ * Syncing large chunks is faster than swapping
+ * synchronously (less head movement). -- Rik.
*/
- printk("kswapd: failed to free page\n");
- break;
+ if (atomic_read(&nr_async_pages) >= SWAP_CLUSTER_MAX)
+ run_task_queue(&tq_disk);
+
}
+ /*
+ * Report failure if we couldn't even reach min_free_pages.
+ */
+ if (nr_free_pages < min_free_pages)
+ printk("kswapd: failed, got %d of %d\n",
+ nr_free_pages, min_free_pages);
}
/* As if we could ever get here - maybe we want to make this killable */
remove_wait_queue(&kswapd_wait, &wait);
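
The new acceptance rule in free_memory_available() is easier to check outside the kernel. Below is a minimal user-space sketch, assuming the caller already knows how many free blocks sit on each of the three highest-order free lists (the kernel code walks the list pointers under page_alloc_lock instead). The names memory_available() and capped() are made up for this illustration and are not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical helper: the patch only distinguishes 0, 1 and "2 or more". */
static int capped(int n)
{
	if (n < 0)
		return 0;
	return n > 2 ? 2 : n;
}

/*
 * User-space model of the patched free_memory_available().
 * biggest/bigger/big are the free-block counts of the three highest
 * orders (an assumption of this sketch; the kernel inspects list heads).
 */
static bool memory_available(int biggest, int bigger, int big)
{
	int score = 0;

	if (biggest <= 0)
		return false;             /* need at least one 'biggest' block */
	score += 4 * capped(biggest);     /* 4 for one block, 8 for two or more */

	if (bigger > 0) {
		score += 2 * capped(bigger);  /* 2 or 4 */
		if (big > 0)
			score += capped(big); /* 1 or 2 */
	}
	/* Same threshold as the patch: only a total of 8 or more passes. */
	return score > 7;
}
```

With this model, memory_available(1, 1, 1) scores 7 and is rejected, while memory_available(1, 1, 2) scores 8 and passes, which matches the three situations listed in the comment of the patch.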
* Re: [PATCH] kswapd fix & logic improvement
From: Michael L. Galbraith @ 1998-03-03 7:16 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel
On Tue, 3 Mar 1998, Rik van Riel wrote:
> Hi there,
>
> here's the final patch to improve kswapd behaviour
> and improve the performance of the readahead code.
>
> It was diffed against 2.1.89pre2, but since the VM
> code hasn't changed up to pre5, it can be applied
> easily.
>
> I'm currently running a kernel with those changes,
> and it works better than before.
>
Hello Rik,
I was able to stimulate a 'swap-attack' which took almost an hour to
recover control from.

Started X+KDE in 32bpp + some toys. Started 5 instances of Xboard in
two machine mode. This + toys took the machine up to a working set of
180+ MB on an 80 MB machine. It was swapping like mad (better be) but
all tasks were progressing nicely. I let the xboards run until the
games were mostly over, and reset them to keep the pressure as high
as possible.
After about an hour and a half of this, I started a find on a drive
with 1.5G of 'stuff'. The find ran nicely, but after it finished,
cpu usage dropped to almost zilch. The machine ended up doing almost
nothing but swapping and became useless and nearly inaccessible.
After (finally) managing to terminate a couple of processes and dropping
the working set to ~120MB, the xboards began to run again and cpu usage
snapped back to normal (pegged).
As I was composing this message (machine idle until updatedb started),
a slew of .. 'kswapd: failed, got xxx of 160' came flying across the
screen. Examining my logs, I find that there are about 10000 lines of
these messages beginning with ..
Mar 3 05:24:24 mikeg kernel: kswapd: failed, got 134 of 160
(beginning of heavy swapping) and ending with
Mar 3 07:49:48 mikeg kernel: kswapd: failed, got 97 of 160
2.1.89pre5 + swap patch
-Mike
* Re: [PATCH] kswapd fix & logic improvement
From: Rik van Riel @ 1998-03-03 7:39 UTC
To: Michael L. Galbraith; +Cc: linux-mm, linux-kernel
On Tue, 3 Mar 1998, Michael L. Galbraith wrote:
> I was able to stimulate a 'swap-attack' which took almost an hour to
> recover control from.
>
> 2.1.89pre5 + swap patch
The attack you mention is going to affect _every_ kernel
out there. It's just that without my patch a lot of
random processes are going to be killed with signal 7 (sigbus).
Now that kswapd is somewhat better at keeping up with things, it
will keep swapping instead of killing...
To 'recover from' or 'handle' your attack (180+ mb working
set on an 80 mb machine) is going to need 'real' swapping,
ie. the temporary suspension of processes to reduce VM load.
I'd like you to try to even start your stress test under a
normal kernel (it'll probably work, but not without the
necessary oom()s and signal 7s).
This patch is only an improvement for normal use. Anyways,
thrashing can't be combatted by paging algorithms, no matter
how good.
I'll be working on the swapping daemon as soon as I've got
the current patch sorted out...
Rik.
* Re: [PATCH] kswapd fix & logic improvement
From: Michael L. Galbraith @ 1998-03-03 16:10 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel
On Tue, 3 Mar 1998, Rik van Riel wrote:
> On Tue, 3 Mar 1998, Michael L. Galbraith wrote:
>
> > I was able to stimulate a 'swap-attack' which took almost an hour to
> > recover control from.
> >
> > 2.1.89pre5 + swap patch
>
> To 'recover from' or 'handle' your attack (180+ mb working
> set on an 80 mb machine) is going to need 'real' swapping,
> ie. the temporary suspension of processes to reduce VM load.
>
> I'd like you to try to even start your stress test under a
> normal kernel (it'll probably work, but not without the
> necessary oom()s and signal 7s).
>
I've run much larger working sets on this machine without either
losing control or having the tasks killed. I've run simulations
which ate 400+ Mb. The realtime aspect was a joke, but it worked.
> This patch is only an improvement for normal use. Anyways,
> thrashing can't be combatted by paging algorithms, no matter
> how good.
>
OK.. thought you wanted it pounded upon.
It was running fine with all tasks being scheduled smoothly until
something triggered a mega-thrash.
> I'll be working on the swapping daemon as soon as I've got
> the current patch sorted out...
>
Turned out the kswapd messages weren't related to the thrashing.
I would have seen it if I hadn't jumped straight into X.
-Mike
* Re: [PATCH] kswapd fix & logic improvement
From: Rik van Riel @ 1998-03-03 17:16 UTC
To: Michael L. Galbraith; +Cc: linux-mm, linux-kernel
On Tue, 3 Mar 1998, Michael L. Galbraith wrote:
> > To 'recover from' or 'handle' your attack (180+ mb working
> > set on an 80 mb machine) is going to need 'real' swapping,
> > ie. the temporary suspension of processes to reduce VM load.
>
> I've run much larger working sets on this machine without either
> losing control or having the tasks killed. I've run simulations
> which ate 400+ Mb. The realtime aspect was a joke, but it worked.
When allocation is done piece-by-piece, and there's only
one big process which is faulting all the time, all known
Linux kernels can handle it (more or less).
> > This patch is only an improvement for normal use. Anyways,
> > thrashing can't be combatted by paging algorithms, no matter
> > how good.
>
> OK.. thought you wanted it pounded upon.
You were right about that. I wanted to be sure that my
patch was at least as solid as the old code before it
gets merged into the kernel. Judging from the reports
I got, it is. In fact, most people have reported a big
improvement, and some people have pounded and ground it
to a crawl (without being able to make it crash).
> It was running fine with all tasks being scheduled smoothly until
> something triggered a mega-thrash.
Once you start thrashing, only real swapping is an option
to save performance (somewhat).
> > I'll be working on the swapping daemon as soon as I've got
> > the current patch sorted out...
>
> Turned out the kswapd messages weren't related to the thrashing.
> I would have seen it if I hadn't jumped straight into X.
Ahh, yes. X allocates a _lot_ of memory at once, and then
the damn thing _uses_ it at once... This is guaranteed to
make kswapd a bit nervous, both with or without my patch.
Rik.
* Re: [PATCH] kswapd fix & logic improvement
From: Benjamin C.R. LaHaise @ 1998-03-03 19:17 UTC
To: Rik van Riel; +Cc: Michael L. Galbraith, linux-mm, linux-kernel
On Tue, 3 Mar 1998, Rik van Riel wrote:
...
> > Turned out the kswapd messages weren't related to the thrashing.
> > I would have seen it if I hadn't jumped straight into X.
>
> Ahh, yes. X allocates a _lot_ of memory at once, and then
> the damn thing _uses_ it at once... This is guaranteed to
> make kswapd a bit nervous, both with or without my patch.
Not only that, but the network activity X induces puts additional stress
on an already low-memory system by allocating lots of unswappable memory.
When might we see Pavel's patches to the networking stack get integrated?
They're meant to get swapping over TCP working, but I think they'll really
help stability on systems with low memory and busy networks.
-ben
* Re: [PATCH] kswapd fix & logic improvement
From: Pavel Machek @ 1998-03-04 8:33 UTC
To: Benjamin C.R. LaHaise
Cc: Rik van Riel, Michael L. Galbraith, linux-mm, linux-kernel
Hi!
> ...
> > > Turned out the kswapd messages weren't related to the thrashing.
> > > I would have seen it if I hadn't jumped straight into X.
> >
> > Ahh, yes. X allocates a _lot_ of memory at once, and then
> > the damn thing _uses_ it at once... This is guaranteed to
> > make kswapd a bit nervous, both with or without my patch.
>
> Not only that, but the network activity X induces puts additional stress
> on an already low-memory system by allocating lots of unswappable memory.
> When might we see Pavel's patches to the networking stack get integrated?
> They're meant to get swapping over TCP working, but I think they'll really
> help stability on systems with low memory and busy networks.
Sorry? My patches are usable only if you are trying to swap over
the network. They will not help on low-memory systems, unless those systems
also lack hard drives. It is usually much better to swap onto a local
drive than over the network.
Pavel
--
I'm really pavel@atrey.karlin.mff.cuni.cz. Pavel
Look at http://atrey.karlin.mff.cuni.cz/~pavel/ ;-).
* Re: [PATCH] kswapd fix & logic improvement
From: Benjamin C.R. LaHaise @ 1998-03-06 9:06 UTC
To: Pavel Machek; +Cc: Rik van Riel, Michael L. Galbraith, linux-mm, linux-kernel
Hello!
On Wed, 4 Mar 1998, Pavel Machek wrote:
...
> > Not only that, but the network activity X induces puts additional stress
> > on an already low-memory system by allocating lots of unswappable memory.
> > When might we see Pavel's patches to the networking stack get integrated?
> > They're meant to get swapping over TCP working, but I think they'll really
> > help stability on systems with low memory and busy networks.
>
> Sorry? My patches are usable only if you are trying to swap over
> the network. They will not help on low-memory systems, unless those systems
> also lack hard drives. It is usually much better to swap onto a local
> drive than over the network.
If they're set up the way I think they are, you're mistaken. ;-) I'm
thinking of the pathological case where the system is thrown into a state
where atomic memory consumption is occurring faster than the system can
free up memory. This could occur on a system with, say, 100Mbps ethernet
and a low-end IDE drive (~5-7MBps peak) if we're using TCP with large
windows and have a *large* number of sockets open and receiving data.
Incoming packets could consume up to 10MB of GFP_ATOMIC memory per second
- ouch! With your patch, once we hit a danger zone, the system starts
dropping network packets, right? That way there will still be enough
memory for allocating buffer heads and such to swap out as necessary...
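
The arithmetic behind that worry can be written out. This is only a rough sketch, assuming every megabyte written to swap frees a megabyte and ignoring protocol overhead; the function names are invented for illustration.

```c
/* Convert a link speed in megabits per second to megabytes per second. */
static double mbit_to_mbyte(double mbit_per_s)
{
	return mbit_per_s / 8.0;
}

/*
 * Net growth of unswappable memory in MB/s: what the network delivers
 * minus what the disk can absorb. A result <= 0 means swap-out keeps up.
 * (Simplifying assumption: swap-out bandwidth equals memory freed.)
 */
static double backlog_mb_per_s(double rx_mb_per_s, double disk_mb_per_s)
{
	return rx_mb_per_s - disk_mb_per_s;
}
```

100Mbps is 12.5MB/s on the wire, so the 10MB/s figure leaves room for overhead. Against a 5-7MB/s disk, the backlog in this model grows by roughly 3 to 5 MB every second.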
-ben
* Re: [PATCH] kswapd fix & logic improvement
From: Pavel Machek @ 1998-03-06 14:40 UTC
Cc: Pavel Machek, Rik van Riel, Michael L. Galbraith, linux-mm, linux-kernel
Hi!
> > > Not only that, but the network activity X induces puts additional stress
> > > on an already low-memory system by allocating lots of unswappable memory.
> > > When might we see Pavel's patches to the networking stack get integrated?
> > > They're meant to get swapping over TCP working, but I think they'll really
> > > help stability on systems with low memory and busy networks.
> >
> > Sorry? My patches are usable only if you are trying to swap over
> > the network. They will not help on low-memory systems, unless those systems
> > also lack hard drives. It is usually much better to swap onto a local
> > drive than over the network.
>
> If they're set up the way I think they are, you're mistaken. ;-) I'm
> thinking of the pathological case where the system is thrown into a state
> where atomic memory consumption is occurring faster than the system can
> free up memory. This could occur on a system with, say, 100Mbps ethernet
> and a low-end IDE drive (~5-7MBps peak) if we're using TCP with large
> windows and have a *large* number of sockets open and receiving data.
> Incoming packets could consume up to 10MB of GFP_ATOMIC memory per second
> - ouch! With your patch, once we hit a danger zone, the system starts
> dropping network packets, right?
No. I create a new priority level ('GFP_NUCLEONIC') which is allowed to
consume a few last-resort pages. These pages will be used for networking
only, and only for the single socket used for swapping.
> That way there will still be enough
> memory for allocating buffer heads and such to swap out as
> necessary...
I thought that current swapping is deadlock-free. Am I wrong? [I tried
hard to make network swap deadlock-free. I trusted swap-to-disk code
to be deadlock-free...]
Pavel
* Re: [PATCH] kswapd fix & logic improvement
From: Rik van Riel @ 1998-03-03 13:10 UTC
To: jahakala; +Cc: linux-mm, Stephen C. Tweedie, Linus Torvalds
On Tue, 3 Mar 1998, Jani Hakala wrote:
> I patched pre5 with your diff. Now I get 'kswapd: failed, got 73 of
> 128' messages all the time.
Maybe you should play around a little with /proc/sys/vm/swapctl
and /proc/sys/vm/freepages...
A 1:2:4 ratio for freepages usually works wonders.
And an echo "10 3 1 3 0 0 0 0 1024" > swapctl gives some
improvement too. As a matter of fact, I'm currently (read:
now) removing some of the 1.1.xx artifacts from mm/vmscan.c...
Things will straighten up RSN, but until that time, you can:
- tune /proc/sys/vm/*
- tune mm/vmscan.c (just make it more aggressive)
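
For anyone tuning this: the retry budget in the patched kswapd depends only on the three freepages values and nr_free_pages. Here is a user-space sketch of that computation, following the logic of the patch. The function name kswapd_tries() is made up here, and the 128/256/512 setting used in the note below is just an assumed 1:2:4 example, not anyone's measured configuration.

```c
/*
 * User-space model of how the patched kswapd sizes its retry budget.
 * All four arguments are page counts; in the kernel they are globals.
 */
static int kswapd_tries(int nr_free_pages, int min_free_pages,
			int free_pages_low, int free_pages_high)
{
	int tries = free_pages_high - nr_free_pages;

	if (tries < min_free_pages) {
		/* Plenty of memory: still make a minimum number of attempts. */
		tries = min_free_pages;
	} else if (nr_free_pages < (free_pages_high + free_pages_low) / 2) {
		tries <<= 1;                 /* below the midpoint: double */
		if (nr_free_pages < free_pages_low) {
			tries <<= 1;         /* below 'low': double again */
			if (nr_free_pages <= min_free_pages)
				tries <<= 1; /* at or below 'min': once more */
		}
	}
	return tries;
}
```

With freepages at 128 256 512 and 73 free pages (the figure from the report quoted above), this gives 439 doubled three times, i.e. 3512 attempts. With 400 free pages it falls back to the floor of 128.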
The main reason that you didn't get that message with the
old kernel is that it doesn't show you the error, but
you're right, I should limit the printout of the error
(to once every 5 seconds?).
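
Limiting the printout could look roughly like this. It is a user-space sketch using time_t; inside the kernel the clock would presumably be jiffies with an interval of 5*HZ. may_report() and REPORT_INTERVAL are hypothetical names, not existing kernel interfaces.

```c
#include <stdbool.h>
#include <time.h>

#define REPORT_INTERVAL 5 /* seconds; the figure floated above */

/*
 * Returns true at most once per REPORT_INTERVAL seconds.
 * *last holds the time of the previous report; initialise it to 0,
 * which this sketch treats as "never reported".
 */
static bool may_report(time_t now, time_t *last)
{
	if (*last != 0 && now - *last < REPORT_INTERVAL)
		return false;   /* too soon since the last message */
	*last = now;
	return true;
}
```

The caller would wrap the printk in `if (may_report(time(NULL), &last))`, so a burst of failures produces one line per interval instead of thousands.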
Rik.