linux-mm.kvack.org archive mirror
* [PATCH] a simple OOM killer to save me from Netscape
@ 2001-04-12 16:58 Slats Grobnik
  2001-04-12 18:25 ` Rik van Riel
  2001-04-17 10:58 ` limit for number of processes Uman
  0 siblings, 2 replies; 85+ messages in thread
From: Slats Grobnik @ 2001-04-12 16:58 UTC (permalink / raw)
  To: linux-mm

This is to solve a specific problem, with no claim to generality.  Say
some X-app memory hog (always seems to be a browser) sneaks up on you,
and by the time HD thrashing catches your attention, the mouse & keyboard
have become sluggish or unresponsive--it may already be too late.
Pretty soon, even the Magic SysRq keybindings don't work....
This used to be my only occasion for ever resorting to the Reset
button, until I found out about oom_kill.  In all the message 
traffic about it, I haven't found this particular solution, so
here's my patch.  It might be useful on some desktop systems.

First, I simplified the criteria for selecting the killable app, as
seemed appropriate.  No root processes;  don't worry about CPU time,
nor nice-ness;  leave direct-hardware-access processes alone.  These
changes to the function `badness' weren't quite enough.  A 2.2.17
kernel patched with such an oom_kill was saved from hard-rebooting
5 or 6 times during a 50-day uptime....but only after waiting through
_extended_ bouts of thrashing.  Here's what happens:

By running `free -s1' or `top' it's clear that once swap memory gets
maxed out, *cache* memory size decreases until, at about 4M, mouse & 
keyboard response becomes noticeably sluggish.  At cache=3M or less,
all hope is lost.  But at this point, *free* RAM size may not be
affected much.  And since CPU activity is down to a crawl, it may
take a while to reach minimum (or some small arbitrary figure.)
So I altered the `out_of_memory' function accordingly, and expect to
never reboot again.  (Except for changing kernels, and power outage.
 (But don't ever try to mount a swap partition.  Seriously.  Nor stick
 beans up your nose. ))       regards,   Slats  

:THANKS:
to Rik van Riel for documenting his code with comments plain enough
that a beginner might be tempted.  It's the chance you take.

:NOTES:  
occasioned by my ignorance of LK programming and C.
  1. PAGE_CACHE_SHIFT would be better than PAGE_SHIFT, but the former
     is undefined in oom_kill.c and I don't know enough to go messing
     with includes.  I _think_ the patch is arch independent, if anyone cares.
  2. I don't know why `atomic_read(&page_cache_size)' is better than 
     `page_cache_size.counter';  I'm just mimicking something I saw 
     while grepping source.  Anyway, to patch 2.2.19 & preceding 
     versions, just substitute `page_cache_size' instead
     (which is just a number in 2.2, while in 2.4 it's a struct
     containing a single member, which is a number.  Go figure.)
  3. (3 << 20)-1, or anything under 3 megs:  This value is of course
     negotiable, depending on your system.  Mine's an old Pentium
     MMX, 32M RAM, piix4 chipset, standard Award BIOS, 3.8G IDE HD.
     For an immediate stress test, try Netscape 4.x rendering
     http://www.nature.com/nature/journal/v409/n6822/toc_r.html
     http://www.nature.com/nature/journal/v409/n6818/toc_r.html
     etc.  Swap grows _absurdly_ fast if Javascript is enabled.
  4. After browsing the ML:  It may be better security on a multi-
     user system NOT to neglect processes with direct hardware 
     access.  Either delete that section of `badness' (treating
     DHA cases the same as others), or restore RR's `points /= 4',
     or some other formula.

=== PATCH against Linux kernel 2.4.3 ===   made with diff -u
Copyright (c) 1999-2001 by Rik van Riel & others, under GNU General
     Public License.  http://www.fsf.org/
--- linux-2.4.3/mm/oom_kill.c	Tue Nov 14 12:56:46 2000
+++ linux-alt/mm/oom_kill.c	Wed Apr 11 19:48:30 2001
@@ -23,19 +23,6 @@
 
 /* #define DEBUG */
 
-/**
- * int_sqrt - oom_kill.c internal function, rough approximation to sqrt
- * @x: integer of which to calculate the sqrt
- * 
- * A very rough approximation to the sqrt() function.
- */
-static unsigned int int_sqrt(unsigned int x)
-{
-	unsigned int out = x;
-	while (x & ~(unsigned int)1) x >>=2, out >>=1;
-	if (x) out -= out >> 2;
-	return (out ? out : 1);
-}	
 
 /**
  * oom_badness - calculate a numeric value for how bad this task has been
@@ -46,7 +33,8 @@
  * to kill when we run out of memory.
  *
  * Good in this context means that:
- * 1) we lose the minimum amount of work done
+ * 1) kill only a normal user (not root) process
+ * -)  (amount of work done and niceness don't count)
  * 2) we recover a large amount of memory
  * 3) we don't kill anything innocent of eating tons of memory
  * 4) we want to kill the minimum amount of processes (one)
@@ -57,7 +45,7 @@
 
 static int badness(struct task_struct *p)
 {
-	int points, cpu_time, run_time;
+	int points;
 
 	if (!p->mm)
 		return 0;
@@ -66,41 +54,26 @@
 	 */
 	points = p->mm->total_vm;
 
-	/*
-	 * CPU time is in seconds and run time is in minutes. There is no
-	 * particular reason for this other than that it turned out to work
-	 * very well in practice. This is not safe against jiffie wraps
-	 * but we don't care _that_ much...
-	 */
-	cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3);
-	run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
+	/* CPU time not considered:  this is for MEMory hogs. */
 
-	points /= int_sqrt(cpu_time);
-	points /= int_sqrt(int_sqrt(run_time));
-
-	/*
-	 * Niced processes are most likely less important, so double
-	 * their badness points.
-	 */
-	if (p->nice > 0)
-		points *= 2;
+	/* Niced processes less important?  Distributed.net would disagree! */
 
 	/*
 	 * Superuser processes are usually more important, so we make it
-	 * less likely that we kill those.
+	 * less likely (impossible) that we kill those.
 	 */
 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
 				p->uid == 0 || p->euid == 0)
-		points /= 4;
+		points = 0;
 
 	/*
-	 * We don't want to kill a process with direct hardware access.
+	 * We WON'T kill a process with direct hardware access.
 	 * Not only could that mess up the hardware, but usually users
 	 * tend to only have this flag set on applications they think
 	 * of as important.
 	 */
 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
-		points /= 4;
+		points = 0;
 #ifdef DEBUG
 	printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
 	p->pid, p->comm, points);
@@ -193,11 +166,10 @@
 {
 	struct sysinfo swp_info;
 
-	/* Enough free memory?  Not OOM. */
-	if (nr_free_pages() > freepages.min)
-		return 0;
+	/* Even if free memory stays big enough...  */
+	/*  ...a cramped cache means thrashing, then keyboard lockout. */
 
-	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
+	if ((atomic_read(&page_cache_size) << PAGE_SHIFT)  >  (3 << 20)-1 )
 		return 0;
 
 	/* Enough swap space left?  Not OOM. */
 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-12 16:58 [PATCH] a simple OOM killer to save me from Netscape Slats Grobnik
@ 2001-04-12 18:25 ` Rik van Riel
  2001-04-12 18:49   ` James A. Sutherland
  2001-04-13  6:45   ` Eric W. Biederman
  2001-04-17 10:58 ` limit for number of processes Uman
  1 sibling, 2 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-12 18:25 UTC (permalink / raw)
  To: Slats Grobnik; +Cc: linux-mm, Andrew Morton

On Thu, 12 Apr 2001, Slats Grobnik wrote:

	[snip special-purpose part]

> By running `free -s1' or `top' it's clear that once swap memory gets
> maxed out, *cache* memory size decreases until, at about 4M, mouse & 
> keyboard response becomes noticeably sluggish.  At cache=3M or less,
> all hope is lost.  But at this point, *free* RAM size may not be
> affected much.  And since CPU activity is down to a crawl, it may
> take a while to reach minimum (or some small arbitrary figure.)
> So I altered the `out_of_memory' function accordingly, and expect to
> never reboot again.  (Except for changing kernels, and power outage.

*nod*  We need to OOM-kill before we're dead in the water due to
thrashing.

> -	/*
> -	 * Niced processes are most likely less important, so double
> -	 * their badness points.
> -	 */
> -	if (p->nice > 0)
> -		points *= 2;
> +	/* Niced processes less important?  Distributed.net would disagree! */

Agreed. A while ago there was a discussion about this and we
agreed that we should remove this test (only, we never got
around to sending something to Linus ;)).


> -	/* Enough free memory?  Not OOM. */
> -	if (nr_free_pages() > freepages.min)
> -		return 0;
> +	/* Even if free memory stays big enough...  */
> +	/*  ...a cramped cache means thrashing, then keyboard lockout. */
>  
> -	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
> +	if ((atomic_read(&page_cache_size) << PAGE_SHIFT)  >  (3 << 20)-1 )
>  		return 0;

1) you DO need to check to see if the system still has enough
   free pages
2) the cache size may be better expressed as some percentage
   of system memory ... it's still not good, but the 3 MB you
   chose is probably completely wrong for 90% of the systems
   out there ;)

I believe Andrew Morton was also looking at making changes to the
out_of_memory() function, but only to make sure the OOM killer
isn't started too SOON. I guess we can work something out that will
both kill soon enough *and* not too soon  ;)

Any suggestions for making Slats' ideas more generic so they work
on every system ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/



* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-12 18:25 ` Rik van Riel
@ 2001-04-12 18:49   ` James A. Sutherland
  2001-04-13  6:45   ` Eric W. Biederman
  1 sibling, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-12 18:49 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Slats Grobnik, linux-mm, Andrew Morton

On Thu, 12 Apr 2001 15:25:00 -0300 (BRST), you wrote:

>On Thu, 12 Apr 2001, Slats Grobnik wrote:
>
>	[snip special-purpose part]
>
>> By running `free -s1' or `top' it's clear that once swap memory gets
>> maxed out, *cache* memory size decreases until, at about 4M, mouse & 
>> keyboard response becomes noticeably sluggish.  At cache=3M or less,
>> all hope is lost.  But at this point, *free* RAM size may not be
>> affected much.  And since CPU activity is down to a crawl, it may
>> take a while to reach minimum (or some small arbitrary figure.)
>> So I altered the `out_of_memory' function accordingly, and expect to
>> never reboot again.  (Except for changing kernels, and power outage.
>
>*nod*  We need to OOM-kill before we're dead in the water due to
>thrashing.
>
>> -	/*
>> -	 * Niced processes are most likely less important, so double
>> -	 * their badness points.
>> -	 */
>> -	if (p->nice > 0)
>> -		points *= 2;
>> +	/* Niced processes less important?  Distributed.net would disagree! */
>
>Agreed. A while ago there was a discussion about this and we
>agreed that we should remove this test (only, we never got
>around to sending something to Linus ;)).
>
>
>> -	/* Enough free memory?  Not OOM. */
>> -	if (nr_free_pages() > freepages.min)
>> -		return 0;
>> +	/* Even if free memory stays big enough...  */
>> +	/*  ...a cramped cache means thrashing, then keyboard lockout. */
>>  
>> -	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
>> +	if ((atomic_read(&page_cache_size) << PAGE_SHIFT)  >  (3 << 20)-1 )
>>  		return 0;
>
>1) you DO need to check to see if the system still has enough
>   free pages
>2) the cache size may be better expressed as some percentage
>   of system memory ... it's still not good, but the 3 MB you
>   chose is probably completely wrong for 90% of the systems
>   out there ;)
>
>I believe Andrew Morton was also looking at making changes to the
>out_of_memory() function, but only to make sure the OOM killer
>isn't started too SOON. I guess we can work something out that will
>both kill soon enough *and* not too soon  ;)

A manual (SysRq?) way of triggering the killer would be nice too. A
couple of times now I've had a process (usually the Acrobat Reader)
chomp a few hundred Mb of swap, causing horrible swapping. Had the OOM
killer triggered, it would have blown the rogue process away straight
away - except I couldn't trigger it manually...

>Any suggestions for making Slats' ideas more generic so they work
>on every system ?

How about setting a "target" cache size - so if the cache drops below
X Mb, you consider the system OOM and call up the firing squad?


James.

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-12 18:25 ` Rik van Riel
  2001-04-12 18:49   ` James A. Sutherland
@ 2001-04-13  6:45   ` Eric W. Biederman
  2001-04-13 16:20     ` Rik van Riel
  1 sibling, 1 reply; 85+ messages in thread
From: Eric W. Biederman @ 2001-04-13  6:45 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Slats Grobnik, linux-mm, Andrew Morton

Rik van Riel <riel@conectiva.com.br> writes:
> 
> 1) you DO need to check to see if the system still has enough
>    free pages
> 2) the cache size may be better expressed as some percentage
>    of system memory ... it's still not good, but the 3 MB you
>    chose is probably completely wrong for 90% of the systems
>    out there ;)
> 
> I believe Andrew Morton was also looking at making changes to the
> out_of_memory() function, but only to make sure the OOM killer
> isn't started to SOON. I guess we can work something out that will
> both kill soon enough *and* not too soon  ;)
> 
> Any suggestions for making Slats' ideas more generic so they work
> on every system ?

Well I don't see how thrashing is necessarily connected to oom
at all.  You could have Gigs of swap not even touched and still
thrash.  

I would suggest adding a user space app to kill ill-behaved processes.
It can do all kinds of things, like put netscape on its hit list, have
a config file, etc.  But with an mlocked user space app killing
ill-behaved processes, we can worry less about a kernel oom.  (Yes, the
user space app would need to be static and probably not depend on glibc
at all since it is such a pig, but that shouldn't be a real issue.)

The kernel should always wait until it is certain we are out of
memory.  This should give a user space app plenty of time to react.


Eric

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-13  6:45   ` Eric W. Biederman
@ 2001-04-13 16:20     ` Rik van Riel
  2001-04-14  1:20       ` Stephen C. Tweedie
                         ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-13 16:20 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Slats Grobnik, linux-mm, Andrew Morton

On 13 Apr 2001, Eric W. Biederman wrote:

> > Any suggestions for making Slats' ideas more generic so they work
> > on every system ?
> 
> Well I don't see how thrashing is necessarily connected to oom
> at all.  You could have Gigs of swap not even touched and still
> thrash.  

OOM leads to thrashing, however.

If we run out of memory and swap, all we can evict are the
filesystem-backed parts of memory, which includes mapped
executables.  This is how OOM and thrashing are connected.

What we'd like to see is have the OOM killer act before the
system thrashes ... if only because this thrashing could mean
we never actually reach OOM because everything grinds to a
halt.


Thrashing when we still have swap free is an entirely different
matter, which I want to solve with load control code. That is,
when the load gets too high, we temporarily suspend processes
to bring the load down to more acceptable levels.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-13 16:20     ` Rik van Riel
@ 2001-04-14  1:20       ` Stephen C. Tweedie
  2001-04-16 21:06         ` James A. Sutherland
  2001-04-14  7:00       ` Eric W. Biederman
  2001-04-16 12:17       ` suspend processes at load (was Re: a simple OOM ...) Szabolcs Szakacsits
  2 siblings, 1 reply; 85+ messages in thread
From: Stephen C. Tweedie @ 2001-04-14  1:20 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Eric W. Biederman, Slats Grobnik, linux-mm, Andrew Morton

Hi,

On Fri, Apr 13, 2001 at 01:20:07PM -0300, Rik van Riel wrote:

> What we'd like to see is have the OOM killer act before the
> system thrashes ... if only because this thrashing could mean
> we never actually reach OOM because everything grinds to a
> halt.

It's almost impossible to tell in advance whether the system is going
to stabilise on its own when you start getting into a swap storm.
Going into OOM killer preemptively is going to risk killing tasks
unnecessarily.  I'd much rather leave the killer as a last-chance
thing to save us from eternal thrashing, rather than have it try too
hard to prevent any thrashing in the first place. 

If the workload suddenly changes, for example switching virtual
desktops on a low-memory machine so that suddenly a lot of active
tasks need to be swapped out and a great deal of new data becomes
accessible, you get something that is still a swap storm but which
will reach equilibrium by itself in time.

--Stephen

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-13 16:20     ` Rik van Riel
  2001-04-14  1:20       ` Stephen C. Tweedie
@ 2001-04-14  7:00       ` Eric W. Biederman
  2001-04-15  5:05         ` Rik van Riel
  2001-04-16 11:52         ` Szabolcs Szakacsits
  2001-04-16 12:17       ` suspend processes at load (was Re: a simple OOM ...) Szabolcs Szakacsits
  2 siblings, 2 replies; 85+ messages in thread
From: Eric W. Biederman @ 2001-04-14  7:00 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Slats Grobnik, linux-mm, Andrew Morton

Rik van Riel <riel@conectiva.com.br> writes:

> On 13 Apr 2001, Eric W. Biederman wrote:
> 
> > > Any suggestions for making Slats' ideas more generic so they work
> > > on every system ?
> > 
> > Well I don't see how thrashing is necessarily connected to oom
> > at all.  You could have Gigs of swap not even touched and still
> > thrash.  
> 
> OOM leads to thrashing, however.
> 
> If we run out of memory and swap, all we can evict are the
> filesystem-backed parts of memory, which includes mapped
> executables.  This is how OOM and thrashing are connected.

I agree.  I just said there wasn't necessarily a connection.

> What we'd like to see is have the OOM killer act before the
> system thrashes ... if only because this thrashing could mean
> we never actually reach OOM because everything grinds to a
> halt.

Seriously, you could do this in user-space with a 16KB or so mlocked
binary.  If you can detect OOM before thrashing, I don't have a
problem.  But acting before OOM hits can be a pain.

Suppose you have a computation that has been running for a month.  You
failed to add enough swap for it to run comfortably, and you forgot to
write check-pointing code.  It starts thrashing, but eventually it
will complete in another week pushing the system to the edge of OOM
the whole time (It will only use another hour of cpu in that time).
The OOM killer is broken if it kills this application.

But assuming we have swap-cache reclaim going on, the conditions for
OOM are fairly simple.
- All-caches are shrunk to minimal.
- We have no swap-cache pages.
- We have no swap.
- We have no mmaped pages in core.
- We have no ram (except a very small portion reserved for the kernel).

> Thrashing when we still have swap free is an entirely different
> matter, which I want to solve with load control code. That is,
> when the load gets too high, we temporarily suspend processes
> to bring the load down to more acceptable levels.

That's not bad but when it starts coming to policy, the policy
decisions are much more safely made in user space rather than the
kernel.  And we just allow the kernel to completely swap-out suspended
processes. 

Hmm. The more I look at this the more I keep thinking we should have a
process management daemon, enforcing some of these interesting
policies.  This would have to be small so it could be mlocked, and it
should take care of the following tasks. 

- Suspending processes in a high load/thrashing situation
- Creating swap files when we approach oom.
- Killing processes when oom is close and we can't add swap.

But since I can kill the daemon I don't have to use it.

Eric

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-14  7:00       ` Eric W. Biederman
@ 2001-04-15  5:05         ` Rik van Riel
  2001-04-15  5:20           ` Rik van Riel
  2001-04-16 11:52         ` Szabolcs Szakacsits
  1 sibling, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-15  5:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Slats Grobnik, linux-mm, Andrew Morton

On 14 Apr 2001, Eric W. Biederman wrote:

> > Thrashing when we still have swap free is an entirely different
> > matter, which I want to solve with load control code. That is,
> > when the load gets too high, we temporarily suspend processes
> > to bring the load down to more acceptable levels.
> 
> That's not bad but when it starts coming to policy, the policy
> decisions are much more safely made in user space rather than the
> kernel.  And we just allow the kernel to completely swap-out suspended
> processes. 

You're soooo full of crap.  Next we know you'll be proposing
to move the scheduler and the pageout code to userspace.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-15  5:05         ` Rik van Riel
@ 2001-04-15  5:20           ` Rik van Riel
  0 siblings, 0 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-15  5:20 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Slats Grobnik, linux-mm, Andrew Morton

On Sun, 15 Apr 2001, Rik van Riel wrote:
> On 14 Apr 2001, Eric W. Biederman wrote:

> > That's not bad but when it starts coming to policy, the policy
> > decisions are much more safely made in user space rather than the
> > kernel.  And we just allow the kernel to completely swap-out suspended
> > processes. 
> 
> You're soooo full of crap.  Next we know you'll be proposing
> to move the scheduler and the pageout code to userspace.

To elaborate on that:

1) there already is lots of policy in the kernel (scheduler,
   page stealing code, users can nice-down-but-not-up, ...)
2) thrashing and OOM are relatively rare situations
3) I can see absolutely no reason why you would ever want
   to take a 2 kB piece of code from the kernel and put it
   in a 32 kB userland daemon (which would need another 32 kB
   of kernel overhead for task struct, pagetables, etc..)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-14  7:00       ` Eric W. Biederman
  2001-04-15  5:05         ` Rik van Riel
@ 2001-04-16 11:52         ` Szabolcs Szakacsits
  1 sibling, 0 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-16 11:52 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Rik van Riel, linux-mm

On 14 Apr 2001, Eric W. Biederman wrote:

> Seriously you could do this in user-space with a 16KB or so mlocked
> binary.

You'd need to fix at least these as well: reading /proc must not
require new memory, and /proc must not give minutes of latency or
obsolete values.  Your idea has already failed in theory.  I'd also
suggest studying how others handle the problem; there is a *lot* to
learn ;)
	Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-13 16:20     ` Rik van Riel
  2001-04-14  1:20       ` Stephen C. Tweedie
  2001-04-14  7:00       ` Eric W. Biederman
@ 2001-04-16 12:17       ` Szabolcs Szakacsits
  2001-04-17 19:48         ` Rik van Riel
  2 siblings, 1 reply; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-16 12:17 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Eric W. Biederman, linux-mm, Andrew Morton

On Fri, 13 Apr 2001, Rik van Riel wrote:

> That is, when the load gets too high, we temporarily suspend
> processes to bring the load down to more acceptable levels.

Please don't. Or at least make it optional and not the default, or
user controllable. Thrashing is good: people get feedback that the
system is not properly set up, and they can tune it. The problem is
that Linux uses more and more hardcoded values and "try to be clever"
algorithms instead of tuning parameters (see e.g. the read-only
/proc/sys/vm/freepages and other place holders). Suspended pacemakers,
quakes, e-commerce web servers, etc. are not the expected behavior,
and I'm not sure it will make people happy.

This is also my problem with __alloc_pages(), potentially looping
infinitely instead of falling back at some point and letting the
ENOMEM be handled by the upper layer (trying a smaller-order
allocation or whatever).

	Szaka


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-14  1:20       ` Stephen C. Tweedie
@ 2001-04-16 21:06         ` James A. Sutherland
  2001-04-16 21:40           ` Jonathan Morton
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-16 21:06 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Eric W. Biederman, Slats Grobnik, linux-mm, Andrew Morton

On Sat, 14 Apr 2001 02:20:48 +0100, you wrote:

>Hi,
>
>On Fri, Apr 13, 2001 at 01:20:07PM -0300, Rik van Riel wrote:
>
>> What we'd like to see is have the OOM killer act before the
>> system thrashes ... if only because this thrashing could mean
>> we never actually reach OOM because everything grinds to a
>> halt.
>
>It's almost impossible to tell in advance whether the system is going
>to stabilise on its own when you start getting into a swap storm.
>Going into OOM killer preemptively is going to risk killing tasks
>unnecessarily.  I'd much rather leave the killer as a last-chance
>thing to save us from eternal thrashing, rather than have it try too
>hard to prevent any thrashing in the first place. 
>
>If the workload suddenly changes, for example switching virtual
>desktops on a low memory machine so that suddenly a lot of active
>tasks need swapped out and a great deal of new data becomes
>accessible, you get something that is still a swap storm but which
>will reach equilibrium itself in time, for example.

Ideally, I'd SIGSTOP each thrashing process. That way, enough
processes can be swapped out and KEPT swapped out to allow others to
complete their task, freeing up physical memory. Then you can SIGCONT
the processes you suspended, and make progress that way. There are
risks of "deadlocks", of course - suspend X, and all your graphical
apps will lock up waiting for it. This should lower VM pressure enough
to cause X to be restarted, though...


James.

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-16 21:06         ` James A. Sutherland
@ 2001-04-16 21:40           ` Jonathan Morton
  2001-04-16 22:12             ` Rik van Riel
  2001-04-16 22:21             ` James A. Sutherland
  0 siblings, 2 replies; 85+ messages in thread
From: Jonathan Morton @ 2001-04-16 21:40 UTC (permalink / raw)
  To: James A. Sutherland, Stephen C. Tweedie
  Cc: Rik van Riel, Eric W. Biederman, Slats Grobnik, linux-mm, Andrew Morton

>Ideally, I'd SIGSTOP each thrashing process. That way, enough
>processes can be swapped out and KEPT swapped out to allow others to
>complete their task, freeing up physical memory. Then you can SIGCONT
>the processes you suspended, and make progress that way. There are
>risks of "deadlocks", of course - suspend X, and all your graphical
>apps will lock up waiting for it. This should lower VM pressure enough
>to cause X to be restarted, though...

Strongly agree.  Two points that need defining for this:

- When does a process become "thrashing"?  Clearly paging-in in itself is
not a good measure, since all processes do this at startup - paging-in
which forces other memory out, OTOH, is a prime target.

- How long do we suspend it for?  Does this depend on how many times it's
been suspended recently?

A major point I've noticed is that a relatively small number of thrashing
processes can force small interactive applications out of physical memory,
too - this needs fixing urgently.

Example: running 3 active memory hogs on my 256Mb physical + 256Mb swap
machine causes XMMS to stutter and crackle; increasing the load to 4 memory
hogs causes it to stop working completely for extended periods of time.
The same effect can be seen on the (graphical) system monitors and on an
SSH session in progress from outside.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-16 21:40           ` Jonathan Morton
@ 2001-04-16 22:12             ` Rik van Riel
  2001-04-16 22:21             ` James A. Sutherland
  1 sibling, 0 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-16 22:12 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: James A. Sutherland, Stephen C. Tweedie, Eric W. Biederman,
	Slats Grobnik, linux-mm, Andrew Morton

On Mon, 16 Apr 2001, Jonathan Morton wrote:

> >Ideally, I'd SIGSTOP each thrashing process. That way, enough
> >processes can be swapped out and KEPT swapped out to allow others to
> >complete their task, freeing up physical memory. Then you can SIGCONT
> >the processes you suspended, and make progress that way. There are
> >risks of "deadlocks", of course - suspend X, and all your graphical
> >apps will lock up waiting for it. This should lower VM pressure enough
> >to cause X to be restarted, though...
> 
> Strongly agree.  Two points that need defining for this:
> 
> - When does a process become "thrashing"?  Clearly paging-in in itself is
> not a good measure, since all processes do this at startup - paging-in
> which forces other memory out, OTOH, is a prime target.
> 
> - How long do we suspend it for?  Does this depend on how many times it's
> been suspended recently?

I'm already working on something like this. 

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-16 21:40           ` Jonathan Morton
  2001-04-16 22:12             ` Rik van Riel
@ 2001-04-16 22:21             ` James A. Sutherland
  2001-04-17 14:26               ` Jonathan Morton
  1 sibling, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-16 22:21 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Stephen C. Tweedie, Rik van Riel, Eric W. Biederman,
	Slats Grobnik, linux-mm, Andrew Morton

On Mon, 16 Apr 2001 22:40:31 +0100, you wrote:

>>Ideally, I'd SIGSTOP each thrashing process. That way, enough
>>processes can be swapped out and KEPT swapped out to allow others to
>>complete their task, freeing up physical memory. Then you can SIGCONT
>>the processes you suspended, and make progress that way. There are
>>risks of "deadlocks", of course - suspend X, and all your graphical
>>apps will lock up waiting for it. This should lower VM pressure enough
>>to cause X to be restarted, though...
>
>Strongly agree.  Two points that need defining for this:
>
>- When does a process become "thrashing"?  Clearly paging-in in itself is
>not a good measure, since all processes do this at startup - paging-in
>which forces other memory out, OTOH, is a prime target.

Yes... I think the best metric is how long the process is able to run
for between page faults. In short, "is it making progress?"

>- How long do we suspend it for?  Does this depend on how many times it's
>been suspended recently?

Probably, yes - in my example above, if we suspend X (blocking other
memory hogs), then unsuspend it again, we need to be sure we'll
suspend something else next cycle!

>A major point I've noticed is that a relatively small number of thrashing
>processes can force small interactive applications out of physical memory,
>too - this needs fixing urgently.
>
>Example: running 3 active memory hogs on my 256Mb physical + 256Mb swap
>machine causes XMMS to stutter and crackle; increasing the load to 4 memory
>hogs causes it to stop working completely for extended periods of time.
>The same effect can be seen on the (graphical) system monitors and on an
>SSH session in progress from outside.

Yep. Ideally, here, we'd suspend all but two of those memory hogs at
any one time. Probably suspending and restoring them in rotation, a
few seconds at a time, as a very coarse-grain scheduler? This way, all
these processes get similar amounts of CPU time, without forcing
thrashing or interactive performance degradation.

It's a very black art, this; "clever" page replacement algorithms will
probably go some way towards helping, but there will always be a point
when you really are thrashing - at which point, I think the best
solution is to suspend processes alternately until the problem is
resolved.


James.

* limit for number of processes
  2001-04-12 16:58 [PATCH] a simple OOM killer to save me from Netscape Slats Grobnik
  2001-04-12 18:25 ` Rik van Riel
@ 2001-04-17 10:58 ` Uman
  1 sibling, 0 replies; 85+ messages in thread
From: Uman @ 2001-04-17 10:58 UTC (permalink / raw)
  To: linux-mm

hello.
Yesterday I wrote a program which forks for every connection, and I
made a stupid mistake.  So I tested my PC (kernel 2.4.3-pre2+xfs) with
something like

	while (1) {
		fork();
	}

I had a ulimit of 4000 processes, but my box became completely
unresponsive in X.
As I understand it, it started to use swap intensively, but there was
enough memory, so no OOM killing.  The only thing I could do was reboot.
After testing I found that if I create up to 3200 processes I can
still Ctrl-C and everything will be fine.  With more than that I can
kill them, but the kernel threads, as I understand it, continue to
thrash the system, and the only thing I can do is SysRq.
So my question is: how many processes can the kernel support (if I
have enough memory) without thrashing my system and requiring a reboot
to get back to normal work?
Thank you.
Andrei.



* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-16 22:21             ` James A. Sutherland
@ 2001-04-17 14:26               ` Jonathan Morton
  2001-04-17 19:53                 ` Rik van Riel
  0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-17 14:26 UTC (permalink / raw)
  To: James A. Sutherland
  Cc: Stephen C. Tweedie, Rik van Riel, Eric W. Biederman,
	Slats Grobnik, linux-mm, Andrew Morton

>It's a very black art, this; "clever" page replacement algorithms will
>probably go some way towards helping, but there will always be a point
>when you really are thrashing - at which point, I think the best
>solution is to suspend processes alternately until the problem is
>resolved.

I've got an even better idea.  Monitor each process's "working set" - ie.
the set of unique pages it regularly "uses" or pages in over some period of
(real) time.  In the event of thrashing, processes should be reserved an
amount of physical RAM equal to their working set, except for processes
which have "unreasonably large" working sets.  These last should be given
some arbitrarily small and fixed working set - they will perform just the
same as if nothing was done, but everything else runs *far* better.

The parameters for the above algorithm would be threefold:

- The time over which the working set is calculated (a decaying weight
would probably work)
- Determining "unreasonably large" (probably can be done by taking the
largest working set(s) on the system and penalising those until the total
working set is within the physical limit of the machine)
- How small the "fixed working set" should be for penalised processes (such
as a fair proportion of the un-reserved physical memory)

If this is done properly, well-behaved processes like XMMS, shells and
system monitors can continue working normally even if a ton of "memory
hogs" attempt to thrash the system to its knees.  I suspect even a runaway
Netscape would be handled sensibly by this technique.  Best of all, this
technique doesn't involve arbitrarily killing or suspending processes.

It is still possible, mostly on small systems, to have *every* active
process thrashing in this manner.  However, I would submit that if it gets
this far, the system can safely be considered overloaded.  :)  It has
certainly got to be an improvement over the current situation, where just 3
or 4 runaway processes can bring down my well-endowed machine, and Netscape
can crunch a typical desktop.

The disadvantage, of course, is that some record has to be kept of the
working set of each process over time.  This could be some overhead in
terms of storage, but probably won't be much of a CPU burden.

Interestingly, the "working set" calculation yielded by this method, if
made available to userland, could possibly aid optimisation by application
programmers.  It is well known to systems programmers that if the working
set exceeds the size of the largest CPU cache on the system, performance is
limited to the speed and latency of DRAM (both of which are exceptionally
poor) - but it is considerably less well known to applications programmers.

As for how to actually implement this...  don't ask me.  I'm still a kernel
newbie, really!




* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-16 12:17       ` suspend processes at load (was Re: a simple OOM ...) Szabolcs Szakacsits
@ 2001-04-17 19:48         ` Rik van Riel
  2001-04-18 21:32           ` Szabolcs Szakacsits
  0 siblings, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-17 19:48 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Eric W. Biederman, linux-mm, Andrew Morton

On Mon, 16 Apr 2001, Szabolcs Szakacsits wrote:
> On Fri, 13 Apr 2001, Rik van Riel wrote:
> 
> > That is, when the load gets too high, we temporarily suspend
> > processes to bring the load down to more acceptable levels.
> 
> Please don't. Or at least make it optional and not the default or user
> controllable. Thrashing is good.

This sounds like you have no idea what thrashing is.

Rik


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-17 14:26               ` Jonathan Morton
@ 2001-04-17 19:53                 ` Rik van Riel
  2001-04-17 20:44                   ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-17 19:53 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: James A. Sutherland, Stephen C. Tweedie, Eric W. Biederman,
	Slats Grobnik, linux-mm, Andrew Morton

On Tue, 17 Apr 2001, Jonathan Morton wrote:

> >It's a very black art, this; "clever" page replacement algorithms will
> >probably go some way towards helping, but there will always be a point
> >when you really are thrashing - at which point, I think the best
> >solution is to suspend processes alternately until the problem is
> >resolved.
> 
> I've got an even better idea.  Monitor each process's "working set" -
> ie. the set of unique pages it regularly "uses" or pages in over some
> period of (real) time.  In the event of thrashing, processes should be
> reserved an amount of physical RAM equal to their working set, except
> for processes which have "unreasonably large" working sets.

This may be a nice idea to move the thrashing point out a bit
further, and as such may be nice in addition to the load control
code.

> It is still possible, mostly on small systems, to have *every* active
> process thrashing in this manner.  However, I would submit that if it
> gets this far, the system can safely be considered overloaded.  :)

... And when the system _is_ overloaded, load control (ie. process
suspension) is what saves us. Load control makes sure the processes
in the system can all still make progress and the system can (slowly)
work itself out of the overloaded situation.

regards,

Rik


* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-17 19:53                 ` Rik van Riel
@ 2001-04-17 20:44                   ` James A. Sutherland
  2001-04-17 20:59                     ` Jonathan Morton
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-17 20:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Jonathan Morton, Stephen C. Tweedie, Eric W. Biederman,
	Slats Grobnik, linux-mm, Andrew Morton

On Tue, 17 Apr 2001 16:53:51 -0300 (BRST), you wrote:

>On Tue, 17 Apr 2001, Jonathan Morton wrote:
>
>> >It's a very black art, this; "clever" page replacement algorithms will
>> >probably go some way towards helping, but there will always be a point
>> >when you really are thrashing - at which point, I think the best
>> >solution is to suspend processes alternately until the problem is
>> >resolved.
>> 
>> I've got an even better idea.  Monitor each process's "working set" -
>> ie. the set of unique pages it regularly "uses" or pages in over some
>> period of (real) time.  In the event of thrashing, processes should be
>> reserved an amount of physical RAM equal to their working set, except
>> for processes which have "unreasonably large" working sets.
>
>This may be a nice idea to move the thrashing point out a bit
>further, and as such may be nice in addition to the load control
>code.

Yes - in addition to, not instead of. Ultimately, there are workloads
which CANNOT be handled without suspending/killing some tasks...

>> It is still possible, mostly on small systems, to have *every* active
>> process thrashing in this manner.  However, I would submit that if it
>> gets this far, the system can safely be considered overloaded.  :)
>
>... And when the system _is_ overloaded, load control (ie. process
>suspension) is what saves us. Load control makes sure the processes
>in the system can all still make progress and the system can (slowly)
>work itself out of the overloaded situation.

Indeed: the whole problem is that under very heavy load, processes
simply cannot make progress - they will continue to thrash
indefinitely, and the only way out is to suspend or kill one or more
of the offending processes! Clever techniques to make that scenario
less likely to occur in the first place are nice, but we DO need an
"ultimate failsafe" here...


James.

* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-17 20:44                   ` James A. Sutherland
@ 2001-04-17 20:59                     ` Jonathan Morton
  2001-04-17 21:09                       ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-17 20:59 UTC (permalink / raw)
  To: James A. Sutherland, Rik van Riel
  Cc: Stephen C. Tweedie, Eric W. Biederman, Slats Grobnik, linux-mm,
	Andrew Morton

>>> I've got an even better idea.  Monitor each process's "working set" -
>>> ie. the set of unique pages it regularly "uses" or pages in over some
>>> period of (real) time.  In the event of thrashing, processes should be
>>> reserved an amount of physical RAM equal to their working set, except
>>> for processes which have "unreasonably large" working sets.
>>
>>This may be a nice idea to move the thrashing point out a bit
>>further, and as such may be nice in addition to the load control
>>code.
>
>Yes - in addition to, not instead of. Ultimately, there are workloads
>which CANNOT be handled without suspending/killing some tasks...

Umm.  Actually, my idea wasn't to move the thrashing point but to limit
thrashing to processes which (by some measure) deserve it.  Thus the
thrashing in itself becomes load control, rather than (as at present)
bringing the entire system down.  Hope that's a bit clearer?




* Re: [PATCH] a simple OOM killer to save me from Netscape
  2001-04-17 20:59                     ` Jonathan Morton
@ 2001-04-17 21:09                       ` James A. Sutherland
  0 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-17 21:09 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Rik van Riel, Stephen C. Tweedie, Eric W. Biederman,
	Slats Grobnik, linux-mm, Andrew Morton

On Tue, 17 Apr 2001 21:59:46 +0100, you wrote:

>>>> I've got an even better idea.  Monitor each process's "working set" -
>>>> ie. the set of unique pages it regularly "uses" or pages in over some
>>>> period of (real) time.  In the event of thrashing, processes should be
>>>> reserved an amount of physical RAM equal to their working set, except
>>>> for processes which have "unreasonably large" working sets.
>>>
>>>This may be a nice idea to move the thrashing point out a bit
>>>further, and as such may be nice in addition to the load control
>>>code.
>>
>>Yes - in addition to, not instead of. Ultimately, there are workloads
>>which CANNOT be handled without suspending/killing some tasks...
>
>Umm.  Actually, my idea wasn't to move the thrashing point but to limit
>thrashing to processes which (by some measure) deserve it.  Thus the
>thrashing in itself becomes load control, rather than (as at present)
>bringing the entire system down.  Hope that's a bit clearer?

The trouble is, you're effectively suspending these processes, but
wasting system resources on them! It's much more efficient to suspend
them until they can be run properly. If they are genuinely thrashing -
effectively busy-waiting for resources to be available - what point is
there in NOT suspending them?


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 21:32           ` Szabolcs Szakacsits
@ 2001-04-18 20:38             ` James A. Sutherland
  2001-04-18 23:25               ` Szabolcs Szakacsits
  2001-04-19 18:34             ` Dave McCracken
  1 sibling, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-18 20:38 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Rik van Riel, linux-mm

On Wed, 18 Apr 2001 23:32:25 +0200 (MET DST), you wrote:

>On Tue, 17 Apr 2001, Rik van Riel wrote:
>> On Mon, 16 Apr 2001, Szabolcs Szakacsits wrote:
>> > Please don't. Or at least make it optional and not the default or user
>> > controllable. Thrashing is good.
>> This sounds like you have no idea what thrashing is.
>
>Sorry, your comment isn't convincing enough ;) Why do you think
>"arbitrarily" (decided exclusively by the kernel itself) suspending
>processes (that can be done in user space anyway) would help?

Not "arbitrarily"; they will be frozen for increasing periods of time.
Effectively just a huge increase in timeslice size.

>Even if you block new process creation and memory allocations (that's
>also not nice, since it can be done by resource limits), why do you think
>the situation will ever get better, i.e. that processes will release memory?

Only a pathological workload will lead to indefinite thrashing; in
this worst-case scenario, the approach makes no real difference. In
any other scenario, it's a major improvement.

>How do you want to avoid "deadlocks" when running processes have
>dependencies on suspended processes?

If a process blocks waiting for another, the thrashing will be
resolved.

>What control do you plan for sysadmins who *want* to get feedback about
>bad setups as soon as possible?

They will get this feedback, and more effectively than they do now:
right now, they are left with a dead box they have to reboot. With
this solution, a few resource hog processes get suspended briefly.

>How do you plan to explain on comp.os.linux.development.applications
>that your *perfect* programs can not only be SIGKILL'd by the kernel at
>any time but also be suspended for an indefinite time from now on?

IF you overload the system to extremes, then your processes will stop
running for brief periods. Right now, they ALL stop running
indefinitely!

>Sure, it would help in some cases and in others it would utterly fail.

Nope. Allowing the system to thrash IS the worst case scenario!

>Just like the thrashing case. So as such I see it as unnecessary bloat,
>adding complexity and no real functionality.

You haven't thought it through, then. Thrashing is the worst-case
endgame scenario: all bets are off. ANYTHING, including SIGKILLing
RANDOM processes, is better than that.


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-17 19:48         ` Rik van Riel
@ 2001-04-18 21:32           ` Szabolcs Szakacsits
  2001-04-18 20:38             ` James A. Sutherland
  2001-04-19 18:34             ` Dave McCracken
  0 siblings, 2 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-18 21:32 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm

On Tue, 17 Apr 2001, Rik van Riel wrote:
> On Mon, 16 Apr 2001, Szabolcs Szakacsits wrote:
> > Please don't. Or at least make it optional and not the default or user
> > controllable. Thrashing is good.
> This sounds like you have no idea what thrashing is.

Sorry, your comment isn't convincing enough ;) Why do you think
"arbitrarily" (decided exclusively by the kernel itself) suspending
processes (that can be done in user space anyway) would help?

Even if you block new process creation and memory allocations (that's
also not nice, since it can be done by resource limits), why do you think
the situation will ever get better, i.e. that processes will release memory?

How do you want to avoid "deadlocks" when running processes have
dependencies on suspended processes?

What control do you plan for sysadmins who *want* to get feedback about
bad setups as soon as possible?

How do you plan to explain on comp.os.linux.development.applications
that your *perfect* programs can not only be SIGKILL'd by the kernel at
any time but also be suspended for an indefinite time from now on?

Sure, it would help in some cases and in others it would utterly fail.
Just like the thrashing case. So as such I see it as unnecessary bloat,
adding complexity and no real functionality.

        Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 23:25               ` Szabolcs Szakacsits
@ 2001-04-18 22:29                 ` Rik van Riel
  2001-04-19 10:14                   ` Stephen C. Tweedie
  2001-04-19 13:23                   ` Szabolcs Szakacsits
  2001-04-19  2:11                 ` Rik van Riel
  2001-04-19  9:15                 ` James A. Sutherland
  2 siblings, 2 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-18 22:29 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: James A. Sutherland, linux-mm

On Thu, 19 Apr 2001, Szabolcs Szakacsits wrote:

> > They will get this feedback, and more effectively than they do now:
> > right now, they are left with a dead box they have to reboot. With
>
> Not if they RTFM. Moreover thrashing != dead.
>
> > IF you overload the system to extremes, then your processes will stop
> > running for brief periods. Right now, they ALL stop running
> > indefinitely!
>
> This is not true. There *is* progress, it just can be painfully slow.

"Painfully slow" when you are thrashing  ==  "root cannot login
because his login times out every time he tries to login".

THIS is why we need process suspension in the kernel.

Also think about the problem a bit more.  If the "painfully slow
progress" is getting less work done than the amount of new work
that's incoming (think of eg. a mailserver), then the system has
NO WAY to ever recover ... at least, not without the system
administrator walking by after the weekend.

OTOH, when the kernel suspends SOME tasks, so the others can run
at full speed (and then switches, etc..), then the system is able
to run all tasks to completion and crawl out of the overload
situation.


This is nothing different from CPU scheduling, except that this
happens on a larger timescale and is only done to rescue the system
in an emergency.    Or did you want to get rid of preemptive
multitasking too ?

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 20:38             ` James A. Sutherland
@ 2001-04-18 23:25               ` Szabolcs Szakacsits
  2001-04-18 22:29                 ` Rik van Riel
                                   ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-18 23:25 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Rik van Riel, linux-mm

On Wed, 18 Apr 2001, James A. Sutherland wrote:
> >How do you want to avoid "deadlocks" when running processes have
> >dependencies on suspended processes?
> If a process blocks waiting for another, the thrashing will be
> resolved.

This is a big simplification, e.g. not if it polls [not poll(2)].

> They will get this feedback, and more effectively than they do now:
> right now, they are left with a dead box they have to reboot. With

Not if they RTFM. Moreover thrashing != dead.

> IF you overload the system to extremes, then your processes will stop
> running for brief periods. Right now, they ALL stop running
> indefinitely!

This is not true. There *is* progress, it just can be painfully slow.

> You haven't thought it through, then.

"If you don't learn from history ....".  Anyway, get familiar with AIX.

But as I wrote before, I can't see a problem with an optional
implementation, even though I think the whole issue is a user-space one
and kernel efforts should be concentrated on fixing 2.4 MM bugs.

	Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 23:25               ` Szabolcs Szakacsits
  2001-04-18 22:29                 ` Rik van Riel
@ 2001-04-19  2:11                 ` Rik van Riel
  2001-04-19  7:08                   ` James A. Sutherland
  2001-04-19  9:15                 ` James A. Sutherland
  2 siblings, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-19  2:11 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: James A. Sutherland, linux-mm

On Thu, 19 Apr 2001, Szabolcs Szakacsits wrote:
> On Wed, 18 Apr 2001, James A. Sutherland wrote:
> > >How do you want to avoid "deadlocks" when running processes have
> > >dependencies on suspended processes?
> > If a process blocks waiting for another, the thrashing will be
> > resolved.
> 
> This is a big simplification, e.g. not if it polls [not poll(2)].

If it sits there in a loop, the rest of the memory that process
uses can be swapped out ;)

Rik


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19  2:11                 ` Rik van Riel
@ 2001-04-19  7:08                   ` James A. Sutherland
  2001-04-19 13:37                     ` Szabolcs Szakacsits
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19  7:08 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Szabolcs Szakacsits, linux-mm

On Wed, 18 Apr 2001 23:11:59 -0300 (BRST), you wrote:

>On Thu, 19 Apr 2001, Szabolcs Szakacsits wrote:
>> On Wed, 18 Apr 2001, James A. Sutherland wrote:
>> > >How do you want to avoid "deadlocks" when running processes have
>> > >dependencies on suspended processes?
>> > If a process blocks waiting for another, the thrashing will be
>> > resolved.
>> 
>> This is a big simplification, e.g. not if it polls [not poll(2)].
>
>If it sits there in a loop, the rest of the memory that process
>uses can be swapped out ;)

Also, if your program is busy-waiting for another to complete in that
way, you need to feed it into /dev/null and get another program :-)


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 23:25               ` Szabolcs Szakacsits
  2001-04-18 22:29                 ` Rik van Riel
  2001-04-19  2:11                 ` Rik van Riel
@ 2001-04-19  9:15                 ` James A. Sutherland
  2 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19  9:15 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Rik van Riel, linux-mm

On Thu, 19 Apr 2001 01:25:46 +0200 (MET DST), you wrote:

>On Wed, 18 Apr 2001, James A. Sutherland wrote:
>> >How you want to avoid "deadlocks" when running processes have
>> >dependencies on suspended processes?
>> If a process blocks waiting for another, the thrashing will be
>> resolved.
>
>This is a big simplification, e.g. not if it polls [not poll(2)].

If it is just polling waiting for something - i.e. check for result
file, sleep, repeat - then it isn't part of the thrashing workload.

>> They will get this feedback, and more effectively than they do now:
>> right now, they are left with a dead box they have to reboot. With
>
>Not if they RTFM. Moreover thrashing != dead.

Thrashing == dead == hard reboot needed. If you cannot log in as root
to kill the offending processes, and the processes in question are
thrashing, you are unlikely to recover the system on a practical
timescale.

>> IF you overload the system to extremes, then your processes will stop
>> running for brief periods. Right now, they ALL stop running
>> indefinitely!
>
>This is not true. There *is* progress, it just can be painful slow.

Not necessarily: it can easily go into a hard loop to the point where
you never recover without rebooting.

>> You haven't thought it through, then.
>
>"If you don't learn from history .... ". Anyway get familiar with AIX.

OK, how does AIX handle the system effectively locking up?

>But as I wrote before, I can't see problem with optional implementation
>even I think the whole issue is a user space one and kernel efforts
>should be concentrated fixing 2.4 MM bugs.

The issue can't really be solved properly in userspace... Once
thrashing has started, your userspace daemon may not necessarily ever
get swapped in enough to run! If you mlock the whole thing and are
very careful, you might just be able to SIGSTOP suitable processes -
but why not just do this kernel-side, where it should be much easier?
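A minimal sketch of that userspace approach, under the assumptions James describes (the daemon has mlockall()'d itself and has already parsed UID and RSS figures out of /proc; the struct and function names here are hypothetical, not from any posted patch):

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

struct proc_info {
	pid_t pid;
	uid_t uid;        /* owner; root processes are left alone */
	long  rss_pages;  /* resident set size, e.g. from /proc/<pid>/statm */
};

/* Pick the non-root process with the largest resident set; -1 if none. */
pid_t pick_stop_victim(const struct proc_info *procs, size_t n)
{
	pid_t victim = -1;
	long worst = -1;
	for (size_t i = 0; i < n; i++) {
		if (procs[i].uid == 0)
			continue;               /* never stop root daemons */
		if (procs[i].rss_pages > worst) {
			worst = procs[i].rss_pages;
			victim = procs[i].pid;
		}
	}
	return victim;
}

/* Suspend the victim briefly so everyone else can page back in,
 * then let it continue. */
int pause_process(pid_t pid, unsigned seconds)
{
	if (kill(pid, SIGSTOP) != 0)
		return -1;
	sleep(seconds);
	return kill(pid, SIGCONT);
}
```

The fragility is exactly the one noted above: unless the daemon is pinned with mlockall(MCL_CURRENT | MCL_FUTURE), it may itself be swapped out before it gets a chance to act.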


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 22:29                 ` Rik van Riel
@ 2001-04-19 10:14                   ` Stephen C. Tweedie
  2001-04-19 13:23                   ` Szabolcs Szakacsits
  1 sibling, 0 replies; 85+ messages in thread
From: Stephen C. Tweedie @ 2001-04-19 10:14 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Szabolcs Szakacsits, James A. Sutherland, linux-mm

Hi,

On Wed, Apr 18, 2001 at 07:29:26PM -0300, Rik van Riel wrote:
> On Thu, 19 Apr 2001, Szabolcs Szakacsits wrote:
> 
> > This is not true. There *is* progress, it just can be painful slow.
> 
> "Painfully slow" when you are thrashing  ==  "root cannot login
> because his login times out every time he tries to login".

Not necessarily.  If we can guarantee a minimal working set size to
all active processes when under severe VM load, then processes like
login may still be slow but they will at least be able to make
progress.  The existence of thrashing will obviously have an impact on
the swap device performance for all processes, but the thrashing's
impact on other processes' working sets can, and should, be
controlled.

> THIS is why we need process suspension in the kernel.

Not necessarily.  Creating a minimal working set guarantee for small
tasks is one way to avoid the need for process suspension.  Creating a
dynamic working set upper limit for large, thrashing tasks is a way to
keep the thrashing tasks from impacting everybody else too much.
There are many possible ways forward, and I am not yet convinced that
process suspension is necessary.
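As a sketch of what those two mechanisms could look like together (the names and the exact classification are illustrative only, not 3bsd's code nor any proposed Linux patch):

```c
enum reclaim_class {
	RECLAIM_EXEMPT,     /* at or below the guaranteed floor: skip */
	RECLAIM_NORMAL,
	RECLAIM_PREFERRED   /* above its dynamic ceiling: steal here first */
};

struct task_ws {
	long rss;         /* current resident pages */
	long ws_floor;    /* guaranteed minimal working set */
	long ws_ceiling;  /* dynamic upper limit while memory is tight */
};

/* Decide how the page stealer should treat a task: small tasks keep
 * enough pages to make progress (so root's login still completes),
 * while a large thrashing task mostly pays for its own faults. */
enum reclaim_class classify_for_reclaim(const struct task_ws *t)
{
	if (t->rss <= t->ws_floor)
		return RECLAIM_EXEMPT;
	if (t->rss > t->ws_ceiling)
		return RECLAIM_PREFERRED;
	return RECLAIM_NORMAL;
}
```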

Cheers,
 Stephen

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 13:37                     ` Szabolcs Szakacsits
@ 2001-04-19 12:26                       ` Christoph Rohland
  2001-04-19 12:30                       ` James A. Sutherland
  1 sibling, 0 replies; 85+ messages in thread
From: Christoph Rohland @ 2001-04-19 12:26 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: linux-mm

Hi Szabolcs,

On Thu, 19 Apr 2001, Szabolcs Szakacsits wrote:
> real life is much tougher (SAP dies on Linux because of its
> max process limit [and forget 2.4]).

???

Greetings
		Christoph



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 13:37                     ` Szabolcs Szakacsits
  2001-04-19 12:26                       ` Christoph Rohland
@ 2001-04-19 12:30                       ` James A. Sutherland
  1 sibling, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19 12:30 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Rik van Riel, linux-mm

On Thu, 19 Apr 2001 15:37:04 +0200 (MET DST), you wrote:

>
>On Thu, 19 Apr 2001, James A. Sutherland wrote:
>> On Wed, 18 Apr 2001 23:11:59 -0300 (BRST), you wrote:
>> >If it sits there in a loop, the rest of the memory that process
>> >uses can be swapped out ;)
>> Also, if your program is busy-waiting for another to complete in that
>> way, you need to feed it into /dev/null and get another program :-)
>
>Is it so difficult to imagine a thread/process doing its job and
>sometimes checking (changed file timestamps, new files in a dir, whatever)
>for new things to do? This is of course a simple, stupid case; real life
>is much tougher (SAP dies on Linux because of its max process limit
>[and forget 2.4]). IMHO you want to stop the river and hope it won't
>flood.

Quite the opposite - I want to drain the river, you want to let it
flood to teach the sysadmin a lesson!


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 22:29                 ` Rik van Riel
  2001-04-19 10:14                   ` Stephen C. Tweedie
@ 2001-04-19 13:23                   ` Szabolcs Szakacsits
  1 sibling, 0 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-19 13:23 UTC (permalink / raw)
  To: Rik van Riel; +Cc: James A. Sutherland, linux-mm

On Wed, 18 Apr 2001, Rik van Riel wrote:
> "Painfully slow" when you are thrashing  ==  "root cannot login
> because his login times out every time he tries to login".
> THIS is why we need process suspension in the kernel.

man 5 login.defs
vi /etc/security/limits.conf

> Also think about the problem a bit more.  If the "painfully slow
> progress" is getting less work done than the amount of new work
> that's incoming (think of eg. a mailserver), then the system has
> NO WAY to ever recover ... at least, not without the system
> administrator walking by after the weekend.

This is also quite typical for inexperienced web admins and guess what?
They learn to use resource limits and config settings.
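The same per-user limits that pam_limits applies from /etc/security/limits.conf at login can also be set programmatically; a hedged sketch using the standard setrlimit(2) interface (the wrapper name and the size figure are just for illustration):

```c
#include <sys/resource.h>

/* Cap a process's total address space -- the same effect an "as" entry
 * in /etc/security/limits.conf has at login time.  Once the cap is hit,
 * further brk/mmap calls fail in that process instead of driving the
 * whole box into swap. */
int cap_address_space(rlim_t bytes)
{
	struct rlimit rl = { bytes, bytes };
	return setrlimit(RLIMIT_AS, &rl);
}
```

Typically this would run in the child between fork() and exec() of the untrusted program, or be left to limits.conf for everything a user starts.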

	Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19  7:08                   ` James A. Sutherland
@ 2001-04-19 13:37                     ` Szabolcs Szakacsits
  2001-04-19 12:26                       ` Christoph Rohland
  2001-04-19 12:30                       ` James A. Sutherland
  0 siblings, 2 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-19 13:37 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Rik van Riel, linux-mm

On Thu, 19 Apr 2001, James A. Sutherland wrote:
> On Wed, 18 Apr 2001 23:11:59 -0300 (BRST), you wrote:
> >If it sits there in a loop, the rest of the memory that process
> >uses can be swapped out ;)
> Also, if your program is busy-waiting for another to complete in that
> way, you need to feed it into /dev/null and get another program :-)

Is it so difficult to imagine a thread/process doing its job and
sometimes checking (changed file timestamps, new files in a dir, whatever)
for new things to do? This is of course a simple, stupid case; real life
is much tougher (SAP dies on Linux because of its max process limit
[and forget 2.4]). IMHO you want to stop the river and hope it won't
flood.

	Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-18 21:32           ` Szabolcs Szakacsits
  2001-04-18 20:38             ` James A. Sutherland
@ 2001-04-19 18:34             ` Dave McCracken
  2001-04-19 18:47               ` James A. Sutherland
  2001-04-20 12:18               ` Szabolcs Szakacsits
  1 sibling, 2 replies; 85+ messages in thread
From: Dave McCracken @ 2001-04-19 18:34 UTC (permalink / raw)
  To: linux-mm

--On Wednesday, April 18, 2001 23:32:25 +0200 Szabolcs Szakacsits 
<szaka@f-secure.com> wrote:

> Sorry, your comment isn't convincing enough ;) Why do you think
> "arbitrarily" (decided exclusively by the kernel itself) suspending
> processes (that can be done in user space anyway) would help?
>
> Even if you block new process creation and memory allocations (that's
> also not nice since it can be done by resource limits) why you think
> situation will ever get better i.e. processes release memory?
>
> How you want to avoid "deadlocks" when running processes have
> dependencies on suspended processes?

I think there's a semantic misunderstanding here.  If I understand Rik's 
proposal right, he's not talking about completely suspending a process ala 
SIGSTOP.  He's talking about removing it from the run queue for some small 
length of time (ie a few seconds, probably) during which all the other 
processes can make progress.  This kind of suspension won't be noticeable 
to users/administrators or permanently block dependent processes.  In fact, 
it should make the system appear more responsive than one in a thrashing 
state.

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmc@austin.ibm.com                                      T/L   678-3059


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:34             ` Dave McCracken
@ 2001-04-19 18:47               ` James A. Sutherland
  2001-04-19 18:53                 ` Dave McCracken
  2001-04-20 12:25                 ` Szabolcs Szakacsits
  2001-04-20 12:18               ` Szabolcs Szakacsits
  1 sibling, 2 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19 18:47 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-mm

On Thu, 19 Apr 2001 13:34:59 -0500, you wrote:

>--On Wednesday, April 18, 2001 23:32:25 +0200 Szabolcs Szakacsits 
><szaka@f-secure.com> wrote:
>
>> Sorry, your comment isn't convincing enough ;) Why do you think
>> "arbitrarily" (decided exclusively by the kernel itself) suspending
>> processes (that can be done in user space anyway) would help?
>>
>> Even if you block new process creation and memory allocations (that's
>> also not nice since it can be done by resource limits) why you think
>> situation will ever get better i.e. processes release memory?
>>
>> How you want to avoid "deadlocks" when running processes have
>> dependencies on suspended processes?
>
>I think there's a semantic misunderstanding here.  If I understand Rik's 
>proposal right, 

Well, it was my proposal when I first said it :-)

>he's not talking about completely suspending a process ala 
>SIGSTOP.  He's talking about removing it from the run queue for some small 
>length of time (ie a few seconds, probably) during which all the other 
>processes can make progress.  

Rik and I are both proposing that, AFAICS; however it's implemented
(SIGSTOP or direct tweaking of the run queue; I prefer the former,
since I think it could be done more neatly) you just suspend the
process for a couple of seconds, then resume it (and suspend someone
else if the thrashing continues).

>This kind of suspension won't be noticeable 
>to users/administrators or permanently block dependent processes.  In fact, 
>it should make the system appear more responsive than one in a thrashing 
>state.

Indeed. It would certainly help with the usual test-case for such
things ("make -j 50" or similar): you'll end up with 40 gcc processes
being frozen at once, allowing the other 10 to complete first.
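The admission arithmetic behind that can be sketched as a hypothetical helper (equal, known working sets are assumed here purely for illustration; a real kernel would have to estimate them):

```c
#include <stddef.h>

/* Count how many of n tasks can stay runnable before their aggregate
 * working set exceeds available RAM; the remainder are candidates for
 * temporary suspension until a running job finishes. */
size_t runnable_prefix(const long *ws_pages, size_t n, long ram_pages)
{
	long used = 0;
	size_t keep = 0;
	while (keep < n && used + ws_pages[keep] <= ram_pages) {
		used += ws_pages[keep];
		keep++;
	}
	return keep;
}
```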


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:47               ` James A. Sutherland
@ 2001-04-19 18:53                 ` Dave McCracken
  2001-04-19 19:10                   ` James A. Sutherland
  2001-04-19 19:13                   ` Rik van Riel
  2001-04-20 12:25                 ` Szabolcs Szakacsits
  1 sibling, 2 replies; 85+ messages in thread
From: Dave McCracken @ 2001-04-19 18:53 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: linux-mm

--On Thursday, April 19, 2001 19:47:12 +0100 "James A. Sutherland" 
<jas88@cam.ac.uk> wrote:

> Well, it was my proposal when I first said it :-)

Oops.  My apologies.  I'd lost track of whose idea it was originally :)

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmc@austin.ibm.com                                      T/L   678-3059


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:53                 ` Dave McCracken
@ 2001-04-19 19:10                   ` James A. Sutherland
  2001-04-20 14:58                     ` Rik van Riel
  2001-04-19 19:13                   ` Rik van Riel
  1 sibling, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19 19:10 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-mm

On Thu, 19 Apr 2001 13:53:48 -0500, you wrote:

>--On Thursday, April 19, 2001 19:47:12 +0100 "James A. Sutherland" 
><jas88@cam.ac.uk> wrote:
>
>> Well, it was my proposal when I first said it :-)
>
>Oops.  My apologies.  I'd lost track of whose idea it was originally :)

No problem - the OOM killer itself had similar origins, in fact! (Back
in the heat of the "Avoiding OOM on overcommit" flamewar, I suggested
the original concept, which evolved into the OOM killer we have now)


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:53                 ` Dave McCracken
  2001-04-19 19:10                   ` James A. Sutherland
@ 2001-04-19 19:13                   ` Rik van Riel
  2001-04-19 19:47                     ` Gerrit Huizenga
                                       ` (2 more replies)
  1 sibling, 3 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-19 19:13 UTC (permalink / raw)
  To: Dave McCracken; +Cc: James A. Sutherland, linux-mm

On Thu, 19 Apr 2001, Dave McCracken wrote:
> --On Thursday, April 19, 2001 19:47:12 +0100 "James A. Sutherland"
> <jas88@cam.ac.uk> wrote:
>
> > Well, it was my proposal when I first said it :-)
>
> Oops.  My apologies.  I'd lost track of whose idea it was originally :)

Actually, this idea must have been in Unix since about
Bell Labs v5 Unix, possibly before.

And when paging was introduced in 3bsd, process suspension
under heavy load was preserved in the system to make sure
the system would continue to make progress under heavy
load instead of thrashing to a halt.

This is not a new idea, it's an old solution to an old
problem; it even seems to work quite well.

Incidentally, the "minimal working set" idea Stephen posted
was also in 3bsd. Since this idea is good for preserving the
forward progress of smaller programs and is extremely simple
to implement, we probably want this too.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 19:13                   ` Rik van Riel
@ 2001-04-19 19:47                     ` Gerrit Huizenga
  2001-04-20 12:44                       ` Szabolcs Szakacsits
  2001-04-19 20:06                     ` James A. Sutherland
  2001-04-20 12:29                     ` Szabolcs Szakacsits
  2 siblings, 1 reply; 85+ messages in thread
From: Gerrit Huizenga @ 2001-04-19 19:47 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Dave McCracken, James A. Sutherland, linux-mm

Other options to think about here include tuning/limiting a process's
working set size based on page fault frequency, adjusting the
scheduling quanta or degrading the scheduling priority of a process
when its page fault frequency is high and memory is tight, or putting
to sleep processes with a high page fault frequency.  Yes, stopping the
largest process in linux works because there are no(?) memory
allocation limits for any process, hence a process which either has
poor memory locality or simply a need for a Bigabyte of address space
will soon become the largest process.  And as memory sizes increase,
global LRU page stealing becomes less efficient, right when you need to
make quicker decisions.  Often a local page replacement algorithm or
local working space management mechanism allows the memory pigs to only
impact themselves, instead of thrashing the rest of the system.
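A sketch of that page-fault-frequency test (thresholds, field names, and the policy itself are illustrative only; a high major-fault rate is a hint of thrashing, not proof):

```c
/* Major page faults per second for a task, sampled over an interval. */
struct task_faults {
	unsigned long faults;       /* major faults seen this interval */
	unsigned long interval_ms;  /* length of the sampling interval */
};

double fault_freq(const struct task_faults *t)
{
	if (t->interval_ms == 0)
		return 0.0;
	return (t->faults * 1000.0) / t->interval_ms;
}

/* One possible policy: only while memory is tight, a task faulting
 * faster than `thresh` faults/sec gets its working set trimmed, its
 * scheduling priority degraded, or is briefly put to sleep, so that it
 * stops stealing pages globally. */
int should_throttle(const struct task_faults *t, int mem_tight, double thresh)
{
	return mem_tight && fault_freq(t) > thresh;
}
```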

gerrit

> On Thu, 19 Apr 2001, Rik van Riel wrote:
> [...]
> And when paging was introduced in 3bsd, process suspension
> under heavy load was preserved in the system to make sure
> the system would continue to make progress under heavy
> load instead of thrashing to a halt.
> 
> Incidentally, the "minimal working set" idea Stephen posted
> was also in 3bsd. Since this idea is good for preserving the
> forward progress of smaller programs and is extremely simple
> to implement, we probably want this too.
> 
> regards,
> 
> Rik

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 19:13                   ` Rik van Riel
  2001-04-19 19:47                     ` Gerrit Huizenga
@ 2001-04-19 20:06                     ` James A. Sutherland
  2001-04-20 12:29                     ` Szabolcs Szakacsits
  2 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19 20:06 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Dave McCracken, linux-mm

On Thu, 19 Apr 2001 16:13:02 -0300 (BRST), you wrote:

>On Thu, 19 Apr 2001, Dave McCracken wrote:
>> --On Thursday, April 19, 2001 19:47:12 +0100 "James A. Sutherland"
>> <jas88@cam.ac.uk> wrote:
>>
>> > Well, it was my proposal when I first said it :-)
>>
>> Oops.  My apologies.  I'd lost track of whose idea it was originally :)
>
>Actually, this idea must have been in Unix since about
>Bell Labs v5 Unix, possibly before.

Well, good to know our wheel's the same shape as everyone else's :-)

>And when paging was introduced in 3bsd, process suspension
>under heavy load was preserved in the system to make sure
>the system would continue to make progress under heavy
>load instead of thrashing to a halt.
>
>This is not a new idea, it's an old solution to an old
>problem; it even seems to work quite well.
>
>Incidentally, the "minimal working set" idea Stephen posted
>was also in 3bsd. Since this idea is good for preserving the
>forward progress of smaller programs and is extremely simple
>to implement, we probably want this too.

Yes. A quick look at how VMS/WinNT implements this strategy might be
useful here too; still a good idea, even if MS have assimilated it :-)


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:29                     ` Szabolcs Szakacsits
@ 2001-04-20 11:50                       ` Jonathan Morton
  2001-04-20 13:32                         ` Szabolcs Szakacsits
  2001-04-22 10:21                       ` James A. Sutherland
  1 sibling, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-20 11:50 UTC (permalink / raw)
  To: Szabolcs Szakacsits, Rik van Riel
  Cc: Dave McCracken, James A. Sutherland, linux-mm

>> Actually, this idea must have been in Unix since about
>> Bell Labs v5 Unix, possibly before.
>
>That was when people were happy just to sit down in front of a computer.
>But the world has changed since then. Users' expectations are much
>higher; they want [among other things] low latency and high availability.
>
>> This is not a new idea, it's an old solution to an old
>> problem; it even seems to work quite well.
>
>Seems to work for whom? AIX? "DON'T TOUCH IT!" I think HP-UX also has
>it, and HP-UX is not famous for its stability. Sure, not because of this
>feature alone, but maybe sometimes it contributes, maybe its design
>contributes, maybe its designers contribute.

Well, OK, let's look at a commercial UNIX known for stability at high load:
Solaris.  How does Solaris handle thrashing?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:34             ` Dave McCracken
  2001-04-19 18:47               ` James A. Sutherland
@ 2001-04-20 12:18               ` Szabolcs Szakacsits
  2001-04-22 10:19                 ` James A. Sutherland
  1 sibling, 1 reply; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-20 12:18 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-mm

On Thu, 19 Apr 2001, Dave McCracken wrote:
> --On Wednesday, April 18, 2001 23:32:25 +0200 Szabolcs Szakacsits
> > How you want to avoid "deadlocks" when running processes have
> > dependencies on suspended processes?
> I think there's a semantic misunderstanding here.  If I understand Rik's
> proposal right, he's not talking about completely suspending a process ala
> SIGSTOP.  He's talking about removing it from the run queue for some small
> length of time (ie a few seconds, probably) during which all the other
> processes can make progress.

Yes, I also didn't mean deadlocks in the classical sense; that is why I
put the word in quotes. The issue is the unexpected, potentially huge
communication latencies between processes/threads, or between user and
system. App developers do write code taking load/latency into account,
but not with the expectation that some of their processes/threads can be
suspended for indeterminate intervals from time to time.

> This kind of suspension won't be noticeable to users/administrators
> or permanently block dependent processes.  In fact, it should make
> the system appear more responsive than one in a thrashing state.

With occasionally suspended X, sshd, etc, etc, etc ;)

	Szaka



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:47               ` James A. Sutherland
  2001-04-19 18:53                 ` Dave McCracken
@ 2001-04-20 12:25                 ` Szabolcs Szakacsits
  2001-04-21  6:08                   ` James A. Sutherland
  1 sibling, 1 reply; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-20 12:25 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: linux-mm

On Thu, 19 Apr 2001, James A. Sutherland wrote:

> Rik and I are both proposing that, AFAICS; however it's implemented

Is it implemented? So why waste words? Why don't you send the patch
for testing?

> since I think it could be done more neatly) you just suspend the
> process for a couple of seconds,

Processes are already suspended in __alloc_pages(), potentially
indefinitely. This could explain why you see no progress, and perhaps
also the problems of other people who reported lockups on lkml. I run
with a patch that prevents this infinite looping in __alloc_pages().

So suspend at page level didn't help, now comes process level. What
next? Because it will not help either.

What would help from kernel level?

o reserved root vm, class/fair share scheduling (I run with the former
  and it helps a lot to take control back [well, to be honest, your
  statements about reboots are completely false even without overly
  strict resource limits])

o non-overcommit [per process granularity and/or virtual swap spaces
  would be nice as well]

o better system monitoring: more info, more efficiently, smaller
  latencies [I mean 1 sec is OK, but not the occasional 10+ sec
  accumulated stats that just hide a problem. This seems irrelevant but
  would help users and kernel developers to better understand a particular
  workload and tune or fix things (possibly not with the currently
  popular hard-coded values)].

As Stephen mentioned there are many [other] ways to improve things and I
think process suspension is just the wrong one.

> Indeed. It would certainly help with the usual test-case for such
> things ("make -j 50" or similar): you'll end up with 40 gcc processes
> being frozen at once, allowing the other 10 to complete first.

Can I recommend a real life test-case? Constant/increasing rate hit
to a dynamic web server.

	Szaka



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 19:13                   ` Rik van Riel
  2001-04-19 19:47                     ` Gerrit Huizenga
  2001-04-19 20:06                     ` James A. Sutherland
@ 2001-04-20 12:29                     ` Szabolcs Szakacsits
  2001-04-20 11:50                       ` Jonathan Morton
  2001-04-22 10:21                       ` James A. Sutherland
  2 siblings, 2 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-20 12:29 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Dave McCracken, James A. Sutherland, linux-mm

On Thu, 19 Apr 2001, Rik van Riel wrote:
> Actually, this idea must have been in Unix since about
> Bell Labs v5 Unix, possibly before.

That was when people were happy just to sit down in front of a computer.
But the world has changed since then. Users' expectations are much
higher; they want [among other things] low latency and high availability.

> This is not a new idea, it's an old solution to an old
> problem; it even seems to work quite well.

Seems to work for whom? AIX? "DON'T TOUCH IT!" I think HP-UX also has
it, and HP-UX is not famous for its stability. Sure, not because of this
feature alone, but maybe sometimes it contributes, maybe its design
contributes, maybe its designers contribute.

	Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 19:47                     ` Gerrit Huizenga
@ 2001-04-20 12:44                       ` Szabolcs Szakacsits
  0 siblings, 0 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-20 12:44 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Rik van Riel, Dave McCracken, James A. Sutherland, linux-mm

On Thu, 19 Apr 2001, Gerrit Huizenga wrote:

> Other options to think about here include tuning/limiting a process's
> working set size based on page fault frequency, adjusting the

Heavy paging != thrashing. You can't even assume major faults are
really major ones.

	Szaka



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 11:50                       ` Jonathan Morton
@ 2001-04-20 13:32                         ` Szabolcs Szakacsits
  2001-04-20 14:30                           ` Rik van Riel
  0 siblings, 1 reply; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-20 13:32 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Rik van Riel, Dave McCracken, James A. Sutherland, linux-mm

On Fri, 20 Apr 2001, Jonathan Morton wrote:

> Well, OK, let's look at a commercial UNIX known for stability at high load:
> Solaris.  How does Solaris handle thrashing?

Just as 2.2 and earlier kernels did [but not 2.4]: it keeps processes
running. Moreover, the default is non-overcommitting memory handling.
There are also nice performance tuning guides.

	Szaka


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 13:32                         ` Szabolcs Szakacsits
@ 2001-04-20 14:30                           ` Rik van Riel
  0 siblings, 0 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-20 14:30 UTC (permalink / raw)
  To: Szabolcs Szakacsits
  Cc: Jonathan Morton, Dave McCracken, James A. Sutherland, linux-mm

On Fri, 20 Apr 2001, Szabolcs Szakacsits wrote:
> On Fri, 20 Apr 2001, Jonathan Morton wrote:
> 
> > Well, OK, let's look at a commercial UNIX known for stability at high load:
> > Solaris.  How does Solaris handle thrashing?
> 
> Just as 2.2 and earlier kernels did [but not 2.4]: it keeps processes
> running. Moreover, the default is non-overcommitting memory handling.
> There are also nice performance tuning guides.

1)  Solaris DOES suspend processes under heavy load
2)  Linux 2.4 does not (but should, IMHO)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 19:10                   ` James A. Sutherland
@ 2001-04-20 14:58                     ` Rik van Riel
  2001-04-21  6:10                       ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-20 14:58 UTC (permalink / raw)
  To: James A.Sutherland; +Cc: Dave McCracken, linux-mm

On Thu, 19 Apr 2001, James A.Sutherland wrote:

> No problem - the OOM killer itself had similar origins, in fact! (Back
> in the heat of the "Avoiding OOM on overcommit" flamewar, I suggested
> the original concept, which evolved into the OOM killer we have now)

What year was that ?  ;)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:25                 ` Szabolcs Szakacsits
@ 2001-04-21  6:08                   ` James A. Sutherland
  0 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-21  6:08 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: linux-mm

On Fri, 20 Apr 2001 14:25:36 +0200 (MET DST), you wrote:
>On Thu, 19 Apr 2001, James A. Sutherland wrote:
>
>> Rik and I are both proposing that, AFAICS; however it's implemented
>
>Is it implemented? So why waste words? Why don't you send the patch
>for tests?

It isn't. You've mangled my sentence, changing the meaning...

>> since I think it could be done more neatly) you just suspend the
>> process for a couple of seconds,
>
>Processes are already suspended in __alloc_pages(), potentially
>indefinitely. This could explain why you see no progress and perhaps also
>other people's problems who reported lockups on lkml. I run with a patch
>that prevents this infinite looping in __alloc_pages().

Yep, that's the whole problem. One process starts running, page
faults, so another starts running and faults - if you have enough
faults before you get back to the first process, it will fault again
straight away because you've swapped the page it was waiting for back
out!

>So suspend at page level didn't help, now comes process level. What
>next? Because it will not help either.

It will... "suspend at page level" is part of the problem: you need to
make sure the process gets a chance to USE the memory it just faulted
in.

>What would help from kernel level?
>
>o reserved root vm, 

Not much; OK, it would allow you to log in and kill the runaway
processes, but that's it. An Alt+SysRq key could do the same...

>o class/fair share scheduling (I run with the former and it helps a lot
>  to take control back [well, to be honest, your statements about
>  reboots are completely false even without too strict resource limits])

I was speaking from personal experience there...

>o non-overcommit [per process granularity and/or virtual swap spaces
>  would be nice as well]

Non-overcommit would just make matters worse: you would get the same
results but with a big chunk of swap space wasted "just in case". How
exactly would that help?

>o better system monitoring: more info, more efficiently, smaller
>  latencies [I mean 1 sec is OK, but not the occasional 10+ sec
>  accumulated stats that just hide a problem]. This seems irrelevant but
>  would help users and kernel developers better understand a particular
>  workload and tune or fix things (possibly not with the currently
>  popular hard-coded values).

You don't need system monitoring to detect thrashing: this would be
like fitting a warning light to your car to indicate "You've hit
something!": other subtle hints like the loud noise, the impact and
the change in car shape should convey this information already.

>As Stephen mentioned there are many [other] ways to improve things and I
>think process suspension is just the wrong one.

It's the best approach in the pathological cases where we NEED to do
something drastic or we lose the box.

>> Indeed. It would certainly help with the usual test-case for such
>> things ("make -j 50" or similar): you'll end up with 40 gcc processes
>> being frozen at once, allowing the other 10 to complete first.
>
>Can I recommend a real-life test-case? A constant/increasing hit rate
>to a dynamic web server.

Yep, OK; let's assume it's a prefork server like Apache 1.3, so you
have lots of independent processes, each serving one client. Right
now, each request will hit a thrashing process. On a non-thrashing
system (running in RAM) the request takes 1 second to process. While
thrashing, if you're very lucky, the request will be handled within two
hours. By which time any real-world browser has given up, and you've
wasted a lot of resources feeding data to /dev/null.

Now we try with process suspension. Again, we'll have Apache's
MaxProcesses processes running and accepting requests, but this
time all the active processes are being periodically suspended to
allow others to complete. Suppose we can support 10 simultaneous
processes, and MaxProcesses is 100; the worst case is then that a 1
second response time goes to 10, instead of every single request
timing out.

Summary: with process suspension, clients get handled slowly. Without
it, requests go to /dev/null and eat CPU on the way. I know which I
prefer!


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 14:58                     ` Rik van Riel
@ 2001-04-21  6:10                       ` James A. Sutherland
  0 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-21  6:10 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Dave McCracken, linux-mm

On Fri, 20 Apr 2001 11:58:04 -0300 (BRST), you wrote:

>On Thu, 19 Apr 2001, James A.Sutherland wrote:
>
>> No problem - the OOM killer itself had similar origins, in fact! (Back
>> in the heat of the "Avoiding OOM on overcommit" flamewar, I suggested
>> the original concept, which evolved into the OOM killer we have now)
>
>What year was that ?  ;)

The second year of the flamewar, I think :) Probably early last year
or sometime in '99 - I don't have the diskspace to keep lkml stuff
that long...

It was a week or two before you said you were working on the OOM
killer, anyway.


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:18               ` Szabolcs Szakacsits
@ 2001-04-22 10:19                 ` James A. Sutherland
  0 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 10:19 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Dave McCracken, linux-mm

On Fri, 20 Apr 2001 14:18:34 +0200 (MET DST), you wrote:

>On Thu, 19 Apr 2001, Dave McCracken wrote:
>> --On Wednesday, April 18, 2001 23:32:25 +0200 Szabolcs Szakacsits
>> > How you want to avoid "deadlocks" when running processes have
>> > dependencies on suspended processes?
>> I think there's a semantic misunderstanding here.  If I understand Rik's
>> proposal right, he's not talking about completely suspending a process ala
>> SIGSTOP.  He's talking about removing it from the run queue for some small
>> length of time (ie a few seconds, probably) during which all the other
>> processes can make progress.
>
>Yes, I also didn't mean deadlocks in the classical sense; that is the
>reason I put it in quotes. The issue is the unexpected, potentially huge
>communication latencies between processes/threads or between user and
>system. App developers do write code taking load/latency into account,
>but not with the possibility in mind that some of their processes/threads
>can get suspended for indeterminate intervals from time to time.

If some part of the multi-threaded/multi-process system overloads the
system to the point of thrashing, it has already failed, and is likely
to encounter a SIGKILL from the sysadmin - if and when the sysadmin is
able to issue a SIGKILL...

>> This kind of suspension won't be noticeable to users/administrators
>> or permanently block dependent processes.  In fact, it should make
>> the system appear more responsive than one in a thrashing state.
>
>With occasionally suspended X, sshd, etc, etc, etc ;)

If sshd blows up to the point of getting suspended, it's already gone
wrong... Suspending X could happen, and would be a *GOOD* thing under
the circumstances: it would then enable you to kill the rogue
process(es) on a virtual console or network login.


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:29                     ` Szabolcs Szakacsits
  2001-04-20 11:50                       ` Jonathan Morton
@ 2001-04-22 10:21                       ` James A. Sutherland
  1 sibling, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 10:21 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Rik van Riel, Dave McCracken, linux-mm

On Fri, 20 Apr 2001 14:29:57 +0200 (MET DST), you wrote:

>
>On Thu, 19 Apr 2001, Rik van Riel wrote:
>> Actually, this idea must have been in Unix since about
>> Bell Labs v5 Unix, possibly before.
>
>That was when people were happy just to be able to sit down in front
>of a computer. But the world has changed since then. Users'
>expectations are much higher,

Hrm. How do you reconcile that with increasing use of Windows? :-)

>they want
>[among others] latency 

They want latency?! Just put them on a BT Internet connection then...

>and high availability.

Yes - which requires strangling rogue processes before they can take
the box out...

>> This is not a new idea, it's an old solution to an old
>> problem; it even seems to work quite well.
>
>Seems to work for whom? AIX? "DON'T TOUCH IT!" I think HP-UX has it as
>well, and HP-UX is not famous for its stability. Sure, not because of
>this feature alone, but maybe it sometimes contributes, maybe the design
>contributes, maybe the designers contribute.

Compared to other "desktop" OSs, Linux is excellent - compared to many
"commercial" Unixes, it still has weak points. It's rapidly improving,
but don't go thinking there is nothing to be learned from other, far
more mature, platforms...


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-23  5:55                                   ` James A. Sutherland
@ 2001-04-23  5:59                                     ` Rik van Riel
  0 siblings, 0 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-23  5:59 UTC (permalink / raw)
  To: James A.Sutherland; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Mon, 23 Apr 2001, James A.Sutherland wrote:

> >Now, I suspect you guys have been thinking "hey, he's going to give
> >processes memory *proportionate* to their working sets, which doesn't
> >work!" - well, I realised early on it wasn't going to work that way.  :)
> 
> You seem to be creeping subtly towards process suspension :)

For every simple, working solution there must be at least 3
complex grand schemes that almost work ;)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 22:26                                 ` Jonathan Morton
@ 2001-04-23  5:55                                   ` James A. Sutherland
  2001-04-23  5:59                                     ` Rik van Riel
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-23  5:55 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Rik van Riel, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 23:26:37 +0100, you wrote:

>>> We've crossed wires here: I know that's how the suspension approach
>>> works, I'm talking about the "working set" approach - which to me,
>>> sounds more likely to give both processes 50Mb each, and spend the
>>> next six weeks grinding the disks to powder!
>>
>>Indeed, in this case the working set approach won't work.
>
>Going back to my description of my algorithm from a few days ago, it
>selects *one* process at a time to penalise.  If processes are not
>re-ordered and remain with the same-sized working set, it will ensure that
>one of the large processes remains fully resident and runs to completion

"remains"? Neither process was able to get 100Mb of RAM; one got 75Mb,
the other got 25Mb. They are both now thrashing, and will continue
until the disks melt.

If you "penalise" one process, you are effectively suspending it - but
in a way that wastes CPU time and I/O bandwidth. Why bother?

>(as I described).  Thus the period in which the disks get churned is quite
>short.  When combined with suspension, the intensity of disk activity would
>also be reduced.
>
>Of course, if the working set of the swapped-out process decreases (as a
>result of being swapped out and/or suspended), it will eventually come off
>the penalised list and replace the resident one.  It is important to keep
>the period over which the working set is calculated fairly long, to
>minimise the frequency of oscillations resulting from this effect.  My
>algorithm takes this into account as well, with the period being
>approximately 5.5 minutes on 100Hz hardware.
>
>If further processes come in, increasing the working set further beyond the
>system limits, my algorithm selects another *single* process at a time to
>add to the penalised list.  This ensures that at any time, the maximum
>amount of physical memory is utilised by processes which are not subject to
>suspension or thrashing.

Your "penalised" processes are thrashing anyway. They might as well be
suspended, freeing up system resources which are otherwise wasted.

>Now, I suspect you guys have been thinking "hey, he's going to give
>processes memory *proportionate* to their working sets, which doesn't
>work!" - well, I realised early on it wasn't going to work that way.  :)

You seem to be creeping subtly towards process suspension :)


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 21:26                               ` Rik van Riel
@ 2001-04-22 22:26                                 ` Jonathan Morton
  2001-04-23  5:55                                   ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-22 22:26 UTC (permalink / raw)
  To: Rik van Riel, James A.Sutherland; +Cc: Joseph A. Knapka, linux-mm

>> We've crossed wires here: I know that's how the suspension approach
>> works, I'm talking about the "working set" approach - which to me,
>> sounds more likely to give both processes 50Mb each, and spend the
>> next six weeks grinding the disks to powder!
>
>Indeed, in this case the working set approach won't work.

Going back to my description of my algorithm from a few days ago, it
selects *one* process at a time to penalise.  If processes are not
re-ordered and remain with the same-sized working set, it will ensure that
one of the large processes remains fully resident and runs to completion
(as I described).  Thus the period in which the disks get churned is quite
short.  When combined with suspension, the intensity of disk activity would
also be reduced.

Of course, if the working set of the swapped-out process decreases (as a
result of being swapped out and/or suspended), it will eventually come off
the penalised list and replace the resident one.  It is important to keep
the period over which the working set is calculated fairly long, to
minimise the frequency of oscillations resulting from this effect.  My
algorithm takes this into account as well, with the period being
approximately 5.5 minutes on 100Hz hardware.

If further processes come in, increasing the working set further beyond the
system limits, my algorithm selects another *single* process at a time to
add to the penalised list.  This ensures that at any time, the maximum
amount of physical memory is utilised by processes which are not subject to
suspension or thrashing.

Now, I suspect you guys have been thinking "hey, he's going to give
processes memory *proportionate* to their working sets, which doesn't
work!" - well, I realised early on it wasn't going to work that way.  :)

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 20:58                             ` James A. Sutherland
@ 2001-04-22 21:26                               ` Rik van Riel
  2001-04-22 22:26                                 ` Jonathan Morton
  0 siblings, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-22 21:26 UTC (permalink / raw)
  To: James A.Sutherland; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001, James A.Sutherland wrote:
> On Sun, 22 Apr 2001 17:41:41 -0300 (BRST), you wrote:
> >On Sun, 22 Apr 2001, James A.Sutherland wrote:
> >
> >> >>How exactly will your approach solve the two process case, yet still
> >> >>keeping the processes running properly?
> >> >
> >> >It will allocate one process its entire working set in physical RAM,
> >> 
> >> Which one?
> >
> >A random one. And after some time you switch, suspending the
> >first process and letting the other one run.
> 
> We've crossed wires here: I know that's how the suspension approach
> works, I'm talking about the "working set" approach - which to me,
> sounds more likely to give both processes 50Mb each, and spend the
> next six weeks grinding the disks to powder!

Indeed, in this case the working set approach won't work.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 20:41                           ` Rik van Riel
@ 2001-04-22 20:58                             ` James A. Sutherland
  2001-04-22 21:26                               ` Rik van Riel
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 20:58 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 17:41:41 -0300 (BRST), you wrote:

>On Sun, 22 Apr 2001, James A.Sutherland wrote:
>
>> >>How exactly will your approach solve the two process case, yet still
>> >>keeping the processes running properly?
>> >
>> >It will allocate one process its entire working set in physical RAM,
>> 
>> Which one?
>
>A random one. And after some time you switch, suspending the
>first process and letting the other one run.

We've crossed wires here: I know that's how the suspension approach
works, I'm talking about the "working set" approach - which to me,
sounds more likely to give both processes 50Mb each, and spend the
next six weeks grinding the disks to powder!

>Note that I have code for this on my system here, I'll put it
>online soon.

Cool - I'll finally be able to open files in Acrobat Reader without
having one finger on the reset button :-)


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 20:35                         ` James A. Sutherland
@ 2001-04-22 20:41                           ` Rik van Riel
  2001-04-22 20:58                             ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-22 20:41 UTC (permalink / raw)
  To: James A.Sutherland; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001, James A.Sutherland wrote:

> >>How exactly will your approach solve the two process case, yet still
> >>keeping the processes running properly?
> >
> >It will allocate one process its entire working set in physical RAM,
> 
> Which one?

A random one. And after some time you switch, suspending the
first process and letting the other one run.

Note that I have code for this on my system here, I'll put it
online soon.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 20:21                       ` Jonathan Morton
@ 2001-04-22 20:36                         ` Jonathan Morton
  0 siblings, 0 replies; 85+ messages in thread
From: Jonathan Morton @ 2001-04-22 20:36 UTC (permalink / raw)
  To: Rik van Riel; +Cc: James A. Sutherland, Joseph A. Knapka, linux-mm

>>1) a minimal guaranteed working set for small processes, so root
>>   can login and large hogs don't penalize good guys
>>   (simpler than the working set idea, should work just as well)
>
>This is also worth considering, perhaps as a subset of the working-set
>algorithm.
>
>I'm looking at sources, trying to figure out how to implement this kind of
>thing...  but is there an easy way to find out what process(es) is/are
>using a given page?  I'm talking about the page-replacement policy, of
>course, where (current) is no help in this matter.

Oh, never mind, I found vmscan.c and the scanning sequence works in just
the way I need it to anyway.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 19:11                       ` Rik van Riel
@ 2001-04-22 20:36                         ` James A. Sutherland
  0 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 20:36 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 16:11:36 -0300 (BRST), you wrote:

>On Sun, 22 Apr 2001, James A.Sutherland wrote:
>
>> >But login was suspended because of a page fault,
>> 
>> No, login was NOT *suspended*. It's sleeping on I/O, not suspended.
>> 
>> > so potentially it will
>> >*also* get suspended for just as long as the hogs.  
>> 
>> No, it will get CPU time a small fraction of a second later, once the
>> I/O completes.
>
>You're assuming login won't have the rest of its memory (which
>it needs to do certain things) swapped out again in the time
>it waits for this page to be swapped in...
>
>... which is exactly what happens when the system is thrashing.

Except that we aren't thrashing, because the memory hog processes have
been suspended by this point and so we do have enough memory free for
login!


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 19:30                       ` Jonathan Morton
@ 2001-04-22 20:35                         ` James A. Sutherland
  2001-04-22 20:41                           ` Rik van Riel
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 20:35 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 20:30:50 +0100, you wrote:

>>>>>- Login needs paging in (is suspended while it waits).
>
>>>But login was suspended because of a page fault,
>>
>>No, login was NOT *suspended*. It's sleeping on I/O, not suspended.
>
>So, the memory hogs are causing page faults (accessing memory which is not
>currently resident), login is causing page faults (same definition).
>What's the difference?

The number of page faults, and the size of the process. One is a huge process
generating large numbers of page faults over a period of time,
contributing a large amount to the VM load.

>>>>Not really. Your WS suggestion doesn't evict some processes entirely,
>>>>which is necessary under some workloads.
>>>
>>>Can you give an example of such a workload?
>>
>>Example: any process which is doing random access throughout an array
>>in memory. Let's suppose it's a 100 Mb array on a machine with 128Mb
>>of RAM.
>
>>How exactly will your approach solve the two process case, yet still
>>keeping the processes running properly?
>
>It will allocate one process its entire working set in physical RAM,

Which one?

>and
>allow the other to make progress as fast as disk I/O will allow (which I
>would call "single-process thrashing").  

So you effectively "busy-suspend" the other process - it's going
nowhere, but eating I/O capacity as it does so.

>When, after a few seconds, the
>entirely-resident process completes, the other is allowed to take up as
>much RAM as it likes.

i.e. it resumes proper execution.

>If I've followed my mental dry-run correctly, the entirely-resident process
>would probably be the *second* process to be started, assuming both are
>identical, one is started a few scheduling cycles after the other, and the
>first process establishes its 100Mb working set within those few cycles.
>
>If, at this point, your suspension algorithm decided to suspend the
>(mostly) swapped-out process for a few brief periods of time, it would have
>little effect except maybe to slightly delay the resumption of progress of
>the swapped-out process and to reduce the amount of disk I/O caused while
>the first process ran to completion.

If you truly allow the second process to be starved entirely of
memory, yes. At which point, it's suspended, and you've just copied my
solution (and Rik's, and that used by a dozen other Unices.)

>>>I think we're approaching the problem from opposite viewpoints.  Don't get
>>>me wrong here - I think process suspension could be a valuable "feature"
>>>under extreme load, but I think that the working-set idea will perform
>>>better and more consistently under "mild overloads", which the current
>>>system handles extremely poorly.  Probably the only way to resolve this
>>>argument is to actually try and implement each idea, and see how they
>>>perform.
>>
>>Since the two are not mutually exclusive, why try "comparing" them?
>>Returning to our car analogy, would you try "comparing" snow chains
>>with diff-lock?!
>
>I said nothing about comparison or competition.  By "each idea" I *include*
>the possibility of having the suspension algorithm *and* the working-set
>algorithm implemented simultaneously.  It would be instructive to see how
>they performed separately, too.

Perhaps, but they tackle different problems.


James.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 19:41                       ` James A. Sutherland
@ 2001-04-22 20:33                         ` Jean Francois Martinez
  0 siblings, 0 replies; 85+ messages in thread
From: Jean Francois Martinez @ 2001-04-22 20:33 UTC (permalink / raw)
  To: James A. Sutherland
  Cc: Rik van Riel, Jonathan Morton, Joseph A. Knapka, linux-mm

"James A. Sutherland" wrote:

> On Sun, 22 Apr 2001 15:57:32 -0300 (BRST), you wrote:
>
> >On Sun, 22 Apr 2001, Jonathan Morton wrote:
> >
> >> I think we're approaching the problem from opposite viewpoints.
> >> Don't get me wrong here - I think process suspension could be a
> >> valuable "feature" under extreme load, but I think that the
> >> working-set idea will perform better and more consistently under "mild
> >> overloads", which the current system handles extremely poorly.
> >
> >Could this mean that we might want _both_ ?
>
> Absolutely, as I said elsewhere.
>
> >1) a minimal guaranteed working set for small processes, so root
> >   can login and large hogs don't penalize good guys
> >   (simpler than the working set idea, should work just as good)
>
> Yep - this will help us under heavy load conditions, when the system
> starts getting "sluggish"...
>
> >2) load control through process suspension when the load gets
> >   too high to handle, this is also good to let the hogs (which
> >   would thrash with the working set idea) make some progress
> >   in turns
>
> Exactly!
>
>

I find this funny because I suggested that idea in 1996, after the 2.0
release.  I even gave an example (with an 8 MB box, how times change :-)
of a situation that could be handled only by stopping a process: two
processes that each touch 5 MB of memory in 1 ms (they are scanning
an array).  Since your average disk needs some 20 ms to retrieve a
page, both processes will spend nearly 100% of their time waiting
for pages that have been stolen by the other, so the only way out is to
stop or swap out one of them and let the other run alone for some time.
But at the time I was told this was a feature for high loads, and Linux
was not being used there.
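[A quick sanity check of the arithmetic in that example can be sketched as
follows; the 4 KB page size and the 20 ms disk latency are illustrative
assumptions taken from the figures in the message, not measurements:]

```python
# Back-of-envelope model of the scenario above: a process sweeps 5 MB
# in 1 ms while resident, but every page it touches has been stolen by
# the competing process and costs one ~20 ms disk retrieval.

PAGE_SIZE = 4096                    # bytes per page (assumed)
SWEEP_BYTES = 5 * 1024 * 1024       # 5 MB touched per burst
SWEEP_TIME_RESIDENT = 1e-3          # 1 ms when fully resident
DISK_LATENCY = 20e-3                # ~20 ms to retrieve one page

pages = SWEEP_BYTES // PAGE_SIZE                  # pages per burst
compute_per_page = SWEEP_TIME_RESIDENT / pages    # useful work per page
wait_fraction = DISK_LATENCY / (DISK_LATENCY + compute_per_page)

print(f"pages per burst: {pages}")
print(f"fraction of time waiting on disk: {wait_fraction:.5f}")
```

With these numbers the process spends well over 99.9% of its wall-clock
time waiting on the disk, which is the "nearly 100%" claimed above.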

BTW this idea has been implemented in mainframes since the 60s.

Another idea from mainframes is that some processes can be swapped out
because you know they will be sleeping for a long time.  The 3270
interface only interacts with the mainframe when the user hits the
Enter key, and in full-screen mode that is when he has filled a whole
page of text.  That means that when a process enters keyboard sleep it
will probably remain in that state for several minutes, so when MVS
needs memory it looks for TSO (interactive) processes in keyboard
sleep, swaps them out first and asks questions later.
Of course Unix has a different UI, and I don't see a sleep class where
we can assume programs will be sleeping for minutes.


                                    JFM



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 18:57                     ` Rik van Riel
  2001-04-22 19:41                       ` James A. Sutherland
@ 2001-04-22 20:21                       ` Jonathan Morton
  2001-04-22 20:36                         ` Jonathan Morton
  1 sibling, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-22 20:21 UTC (permalink / raw)
  To: Rik van Riel; +Cc: James A. Sutherland, Joseph A. Knapka, linux-mm

>1) a minimal guaranteed working set for small processes, so root
>   can login and large hogs don't penalize good guys
>   (simpler than the working set idea, should work just as good)

This is also worth considering, perhaps as a subset of the working-set
algorithm.

I'm looking at sources, trying to figure out how to implement this kind of
thing...  but is there an easy way to find out what process(es) is/are
using a given page?  I'm talking about the page-replacement policy, of
course, where (current) is no help in this matter.
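[The structure being asked for here is a reverse mapping from physical
pages back to the processes mapping them; a toy sketch of that idea, with
all names illustrative rather than kernel API, might look like:]

```python
# Toy reverse map: for each physical page frame number (pfn), remember
# which pids have it mapped. The VM of that era kept only the forward
# mapping (page tables: virtual address -> page), so answering "who is
# using this page?" meant scanning every process's page tables.

from collections import defaultdict

class ReverseMap:
    def __init__(self):
        self._owners = defaultdict(set)   # pfn -> set of pids

    def map_page(self, pfn, pid):
        self._owners[pfn].add(pid)

    def unmap_page(self, pfn, pid):
        self._owners[pfn].discard(pid)
        if not self._owners[pfn]:
            del self._owners[pfn]         # drop empty entries

    def owners(self, pfn):
        return set(self._owners.get(pfn, ()))

rmap = ReverseMap()
rmap.map_page(0x1A2, pid=100)   # a page shared by two processes
rmap.map_page(0x1A2, pid=200)
rmap.map_page(0x3F0, pid=100)
```

In exchange for answering the question in O(1), every map and unmap has
to pay the bookkeeping cost, which is the classic rmap trade-off.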

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 18:57                     ` Rik van Riel
@ 2001-04-22 19:41                       ` James A. Sutherland
  2001-04-22 20:33                         ` Jean Francois Martinez
  2001-04-22 20:21                       ` Jonathan Morton
  1 sibling, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 19:41 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 15:57:32 -0300 (BRST), you wrote:

>On Sun, 22 Apr 2001, Jonathan Morton wrote:
>
>> I think we're approaching the problem from opposite viewpoints.  
>> Don't get me wrong here - I think process suspension could be a
>> valuable "feature" under extreme load, but I think that the
>> working-set idea will perform better and more consistently under "mild
>> overloads", which the current system handles extremely poorly.  
>
>Could this mean that we might want _both_ ?

Absolutely, as I said elsewhere.

>1) a minimal guaranteed working set for small processes, so root
>   can login and large hogs don't penalize good guys
>   (simpler than the working set idea, should work just as good)

Yep - this will help us under heavy load conditions, when the system
starts getting "sluggish"...

>2) load control through process suspension when the load gets
>   too high to handle, this is also good to let the hogs (which
>   would thrash with the working set idea) make some progress
>   in turns

Exactly!


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 19:01                     ` James A. Sutherland
  2001-04-22 19:11                       ` Rik van Riel
@ 2001-04-22 19:30                       ` Jonathan Morton
  2001-04-22 20:35                         ` James A. Sutherland
  1 sibling, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-22 19:30 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Joseph A. Knapka, linux-mm

>>>>- Login needs paging in (is suspended while it waits).

>>But login was suspended because of a page fault,
>
>No, login was NOT *suspended*. It's sleeping on I/O, not suspended.

So, the memory hogs are causing page faults (accessing memory which is not
currently resident), login is causing page faults (same definition).
What's the difference?

>>>Not really. Your WS suggestion doesn't evict some processes entirely,
>>>which is necessary under some workloads.
>>
>>Can you give an example of such a workload?
>
>Example: any process which is doing random access throughout an array
>in memory. Let's suppose it's a 100 MB array on a machine with 128 MB
>of RAM.

>How exactly will your approach solve the two process case, yet still
>keeping the processes running properly?

It will allocate one process its entire working set in physical RAM, and
allow the other to make progress as fast as disk I/O will allow (which I
would call "single-process thrashing").  When, after a few seconds, the
entirely-resident process completes, the other is allowed to take up as
much RAM as it likes.

If I've followed my mental dry-run correctly, the entirely-resident process
would probably be the *second* process to be started, assuming both are
identical, one is started a few scheduling cycles after the other, and the
first process establishes its 100 MB working set within those few cycles.

If, at this point, your suspension algorithm decided to suspend the
(mostly) swapped-out process for a few brief periods of time, it would have
little effect except maybe to slightly delay the resumption of progress of
the swapped-out process and to reduce the amount of disk I/O caused while
the first process ran to completion.

>>I think we're approaching the problem from opposite viewpoints.  Don't get
>>me wrong here - I think process suspension could be a valuable "feature"
>>under extreme load, but I think that the working-set idea will perform
>>better and more consistently under "mild overloads", which the current
>>system handles extremely poorly.  Probably the only way to resolve this
>>argument is to actually try and implement each idea, and see how they
>>perform.
>
>Since the two are not mutually exclusive, why try "comparing" them?
>Returning to our car analogy, would you try "comparing" snow chains
>with diff-lock?!

I said nothing about comparison or competition.  By "each idea" I *include*
the possibility of having the suspension algorithm *and* the working-set
algorithm implemented simultaneously.  It would be instructive to see how
they performed separately, too.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 19:01                     ` James A. Sutherland
@ 2001-04-22 19:11                       ` Rik van Riel
  2001-04-22 20:36                         ` James A. Sutherland
  2001-04-22 19:30                       ` Jonathan Morton
  1 sibling, 1 reply; 85+ messages in thread
From: Rik van Riel @ 2001-04-22 19:11 UTC (permalink / raw)
  To: James A.Sutherland; +Cc: Jonathan Morton, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001, James A.Sutherland wrote:

> >But login was suspended because of a page fault,
> 
> No, login was NOT *suspended*. It's sleeping on I/O, not suspended.
> 
> > so potentially it will
> >*also* get suspended for just as long as the hogs.  
> 
> No, it will get CPU time a small fraction of a second later, once the
> I/O completes.

You're assuming login won't have the rest of its memory (which
it needs to do certain things) swapped out again in the time
it waits for this page to be swapped in...

... which is exactly what happens when the system is thrashing.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 18:18                   ` Jonathan Morton
  2001-04-22 18:57                     ` Rik van Riel
@ 2001-04-22 19:01                     ` James A. Sutherland
  2001-04-22 19:11                       ` Rik van Riel
  2001-04-22 19:30                       ` Jonathan Morton
  1 sibling, 2 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 19:01 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 19:18:19 +0100, you wrote:

>>>No, it doesn't.  If we stick with the current page-replacement policy, then
>>>regardless of what we do with the size of the timeslice, there is always
>>>going to be the following situation:
>>
>>This is not just a case of increasing the timeslice: the suspension
>>strategy avoids the penultimate stage of this list happening:
>>
>>>- Large process(es) are thrashing.
>>>- Login needs paging in (is suspended while it waits).
>>>- Each large process gets its page and is resumed, but immediately page
>>>faults again, gets suspended
>>>- Memory reserved for Login gets paged out before Login can do any useful
>>>work
>>
>>Except suspended processes do not get scheduled for a couple of
>>seconds, meaning login CAN do useful work.
>
>But login was suspended because of a page fault,

No, login was NOT *suspended*. It's sleeping on I/O, not suspended.

> so potentially it will
>*also* get suspended for just as long as the hogs.  

No, it will get CPU time a small fraction of a second later, once the
I/O completes.

>Unless, of course, the
>suspension time is increased with page fault count per process.

The suspension time is irrelevant to login.

>>Not really. Your WS suggestion doesn't evict some processes entirely,
>>which is necessary under some workloads.
>
>Can you give an example of such a workload?

Example: any process which is doing random access throughout an array
in memory. Let's suppose it's a 100 MB array on a machine with 128 MB
of RAM.

One process running: array in RAM, completes in seconds.

Two processes, no suspension: half the array on disk, both complete in
days.

Two processes, suspension: complete in a little more than twice the
time for one.

How exactly will your approach solve the two process case, yet still
keeping the processes running properly?
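[The seconds-versus-days gap in this example can be sketched with a rough
expected-cost model; the ~100 ns hit cost and ~20 ms fault cost are
illustrative assumptions, not figures from the thread:]

```python
# Rough model of the workload above: uniformly random access over a
# 100 MB array. A resident access is cheap; a page fault costs a disk
# round trip. Two hogs splitting RAM each fault on ~half their accesses.

ARRAY = 100 * 2**20          # 100 MB array
RAM_FOR_DATA = 100 * 2**20   # roughly what 128 MB leaves for user data

def expected_access_time(resident_bytes, array_bytes=ARRAY,
                         hit=100e-9, miss=20e-3):
    """Expected cost of one uniformly random access into the array."""
    p_miss = max(0.0, 1.0 - resident_bytes / array_bytes)
    return (1.0 - p_miss) * hit + p_miss * miss

alone = expected_access_time(RAM_FOR_DATA)        # one process, resident
shared = expected_access_time(RAM_FOR_DATA // 2)  # two hogs split RAM

slowdown = shared / alone
print(f"per-access slowdown when sharing: {slowdown:.0f}x")
```

A slowdown on the order of 10^5 is what turns a seconds-long run into
days, whereas running the two processes sequentially (by suspension)
merely doubles the total time.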

>>Distributing "fairly" is sub-optimal: sequential suspension and
>>resumption of each memory hog will yield far better performance. To
>>the extent some workloads fail with your approach but succeed with
>>mine: if a process needs more than the current working-set in RAM to
>>make progress, your suggestion leaves each process spinning, taking up
>>resources.
>
>I think we're approaching the problem from opposite viewpoints.  Don't get
>me wrong here - I think process suspension could be a valuable "feature"
>under extreme load, but I think that the working-set idea will perform
>better and more consistently under "mild overloads", which the current
>system handles extremely poorly.  Probably the only way to resolve this
>argument is to actually try and implement each idea, and see how they
>perform.

Since the two are not mutually exclusive, why try "comparing" them?
Returning to our car analogy, would you try "comparing" snow chains
with diff-lock?!


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 18:18                   ` Jonathan Morton
@ 2001-04-22 18:57                     ` Rik van Riel
  2001-04-22 19:41                       ` James A. Sutherland
  2001-04-22 20:21                       ` Jonathan Morton
  2001-04-22 19:01                     ` James A. Sutherland
  1 sibling, 2 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-22 18:57 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: James A. Sutherland, Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001, Jonathan Morton wrote:

> I think we're approaching the problem from opposite viewpoints.  
> Don't get me wrong here - I think process suspension could be a
> valuable "feature" under extreme load, but I think that the
> working-set idea will perform better and more consistently under "mild
> overloads", which the current system handles extremely poorly.  

Could this mean that we might want _both_ ?

1) a minimal guaranteed working set for small processes, so root
   can login and large hogs don't penalize good guys
   (simpler than the working set idea, should work just as good)

2) load control through process suspension when the load gets
   too high to handle, this is also good to let the hogs (which
   would thrash with the working set idea) make some progress
   in turns

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 17:06                 ` James A. Sutherland
@ 2001-04-22 18:18                   ` Jonathan Morton
  2001-04-22 18:57                     ` Rik van Riel
  2001-04-22 19:01                     ` James A. Sutherland
  0 siblings, 2 replies; 85+ messages in thread
From: Jonathan Morton @ 2001-04-22 18:18 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Joseph A. Knapka, linux-mm

>>No, it doesn't.  If we stick with the current page-replacement policy, then
>>regardless of what we do with the size of the timeslice, there is always
>>going to be the following situation:
>
>This is not just a case of increasing the timeslice: the suspension
>strategy avoids the penultimate stage of this list happening:
>
>>- Large process(es) are thrashing.
>>- Login needs paging in (is suspended while it waits).
>>- Each large process gets its page and is resumed, but immediately page
>>faults again, gets suspended
>>- Memory reserved for Login gets paged out before Login can do any useful
>>work
>
>Except suspended processes do not get scheduled for a couple of
>seconds, meaning login CAN do useful work.

But login was suspended because of a page fault, so potentially it will
*also* get suspended for just as long as the hogs.  Unless, of course, the
suspension time is increased with page fault count per process.

>Not really. Your WS suggestion doesn't evict some processes entirely,
>which is necessary under some workloads.

Can you give an example of such a workload?

>Distributing "fairly" is sub-optimal: sequential suspension and
>resumption of each memory hog will yield far better performance. To
>the extent some workloads fail with your approach but succeed with
>mine: if a process needs more than the current working-set in RAM to
>make progress, your suggestion leaves each process spinning, taking up
>resources.

I think we're approaching the problem from opposite viewpoints.  Don't get
me wrong here - I think process suspension could be a valuable "feature"
under extreme load, but I think that the working-set idea will perform
better and more consistently under "mild overloads", which the current
system handles extremely poorly.  Probably the only way to resolve this
argument is to actually try and implement each idea, and see how they
perform.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 16:53               ` Jonathan Morton
@ 2001-04-22 17:06                 ` James A. Sutherland
  2001-04-22 18:18                   ` Jonathan Morton
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 17:06 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Joseph A. Knapka, linux-mm

On Sun, 22 Apr 2001 17:53:05 +0100, you wrote:

>>>That might possibly work for some loads, mostly where there are some
>>>processes which are already swapped-in (and have sensible working sets)
>>>alongside the "thrashing" processes.  That would at least give the
>>>well-behaved processes some chance to keep their "active" bits up to date.
>>
>>The trouble is, thrashing isn't really a process level issue: yes,
>>there are a group of processes causing it, but you don't have
>>"thrashing processes" and "non-thrashing processes". Like a car with
>>one wheel stuck in a pool of mud without a diff-lock: yes, you have
>>one or two point(s) where all your engine power is going, and the
>>other wheels are just spinning, but as a result the whole car is going
>>nowhere! In both cases, the answer is to "starve" the spinning
>>wheel(s) of power, allowing the others to pull you out...
>
>Actually, that's not quite how a diff-lock works - it distributes tractive
>effort equally across all four wheels, rather than simply locking a single
>wheel.  You don't get out of a mud puddle by (effectively) braking one
>wheel.

If it's stuck in mud, spinning freely, a diff-lock WILL actually mean
(almost) no power goes to that wheel: it just rotates at the same
speed as the others, with no power being exerted.

>>>However, it doesn't help at all for the cases where some paging-in has to
>>>be done for a well-behaved but only-just-accessed process.
>>
>>Yes it does: we've suspended the runaway process (Netscape, Acrobat
>>Reader, whatever), leaving enough RAM free for login to be paged in.
>
>No, it doesn't.  If we stick with the current page-replacement policy, then
>regardless of what we do with the size of the timeslice, there is always
>going to be the following situation:

This is not just a case of increasing the timeslice: the suspension
strategy avoids the penultimate stage of this list happening:

>- Large process(es) are thrashing.
>- Login needs paging in (is suspended while it waits).
>- Each large process gets its page and is resumed, but immediately page
>faults again, gets suspended
>- Memory reserved for Login gets paged out before Login can do any useful work

Except suspended processes do not get scheduled for a couple of
seconds, meaning login CAN do useful work.

>- Repeat ad infinitum.

Doesn't repeat, since login has succeeded.

>IOW, even with the current timeslice (which, BTW, depends on 'nice' value -
>setting the memory hogs to nice 19 and XMMS to nice -20 doesn't help), the
>timeslice limit is often never reached for a given process when the system
>is thrashing.  Increasing the timeslice will not help, except for processes
>which are already completely resident in memory.  Increasing the suspension
>time *might* help, provided pages newly swapped in get locked in for that
>time period.  Oh, wait a minute...  isn't that exactly what my working-set
>suggestion does?

Not really. Your WS suggestion doesn't evict some processes entirely,
which is necessary under some workloads.

>>>Example of a
>>>critically important process under this category: LOGIN.  :)  IMHO, the
>>>only way to sensibly cater for this case (and a few others) is to update
>>>the page-replacement algorithm.
>>
>>Updating the page replacement algorithm will help, but our core
>>problem remains: we don't have enough pages for the currently active
>>processes! Either we starve SOME processes, or we starve all of
>>them...
>
>Or we distribute the "tractive effort" (physical RAM) equally (or fairly)
>among them, just like the diff-lock you so helpfully mentioned.  :)  A 4x4
>vehicle doesn't perform optimally when the diff-lock is applied, but it's
>certainly an improvement in the case where one wheel would otherwise spin
>uselessly.

Distributing "fairly" is sub-optimal: sequential suspension and
resumption of each memory hog will yield far better performance. To
the extent some workloads fail with your approach but succeed with
mine: if a process needs more than the current working-set in RAM to
make progress, your suggestion leaves each process spinning, taking up
resources.

>Right now, the page-replacement policy simply finds a page it "can" swap
>out, and pays only cursory attention to whether it's actually in use.  I
>firmly believe it's well worth spending a little more effort there to
>reduce the amount of swapping required for a given VM load, especially if
>it means that Linux gets more stable under such loads.  Piddling around
>with the scheduler won't do that, although it might help with pathological
>loads *iff* we get a better pager.

On the contrary: tweaking page-replacement will probably help in most
cases, but won't solve any pathological case.


James.

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-22 10:08             ` James A. Sutherland
@ 2001-04-22 16:53               ` Jonathan Morton
  2001-04-22 17:06                 ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-22 16:53 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Joseph A. Knapka, linux-mm

>>That might possibly work for some loads, mostly where there are some
>>processes which are already swapped-in (and have sensible working sets)
>>alongside the "thrashing" processes.  That would at least give the
>>well-behaved processes some chance to keep their "active" bits up to date.
>
>The trouble is, thrashing isn't really a process level issue: yes,
>there are a group of processes causing it, but you don't have
>"thrashing processes" and "non-thrashing processes". Like a car with
>one wheel stuck in a pool of mud without a diff-lock: yes, you have
>one or two point(s) where all your engine power is going, and the
>other wheels are just spinning, but as a result the whole car is going
>nowhere! In both cases, the answer is to "starve" the spinning
>wheel(s) of power, allowing the others to pull you out...

Actually, that's not quite how a diff-lock works - it distributes tractive
effort equally across all four wheels, rather than simply locking a single
wheel.  You don't get out of a mud puddle by (effectively) braking one
wheel.

>>However, it doesn't help at all for the cases where some paging-in has to
>>be done for a well-behaved but only-just-accessed process.
>
>Yes it does: we've suspended the runaway process (Netscape, Acrobat
>Reader, whatever), leaving enough RAM free for login to be paged in.

No, it doesn't.  If we stick with the current page-replacement policy, then
regardless of what we do with the size of the timeslice, there is always
going to be the following situation:

- Large process(es) are thrashing.
- Login needs paging in (is suspended while it waits).
- Each large process gets its page and is resumed, but immediately page
faults again, gets suspended
- Memory reserved for Login gets paged out before Login can do any useful work
- Repeat ad infinitum.

IOW, even with the current timeslice (which, BTW, depends on 'nice' value -
setting the memory hogs to nice 19 and XMMS to nice -20 doesn't help), the
timeslice limit is often never reached for a given process when the system
is thrashing.  Increasing the timeslice will not help, except for processes
which are already completely resident in memory.  Increasing the suspension
time *might* help, provided pages newly swapped in get locked in for that
time period.  Oh, wait a minute...  isn't that exactly what my working-set
suggestion does?

>>Example of a
>>critically important process under this category: LOGIN.  :)  IMHO, the
>>only way to sensibly cater for this case (and a few others) is to update
>>the page-replacement algorithm.
>
>Updating the page replacement algorithm will help, but our core
>problem remains: we don't have enough pages for the currently active
>processes! Either we starve SOME processes, or we starve all of
>them...

Or we distribute the "tractive effort" (physical RAM) equally (or fairly)
among them, just like the diff-lock you so helpfully mentioned.  :)  A 4x4
vehicle doesn't perform optimally when the diff-lock is applied, but it's
certainly an improvement in the case where one wheel would otherwise spin
uselessly.

Right now, the page-replacement policy simply finds a page it "can" swap
out, and pays only cursory attention to whether it's actually in use.  I
firmly believe it's well worth spending a little more effort there to
reduce the amount of swapping required for a given VM load, especially if
it means that Linux gets more stable under such loads.  Piddling around
with the scheduler won't do that, although it might help with pathological
loads *iff* we get a better pager.

Right now, I'm going to look at how my working-set algorithm could actually
be implemented in the kernel, starting with my detailed suggestion of the
other day.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-21 19:16         ` Joseph A. Knapka
  2001-04-21 19:41           ` Jonathan Morton
  2001-04-21 20:29           ` Rik van Riel
@ 2001-04-22 10:08           ` James A. Sutherland
  2 siblings, 0 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 10:08 UTC (permalink / raw)
  To: Joseph A. Knapka; +Cc: linux-mm

On Sat, 21 Apr 2001 13:16:56 -0600, you wrote:

>"James A. Sutherland" wrote:
>> 
>> Note that process suspension already happens, but with too fine a
>> granularity (the scheduler) - that's what causes the problem. If one
>> process were able to run uninterrupted for, say, a second, it would
>> get useful work done, then you could switch to another. The current
>> scheduling doesn't give enough time for that under thrashing
>> conditions.
>
>This suggests that a very simple approach might be to just increase
>the scheduling granularity as the machine begins to thrash. IOW,
>use the existing scheduler as the "suspension scheduler".

That's effectively what this approach does - the problem is, we need
to prevent this process being scheduled for some significant period of
time. I think just SIGSTOPing each process to be suspended is a more
elegant solution than trying to hack the scheduler to support "Don't
schedule this process for the next 5 seconds", though I'm not certain.


James.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-21 19:41           ` Jonathan Morton
@ 2001-04-22 10:08             ` James A. Sutherland
  2001-04-22 16:53               ` Jonathan Morton
  0 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-22 10:08 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Joseph A. Knapka, linux-mm

On Sat, 21 Apr 2001 20:41:40 +0100, you wrote:

>>> Note that process suspension already happens, but with too fine a
>>> granularity (the scheduler) - that's what causes the problem. If one
>>> process were able to run uninterrupted for, say, a second, it would
>>> get useful work done, then you could switch to another. The current
>>> scheduling doesn't give enough time for that under thrashing
>>> conditions.
>>
>>This suggests that a very simple approach might be to just increase
>>the scheduling granularity as the machine begins to thrash. IOW,
>>use the existing scheduler as the "suspension scheduler".
>
>That might possibly work for some loads, mostly where there are some
>processes which are already swapped-in (and have sensible working sets)
>alongside the "thrashing" processes.  That would at least give the
>well-behaved processes some chance to keep their "active" bits up to date.

The trouble is, thrashing isn't really a process level issue: yes,
there are a group of processes causing it, but you don't have
"thrashing processes" and "non-thrashing processes". It's like a car
with one wheel stuck in a pool of mud and no diff-lock: all the
engine power goes to the one or two spinning wheel(s), and as a
result the whole car is going nowhere! In both cases, the answer is
to "starve" the spinning wheel(s) of power, allowing the others to
pull you out...

>However, it doesn't help at all for the cases where some paging-in has to
>be done for a well-behaved but only-just-accessed process.  

Yes it does: we've suspended the runaway process (Netscape, Acrobat
Reader, whatever), leaving enough RAM free for login to be paged in.

>Example of a
>critically important process under this category: LOGIN.  :)  IMHO, the
>only way to sensibly cater for this case (and a few others) is to update
>the page-replacement algorithm.

Updating the page replacement algorithm will help, but our core
problem remains: we don't have enough pages for the currently active
processes! Either we starve SOME processes, or we starve all of
them...


James.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-21 19:16         ` Joseph A. Knapka
  2001-04-21 19:41           ` Jonathan Morton
@ 2001-04-21 20:29           ` Rik van Riel
  2001-04-22 10:08           ` James A. Sutherland
  2 siblings, 0 replies; 85+ messages in thread
From: Rik van Riel @ 2001-04-21 20:29 UTC (permalink / raw)
  To: Joseph A. Knapka; +Cc: James A. Sutherland, linux-mm

On Sat, 21 Apr 2001, Joseph A. Knapka wrote:
> "James A. Sutherland" wrote:
> > 
> > Note that process suspension already happens, but with too fine a
> > granularity (the scheduler) - that's what causes the problem. If one
> > process were able to run uninterrupted for, say, a second, it would
> > get useful work done, then you could switch to another. The current
> > scheduling doesn't give enough time for that under thrashing
> > conditions.
> 
> This suggests that a very simple approach might be to just increase
> the scheduling granularity as the machine begins to thrash. IOW,
> use the existing scheduler as the "suspension scheduler".

That doesn't work.  The CPU scheduler works on very small time
scales and won't run any process anyway when all of them are
waiting for IO.

What we want instead is a 2nd level scheduler which simply uses
the standard mechanisms of the kernel to temporarily suspend a
few processes on a LONGER timescale (multiple seconds) and makes
sure the normal scheduler doesn't even try to run them when all
the non-suspended processes are waiting for disk.

Btw, I have something like this almost working ...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-21 19:16         ` Joseph A. Knapka
@ 2001-04-21 19:41           ` Jonathan Morton
  2001-04-22 10:08             ` James A. Sutherland
  2001-04-21 20:29           ` Rik van Riel
  2001-04-22 10:08           ` James A. Sutherland
  2 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-21 19:41 UTC (permalink / raw)
  To: Joseph A. Knapka, James A. Sutherland; +Cc: linux-mm

>> Note that process suspension already happens, but with too fine a
>> granularity (the scheduler) - that's what causes the problem. If one
>> process were able to run uninterrupted for, say, a second, it would
>> get useful work done, then you could switch to another. The current
>> scheduling doesn't give enough time for that under thrashing
>> conditions.
>
>This suggests that a very simple approach might be to just increase
>the scheduling granularity as the machine begins to thrash. IOW,
>use the existing scheduler as the "suspension scheduler".

That might possibly work for some loads, mostly where there are some
processes which are already swapped-in (and have sensible working sets)
alongside the "thrashing" processes.  That would at least give the
well-behaved processes some chance to keep their "active" bits up to date.

However, it doesn't help at all for the cases where some paging-in has to
be done for a well-behaved but only-just-accessed process.  Example of a
critically important process under this category: LOGIN.  :)  IMHO, the
only way to sensibly cater for this case (and a few others) is to update
the page-replacement algorithm.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-21  5:49       ` James A. Sutherland
@ 2001-04-21 19:16         ` Joseph A. Knapka
  2001-04-21 19:41           ` Jonathan Morton
                             ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Joseph A. Knapka @ 2001-04-21 19:16 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: linux-mm

"James A. Sutherland" wrote:
> 
> Note that process suspension already happens, but with too fine a
> granularity (the scheduler) - that's what causes the problem. If one
> process were able to run uninterrupted for, say, a second, it would
> get useful work done, then you could switch to another. The current
> scheduling doesn't give enough time for that under thrashing
> conditions.

This suggests that a very simple approach might be to just increase
the scheduling granularity as the machine begins to thrash. IOW,
use the existing scheduler as the "suspension scheduler".

-- Joe
 

-- 
"If I ever get reincarnated... let me make certain I don't come back
 as a paperclip." -- protagonist, H Murakami's "Hard-boiled Wonderland"
// Linux MM Documentation in progress:
// http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
* Evolution is an "unproven theory" in the same sense that gravity is. *
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:14     ` Szabolcs Szakacsits
  2001-04-20 12:02       ` Jonathan Morton
  2001-04-20 14:48       ` Dave McCracken
@ 2001-04-21  5:49       ` James A. Sutherland
  2001-04-21 19:16         ` Joseph A. Knapka
  2 siblings, 1 reply; 85+ messages in thread
From: James A. Sutherland @ 2001-04-21  5:49 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Dave McCracken, linux-mm

On Fri, 20 Apr 2001 14:14:29 +0200 (MET DST), you wrote:
>On Thu, 19 Apr 2001, James A. Sutherland wrote:
>
>> That's my suspicion too: The "strangled" processes eat up system
>> resources and still get nowhere (no win there: might as well suspend
>> them until they can run properly!) and you are wasting resources which
>> could be put to good use by other processes.
>
>You assume processes are completely equal, or that their goodness is
>based on their thrashing behavior. No. Processes are not like that from
>the user's point of view (admins, app developers); moreover, they can
>have complex relationships between them.

What makes you think I am assuming that? The kernel already suspends
and resumes processes all the time!

>The kernel must provide mechanisms to enforce policies, not dictate
>them. And this can be done even at present. You want to create and
>solve a problem that doesn't exist because you don't want to RTFM.

"RTFM" does not solve this problem. All the manual in question could
say is "add more RAM" or "kill some processes". That's not very
useful.

>> More to the point, though, what about the worst case, where every
>> process is thrashing?
>
>What about the simplest case, when one process is thrashing?

Tell me how one process can be starving ITSELF of resources?!

>You keep suspending it
>from time to time, so it won't finish in e.g. 10 minutes but
>in 1 hour.

No you don't. If you have TWO processes which are harming each other
by fighting over memory, you start suspending them alternately: this
makes both complete SOONER than otherwise!

>> With my approach, some processes get suspended, others run to
>> completion freeing up resources for others.
>
>This is also black magic. Why do you think they will run to completion
>and/or free up memory?

If all your active processes are in infinite loops, nothing is going
to help you here short of killing them - which my approach also makes
easier/possible.

>> With this approach, every process will still thrash indefinitely:
>> perhaps the effects on other processes will be reduced, but you
>> don't actually get out of the hole you're in!
>
>So both approaches fail.

Note that process suspension already happens, but with too fine a
granularity (the scheduler) - that's what causes the problem. If one
process were able to run uninterrupted for, say, a second, it would
get useful work done, then you could switch to another. The current
scheduling doesn't give enough time for that under thrashing
conditions.


James.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:14     ` Szabolcs Szakacsits
  2001-04-20 12:02       ` Jonathan Morton
@ 2001-04-20 14:48       ` Dave McCracken
  2001-04-21  5:49       ` James A. Sutherland
  2 siblings, 0 replies; 85+ messages in thread
From: Dave McCracken @ 2001-04-20 14:48 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: linux-mm

--On Friday, April 20, 2001 14:14:29 +0200 Szabolcs Szakacsits 
<szaka@f-secure.com> wrote:

> What about the simplest case, when one process is thrashing? You keep
> suspending it from time to time, so it won't finish in e.g. 10 minutes
> but in 1 hour.

Isn't one process thrashing sort of like one hand clapping? :)

Seriously, the state we're talking about is when the running processes in 
the machine collectively want significantly more memory than is available, 
and none of them can make real progress.  Suspending one or more of them 
for a few seconds will actually improve throughput and responsiveness of 
the entire system.  As Rik has said, this has been in pretty much all 
flavors of Unix since the early days, and it has been proven to be 
effective.

I'm not saying there aren't other things we can do with working set 
tracking that could help push out the point where the machine thrashes, but 
at some point all those mechanisms will be overwhelmed, and process 
suspension is a good last resort.

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmc@austin.ibm.com                                      T/L   678-3059

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:32   ` James A. Sutherland
  2001-04-19 20:23     ` Jonathan Morton
@ 2001-04-20 12:14     ` Szabolcs Szakacsits
  2001-04-20 12:02       ` Jonathan Morton
                         ` (2 more replies)
  1 sibling, 3 replies; 85+ messages in thread
From: Szabolcs Szakacsits @ 2001-04-20 12:14 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Dave McCracken, linux-mm

On Thu, 19 Apr 2001, James A. Sutherland wrote:

> That's my suspicion too: The "strangled" processes eat up system
> resources and still get nowhere (no win there: might as well suspend
> them until they can run properly!) and you are wasting resources which
> could be put to good use by other processes.

You assume processes are completely equal, or that their goodness is
based on their thrashing behavior. No. Processes are not like that from
the user's point of view (admins, app developers); moreover, they can
have complex relationships between them.

The kernel must provide mechanisms to enforce policies, not dictate
them. And this can be done even at present. You want to create and
solve a problem that doesn't exist because you don't want to RTFM.

> More to the point, though, what about the worst case, where every
> process is thrashing?

What about the simplest case, when one process is thrashing? You keep
suspending it from time to time, so it won't finish in e.g. 10 minutes
but in 1 hour.

> With my approach, some processes get suspended, others run to
> completion freeing up resources for others.

This is also black magic. Why do you think they will run to completion
and/or free up memory?

> With this approach, every process will still thrash indefinitely:
> perhaps the effects on other processes will be reduced, but you
> don't actually get out of the hole you're in!

So both approaches fail.

	Szaka


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-20 12:14     ` Szabolcs Szakacsits
@ 2001-04-20 12:02       ` Jonathan Morton
  2001-04-20 14:48       ` Dave McCracken
  2001-04-21  5:49       ` James A. Sutherland
  2 siblings, 0 replies; 85+ messages in thread
From: Jonathan Morton @ 2001-04-20 12:02 UTC (permalink / raw)
  To: Szabolcs Szakacsits, James A. Sutherland; +Cc: Dave McCracken, linux-mm

>> More to the point, though, what about the worst case, where every
>> process is thrashing?
>
>What about the simplest case, when one process is thrashing? You keep
>suspending it from time to time, so it won't finish in e.g. 10 minutes
>but in 1 hour.

One process thrashing, lots of other processes behaving sensibly.  With the
current page-replacement policy, active memory belonging to well-behaved
processes will be regularly paged out (a Bad Thing?), whether the thrashing
process is suspended periodically or not.  The suspensions simply reduce
the frequency of this a little.

Where *every* process is thrashing, you have to suspend lots of processes
in order to get the rest to run.  Also, every time you change the set of
suspended processes, you have to wait for the VM to settle before the peak
useful work is being done again, and even longer than that before you can
sensibly change the set of suspended processes again.  This is *very*
coarse-grained - on the order of tens of seconds for a medium-sized
PC-type computer.

We need a better page-replacement algorithm, and I think my suggestion goes
some way towards that.  Who knows, I might even attempt to implement it
next week...

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:32   ` James A. Sutherland
@ 2001-04-19 20:23     ` Jonathan Morton
  2001-04-20 12:14     ` Szabolcs Szakacsits
  1 sibling, 0 replies; 85+ messages in thread
From: Jonathan Morton @ 2001-04-19 20:23 UTC (permalink / raw)
  To: James A. Sutherland, Dave McCracken; +Cc: linux-mm

>>It appears to me that the end result of all this is about the same as
>>suspending a few selected processes.  Under your algorithm the processes
>>that have no guaranteed working set make no real progress and the others
>>get to run.  It seems like a significant amount of additional overhead to
>>end up with the same result.  Additionally, those processes will be
>>generating large numbers of page faults as they fight over the scrap of
>memory they have.  Using the suspension algorithm they'll be removed
>entirely from running, thus freeing up resources for the remaining
>processes.
>
>That's my suspicion too: The "strangled" processes eat up system
>resources and still get nowhere (no win there: might as well suspend
>them until they can run properly!) and you are wasting resources which
>could be put to good use by other processes.
>
>More to the point, though, what about the worst case, where every
>process is thrashing? With my approach, some processes get suspended,
>others run to completion freeing up resources for others. With this
>approach, every process will still thrash indefinitely: perhaps the
>effects on other processes will be reduced, but you don't actually get
>out of the hole you're in!

My suggestion is not written in competition with the suspension idea, but
as a significant improvement to the current situation.  I also believe that
my suggestion can be implemented very cheaply and (mostly) with O(1)
complexity.

Case study: my 256Mb RAM box with arbitrary amount of swap.

Load X, a few system monitors, and XMMS.  XMMS on this configuration
consumes about 9Mb RAM and probably has a working set well below 4Mb.  Now
load 3 synthetic memory hogs with essentially infinite working sets.  XMMS
will soon begin to stutter as it is repeatedly paged in and out arbitrarily
by the rather poor NRU algorithm Linux uses.  Upon loading the fourth
memory hog, XMMS and X will stop working altogether, and it becomes
impossible to log in either locally or remotely.  Usually when this
happens, I am forced to hit the reset switch.

With the working set algorithm I proposed, the active portions of XMMS, X
and all other processes would be kept in physical memory, preventing the
stuttering and subsequent failure.  Login processes would also continue to
operate correctly, if with a little delay as the process is initially paged
in.  The memory hogs will thrash themselves and make *slow* progress (this
is NOT the same as *no* progress), but their impact on the system at large
is *much* less than at present.  Remember that processes unrelated to the
swap activity can continue to operate while the disk and swap are in use!

Now for the worst-case scenario, where no active process on the system has
a working set small enough to be given its entire share.  For this
example, I will use an 8Mb RAM box with 1.5Mb used by the kernel and a
total of 200Kb reserved for buffers, cache and other sundry items.  There
are 10 memory hogs running on this system - each of their working sets is
far larger than the physical memory on the system.  No other processes
are running, though some are present.  Obviously the system is thrashing, but
because each memory hog gets to keep several hundred Kb of itself resident
at a time, progress is still made.

On the above system, suppose root wants to log in and kill some of the
thrashing processes.  At present, this would not be possible (as on the
256Mb box), because swapped-in pages get thrown out even before they can be
used by the login process.  With the working set algorithm, pages used by
the login processes would be forced to remain resident until they were no
longer needed, and root can log in and deal with the situation.

Now consider if 100 memory hogs are present on the 8Mb box.  Each will
effectively have 67Kb to work in - thrashing still definitely occurs, but
the system is still alive.  Root wants to log in - and login gets to keep
66K of resident pages at a time in the *worst* case, and may even be able
to keep *all* of itself resident (depending on the tunable parameter - the
number of pages reserved for each large-working-set process).  I think 66K
is enough to keep a login process happy.

I repeat my request for a precisely-defined suspension algorithm.  I would
like to consider how well it performs in the above 3 scenarios,
particularly in the last case where there are approximately 100 processes
to suspend at once before root can log in successfully.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 18:25 ` Dave McCracken
@ 2001-04-19 18:32   ` James A. Sutherland
  2001-04-19 20:23     ` Jonathan Morton
  2001-04-20 12:14     ` Szabolcs Szakacsits
  0 siblings, 2 replies; 85+ messages in thread
From: James A. Sutherland @ 2001-04-19 18:32 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-mm

On Thu, 19 Apr 2001 13:25:45 -0500, you wrote:

>--On Thursday, April 19, 2001 15:03:28 +0100 Jonathan Morton 
><chromi@cyberspace.org> wrote:
>
>> My proposal is to introduce a better approximation to LRU in the VM,
>> solely for the purpose of determining the working set.  No alterations to
>> the page replacement policy are needed per se, except to honour the
>> "allowed working set" for each process as calculated below.
>>
>> (...)
>>
>> - Calculate the total physical quota for all processes as the sum of all
>> working sets (plus unswappable memory such as kernel, mlock(), plus a
>> small chunk to handle buffers, cache, etc.)
>> - If this total is within the physical memory of the system, the physical
>> quota for each process is the same as its working set.  (fast common case)
>> - Otherwise, locate the process with the largest quota and remove
>> it from the total quota.  Add in "a few" pages to ensure this process
>> always has *some* memory to work in.  Repeat this step until the physical
>> quota is within physical memory or no processes remain.
>> - Any remaining processes after this step get their full working set as
>> physical quota.  Processes removed from the list get equal share of
>> (remaining physical memory, minus the chunk for buffers, cache and so on).
>
>It appears to me that the end result of all this is about the same as 
>suspending a few selected processes.  Under your algorithm the processes 
>that have no guaranteed working set make no real progress and the others 
>get to run.  It seems like a significant amount of additional overhead to 
>end up with the same result.  Additionally, those processes will be 
>generating large numbers of page faults as they fight over the scrap of 
>memory they have.  Using the suspension algorithm they'll be removed
>entirely from running, thus freeing up resources for the remaining
>processes.

That's my suspicion too: The "strangled" processes eat up system
resources and still get nowhere (no win there: might as well suspend
them until they can run properly!) and you are wasting resources which
could be put to good use by other processes.

More to the point, though, what about the worst case, where every
process is thrashing? With my approach, some processes get suspended,
others run to completion freeing up resources for others. With this
approach, every process will still thrash indefinitely: perhaps the
effects on other processes will be reduced, but you don't actually get
out of the hole you're in!


James.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
  2001-04-19 14:03 suspend processes at load (was Re: a simple OOM ...) Jonathan Morton
@ 2001-04-19 18:25 ` Dave McCracken
  2001-04-19 18:32   ` James A. Sutherland
  0 siblings, 1 reply; 85+ messages in thread
From: Dave McCracken @ 2001-04-19 18:25 UTC (permalink / raw)
  To: linux-mm

--On Thursday, April 19, 2001 15:03:28 +0100 Jonathan Morton 
<chromi@cyberspace.org> wrote:

> My proposal is to introduce a better approximation to LRU in the VM,
> solely for the purpose of determining the working set.  No alterations to
> the page replacement policy are needed per se, except to honour the
> "allowed working set" for each process as calculated below.
>
> (...)
>
> - Calculate the total physical quota for all processes as the sum of all
> working sets (plus unswappable memory such as kernel, mlock(), plus a
> small chunk to handle buffers, cache, etc.)
> - If this total is within the physical memory of the system, the physical
> quota for each process is the same as its working set.  (fast common case)
> - Otherwise, locate the process with the largest quota and remove
> it from the total quota.  Add in "a few" pages to ensure this process
> always has *some* memory to work in.  Repeat this step until the physical
> quota is within physical memory or no processes remain.
> - Any remaining processes after this step get their full working set as
> physical quota.  Processes removed from the list get equal share of
> (remaining physical memory, minus the chunk for buffers, cache and so on).

It appears to me that the end result of all this is about the same as 
suspending a few selected processes.  Under your algorithm the processes 
that have no guaranteed working set make no real progress and the others 
get to run.  It seems like a significant amount of additional overhead to 
end up with the same result.  Additionally, those processes will be 
generating large numbers of page faults as they fight over the scrap of 
memory they have.  Using the suspension algorithm they'll be removed 
entirely from running, thus freeing up resources for the remaining 
processes.

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmc@austin.ibm.com                                      T/L   678-3059

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: suspend processes at load (was Re: a simple OOM ...)
@ 2001-04-19 14:03 Jonathan Morton
  2001-04-19 18:25 ` Dave McCracken
  0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-19 14:03 UTC (permalink / raw)
  To: linux-mm

>> THIS is why we need process suspension in the kernel.
>
>Not necessarily.  Creating a minimal working set guarantee for small
>tasks is one way to avoid the need for process suspension.  Creating a
>dynamic working set upper limit for large, thrashing tasks is a way to
>avoid the thrashing tasks from impacting everybody else too much.
>There are many possible ways forward, and I am not yet convinced that
>process suspension is necessary.

Let's stop arguing at such an abstract level, and try to get some
algorithms down so we can analyse this properly.  Below, I outline a
possible algorithm for handling the working-set model I outlined yesterday.
Those of you who still believe in process suspension, please do likewise
(exactly which process do you suspend, and for how long?).

My proposal is to introduce a better approximation to LRU in the VM, solely
for the purpose of determining the working set.  No alterations to the page
replacement policy are needed per se, except to honour the "allowed working
set" for each process as calculated below.

As I understand things (correct me if I'm wrong), there is a list of VM
pages associated with each process (current->mm).  There is also a number
of lists of pages, classifying them into "active", "inactive/clean",
"inactive/dirty" and so on.  There are routines which know when and how to
move pages between these lists (precisely when and how these are called is
an area I haven't investigated yet).  When a process accesses memory, there
must be a routine which moves the relevant page onto the active list.  The
page may already be on the active list, in which case nothing is done at
present.

During the act of moving a page onto the active list (or determining that
it already is there and doesn't need to be moved), I think it would be
appropriate to associate the time of last access with the page, and the
page access order with the process.  From maintenance of a list of such
associations, the working set of the process can be calculated quite easily.

struct working_set_list {
	struct working_set_list *next;	/* singly linked, ordered by access time */
	page_id id;			/* identifies the page (illustrative type) */
	unsigned short accessed;	/* LSW of jiffies at last access */
};

Suppose the page list current->mm is extended to contain the above in some
manner.  The 'accessed' field is set to the LSW of jiffies, and old entries
are purged from the working set when 0x8000 jiffies have passed (about 5.5
minutes on x86 and other 100Hz systems, probably shorter on some
architectures).  By keeping head, tail and possibly 'oldness threshold'
pointers in current->mm, list maintenance should become O(1) for most
common operations.
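
To make the maintenance concrete, here is a minimal userspace sketch of the
list handling described above.  The names (ws_touch, ws_purge, ws_list) and
the plain unsigned long page id are my own illustrative choices, not kernel
API; a real version would also de-duplicate re-accessed pages by moving
their entry to the tail rather than appending a fresh one.

```c
#include <assert.h>
#include <stdlib.h>

#define WS_WINDOW 0x8000u  /* entries older than this many jiffies fall out */

struct ws_entry {
	struct ws_entry *next;
	unsigned long id;        /* stand-in for the page_id in the proposal */
	unsigned short accessed; /* low 16 bits of jiffies at last access */
};

struct ws_list {
	struct ws_entry *head, *tail; /* kept ordered by access time */
	unsigned long count;          /* running working-set size */
};

/* Record an access: append a fresh entry at the tail (newest end), O(1). */
static void ws_touch(struct ws_list *ws, unsigned long id, unsigned short now)
{
	struct ws_entry *e = malloc(sizeof(*e));
	e->next = NULL;
	e->id = id;
	e->accessed = now;
	if (ws->tail)
		ws->tail->next = e;
	else
		ws->head = e;
	ws->tail = e;
	ws->count++;
}

/* Purge entries older than the window; since the list is ordered by
 * access time, only the head end need be examined. */
static void ws_purge(struct ws_list *ws, unsigned short now)
{
	while (ws->head &&
	       (unsigned short)(now - ws->head->accessed) >= WS_WINDOW) {
		struct ws_entry *old = ws->head;
		ws->head = old->next;
		if (!ws->head)
			ws->tail = NULL;
		free(old);
		ws->count--;
	}
}
```

The unsigned-short subtraction handles jiffies wraparound for free, which
is why 16 bits and a 0x8000 window pair up naturally.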

The working set is simply the number of entries in the list which are newer
than the oldness threshold.  Calculation of this value can be made trivial
by keeping a separate counter (similar to current->mm.total_vm) which is
updated whenever the list is maintained.  Note that the working set can
contain pages which are not in the active list - removal of a page from the
active list does not remove it from the working set.

Since the sum of all the working sets in the system can be greater than the
physical memory in the system (this is what thrashing means, after all), a
"physical quota" needs to be calculated for each process.  The calculation
of the physical quota is based heavily on the working set, and should
probably be done at scan-for-swap-out time.  This calculation is roughly as
I described yesterday:

- Calculate the total physical quota for all processes as the sum of all
working sets (plus unswappable memory such as kernel, mlock(), plus a small
chunk to handle buffers, cache, etc.)
- If this total is within the physical memory of the system, the physical
quota for each process is the same as its working set.  (fast common case)
- Otherwise, locate the process with the largest quota and remove it from
the total quota.  Add in "a few" pages to ensure this process always has
*some* memory to work in.  Repeat this step until the physical quota is
within physical memory or no processes remain.
- Any remaining processes after this step get their full working set as
physical quota.  Processes removed from the list get an equal share of
(remaining physical memory, minus the chunk for buffers, cache and so on).
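
The steps above can be sketched in userspace as follows.  This is only an
illustration of the arithmetic: the function name, the FEW_PAGES constant,
and the fixed-size bookkeeping array are assumptions of mine, and "phys" is
taken to be physical memory already net of the kernel/buffer reserve.

```c
#include <assert.h>

#define FEW_PAGES 8 /* the "few" pages guaranteed to each trimmed process */

/* ws[i]: working set of process i, in pages.  quota[i]: computed quota.
 * phys: swappable physical pages available after the fixed reserve. */
static void compute_quotas(const unsigned long *ws, unsigned long *quota,
			   int n, unsigned long phys)
{
	unsigned long total = 0;
	int trimmed[64] = {0}; /* sketch only: assumes n <= 64 */
	int ntrimmed = 0, i;

	for (i = 0; i < n; i++) {
		quota[i] = ws[i];
		total += ws[i];
	}

	/* Fast common case: every working set fits in physical memory. */
	if (total <= phys)
		return;

	/* Repeatedly pull the largest remaining quota out of the total,
	 * crediting it only "a few" pages, until the rest fits. */
	while (total > phys && ntrimmed < n) {
		int big = -1;
		for (i = 0; i < n; i++)
			if (!trimmed[i] && (big < 0 || quota[i] > quota[big]))
				big = i;
		total -= quota[big];
		total += FEW_PAGES;
		trimmed[big] = 1;
		ntrimmed++;
	}

	/* Untrimmed processes keep their full working set; trimmed ones
	 * share the leftover physical memory equally. */
	if (ntrimmed > 0) {
		unsigned long used = 0, spare;
		for (i = 0; i < n; i++)
			if (!trimmed[i])
				used += quota[i];
		spare = (phys > used) ? (phys - used) : 0;
		for (i = 0; i < n; i++)
			if (trimmed[i])
				quota[i] = spare / ntrimmed;
	}
}
```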

Now we turn to the page replacement policy.  At present, AFAICT, this is a
"not recently used" policy - pages are swapped out if they are not actually
in the active list.  The act of scanning memory for swappable pages also
removes pages from the active list (presumably so they will be swapped out
anyway if nothing could be found on the first scan).

A simple modification is needed here - if a page is "not recently used" AND
all of the page's users are processes which currently have more
physically-resident pages than their "physical quota" as calculated above,
it is swapped out.
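
As a sketch, the swap-out decision for a (possibly shared) page reduces to
a small predicate.  The struct and function names here are illustrative
stand-ins, not existing kernel interfaces.

```c
#include <assert.h>

struct proc {
	unsigned long resident; /* physically resident pages right now */
	unsigned long quota;    /* physical quota from the calculation above */
};

/* A page may be shared; swap it out only if it is not recently used AND
 * every one of its users is currently over its physical quota. */
static int should_swap(int recently_used,
		       const struct proc *users, int nusers)
{
	int i;

	if (recently_used)
		return 0;
	for (i = 0; i < nusers; i++)
		if (users[i].resident <= users[i].quota)
			return 0; /* some user is within quota: spare it */
	return 1;
}
```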

For the special case where the physical quota of a process equals its
working set, the replacement algorithm might check whether the candidate
page is in the working set for the process, as a hint not to page it out.

Comments?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----




end of thread, other threads:[~2001-04-23  5:59 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-04-12 16:58 [PATCH] a simple OOM killer to save me from Netscape Slats Grobnik
2001-04-12 18:25 ` Rik van Riel
2001-04-12 18:49   ` James A. Sutherland
2001-04-13  6:45   ` Eric W. Biederman
2001-04-13 16:20     ` Rik van Riel
2001-04-14  1:20       ` Stephen C. Tweedie
2001-04-16 21:06         ` James A. Sutherland
2001-04-16 21:40           ` Jonathan Morton
2001-04-16 22:12             ` Rik van Riel
2001-04-16 22:21             ` James A. Sutherland
2001-04-17 14:26               ` Jonathan Morton
2001-04-17 19:53                 ` Rik van Riel
2001-04-17 20:44                   ` James A. Sutherland
2001-04-17 20:59                     ` Jonathan Morton
2001-04-17 21:09                       ` James A. Sutherland
2001-04-14  7:00       ` Eric W. Biederman
2001-04-15  5:05         ` Rik van Riel
2001-04-15  5:20           ` Rik van Riel
2001-04-16 11:52         ` Szabolcs Szakacsits
2001-04-16 12:17       ` suspend processes at load (was Re: a simple OOM ...) Szabolcs Szakacsits
2001-04-17 19:48         ` Rik van Riel
2001-04-18 21:32           ` Szabolcs Szakacsits
2001-04-18 20:38             ` James A. Sutherland
2001-04-18 23:25               ` Szabolcs Szakacsits
2001-04-18 22:29                 ` Rik van Riel
2001-04-19 10:14                   ` Stephen C. Tweedie
2001-04-19 13:23                   ` Szabolcs Szakacsits
2001-04-19  2:11                 ` Rik van Riel
2001-04-19  7:08                   ` James A. Sutherland
2001-04-19 13:37                     ` Szabolcs Szakacsits
2001-04-19 12:26                       ` Christoph Rohland
2001-04-19 12:30                       ` James A. Sutherland
2001-04-19  9:15                 ` James A. Sutherland
2001-04-19 18:34             ` Dave McCracken
2001-04-19 18:47               ` James A. Sutherland
2001-04-19 18:53                 ` Dave McCracken
2001-04-19 19:10                   ` James A. Sutherland
2001-04-20 14:58                     ` Rik van Riel
2001-04-21  6:10                       ` James A. Sutherland
2001-04-19 19:13                   ` Rik van Riel
2001-04-19 19:47                     ` Gerrit Huizenga
2001-04-20 12:44                       ` Szabolcs Szakacsits
2001-04-19 20:06                     ` James A. Sutherland
2001-04-20 12:29                     ` Szabolcs Szakacsits
2001-04-20 11:50                       ` Jonathan Morton
2001-04-20 13:32                         ` Szabolcs Szakacsits
2001-04-20 14:30                           ` Rik van Riel
2001-04-22 10:21                       ` James A. Sutherland
2001-04-20 12:25                 ` Szabolcs Szakacsits
2001-04-21  6:08                   ` James A. Sutherland
2001-04-20 12:18               ` Szabolcs Szakacsits
2001-04-22 10:19                 ` James A. Sutherland
2001-04-17 10:58 ` limit for number of processes Uman
2001-04-19 14:03 suspend processes at load (was Re: a simple OOM ...) Jonathan Morton
2001-04-19 18:25 ` Dave McCracken
2001-04-19 18:32   ` James A. Sutherland
2001-04-19 20:23     ` Jonathan Morton
2001-04-20 12:14     ` Szabolcs Szakacsits
2001-04-20 12:02       ` Jonathan Morton
2001-04-20 14:48       ` Dave McCracken
2001-04-21  5:49       ` James A. Sutherland
2001-04-21 19:16         ` Joseph A. Knapka
2001-04-21 19:41           ` Jonathan Morton
2001-04-22 10:08             ` James A. Sutherland
2001-04-22 16:53               ` Jonathan Morton
2001-04-22 17:06                 ` James A. Sutherland
2001-04-22 18:18                   ` Jonathan Morton
2001-04-22 18:57                     ` Rik van Riel
2001-04-22 19:41                       ` James A. Sutherland
2001-04-22 20:33                         ` Jean Francois Martinez
2001-04-22 20:21                       ` Jonathan Morton
2001-04-22 20:36                         ` Jonathan Morton
2001-04-22 19:01                     ` James A. Sutherland
2001-04-22 19:11                       ` Rik van Riel
2001-04-22 20:36                         ` James A. Sutherland
2001-04-22 19:30                       ` Jonathan Morton
2001-04-22 20:35                         ` James A. Sutherland
2001-04-22 20:41                           ` Rik van Riel
2001-04-22 20:58                             ` James A. Sutherland
2001-04-22 21:26                               ` Rik van Riel
2001-04-22 22:26                                 ` Jonathan Morton
2001-04-23  5:55                                   ` James A. Sutherland
2001-04-23  5:59                                     ` Rik van Riel
2001-04-21 20:29           ` Rik van Riel
2001-04-22 10:08           ` James A. Sutherland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox