linux-mm.kvack.org archive mirror
* [PATCH] a simple OOM killer to save me from Netscape
@ 2001-04-12 16:58 Slats Grobnik
  2001-04-12 18:25 ` Rik van Riel
  2001-04-17 10:58 ` limit for number of processes Uman
  0 siblings, 2 replies; 85+ messages in thread
From: Slats Grobnik @ 2001-04-12 16:58 UTC
  To: linux-mm

This is to solve a specific problem, with no claim to generality.  Say
some X-app memory hog (always seems to be a browser) sneaks up on you,
and by the time HD thrashing catches your attention, the mouse & keyboard
have become sluggish or unresponsive--it may already be too late.
Pretty soon, even the Magic SysRq keybindings don't work....
This used to be my only occasion for ever resorting to the Reset
button, until I found out about oom_kill.  In all the message 
traffic about it, I haven't found this particular solution, so
here's my patch.  It might be useful on some desktop systems.

First, I simplified the criteria for selecting the killable app, as
seemed appropriate.  No root processes;  don't worry about CPU time,
nor nice-ness;  leave direct-hardware-access processes alone.  These
changes to the function `badness' weren't quite enough.  A 2.2.17
kernel patched with such an oom_kill was saved from hard-rebooting
5 or 6 times during a 50-day uptime....but only after waiting through
_extended_ bouts of thrashing.  Here's what happens:

By running `free -s1' or `top' it's clear that once swap memory gets
maxed out, *cache* memory size decreases until, at about 4M, mouse & 
keyboard response becomes noticeably sluggish.  At cache=3M or less,
all hope is lost.  But at this point, *free* RAM size may not be
affected much.  And since CPU activity is down to a crawl, it may
take a while to reach minimum (or some small arbitrary figure.)
So I altered the `out_of_memory' function to test page-cache size
rather than free RAM, and expect never to reboot again.  (Except for
changing kernels, and power outages.
 (But don't ever try to mount a swap partition.  Seriously.  Nor stick
 beans up your nose. ))       regards,   Slats  

:THANKS:
to Rik van Riel for documenting his code with comments plain enough
that a beginner might be tempted.  It's the chance you take.

:NOTES:  
occasioned by my ignorance of LK programming and C.
  1. PAGE_CACHE_SHIFT would be better than PAGE_SHIFT, but the former
     is undefined in oom_kill.c and I don't know enough to go messing
     with includes.  I _think_ the patch is arch independent, if anyone cares.
  2. I don't know why `atomic_read(&page_cache_size)' is better than 
     `page_cache_size.counter';  I'm just mimicking something I saw 
     while grepping source.  Anyway, to patch 2.2.19 & preceding 
     versions, just substitute `page_cache_size' instead
     (which is just a number in 2.2, while in 2.4 it's a struct
     containing a single member, which is a number.  Go figure.)
  3. (3 << 20)-1, or anything under 3 megs:  This value is of course
     negotiable, depending on your system.  Mine's an old Pentium
     MMX, 32M RAM, piix4 chipset, standard Award BIOS, 3.8G IDE HD.
     For an immediate stress test, try Netscape 4.x rendering
     http://www.nature.com/nature/journal/v409/n6822/toc_r.html
     http://www.nature.com/nature/journal/v409/n6818/toc_r.html
     etc.  Swap grows _absurdly_ fast if Javascript is enabled.
  4. After browsing the ML:  It may be better security on a multi-
     user system NOT to neglect processes with direct hardware 
     access.  Either delete that section of `badness' (treating
     DHA cases the same as others), or restore RR's `points /= 4',
     or some other formula.
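
For readers who want to try the threshold from notes 1-3 outside the kernel, here is a minimal user-space sketch of the same comparison.  It is not part of the patch: the helper name and the SKETCH_* constants are made up for illustration, and 4 KiB pages (PAGE_SHIFT of 12, as on x86) are assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for illustration only: 4 KiB pages and the 3 MiB floor
 * discussed in note 3.  Neither name exists in the kernel. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_CACHE_FLOOR_BYTES (3UL << 20)

/*
 * Hypothetical helper: returns true while the page cache is still large
 * enough that the OOM path should back off, mirroring the patched test
 * (pages << PAGE_SHIFT) > (3 << 20) - 1.
 */
static bool cache_is_adequate(unsigned long cache_pages)
{
	unsigned long cache_bytes;

	/* Guard against the shift overflowing unsigned long. */
	assert(cache_pages <= (~0UL >> SKETCH_PAGE_SHIFT));

	cache_bytes = cache_pages << SKETCH_PAGE_SHIFT;
	return cache_bytes > SKETCH_CACHE_FLOOR_BYTES - 1;
}
```

Under these assumptions 768 pages is exactly 3 MiB and still counts as adequate, while 767 pages does not.  The sketch does the arithmetic in unsigned long; shifting a 32-bit int the same way would overflow once the cache reached 2 GiB.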

=== PATCH against Linux kernel 2.4.3 ===   made with diff -u
Copyright (c) 1999-2001 by Rik van Riel & others, under GNU General
     Public License.  http://www.fsf.org/
--- linux-2.4.3/mm/oom_kill.c	Tue Nov 14 12:56:46 2000
+++ linux-alt/mm/oom_kill.c	Wed Apr 11 19:48:30 2001
@@ -23,19 +23,6 @@
 
 /* #define DEBUG */
 
-/**
- * int_sqrt - oom_kill.c internal function, rough approximation to sqrt
- * @x: integer of which to calculate the sqrt
- * 
- * A very rough approximation to the sqrt() function.
- */
-static unsigned int int_sqrt(unsigned int x)
-{
-	unsigned int out = x;
-	while (x & ~(unsigned int)1) x >>=2, out >>=1;
-	if (x) out -= out >> 2;
-	return (out ? out : 1);
-}	
 
 /**
  * oom_badness - calculate a numeric value for how bad this task has been
@@ -46,7 +33,8 @@
  * to kill when we run out of memory.
  *
  * Good in this context means that:
- * 1) we lose the minimum amount of work done
+ * 1) kill only a normal user (not root) process
+ * -)  (amount of work done and niceness don't count)
  * 2) we recover a large amount of memory
  * 3) we don't kill anything innocent of eating tons of memory
  * 4) we want to kill the minimum amount of processes (one)
@@ -57,7 +45,7 @@
 
 static int badness(struct task_struct *p)
 {
-	int points, cpu_time, run_time;
+	int points;
 
 	if (!p->mm)
 		return 0;
@@ -66,41 +54,26 @@
 	 */
 	points = p->mm->total_vm;
 
-	/*
-	 * CPU time is in seconds and run time is in minutes. There is no
-	 * particular reason for this other than that it turned out to work
-	 * very well in practice. This is not safe against jiffie wraps
-	 * but we don't care _that_ much...
-	 */
-	cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3);
-	run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
+	/* CPU time not considered:  this is for MEMory hogs. */
 
-	points /= int_sqrt(cpu_time);
-	points /= int_sqrt(int_sqrt(run_time));
-
-	/*
-	 * Niced processes are most likely less important, so double
-	 * their badness points.
-	 */
-	if (p->nice > 0)
-		points *= 2;
+	/* Niced processes less important?  Distributed.net would disagree! */
 
 	/*
 	 * Superuser processes are usually more important, so we make it
-	 * less likely that we kill those.
+	 * less likely (impossible) that we kill those.
 	 */
 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
 				p->uid == 0 || p->euid == 0)
-		points /= 4;
+		points = 0;
 
 	/*
-	 * We don't want to kill a process with direct hardware access.
+	 * We WON'T kill a process with direct hardware access.
 	 * Not only could that mess up the hardware, but usually users
 	 * tend to only have this flag set on applications they think
 	 * of as important.
 	 */
 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
-		points /= 4;
+		points = 0;
 #ifdef DEBUG
 	printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
 	p->pid, p->comm, points);
@@ -193,11 +166,10 @@
 {
 	struct sysinfo swp_info;
 
-	/* Enough free memory?  Not OOM. */
-	if (nr_free_pages() > freepages.min)
-		return 0;
+	/* Even if free memory stays big enough...  */
+	/*  ...a cramped cache means thrashing, then keyboard lockout. */
 
-	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
+	if ((atomic_read(&page_cache_size) << PAGE_SHIFT)  >  (3 << 20)-1 )
 		return 0;
 
 	/* Enough swap space left?  Not OOM. */
 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: suspend processes at load (was Re: a simple OOM ...)
@ 2001-04-19 14:03 Jonathan Morton
  2001-04-19 18:25 ` Dave McCracken
  0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Morton @ 2001-04-19 14:03 UTC
  To: linux-mm

>> THIS is why we need process suspension in the kernel.
>
>Not necessarily.  Creating a minimal working set guarantee for small
>tasks is one way to avoid the need for process suspension.  Creating a
>dynamic working set upper limit for large, thrashing tasks is a way to
>avoid the thrashing tasks from impacting everybody else too much.
>There are many possible ways forward, and I am not yet convinced that
>process suspension is necessary.

Let's stop arguing at such an abstract level, and try to get some
algorithms down so we can analyse this properly.  Below, I outline a
possible algorithm for handling the working-set model I outlined yesterday.
Those of you who still believe in process suspension, please do likewise
(exactly which process do you suspend, and for how long?).

My proposal is to introduce a better approximation to LRU in the VM, solely
for the purpose of determining the working set.  No alterations to the page
replacement policy are needed per se, except to honour the "allowed working
set" for each process as calculated below.

As I understand things (correct me if I'm wrong), there is a list of VM
pages associated with each process (current->mm).  There are also a number
of lists of pages, classifying them into "active", "inactive/clean",
"inactive/dirty" and so on.  There are routines which know when and how to
move pages between these lists (precisely when and how these are called is
an area I haven't investigated yet).  When a process accesses memory, there
must be a routine which moves the relevant page onto the active list.  The
page may already be on the active list, in which case nothing is done at
present.

During the act of moving a page onto the active list (or determining that
it already is there and doesn't need to be moved), I think it would be
appropriate to associate the time of last access with the page, and the
page access order with the process.  From maintenance of a list of such
associations, the working set of the process can be calculated quite easily.

struct working_set_list {
	struct working_set_list *next;
	page_id id;
	unsigned short accessed;
};

Suppose the page list current->mm is extended to contain the above in some
manner.  The 'accessed' field is set to the LSW of jiffies, and old entries
are purged from the working set when 0x8000 jiffies have passed (about 5.5
minutes on x86 and other 100Hz systems, probably shorter on some
architectures).  By keeping head, tail and possibly 'oldness threshold'
pointers in current->mm, list maintenance should become O(1) for most
common operations.

The working set is simply the number of entries in the list which are newer
than the oldness threshold.  Calculation of this value can be made trivial
by keeping a separate counter (similar to current->mm->total_vm) which is
updated whenever the list is maintained.  Note that the working set can
contain pages which are not in the active list - removal of a page from the
active list does not remove it from the working set.
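
A rough user-space sketch of the list maintenance described above, to make the O(1) claim concrete.  Everything beyond the struct from the post is an assumption: `page_id` is given a placeholder type, `struct ws_head`, `ws_age`, `ws_add` and `ws_purge` are hypothetical names, and a re-access of a page already on the list is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WS_MAX_AGE 0x8000u	/* purge threshold from the text, in jiffies */

typedef unsigned long page_id;	/* placeholder; the post leaves this undefined */

struct working_set_list {
	struct working_set_list *next;
	page_id id;
	unsigned short accessed;	/* low 16 bits of jiffies at last access */
};

/* Hypothetical per-mm bookkeeping: oldest entry at head, newest at tail. */
struct ws_head {
	struct working_set_list *head, *tail;
	unsigned long count;		/* current working-set size, in pages */
};

/* Age modulo 2^16, so the result survives wraparound of the stamp. */
static uint16_t ws_age(unsigned long now, unsigned short accessed)
{
	return (uint16_t)((uint16_t)now - (uint16_t)accessed);
}

/* Record a fresh access: stamp the entry and append it at the tail.  O(1). */
static void ws_add(struct ws_head *ws, struct working_set_list *e,
		   page_id id, unsigned long now)
{
	assert(ws && e);
	e->id = id;
	e->accessed = (unsigned short)(now & 0xffff);
	e->next = NULL;
	if (ws->tail)
		ws->tail->next = e;
	else
		ws->head = e;
	ws->tail = e;
	ws->count++;
}

/* Drop entries that have reached WS_MAX_AGE; returns how many were purged. */
static unsigned long ws_purge(struct ws_head *ws, unsigned long now)
{
	unsigned long purged = 0;

	assert(ws);
	while (ws->head && ws_age(now, ws->head->accessed) >= WS_MAX_AGE) {
		struct working_set_list *old = ws->head;

		ws->head = old->next;
		if (!ws->head)
			ws->tail = NULL;
		old->next = NULL;
		ws->count--;
		purged++;
	}
	return purged;
}
```

Because the stamp is only 16 bits wide, the age is only meaningful modulo 0x10000 jiffies, so in this sketch the purge has to run at least once per 0x8000 jiffies or a stale entry will start to look young again.  The `count` field plays the role of the separate counter: reading the working-set size never walks the list.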

Since the sum of all the working sets in the system can be greater than the
physical memory in the system (this is what thrashing means, after all), a
"physical quota" needs to be calculated for each process.  The calculation
of the physical quota is based heavily on the working set, and should
probably be done at scan-for-swap-out time.  This calculation is roughly as
I described yesterday:

- Calculate the total physical quota for all processes as the sum of all
working sets (plus unswappable memory such as kernel, mlock(), plus a small
chunk to handle buffers, cache, etc.)
- If this total is within the physical memory of the system, the physical
quota for each process is the same as its working set.  (fast common case)
- Otherwise, locate the process with the largest quota and remove it from
the total quota.  Add in "a few" pages to ensure this process always has
*some* memory to work in.  Repeat this step until the physical quota is
within physical memory or no processes remain.
- Any remaining processes after this step get their full working set as
physical quota.  Processes removed from the list get an equal share of the
remaining physical memory (minus the chunk for buffers, cache and so on).
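
To pin the four steps down, here is one possible reading of them as a small user-space sketch.  The struct, the function name and the parameters are all hypothetical; `reserved_pages` stands for the unswappable memory plus the buffer/cache chunk, `min_pages` stands for the "a few" pages, and the way the leftover is split among removed processes is an interpretation of the last step rather than something the steps spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; all sizes are in pages. */
struct proc_quota {
	unsigned long working_set;	/* input */
	unsigned long quota;		/* output: physical quota */
	int squeezed;			/* set once removed from the full-quota list */
};

static void compute_quotas(struct proc_quota *p, size_t n,
			   unsigned long phys_pages,
			   unsigned long reserved_pages,
			   unsigned long min_pages)
{
	unsigned long total = reserved_pages;
	size_t i, squeezed = 0;

	assert(p || n == 0);

	/* Step 1: everyone starts with a quota equal to its working set. */
	for (i = 0; i < n; i++) {
		p[i].quota = p[i].working_set;
		p[i].squeezed = 0;
		total += p[i].working_set;
	}

	/* Steps 2-3: while over budget, remove the largest remaining
	 * process from the total and charge only min_pages for it. */
	while (total > phys_pages && squeezed < n) {
		size_t big = n;

		for (i = 0; i < n; i++)
			if (!p[i].squeezed &&
			    (big == n || p[i].working_set > p[big].working_set))
				big = i;
		total -= p[big].working_set;
		total += min_pages;
		p[big].squeezed = 1;
		p[big].quota = min_pages;
		squeezed++;
	}

	/* Step 4: removed processes share whatever is left equally. */
	if (squeezed && total < phys_pages) {
		unsigned long share = (phys_pages - total) / squeezed;

		for (i = 0; i < n; i++)
			if (p[i].squeezed)
				p[i].quota += share;
	}
}
```

With working sets of 100, 60 and 10 pages, 50 pages of physical memory, nothing reserved and `min_pages` of 4, this sketch squeezes the two largest processes to 20 pages each and leaves the small one its full 10.  The linear search for the largest process is only for brevity; a real implementation would presumably keep processes sorted or pick candidates some cheaper way.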

Now we turn to the page replacement policy.  At present, AFAICT, this is a
"not recently used" policy - pages are swapped out if they are not actually
in the active list.  The act of scanning memory for swappable pages also
removes pages from the active list (presumably so they will be swapped out
anyway if nothing could be found on the first scan).

A simple modification is needed here - if a page is "not recently used" AND
all of the page's users are processes which currently have more
physically-resident pages than their "physical quota" as calculated above,
it is swapped out.
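
As a sketch only, the modified test could look something like the following.  `struct proc_usage` and `may_swap_out` are hypothetical names, and how the kernel would actually find all users of a page is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one process that maps the candidate page. */
struct proc_usage {
	unsigned long resident_pages;	/* pages currently in RAM */
	unsigned long quota_pages;	/* physical quota, as calculated above */
};

/*
 * A page may go to swap only if it is not recently used AND every
 * process using it is over its physical quota.
 */
static bool may_swap_out(bool recently_used,
			 const struct proc_usage *users, size_t n_users)
{
	size_t i;

	assert(users || n_users == 0);

	if (recently_used)
		return false;
	for (i = 0; i < n_users; i++)
		if (users[i].resident_pages <= users[i].quota_pages)
			return false;	/* one user is within quota: keep the page */
	return true;
}
```

Note that a shared page is protected as long as any one of its users is within quota, which is what "all of the page's users" implies.  A page with no users at all passes the test in this sketch as soon as it is not recently used.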

For the special case where the physical quota of a process equals its
working set, the replacement algorithm might check if the candidate page is
in the working set for the process, as a hint not to page it out.

Comments?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



end of thread, other threads:[~2001-04-23  5:59 UTC | newest]

Thread overview: 85+ messages
2001-04-12 16:58 [PATCH] a simple OOM killer to save me from Netscape Slats Grobnik
2001-04-12 18:25 ` Rik van Riel
2001-04-12 18:49   ` James A. Sutherland
2001-04-13  6:45   ` Eric W. Biederman
2001-04-13 16:20     ` Rik van Riel
2001-04-14  1:20       ` Stephen C. Tweedie
2001-04-16 21:06         ` James A. Sutherland
2001-04-16 21:40           ` Jonathan Morton
2001-04-16 22:12             ` Rik van Riel
2001-04-16 22:21             ` James A. Sutherland
2001-04-17 14:26               ` Jonathan Morton
2001-04-17 19:53                 ` Rik van Riel
2001-04-17 20:44                   ` James A. Sutherland
2001-04-17 20:59                     ` Jonathan Morton
2001-04-17 21:09                       ` James A. Sutherland
2001-04-14  7:00       ` Eric W. Biederman
2001-04-15  5:05         ` Rik van Riel
2001-04-15  5:20           ` Rik van Riel
2001-04-16 11:52         ` Szabolcs Szakacsits
2001-04-16 12:17       ` suspend processes at load (was Re: a simple OOM ...) Szabolcs Szakacsits
2001-04-17 19:48         ` Rik van Riel
2001-04-18 21:32           ` Szabolcs Szakacsits
2001-04-18 20:38             ` James A. Sutherland
2001-04-18 23:25               ` Szabolcs Szakacsits
2001-04-18 22:29                 ` Rik van Riel
2001-04-19 10:14                   ` Stephen C. Tweedie
2001-04-19 13:23                   ` Szabolcs Szakacsits
2001-04-19  2:11                 ` Rik van Riel
2001-04-19  7:08                   ` James A. Sutherland
2001-04-19 13:37                     ` Szabolcs Szakacsits
2001-04-19 12:26                       ` Christoph Rohland
2001-04-19 12:30                       ` James A. Sutherland
2001-04-19  9:15                 ` James A. Sutherland
2001-04-19 18:34             ` Dave McCracken
2001-04-19 18:47               ` James A. Sutherland
2001-04-19 18:53                 ` Dave McCracken
2001-04-19 19:10                   ` James A. Sutherland
2001-04-20 14:58                     ` Rik van Riel
2001-04-21  6:10                       ` James A. Sutherland
2001-04-19 19:13                   ` Rik van Riel
2001-04-19 19:47                     ` Gerrit Huizenga
2001-04-20 12:44                       ` Szabolcs Szakacsits
2001-04-19 20:06                     ` James A. Sutherland
2001-04-20 12:29                     ` Szabolcs Szakacsits
2001-04-20 11:50                       ` Jonathan Morton
2001-04-20 13:32                         ` Szabolcs Szakacsits
2001-04-20 14:30                           ` Rik van Riel
2001-04-22 10:21                       ` James A. Sutherland
2001-04-20 12:25                 ` Szabolcs Szakacsits
2001-04-21  6:08                   ` James A. Sutherland
2001-04-20 12:18               ` Szabolcs Szakacsits
2001-04-22 10:19                 ` James A. Sutherland
2001-04-17 10:58 ` limit for number of processes Uman
2001-04-19 14:03 suspend processes at load (was Re: a simple OOM ...) Jonathan Morton
2001-04-19 18:25 ` Dave McCracken
2001-04-19 18:32   ` James A. Sutherland
2001-04-19 20:23     ` Jonathan Morton
2001-04-20 12:14     ` Szabolcs Szakacsits
2001-04-20 12:02       ` Jonathan Morton
2001-04-20 14:48       ` Dave McCracken
2001-04-21  5:49       ` James A. Sutherland
2001-04-21 19:16         ` Joseph A. Knapka
2001-04-21 19:41           ` Jonathan Morton
2001-04-22 10:08             ` James A. Sutherland
2001-04-22 16:53               ` Jonathan Morton
2001-04-22 17:06                 ` James A. Sutherland
2001-04-22 18:18                   ` Jonathan Morton
2001-04-22 18:57                     ` Rik van Riel
2001-04-22 19:41                       ` James A. Sutherland
2001-04-22 20:33                         ` Jean Francois Martinez
2001-04-22 20:21                       ` Jonathan Morton
2001-04-22 20:36                         ` Jonathan Morton
2001-04-22 19:01                     ` James A. Sutherland
2001-04-22 19:11                       ` Rik van Riel
2001-04-22 20:36                         ` James A. Sutherland
2001-04-22 19:30                       ` Jonathan Morton
2001-04-22 20:35                         ` James A. Sutherland
2001-04-22 20:41                           ` Rik van Riel
2001-04-22 20:58                             ` James A. Sutherland
2001-04-22 21:26                               ` Rik van Riel
2001-04-22 22:26                                 ` Jonathan Morton
2001-04-23  5:55                                   ` James A. Sutherland
2001-04-23  5:59                                     ` Rik van Riel
2001-04-21 20:29           ` Rik van Riel
2001-04-22 10:08           ` James A. Sutherland
