* [PATCH] OOM killer
From: Rik van Riel @ 1998-08-16 16:34 UTC
To: Linux MM; +Cc: Linux Kernel, Claus Fischer
Hi,
here is the first patch that provides kernel-based out-of-memory
killing.
It is only here to try if it works, I know it compiles but
I haven't even booted it yet :)
Basically, when kswapd fails to free up pages, we're out of
memory and the system would otherwise die, the added functions
select a process to kill.
I don't know if it will always select the right process, nor
if it even works correctly. All I do know is that the code
is currently _VERY_ dirty and that it needs some major cleanups
and sysctl tunables; right now I don't even dare sending Linus
a cc: of this message :-) [Linus, if you read this, don't
read on unless you don't mind ROFLing]
Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
--- mm/Makefile.orig Sun Aug 16 17:26:38 1998
+++ mm/Makefile Sun Aug 16 17:26:57 1998
@@ -9,7 +9,7 @@
O_TARGET := mm.o
O_OBJS := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
- vmalloc.o slab.o \
+ vmalloc.o slab.o oom_kill.o\
swap.o vmscan.o page_io.o page_alloc.o swap_state.o swapfile.o
include $(TOPDIR)/Rules.make
--- mm/oom_kill.c.orig Sun Aug 16 17:26:30 1998
+++ mm/oom_kill.c Sun Aug 16 18:24:05 1998
@@ -0,0 +1,133 @@
+/*
+ * linux/mm/oom_kill.c
+ *
+ * Copyright (C) 1998 Rik van Riel
+ *
+ * The routines in this file are used to kill a process when
+ * we're seriously out of memory. This gets called from kswapd()
+ * in linux/mm/vmscan.c when we really run out of memory.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/stddef.h>
+#include <linux/swap.h>
+#include <linux/swapctl.h>
+#include <linux/timex.h>
+
+#define DEBUG
+/* Hmm, I remember a global declaration. Haven't found
+ * it though... */
+#define min(a,b) (((a)<(b))?(a):(b))
+
+typedef struct vm_kill_t
+{
+ unsigned int ram;
+ unsigned int total;
+} vm_kill_t;
+
+struct vm_kill_t vm_kill = {25, 3};
+
+inline int int_sqrt(unsigned int x)
+{
+ int out = x;
+ while (x & ~(unsigned int)1) x >>=2, out >>=1;
+ if (x) out -= out >> 2;
+ return (out ? out : 1);
+}
+
+/*
+ * Basically, points = size / (sqrt(CPU_used) * sqrt(sqrt(time_running)))
+ * with some bonuses/penalties.
+ *
+ * This is ugly as hell, and a nice cleanup is welcome :-)
+ */
+
+inline int badness(struct task_struct *p)
+{
+ int points = p->mm->total_vm;
+ points /= int_sqrt((p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3));
+ points /= int_sqrt(int_sqrt((jiffies - p->start_time) >> (SHIFT_HZ + 10)));
+ if (p->priority < DEF_PRIORITY)
+ points <<= 1;
+ if (p->uid == 0 || p->euid == 0 || p->cap_effective.cap & CAP_TO_MASK(CAP_SYS_ADMIN))
+ points >>= 2;
+ if (p->start_time < jiffies >> 6)
+ points >>= 2;
+/*
+ * NEVER, EVER kill a process with direct hardware access. If
+ * we start doing that, we won't make a clean recovery and a
+ * sync + umount + reboot will be better.
+ */
+ if (p->cap_effective.cap & CAP_TO_MASK(CAP_SYS_RAWIO)
+#ifdef __i386__
+ || p->tss.bitmap == offsetof(struct thread_struct, io_bitmap)
+#endif
+ )
+ points = 0;
+#ifdef DEBUG
+ printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
+ p->pid, p->comm, points);
+#endif
+ return points;
+}
+
+inline struct task_struct * select_bad_process(void)
+{
+ int points = 0;
+ struct task_struct *p = NULL;
+ struct task_struct *chosen = NULL;
+ read_lock(&tasklist_lock); /* We might need this on SMP */
+ for_each_task(p)
+ if (p->pid && badness(p) > points)
+ chosen = p;
+ read_unlock(&tasklist_lock);
+ return chosen;
+}
+
+/*
+ * The SCHED_FIFO magic should make sure that the killed context
+ * gets absolute priority when killing itself. This should prevent
+ * a looping kswapd from interfering with the process killing.
+ */
+void oom_kill(void)
+{
+
+ struct task_struct *p = select_bad_process();
+ if (p == NULL)
+ return;
+ printk(KERN_ERR "Out of Memory: Killed process %d (%s).\n", p->pid, p->comm);
+ force_sig(SIGKILL, p);
+ p->policy = SCHED_FIFO;
+ p->rt_priority = 1000;
+ current->policy |= SCHED_YIELD;
+ schedule();
+ return;
+}
+
+/*
+ * Are we out of memory?
+ *
+ * We ignore swap cache pages and simplify the situation a bit.
+ * This probably won't hurt, because when kswapd is failing we
+ * already have to assume the worst.
+ */
+
+int out_of_memory(void)
+{
+ struct sysinfo val;
+ int free_vm, kill_limit;
+ si_meminfo(&val);
+ si_swapinfo(&val);
+ kill_limit = min(vm_kill.ram * (val.totalram >> PAGE_SHIFT),
+ vm_kill.total * ((val.totalram + val.totalswap) >> PAGE_SHIFT));
+ free_vm = ((val.freeram + val.bufferram + val.freeswap) >>
+ PAGE_SHIFT) + page_cache_size - (page_cache.min_percent +
+ buffer_mem.min_percent) * num_physpages;
+ if (free_vm * 100 < kill_limit)
+ return 1;
+ return 0;
+}
+
+
\ No newline at end of file
--- mm/vmscan.c.orig Sun Aug 16 17:26:20 1998
+++ mm/vmscan.c Sun Aug 16 18:26:28 1998
@@ -28,6 +28,13 @@
#include <asm/bitops.h>
#include <asm/pgtable.h>
+/*
+ * OOM kill declarations. Move to .h file before submission :)
+ */
+
+extern int out_of_memory(void);
+extern void oom_kill(void);
+
/*
* When are we next due for a page scan?
*/
@@ -532,7 +539,7 @@
init_swap_timer();
add_wait_queue(&kswapd_wait, &wait);
while (1) {
- int tries;
+ int tries, tried, succes;
current->state = TASK_INTERRUPTIBLE;
flush_signals(current);
@@ -558,14 +565,16 @@
*/
tries = pager_daemon.tries_base;
tries >>= 4*free_memory_available();
-
+ tried = succes = 0;
+
while (tries--) {
int gfp_mask;
- if (free_memory_available() > 1)
+ if (free_memory_available() > 1 && ++tried > pager_daemon.tries_min)
break;
gfp_mask = __GFP_IO;
- do_try_to_free_page(gfp_mask);
+ if (do_try_to_free_page(gfp_mask))
+ succes++;
/*
* Syncing large chunks is faster than swapping
* synchronously (less head movement). -- Rik.
@@ -574,6 +583,8 @@
run_task_queue(&tq_disk);
}
+ if (succes < 4 * tried && out_of_memory())
+ oom_kill();
}
/* As if we could ever get here - maybe we want to make this killable */
remove_wait_queue(&kswapd_wait, &wait);
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
* Re: [PATCH] OOM killer
From: Claus Fischer @ 1998-08-17 16:50 UTC
To: H.H.vanRiel; +Cc: linux-mm, linux-kernel
Rik,
thanks for doing it so nicely. I just came back from a Yellowstone vacation.
Comments (disordered):
unsigned int ram; /* in percent */
unsigned int total; /* in percent */
The comments would help just a bit :-)
points /= int_sqrt(int_sqrt((jiffies - p->start_time) >> (SHIFT_HZ + 10)));
If jiffies have wrapped around, the process does not get more points.
It's no big problem in this case; it just means that jobs like init
etc. are not all too well weighted after the wrap-around.
I don't know of a good solution to that.
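One partial answer, sketched here under assumptions rather than taken from the patch: if the tick counter is an unsigned type, plain subtraction already yields the right elapsed time across a single wrap, and a signed cast of the difference gives a wrap-safe ordering test. The names `ticks_t`, `ticks_since` and `ticks_after` are hypothetical, and a 32-bit counter is assumed.

```c
#include <stdint.h>

/* Hypothetical 32-bit tick counter standing in for jiffies. */
typedef uint32_t ticks_t;

/*
 * Ticks elapsed since `start`. Unsigned subtraction is modulo 2^32,
 * so the result is correct across one wrap of `now`, provided the
 * real elapsed time is shorter than one full counter period.
 */
static ticks_t ticks_since(ticks_t now, ticks_t start)
{
    return (ticks_t)(now - start);
}

/*
 * Wrap-safe "a is later than b". Only meaningful while the two
 * values are less than half a counter period apart.
 */
static int ticks_after(ticks_t a, ticks_t b)
{
    return (int32_t)(uint32_t)(a - b) > 0;
}
```

This does not help with the init case mentioned above: a process older than one full counter period still looks young after the wrap. Handling that would need a wider counter or a separately stored age, which is outside what this sketch covers.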
unsigned int points;
unsigned int b;
if (p->pid && (b = badness(p)) > points)
chosen = p, points = b;
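The point of the snippet above is that the running maximum has to be updated together with the chosen task; otherwise every task with a non-zero score replaces the previous candidate and the last one scanned wins. A minimal user-space sketch of the same loop, with a hypothetical `struct task` and a precomputed `score` standing in for `task_struct` and `badness()`:

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct; score plays the role of badness(p). */
struct task {
    int pid;
    unsigned int score;
};

/*
 * Return the task with the highest score, skipping pid 0.
 * Returns NULL if no task scores above zero.
 */
static const struct task *select_worst(const struct task *tasks, size_t n)
{
    const struct task *chosen = NULL;
    unsigned int points = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned int b = tasks[i].score;

        if (tasks[i].pid && b > points) {
            chosen = &tasks[i];
            points = b;    /* remember the best score seen so far */
        }
    }
    return chosen;
}
```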
\ No newline at end of file
That makes the patch hard to apply; Wine recently had the same thing in one of
its patches :-)
int tries, tried, succes;
Should it read success, or is this deliberate?
Otherwise I can't comment on the last patch part.
Here's the most important comment:
free_vm = ((val.freeram + val.bufferram + val.freeswap) >>
PAGE_SHIFT) + page_cache_size - (page_cache.min_percent +
buffer_mem.min_percent) * num_physpages;
I somehow have a feeling that the page cache, min_percent etc. things
should be subtracted from the kill_limit instead of added to the
free_vm. Also, they should perhaps be individually limited?
Rationale:
Just imagine 2 % free memory, buffer_mem.min_percent is 5?
In this case free_vm would result as a negative value, and
it would kill though it shouldn't.
Perhaps so:
int page_cache_min = page_cache.min_percent * num_physpages;
int buffer_min = /* something similar? */
int blocked_ram = page_cache_min +
(buffer_mem.min_percent * num_physpages);
int page_cache_exceeding = max(page_cache_size - page_cache_min,0);
int buffer_exceeding = max(buffer_size - buffer_min,0);
kill_limit = min(vm_kill.ram * ((val.totalram >> PAGE_SHIFT) -
blocked_ram),
...);
free_vm = ( ... )
+ page_cache_exceeding + buffer_exceeding;
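To make the idea concrete, here is a self-contained sketch of the clamped calculation. It is not the patch's code: `struct vm_snapshot` and all function names are hypothetical, every count is assumed to be in pages already, the `min_percent` values are assumed to be percentages (hence the division by 100), and for simplicity the kill limit is computed as in the original patch without subtracting the reserved memory.

```c
/* Hypothetical snapshot; all counts in pages, limits in percent. */
struct vm_snapshot {
    long total_ram;
    long total_swap;
    long free_ram;
    long free_swap;
    long page_cache;
    long buffers;
    int  page_cache_min_percent;
    int  buffer_min_percent;
};

/* Pages above the reserved floor; never negative. */
static long excess_over(long size, long floor)
{
    return size > floor ? size - floor : 0;
}

/* Free memory plus whatever the caches hold beyond their minimums. */
static long free_vm_pages(const struct vm_snapshot *s)
{
    long page_cache_min = s->total_ram * s->page_cache_min_percent / 100;
    long buffer_min = s->total_ram * s->buffer_min_percent / 100;

    return s->free_ram + s->free_swap
         + excess_over(s->page_cache, page_cache_min)
         + excess_over(s->buffers, buffer_min);
}

/* Same comparison as the patch: free_vm * 100 against percent * pages. */
static int out_of_memory_sketch(const struct vm_snapshot *s,
                                int ram_percent, int total_percent)
{
    long by_ram = s->total_ram * ram_percent;
    long by_total = (s->total_ram + s->total_swap) * total_percent;
    long kill_limit = by_ram < by_total ? by_ram : by_total;

    return free_vm_pages(s) * 100 < kill_limit;
}
```

Clamping each cache separately means a cache sitting below its minimum contributes nothing, instead of pulling the total below zero.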
Generally, I think this is an excellent object for 'theoretical programming';
since this code will not be used much in everyday practice (hopefully),
you can only look at it and try very hard to make sure it will work :-)
Right now I have some friends to visit, so I can't spend too much time,
but I promise to take a closer look before end of August.
Thanks for doing all that. You probably have a small circle of dedicated
customers for that but this circle will appreciate it very much.
Claus
* Re: [PATCH] OOM killer
From: Rik van Riel @ 1998-08-17 18:41 UTC
To: Claus Fischer; +Cc: H.H.vanRiel, linux-mm, linux-kernel
On Mon, 17 Aug 1998, Claus Fischer wrote:
> Comments (disordered):
> unsigned int ram; /* in percent */
> unsigned int total; /* in percent */
> The comments would help just a bit :-)
I promise a code cleanup before submission. Note the
"this should be in a .h file" statements...
> points /= int_sqrt(int_sqrt((jiffies - p->start_time) >> (SHIFT_HZ + 10)));
>
> If jiffies have wrapped around, the process does not get more points.
> I don't know of a good solution to that.
Anybody?
> int tries, tried, succes;
>
> Should it read success, or is this deliberate?
Oops, a speling erorr :)
> Here's the most important comment:
>
> free_vm = ((val.freeram + val.bufferram + val.freeswap) >>
> PAGE_SHIFT) + page_cache_size - (page_cache.min_percent +
> buffer_mem.min_percent) * num_physpages;
>
> I somehow have a feeling that the page cache, min_percent etc. things
> should be subtracted from the kill_limit instead of added to the
> free_vm. Also, they should perhaps be individually limited?
>
> Rationale:
> Just imagine 2 % free memory, buffer_mem.min_percent is 5?
> In this case free_vm would result as a negative value, and
> it would kill though it shouldn't.
Even with 2% of free memory, if you have buffer_mem.min_percent at
5, at least 5% of memory will be used by the buffer cache. This
makes sure that free_vm can't be negative. This also means the
code _is_ correct after all...
> int page_cache_min = page_cache.min_percent * num_physpages;
> int buffer_min = /* something similar? */
> int blocked_ram = page_cache_min +
> (buffer_mem.min_percent * num_physpages);
> int page_cache_exceeding = max(page_cache_size - page_cache_min,0);
> int buffer_exceeding = max(buffer_size - buffer_min,0);
This doesn't add much to the readability of the code. Nice comments
and pointers to other places in the code will teach new folks much
more.
> Generally, I think this is an excellent object for 'theoretical programming';
> since this code will not be used much in everyday practice (hopefully),
> you can only look at it and try very hard to make sure it will work :-)
See the comment at the top of the file. I intend it to be a nice and
readable starting point for newbie kernel hackers. This is _the_
place in the kernel where we don't need performance and where we
_do_ need to be absolutely correct.
Besides, having a nice signpost in the kernel source might not be
bad after all. What's 5 or even 10 kB of signposting in this file
if it can teach a lot about memory management and scheduling to new
potential kernel hackers?
> Thanks for doing all that. You probably have a small circle of dedicated
> customers for that but this circle will appreciate it very much.
Thanks.
Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] OOM killer
From: Savochkin Andrey Vladimirovich @ 1998-08-18 6:33 UTC
To: Rik van Riel, Linux MM; +Cc: Linux Kernel, Claus Fischer
On Sun, Aug 16, 1998 at 06:34:32PM +0200, Rik van Riel wrote:
> Hi,
>
> here is the first patch that provides kernel-based out-of-memory
> killing.
>
> It is only here to try if it works, I know it compiles but
> I haven't even booted it yet :)
>
> Basically, when kswapd fails to free up pages, we're out of
> memory and the system would otherwise die, the added functions
> select a process to kill.
>
> I don't know if it will always select the right process, nor
> if it even works correctly. All I do know is that the code
> is currently _VERY_ dirty and that it needs some major cleanups
> and sysctl tunables; right now I don't even dare sending Linus
> a cc: of this message :-) [Linus, if you read this, don't
> read on unless you don't mind ROFLing]
Rik,
Don't you think that it would be much easier if we just implemented
"kill priorities" which applications set themselves?
Certainly, only a limited range of the priorities would be available
to non-privileged applications. If people think that an application
is something special (like X or long-running computation programs
or anything else), they set a non-default kill priority for the process.
Among the other applications, it doesn't matter which one is killed first.
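A rough sketch of how such a scheme might look, purely as an illustration: the priority ranges, the `struct proc` type, the function names and the convention that a higher value means "kill first" are all assumptions, not part of any existing kernel interface.

```c
#include <stddef.h>

/* Hypothetical priority ranges; higher value = killed earlier. */
#define KILL_PRIO_DEFAULT    0
#define KILL_PRIO_USER_MIN  (-5)
#define KILL_PRIO_USER_MAX    5
#define KILL_PRIO_MIN       (-20)
#define KILL_PRIO_MAX        20

/* Hypothetical stand-in for a process descriptor. */
struct proc {
    int pid;
    int kill_prio;
};

/*
 * Set a kill priority. Non-privileged callers get the narrow range.
 * Returns 0 on success, -1 if the value is outside the allowed range.
 */
static int set_kill_prio(struct proc *p, int prio, int privileged)
{
    int lo = privileged ? KILL_PRIO_MIN : KILL_PRIO_USER_MIN;
    int hi = privileged ? KILL_PRIO_MAX : KILL_PRIO_USER_MAX;

    if (prio < lo || prio > hi)
        return -1;
    p->kill_prio = prio;
    return 0;
}

/* Pick the process with the highest kill priority; ties go to the first found. */
static struct proc *pick_victim(struct proc *procs, size_t n)
{
    struct proc *victim = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!procs[i].pid)
            continue;    /* never pick pid 0 */
        if (victim == NULL || procs[i].kill_prio > victim->kill_prio)
            victim = &procs[i];
    }
    return victim;
}
```

Processes left at the default priority tie with each other, which matches the idea that among ordinary applications the choice is arbitrary.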
Best wishes
Andrey V.
Savochkin