[PATCH] VM fix for 2.4.0-test9 & OOM handler

linux-mm.kvack.org archive mirror

* [PATCH] VM fix for 2.4.0-test9 & OOM handler
@ 2000-10-06 18:59 Rik van Riel
  2000-10-06 20:19 ` Byron Stanoszek
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-06 18:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, linux-kernel

Hi Linus,

the following patch contains 2 fixes and one addition
to the VM layer:

1. Roger Larson's fix to make sure there is no
   "1 page gap" between the point where __alloc_pages()
   goes to sleep and kswapd() wakes up    <== livelock fix

2. fix the calculation of freepages.{min,low,high} to better
   reflect the reality of having per-zone tunable free
   memory target                          <== balancing fix

3. add the out of memory killer, which has been tuned with
   -test9 to be run at exactly the right moment; process
   selection: "principle of least surprise"  <== OOM handling
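
The one-page gap from item 1 can be illustrated with a small userspace
sketch. The struct and function names below are made up for illustration
and only mirror the two comparisons the patch changes in mm/page_alloc.c
and mm/vmscan.c; it is a model under those assumptions, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-zone counters used in the patch. */
struct zone_sketch {
	long free_pages;
	long inactive_clean_pages;
	long pages_min;
};

/* Allocator side: may a page be taken from this zone? */
static bool can_allocate(const struct zone_sketch *z, bool patched)
{
	long avail = z->free_pages + z->inactive_clean_pages;

	/* Old test was '>', the patch relaxes it to '>='. */
	return patched ? avail >= z->pages_min : avail > z->pages_min;
}

/* kswapd side: does this zone report a free shortage? */
static bool has_shortage(const struct zone_sketch *z, bool patched)
{
	long avail = z->free_pages + z->inactive_clean_pages;

	/* Old test was '< pages_min', the patch uses '< pages_min + 1'. */
	return patched ? avail < z->pages_min + 1 : avail < z->pages_min;
}

/* The livelock case: the allocator refuses, yet kswapd sees no work. */
static bool stuck(const struct zone_sketch *z, bool patched)
{
	return !can_allocate(z, patched) && !has_shortage(z, patched);
}
```

With free + inactive_clean exactly equal to pages_min, the old pair of
tests leaves the zone stuck; with the patched pair both sides agree that
the boundary value is allocatable and also counts as a shortage.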

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/



--- linux-2.4.0-test9/fs/buffer.c.orig	Tue Oct  3 10:19:10 2000
+++ linux-2.4.0-test9/fs/buffer.c	Tue Oct  3 12:25:23 2000
@@ -706,7 +706,7 @@
 static void refill_freelist(int size)
 {
 	if (!grow_buffers(size)) {
-		wakeup_bdflush(1);
+		wakeup_bdflush(1);  /* Sets task->state to TASK_RUNNING */
 		current->policy |= SCHED_YIELD;
 		schedule();
 	}
--- linux-2.4.0-test9/mm/highmem.c.orig	Tue Oct  3 10:20:41 2000
+++ linux-2.4.0-test9/mm/highmem.c	Tue Oct  3 12:25:44 2000
@@ -310,7 +310,7 @@
 repeat_bh:
 	bh = kmem_cache_alloc(bh_cachep, SLAB_BUFFER);
 	if (!bh) {
-		wakeup_bdflush(1);
+		wakeup_bdflush(1);  /* Sets task->state to TASK_RUNNING */
 		current->policy |= SCHED_YIELD;
 		schedule();
 		goto repeat_bh;
@@ -324,7 +324,7 @@
 repeat_page:
 	page = alloc_page(GFP_BUFFER);
 	if (!page) {
-		wakeup_bdflush(1);
+		wakeup_bdflush(1);  /* Sets task->state to TASK_RUNNING */
 		current->policy |= SCHED_YIELD;
 		schedule();
 		goto repeat_page;
--- linux-2.4.0-test9/mm/page_alloc.c.orig	Tue Oct  3 10:20:41 2000
+++ linux-2.4.0-test9/mm/page_alloc.c	Fri Oct  6 15:45:36 2000
@@ -268,7 +268,8 @@
 				water_mark = z->pages_high;
 		}
 
-		if (z->free_pages + z->inactive_clean_pages > water_mark) {
+		/* Use >= to have one page overlap with free_shortage() !! */
+		if (z->free_pages + z->inactive_clean_pages >= water_mark) {
 			struct page *page = NULL;
 			/* If possible, reclaim a page directly. */
 			if (direct_reclaim && z->free_pages < z->pages_min + 8)
@@ -795,21 +796,6 @@
 			
 	printk("On node %d totalpages: %lu\n", nid, realtotalpages);
 
-	/*
-	 * Select nr of pages we try to keep free for important stuff
-	 * with a minimum of 10 pages and a maximum of 256 pages, so
-	 * that we don't waste too much memory on large systems.
-	 * This is fairly arbitrary, but based on some behaviour
-	 * analysis.
-	 */
-	i = realtotalpages >> 7;
-	if (i < 10)
-		i = 10;
-	if (i > 256)
-		i = 256;
-	freepages.min += i;
-	freepages.low += i * 2;
-	freepages.high += i * 3;
 	memlist_init(&active_list);
 	memlist_init(&inactive_dirty_list);
 
@@ -875,6 +861,20 @@
 		zone->pages_min = mask;
 		zone->pages_low = mask*2;
 		zone->pages_high = mask*3;
+		/*
+		 * Add these free targets to the global free target;
+		 * we have to be SURE that freepages.high is higher
+		 * than SUM [zone->pages_min] for all zones, otherwise
+		 * we may have bad bad problems.
+		 *
+		 * This means we cannot make the freepages array writable
+		 * in /proc, but have to add a separate extra_free_target
+		 * for people who require it to catch load spikes in eg.
+		 * gigabit ethernet routing...
+		 */
+		freepages.min += mask;
+		freepages.low += mask*2;
+		freepages.high += mask*3;
 		zone->zone_mem_map = mem_map + offset;
 		zone->zone_start_mapnr = offset;
 		zone->zone_start_paddr = zone_start_paddr;
--- linux-2.4.0-test9/mm/vmscan.c.orig	Tue Oct  3 10:20:41 2000
+++ linux-2.4.0-test9/mm/vmscan.c	Fri Oct  6 15:46:14 2000
@@ -837,8 +837,9 @@
 		for(i = 0; i < MAX_NR_ZONES; i++) {
 			zone_t *zone = pgdat->node_zones+ i;
 			if (zone->size && (zone->inactive_clean_pages +
-					zone->free_pages < zone->pages_min)) {
-				sum += zone->pages_min;
+					zone->free_pages < zone->pages_min+1)) {
+				/* + 1 to have overlap with alloc_pages() !! */
+				sum += zone->pages_min + 1;
 				sum -= zone->free_pages;
 				sum -= zone->inactive_clean_pages;
 			}
@@ -1095,12 +1096,20 @@
 		 * We go to sleep for one second, but if it's needed
 		 * we'll be woken up earlier...
 		 */
-		if (!free_shortage() || !inactive_shortage())
+		if (!free_shortage() || !inactive_shortage()) {
 			interruptible_sleep_on_timeout(&kswapd_wait, HZ);
 		/*
-		 * TODO: insert out of memory check & oom killer
-		 * invocation in an else branch here.
+		 * If we couldn't free enough memory, we see if it was
+		 * due to the system just not having enough memory.
+		 * If that is the case, the only solution is to kill
+		 * a process (the alternative is eternal deadlock).
+		 *
+		 * If there still is enough memory around, we just loop
+		 * and try free some more memory...
 		 */
+		} else if (out_of_memory()) {
+			oom_kill();
+		}
 	}
 }
 
--- linux-2.4.0-test9/mm/Makefile.orig	Wed Oct  4 21:11:05 2000
+++ linux-2.4.0-test9/mm/Makefile	Wed Oct  4 21:11:13 2000
@@ -10,7 +10,7 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o
+	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o
--- linux-2.4.0-test9/mm/oom_kill.c.orig	Wed Oct  4 21:12:51 2000
+++ linux-2.4.0-test9/mm/oom_kill.c	Fri Oct  6 15:35:29 2000
@@ -0,0 +1,210 @@
+/*
+ *  linux/mm/oom_kill.c
+ * 
+ *  Copyright (C)  1998,2000  Rik van Riel
+ *	Thanks go out to Claus Fischer for some serious inspiration and
+ *	for goading me into coding this file...
+ *
+ *  The routines in this file are used to kill a process when
+ *  we're seriously out of memory. This gets called from kswapd()
+ *  in linux/mm/vmscan.c when we really run out of memory.
+ *
+ *  Since we won't call these routines often (on a well-configured
+ *  machine) this file will double as a 'coding guide' and a signpost
+ *  for newbie kernel hackers. It features several pointers to major
+ *  kernel subsystems and hints as to where to find out what things do.
+ */
+
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/swap.h>
+#include <linux/swapctl.h>
+#include <linux/timex.h>
+
+/* #define DEBUG */
+
+/**
+ * int_sqrt - oom_kill.c internal function, rough approximation to sqrt
+ * @x: integer of which to calculate the sqrt
+ * 
+ * A very rough approximation to the sqrt() function.
+ */
+static unsigned int int_sqrt(unsigned int x)
+{
+	unsigned int out = x;
+	while (x & ~(unsigned int)1) x >>=2, out >>=1;
+	if (x) out -= out >> 2;
+	return (out ? out : 1);
+}	
+
+/**
+ * badness - calculate a numeric value for how bad this task has been
+ * @p: task struct of which task we should calculate
+ *
+ * The formula used is relatively simple and documented inline in the
+ * function. The main rationale is that we want to select a good task
+ * to kill when we run out of memory.
+ *
+ * Good in this context means that:
+ * 1) we lose the minimum amount of work done
+ * 2) we recover a large amount of memory
+ * 3) we don't kill anything innocent of eating tons of memory
+ * 4) we want to kill the minimum amount of processes (one)
+ * 5) we try to kill the process the user expects us to kill, this
+ *    algorithm has been meticulously tuned to meet the principle
+ *    of least surprise ... (be careful when you change it)
+ */
+
+static int badness(struct task_struct *p)
+{
+	int points, cpu_time, run_time;
+
+	if (!p->mm)
+		return 0;
+	/*
+	 * The memory size of the process is the basis for the badness.
+	 */
+	points = p->mm->total_vm;
+
+	/*
+	 * CPU time is in tens of seconds and run time is in thousands of
+	 * seconds. There is no particular reason for this other than that
+	 * it turned out to work very well in practice. This is not safe
+	 * against jiffie wraps but we don't care _that_ much...
+	 */
+	cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3);
+	run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
+
+	points /= int_sqrt(cpu_time);
+	points /= int_sqrt(int_sqrt(run_time));
+
+	/*
+	 * Niced processes are most likely less important, so double
+	 * their badness points.
+	 */
+	if (p->nice > 0)
+		points *= 2;
+
+	/*
+	 * Superuser processes are usually more important, so we make it
+	 * less likely that we kill those.
+	 */
+	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
+				p->uid == 0 || p->euid == 0)
+		points /= 4;
+
+	/*
+	 * We don't want to kill a process with direct hardware access.
+	 * Not only could that mess up the hardware, but usually users
+	 * tend to only have this flag set on applications they think
+	 * of as important.
+	 */
+	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
+		points /= 4;
+#ifdef DEBUG
+	printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
+	p->pid, p->comm, points);
+#endif
+	return points;
+}
+
+/*
+ * Simple selection loop. We choose the process with the highest
+ * number of 'points'. We need the locks to make sure that the
+ * list of task structs doesn't change while we look the other way.
+ *
+ * (not docbooked, we don't want this one cluttering up the manual)
+ */
+static struct task_struct * select_bad_process(void)
+{
+	int points = 0, maxpoints = 0;
+	struct task_struct *p = NULL;
+	struct task_struct *chosen = NULL;
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+	{
+		if (p->pid)
+			points = badness(p);
+		if (points > maxpoints) {
+			chosen = p;
+			maxpoints = points;
+		}
+	}
+	read_unlock(&tasklist_lock);
+	return chosen;
+}
+
+/**
+ * oom_kill - kill the "best" process when we run out of memory
+ *
+ * If we run out of memory, we have the choice between either
+ * killing a random task (bad), letting the system crash (worse)
+ * OR try to be smart about which process to kill. Note that we
+ * don't have to be perfect here, we just have to be good.
+ *
+ * We must be careful though to never send SIGKILL to a process with
+ * CAP_SYS_RAWIO set, send SIGTERM instead (but it's unlikely that
+ * we select a process with CAP_SYS_RAWIO set).
+ */
+void oom_kill(void)
+{
+
+	struct task_struct *p = select_bad_process();
+
+	/* Found nothing?!?! Either we hang forever, or we panic. */
+	if (p == NULL)
+		panic("Out of memory and no killable processes...\n");
+
+	printk(KERN_ERR "Out of Memory: Killed process %d (%s).\n", p->pid, p->comm);
+
+	/*
+	 * We give our sacrificial lamb high priority and access to
+	 * all the memory it needs. That way it should be able to
+	 * exit() and clear out its resources quickly...
+	 */
+	p->counter = 5 * HZ;
+	p->flags |= PF_MEMALLOC;
+
+	/* This process has hardware access, be more careful. */
+	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
+		force_sig(SIGTERM, p);
+	} else {
+		force_sig(SIGKILL, p);
+	}
+
+	/*
+	 * Make kswapd go out of the way, so "p" has a good chance of
+	 * killing itself before someone else gets the chance to ask
+	 * for more memory.
+	 */
+	current->policy |= SCHED_YIELD;
+	schedule();
+	return;
+}
+
+/**
+ * out_of_memory - is the system out of memory?
+ *
+ * Returns 0 if there is still enough memory left,
+ * 1 when we are out of memory (otherwise).
+ */
+int out_of_memory(void)
+{
+	struct sysinfo swp_info;
+
+	/* Enough free memory?  Not OOM. */
+	if (nr_free_pages() > freepages.min)
+		return 0;
+
+	if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
+		return 0;
+
+	/* Enough swap space left?  Not OOM. */
+	si_swapinfo(&swp_info);
+	if (swp_info.freeswap > 0)
+		return 0;
+
+	/* Else... */
+	return 1;
+}
--- linux-2.4.0-test9/include/linux/swap.h.orig	Fri Oct  6 12:33:05 2000
+++ linux-2.4.0-test9/include/linux/swap.h	Fri Oct  6 12:33:48 2000
@@ -126,6 +126,10 @@
 extern struct page * read_swap_cache_async(swp_entry_t, int);
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
+/* linux/mm/oom_kill.c */
+extern int out_of_memory(void);
+extern void oom_kill(void);
+
 /*
  * Make these inline later once they are working properly.
  */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-06 18:59 [PATCH] VM fix for 2.4.0-test9 & OOM handler Rik van Riel
@ 2000-10-06 20:19 ` Byron Stanoszek
  2000-10-06 20:31   ` Rik van Riel
                     ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Byron Stanoszek @ 2000-10-06 20:19 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Linus Torvalds, linux-mm, linux-kernel

On Fri, 6 Oct 2000, Rik van Riel wrote:

> 3. add the out of memory killer, which has been tuned with
>    -test9 to be run at exactly the right moment; process
>    selection: "principle of least surprise"  <== OOM handling

In the OOM killer, shouldn't there be a check for PID 1 just to enforce that
INIT will not be the victim? Sure its total_vm might be small, but if there was
a memory leak in the kernel somewhere, it might eventually become the target.
I suppose, if it ever were to become the victim, your system wouldn't be too
usable anyway...

Can you give me your rationale for selecting 'nice' processes as being badder?
Do you think it would be a good idea to scale the amount of badness according
to how nice the process is (a nice value of 20 could get the full *2, otherwise
a smaller multiplier)?
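
A minimal sketch of that scaling idea, assuming a linear ramp from no
penalty at nice 0 to the full doubling at nice 20. scale_by_nice() is a
hypothetical helper and is not part of the patch:

```c
/*
 * Hypothetical helper: scale the badness penalty with the nice value
 * instead of applying a flat "points *= 2" to every niced task.
 */
static int scale_by_nice(int points, int nice)
{
	if (nice <= 0)
		return points;		/* not niced: no penalty */
	if (nice > 20)
		nice = 20;		/* clamp to the top of the ramp */

	/* nice 20 gives x2, nice 10 gives x1.5, nice 4 gives x1.2 */
	return points + (int)(((long long)points * nice) / 20);
}
```

For example, scale_by_nice(1000, 10) yields 1500, while the patch as
posted would give 2000 for any positive nice value.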

How about using the current process priority level instead of nicety. If a
process was deprioritized (or auto-niced) because it was starting to eat up CPU
time, AND its memory is abnormally high, then should that be our #1 victim? We
also don't want to kill things like benchmarks either, but hopefully they
wouldn't start eating up more than the available system memory.

Just some thoughts.

 -Byron

-- 
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: bstanoszek@comtime.com


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-06 20:19 ` Byron Stanoszek
@ 2000-10-06 20:31   ` Rik van Riel
  2000-10-09 10:12     ` Marco Colombo
  2000-10-06 21:27   ` David Weinehall
  2000-10-09 18:28   ` Andrea Arcangeli
  2 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-06 20:31 UTC (permalink / raw)
  To: Byron Stanoszek; +Cc: Linus Torvalds, linux-mm, linux-kernel

On Fri, 6 Oct 2000, Byron Stanoszek wrote:
> On Fri, 6 Oct 2000, Rik van Riel wrote:
> 
> > 3. add the out of memory killer, which has been tuned with
> >    -test9 to be run at exactly the right moment; process
> >    selection: "principle of least surprise"  <== OOM handling
> 
> In the OOM killer, shouldn't there be a check for PID 1 just to
> enforce that INIT will not be the victim? Sure its total_vm
> might be small, but if there was a memory leak in the kernel
> somewhere, it might eventually become the target. I suppose, if
> it ever were to become the victim, your system wouldn't be too
> usable anyway...

Indeed, if init is chosen for some reason, something really
strange is going on and there's not much hope for rescuing
it ;)

> Can you give me your rationale for selecting 'nice' processes as
> being badder?

They are niced because the user thinks them a bit less
important. This includes stuff like cron jobs that _just_
push a system over the edge ...

> Do you think it would be a good idea to scale the amount of
> badness according to how nice the process is (a nice value of 20
> could get the full *2, otherwise a smaller multiplier)?

I've thought about this, but the algorithms used are so
coarse that this makes little sense. Also, a nice+20
process is often more "important" than the nice+4 cron
job ... ;)

> How about using the current process priority level instead of
> nicety. If a process was deprioritized (or auto-niced) because
> it was starting to eat up CPU time, AND its memory is abnormally
> high, then should that be our #1 victim?

Not really. In the first place, the dynamic priority changes
so fast that it means almost nothing. Furthermore, once a process
has used a lot of CPU time, killing it means you're potentially
throwing away a lot of useful work that's been done.

(same for a process which has been running a long time)
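
To make the effect of CPU time and run time concrete, here is a
userspace sketch of the first part of badness(). int_sqrt() follows the
routine in the patch; badness_sketch() and its parameter names are
assumptions made for illustration, and the inputs are taken to be
already shifted the way the patch shifts them.

```c
/* Rough integer square root, same algorithm as int_sqrt() in the patch. */
static unsigned int int_sqrt(unsigned int x)
{
	unsigned int out = x;

	while (x & ~(unsigned int)1) {
		x >>= 2;
		out >>= 1;
	}
	if (x)
		out -= out >> 2;
	return out ? out : 1;
}

/*
 * Hypothetical model of the size / CPU time / run time part of
 * badness(). total_vm is the memory size in pages; cpu_units and
 * run_units are the pre-shifted time values.
 */
static int badness_sketch(int total_vm, unsigned int cpu_units,
			  unsigned int run_units)
{
	int points = total_vm;

	/* More CPU time used: more work would be lost, so fewer points. */
	points /= (int)int_sqrt(cpu_units);
	/* Long-running tasks get a weaker (fourth-root) discount. */
	points /= (int)int_sqrt(int_sqrt(run_units));
	return points;
}
```

A task with 10000 pages and no accumulated CPU time keeps all 10000
points, while the same task with cpu_units of 100 drops to 1111, so a
fresh memory hog is picked well before an old worker of equal size.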

> We also don't want to kill things like benchmarks either, but
> hopefully they wouldn't start eating up more than the available
> system memory.

*grin*

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-06 20:19 ` Byron Stanoszek
  2000-10-06 20:31   ` Rik van Riel
@ 2000-10-06 21:27   ` David Weinehall
  2000-10-06 23:21     ` David Weinehall
  2000-10-09 18:28   ` Andrea Arcangeli
  2 siblings, 1 reply; 112+ messages in thread
From: David Weinehall @ 2000-10-06 21:27 UTC (permalink / raw)
  To: Byron Stanoszek; +Cc: Rik van Riel, Linus Torvalds, linux-mm, linux-kernel

On Fri, Oct 06, 2000 at 04:19:55PM -0400, Byron Stanoszek wrote:
> On Fri, 6 Oct 2000, Rik van Riel wrote:
> 
> > 3. add the out of memory killer, which has been tuned with
> >    -test9 to be run at exactly the right moment; process
> >    selection: "principle of least surprise"  <== OOM handling

I've tested v2.4.0test9+RielVMpatch now, together with the
memory_static program. It works terrific. No innocent process got
killed, just the offending one. And not until the memory was completely
depleted.

> In the OOM killer, shouldn't there be a check for PID 1 just to enforce that
> INIT will not be the victim? Sure its total_vm might be small, but if there
> was a memory leak in the kernel somewhere, it might eventually become the
> target.

If INIT has a memory-leak, it deserves to die. We have bigger problems
then anyway... And certainly, if INIT gets killed, we quickly notice
that something is wrong.

> I suppose, if it ever were to become the victim, your system wouldn't
> be too usable anyway...

Correct.

> Can you give me your rationale for selecting 'nice' processes as being
> badder?  Do you think it would be a good idea to scale the amount of
> badness according to how nice the process is (a nice value of 20 could
> get the full *2, otherwise a smaller multiplier)?
> 
> How about using the current process priority level instead of nicety.
> If a process was deprioritized (or auto-niced) because it was starting
> to eat up CPU time, AND its memory is abnormally high, then should
> that be our #1 victim? We also don't want to kill things like
> benchmarks either, but hopefully they wouldn't start eating up more
> than the available system memory.

I wouldn't care the least bit if a benchmark I'm running gets killed if
the memory runs out, but if my dnetc client which has low priority and
neatly works in the background without disturbing anything suddenly
gets killed when another program starts eating memory, I'd be dang
angry...


Standing ovations for Rik van Riel. You've managed to get the VM in
good shape, at least for my machine... Now I'll test it for some machines
with less and more memory (4MB and 64MB ram, with 16MB swap and 
0/256/512/1024/2048 MB swap respectively.)


/David
  _                                                                 _
 // David Weinehall <tao@acc.umu.se> /> Northern lights wander      \\
//  Project MCA Linux hacker        //  Dance across the winter sky //
\>  http://www.acc.umu.se/~tao/    </   Full colour fire           </

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-06 21:27   ` David Weinehall
@ 2000-10-06 23:21     ` David Weinehall
  0 siblings, 0 replies; 112+ messages in thread
From: David Weinehall @ 2000-10-06 23:21 UTC (permalink / raw)
  To: Byron Stanoszek; +Cc: Rik van Riel, Linus Torvalds, linux-mm, linux-kernel

On Fri, Oct 06, 2000 at 11:27:18PM +0200, David Weinehall wrote:
> On Fri, Oct 06, 2000 at 04:19:55PM -0400, Byron Stanoszek wrote:
> > On Fri, 6 Oct 2000, Rik van Riel wrote:
> > 
> > > 3. add the out of memory killer, which has been tuned with
> > > >    -test9 to be run at exactly the right moment; process
> > >    selection: "principle of least surprise"  <== OOM handling
> 
> I've tested v2.4.0test9+RielVMpatch now, together with the
> memory_static program. It works terrific. No innocent process got
> killed, just the offending one. And not until the memory was completely
> depleted.

More tests conducted:

16MB memory, 32MB swapfile + 64MB swappartition (in that order)
16MB memory, 64MB swappartition + 32MB swapfile
16MB memory, 64MB swappartition
16MB memory, 32MB swapfile
16MB memory, NO swap

64MB memory, 256MB swappartition
64MB memory, NO swap

All survive just fine.

I can't do anything else while running the memory-eater program
(this is via ssh; haven't tried locally), but when it finally gets
killed, everything works ok again.


/David
  _                                                                 _
 // David Weinehall <tao@acc.umu.se> /> Northern lights wander      \\
//  Project MCA Linux hacker        //  Dance across the winter sky //
\>  http://www.acc.umu.se/~tao/    </   Full colour fire           </

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-06 20:31   ` Rik van Riel
@ 2000-10-09 10:12     ` Marco Colombo
  2000-10-09 11:27       ` Byron Stanoszek
                         ` (3 more replies)
  0 siblings, 4 replies; 112+ messages in thread
From: Marco Colombo @ 2000-10-09 10:12 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Fri, 6 Oct 2000, Rik van Riel wrote:

[...]
> They are niced because the user thinks them a bit less
> important. 

Please don't, this assumption is quite wrong. I use nice just to be
'nice' to other users. I can run my *important* CPU hog simulation
nice +10 in order to let other people get more CPU when they need it.
But if you put the logic "niced == not important" somewhere into the
kernel, nobody will use nice anymore. I'd rather give a bonus to niced
processes.

I agree this is a small issue, the OOM killer job isn't "nice" at all
anyway. B-) (at OOM time, I'd not even look at the nice of a process at
all. But my point here is that you do, and you take it as a hint for
process importance as perceived by the user who runs it, and I believe
that's just a wrong guess).

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 10:12     ` Marco Colombo
@ 2000-10-09 11:27       ` Byron Stanoszek
  2000-10-09 16:26       ` Kurt Garloff
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 112+ messages in thread
From: Byron Stanoszek @ 2000-10-09 11:27 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Rik van Riel, linux-mm, linux-kernel

On Mon, 9 Oct 2000, Marco Colombo wrote:

> On Fri, 6 Oct 2000, Rik van Riel wrote:
> 
> [...]
> > They are niced because the user thinks them a bit less
> > important. 
> 
> Please don't, this assumption is quite wrong. I use nice just to be
> 'nice' to other users. I can run my *important* CPU hog simulation
> nice +10 in order to let other people get more CPU when they need it.
> But if you put the logic "niced == not important" somewhere into the
> kernel, nobody will use nice anymore. I'd rather give a bonus to niced
> processes.
> 
> I agree this is a small issue, the OOM killer job isn't "nice" at all
> anyway. B-) (at OOM time, I'd not even look at the nice of a process at
> all. But my point here is that you do, and you take it as a hint for
> process importance as perceived by the user who runs it, and I believe
> that's just a wrong guess).

I agree completely. Friday night I had a talk with a few others at the office,
and we all came to a consensus that the 'nice' value really shouldn't be a
factor to determine which process gets killed first. The primary point was
that 'nice' is most commonly used for background tasks that are meant to run
hidden and unseen with low priority. It would be extremely upsetting if a user
decided to log in and browse 50 picture-intensive pages with netscape,
racking up the memory over time, and allowing the OOM killer to zap the
peaceful, 'nice' process in the background that wasn't causing any harm.

Why else would you nice a process? Because you don't want it to interfere with
normal cpu usage by those that normally use the system. You expect that process
to still be running at the end of the day when everyone's gone home.

Regards,
 Byron

-- 
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: bstanoszek@comtime.com


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 10:12     ` Marco Colombo
  2000-10-09 11:27       ` Byron Stanoszek
@ 2000-10-09 16:26       ` Kurt Garloff
  2000-10-09 18:29         ` Jamie Lokier
  2000-10-09 17:27       ` Ingo Molnar
  2000-10-09 18:14       ` Rik van Riel
  3 siblings, 1 reply; 112+ messages in thread
From: Kurt Garloff @ 2000-10-09 16:26 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Rik van Riel, linux-mm, linux-kernel

On Mon, Oct 09, 2000 at 12:12:02PM +0200, Marco Colombo wrote:
> On Fri, 6 Oct 2000, Rik van Riel wrote:
> [...]
> > They are niced because the user thinks them a bit less
> > important. 
> 
> Please don't, this assumption is quite wrong. I use nice just to be
> 'nice' to other users. I can run my *important* CPU hog simulation
> nice +10 in order to let other people get more CPU when they need it.
> But if you put the logic "niced == not important" somewhere into the
> kernel, nobody will use nice anymore. I'd rather give a bonus to niced
> processes.

I could not agree more. Normally, you'd do better to kill a foreground task
(running at nice 0) than to select one of those background jobs, for several
reasons:
* The foreground job can be restarted by the interactive user
  (Most likely, it will be only netscape anyway)
* The background job probably is the more useful one, and has been running
  for a longer time (computations, ...)
* If we put any policy like this into the kernel at all, I'd rather
  encourage the usage of nice instead of discouraging it.

I assume here background job == niced job, which mostly is the case in reality.

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE GmbH, Nuernberg, FRG                               SCSI, Security


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 17:27       ` Ingo Molnar
@ 2000-10-09 17:25         ` Mark Hahn
  2000-10-09 17:37           ` Ingo Molnar
  2000-10-09 17:47           ` Ed Tomlinson
  0 siblings, 2 replies; 112+ messages in thread
From: Mark Hahn @ 2000-10-09 17:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Marco Colombo, Rik van Riel, linux-mm

> feature. Rather introduce an orthogonal voluntary "importance" system-call,
> which marks processes as more and less important. This is similar to
> priority, it can only be decreased by ordinary users.

nice!  call it CAP_IMPORTANT ;)
come to think of it, I'm not sure more than one bit would be terribly
useful - no sane person is going to spend time 
sorting all their processes by importance...


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 10:12     ` Marco Colombo
  2000-10-09 11:27       ` Byron Stanoszek
  2000-10-09 16:26       ` Kurt Garloff
@ 2000-10-09 17:27       ` Ingo Molnar
  2000-10-09 17:25         ` Mark Hahn
  2000-10-09 18:14       ` Rik van Riel
  3 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 17:27 UTC (permalink / raw)
  To: Marco Colombo; +Cc: Rik van Riel, linux-mm, linux-kernel

On Mon, 9 Oct 2000, Marco Colombo wrote:

> [...]
> > They are niced because the user thinks them a bit less
> > important. 
> 
> Please don't, this assumption is quite wrong. I use nice just to be
> 'nice' to other users. I can run my *important* CPU hog simulation
> nice +10 in order to let other people get more CPU when they need it.

yep. The OOM killer heuristics *must not* penalize any other kernel
feature. Rather introduce an orthogonal voluntary "importance" system-call,
which marks processes as more and less important. This is similar to
priority, it can only be decreased by ordinary users.
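
A rough sketch of what such an interface might enforce, assuming a
single integer per task. No such system call exists in the patch; the
struct, the default value and set_importance() are all hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

#define IMPORTANCE_DEFAULT 0	/* hypothetical default for new tasks */

/* Hypothetical per-task state, standing in for fields in task_struct. */
struct task_sketch {
	int importance;		/* higher means "kill me later" */
	bool privileged;	/* stand-in for a capability check */
};

/*
 * Like nice: anyone may lower the importance of a task they own,
 * only a privileged caller may raise it. Returns 0 or -EPERM.
 */
static int set_importance(struct task_sketch *t, int new_value)
{
	if (new_value > t->importance && !t->privileged)
		return -EPERM;
	t->importance = new_value;
	return 0;
}
```

The OOM killer could then consult this value instead of p->nice, which
leaves the scheduler's notion of niceness untouched.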

	Ingo


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 17:25         ` Mark Hahn
@ 2000-10-09 17:37           ` Ingo Molnar
  2000-10-09 17:47           ` Ed Tomlinson
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 17:37 UTC (permalink / raw)
  To: Mark Hahn; +Cc: Marco Colombo, Rik van Riel, linux-mm

On Mon, 9 Oct 2000, Mark Hahn wrote:

> > feature. Rather introduce an orthogonal voluntary "importance" system-call,
> > which marks processes as more and less important. This is similar to
> > priority, it can only be decreased by ordinary users.
> 
> nice!  call it CAP_IMPORTANT ;)
> come to think of it, I'm not sure more than one bit would be terribly
> > useful - no sane person is going to spend time 
> sorting all their processes by importance...

well, this is like priorities, there is a default value, and i suspect
root-owned daemons such as sendmail should get a higher 'importance'
rating. This is not really directed towards ordinary users, it's rather
for the protection of system-critical daemons. Anyway, this pushes the
policy into user-space.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 17:25         ` Mark Hahn
  2000-10-09 17:37           ` Ingo Molnar
@ 2000-10-09 17:47           ` Ed Tomlinson
  2000-10-09 18:01             ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: Ed Tomlinson @ 2000-10-09 17:47 UTC (permalink / raw)
  To: Mark Hahn, Ingo Molnar; +Cc: Marco Colombo, Rik van Riel, linux-mm

On Mon, 09 Oct 2000, Mark Hahn wrote:
> > feature. Rather introduce an orthogonal voluntary "importance"
> > system-call, which marks processes as more and less important. This is
> > similar to priority, it can only be decreased by ordinary users.
>
> nice!  call it CAP_IMPORTANT ;)
> come to think of it, I'm not sure more than one bit would be terribly
> useful - no sane person is going to spend time
> sorting all their processes by importance...

What about the AIX way?  When the system is nearly OOM it sends a
SIG_DANGER signal to all processes.  Those that handle the signal are not 
initial targets for OOM...  Also in the SIG_DANGER processing they can take 
their own actions to reduce their memory usage... (we would have to look out
for a SIG_DANGER handler that had a memory leak though)

Ed Tomlinson 

^ permalink raw reply	[flat|nested] 112+ messages in thread
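
A minimal user-space sketch of the SIG_DANGER idea Ed describes above, under stated assumptions: AIX names the signal SIGDANGER, Linux has no such signal, so SIGUSR1 stands in for it here. The cache and every function name below are hypothetical. The handler only sets a flag, and the memory is released later from ordinary code, which keeps the handler itself trivial.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* AIX defines SIGDANGER; elsewhere SIGUSR1 is a stand-in (assumption). */
#ifdef SIGDANGER
#define LOW_MEMORY_SIGNAL SIGDANGER
#else
#define LOW_MEMORY_SIGNAL SIGUSR1
#endif

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t low_memory_pending;

/* Hypothetical discardable cache owned by the application. */
static void *cache;
static size_t cache_size;

/* Handler: record the request only; no allocation or freeing here. */
static void on_low_memory(int signo)
{
    (void)signo;
    low_memory_pending = 1;
}

/* Register the handler; returns 0 on success, -1 on failure. */
static int install_low_memory_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_low_memory;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(LOW_MEMORY_SIGNAL, &sa, NULL);
}

/* Allocate the discardable cache; returns 0 on success, -1 on failure. */
static int cache_fill(size_t bytes)
{
    assert(cache == NULL);
    cache = malloc(bytes);
    if (cache == NULL)
        return -1;
    cache_size = bytes;
    return 0;
}

/* Call from the main loop: frees the cache if the signal arrived.
 * Returns the number of bytes released (0 if nothing was pending). */
static size_t release_cache_if_asked(void)
{
    size_t freed;

    if (!low_memory_pending)
        return 0;
    low_memory_pending = 0;
    free(cache);
    cache = NULL;
    freed = cache_size;
    cache_size = 0;
    return freed;
}
```

A program would call install_low_memory_handler() once at startup and release_cache_if_asked() on each pass through its main loop. Whether processes that install such a handler should be spared by the OOM killer is the policy question raised in the message; the sketch does not address it.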

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 17:47           ` Ed Tomlinson
@ 2000-10-09 18:01             ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 18:01 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: Mark Hahn, Marco Colombo, Rik van Riel, linux-mm

On Mon, 9 Oct 2000, Ed Tomlinson wrote:

> What about the AIX way?  When the system is nearly OOM it sends a
> SIG_DANGER signal to all processes.  Those that handle the signal are
> not initial targets for OOM...  Also in the SIG_DANGER processing they
> can take their own actions to reduce their memory usage... (we would
> have to look out for a SIG_DANGER handler that had a memory leak
> though)

i think 'importance' should be an integer value, not just a 'can it handle
SIG_DANGER' flag.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 10:12     ` Marco Colombo
                         ` (2 preceding siblings ...)
  2000-10-09 17:27       ` Ingo Molnar
@ 2000-10-09 18:14       ` Rik van Riel
  2000-10-09 18:47         ` Ingo Molnar
  2000-10-09 19:38         ` Marco Colombo
  3 siblings, 2 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 18:14 UTC (permalink / raw)
  To: Marco Colombo; +Cc: linux-mm, linux-kernel

On Mon, 9 Oct 2000, Marco Colombo wrote:
> On Fri, 6 Oct 2000, Rik van Riel wrote:
> 
> [...]
> > They are niced because the user thinks them a bit less
> > important. 
> 
> Please don't, this assumption is quite wrong. I use nice just to
> be 'nice' to other users. I can run my *important* CPU hog
> simulation nice +10 in order to let other people get more CPU
> when they need it.

In that case the time the process has been running and the
CPU time used will save the process if it's been running for
a long time.

Please read the /entire/ algorithm before making rash
conclusions like this.

If nice is used for important, long-running tasks, the fact
that they are long-running will save them (and be honest,
would you really care if a simulation would be killed after
5 minutes?  it's only inconvenient if it gets killed after
a few hours...)

> But if you put the logic "niced == not important" somewhere into
> the kernel, nobody will use nice anymore. I'd rather give a
> bonus to niced processes.

This doesn't make ANY sense at all. The objective is to destroy
the least amount of work, which means giving a bonus to processes
which have used a lot of CPU time already ... regardless of nice
value.

> all. But my point here is that you do, and you take it as a hint for
> process importance as perceived by the user that runs it, and I believe
> it's just wrong guessing).

If you have a better algorithm, feel free to send patches.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-06 20:19 ` Byron Stanoszek
  2000-10-06 20:31   ` Rik van Riel
  2000-10-06 21:27   ` David Weinehall
@ 2000-10-09 18:28   ` Andrea Arcangeli
  2000-10-09 18:42     ` Ingo Molnar
  2 siblings, 1 reply; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 18:28 UTC (permalink / raw)
  To: Byron Stanoszek; +Cc: Rik van Riel, Linus Torvalds, linux-mm, linux-kernel

On Fri, Oct 06, 2000 at 04:19:55PM -0400, Byron Stanoszek wrote:
> In the OOM killer, shouldn't there be a check for PID 1 just to enforce that

Init can't be killed in 2.2.x latest, the same bugfix should be forward
ported to 2.4.x.
 
> Can you give me your rationale for selecting 'nice' processes as being badder?

Also the cpu time and start time of a process are meaningless. Simulations
run for weeks before they run the machine out of memory.

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 16:26       ` Kurt Garloff
@ 2000-10-09 18:29         ` Jamie Lokier
  0 siblings, 0 replies; 112+ messages in thread
From: Jamie Lokier @ 2000-10-09 18:29 UTC (permalink / raw)
  To: Kurt Garloff, Marco Colombo, Rik van Riel, linux-mm, linux-kernel

Kurt Garloff wrote:
> I could not agree more. Normally, you'd better kill a foreground task
> (running nice 0) than selecting one of those background jobs for some
> reasons:
> * The foreground job can be restarted by the interactive user
>   (Most likely, it will be only netscape anyway)
> * The background job probably is the more useful one which has been running
>   since a longer time (computations, ...)

Ick.  A background job that's been running for a long time will be saved
by that, as Rik pointed out.

If I've got a background process running for 30 minutes, and a Netscape
with 5 windows open that I'm using (for long or not, doesn't matter),
guess which one I'd rather died?  Not Netscape -- I'm using that and
I'll never remember how to find those 5 windows again if it just dies.

-- Jamie

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:28   ` Andrea Arcangeli
@ 2000-10-09 18:42     ` Ingo Molnar
  2000-10-09 19:05       ` Andrea Arcangeli
  2000-10-09 19:30       ` [PATCH] VM fix for 2.4.0-test9 & OOM handler David Ford
  0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 18:42 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Byron Stanoszek, Rik van Riel, Linus Torvalds, linux-mm, linux-kernel

On Mon, 9 Oct 2000, Andrea Arcangeli wrote:

> On Fri, Oct 06, 2000 at 04:19:55PM -0400, Byron Stanoszek wrote:
> > In the OOM killer, shouldn't there be a check for PID 1 just to enforce that
> 
> Init can't be killed in 2.2.x latest, the same bugfix should be forward
> ported to 2.4.x.

I believe we should not special-case init in this case. If the OOM would
kill init then we *want* to know about it ASAP, because it's either a bug
in the OOM code or a memory leak in init. Both things are very bad, and
ignoring the kill would just preserve those bugs artificially.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:14       ` Rik van Riel
@ 2000-10-09 18:47         ` Ingo Molnar
  2000-10-09 18:52           ` Rik van Riel
  2000-10-09 19:38         ` Marco Colombo
  1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 18:47 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marco Colombo, MM mailing list, linux-kernel, Linus Torvalds

On Mon, 9 Oct 2000, Rik van Riel wrote:

> In that case the time the process has been running and the
> CPU time used will save the process if it's been running for
> a long time.

'importance' is not something we can measure reliably within the kernel.
And assuming that a niced, not long-running process is unimportant misses
the bus as well. What if i just started an important simulation before
going to vacation for two weeks?

> would you really care if a simulation would be killed after
> 5 minutes? [...]

yes, i would. I would probably end up not using nice values. Please, Rik,
dont penalize an unrelated kernel feature!

> [...] The objective is to destroy the least amount of work, which
> means giving a bonus to processes which have used a lot of CPU time
> already ... regardless of nice value.

your OOM code does not follow this objective:

+       /*
+        * Niced processes are most likely less important, so double
+        * their badness points.
+        */
+       if (p->nice > 0)
+               points *= 2;

Niced processes *can be just as important*.

> If you have a better algorithm, feel free to send patches.

yes. Please remove the above part.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:47         ` Ingo Molnar
@ 2000-10-09 18:52           ` Rik van Riel
  2000-10-09 19:27             ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 18:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Marco Colombo, MM mailing list, linux-kernel, Linus Torvalds

On Mon, 9 Oct 2000, Ingo Molnar wrote:

> > If you have a better algorithm, feel free to send patches.
> 
> yes. Please remove the above part.

OK, done.

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:42     ` Ingo Molnar
@ 2000-10-09 19:05       ` Andrea Arcangeli
  2000-10-09 19:07         ` Rik van Riel
  2000-10-09 19:30       ` [PATCH] VM fix for 2.4.0-test9 & OOM handler David Ford
  1 sibling, 1 reply; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 19:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Byron Stanoszek, Rik van Riel, Linus Torvalds, linux-mm, linux-kernel

On Mon, Oct 09, 2000 at 08:42:26PM +0200, Ingo Molnar wrote:
> ignoring the kill would just preserve those bugs artificially.

If the oom killer kills a thing like init by mistake or init has a memleak
you'll notice both problems regardless of having a magic for init in a _very_
slow path so I don't buy your point.

For correctness init must not be killed ever, period.

So you have two choices:

o	math proof that the current algorithm without the magic can't end up
	killing init (and I should be able to prove the other way around
	instead)

o	have a magic check for init

So the magic is _strictly_ necessary at the moment.

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:05       ` Andrea Arcangeli
@ 2000-10-09 19:07         ` Rik van Riel
  2000-10-09 19:42           ` Andrea Arcangeli
                             ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 19:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, Byron Stanoszek, Linus Torvalds, linux-mm, linux-kernel

On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 09, 2000 at 08:42:26PM +0200, Ingo Molnar wrote:
> > ignoring the kill would just preserve those bugs artificially.
> 
> If the oom killer kills a thing like init by mistake

That only happens in the "random" OOM killer 2.2 has ...

> So you have two choices:
> 
> o	math proof that the current algorithm without the magic can't end up
> 	killing init (and I should be able to prove the other way around
> 	instead)
> 
> o	have a magic check for init
> 
> So the magic is _strictly_ necessary at the moment.

No. It's only needed if your OOM algorithm is so crappy that
it might end up killing init by mistake.

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:52           ` Rik van Riel
@ 2000-10-09 19:27             ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 19:27 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marco Colombo, MM mailing list, linux-kernel, Linus Torvalds

On Mon, 9 Oct 2000, Rik van Riel wrote:

> > yes. Please remove the above part.
> 
> OK, done.

thanks - i think all the other heuristics are 'fair': processes with more
CPU and run time used are likely to be more important, superuser processes
and direct-hw-access processes ditto.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:42     ` Ingo Molnar
  2000-10-09 19:05       ` Andrea Arcangeli
@ 2000-10-09 19:30       ` David Ford
  2000-10-09 19:58         ` Andrea Arcangeli
                           ` (2 more replies)
  1 sibling, 3 replies; 112+ messages in thread
From: David Ford @ 2000-10-09 19:30 UTC (permalink / raw)
  To: mingo
  Cc: Andrea Arcangeli, Byron Stanoszek, Rik van Riel, Linus Torvalds,
	linux-mm, linux-kernel

Then spam the console loudly with printk, but don't destroy the whole machine.
Init should only get killed if it REALLY is taking a lot of memory.  On a 4 or 8meg
machine tho, the probability of init getting killed is simply too high for
comfort.  I have never ever seen init start consuming memory like this so I'd
rather get spammed on the console a LOT than have my entire machine instantly go
dead.

We get enough reports about innocuous messages on the console, I'm sure these would
get reported to LKML as well...and in short order as is usual.

-d

Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
>
> > On Fri, Oct 06, 2000 at 04:19:55PM -0400, Byron Stanoszek wrote:
> > > In the OOM killer, shouldn't there be a check for PID 1 just to enforce that
> >
> > Init can't be killed in 2.2.x latest, the same bugfix should be forward
> > ported to 2.4.x.
>
> I believe we should not special-case init in this case. If the OOM would
> kill init then we *want* to know about it ASAP, because it's either a bug
> in the OOM code or a memory leak in init. Both things are very bad, and
> ignoring the kill would just preserve those bugs artificially.

--
      "There is a natural aristocracy among men. The grounds of this are
      virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:14       ` Rik van Riel
  2000-10-09 18:47         ` Ingo Molnar
@ 2000-10-09 19:38         ` Marco Colombo
  1 sibling, 0 replies; 112+ messages in thread
From: Marco Colombo @ 2000-10-09 19:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Mon, 9 Oct 2000, Rik van Riel wrote:

> On Mon, 9 Oct 2000, Marco Colombo wrote:
> > On Fri, 6 Oct 2000, Rik van Riel wrote:
> > 
> > [...]
> > > They are niced because the user thinks them a bit less
> > > important. 
> > 
> > Please don't, this assumption is quite wrong. I use nice just to
> > be 'nice' to other users. I can run my *important* CPU hog
> > simulation nice +10 in order to let other people get more CPU
> > when they need it.
> 
> In that case the time the process has been running and the
> CPU time used will save the process if it's been running for
> a long time.
> 
> Please read the /entire/ algorithm before making rash
> conclusions like this.

<flame level=+1>
What "conclusions"? YOU stated that "They are niced because the user
thinks them a bit less important", and I was only commenting on that.
I've never said your /entire/ algorithm is failing, the point was on
the purpose of the 'nice' level. Users don't use nice to mark less 
important processes. It's completely orthogonal. And if you really
want to correlate nice level and importance, I'd rather say that
niced processes tend to be more important than "normal" processes,
on average.
</flame>

> If nice is used for important, long-running tasks, the fact
> that they are long-running will save them (and be honest,
> would you really care if a simulation would be killed after
> 5 minutes?  it's only inconvenient if it gets killed after
> a few hours...)

Ok. Now tell me what's the purpose to run your 'ls' at nice +5 at all.
You nice processes that are going to take a while, otherwise nicing
them has hardly a measurable effect, if any.

> > But if you put the logic "niced == not important" somewhere into
> > the kernel, nobody will use nice anymore. I'd rather give a
> > bonus to niced processes.
> 
> This doesn't make ANY sense at all. The objective is to destroy
> the least amount of work, which means giving a bonus to processes
> which have used a lot of CPU time already ... regardless of nice
> value.

'regardless of nice value' is the part I like.

> > all. But my point here is that you do, and you take it as a hint for
> > process importance as perceived by the user that runs it, and I believe
> > it's just wrong guessing).
> 
> If you have a better algorithm, feel free to send patches.

No need. Either reverse the weight you give to nice level or just
ignore it, which is probably easier. I agree that giving a bonus to
niced processes is nearly useless.
As I've written in my previous message, I don't think it's a big
issue. OOM should not happen, full stop. OOM killer is a last resort
measure, so it needs not to be *too* careful.

> 
> regards,
> 
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>        -- Miguel de Icaza, UKUUG 2000
> 
> http://www.conectiva.com/		http://www.surriel.com/
> 
> 

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:07         ` Rik van Riel
@ 2000-10-09 19:42           ` Andrea Arcangeli
  2000-10-09 20:06             ` Ingo Molnar
  2000-10-09 20:06             ` Rik van Riel
  2000-10-09 20:13           ` Ingo Molnar
  2000-10-09 23:35           ` Ingo Oeser
  2 siblings, 2 replies; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 19:42 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ingo Molnar, Byron Stanoszek, Linus Torvalds, linux-mm, linux-kernel

On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> No. It's only needed if your OOM algorithm is so crappy that
> it might end up killing init by mistake.

The algorithm you posted on the list in this thread will kill init if, on a 4 Mbyte
machine without swap, init is 3 Mbytes large and you execute a task that grows
over 1M.

So I repeat again: for correctness you should either fix the oom algorithm and
demonstrate with math that it can't kill init or fix the bug using a magic
check.

Since it's not going to be possible to prove that an oom algorithm won't kill
init (also considering init isn't always /sbin/init) the magic check is going
to be the only bugfix possible.

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread
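
Andrea's scenario and the "magic check" can be illustrated with a short sketch. It assumes a selection loop that simply picks the highest score, and it uses the page count as the score, which is a simplification of the posted algorithm. The struct and select_victim() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process record: pid plus an already computed score. */
struct proc {
    int pid;
    unsigned long badness;
};

/* Return the process with the highest score, or NULL if none is eligible.
 * With protect_init set, pid 1 is never considered (the "magic check"). */
static const struct proc *select_victim(const struct proc *procs, size_t n,
                                        bool protect_init)
{
    const struct proc *victim = NULL;
    size_t i;

    assert(procs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (protect_init && procs[i].pid == 1)
            continue;
        if (victim == NULL || procs[i].badness > victim->badness)
            victim = &procs[i];
    }
    return victim;
}
```

With 4 KB pages, a 3 Mbyte init is 768 pages and a task that has just grown past 1 Mbyte is about 257 pages. Scored by size alone, init wins the selection unless the check is present; with the check, the growing task is chosen instead.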

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:30       ` [PATCH] VM fix for 2.4.0-test9 & OOM handler David Ford
@ 2000-10-09 19:58         ` Andrea Arcangeli
  2000-10-09 20:14           ` David Ford
  2000-10-09 20:05         ` Rik van Riel
  2000-10-09 21:07         ` Alan Cox
  2 siblings, 1 reply; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 19:58 UTC (permalink / raw)
  To: david+validemail
  Cc: mingo, Byron Stanoszek, Rik van Riel, Linus Torvalds, linux-mm,
	linux-kernel

On Mon, Oct 09, 2000 at 12:30:20PM -0700, David Ford wrote:
> Init should only get killed if it REALLY is taking a lot of memory.  On a 4 or 8meg

Init should never get killed. Killing init can be compared to destroying the TCP
stack. Some apps can keep running correctly for some minutes until they call socket()
and then they will hang. Same with init: some tasks may still run correctly for
some time but the machine will die eventually. We simply must not pass the
point of no return, or we're buggy, and after the bug has triggered we have to force
the user to reboot the machine as the only way to recover.

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:30       ` [PATCH] VM fix for 2.4.0-test9 & OOM handler David Ford
  2000-10-09 19:58         ` Andrea Arcangeli
@ 2000-10-09 20:05         ` Rik van Riel
  2000-10-09 21:07         ` Alan Cox
  2 siblings, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:05 UTC (permalink / raw)
  To: david+validemail
  Cc: mingo, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	linux-mm, linux-kernel

On Mon, 9 Oct 2000, David Ford wrote:

> Then spam the console loudly with printk, but don't destroy the
> whole machine. Init should only get killed if it REALLY is
> taking a lot of memory.  On a 4 or 8meg machine tho, the
> probability of init getting killed is simply too high for
> comfort.  I have never ever seen init start consuming memory
> like this so I'd rather get spammed on the console a LOT than
> have my entire machine instantly go dead.

Please TEST THIS before spreading Wild Rumours(tm)

On 2.2 a /random/ process gets killed when the system gets
tight, so you'll see init killed on (pre-kludge) 2.2 kernels,
but I don't believe you'll see this with 2.4...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:42           ` Andrea Arcangeli
@ 2000-10-09 20:06             ` Ingo Molnar
  2000-10-09 20:06               ` Andi Kleen
                                 ` (3 more replies)
  2000-10-09 20:06             ` Rik van Riel
  1 sibling, 4 replies; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 20:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Byron Stanoszek, Linus Torvalds, MM mailing list,
	linux-kernel

On Mon, 9 Oct 2000, Andrea Arcangeli wrote:

> > No. It's only needed if your OOM algorithm is so crappy that
> > it might end up killing init by mistake.
> 
> The algorithm you posted on the list in this thread will kill init if
> on 4Mbyte machine without swap init is large 3 Mbytes and you execute
> a task that grows over 1M.

i think the OOM algorithm should not kill processes that have
child-processes, it should first kill child-less 'leaves'. Killing a
process that has child processes likely results in unexpected behavior of
those child-processes. (and equals to effective killing of those
child-processes as well.)

But this mechanism can be abused (a malicious memory hog can create a
child-process just to avoid the OOM-killer) - but there are ways to avoid
this, eg. to add all the 'MM badness' points to children? Ie. a child
which has MM-abuser parent(s) will definitely be killed first.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread
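
A sketch of the two ideas in Ingo's message, under assumptions: only child-less processes are candidates, and each candidate is charged its own points plus those of all its ancestors, so a memory hog cannot hide behind a small decoy child. The flat array with parent indices and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process-tree node stored in a flat array. */
struct node {
    int pid;
    int parent;               /* index of the parent in the array, -1 if none */
    unsigned long own_points; /* badness of this process alone */
};

/* True if any entry names idx as its parent. */
static bool has_children(const struct node *t, size_t n, size_t idx)
{
    size_t j;

    for (j = 0; j < n; j++)
        if (t[j].parent >= 0 && (size_t)t[j].parent == idx)
            return true;
    return false;
}

/* Own points plus the points of every ancestor.
 * The hop limit guards against a malformed (cyclic) parent chain. */
static unsigned long inherited_points(const struct node *t, size_t n,
                                      size_t idx)
{
    unsigned long sum = 0;
    size_t hops;
    int cur = (int)idx;

    for (hops = 0; hops < n && cur >= 0 && (size_t)cur < n; hops++) {
        sum += t[cur].own_points;
        cur = t[cur].parent;
    }
    return sum;
}

/* Index of the leaf with the highest inherited score, or -1 if none. */
static int pick_leaf(const struct node *t, size_t n)
{
    int best = -1;
    unsigned long best_points = 0;
    size_t i;

    assert(t != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long p;

        if (has_children(t, n, i))
            continue;
        p = inherited_points(t, n, i);
        if (best < 0 || p > best_points) {
            best = (int)i;
            best_points = p;
        }
    }
    return best;
}
```

In a tree where a 900-point hog has spawned a 5-point decoy, the decoy inherits the hog's points and is picked ahead of an unrelated 50-point shell. Once the decoy is gone, the hog is itself a leaf and becomes the next candidate.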

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06             ` Ingo Molnar
@ 2000-10-09 20:06               ` Andi Kleen
  2000-10-09 20:19                 ` Ingo Molnar
  2000-10-09 20:52                 ` Linus Torvalds
  2000-10-09 20:11               ` [PATCH] VM fix for 2.4.0-test9 & " Andrea Arcangeli
                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 112+ messages in thread
From: Andi Kleen @ 2000-10-09 20:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrea Arcangeli, Rik van Riel, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, Oct 09, 2000 at 10:06:02PM +0200, Ingo Molnar wrote:
> 
> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> 
> > > No. It's only needed if your OOM algorithm is so crappy that
> > > it might end up killing init by mistake.
> > 
> > The algorithm you posted on the list in this thread will kill init if
> > on 4Mbyte machine without swap init is large 3 Mbytes and you execute
> > a task that grows over 1M.
> 
> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)

netscape usually has child processes: the dns helper. 

-Andi

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:42           ` Andrea Arcangeli
  2000-10-09 20:06             ` Ingo Molnar
@ 2000-10-09 20:06             ` Rik van Riel
  2000-10-09 20:18               ` Andrea Arcangeli
  2000-10-10  3:29               ` Philipp Rumpf
  1 sibling, 2 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, Byron Stanoszek, Linus Torvalds, linux-mm, linux-kernel

On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > No. It's only needed if your OOM algorithm is so crappy that
> > it might end up killing init by mistake.
> 
> The algorithm you posted on the list in this thread will kill
> init if on 4Mbyte machine without swap init is large 3 Mbytes
> and you execute a task that grows over 1M.

This sounds suspiciously like the description of a DEAD system ;)

(in which case you simply don't care if init is being killed or not)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:13           ` Ingo Molnar
@ 2000-10-09 20:08             ` Rik van Riel
  2000-10-09 20:22               ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 622 bytes --]

On Mon, 9 Oct 2000, Ingo Molnar wrote:

> what do you think about the attached patch? It increases the effective
> priority of a (kernel-) killed process, and initiates a reschedule, so
> that it gets selected ASAP. (except if there are RT processes around.)
> This should make OOM decisions 'visible' much more quickly.

Note that the OOM killer already has this code built-in,
but it may be a good idea to have SIGKILL delivery speeded
up for every SIGKILL ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

[-- Attachment #2: Type: TEXT/PLAIN, Size: 538 bytes --]

--- linux/kernel/signal.c.orig	Mon Oct  9 12:56:45 2000
+++ linux/kernel/signal.c	Mon Oct  9 13:00:20 2000
@@ -569,6 +569,14 @@
 		spin_unlock_irqrestore(&t->sigmask_lock, flags);
 		return -ESRCH;
 	}
+	/*
+	 * Special case, kernel is forcing SIGKILL.
+	 * Decrease signal delivery latency.
+	 */
+	if (sig == SIGKILL && (t->policy == SCHED_OTHER)) {
+		t->counter = MAX_COUNTER;
+		current->need_resched = 1;
+	}
 
 	if (t->sig->action[sig-1].sa.sa_handler == SIG_IGN)
 		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06             ` Ingo Molnar
  2000-10-09 20:06               ` Andi Kleen
@ 2000-10-09 20:11               ` Andrea Arcangeli
  2000-10-09 20:15                 ` Rik van Riel
  2000-10-09 20:40               ` Linus Torvalds
  2000-10-09 21:10               ` Alan Cox
  3 siblings, 1 reply; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 20:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rik van Riel, Byron Stanoszek, Linus Torvalds, MM mailing list,
	linux-kernel

On Mon, Oct 09, 2000 at 10:06:02PM +0200, Ingo Molnar wrote:
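> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)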

You already know what I think about those heuristics. I think all we need is a
per-task pagefault/allocation rate, avoiding any other complication that tries
to do the right thing but will end up doing the wrong thing eventually. But
obviously nobody agreed with me, and before I implement that myself it will
still take some time.

Even the total_vm information will be wrong, for example, if the task is an
iconized netscape that is completely swapped out and hasn't run for two days.
Killing it will only delay the killing of the real offender, the one that is
generating a flood of page faults at high frequency.

Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:19                 ` Ingo Molnar
@ 2000-10-09 20:12                   ` Rik van Riel
  2000-10-09 20:24                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Andi Kleen wrote:
> 
> > netscape usually has child processes: the dns helper.
> 
> so dns helper is killed first, then netscape. (my idea might not
> make sense though.)

It makes some sense, but I don't think OOM is something that
occurs often enough to care about it /that/ much...

My algorithm is already complex enough for my tastes (but seems
to work quite well in the sense that it usually picks the "right"
process in one shot and kills the process the user expects to be
killed).

regards,


Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:07         ` Rik van Riel
  2000-10-09 19:42           ` Andrea Arcangeli
@ 2000-10-09 20:13           ` Ingo Molnar
  2000-10-09 20:08             ` Rik van Riel
  2000-10-09 23:35           ` Ingo Oeser
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 20:13 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 287 bytes --]


Rik,

what do you think about the attached patch? It increases the effective
priority of a (kernel-) killed process, and initiates a reschedule, so
that it gets selected ASAP. (except if there are RT processes around.)
This should make OOM decisions 'visible' much more quickly.

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 538 bytes --]

--- linux/kernel/signal.c.orig	Mon Oct  9 12:56:45 2000
+++ linux/kernel/signal.c	Mon Oct  9 13:00:20 2000
@@ -569,6 +569,14 @@
 		spin_unlock_irqrestore(&t->sigmask_lock, flags);
 		return -ESRCH;
 	}
+	/*
+	 * Special case, kernel is forcing SIGKILL.
+	 * Decrease signal delivery latency.
+	 */
+	if (sig == SIGKILL && (t->policy == SCHED_OTHER)) {
+		t->counter = MAX_COUNTER;
+		current->need_resched = 1;
+	}
 
 	if (t->sig->action[sig-1].sa.sa_handler == SIG_IGN)
 		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;
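
As a rough illustration of what the hunk above does (not code from this
thread): a minimal userspace sketch, assuming a toy task structure. The
names toy_task, POLICY_* and SKETCH_MAX_COUNTER are hypothetical stand-ins
for the scheduler fields the patch touches, and the constant's value is
made up. The real change runs inside the kernel's signal delivery path.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-ins for the scheduler fields used by the patch. */
enum { POLICY_OTHER = 0, POLICY_FIFO = 1, POLICY_RR = 2 };
#define SKETCH_MAX_COUNTER 20L   /* assumed value, for illustration only */

struct toy_task {
    int policy;        /* scheduling class */
    long counter;      /* remaining timeslice, in ticks */
    int need_resched;  /* set when the scheduler should run soon */
};

/*
 * Same decision as the patch: when SIGKILL is sent to a non-realtime
 * task, refill the target's timeslice and flag the sender for a
 * reschedule. Returns 1 if the boost was applied, 0 otherwise.
 */
int boost_on_sigkill(int sig, struct toy_task *target,
                     struct toy_task *sender)
{
    assert(target && sender);
    if (sig == SIGKILL && target->policy == POLICY_OTHER) {
        target->counter = SKETCH_MAX_COUNTER;
        sender->need_resched = 1;
        return 1;
    }
    return 0;
}
```

In this sketch a realtime target (POLICY_FIFO or POLICY_RR) is left alone,
which matches the "except if there are RT processes around" remark.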

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:58         ` Andrea Arcangeli
@ 2000-10-09 20:14           ` David Ford
  0 siblings, 0 replies; 112+ messages in thread
From: David Ford @ 2000-10-09 20:14 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: mingo, Byron Stanoszek, Rik van Riel, Linus Torvalds, linux-mm,
	linux-kernel

Andrea Arcangeli wrote:

> On Mon, Oct 09, 2000 at 12:30:20PM -0700, David Ford wrote:
> > Init should only get killed if it REALLY is taking a lot of memory.  On a 4 or 8meg
>
> Init should never get killed. Killing init can be compared to destroying the
> TCP stack. Some apps can keep running fine for a few minutes until they call
> socket(), and then they will hang. Same with init: some tasks may still run
> fine for some time, but the machine will die eventually. We simply must not
> pass the point of no return, or we're buggy, and after the bug has triggered
> we have to force the user to reboot the machine as the only way to recover.

After 1/2 a second of deep reflection, I concur.  Pretty much all interactive processes
will die immediately.  That just doesn't make for happy penguins.

-d

--
      "There is a natural aristocracy among men. The grounds of this are
      virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:11               ` [PATCH] VM fix for 2.4.0-test9 & " Andrea Arcangeli
@ 2000-10-09 20:15                 ` Rik van Riel
  0 siblings, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, Byron Stanoszek, Linus Torvalds, MM mailing list,
	linux-kernel

On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> On Mon, Oct 09, 2000 at 10:06:02PM +0200, Ingo Molnar wrote:
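> > i think the OOM algorithm should not kill processes that have
> > child-processes, it should first kill child-less 'leaves'. Killing a
> > process that has child processes likely results in unexpected behavior of
> > those child-processes. (and equals to effective killing of those
> > child-processes as well.)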
> 
> You already know what I think about those heuristics. I think all
> we need is a per-task pagefault/allocation rate, avoiding any
> other complication that tries to do the right thing but will end
> up doing the wrong thing eventually. But obviously nobody agreed
> with me, and before I implement that myself it will still take
> some time.

Furthermore, keeping track of these allocations will mean that you
/ALWAYS/ rack up the overhead of keeping track of this, even though
most machines probably won't run out of memory ever, or no more
than twice a year or so ;)

> Even the total_vm information will be wrong, for example, if the
> task is an iconized netscape that is completely swapped out and
> hasn't run for two days. Killing it will only delay the killing
> of the real offender, the one that is generating a flood of page
> faults at high frequency.

However true this may be, I wonder if we really care /that/ much.

OOM is a very rare situation and as long as you don't do something
that's really a bad surprise to the user, everything should be ok.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:24                     ` Ingo Molnar
@ 2000-10-09 20:18                       ` Rik van Riel
  2000-10-10  3:23                         ` Philipp Rumpf
  2000-10-09 20:38                       ` James Sutherland
  2000-10-09 22:29                       ` FORT David
  2 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Rik van Riel wrote:
> 
> > > so dns helper is killed first, then netscape. (my idea might not
> > > make sense though.)
> > 
> > It makes some sense, but I don't think OOM is something that
> > occurs often enough to care about it /that/ much...
> 
> i'm trying to handle Andrea's case, the init=/bin/bash
> manual-bootup case, with 4MB RAM and no swap, where the admin
> tries to exec a 2MB process. I think it's a legitimate concern -
> i cannot know in advance whether a freshly started process would
> trigger an OOM or not.

In that case the time running and the cpu time used
factors should give the new process a heavy penalty
compared to init.

(but I'd be curious if somebody actually manages to
trick the OOM killer into killing init ... please
test a bit more to see if this really happens ;))
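
As a rough illustration of how such time factors could interact with
process size (not the code from the patch): a minimal sketch, assuming
the score starts from the address-space size and is divided down by CPU
time and run time. The square-root damping, the /8 and /1024 scaling,
and every name here (toy_proc, toy_badness, isqrt_ul) are assumptions
made for this example.

```c
#include <assert.h>

/* Integer square root by linear search; fine for the small values here. */
unsigned long isqrt_ul(unsigned long x)
{
    unsigned long r = 0;
    while (r + 1 <= x / (r + 1))   /* (r+1)^2 <= x, without overflow */
        r++;
    return r;
}

struct toy_proc {
    unsigned long total_vm_pages;  /* address-space size, in pages */
    unsigned long cpu_seconds;     /* CPU time consumed so far */
    unsigned long run_seconds;     /* wall-clock time since start */
};

/* Higher score means a more likely victim. Scaling factors are assumed. */
unsigned long toy_badness(const struct toy_proc *p)
{
    unsigned long points = p->total_vm_pages;
    unsigned long d;

    assert(p);
    d = isqrt_ul(p->cpu_seconds / 8);
    if (d)
        points /= d;               /* CPU time already invested */
    d = isqrt_ul(isqrt_ul(p->run_seconds / 1024));
    if (d)
        points /= d;               /* long-running tasks are favoured */
    return points;
}
```

With these made-up numbers, a 768-page init that has been up for a day
(86400 seconds) scores 256 and loses to a fresh 300-page task, while the
same init a few seconds after boot scores 768 and would be picked first.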

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06             ` Rik van Riel
@ 2000-10-09 20:18               ` Andrea Arcangeli
  2000-10-10  3:29               ` Philipp Rumpf
  1 sibling, 0 replies; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 20:18 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ingo Molnar, Byron Stanoszek, Linus Torvalds, linux-mm, linux-kernel

On Mon, Oct 09, 2000 at 05:06:48PM -0300, Rik van Riel wrote:
> On Mon, 9 Oct 2000, Andrea Arcangeli wrote:
> > On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > > No. It's only needed if your OOM algorithm is so crappy that
> > > it might end up killing init by mistake.
> > 
> > The algorithm you posted on the list in this thread will kill
> > init if, on a 4 Mbyte machine without swap, init is 3 Mbytes
> > large and you execute a task that grows over 1 Mbyte.
> 
> This sounds suspiciously like the description of a DEAD system ;)

The system will be DEAD only when your current algorithm kills init.

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06               ` Andi Kleen
@ 2000-10-09 20:19                 ` Ingo Molnar
  2000-10-09 20:12                   ` Rik van Riel
  2000-10-09 20:52                 ` Linus Torvalds
  1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 20:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrea Arcangeli, Rik van Riel, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Andi Kleen wrote:

> netscape usually has child processes: the dns helper.

so dns helper is killed first, then netscape. (my idea might not make
sense though.)

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:08             ` Rik van Riel
@ 2000-10-09 20:22               ` Ingo Molnar
  2000-10-09 20:28                 ` David Ford
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 20:22 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Rik van Riel wrote:

> Note that the OOM killer already has this code built-in, but it may be

oops, i didn't notice (really!). One comment: 5*HZ in your code is way too
much for counter, and it might break the scheduler in the future. (right
now those counter values are unused, RT priorities start at 1000, so it
cannot cause harm, but one never knows.) Please use MAX_COUNTER instead.

The SCHED_YIELD thing is a nice trick, it should be added to my signal.c
change as well, without the schedule().

> a good idea to have SIGKILL delivery speeded up for every SIGKILL ...

yep.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:12                   ` Rik van Riel
@ 2000-10-09 20:24                     ` Ingo Molnar
  2000-10-09 20:18                       ` Rik van Riel
                                         ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 20:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andi Kleen, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Rik van Riel wrote:

> > so dns helper is killed first, then netscape. (my idea might not
> > make sense though.)
> 
> It makes some sense, but I don't think OOM is something that
> occurs often enough to care about it /that/ much...

i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
think it's a legitimate concern - i cannot know in advance whether a
freshly started process would trigger an OOM or not.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:22               ` Ingo Molnar
@ 2000-10-09 20:28                 ` David Ford
  2000-10-09 20:34                   ` Rik van Riel
  0 siblings, 1 reply; 112+ messages in thread
From: David Ford @ 2000-10-09 20:28 UTC (permalink / raw)
  To: mingo
  Cc: Rik van Riel, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

Ingo Molnar wrote:

> > a good idea to have SIGKILL delivery speeded up for every SIGKILL ...
>
> yep.

How about SIGTERM a bit before SIGKILL then re-evaluate the OOM N usecs
later?

-d

--
      "There is a natural aristocracy among men. The grounds of this are
      virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:28                 ` David Ford
@ 2000-10-09 20:34                   ` Rik van Riel
  2000-10-09 20:45                     ` David Ford
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:34 UTC (permalink / raw)
  To: david+validemail
  Cc: mingo, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, David Ford wrote:
> Ingo Molnar wrote:
> 
> > > a good idea to have SIGKILL delivery speeded up for every SIGKILL ...
> >
> > yep.
> 
> How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> N usecs later?

And run the risk of having to kill /another/ process as well ?

I really don't know if that would be a wise thing to do
(but feel free to do some tests to see if your idea would
work ... I'd love to hear some test results with your idea).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:24                     ` Ingo Molnar
  2000-10-09 20:18                       ` Rik van Riel
@ 2000-10-09 20:38                       ` James Sutherland
  2000-10-09 20:40                         ` Rik van Riel
                                           ` (2 more replies)
  2000-10-09 22:29                       ` FORT David
  2 siblings, 3 replies; 112+ messages in thread
From: James Sutherland @ 2000-10-09 20:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rik van Riel, Andi Kleen, Andrea Arcangeli, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
> 
> > > so dns helper is killed first, then netscape. (my idea might not
> > > make sense though.)
> > 
> > It makes some sense, but I don't think OOM is something that
> > occurs often enough to care about it /that/ much...
> 
> i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> think it's a legitimate concern - i cannot know in advance whether a
> freshly started process would trigger an OOM or not.

Shouldn't the runtime factor handle this, making sure the new process is
killed? (Maybe not if you're almost OOM right from the word go, and run
this process straight off... Hrm.)


James.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:38                       ` James Sutherland
@ 2000-10-09 20:40                         ` Rik van Riel
  2000-10-10  9:59                           ` J.A. Sutherland
  2000-10-09 20:44                         ` Andrea Arcangeli
  2000-10-09 21:52                         ` Aaron Sethman
  2 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:40 UTC (permalink / raw)
  To: James Sutherland
  Cc: Ingo Molnar, Andi Kleen, Andrea Arcangeli, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, James Sutherland wrote:
> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> > 
> > > > so dns helper is killed first, then netscape. (my idea might not
> > > > make sense though.)
> > > 
> > > It makes some sense, but I don't think OOM is something that
> > > occurs often enough to care about it /that/ much...
> > 
> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> > with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> > think it's a legitimate concern - i cannot know in advance whether a
> > freshly started process would trigger an OOM or not.
> 
> Shouldn't the runtime factor handle this, making sure the new
> process is killed? (Maybe not if you're almost OOM right from
> the word go, and run this process straight off... Hrm.)

It should.

Also, the example is a tad unrealistic since init seems to be
around 70 kB in size on my systems ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06             ` Ingo Molnar
  2000-10-09 20:06               ` Andi Kleen
  2000-10-09 20:11               ` [PATCH] VM fix for 2.4.0-test9 & " Andrea Arcangeli
@ 2000-10-09 20:40               ` Linus Torvalds
  2000-10-09 20:47                 ` Rik van Riel
  2000-10-09 20:57                 ` Ingo Molnar
  2000-10-09 21:10               ` Alan Cox
  3 siblings, 2 replies; 112+ messages in thread
From: Linus Torvalds @ 2000-10-09 20:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel


On Mon, 9 Oct 2000, Ingo Molnar wrote:
> 
> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)

I disagree - if we start adding these kinds of heuristics to it, it will
just be a way for people to try to confuse the OOM code. Imagine some bad
guy that does 15 fork()'s and then tries to OOM...
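
As a rough illustration of the objection (not code from this thread): a
minimal sketch of a "largest process" victim picker with an optional
leaves-only rule. The structure and function names are hypothetical, and
real selection would weigh more than size.

```c
#include <assert.h>
#include <stddef.h>

struct toy_proc {
    int pid;
    unsigned long pages;   /* memory footprint, in pages */
    int nr_children;       /* number of live child processes */
};

/*
 * Return the index of the largest process, or -1 if none qualifies.
 * With leaves_only set, any process that has children is skipped.
 */
int pick_victim(const struct toy_proc *procs, size_t n, int leaves_only)
{
    int best = -1;

    assert(procs || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (leaves_only && procs[i].nr_children > 0)
            continue;      /* protected by the leaf heuristic */
        if (best < 0 || procs[i].pages > procs[(size_t)best].pages)
            best = (int)i;
    }
    return best;
}
```

In this sketch a 50000-page hog that has forked 15 small children is never
selected once leaves_only is set; an innocent childless process is chosen
instead.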

		Linus


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:38                       ` James Sutherland
  2000-10-09 20:40                         ` Rik van Riel
@ 2000-10-09 20:44                         ` Andrea Arcangeli
  2000-10-09 21:52                         ` Aaron Sethman
  2 siblings, 0 replies; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 20:44 UTC (permalink / raw)
  To: James Sutherland
  Cc: Ingo Molnar, Rik van Riel, Andi Kleen, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel

On Mon, Oct 09, 2000 at 09:38:08PM +0100, James Sutherland wrote:
> Shouldn't the runtime factor handle this, making sure the new process is
> killed?

The runtime factor in the algorithm will make the first difference
only after lots and lots of time (and the run_time can just as well be wrong
because of jiffies wrap around). But even if it would make a difference
after 1 second, there would be a 1 second window where init can be killed.
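
As a rough illustration of the wraparound remark (not code from this
thread): a minimal sketch, assuming a free-running 32-bit tick counter.
The function name is hypothetical. Unsigned subtraction gives the right
elapsed time across a single wrap, but it cannot tell a task that started
5 ticks ago from one that started a full 2^32 ticks plus 5 ago (about 497
days at 100 ticks per second).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed ticks between two readings of a free-running 32-bit counter.
 * The subtraction is performed modulo 2^32, so one wrap is harmless.
 */
uint32_t ticks_elapsed(uint32_t now, uint32_t start)
{
    return (uint32_t)(now - start);
}
```

For example, ticks_elapsed(0x00000100, 0xFFFFFF00) is 0x200 even though the
counter wrapped in between.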

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:34                   ` Rik van Riel
@ 2000-10-09 20:45                     ` David Ford
  2000-10-10  4:22                       ` Andreas Dilger
  0 siblings, 1 reply; 112+ messages in thread
From: David Ford @ 2000-10-09 20:45 UTC (permalink / raw)
  To: Rik van Riel
  Cc: mingo, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

Rik van Riel wrote:

> > How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> > N usecs later?
>
> And run the risk of having to kill /another/ process as well ?
>
> I really don't know if that would be a wise thing to do
> (but feel free to do some tests to see if your idea would
> work ... I'd love to hear some test results with your idea).

I was thinking (dangerous) about an urgent vs. critical OOM.  Urgent could
trigger a SIGTERM, which would give advance notice to the offending process.
I don't think we have a signal method of notifying processes when resources
are critically low; feel free to correct me.

Is there a signal that -might- be used for this?
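
As a rough illustration of the urgent vs. critical split (not code from
this thread): a minimal sketch, assuming two free-page thresholds. The
function name and both thresholds are hypothetical; nothing in the thread
defines where such marks would sit.

```c
#include <assert.h>
#include <signal.h>

/*
 * Decide which signal, if any, a two-level OOM policy would send.
 * Returns 0 when memory is fine, SIGTERM at the "urgent" level and
 * SIGKILL at the "critical" level. Thresholds are in free pages.
 */
int oom_signal_for(unsigned long free_pages,
                   unsigned long urgent_mark,
                   unsigned long critical_mark)
{
    assert(critical_mark <= urgent_mark);
    if (free_pages <= critical_mark)
        return SIGKILL;    /* nothing left: kill now */
    if (free_pages <= urgent_mark)
        return SIGTERM;    /* advance notice, task may clean up */
    return 0;
}
```

The sketch says nothing about how long to wait between the two levels,
which is where the risk of having to kill a second process comes in.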

-d

--
      "There is a natural aristocracy among men. The grounds of this are
      virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:40               ` Linus Torvalds
@ 2000-10-09 20:47                 ` Rik van Riel
  2000-10-09 20:57                 ` Ingo Molnar
  1 sibling, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 20:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andrea Arcangeli, Byron Stanoszek, MM mailing list,
	linux-kernel

On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > 
> > i think the OOM algorithm should not kill processes that have
> > child-processes, it should first kill child-less 'leaves'. Killing a
> > process that has child processes likely results in unexpected behavior of
> > those child-processes. (and equals to effective killing of those
> > child-processes as well.)
> 
> I disagree - if we start adding these kinds of heuristics to it,
> it will just be a way for people to try to confuse the OOM code.
> Imagine some bad guy that does 15 fork()'s and then tries to
> OOM...

Also, the only way to prevent bad things like this is userbeans,
the per-user resource quotas; until we have that there will ALWAYS
be ways to fool the OOM killer. It is just a stop-gap measure to
recover from a very bad situation...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06               ` Andi Kleen
  2000-10-09 20:19                 ` Ingo Molnar
@ 2000-10-09 20:52                 ` Linus Torvalds
  2000-10-09 20:58                   ` Andi Kleen
  2000-10-09 21:05                   ` Rik van Riel
  1 sibling, 2 replies; 112+ messages in thread
From: Linus Torvalds @ 2000-10-09 20:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Andrea Arcangeli, Rik van Riel, Byron Stanoszek,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Andi Kleen wrote:
> 
> netscape usually has child processes: the dns helper. 

Yeah.

One thing we _can_ (and probably should do) is to do a per-user memory
pressure thing - we have easy access to the "struct user_struct" (every
process has a direct pointer to it), and it should not be too bad to
maintain a per-user "VM pressure" counter.

Then, instead of trying to use heuristics like "does this process have
children" etc, you'd have things like "is this user a nasty user", which
is a much more valid thing to do and can be used to find people who fork
tons of processes that are mid-sized but use a lot of memory due to just
being many..
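
As a rough illustration of per-user accounting (not code from this
thread): a minimal sketch that sums memory by uid and reports the
heaviest user. The toy_task structure, the MAX_USERS limit and the
function name are assumptions made for this example; a real
implementation would presumably keep the counter in the per-user
structure rather than recompute it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USERS 8   /* arbitrary limit for this sketch */

struct toy_task {
    unsigned int uid;      /* owner of the process */
    unsigned long pages;   /* memory footprint, in pages */
};

/*
 * Sum pages per uid and return the uid with the largest total.
 * The winning total is stored in *total_out when it is non-NULL.
 */
unsigned int heaviest_user(const struct toy_task *tasks, size_t n,
                           unsigned long *total_out)
{
    unsigned int uids[MAX_USERS];
    unsigned long totals[MAX_USERS];
    size_t nr_users = 0, best = 0;

    assert(tasks && n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t u = 0;

        while (u < nr_users && uids[u] != tasks[i].uid)
            u++;
        if (u == nr_users) {           /* first task seen for this uid */
            assert(nr_users < MAX_USERS);
            uids[nr_users] = tasks[i].uid;
            totals[nr_users] = 0;
            nr_users++;
        }
        totals[u] += tasks[i].pages;
        if (totals[u] > totals[best])  /* totals only grow, so this tracks the max */
            best = u;
    }
    if (total_out)
        *total_out = totals[best];
    return uids[best];
}
```

With one user running a single 4000-page process and another running 20
processes of 300 pages each, the second user comes out on top with 6000
pages, although none of that user's processes is the largest.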

			Linus


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:40               ` Linus Torvalds
  2000-10-09 20:47                 ` Rik van Riel
@ 2000-10-09 20:57                 ` Ingo Molnar
  2000-10-09 21:10                   ` Peter Waltenberg
  1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 20:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel

On Mon, 9 Oct 2000, Linus Torvalds wrote:

> I disagree - if we start adding these kinds of heuristics to it, it
> will just be a way for people to try to confuse the OOM code. Imagine
> some bad guy that does 15 fork()'s and then tries to OOM...

yep.

	Ingo


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:52                 ` Linus Torvalds
@ 2000-10-09 20:58                   ` Andi Kleen
  2000-10-09 21:21                     ` Jim Gettys
  2000-10-09 21:05                   ` Rik van Riel
  1 sibling, 1 reply; 112+ messages in thread
From: Andi Kleen @ 2000-10-09 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ingo Molnar, Andrea Arcangeli, Rik van Riel,
	Byron Stanoszek, MM mailing list, linux-kernel

On Mon, Oct 09, 2000 at 01:52:21PM -0700, Linus Torvalds wrote:
> One thing we _can_ (and probably should do) is to do a per-user memory
> pressure thing - we have easy access to the "struct user_struct" (every
> process has a direct pointer to it), and it should not be too bad to
> maintain a per-user "VM pressure" counter.
> 
> Then, instead of trying to use heuristics like "does this process have
> children" etc, you'd have things like "is this user a nasty user", which
> is a much more valid thing to do and can be used to find people who fork
> tons of processes that are mid-sized but use a lot of memory due to just
> being many..

Would not help much when "they" eat your memory by loading big bitmaps
into the X server, which runs as root (it seems there are many programs
which are very good at this particular DoS ;)

Also I think most OOM situations are accidents anyway, not malicious users.
When you're the only user of the machine, sophisticated per-user accounting
won't be very useful.

-Andi

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:52                 ` Linus Torvalds
  2000-10-09 20:58                   ` Andi Kleen
@ 2000-10-09 21:05                   ` Rik van Riel
  2000-10-09 22:08                     ` Gerrit.Huizenga
  1 sibling, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 21:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ingo Molnar, Andrea Arcangeli, Byron Stanoszek,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Andi Kleen wrote:
> > 
> > netscape usually has child processes: the dns helper. 
> 
> Yeah.
> 
> One thing we _can_ (and probably should do) is to do a per-user
> memory pressure thing - we have easy access to the "struct
> user_struct" (every process has a direct pointer to it), and it
> should not be too bad to maintain a per-user "VM pressure"
> counter.
> 
> Then, instead of trying to use heuristics like "does this
> process have children" etc, you'd have things like "is this user
> a nasty user", which is a much more valid thing to do and can be
> used to find people who fork tons of processes that are
> mid-sized but use a lot of memory due to just being many..

Sure we could do all of this, but does OOM really happen that
often that we want to make the algorithm this complex ?

The current algorithm seems to work quite well and is already
at the limit of how complex I'd like to see it. Having a less
complex OOM killer turned out to not work very well, but having
a more complex one is - IMHO - probably overkill ...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:30       ` [PATCH] VM fix for 2.4.0-test9 & OOM handler David Ford
  2000-10-09 19:58         ` Andrea Arcangeli
  2000-10-09 20:05         ` Rik van Riel
@ 2000-10-09 21:07         ` Alan Cox
  2000-10-10  3:38           ` Philipp Rumpf
  2 siblings, 1 reply; 112+ messages in thread
From: Alan Cox @ 2000-10-09 21:07 UTC (permalink / raw)
  To: david+validemail
  Cc: mingo, Andrea Arcangeli, Byron Stanoszek, Rik van Riel,
	Linus Torvalds, linux-mm, linux-kernel

> Then spam the console loudly with printk, but don't destroy the whole machine.
> Init should only get killed if it REALLY is taking a lot of memory.  On a 4 or 8meg

If init dies the kernel hangs solid anyway

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:57                 ` Ingo Molnar
@ 2000-10-09 21:10                   ` Peter Waltenberg
  2000-10-09 22:25                     ` Andrea Arcangeli
  0 siblings, 1 reply; 112+ messages in thread
From: Peter Waltenberg @ 2000-10-09 21:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: MM mailing list, Byron Stanoszek, Rik van Riel, Andrea Arcangeli,
	Linus Torvalds

On 09-Oct-2000 Ingo Molnar wrote:
> 
> On Mon, 9 Oct 2000, Linus Torvalds wrote:
> 
>> I disagree - if we start adding these kinds of heuristics to it, it
>> will just be a way for people to try to confuse the OOM code. Imagine
>> some bad guy that does 15 fork()'s and then tries to OOM...
> 
> yep.
> 
>       Ingo
> 

People seem to be forgetting (again), that Rik's code is *REALLY* an
OOM killer, i.e. it only kicks in when there is *NO* memory left. If something
isn't killed now, the machine hangs or crashes anyway.

I.e. it isn't a "well, in 5 minutes or so we'll be a little short of memory,
so let's ask some of these processes to go away" killer; it kicks in when there
probably isn't enough RAM left to safely do a printk, let alone pop up a window
asking the user which process he or she wants to sacrifice today.

It's probably reasonable to not kill init, but the rest just don't matter.
Without the OOM killer the machine would have locked up and you'd lose that 3
days of work from the background process. You'd have lost that site you
were looking at with Netscape, (etc).

At least with Rik's code you end up with a usable machine afterwards which is a
major improvement.

If you want "clever" do it in user space, the kernel code should be as simple
as possible.

Peter

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06             ` Ingo Molnar
                                 ` (2 preceding siblings ...)
  2000-10-09 20:40               ` Linus Torvalds
@ 2000-10-09 21:10               ` Alan Cox
  2000-10-09 21:25                 ` Ingo Molnar
  3 siblings, 1 reply; 112+ messages in thread
From: Alan Cox @ 2000-10-09 21:10 UTC (permalink / raw)
  To: mingo
  Cc: Andrea Arcangeli, Rik van Riel, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

> i think the OOM algorithm should not kill processes that have
> child-processes, it should first kill child-less 'leaves'. Killing a
> process that has child processes likely results in unexpected behavior of
> those child-processes. (and equals to effective killing of those
> child-processes as well.)

Let's kill a 6 week long typical background compute job because netscape exploded
(and yes netscape has a child process)

Rik's current OOM killer works very well, but it's a heuristic, so like all
heuristics you can always find a problem case

Alan

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:58                   ` Andi Kleen
@ 2000-10-09 21:21                     ` Jim Gettys
  2000-10-09 21:28                       ` Alan Cox
  0 siblings, 1 reply; 112+ messages in thread
From: Jim Gettys @ 2000-10-09 21:21 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Ingo Molnar, Andrea Arcangeli, Rik van Riel,
	Byron Stanoszek, MM mailing list, linux-kernel

> Sender: linux-kernel-owner@vger.kernel.org
> From: "Andi Kleen" <ak@suse.de>
> Date: 	Mon, 9 Oct 2000 22:58:22 +0200
> To: Linus Torvalds <torvalds@transmeta.com>
> Cc: Andi Kleen <ak@suse.de>, Ingo Molnar <mingo@elte.hu>,
>         Andrea Arcangeli <andrea@suse.de>,
>         Rik van Riel <riel@conectiva.com.br>,
>         Byron Stanoszek <gandalf@winds.org>,
>         MM mailing list <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -----
> On Mon, Oct 09, 2000 at 01:52:21PM -0700, Linus Torvalds wrote:
> > One thing we _can_ (and probably should do) is to do a per-user memory
> > pressure thing - we have easy access to the "struct user_struct" (every
> > process has a direct pointer to it), and it should not be too bad to
> > maintain a per-user "VM pressure" counter.
> >
> > Then, instead of trying to use heuristics like "does this process have
> > children" etc, you'd have things like "is this user a nasty user", which
> > is a much more valid thing to do and can be used to find people who fork
> > tons of processes that are mid-sized but use a lot of memory due to just
> > being many..
> 
> Would not help much when "they" eat your memory by loading big bitmaps
> into the X server which runs as root (it seems there are many programs
> which are very good at this particular DOS ;)
> 

This is generic to any server program, not unique to X.

Sounds like one needs in addition some mechanism for servers to "charge" clients for
consumption. X certainly knows on behalf of which connection resources
are created; the OS could then transfer this back to the appropriate client
(at least when on machine).

					- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:10               ` Alan Cox
@ 2000-10-09 21:25                 ` Ingo Molnar
  2000-10-09 21:26                   ` Rik van Riel
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 21:25 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrea Arcangeli, Rik van Riel, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Alan Cox wrote:

> Let's kill a 6 week long typical background compute job because
> netscape exploded (and yes netscape has a child process)

in the paragraph you didn't quote i pointed this out and suggested adding
every parent's badness value to its children as well - so we'd end up killing
netscape.
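
A minimal user-space sketch of that idea, combined with the "kill childless
leaves first" rule from earlier in the thread, might look like this. The
struct proc layout and the function names are hypothetical, and the process
table is assumed to be a forest (no parent cycles).

```c
#include <assert.h>
#include <stddef.h>

struct proc {
    int           parent;   /* index of the parent in the table, -1 for none */
    unsigned long badness;  /* this process's own score */
    int           alive;
};

/* Own badness plus the badness of every ancestor. */
unsigned long inherited_badness(const struct proc *tbl, int i)
{
    unsigned long sum = 0;

    for (; i >= 0; i = tbl[i].parent)
        sum += tbl[i].badness;
    return sum;
}

/* Non-zero if process i still has a live child. */
int has_live_child(const struct proc *tbl, size_t n, int i)
{
    for (size_t c = 0; c < n; c++)
        if (tbl[c].alive && tbl[c].parent == i)
            return 1;
    return 0;
}

/* Pick the live, childless process with the highest inherited badness. */
int select_leaf_victim(const struct proc *tbl, size_t n)
{
    int victim = -1;
    unsigned long worst = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long b;

        if (!tbl[i].alive || has_live_child(tbl, n, (int)i))
            continue;
        b = inherited_badness(tbl, (int)i);
        if (victim < 0 || b > worst) {
            victim = (int)i;
            worst = b;
        }
    }
    return victim;   /* -1 if nothing is eligible */
}
```

With a large netscape, its small dns helper and a mid-sized compute job in
the table, the helper inherits netscape's score and is picked first; once it
is gone, netscape itself becomes a leaf and is picked next, while the
compute job survives.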

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:25                 ` Ingo Molnar
@ 2000-10-09 21:26                   ` Rik van Riel
  2000-10-09 21:38                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 21:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Cox, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Alan Cox wrote:
> 
> > Let's kill a 6 week long typical background compute job because
> > netscape exploded (and yes netscape has a child process)
> 
> in the paragraph you didn't quote i pointed this out and
> suggested adding all parent's badness value to children as well
> - so we'd end up killing netscape.

Would this complexity /really/ be worth it for the twice-yearly
OOM situation?

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:21                     ` Jim Gettys
@ 2000-10-09 21:28                       ` Alan Cox
  2000-10-09 21:34                         ` Andi Kleen
                                           ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Alan Cox @ 2000-10-09 21:28 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Andi Kleen, Linus Torvalds, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel

> Sounds like one needs in addition some mechanism for servers to "charge" clients for
> consumption. X certainly knows on behalf of which connection resources
> are created; the OS could then transfer this back to the appropriate client
> (at least when on machine).

Definitely - and this is present in some non Unix OS's. We do pass credentials
across AF_UNIX sockets so the mechanism is notionally there to provide the 
credentials to X, just not to use them

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:38                     ` Ingo Molnar
@ 2000-10-09 21:34                       ` Rik van Riel
  2000-10-10  9:09                         ` john slee
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 21:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Cox, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Ingo Molnar wrote:
> On Mon, 9 Oct 2000, Rik van Riel wrote:
> 
> > Would this complexity /really/ be worth it for the twice-yearly OOM
> > situation?
> 
> the only reason i suggested this was the init=/bin/bash, 4MB
> RAM, no swap emergency-bootup case. We must not kill init in
> that case - if the current code doesn't then great and none of
> this is needed.

I guess this requires some testing. If anybody can reproduce
the bad effects without going /too/ much out of the way of a
realistic scenario, the code needs to be fixed.

If it turns out to be a non-issue in all scenarios, there's
no need to make the code any more complex.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:28                       ` Alan Cox
@ 2000-10-09 21:34                         ` Andi Kleen
  2000-10-09 21:38                         ` Linus Torvalds
  2000-10-09 21:40                         ` Jim Gettys
  2 siblings, 0 replies; 112+ messages in thread
From: Andi Kleen @ 2000-10-09 21:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jim Gettys, Andi Kleen, Linus Torvalds, Ingo Molnar,
	Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel

On Mon, Oct 09, 2000 at 10:28:38PM +0100, Alan Cox wrote:
> > Sounds like one needs in addition some mechanism for servers to "charge" clients for
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
> 
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the 
> credentials to X, just not to use them

X can get the pid using SO_PEERCRED for unix connections. 

If the oom killer maintains some kind of badness value in the task_struct,
it would be possible to add a charge() system call that manipulates it.

int charge(pid_t pid, int memorytobecharged) 
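
No such system call exists; as a rough user-space model of what the proposed
charge() might do, consider the sketch below. The table-based charge_task()
helper and struct task_charge are hypothetical, a long is used instead of
the int in the prototype above, and the pid is assumed to come from
something like SO_PEERCRED on the client connection.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-task record of memory that servers hold on its behalf. */
struct task_charge {
    pid_t pid;
    long  charged;
};

/*
 * Add 'amount' (negative to un-charge) to the task with the given pid.
 * Returns 0 on success, -1 if the pid is unknown or the balance would
 * drop below zero.
 */
int charge_task(struct task_charge *tbl, size_t n, pid_t pid, long amount)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].pid != pid)
            continue;
        if (tbl[i].charged + amount < 0)
            return -1;          /* would un-charge more than was charged */
        tbl[i].charged += amount;
        return 0;
    }
    return -1;                  /* no such task */
}
```

A server would charge a client when it allocates a resource for it and pass
the same amount negated when the resource is freed.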


-Andi

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:28                       ` Alan Cox
  2000-10-09 21:34                         ` Andi Kleen
@ 2000-10-09 21:38                         ` Linus Torvalds
  2000-10-09 21:39                           ` Rik van Riel
                                             ` (2 more replies)
  2000-10-09 21:40                         ` Jim Gettys
  2 siblings, 3 replies; 112+ messages in thread
From: Linus Torvalds @ 2000-10-09 21:38 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jim Gettys, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel


On Mon, 9 Oct 2000, Alan Cox wrote:
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
> 
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the 
> credentials to X, just not to use them

The problem is that there is no way to keep track of them afterwards.

So the process that gave X the bitmap dies. What now? Are we going to
depend on X un-counting the resources?

I'd prefer just X having a higher "mm nice level" or something.

		Linus

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:26                   ` Rik van Riel
@ 2000-10-09 21:38                     ` Ingo Molnar
  2000-10-09 21:34                       ` Rik van Riel
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2000-10-09 21:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, Andrea Arcangeli, Byron Stanoszek, Linus Torvalds,
	MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Rik van Riel wrote:

> Would this complexity /really/ be worth it for the twice-yearly OOM
> situation?

the only reason i suggested this was the init=/bin/bash, 4MB RAM, no swap
emergency-bootup case. We must not kill init in that case - if the current
code doesn't then great and none of this is needed.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:38                         ` Linus Torvalds
@ 2000-10-09 21:39                           ` Rik van Riel
  2000-10-09 21:44                             ` Linus Torvalds
  2000-10-09 21:44                           ` Jim Gettys
  2000-10-09 21:51                           ` Alan Cox
  2 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 21:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Jim Gettys, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Byron Stanoszek, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Linus Torvalds wrote:
> On Mon, 9 Oct 2000, Alan Cox wrote:
> > > consumption. X certainly knows on behalf of which connection resources
> > > are created; the OS could then transfer this back to the appropriate client
> > > (at least when on machine).
> > 
> > Definitely - and this is present in some non Unix OS's. We do pass credentials
> > across AF_UNIX sockets so the mechanism is notionally there to provide the 
> > credentials to X, just not to use them
> 
> The problem is that there is no way to keep track of them afterwards.
> 
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
> 
> I'd prefer just X having a higher "mm nice level" or something.

Which it has, because:

1) CAP_RAW_IO
2) p->euid == 0

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:28                       ` Alan Cox
  2000-10-09 21:34                         ` Andi Kleen
  2000-10-09 21:38                         ` Linus Torvalds
@ 2000-10-09 21:40                         ` Jim Gettys
  2 siblings, 0 replies; 112+ messages in thread
From: Jim Gettys @ 2000-10-09 21:40 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jim Gettys, Andi Kleen, Linus Torvalds, Ingo Molnar,
	Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel, sct, keithp, dshr

> > Sounds like one needs in addition some mechanism for servers to "charge" clients for
> > consumption. X certainly knows on behalf of which connection resources
> > are created; the OS could then transfer this back to the appropriate client
> > (at least when on machine).
> 
> Definitely - and this is present in some non Unix OS's. We do pass credentials
> across AF_UNIX sockets so the mechanism is notionally there to provide the
> credentials to X, just not to use them

Stephen Tweedie, Dave Rosenthal, Keith Packard and I had an extensive
discussion at Usenix on similar ideas around process quantum scheduling
(the X server would like to be able to forward quantum to clients).
This is closely related, and needed to finally fully control interactive
feel in the face of "greedy" clients.

My memory is that it sounded like things could become very interesting
with such a facility, and might be ripe for 2.5.

Keith, Stephen, Dave, do you remember the details of our discussion?
			- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:39                           ` Rik van Riel
@ 2000-10-09 21:44                             ` Linus Torvalds
  2000-10-10 13:17                               ` Marco Colombo
  0 siblings, 1 reply; 112+ messages in thread
From: Linus Torvalds @ 2000-10-09 21:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, Jim Gettys, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Byron Stanoszek, MM mailing list, linux-kernel


On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > I'd prefer just X having a higher "mm nice level" or something.
> 
> Which it has, because:
> 
> 1) CAP_RAW_IO
> 2) p->euid == 0

Oh, I agree, but we might want to generalize this a bit so that root could
say "this process is important" and then drop root privileges and still
get "credited" for the fact that it's important.

It's not a big deal. It works for X right now.

		Linus

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:38                         ` Linus Torvalds
  2000-10-09 21:39                           ` Rik van Riel
@ 2000-10-09 21:44                           ` Jim Gettys
  2000-10-09 21:50                             ` Linus Torvalds
  2000-10-09 21:51                           ` Alan Cox
  2 siblings, 1 reply; 112+ messages in thread
From: Jim Gettys @ 2000-10-09 21:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Jim Gettys, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel


On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com>
said:

> 
> The problem is that there is no way to keep track of them afterwards.
> 
> So the process that gave X the bitmap dies. What now? Are we going to
> depend on X un-counting the resources?
> 

X has to uncount the resources already, to free the memory in the X server
allocated on behalf of that client.  X has to get this right, to be a long
lived server (properly debugged X servers last many months without problems:
unfortunately, a fair number of DDX's are buggy).

					- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:44                           ` Jim Gettys
@ 2000-10-09 21:50                             ` Linus Torvalds
  2000-10-09 22:07                               ` Jim Gettys
  2000-10-10 14:41                               ` Rogier Wolff
  0 siblings, 2 replies; 112+ messages in thread
From: Linus Torvalds @ 2000-10-09 21:50 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Alan Cox, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Jim Gettys wrote:
> 
> 
> On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com>
> said:
> 
> >
> > The problem is that there is no way to keep track of them afterwards.
> >
> > So the process that gave X the bitmap dies. What now? Are we going to
> > depend on X un-counting the resources?
> >
> 
> X has to uncount the resources already, to free the memory in the X server
> allocated on behalf of that client.  X has to get this right, to be a long
> lived server (properly debugged X servers last many months without problems:
> unfortunately, a fair number of DDX's are buggy).

No, but my point is that it doesn't really work.

One of the biggest bitmaps is the background bitmap. So you have a client
that uploads it to X and then goes away. There's nobody to un-count to by
the time X decides to switch to another background.

Does that memory just disappear as far as the resource handling is
concerned when the client that originated it dies?

What happens with TCP connections? They might be local. Or they might not.
In either case X doesn't know whom to blame.

Basically, the only thing _I_ think X can do is to really say "oh, please
don't count my memory, because everything I do I do for my clients, not
for myself". 

THAT is my argument. Basically there is nothing we can reliably account.

So we might as well fall back on just saying "X is more important than
some random client", and have a mm niceness level. Which right now is
obviously approximated by the IO capabilities tests etc.

			Linus

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:38                         ` Linus Torvalds
  2000-10-09 21:39                           ` Rik van Riel
  2000-10-09 21:44                           ` Jim Gettys
@ 2000-10-09 21:51                           ` Alan Cox
  2 siblings, 0 replies; 112+ messages in thread
From: Alan Cox @ 2000-10-09 21:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Jim Gettys, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel

> > across AF_UNIX sockets so the mechanism is notionally there to provide the 
> > credentials to X, just not to use them
> 
> The problem is that there is no way to keep track of them afterwards.

If you use mmap for your allocator then beancounter will get it right. Every
resource knows which beancounter it was charged to. It adds an overhead the
average desktop user won't like but which is pretty much essential for real
mainframe-style operation. So it would become

	seteuid(Client->passed_euid);
	mmap(buffer in pages)
	seteuid(getuid());

With lightweight counting semantics it's hard to make any tracking system work
well in the corner cases like resources that survive process death.

Alan

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:38                       ` James Sutherland
  2000-10-09 20:40                         ` Rik van Riel
  2000-10-09 20:44                         ` Andrea Arcangeli
@ 2000-10-09 21:52                         ` Aaron Sethman
  2000-10-09 21:54                           ` Rik van Riel
  2 siblings, 1 reply; 112+ messages in thread
From: Aaron Sethman @ 2000-10-09 21:52 UTC (permalink / raw)
  To: James Sutherland
  Cc: Ingo Molnar, Rik van Riel, Andi Kleen, Andrea Arcangeli,
	Byron Stanoszek, Linus Torvalds, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, James Sutherland wrote:

> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> 
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> > 
> > > > so dns helper is killed first, then netscape. (my idea might not
> > > > make sense though.)
> > > 
> > > It makes some sense, but I don't think OOM is something that
> > > occurs often enough to care about it /that/ much...
> > 
> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> > with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> > think it's a legitimate concern - i cannot know in advance whether a
> > freshly started process would trigger an OOM or not.
> 
> Shouldn't the runtime factor handle this, making sure the new process is
> killed? (Maybe not if you're almost OOM right from the word go, and run
> this process straight off... Hrm.)

I think the run time should probably be factored into this as
well. Basically start knocking off recent processes first, which are
likely to be childless, and start working your way up in age. The
reasoning here is that you're less likely to hit an important, long-running
service.  Of course you could probably account for whether the process is
childless or not as well.

Just my $0.02 on it..


Aaron

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:52                         ` Aaron Sethman
@ 2000-10-09 21:54                           ` Rik van Riel
  0 siblings, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 21:54 UTC (permalink / raw)
  To: Aaron Sethman
  Cc: James Sutherland, Ingo Molnar, Andi Kleen, Andrea Arcangeli,
	Byron Stanoszek, Linus Torvalds, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Aaron Sethman wrote:

> I think the run time should probably be factored into this
> as well. Basically start knocking off recent processes first,
> which are likely to be childless, and start working your way up
> in age.

I'm almost getting USENET flashbacks ...  ;)

Please look at the code before suggesting something that
is already there (and has been in the code for some 2 years).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:50                             ` Linus Torvalds
@ 2000-10-09 22:07                               ` Jim Gettys
  2000-10-09 23:13                                 ` Albert D. Cahalan
  2000-10-10 14:41                               ` Rogier Wolff
  1 sibling, 1 reply; 112+ messages in thread
From: Jim Gettys @ 2000-10-09 22:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jim Gettys, Alan Cox, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel

> From: Linus Torvalds <torvalds@transmeta.com>
> Date: Mon, 9 Oct 2000 14:50:51 -0700 (PDT)
> To: Jim Gettys <jg@pa.dec.com>
> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>, Andi Kleen <ak@suse.de>,
>         Ingo Molnar <mingo@elte.hu>, Andrea Arcangeli <andrea@suse.de>,
>         Rik van Riel <riel@conectiva.com.br>,
>         Byron Stanoszek <gandalf@winds.org>,
>         MM mailing list <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> -----
> On Mon, 9 Oct 2000, Jim Gettys wrote:
> >
> >
> > On Date: Mon, 9 Oct 2000 14:38:10 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com>
> > said:
> >
> > >
> > > The problem is that there is no way to keep track of them afterwards.
> > >
> > > So the process that gave X the bitmap dies. What now? Are we going to
> > > depend on X un-counting the resources?
> > >
> >
> > X has to uncount the resources already, to free the memory in the X server
> > allocated on behalf of that client.  X has to get this right, to be a long
> > lived server (properly debugged X servers last many months without problems:
> > unfortunately, a fair number of DDX's are buggy).
> 
> No, but my point is that it doesn't really work.
> 
> One of the biggest bitmaps is the background bitmap. So you have a client
> that uploads it to X and then goes away. There's nobody to un-count to by
> the time X decides to switch to another background.

Actually, the big offenders are things other than the background bitmap:
things like E do absolutely insane things, you would not believe (or maybe
you would).  Even in the worst case, the background pixmap is typically
no bigger than 4 megabytes (for those people who are crazy enough to put
images up as their root window on 32 bit deep displays, at 1k x 1k resolution).
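
The 4 megabyte figure is simply width times height times bytes per pixel. A
small sketch of the arithmetic, with pixmap_bytes() as a hypothetical helper
and an uncompressed, unpadded pixmap assumed:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for an uncompressed pixmap with no row padding. */
size_t pixmap_bytes(size_t width, size_t height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}
```

For a 1024 x 1024 root window at 32 bits per pixel this gives
1024 * 1024 * 4 = 4194304 bytes, i.e. 4 megabytes.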

> 
> Does that memory just disappear as far as the resource handling is
> concerned when the client that originated it dies?

No, X recovers the memory when a connection dies, unless the client has
gone out of its way to arrange to preserve things across connection
termination.  Few, if any, clients do this: it exists mostly for
debugging purposes, so that (fortunately or unfortunately, depending
on your opinion) what happened does not just vanish before you can
see it.

So the X server does extensive bookkeeping of its memory usage, and retrieves
all memory used by clients when they terminate (with the above rare
exception).

> 
> What happens with TCP connections? They might be local. Or they might not.
> In either case X doesn't know whom to blame.

At least on BSD kernels, it was reasonably straightforward to determine
if a TCP connection was local: in that case, the code actually did an upcall
and delivered data directly to the appropriate socket.  Dunno about the
insides of Linux.

I suspect it should not be hard to find the right process for local
connections.  Distant connections are, indeed, a challenge.

> 
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself".
> 
> THAT is my argument. Basically there is nothing we can reliably account.

Your argument has a lot of validity, though the X server does a better job
of accounting than you might think.

BUT, I'm actually more interested in dealing with scheduling preferences, to
get really first rate interactive feel.

> 
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.
> 

As I say above, the principle here may be less useful for the memory
example than for controlling scheduling, so that we can get great
interactive feel.  THAT is what is really worth discussing.
				- Jim

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:05                   ` Rik van Riel
@ 2000-10-09 22:08                     ` Gerrit.Huizenga
  2000-10-09 22:34                       ` Byron Stanoszek
  0 siblings, 1 reply; 112+ messages in thread
From: Gerrit.Huizenga @ 2000-10-09 22:08 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Byron Stanoszek, MM mailing list, linux-kernel

At Sequent, we found that there is a small set of processes which are
"critical" to the system's operation in that they should not be killed
on swap shortage, memory shortage, etc.  This included things like init,
potentially inetd, the swapper, page daemon, cluster heartbeat daemon,
and generally any core system service which had a user process component.
If there wasn't enough memory for those processes, or if those processes
weren't already responsible in their use of memory/resources, you were
already toast.

Anyway, there is/was an API in PTX to say (either from in-kernel or through
some user machinations) "I Am a System Process".  Turns on a bit in the
proc struct (task struct) that made it exempt from death from a variety
of sources, e.g. OOM, generic user signals, portions of system shutdown,
etc.

Then, the code looking for things to kill simply skips those that are
intelligently marked, taking most of the decision making/policy making
out of the scheduler/memory manager.
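
A rough userspace sketch of that idea, under stated assumptions: the flag
name PF_SYSTEM_PROCESS, the toy struct task and the "largest memory user"
rule are all invented for illustration and are not the PTX or Linux
interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define PF_SYSTEM_PROCESS 0x1	/* hypothetical flag bit, not a real kernel flag */

struct task {			/* toy stand-in for a proc/task struct */
	int pid;
	unsigned flags;
	unsigned long mem_pages;
};

/* Return the largest unmarked task, or NULL if every task is exempt. */
static const struct task *pick_victim(const struct task *tasks, size_t n)
{
	const struct task *victim = NULL;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].flags & PF_SYSTEM_PROCESS)
			continue;	/* marked as a system process: skip it */
		if (!victim || tasks[i].mem_pages > victim->mem_pages)
			victim = &tasks[i];
	}
	return victim;
}
```

The policy (who gets marked) lives entirely outside the selection loop; the
loop itself only has to honour the bit.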

gerrit

> On Mon, 9 Oct 2000, Linus Torvalds wrote:
> > On Mon, 9 Oct 2000, Andi Kleen wrote:
> > > 
> > > netscape usually has child processes: the dns helper. 
> > 
> > Yeah.
> > 
> > One thing we _can_ (and probably should do) is to do a per-user
> > memory pressure thing - we have easy access to the "struct
> > user_struct" (every process has a direct pointer to it), and it
> > should not be too bad to maintain a per-user "VM pressure"
> > counter.
> > 
> > Then, instead of trying to use heuristics like "does this
> > process have children" etc, you'd have things like "is this user
> > a nasty user", which is a much more valid thing to do and can be
> > used to find people who fork tons of processes that are
> > mid-sized but use a lot of memory due to just being many..
> 
> Sure we could do all of this, but does OOM really happen that
> often that we want to make the algorithm this complex ?
> 
> The current algorithm seems to work quite well and is already
> at the limit of how complex I'd like to see it. Having a less
> complex OOM killer turned out to not work very well, but having
> a more complex one is - IMHO - probably overkill ...
> 
> regards,
> 
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>        -- Miguel de Icaza, UKUUG 2000
> 
> http://www.conectiva.com/		http://www.surriel.com/
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:10                   ` Peter Waltenberg
@ 2000-10-09 22:25                     ` Andrea Arcangeli
  2000-10-09 22:59                       ` Peter Waltenberg
  2000-10-09 23:10                       ` Rik van Riel
  0 siblings, 2 replies; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 22:25 UTC (permalink / raw)
  To: Peter Waltenberg
  Cc: Ingo Molnar, MM mailing list, Byron Stanoszek, Rik van Riel,
	Linus Torvalds

On Tue, Oct 10, 2000 at 07:10:13AM +1000, Peter Waltenberg wrote:
> People seem to be forgetting (again), that Rik's code is *REALLY* an

Please explain why you think "people" are forgetting that. For my part, I
wasn't forgetting it, and so far I haven't read any email that made me think
others are forgetting it.

> It's probably reasonable to not kill init, but the rest just don't matter.

Killing init is a kernel bug.

> Without the OOM killer the machine would have locked up and you'd lose that 3

Grab 2.2.18pre15aa1 and try to lockup the machine if you can.

> At least with Rik's code you end up with a usable machine afterwards which is
> a major improvement.

If current 2.4.x locks up during OOM, that's because of bugs introduced during
2.[34].x. The oom killer is completely irrelevant to the stability of the kernel;
the oom killer only deals with the _selection_ of the task to kill. OOM
detection is a completely orthogonal issue.

If anything, the oom killer can introduce a lockup condition if there isn't
a mechanism to fall back to killing the current task (all the other tasks
may be sleeping on a down-nfs-server in interruptible mode).

Andrea

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:24                     ` Ingo Molnar
  2000-10-09 20:18                       ` Rik van Riel
  2000-10-09 20:38                       ` James Sutherland
@ 2000-10-09 22:29                       ` FORT David
  2 siblings, 0 replies; 112+ messages in thread
From: FORT David @ 2000-10-09 22:29 UTC (permalink / raw)
  Cc: MM mailing list

[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

Ingo Molnar wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
>
> > > so dns helper is killed first, then netscape. (my idea might not
> > > make sense though.)
> >
> > It makes some sense, but I don't think OOM is something that
> > occurs often enough to care about it /that/ much...
>
> i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup case,
> with 4MB RAM and no swap, where the admin tries to exec a 2MB process. I
> think it's a legitimate concern - i cannot know in advance whether a
> freshly started process would trigger an OOM or not.
>
>         Ingo
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> Please read the FAQ at http://www.tux.org/lkml/

Everybody seems to agree that, depending on the goal, we may want to kill
either interactive processes or niced processes. What about a tunable OOM
killer with a /proc/ file which would indicate which sort of process to kill?

--
%--IRIN->-Institut-de-Recherche-en-Informatique-de-Nantes-----------------%
% FORT David,                                                             %
% 7 avenue de la morvandiere                                   0240726275 %
% 44470 Thouare, France                                epopo@onetelnet.fr %
% ICU:78064991   AIM: enlighted popo             fort@irin.univ-nantes.fr %
%--LINUX-HTTPD-PIOGENE----------------------------------------------------%
%  -datamining <-/                        |   .~.                         %
%  -networking/flashed PHP3 coming soon   |   /V\        L  I  N  U  X    %
%  -opensource                            |  // \\     >Fear the Penguin< %
%  -GNOME/enlightenment/GIMP              | /(   )\                       %
%           feel enlighted....            |  ^^-^^                        %
%                             http://ibonneace.dyndns.org/ when connected %
%-------------------------------------------------------------------------%



[-- Attachment #2: Type: text/html, Size: 4011 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 22:08                     ` Gerrit.Huizenga
@ 2000-10-09 22:34                       ` Byron Stanoszek
  2000-10-09 22:57                         ` Rik van Riel
  2000-10-10  0:25                         ` [RFC] New ideas for the " Byron Stanoszek
  0 siblings, 2 replies; 112+ messages in thread
From: Byron Stanoszek @ 2000-10-09 22:34 UTC (permalink / raw)
  To: Gerrit.Huizenga
  Cc: Rik van Riel, Linus Torvalds, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, MM mailing list, linux-kernel

On Mon, 9 Oct 2000 Gerrit.Huizenga@us.ibm.com wrote:

> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.

The current OOM killer does this, except for init. Checking to see if the
process has a page table is equivalent to checking for the kernel threads that
are integral to the system (PIDs 2-5). These will never be killed by the OOM.
Init, however, still can be killed, and there should be an additional statement
that doesn't kill if PID == 1.

I think we need to sit down and write a better OOM proposal, something that
doesn't use CPU time and the NICE flag. Let's concentrate our efforts on what
constitutes a good selection method instead of bickering with each other.

How about we start by having everyone in this discussion give their opinion on
what the OOM selection process should do, listing the criteria in order of both
importance and severity, and giving a rational reason for each choice. Maybe
then we can get somewhere.

 -Byron

-- 
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: bstanoszek@comtime.com


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 22:34                       ` Byron Stanoszek
@ 2000-10-09 22:57                         ` Rik van Riel
  2000-10-10  0:25                         ` [RFC] New ideas for the " Byron Stanoszek
  1 sibling, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 22:57 UTC (permalink / raw)
  To: Byron Stanoszek
  Cc: Gerrit.Huizenga, Linus Torvalds, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Byron Stanoszek wrote:
> On Mon, 9 Oct 2000 Gerrit.Huizenga@us.ibm.com wrote:
> 
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process".  Turns on a bit in the
> > proc struct (task struct) that made it exempt from death from a variety
> > of sources, e.g. OOM, generic user signals, portions of system shutdown,
> > etc.
> 
> The current OOM killer does this, except for init. Checking to
> see if the process has a page table is equivalent to checking
> for the kernel threads that are integral to the system (PIDs
> 2-5). These will never be killed by the OOM. Init, however,
> still can be killed, and there should be an additional statement
> that doesn't kill if PID == 1.

Only if you can demonstrate any real-world scenario where 
init will be chosen with the current algorithm.

The "3 MB init on 4MB machine" kind of theoretical argument
just isn't convincing if nobody can show that there is a
problem in reality.

> I think we need to sit down and write a better OOM proposal,
> something that doesn't use CPU time and the NICE flag.

The nice flag has been removed from my current kernel tree.

The CPU time used, however, is a different matter. You really
don't want to have the OOM killer kill your 6-week-old running
simulation because a newly started netscape explodes ...

> How about we start by everyone in this discussion give their
> opinion on what the OOM selection process should do,

Quoting from mm/oom_kill.c:

/**
 * oom_badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the priniciple
 *    of least surprise ... (be careful when you change it)
 */
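
As a rough illustration of goals 1) and 2), here is a toy scoring function.
It is a sketch under assumptions, not the formula from mm/oom_kill.c: the
names toy_badness and isqrt and the exact weighting are made up for this
example. Memory use raises the score, accumulated CPU time lowers it.

```c
#include <assert.h>

/* Integer square root by simple iteration; written to avoid overflow. */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while (r + 1 <= x / (r + 1))
		r++;
	return r;
}

/*
 * Toy badness score (illustrative assumption only): a task that holds a
 * lot of memory scores high, a task that has done a lot of work scores
 * low, so that long-running jobs are less likely to be picked.
 */
static unsigned long toy_badness(unsigned long mem_pages,
				 unsigned long cpu_seconds)
{
	assert(mem_pages > 0);
	return mem_pages / (1 + isqrt(cpu_seconds));
}
```

With this weighting, a simulation that has burned six weeks of CPU time
scores far lower than a freshly started process a fifth of its size, which
is the behaviour described above for the netscape case.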

Do you have any additional requirements?

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 22:25                     ` Andrea Arcangeli
@ 2000-10-09 22:59                       ` Peter Waltenberg
  2000-10-09 23:52                         ` Andrea Arcangeli
  2000-10-09 23:10                       ` Rik van Riel
  1 sibling, 1 reply; 112+ messages in thread
From: Peter Waltenberg @ 2000-10-09 22:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Rik van Riel, Byron Stanoszek, MM mailing list,
	Ingo Molnar, Peter Waltenberg

On 09-Oct-2000 Andrea Arcangeli wrote:
> On Tue, Oct 10, 2000 at 07:10:13AM +1000, Peter Waltenberg wrote:
>> People seem to be forgetting (again), that Rik's code is *REALLY* an
> 
> Please explain why you think "people" is forgetting that. At least from my
> part
> I wasn't forgetting that and so far I didn't read any email that made me to
> think others are forgetting that.

I didn't mail the whole kernel list originally, maybe I should have. This
discussion has happened before. The OOM code can never be perfect; I believe
this can be proven mathematically. In that case, it should at least be simple.

We've seen suggestion after suggestion recently for making the heuristics more
and more complex to cope with corner cases. That isn't going to help; it just
makes its behaviour less predictable.

THAT is what I was commenting on.

Without some last resort "kill user processes" code, the kernel hangs under
memory pressure. It'd be nicer if it didn't, but eventually that's what happens.

Having some last resort kernel process which will attempt to keep the kernel
usable is a good idea, and it seems to work, at least in my testing.

Frankly, when it gets to the point where my machine will crash anyway, I don't
really care if the OOM killer gets it wrong now and then. It's still better
than it not being there.

I realize that the MM people are making efforts to ensure that the kernel will
keep running under insane pressure, and maybe you'll produce a kernel now and
then that doesn't die, BUT, I don't think you can ensure that's the case with
every kernel produced. Something will slip through, and again we'll have the
possibility of hangs.

Having a SIMPLE OOM handler in the kernel is a very useful fallback; it's a
last resort, and if it gets it right 9 times out of 10, it has added another "9"
to the reliability figures.
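
To make the "another 9" arithmetic concrete, here is a small sketch. The
numbers and the helper name remaining_failures are invented for
illustration: if a machine would otherwise hang in 10,000 out of 1,000,000
cases (99% reliable) and the handler rescues 9 of every 10 of those, 1,000
hangs remain (99.9% reliable).

```c
#include <assert.h>

/*
 * Hypothetical model: 'failures' is the number of hangs without an OOM
 * handler; the handler rescues 'rescued_out_of_ten' of every ten.
 */
static unsigned long remaining_failures(unsigned long failures,
					unsigned rescued_out_of_ten)
{
	assert(rescued_out_of_ten <= 10);
	return failures * (10 - rescued_out_of_ten) / 10;
}
```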

>> It's probably reasonable to not kill init, but the rest just don't matter.
> 
> Killing init is a kernel bug.
> 
>> Without the OOM killer the machine would have locked up and you'd lose that
>> 3
> 
> Grab 2.2.18pre15aa1 and try to lockup the machine if you can.
> 
>> At least with Rik's code you end up with a usable machine afterwards which
>> is
>> a major improvement.
> 
> If current 2.4.x lockups during OOM that's because of bugs introduced during
> 2.[34].x. The oom killer is completly irrelevant to the stability of the
> kernel,

But not to the stability of the system. I agree, it's better if the OOM killer
never gets used, but the majority of kernels released ARE killable with memory
pressure.

> the oom killer only deals with the _selection_ of the task to kill. OOM
> detection is a completly orthogonal issue.
> 
> If something the oom killer can introduce a lockup condition if there isn't
> a mechamism to fallback killing the current task (all the other tasks
> may be sleeping on a down-nfs-server in interruptible mode).

That probably doesn't matter, the machine would be dead otherwise anyway. WITH
the OOM killer it has some chance of recovering, without it none. It'd be nicer
if that didn't occur, but OOM handling is still an improvement.

> Andrea

Peter

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 22:25                     ` Andrea Arcangeli
  2000-10-09 22:59                       ` Peter Waltenberg
@ 2000-10-09 23:10                       ` Rik van Riel
  1 sibling, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 23:10 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Waltenberg, Ingo Molnar, MM mailing list, Byron Stanoszek,
	Linus Torvalds

On Tue, 10 Oct 2000, Andrea Arcangeli wrote:
> On Tue, Oct 10, 2000 at 07:10:13AM +1000, Peter Waltenberg wrote:

> > It's probably reasonable to not kill init, but the rest just don't matter.
> Killing init is a kernel bug.

And if people find this is a real problem with the OOM killer
I posted some days ago, I'll gladly add the extra code to make
sure init won't be killed.

But until I know it is a problem, I'd rather keep the (hardly
ever used) code small.

> > Without the OOM killer the machine would have locked up and you'd lose that 3
> 
> Grab 2.2.18pre15aa1 and try to lockup the machine if you can.

*grin*    (ok, I'll bite)

Are you /sure/ that kernel no longer kills syslogd,
knfsd or X (crashing the console) ?? ;)

> > At least with Rik's code you end up with a usable machine afterwards which is
> > a major improvement.
> 
> If current 2.4.x lockups during OOM that's because of bugs
> introduced during 2.[34].x.

And not accidentally introduced either. If you read back the
email exchanges between Linus and me regarding the new VM,
you'll see that there's a REASON I didn't have the OOM killer
from the beginning.

I was busy stabilising the new VM feature by feature, only
adding new features (like the OOM killer) after the previous
features had stabilised. The fact that Linus chose to merge
the new VM before I got around to integrating the OOM killer
is purely coincidental.

(in fact, the time Linus chose was quite a bad time for me
because I was just leaving for 2 weeks of conferences)

> The oom killer is completly irrelevant to the stability of the
> kernel, the oom killer only deals with the _selection_ of the
> task to kill. OOM detection is a completly orthogonal issue.

Indeed. And I think you'll have to agree that OOM detection in
2.4 is quite a bit more solid now than it was in 2.2 ...

(where the system simply bails out under too heavy memory
pressure, instead of testing if we are /really/ out of memory)

> If something the oom killer can introduce a lockup condition if
> there isn't a mechamism to fallback killing the current task
> (all the other tasks may be sleeping on a down-nfs-server in
> interruptible mode).

Indeed, this is another theoretical problem. What I'd like
to know, though, is if it matters enough in practice that
we really want the extra bloat in the kernel to deal with
all those theoretically possible corner cases...

(And I'm willing to bet that even when we have 100kB of
OOM killer heuristics in the kernel, there will /STILL/
be corner cases we don't catch)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 22:07                               ` Jim Gettys
@ 2000-10-09 23:13                                 ` Albert D. Cahalan
  2000-10-09 23:16                                   ` Rik van Riel
                                                     ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Albert D. Cahalan @ 2000-10-09 23:13 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Linus Torvalds, Alan Cox, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel

Jim Gettys writes:
>> From: Linus Torvalds <torvalds@transmeta.com>

>> One of the biggest bitmaps is the background bitmap. So you have a
>> client that uploads it to X and then goes away. There's nobody to
>> un-count to by the time X decides to switch to another background.
>
> Actually, the big offenders are things other than the background
> bitmap: things like E do absolutely insane things, you would not
> believe (or maybe you would).  The background pixmap is generally
> in the worst case typically no worse than 4 megabytes (for those
> people who are crazy enough to put images up as their root window
> on 32 bit deep displays, at 1kX1k resolution).

Still, it would be nice to recover that 4 MB when the system
doesn't have any memory left.

X, and any other big friendly processes, could participate in
memory balancing operations. X could be made to clean out a
font cache when the kernel signals that memory is low. When
the situation becomes serious, X could just mmap /dev/zero over
top of the background image.

Netscape could even be hacked to dump old junk... or if it is
just too leaky, it could exec itself to fix the problem.



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 23:13                                 ` Albert D. Cahalan
@ 2000-10-09 23:16                                   ` Rik van Riel
  2000-10-09 23:46                                   ` Jim Gettys
  2000-10-10  9:46                                   ` Jamie Lokier
  2 siblings, 0 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-09 23:16 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: Jim Gettys, Linus Torvalds, Alan Cox, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, Byron Stanoszek, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Albert D. Cahalan wrote:
> Jim Gettys writes:
> >> From: Linus Torvalds <torvalds@transmeta.com>
> 
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would).  The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
> 
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
> 
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
> 
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Which is all good and well to DELAY the task of the OOM killer
for a few more minutes.

But in the end, there will be a point where you REALLY run out
of memory and you have no other choice than the OOM killer...

(not that I'm against alternative measures, I just think they're
orthogonal to this whole discussion)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 19:07         ` Rik van Riel
  2000-10-09 19:42           ` Andrea Arcangeli
  2000-10-09 20:13           ` Ingo Molnar
@ 2000-10-09 23:35           ` Ingo Oeser
  2000-10-10 15:07             ` [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler) Ingo Oeser
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Oeser @ 2000-10-09 23:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Ingo Molnar, Linus Torvalds, linux-mm, linux-kernel

On Mon, Oct 09, 2000 at 04:07:32PM -0300, Rik van Riel wrote:
> > If the oom killer kills a thing like init by mistake
> That only happens in the "random" OOM killer 2.2 has ...

[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)
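
The mechanism is just a function pointer with an install and a reset
routine. A minimal userspace sketch of it, assuming single-threaded use
(the patch guards the pointer with a rwlock, which is left out here) and
using made-up names such as handle_oom and install_handler:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*oom_handler_fn)(void);

static int default_calls;	/* invocation counters, for demonstration only */
static int custom_calls;

static void default_handler(void) { default_calls++; }
static void custom_handler(void)  { custom_calls++; }

/* The currently installed handler; starts out as the default one. */
static oom_handler_fn current_handler = default_handler;

/* Install a replacement; a NULL argument is ignored. */
static void install_handler(oom_handler_fn fn)
{
	if (fn)
		current_handler = fn;
}

/* Go back to the built-in handler. */
static void reset_default_handler(void)
{
	current_handler = default_handler;
}

/* Entry point: dispatch to whichever handler is installed right now. */
static void handle_oom(void)
{
	assert(current_handler != NULL);
	current_handler();
}
```

A module would call install_handler(custom_handler) when it loads and
reset_default_handler() before it unloads, so the pointer never refers to
code that has gone away.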

PS: Patch is against test9 with Rik's latest vmpatch applied.

Thanks for listening

Ingo Oeser

diff -Naur linux-2.4.0-test9-vmpatch/include/linux/swap.h linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h
--- linux-2.4.0-test9-vmpatch/include/linux/swap.h	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/include/linux/swap.h	Tue Oct 10 00:50:17 2000
@@ -129,6 +129,9 @@
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
 extern void oom_kill(void);
+void install_oom_killer(void (*new_oom_kill)(void));
+void reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
diff -Naur linux-2.4.0-test9-vmpatch/mm/Makefile linux-2.4.0-test9-vmpatch-ioe/mm/Makefile
--- linux-2.4.0-test9-vmpatch/mm/Makefile	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/Makefile	Tue Oct 10 00:10:07 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+	    page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o
diff -Naur linux-2.4.0-test9-vmpatch/mm/oom_kill.c linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c
--- linux-2.4.0-test9-vmpatch/mm/oom_kill.c	Sun Oct  8 00:49:17 2000
+++ linux-2.4.0-test9-vmpatch-ioe/mm/oom_kill.c	Tue Oct 10 00:35:32 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
 	struct task_struct *p = select_bad_process();
@@ -207,4 +211,26 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static void (*oom_killer)(void)=oom_kill_rik;
+
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+void install_oom_killer(void (*new_oom_kill)(void)) {
+	if (!new_oom_kill) return;
+	write_lock(&oom_kill_lock);
+	oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+}
+
+void reset_default_oom_killer(void) {
+	install_oom_killer(&oom_kill_rik);
 }

-- 
Feel the power of the penguin - run linux@your.pc
<esc>:x

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 23:13                                 ` Albert D. Cahalan
  2000-10-09 23:16                                   ` Rik van Riel
@ 2000-10-09 23:46                                   ` Jim Gettys
  2000-10-10  9:46                                   ` Jamie Lokier
  2 siblings, 0 replies; 112+ messages in thread
From: Jim Gettys @ 2000-10-09 23:46 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: Jim Gettys, Linus Torvalds, Alan Cox, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel

"Albert D. Cahalan" <acahalan@cs.uml.edu> writes: 
> Date: Mon, 9 Oct 2000 19:13:25 -0400 (EDT)
>
> >> From: Linus Torvalds <torvalds@transmeta.com>
> 
> >> One of the biggest bitmaps is the background bitmap. So you have a
> >> client that uploads it to X and then goes away. There's nobody to
> >> un-count to by the time X decides to switch to another background.
> >
> > Actually, the big offenders are things other than the background
> > bitmap: things like E do absolutely insane things, you would not
> > believe (or maybe you would).  The background pixmap is generally
> > in the worst case typically no worse than 4 megabytes (for those
> > people who are crazy enough to put images up as their root window
> > on 32 bit deep displays, at 1kX1k resolution).
> 
> Still, it would be nice to recover that 4 MB when the system
> doesn't have any memory left.
> 

Yup. The X server could give back the memory for some cases like the
background without too much hackery.

> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

I agree in principle, though the problem is difficult, as the memory pool 
may get fragmented... Most memory usage is less monolithic than the 
background pixmap.

And maintaining separate memory pools often wastes more memory than it
saves.
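
For the simple monolithic case, the "mmap /dev/zero over the top" idea can
be sketched in a few lines. This is a rough illustration, not X server code:
it assumes the buffer is page-aligned and was itself obtained from mmap(),
and it uses MAP_ANONYMOUS, which behaves like a private mapping of /dev/zero.
The names cache_alloc() and cache_discard() are hypothetical.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Allocate a page-aligned private region, standing in for a big pixmap. */
void *cache_alloc(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Map fresh zero-fill pages over the same address range. The old pages
 * are dropped by the kernel; the range stays valid and reads as zeroes.
 * Returns 0 on success, -1 on failure.
 */
int cache_discard(void *p, size_t len)
{
	assert(p != NULL);
	void *q = mmap(p, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return q == p ? 0 : -1;
}
```

After cache_discard() the pointer is still usable, which is what makes this
attractive for a background image: the server keeps running and simply shows
a blank root window until a client uploads the image again.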

> 
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Netscape 4.x is hopeless; it is leakier than the Titanic.  There is hope 
for Mozilla.
				- Jim


--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 22:59                       ` Peter Waltenberg
@ 2000-10-09 23:52                         ` Andrea Arcangeli
  0 siblings, 0 replies; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-09 23:52 UTC (permalink / raw)
  To: Peter Waltenberg
  Cc: Linus Torvalds, Rik van Riel, Byron Stanoszek, MM mailing list,
	Ingo Molnar

On Tue, Oct 10, 2000 at 08:59:23AM +1000, Peter Waltenberg wrote:
> never gets used, but the majority of kernels released ARE killable with memory
> pressure.

If those kernels are killable with memory pressure, it's because of bugs
in the kernel, not because of a missing oom killer heuristic.

> That probably doesn't matter, the machine would be dead otherwise anyway. WITH

The current task may be almost as big as the one we chose to kill, and the
chosen one may be hanging in a read from NFS. Killing the current task (even
though it wasn't selected by the oom killer) would allow the machine to run
again, whereas the NFS server might come back only after several minutes.

Andrea

* [RFC] New ideas for the OOM handler
  2000-10-09 22:34                       ` Byron Stanoszek
  2000-10-09 22:57                         ` Rik van Riel
@ 2000-10-10  0:25                         ` Byron Stanoszek
  1 sibling, 0 replies; 112+ messages in thread
From: Byron Stanoszek @ 2000-10-10  0:25 UTC (permalink / raw)
  To: Gerrit.Huizenga
  Cc: Rik van Riel, Linus Torvalds, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, MM mailing list, linux-kernel

What I'd personally like to see in the OOM should cover the following
scenarios, theoretically:

 1. User does a malloc() bomb. This should be caught instantly and killed when
    there is no memory left to allocate. This covers tons of small malloc()
    calls as well as a timed delay infinite loop. Obviously, unless the
    sysadmin enabled vm_overcommit_memory (if that even still exists in 2.4),
    the person won't be able to malloc(2147483647) anyway. The reason is clear.

 2. We want to protect daemon-type system level processes more than anything.
    Therefore, any non-root process should be given higher priority for killing
    versus a superuser process. On production systems, root is less likely to
    hog a machine's memory than normal users. Furthermore, root's processes are
    effectively more important and should be handled specially.
    
    This does NOT mean that we should ignore mistakes such as #1 above (Higher
    priority does not mean exclusive priority). This also does not mean that
    superuser processes shouldn't be killed until all regular user processes
    are dead. Obviously, if root is tricked into running a malloc() bomb, the
    VM should kill that process first.

 3. We want to target processes that will give us the biggest memory gain in
    return. We should look more closely at parent nodes of parent-child
    processes that use shared memory or copy-on-write segments between the two
    from the use of a fork(). The 'originator' of that shared memory should be
    the one to target. (See #4 below for how this may be useful).

    Total VSZ should not be the primary basis for selection. It does not make
    sense to kill a child whose VSZ is 80,000kb when its parent is 70,000kb
    (and 65,000kb is shared between the two). In contrast, it does make sense
    to base the selection on the process who is the 'originator' of the shared
    memory segment (the one who creates the mmap, or who loads the DSO).
    
    The best way to describe the DSO case is with an example. Say the machine
    has 8 MB of ram. Root decides to run Apache, which happily loads several
    dynamically shared objects. Say there was enough memory to load the parent
    process and all shared objects, and then the process spawns an additional
    15 PIDs / threads with shared memory attached. The correct process to
    target here is the parent httpd and not the children individually. However,
    we do not want to leave the children lying around without the parent, as
    this does nothing, and no shared memory would be expunged (see #4 below).

    This concept becomes much harder to grasp when you want to subtract the
    size of shared libraries (e.g. libc, libm) from the VSZ. As long as another
    process has a shared object, then that size should be factored out. A
    program whose shared libc is 1200k out of a total VSZ of 2600k should not
    get killed over a static program using 2200k.

    The same thing goes for fork()'d processes, since memory is copy-on-write.
    There is not much benefit to killing a child who just came out of a fork(),
    as the parent will most likely fork again.

 4. Arguably, children of a killed process in the same process/session group
    should also be killed. If netscape got killed, its child DNS helper should
    too. It's more likely that a [working] shell would not be killed,
    preventing several user programs from getting killed also. Programs like
    'screen' should always initialize a new process group or session for their
    children so that their children disassociate themselves from the parent
    process. Most (but not all) child processes of high-memory programs would
    be the 'worker bees' for that program.

    This is a lot to chew on, and I even doubt this should go into practice
    because 90% of independent child processes are not initialized with a
    separate session/process group ID. But this satisfies the assumption that
    most memory eaters would usually be Leaf Nodes in the process table (e.g.
    large programs run off of a shell) rather than parent nodes, and a shell is
    not likely to be selected for killing. I'd like some comments on this.

 5. How about factoring stack size into the equation? I don't know how the
    stack figures into the VM, but processes with a 70,000 function backtrace
    log should be looked at with higher interest than a 'valid' program such as
    'netscape'. Chances are the kernel already sets stack size limits and kills
    with a SIGSEGV when that limit is hit, so we might not have to worry about
    this one.

 6. Kill programs with an abnormally large number of pages used in the page
    table first. This covers the usage of programs like Electric Fence that eat
    up memory extremely quickly, while most of those pages are not actually
    resident in Physical RAM.

Rules to enforce:

 1. Init should never be killed. Ever. Unless the machine is on crack.

 2. Processes with no virtual memory should not be touched -- Kernel threads.
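
A few of the selection criteria and both rules above can be put into a small
scoring sketch. Everything here is an assumption made for illustration:
struct proc_info and its fields are hypothetical, ROOT_DISCOUNT is an
arbitrary factor, and this is not the kernel's badness() function. It only
shows shared memory being factored out, root being favoured without being
exempt, and init and kernel threads being skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process summary; not a kernel structure. */
struct proc_info {
	int pid;
	int uid;
	unsigned long total_kb;		/* VSZ */
	unsigned long shared_kb;	/* part of VSZ shared with others */
	int has_mm;			/* 0 for kernel threads */
};

#define ROOT_DISCOUNT 4		/* arbitrary: root counts a quarter as much */

unsigned long badness(const struct proc_info *p)
{
	unsigned long points;

	assert(p != NULL);
	if (p->pid == 1 || !p->has_mm)
		return 0;	/* never pick init or a kernel thread */

	/* Only memory that killing the process would actually free. */
	points = p->total_kb > p->shared_kb ? p->total_kb - p->shared_kb : 0;

	/* Lower priority for killing, not immunity. */
	if (p->uid == 0)
		points /= ROOT_DISCOUNT;
	return points;
}

/* Returns the process with the highest score, or NULL if all score 0. */
const struct proc_info *select_victim(const struct proc_info *procs, size_t n)
{
	const struct proc_info *victim = NULL;
	unsigned long worst = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long b = badness(&procs[i]);
		if (b > worst) {
			worst = b;
			victim = &procs[i];
		}
	}
	return victim;
}
```

With the libc example from scenario 3, the 2600k dynamic program scores 1400
and the 2200k static program scores 2200, so the static one is chosen. A
100000k root process still scores 25000, so a malloc() bomb run by root is
not protected.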

Additional ideas:

I thought of some additional ways of determining which process gets killed
first, prioritized on the above criteria:

 1. Keep a count of the number of sbrk() memory regions in terms of size for
    each process. The count should not be a recent total or moving average kept
    for the past 5-10 minutes, but instead it should be a ratio relative to the
    size of sbrk() requests of other processes. This quickly determines which
    process is eating up memory the fastest. 99 out of 100 times this will be a
    runaway process, an evil malloc(), or an overly abusive user. At times like
    these, the user will 'expect' the program to crash with a SEGV anyway.

 2. Short of marking a process a "System Process", we want to keep programs
    like X or Svgalib from crashing. In this manner, I agree with the person
    who said programs that have I/O Ports or devices open should be one of the
    last to kill.
    
    Also, if such processes DO get killed, we want them to return the user into
    a usable state where they can interact with the computer. In all OOM
    killers 2.2 and up, killing X with sig 9 is a _bad_ idea. With all due
    respect, we should be killing these processes with SIGSEGV instead of
    SIGKILL to give programs a chance to cool down. However, when the OOM
    killer kicks in there might not be enough memory free for even a printk()
    let alone a core dump. It should be possible to reserve memory for handling
    OOM situations (for instance, kick in OOM when there is 64kb of memory free
    and no less). Chances are the program will just crash due to default signal
    handling. But if the program catches SEGV and does nothing about it, then
    when 0kb of memory becomes free, completely terminate the program.

    This, of course, should only happen when swap is something like 95% full
    and the program isn't almost entirely swapped out. We should also set a
    flag to Never dump core. We should leave enough space on the swap partition
    for memory to get swapped out to disk (and program memory swapped in) to
    let a signal handler do its job. I think using 100% swap is a bad idea.
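
The "catchable signal first, hard kill later" idea from item 2 can be
sketched from userspace with fork() and waitpid(). This is an illustration
only: the helper names spawn_sleeper() and kill_with_grace() and the grace
period are assumptions, and a real OOM killer would run in the kernel and
could not afford to sleep like this.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork a child that sleeps forever. If ignored_sig is non-zero the child
 * ignores that signal; the disposition is set before fork() so that the
 * child inherits it and there is no window where the signal is fatal.
 */
pid_t spawn_sleeper(int ignored_sig)
{
	void (*old)(int) = SIG_DFL;
	pid_t pid;

	if (ignored_sig)
		old = signal(ignored_sig, SIG_IGN);
	pid = fork();
	if (pid == 0) {
		for (;;)
			pause();
	}
	if (ignored_sig)
		signal(ignored_sig, old);	/* parent restores its own */
	return pid;
}

/*
 * Send first_sig, wait up to grace_ms for the child to die, then fall
 * back to SIGKILL. Returns the signal that ended the child, 0 if it
 * exited normally, or -1 on error.
 */
int kill_with_grace(pid_t pid, int first_sig, int grace_ms)
{
	int status;

	assert(pid > 0);
	if (kill(pid, first_sig) != 0)
		return -1;
	for (int waited = 0; waited < grace_ms; waited += 10) {
		struct timespec ts = { 0, 10 * 1000 * 1000 };	/* 10 ms */
		pid_t r = waitpid(pid, &status, WNOHANG);

		if (r == pid)
			return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
		if (r < 0)
			return -1;
		nanosleep(&ts, NULL);
	}
	kill(pid, SIGKILL);	/* the program did nothing about it */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A child that leaves the first signal at its default disposition dies from
it right away, while one that ignores it survives the grace period and is
ended by SIGKILL.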

These are all ideas and suggestions, and I expect most to be flamed out quick.
I wrote this to get people thinking about how we could improve our current OOM
killer and kill the 'right' programs instead of vital system daemons, without
leaving our machine idle for 5 minutes while the OOM killer tries to think of
what to kill next, either because the program is ignoring SIGTERM or there is
100% swap space used.

All in all, the OOM killer we have now is much better than the 2.2 version and
works very well for its intended purpose. These are the types of ideas I would
toss around if I were to implement the killer myself. Keeping it from being too
complicated is the hard part. So, having said the above, elaborate on these
ideas to see if we can _really_ improve our OOM and if it is worth the trouble
doing so.

I however suggest strongly that we implement the check for PID == 1 into the
current OOM and toss out checking for Nice status, which makes no real sense
(see my last post, and the posts from several others).

 -Byron

-- 
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: bstanoszek@comtime.com


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:18                       ` Rik van Riel
@ 2000-10-10  3:23                         ` Philipp Rumpf
  0 siblings, 0 replies; 112+ messages in thread
From: Philipp Rumpf @ 2000-10-10  3:23 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ingo Molnar, Andi Kleen, Andrea Arcangeli, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel

> (but I'd be curious if somebody actually manages to
> trick the OOM killer into killing init ... please
> test a bit more to see if this really happens ;))

In a non-real-world situation, yes.  (mem=3500k, many drivers, init=/bin/bash,
tried to enter a command).  Since the process in question (bash) ignores
SIGTERM, I actually got a hard hang. 

We really should turn this into a panic() (panic means your elevator control
system reboots and maybe misses the right floor.  hard hang means you need
to reboot manually).


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:06             ` Rik van Riel
  2000-10-09 20:18               ` Andrea Arcangeli
@ 2000-10-10  3:29               ` Philipp Rumpf
  2000-10-10 15:06                 ` Rik van Riel
  1 sibling, 1 reply; 112+ messages in thread
From: Philipp Rumpf @ 2000-10-10  3:29 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Ingo Molnar, Byron Stanoszek, Linus Torvalds,
	linux-mm, linux-kernel

> > The algorithm you posted on the list in this thread will kill
> > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > and you execute a task that grows over 1M.
> 
> This sounds suspiciously like the description of a DEAD system ;)

But wouldn't a watchdog daemon which doesn't allocate any memory still
get run ?

> (in which case you simply don't care if init is being killed or not)

You care about getting an automatic reboot.  So you need to be sure the
watchdog daemon gets killed first or you panic() after some time.


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:07         ` Alan Cox
@ 2000-10-10  3:38           ` Philipp Rumpf
  2000-10-10 14:07             ` Andrea Arcangeli
  0 siblings, 1 reply; 112+ messages in thread
From: Philipp Rumpf @ 2000-10-10  3:38 UTC (permalink / raw)
  To: Alan Cox
  Cc: david+validemail, mingo, Andrea Arcangeli, Byron Stanoszek,
	Rik van Riel, Linus Torvalds, linux-mm, linux-kernel

> If init dies the kernel hangs solid anyway

Init should never die.  If we get to do_exit in init we'll panic which is
the right thing to do (reboot on critical systems).

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:45                     ` David Ford
@ 2000-10-10  4:22                       ` Andreas Dilger
  2000-10-10  4:30                         ` David Ford
  2000-10-10  9:54                         ` Jamie Lokier
  0 siblings, 2 replies; 112+ messages in thread
From: Andreas Dilger @ 2000-10-10  4:22 UTC (permalink / raw)
  To: david+validemail
  Cc: Rik van Riel, mingo, Andrea Arcangeli, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel, jg, alan,
	acahalan, Gerrit.Huizenga

> Rik van Riel wrote:
> > > How about SIGTERM a bit before SIGKILL then re-evaluate the OOM
> > > N usecs later?
> >
> > And run the risk of having to kill /another/ process as well ?
> >
> > I really don't know if that would be a wise thing to do
> > (but feel free to do some tests to see if your idea would
> > work ... I'd love to hear some test results with your idea).

David Ford writes:
> I was thinking (dangerous) about an urgent v.s. critical OOM.  urgent could
> trigger a SIGTERM which would give advance notice to the offending process.
> I don't think we have a signal method of notifying processes when resources
> are critically low, feel free to correct me.
> 
> Is there a signal that -might- be used for this?

Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.
>
> Netscape could even be hacked to dump old junk... or if it is
> just too leaky, it could exec itself to fix the problem.

Gerrit Huizenga wrote:
> Anyway, there is/was an API in PTX to say (either from in-kernel or through
> some user machinations) "I Am a System Process".  Turns on a bit in the
> proc struct (task struct) that made it exempt from death from a variety
> of sources, e.g. OOM, generic user signals, portions of system shutdown,
> etc.
> 
> Then, the code looking for things to kill simply skips those that are
> intelligently marked, taking most of the decision making/policy making
> out of the scheduler/memory manager.

On AIX there is a signal called SIGDANGER, which is basically what you
are looking for.  By default it is ignored, but for processes that care
(e.g. init, X, whatever) they can register a SIGDANGER handler.  At an
"urgent" (as opposed to "critical") OOM situation, all processes get a
SIGDANGER sent to them.  Most will ignore it, but ones with handlers
can free caches, try to do a clean shutdown, whatever.  Any process with
a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
it) when looking for processes to kill.

Having a SIGDANGER handler is good for 2 reasons:
1) Lets processes know when memory is short so they can free needless cache.
2) Marks processes with a SIGDANGER handler as "more important" than those
   without.  Most people won't care about this, but init, and X, and
   long-running simulations might.
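
What a process-side SIGDANGER handler might look like, as a rough sketch.
Linux has no SIGDANGER, so SIGUSR1 stands in for it here; that substitution
and the names install_danger_handler(), cache_fill() and trim_if_low() are
assumptions for illustration. The handler only sets a flag, and the actual
freeing happens later from the program's main loop, since free() is not safe
to call from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define SIGDANGER_STANDIN SIGUSR1	/* assumption: no SIGDANGER on Linux */

static volatile sig_atomic_t memory_low;
static void *cache;
static size_t cache_len;

/* Async-signal-safe: just record that the kernel asked for memory. */
static void on_danger(int sig)
{
	(void)sig;
	memory_low = 1;
}

int install_danger_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_danger;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGDANGER_STANDIN, &sa, NULL);
}

/* Stand-in for a font cache or similar droppable data. */
int cache_fill(size_t len)
{
	assert(len > 0);
	free(cache);
	cache = malloc(len);
	cache_len = cache ? len : 0;
	return cache ? 0 : -1;
}

/* Called from the main loop; returns the number of bytes released. */
size_t trim_if_low(void)
{
	size_t freed = 0;

	if (memory_low) {
		memory_low = 0;
		freed = cache_len;
		free(cache);
		cache = NULL;
		cache_len = 0;
	}
	return freed;
}
```

A process that has installed such a handler is also the one the kernel could
give a lower badness score, as in point 2 above.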

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10  4:22                       ` Andreas Dilger
@ 2000-10-10  4:30                         ` David Ford
  2000-10-10  9:54                         ` Jamie Lokier
  1 sibling, 0 replies; 112+ messages in thread
From: David Ford @ 2000-10-10  4:30 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Rik van Riel, mingo, Andrea Arcangeli, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel, jg,
	Gerrit.Huizenga

Andreas Dilger wrote:

> Albert D. Cahalan wrote:
> > X, and any other big friendly processes, could participate in
> > memory balancing operations. X could be made to clean out a
>
> Gerrit Huizenga wrote:
> > Anyway, there is/was an API in PTX to say (either from in-kernel or through
> > some user machinations) "I Am a System Process".  Turns on a bit in the
>
> On AIX there is a signal called SIGDANGER, which is basically what you
> are looking for.  By default it is ignored, but for processes that care
> (e.g. init, X, whatever) they can register a SIGDANGER handler.  At an
> "urgent" (as opposed to "critical") OOM situation, all processes get a
> SIGDANGER sent to them.  Most will ignore it, but ones with handlers
> can free caches, try to do a clean shutdown, whatever.  Any process with
> a SIGDANGER handler gets a reduction of "badness" (as the OOM killer calls
> it) when looking for processes to kill.
>
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Marks processes with a SIGDANGER handler as "more important" than those
>    without.  Most people won't care about this, but init, and X, and
>    long-running simulations might.

Is there any reason why we can't do something like this for 2.5?

-d

--
      "There is a natural aristocracy among men. The grounds of this are
      virtue and talents", Thomas Jefferson [1742-1826], 3rd US President




* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:34                       ` Rik van Riel
@ 2000-10-10  9:09                         ` john slee
  0 siblings, 0 replies; 112+ messages in thread
From: john slee @ 2000-10-10  9:09 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ingo Molnar, MM mailing list, linux-kernel

On Mon, Oct 09, 2000 at 06:34:29PM -0300, Rik van Riel wrote:
> On Mon, 9 Oct 2000, Ingo Molnar wrote:
> > On Mon, 9 Oct 2000, Rik van Riel wrote:
> > 
> > > Would this complexity /really/ be worth it for the twice-yearly OOM
> > > situation?
> > 
> > the only reason i suggested this was the init=/bin/bash, 4MB
> > RAM, no swap emergency-bootup case. We must not kill init in
> > that case - if the current code doesnt then great and none of
> > this is needed.

perhaps a boot time option oom=0 ?  since oom is such a rare case, this
wouldn't impact normal usage...

-- 
john slee <indigoid@higherplane.net>

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 23:13                                 ` Albert D. Cahalan
  2000-10-09 23:16                                   ` Rik van Riel
  2000-10-09 23:46                                   ` Jim Gettys
@ 2000-10-10  9:46                                   ` Jamie Lokier
  2 siblings, 0 replies; 112+ messages in thread
From: Jamie Lokier @ 2000-10-10  9:46 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: Jim Gettys, Linus Torvalds, Alan Cox, Andi Kleen, Ingo Molnar,
	Andrea Arcangeli, Rik van Riel, Byron Stanoszek, MM mailing list,
	linux-kernel

Albert D. Cahalan wrote:
> X, and any other big friendly processes, could participate in
> memory balancing operations. X could be made to clean out a
> font cache when the kernel signals that memory is low. When
> the situation becomes serious, X could just mmap /dev/zero over
> top of the background image.

Haven't we already had this discussion?  Quite a lot of programs have
cached data (X fonts, Netscape (lots!)), GC-able data (Emacs, Java
etc.), data that can simply be discarded (X window backing stores), or
data that can be written to disk on demand (Netscape again).

-- Jamie

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10  4:22                       ` Andreas Dilger
  2000-10-10  4:30                         ` David Ford
@ 2000-10-10  9:54                         ` Jamie Lokier
  1 sibling, 0 replies; 112+ messages in thread
From: Jamie Lokier @ 2000-10-10  9:54 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: david+validemail, Rik van Riel, mingo, Andrea Arcangeli,
	Byron Stanoszek, Linus Torvalds, MM mailing list, linux-kernel,
	jg, alan, acahalan, Gerrit.Huizenga

Andreas Dilger wrote:
> Having a SIGDANGER handler is good for 2 reasons:
> 1) Lets processes know when memory is short so they can free needless cache.
> 2) Marks processes with a SIGDANGER handler as "more important" than those
>    without.  Most people won't care about this, but init, and X, and
>    long-running simulations might.

For point 1, it would be much nicer to have user processes participate
in memory balancing _before_ getting anywhere near an OOM state.

A nice way is to send SIGDANGER with siginfo saying how much memory the
kernel wants back (or how fast).  Applications that don't know to use
that info, but do have a SIGDANGER handler, will still react just rather
more severely.
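
POSIX queued signals already allow a small payload, so the siginfo idea can
be sketched from userspace. Assumptions: SIGUSR1 again stands in for
SIGDANGER, the payload is a number of kilobytes, and the function names are
made up. A signal that arrives without a queued value is treated as "free
everything you can", reported here as -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SIGDANGER_STANDIN SIGUSR1	/* assumption: no SIGDANGER on Linux */

/* Last request seen: kilobytes wanted back, -1 for "unspecified", 0 none. */
static volatile sig_atomic_t requested_kb;

static void on_sized_danger(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	/* Only sigqueue() senders carry a meaningful si_value. */
	requested_kb = (info->si_code == SI_QUEUE)
			? info->si_value.sival_int : -1;
}

int install_sized_danger_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_sigaction = on_sized_danger;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGDANGER_STANDIN, &sa, NULL);
}

/* Sender side: ask process pid to give back roughly kb kilobytes. */
int request_memory_back(pid_t pid, int kb)
{
	union sigval v;

	assert(kb >= 0);
	v.sival_int = kb;
	return sigqueue(pid, SIGDANGER_STANDIN, v);
}

/* Receiver side, called from the main loop: fetch and clear the request. */
int pending_request_kb(void)
{
	int kb = requested_kb;

	requested_kb = 0;
	return kb;
}
```

A handler installed without SA_SIGINFO would still be invoked by the same
signal, it just would not see the amount, which matches the "react rather
more severely" fallback described above.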

-- Jamie

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 20:40                         ` Rik van Riel
@ 2000-10-10  9:59                           ` J.A. Sutherland
  0 siblings, 0 replies; 112+ messages in thread
From: J.A. Sutherland @ 2000-10-10  9:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ingo Molnar, Andi Kleen, Andrea Arcangeli, Byron Stanoszek,
	Linus Torvalds, MM mailing list, linux-kernel

--On 09 October 2000, 17:40 -0300 Rik van Riel <riel@conectiva.com.br>
wrote:
> On Mon, 9 Oct 2000, James Sutherland wrote:
>> On Mon, 9 Oct 2000, Ingo Molnar wrote:
>> > On Mon, 9 Oct 2000, Rik van Riel wrote:
>> > 
>> > > > so dns helper is killed first, then netscape. (my idea might not
>> > > > make sense though.)
>> > > 
>> > > It makes some sense, but I don't think OOM is something that
>> > > occurs often enough to care about it /that/ much...
>> > 
>> > i'm trying to handle Andrea's case, the init=/bin/bash manual-bootup
>> > case, with 4MB RAM and no swap, where the admin tries to exec a 2MB
>> > process. I think it's a legitimate concern - i cannot know in advance
>> > whether a freshly started process would trigger an OOM or not.
>> 
>> Shouldn't the runtime factor handle this, making sure the new
>> process is killed? (Maybe not if you're almost OOM right from
>> the word go, and run this process straight off... Hrm.)
> 
> It should.
> 
> Also, the example is a tad unrealistic since init seems to be
> around 70 kB in size on my systems ;)

In extreme cases, though, you could arrange things so the
machine only has 100K of RAM when it loads init, at which
point init tries running, say, rc.sysinit - and everything goes 
bang. Of course, a machine like that won't be very much use
anyway...

More realistically, though, I could be running with something
like init=/bin/sash - does your statically linked sash binary
fit in 70K? :-)


James.

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:44                             ` Linus Torvalds
@ 2000-10-10 13:17                               ` Marco Colombo
  0 siblings, 0 replies; 112+ messages in thread
From: Marco Colombo @ 2000-10-10 13:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rik van Riel, MM mailing list, linux-kernel

On Mon, 9 Oct 2000, Linus Torvalds wrote:

> On Mon, 9 Oct 2000, Rik van Riel wrote:
> >
> > > I'd prefer just X having a higher "mm nice level" or something.
> > 
> > Which it has, because:
> > 
> > 1) CAP_RAW_IO
> > 2) p->euid == 0
> 
> Oh, I agree, but we might want to generalize this a bit so that root could
> say "this process is important" and then drop root privileges and still
> get "credited" for the fact that it's important.
> 
> It's not a big deal. It works for X right now.

How about using

	p->rlim[RLIMIT_AS].rlim_cur

to weight the badness points for a process?
On my system (128MB RAM + 256MB swap), it defaults to some (insane?) value:

bash$ ulimit -vH -vS
virtual memory (kbytes)  4194302
virtual memory (kbytes)  2105343

for every process, which just means it is unused.

The idea is:
1) set default for rlim[RLIMIT_AS].rlim_max to a saner value;
2) processes with higher rlim[RLIMIT_AS].rlim_cur get lower badness.

This way, the badness of a process is not proportional to its absolute
size, but to the fraction of allowed AS it is using. Processes
that are capable(CAP_SYS_RESOURCE) can set RLIMIT_AS to a very high value,
so they get fewer badness points. X is a perfect candidate.

User's runaway processes (netscape) will have lower rlim[RLIMIT_AS].rlim_cur,
thus will get higher badness.

Something like:

-	points = p->mm->total_vm;
+	points = p->mm->total_vm / (p->rlim[RLIMIT_AS].rlim_cur << AS_FACTOR);

with

#define AS_FACTOR 30

maybe? (this is Rik's call, he knows better than me how to balance it...)

It's simple, it's configurable. 1) may be enforced by the kernel, or
completely left to user space.
On my system, in its default configuration (no use of RLIMIT_AS),
it has no impact at all (all processes have the same limit).

Sounds good or am I missing something?
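
The "fraction of allowed address space" weighting can be sketched as plain
arithmetic. This is not the expression from the snippet above: as far as I
can tell, shifting rlim_cur left by 30 overflows a 32-bit unsigned long, so
the sketch scales the numerator instead and uses 64-bit math. BADNESS_SCALE
and weighted_badness() are hypothetical names; the limit would come from
p->rlim[RLIMIT_AS].rlim_cur in the kernel or getrlimit(RLIMIT_AS) in
userspace.

```c
#include <assert.h>
#include <stdint.h>

#define BADNESS_SCALE 1000	/* points are per-mille of the allowed AS */

/*
 * Score a process by how much of its permitted address space it uses.
 * A process with a very large (or unlimited) RLIMIT_AS scores close to
 * zero, which is the intended effect for something like X.
 */
uint64_t weighted_badness(uint64_t vm_bytes, uint64_t as_limit_bytes)
{
	assert(as_limit_bytes > 0);
	return vm_bytes * BADNESS_SCALE / as_limit_bytes;
}
```

For example, a 64 MB process limited to 128 MB scores 500, while a 200 MB
process allowed 2048 MB scores 97, so the smaller process that is closer to
its own limit is the one picked first.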

> 
> 		Linus
> 
> 

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10  3:38           ` Philipp Rumpf
@ 2000-10-10 14:07             ` Andrea Arcangeli
  0 siblings, 0 replies; 112+ messages in thread
From: Andrea Arcangeli @ 2000-10-10 14:07 UTC (permalink / raw)
  To: Philipp Rumpf
  Cc: Alan Cox, david+validemail, mingo, Byron Stanoszek, Rik van Riel,
	Linus Torvalds, linux-mm, linux-kernel

On Tue, Oct 10, 2000 at 04:38:02AM +0100, Philipp Rumpf wrote:
> Init should never die.  If we get to do_exit in init we'll panic which is
> the right thing to do (reboot on critical systems).

If the page fault can fail with OOM on init, init will get a SIGSEGV while
running a signal handler (copy-user will return -EFAULT regardless of whether
it was an oom or a real segfault), and it _won't_ panic and the system is
unusable.

Andrea

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 21:50                             ` Linus Torvalds
  2000-10-09 22:07                               ` Jim Gettys
@ 2000-10-10 14:41                               ` Rogier Wolff
  2000-10-10 17:28                                 ` Linus Torvalds
  1 sibling, 1 reply; 112+ messages in thread
From: Rogier Wolff @ 2000-10-10 14:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jim Gettys, Alan Cox, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel

Linus Torvalds wrote:
> Basically, the only thing _I_ think X can do is to really say "oh, please
> don't count my memory, because everything I do I do for my clients, not
> for myself". 
> 
> THAT is my argument. Basically there is nothing we can reliably account.
> 
> So we might as well fall back on just saying "X is more important than
> some random client", and have a mm niceness level. Which right now is
> obviously approximated by the IO capabilities tests etc.

FYI:

I ran my machine out of memory (without crashing by the way) this
weekend by loading a whole bunch of large images into netscape. I
noticed not being able to open more windows when I saw my swapspace
exhausted. I noticed the large netscape, and killed it. 

At that moment my X was still taking 80Mb of RAM. I manually killed it
and restarted it to get rid of that memory. 

So if Netscape can "pump" 40 extra megabytes of memory out of X, this
can be exploited. 

Now we're back to the point that a heuristic can never be right all
the time......

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
*       Common sense is the collection of                                *
******  prejudices acquired by age eighteen.   -- Albert Einstein ********

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10  3:29               ` Philipp Rumpf
@ 2000-10-10 15:06                 ` Rik van Riel
  2000-10-10 15:24                   ` Philipp Rumpf
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-10 15:06 UTC (permalink / raw)
  To: Philipp Rumpf
  Cc: Andrea Arcangeli, Ingo Molnar, Byron Stanoszek, Linus Torvalds,
	linux-mm, linux-kernel

On Tue, 10 Oct 2000, Philipp Rumpf wrote:

> > > The algorithm you posted on the list in this thread will kill
> > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > and you execute a task that grows over 1M.
> > 
> > This sounds suspiciously like the description of a DEAD system ;)
> 
> But wouldn't a watchdog daemon which doesn't allocate any memory
> still get run ?

Indeed, it would. It would also /prevent/ the system
from automatically rebooting itself into a usable state ;)

> > (in which case you simply don't care if init is being killed or not)
> 
> You care about getting an automatic reboot.  So you need to be sure the
> watchdog daemon gets killed first or you panic() after some time.

echo 30 > /proc/sys/kernel/panic

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


* [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
  2000-10-09 23:35           ` Ingo Oeser
@ 2000-10-10 15:07             ` Ingo Oeser
  2000-10-10 15:32               ` Rik van Riel
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Oeser @ 2000-10-10 15:07 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

[OOM killer war]

Hi there,

before you argue endlessly about the "Right OOM Killer (TM)", I
did a small patch to allow replacing the OOM killer at runtime.

You can even use modules, if you are careful (see khttpd on how
to do this without refcounting).

So now you can stop arguing about the one and only OOM killer,
implement it, provide it as module and get back to the important
stuff ;-)

PS: Patch is against test10-pre1.

Thanks for listening

Ingo Oeser

--- linux-2.4.0-test10-pre1/mm/oom_kill.c	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c	Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
 	struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/** 
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wrapper around the OOM killers that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we 
+ * really run out of memory.
+ */
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable 
+ * for everyone, we provide an interface to install custom OOM killers.
+ * 
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing a NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ * 
+ * NOTE: We don't do refcounting on OOM killers, so be careful with 
+ * 	modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+	oom_killer_t tmp;
+	write_lock(&oom_kill_lock);
+	tmp=oom_killer;
+	if (new_oom_kill) 
+		oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+	return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided 
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+	return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h	Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
-- 
Feel the power of the penguin - run linux@your.pc
<esc>:x

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10 15:06                 ` Rik van Riel
@ 2000-10-10 15:24                   ` Philipp Rumpf
  2000-10-10 15:30                     ` Rik van Riel
  0 siblings, 1 reply; 112+ messages in thread
From: Philipp Rumpf @ 2000-10-10 15:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Ingo Molnar, Byron Stanoszek, Linus Torvalds,
	linux-mm, linux-kernel

On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > The algorithm you posted on the list in this thread will kill
> > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > and you execute a task that grows over 1M.
> > > 
> > > This sounds suspiciously like the description of a DEAD system ;)
> > 
> > But wouldn't a watchdog daemon which doesn't allocate any memory
> > still get run ?
> 
> Indeed, it would. It would also /prevent/ the system
> from automatically rebooting itself into a usable state ;)

So it's not dead in the "oh, it'll be back in 30 seconds" sense.  So our
behaviour is broken (more so than random process killing).

> > You care about getting an automatic reboot.  So you need to be sure the
> > watchdog daemon gets killed first or you panic() after some time.
> 
> echo 30 > /proc/sys/kernel/panic

that's what I said.  we need to be sure to _get_ a panic() though.

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10 15:24                   ` Philipp Rumpf
@ 2000-10-10 15:30                     ` Rik van Riel
  2000-10-10 15:37                       ` Philipp Rumpf
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-10 15:30 UTC (permalink / raw)
  To: Philipp Rumpf
  Cc: Andrea Arcangeli, Ingo Molnar, Byron Stanoszek, Linus Torvalds,
	linux-mm, linux-kernel

On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> On Tue, Oct 10, 2000 at 12:06:07PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Philipp Rumpf wrote:
> > > > > The algorithm you posted on the list in this thread will kill
> > > > > init if on 4Mbyte machine without swap init is large 3 Mbytes
> > > > > and you execute a task that grows over 1M.
> > > > 
> > > > This sounds suspiciously like the description of a DEAD system ;)
> > > 
> > > But wouldn't a watchdog daemon which doesn't allocate any memory
> > > still get run ?
> > 
> > Indeed, it would. It would also /prevent/ the system
> > from automatically rebooting itself into a usable state ;)
> 
> So it's not dead in the "oh, it'll be back in 30 seconds" sense.  
> So our behaviour is broken (more so than random process
> killing).

*nod*

Not killing init when we "should" definitely prevents
embedded systems from auto-rebooting when they should
do so.

(OTOH, I don't think embedded systems will run into
this OOM issue too much)

> > > You care about getting an automatic reboot.  So you need to be sure the
> > > watchdog daemon gets killed first or you panic() after some time.
> > 
> > echo 30 > /proc/sys/kernel/panic
> 
> that's what I said.  we need to be sure to _get_ a panic() though.

I believe the kernel automatically panic()s when init
dies ... from kernel/exit.c::do_exit()

        if (tsk->pid == 1)
                panic("Attempted to kill init!");

[which will make our system auto-reboot and be back on its feet
in a healthy state again soon]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


* Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
  2000-10-10 15:07             ` [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler) Ingo Oeser
@ 2000-10-10 15:32               ` Rik van Riel
  2000-10-10 16:11                 ` Ingo Oeser
  2000-10-10 18:57                 ` Tom Rini
  0 siblings, 2 replies; 112+ messages in thread
From: Rik van Riel @ 2000-10-10 15:32 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-mm, linux-kernel

On Tue, 10 Oct 2000, Ingo Oeser wrote:

> before you argue endlessly about the "Right OOM Killer (TM)", I
> did a small patch to allow replacing the OOM killer at runtime.
> 
> So now you can stop arguing about the one and only OOM killer,
> implement it, provide it as module and get back to the important
> stuff ;-)

This is definitely a cool toy for people who have doubts
that my OOM killer will do the wrong thing in their
workloads.

If anyone can demonstrate that the current OOM killer is
doing the wrong thing and has a replacement algorithm
available, please let us know ... ;)

[let's move the discussion back to a less theoretical and
more practical point of view]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10 15:30                     ` Rik van Riel
@ 2000-10-10 15:37                       ` Philipp Rumpf
  0 siblings, 0 replies; 112+ messages in thread
From: Philipp Rumpf @ 2000-10-10 15:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrea Arcangeli, Ingo Molnar, Byron Stanoszek, Linus Torvalds,
	linux-mm, linux-kernel

On Tue, Oct 10, 2000 at 12:30:51PM -0300, Rik van Riel wrote:
> Not killing init when we "should" definitely prevents
> embedded systems from auto-rebooting when they should
> do so.
> 
> (OTOH, I don't think embedded systems will run into
> this OOM issue too much)

but when they do, they're hard to fix.  Think about an elevator control
system with a single process that happens to implement a somewhat broken
version of the elevator algorithm ;)

> > that's what I said.  we need to be sure to _get_ a panic() though.
> 
> I believe the kernel automatically panic()s when init
> dies ... from kernel/exit.c::do_exit()
> 
>         if (tsk->pid == 1)
>                 panic("Attempted to kill init!");

guess who added that code.  We still kill init with SIGTERM which doesn't
seem to work though.


* Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
  2000-10-10 15:32               ` Rik van Riel
@ 2000-10-10 16:11                 ` Ingo Oeser
  2000-10-10 18:57                 ` Tom Rini
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Oeser @ 2000-10-10 16:11 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
> 
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

Thanks ;-)

But I forgot to include my changes to the mm/Makefile (to export
the API for modules).

Here is a _working_ one:

--- linux-2.4.0-test10-pre1/mm/oom_kill.c	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/oom_kill.c	Tue Oct 10 16:59:27 2000
@@ -13,6 +13,8 @@
  *  machine) this file will double as a 'coding guide' and a signpost
  *  for newbie kernel hackers. It features several pointers to major
  *  kernel subsystems and hints as to where to find out what things do.
+ *
+ *  Added oom_killer API for special needs - Ingo Oeser
  */
 
 #include <linux/mm.h>
@@ -136,7 +138,7 @@
 }
 
 /**
- * oom_kill - kill the "best" process when we run out of memory
+ * oom_kill_rik - kill the "best" process when we run out of memory
  *
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
@@ -147,7 +149,9 @@
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
  * we select a process with CAP_SYS_RAW_IO set).
  */
-void oom_kill(void)
+
+
+static void oom_kill_rik(void)
 {
 
 	struct task_struct *p = select_bad_process();
@@ -207,4 +211,63 @@
 
 	/* Else... */
 	return 1;
+}
+
+/* Protects oom_killer against resetting during its execution */
+static rwlock_t oom_kill_lock = RW_LOCK_UNLOCKED;
+
+static oom_killer_t oom_killer = oom_kill_rik;
+
+/** 
+ * oom_kill - the oom_kill wrapper for installable OOM killers
+ *
+ * Wrapper around the OOM killers that can be installed via
+ * install_oom_killer and reset_default_oom_killer.
+ *
+ * This gets called from kswapd() in linux/mm/vmscan.c when we 
+ * really run out of memory.
+ */
+void oom_kill(void) {
+	read_lock(&oom_kill_lock);
+	oom_killer();
+	read_unlock(&oom_kill_lock);
+}
+
+/**
+ * install_oom_killer - install alternate OOM killer
+ * @new_oom_kill: the alternate OOM killer provided by the caller
+ *
+ * Since the default OOM killer (oom_kill_rik) is not suitable 
+ * for everyone, we provide an interface to install custom OOM killers.
+ * 
+ * You can take the most appropriate action for your application if the
+ * kernel goes OOM.
+ *
+ * Providing a NULL argument just returns the current OOM killer.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ * 
+ * NOTE: We don't do refcounting on OOM killers, so be careful with 
+ * 	modules
+ */
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill) {
+	oom_killer_t tmp;
+	write_lock(&oom_kill_lock);
+	tmp=oom_killer;
+	if (new_oom_kill) 
+		oom_killer=new_oom_kill;
+	write_unlock(&oom_kill_lock);
+	return tmp;
+}
+
+/**
+ * reset_default_oom_killer - reset back to default OOM killer
+ *
+ * If you are going to unload the module which provided 
+ * your OOM killer, you can install the default one by this.
+ *
+ * Returns: The OOM killer, which has been installed so far.
+ */
+oom_killer_t reset_default_oom_killer(void) {
+	return install_oom_killer(&oom_kill_rik);
 }
--- linux-2.4.0-test10-pre1/include/linux/swap.h	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/include/linux/swap.h	Tue Oct 10 16:44:22 2000
@@ -127,8 +127,14 @@
 #define read_swap_cache(entry) read_swap_cache_async(entry, 1);
 
 /* linux/mm/oom_kill.c */
+typedef void (*oom_killer_t)(void);
+
 extern int out_of_memory(void);
 extern void oom_kill(void);
+
+oom_killer_t install_oom_killer(oom_killer_t new_oom_kill);
+oom_killer_t reset_default_oom_killer(void);
+
 
 /*
  * Make these inline later once they are working properly.
--- linux-2.4.0-test10-pre1/mm/Makefile	Tue Oct 10 16:31:08 2000
+++ linux-2.4.0-test10-pre1-ioe/mm/Makefile	Tue Oct 10 16:34:06 2000
@@ -10,7 +10,8 @@
 O_TARGET := mm.o
 O_OBJS	 := memory.o mmap.o filemap.o mprotect.o mlock.o mremap.o \
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_io.o \
-	    page_alloc.o swap_state.o swapfile.o numa.o oom_kill.o
+	    page_alloc.o swap_state.o swapfile.o numa.o
+OX_OBJS  := oom_kill.o
 
 ifeq ($(CONFIG_HIGHMEM),y)
 O_OBJS += highmem.o

Regards

Ingo Oeser
-- 
Feel the power of the penguin - run linux@your.pc
<esc>:x

* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-10 14:41                               ` Rogier Wolff
@ 2000-10-10 17:28                                 ` Linus Torvalds
  0 siblings, 0 replies; 112+ messages in thread
From: Linus Torvalds @ 2000-10-10 17:28 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Jim Gettys, Alan Cox, Andi Kleen, Ingo Molnar, Andrea Arcangeli,
	Rik van Riel, Byron Stanoszek, MM mailing list, linux-kernel

On Tue, 10 Oct 2000, Rogier Wolff wrote:
> 
> So if Netscape can "pump" 40 extra megabytes of memory out of X, this
> can be exploited. 
> 
> Now we're back to the point that a heuristic can never be right all
> the time......

I agree. In fact, we never left that.

Nothing is perfect.

In fact, a lot of engineering is _recognizing_ that you can never achieve
"perfect", and you're much better off not even trying - and having a
simple system that is "good enough".

This is the old adage of "perfect is the enemy of good" - trying too hard
is actually _detrimental_ in 99% of all cases. We should have simple
heuristics that work most of the time, instead of trying to cajole a
complex system like X to help us do some complicated resource management
system.

Complexity will just result in the OOM killer failing in surprising ways.

A simple heuristic will mean that the OOM killer will still fail, but at
least it won't be in subtle and surprising ways.

			Linus


* Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
  2000-10-10 15:32               ` Rik van Riel
  2000-10-10 16:11                 ` Ingo Oeser
@ 2000-10-10 18:57                 ` Tom Rini
  2000-10-10 20:58                   ` Rik van Riel
  1 sibling, 1 reply; 112+ messages in thread
From: Tom Rini @ 2000-10-10 18:57 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ingo Oeser, linux-mm, linux-kernel

On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Ingo Oeser wrote:
> 
> > before you argue endlessly about the "Right OOM Killer (TM)", I
> > did a small patch to allow replacing the OOM killer at runtime.
> > 
> > So now you can stop arguing about the one and only OOM killer,
> > implement it, provide it as module and get back to the important
> > stuff ;-)
> 
> This is definitely a cool toy for people who have doubts
> that my OOM killer will do the wrong thing in their
> workloads.

I think this can be useful for more than just a cool toy.  I think that the
main thing that this discussion has shown is no OOM killer will please 100% of
the people 100% of the time.  I think we should try and have a good generic
OOM killer that kills the right process most of the time.  People can implement
(and submit) different-style OOM killers as needed.  Or at least get 'em on
freshmeat. :)

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

* Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
  2000-10-10 18:57                 ` Tom Rini
@ 2000-10-10 20:58                   ` Rik van Riel
  2000-10-10 22:46                     ` Tom Rini
  0 siblings, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2000-10-10 20:58 UTC (permalink / raw)
  To: Tom Rini; +Cc: Ingo Oeser, linux-mm, linux-kernel

On Tue, 10 Oct 2000, Tom Rini wrote:
> On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > 
> > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > did a small patch to allow replacing the OOM killer at runtime.
> > > 
> > > So now you can stop arguing about the one and only OOM killer,
> > > implement it, provide it as module and get back to the important
> > > stuff ;-)
> > 
> > This is definitely a cool toy for people who have doubts
> > that my OOM killer will do the wrong thing in their
> > workloads.
> 
> I think this can be useful for more than just a cool toy.  I
> think that the main thing that this discussion has shown is no
> OOM killer will please 100% of the people 100% of the time.  I
> think we should try and have a good generic OOM killer that
> kills the right process most of the time.  People can implement
> (and submit) different-style OOM killers as needed.

Indeed, though I suspect most of the people trying this would
fall into the trap of over-engineering their OOM killer, after
which it mostly becomes less predictable ;)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/


* Re: [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler)
  2000-10-10 20:58                   ` Rik van Riel
@ 2000-10-10 22:46                     ` Tom Rini
  0 siblings, 0 replies; 112+ messages in thread
From: Tom Rini @ 2000-10-10 22:46 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ingo Oeser, linux-mm, linux-kernel

On Tue, Oct 10, 2000 at 05:58:46PM -0300, Rik van Riel wrote:
> On Tue, 10 Oct 2000, Tom Rini wrote:
> > On Tue, Oct 10, 2000 at 12:32:50PM -0300, Rik van Riel wrote:
> > > On Tue, 10 Oct 2000, Ingo Oeser wrote:
> > > 
> > > > before you argue endlessly about the "Right OOM Killer (TM)", I
> > > > did a small patch to allow replacing the OOM killer at runtime.
> > > > 
> > > > So now you can stop arguing about the one and only OOM killer,
> > > > implement it, provide it as module and get back to the important
> > > > stuff ;-)
> > > 
> > > This is definitely a cool toy for people who have doubts
> > > that my OOM killer will do the wrong thing in their
> > > workloads.
> > 
> > I think this can be useful for more than just a cool toy.  I
> > think that the main thing that this discussion has shown is no
> > OOM killer will please 100% of the people 100% of the time.  I
> > think we should try and have a good generic OOM killer that
> > kills the right process most of the time.  People can implement
> > (and submit) different-style OOM killers as needed.
> 
> Indeed, though I suspect most of the people trying this would
> fall into the trap of over-engineering their OOM killer, after
> which it mostly becomes less predictable ;)

I was thinking more along the lines of ones w/ "safety" features that not
everyone might like/need (i.e. /usr/local/bin/foo is always good, and similar
suggestions).  It seems like useful functionality at little/no cost.
And a neat toy for now. :)
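
To make the "always good" idea concrete, here is a hedged user-space
sketch of one way such a safety feature could look: candidates whose
executable path is on a protected list are never chosen, and the largest
remaining candidate is picked. The struct, the function names, and the
"largest RSS" rule are all assumptions for illustration, not anything
from the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a kill candidate. */
struct candidate_sketch {
	int pid;
	const char *exe_path;
	unsigned long rss_pages;
};

/* Return 1 if exe_path exactly matches an entry on the protected list. */
int is_protected_sketch(const char *exe_path,
			const char *const *protected_paths,
			size_t n_protected)
{
	size_t i;

	assert(exe_path != NULL);
	for (i = 0; i < n_protected; i++)
		if (strcmp(exe_path, protected_paths[i]) == 0)
			return 1;
	return 0;
}

/*
 * Pick the unprotected candidate with the largest resident set.
 * Returns NULL when every candidate is protected (or n == 0).
 */
const struct candidate_sketch *
pick_unprotected_sketch(const struct candidate_sketch *c, size_t n,
			const char *const *protected_paths,
			size_t n_protected)
{
	const struct candidate_sketch *victim = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (is_protected_sketch(c[i].exe_path, protected_paths,
					n_protected))
			continue;
		if (victim == NULL || c[i].rss_pages > victim->rss_pages)
			victim = &c[i];
	}
	return victim;
}
```

A NULL result means the policy has nothing it is allowed to kill, so a
real implementation would need some fallback for that case.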

-- 
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

* RE: [PATCH] VM fix for 2.4.0-test9 & OOM handler
  2000-10-09 18:07 Wagner, Dave
@ 2000-10-09 20:27 ` James Sutherland
  0 siblings, 0 replies; 112+ messages in thread
From: James Sutherland @ 2000-10-09 20:27 UTC (permalink / raw)
  To: Wagner, Dave
  Cc: mingo, Ed Tomlinson, Mark Hahn, Marco Colombo, Rik van Riel, linux-mm

On Mon, 9 Oct 2000, Wagner, Dave wrote:

> > -----Original Message-----
> > From: Ingo Molnar [mailto:mingo@elte.hu]
> > Sent: Monday, October 09, 2000 11:02 AM
> > To: Ed Tomlinson
> > Cc: Mark Hahn; Marco Colombo; Rik van Riel; linux-mm@kvack.org
> > Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> > 
> > 
> > On Mon, 9 Oct 2000, Ed Tomlinson wrote:
> > 
> > > What about the AIX way?  When the system is nearly OOM it sends a
> > > SIG_DANGER signal to all processes.  Those that handle the signal are
> > > not initial targets for OOM...  Also in the SIG_DANGER processing they
> > > can take their own actions to reduce their memory usage... (we would
> > > have to look out for a SIG_DANGER handler that had a memory leak
> > > though)
> > 
> > i think 'importance' should be an integer value, not just a 'can it
> > handle SIG_DANGER' flag.
> > 
> In a perfect world, perhaps.  But how many people/systems are going to
> have a well-thought out distribution of "importance" values.  It's
> probably too much to have people even set a single boolean value
> reasonably.

You could say exactly the same about niceness: how many people have a
well-thought out distribution of nice values? Yet the mechanism works
reasonably well: important system things get -ve values, normal user
processes get the default, CPU hogs get niced down a bit.

> How about a bit in the executable to say "unimportant".  Netscape, would
> of course, have this bit set. ;-)

I think an integer, just like nice, would be the way to go.


James.


* Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
@ 2000-10-09 19:06 Hubertus Franke/Watson/IBM
  0 siblings, 0 replies; 112+ messages in thread
From: Hubertus Franke/Watson/IBM @ 2000-10-09 19:06 UTC (permalink / raw)
  To: linux-mm

I think what Ed was pointing out is that the SIG_DANGER signal approach
provides a good means to inform applications that things are getting tight
and that further deterioration will result in process kills. Providing
some integer value only gives you some edge: since you don't know how
many other processes have a higher priority than you with regard to this
value, you can't make any assumptions.
A process properly responding to the SIG_DANGER signal should release some
memory, e.g. it could do garbage collection and free pages so that
the kernel can reclaim them. Such processes should get some credit for
relieving the memory pressure.
So first avoiding processes that can deal with SIG_DANGER seems a good
approach, while the processes that are still targets after this should be
identified by the priority mechanism discussed. I think these are just
orthogonal issues.
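
The two-stage selection described above could be sketched roughly as
follows, in user-space C. Everything here is an assumption made for
illustration: the task_sketch struct, the "higher importance means more
protected" convention, the tie-break on resident size, and the fallback
pass that considers SIG_DANGER handlers once no other task is left:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a task. */
struct task_sketch {
	int pid;
	int handles_sig_danger;   /* nonzero: task installed a handler */
	int importance;           /* higher means "kill me later" */
	unsigned long rss_pages;  /* resident set size, used as tie-break */
};

/* Return 1 if a is a better victim than b. */
int better_victim_sketch(const struct task_sketch *a,
			 const struct task_sketch *b)
{
	if (a->importance != b->importance)
		return a->importance < b->importance;
	return a->rss_pages > b->rss_pages;
}

/*
 * Pass 0 skips tasks that handle SIG_DANGER; pass 1 runs only if
 * pass 0 found nobody and then considers every task. Within a pass
 * the least important task wins, larger RSS breaking ties.
 */
const struct task_sketch *
select_victim_sketch(const struct task_sketch *tasks, size_t n)
{
	const struct task_sketch *victim = NULL;
	int pass;

	assert(tasks != NULL || n == 0);
	for (pass = 0; pass < 2 && victim == NULL; pass++) {
		size_t i;

		for (i = 0; i < n; i++) {
			const struct task_sketch *t = &tasks[i];

			if (pass == 0 && t->handles_sig_danger)
				continue;
			if (victim == NULL || better_victim_sketch(t, victim))
				victim = t;
		}
	}
	return victim;
}
```

The handler flag and the importance value never interact inside a pass,
which is one way to read the point that the two mechanisms are orthogonal.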

-- Hubertus

Ingo Molnar <mingo@elte.hu>@kvack.org on 10/09/2000 02:01:48 PM

Please respond to mingo@elte.hu

Sent by:  owner-linux-mm@kvack.org

To:   Ed Tomlinson <tomlins@cam.org>
cc:   Mark Hahn <hahn@coffee.psychology.mcmaster.ca>, Marco Colombo
      <marco@esi.it>, Rik van Riel <riel@conectiva.com.br>,
      linux-mm@kvack.org
Subject:  Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler

On Mon, 9 Oct 2000, Ed Tomlinson wrote:

> What about the AIX way?  When the system is nearly OOM it sends a
> SIG_DANGER signal to all processes.  Those that handle the signal are
> not initial targets for OOM...  Also in the SIG_DANGER processing they
> can take their own actions to reduce their memory usage... (we would
> have to look out for a SIG_DANGER handler that had a memory leak
> though)

i think 'importance' should be an integer value, not just a 'can it handle
SIG_DANGER' flag.

     Ingo

* RE: [PATCH] VM fix for 2.4.0-test9 & OOM handler
@ 2000-10-09 18:07 Wagner, Dave
  2000-10-09 20:27 ` James Sutherland
  0 siblings, 1 reply; 112+ messages in thread
From: Wagner, Dave @ 2000-10-09 18:07 UTC (permalink / raw)
  To: mingo, Ed Tomlinson; +Cc: Mark Hahn, Marco Colombo, Rik van Riel, linux-mm

> -----Original Message-----
> From: Ingo Molnar [mailto:mingo@elte.hu]
> Sent: Monday, October 09, 2000 11:02 AM
> To: Ed Tomlinson
> Cc: Mark Hahn; Marco Colombo; Rik van Riel; linux-mm@kvack.org
> Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler
> 
> 
> On Mon, 9 Oct 2000, Ed Tomlinson wrote:
> 
> > What about the AIX way?  When the system is nearly OOM it sends a
> > SIG_DANGER signal to all processes.  Those that handle the signal are
> > not initial targets for OOM...  Also in the SIG_DANGER processing they
> > can take their own actions to reduce their memory usage... (we would
> > have to look out for a SIG_DANGER handler that had a memory leak
> > though)
> 
> i think 'importance' should be an integer value, not just a 'can it
> handle SIG_DANGER' flag.
> 
In a perfect world, perhaps.  But how many people/systems are going to
have a well-thought-out distribution of "importance" values?  It's
probably too much to expect people to set even a single boolean value
reasonably.

How about a bit in the executable to say "unimportant"?  Netscape would,
of course, have this bit set. ;-)
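
For illustration only, the two proposals in this exchange (a boolean
"handles SIG_DANGER" flag versus an integer importance) can be compared
with a small user-space sketch.  Nothing here comes from the patch under
discussion: struct task_info, its fields, and both selection functions
are hypothetical, and "larger importance means more important" is an
assumed convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task record; NOT a kernel structure. */
struct task_info {
	int pid;
	unsigned long pages;	/* memory in use, in pages */
	int importance;		/* assumed: larger = more important */
	int handles_danger;	/* nonzero if a SIG_DANGER handler is installed */
};

/*
 * Boolean scheme: tasks that handle SIG_DANGER are not initial targets.
 * Pass 0 picks the largest task without a handler; pass 1 runs only if
 * pass 0 found nothing and then considers every task.
 */
const struct task_info *pick_by_flag(const struct task_info *tasks, size_t n)
{
	const struct task_info *best = NULL;
	int pass;

	assert(tasks != NULL || n == 0);
	for (pass = 0; pass < 2 && best == NULL; pass++) {
		size_t i;

		for (i = 0; i < n; i++) {
			if (pass == 0 && tasks[i].handles_danger)
				continue;
			if (best == NULL || tasks[i].pages > best->pages)
				best = &tasks[i];
		}
	}
	return best;
}

/*
 * Integer scheme: pick the least important task; among equally
 * important tasks, pick the one using the most memory.
 */
const struct task_info *pick_by_importance(const struct task_info *tasks,
					   size_t n)
{
	const struct task_info *best = NULL;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct task_info *t = &tasks[i];

		if (best == NULL ||
		    t->importance < best->importance ||
		    (t->importance == best->importance &&
		     t->pages > best->pages))
			best = t;
	}
	return best;
}
```

The flag variant needs no tuning but can only split tasks into two
classes; the integer variant gives a full ordering, at the cost of
someone having to choose sensible values.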

Dave Wagner

end of thread (~2000-10-10 22:46 UTC)

Thread overview: 112+ messages
2000-10-06 18:59 [PATCH] VM fix for 2.4.0-test9 & OOM handler Rik van Riel
2000-10-06 20:19 ` Byron Stanoszek
2000-10-06 20:31   ` Rik van Riel
2000-10-09 10:12     ` Marco Colombo
2000-10-09 11:27       ` Byron Stanoszek
2000-10-09 16:26       ` Kurt Garloff
2000-10-09 18:29         ` Jamie Lokier
2000-10-09 17:27       ` Ingo Molnar
2000-10-09 17:25         ` Mark Hahn
2000-10-09 17:37           ` Ingo Molnar
2000-10-09 17:47           ` Ed Tomlinson
2000-10-09 18:01             ` Ingo Molnar
2000-10-09 18:14       ` Rik van Riel
2000-10-09 18:47         ` Ingo Molnar
2000-10-09 18:52           ` Rik van Riel
2000-10-09 19:27             ` Ingo Molnar
2000-10-09 19:38         ` Marco Colombo
2000-10-06 21:27   ` David Weinehall
2000-10-06 23:21     ` David Weinehall
2000-10-09 18:28   ` Andrea Arcangeli
2000-10-09 18:42     ` Ingo Molnar
2000-10-09 19:05       ` Andrea Arcangeli
2000-10-09 19:07         ` Rik van Riel
2000-10-09 19:42           ` Andrea Arcangeli
2000-10-09 20:06             ` Ingo Molnar
2000-10-09 20:06               ` Andi Kleen
2000-10-09 20:19                 ` Ingo Molnar
2000-10-09 20:12                   ` Rik van Riel
2000-10-09 20:24                     ` Ingo Molnar
2000-10-09 20:18                       ` Rik van Riel
2000-10-10  3:23                         ` Philipp Rumpf
2000-10-09 20:38                       ` James Sutherland
2000-10-09 20:40                         ` Rik van Riel
2000-10-10  9:59                           ` J.A. Sutherland
2000-10-09 20:44                         ` Andrea Arcangeli
2000-10-09 21:52                         ` Aaron Sethman
2000-10-09 21:54                           ` Rik van Riel
2000-10-09 22:29                       ` FORT David
2000-10-09 20:52                 ` Linus Torvalds
2000-10-09 20:58                   ` Andi Kleen
2000-10-09 21:21                     ` Jim Gettys
2000-10-09 21:28                       ` Alan Cox
2000-10-09 21:34                         ` Andi Kleen
2000-10-09 21:38                         ` Linus Torvalds
2000-10-09 21:39                           ` Rik van Riel
2000-10-09 21:44                             ` Linus Torvalds
2000-10-10 13:17                               ` Marco Colombo
2000-10-09 21:44                           ` Jim Gettys
2000-10-09 21:50                             ` Linus Torvalds
2000-10-09 22:07                               ` Jim Gettys
2000-10-09 23:13                                 ` Albert D. Cahalan
2000-10-09 23:16                                   ` Rik van Riel
2000-10-09 23:46                                   ` Jim Gettys
2000-10-10  9:46                                   ` Jamie Lokier
2000-10-10 14:41                               ` Rogier Wolff
2000-10-10 17:28                                 ` Linus Torvalds
2000-10-09 21:51                           ` Alan Cox
2000-10-09 21:40                         ` Jim Gettys
2000-10-09 21:05                   ` Rik van Riel
2000-10-09 22:08                     ` Gerrit.Huizenga
2000-10-09 22:34                       ` Byron Stanoszek
2000-10-09 22:57                         ` Rik van Riel
2000-10-10  0:25                         ` [RFC] New ideas for the " Byron Stanoszek
2000-10-09 20:11               ` [PATCH] VM fix for 2.4.0-test9 & " Andrea Arcangeli
2000-10-09 20:15                 ` Rik van Riel
2000-10-09 20:40               ` Linus Torvalds
2000-10-09 20:47                 ` Rik van Riel
2000-10-09 20:57                 ` Ingo Molnar
2000-10-09 21:10                   ` Peter Waltenberg
2000-10-09 22:25                     ` Andrea Arcangeli
2000-10-09 22:59                       ` Peter Waltenberg
2000-10-09 23:52                         ` Andrea Arcangeli
2000-10-09 23:10                       ` Rik van Riel
2000-10-09 21:10               ` Alan Cox
2000-10-09 21:25                 ` Ingo Molnar
2000-10-09 21:26                   ` Rik van Riel
2000-10-09 21:38                     ` Ingo Molnar
2000-10-09 21:34                       ` Rik van Riel
2000-10-10  9:09                         ` john slee
2000-10-09 20:06             ` Rik van Riel
2000-10-09 20:18               ` Andrea Arcangeli
2000-10-10  3:29               ` Philipp Rumpf
2000-10-10 15:06                 ` Rik van Riel
2000-10-10 15:24                   ` Philipp Rumpf
2000-10-10 15:30                     ` Rik van Riel
2000-10-10 15:37                       ` Philipp Rumpf
2000-10-09 20:13           ` Ingo Molnar
2000-10-09 20:08             ` Rik van Riel
2000-10-09 20:22               ` Ingo Molnar
2000-10-09 20:28                 ` David Ford
2000-10-09 20:34                   ` Rik van Riel
2000-10-09 20:45                     ` David Ford
2000-10-10  4:22                       ` Andreas Dilger
2000-10-10  4:30                         ` David Ford
2000-10-10  9:54                         ` Jamie Lokier
2000-10-09 23:35           ` Ingo Oeser
2000-10-10 15:07             ` [PATCH] OOM killer API (was: [PATCH] VM fix for 2.4.0-test9 & OOM handler) Ingo Oeser
2000-10-10 15:32               ` Rik van Riel
2000-10-10 16:11                 ` Ingo Oeser
2000-10-10 18:57                 ` Tom Rini
2000-10-10 20:58                   ` Rik van Riel
2000-10-10 22:46                     ` Tom Rini
2000-10-09 19:30       ` [PATCH] VM fix for 2.4.0-test9 & OOM handler David Ford
2000-10-09 19:58         ` Andrea Arcangeli
2000-10-09 20:14           ` David Ford
2000-10-09 20:05         ` Rik van Riel
2000-10-09 21:07         ` Alan Cox
2000-10-10  3:38           ` Philipp Rumpf
2000-10-10 14:07             ` Andrea Arcangeli
2000-10-09 18:07 Wagner, Dave
2000-10-09 20:27 ` James Sutherland
2000-10-09 19:06 Hubertus Franke/Watson/IBM
