[RFC] [PATCH] support for oom

linux-mm.kvack.org archive mirror

* [RFC] [PATCH] support for oom_die
@ 2006-04-11  5:29 KAMEZAWA Hiroyuki
  2006-04-11 17:28 ` Christoph Lameter
  0 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-11  5:29 UTC
  To: linux-mm

Hi,

This patch adds a feature to panic at OOM, oom_die.

I think the 2.6 kernel is very robust against OOM situations, but they
still occur sometimes. Yes, oom_kill works well enough to get the system
out of an OOM situation, *when* the system wants to survive.

First, crash-dump has been merged (to -mm?), so a panic at OOM can be a
way to preserve *all* information at OOM. The current OOM killer kills
processes with SIGKILL, which doesn't preserve any information about the
OOM situation. Only the message log tells us something, and we have to
guess what happened.

Second, consider a clustering system that has a failover node-replacement
mechanism. Because the oom_killer tends to kill the system slowly, one
process at a time, detecting the OOM and deciding whether to fail over
tends to be difficult (as far as I know). Panic at OOM is useful in such
a system because the failover system can replace the node immediately.

I'm sorry if this kind of discussion has already been settled in the past.

-Kame
==
This patch adds an oom_die sysctl under sys.vm.

When oom_die==1, the system panics in out_of_memory() instead of killing
some process. In some environments, I think a panic is more useful than
a kill.

For example:
(1) When a host is a node of a clustering system and panics at OOM,
    the failover system can detect the out-of-memory panic easily and
    immediately. It can quickly replace the node with another node.

(2) When the system is equipped with crash dump, out-of-memory will
    trigger a crash dump. While the oom_killer cannot preserve enough
    information to determine the reason for the OOM, a crash dump can
    preserve *all* information. We can track it down.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Index: linux-2.6.17-rc1-mm2/kernel/sysctl.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/kernel/sysctl.c
+++ linux-2.6.17-rc1-mm2/kernel/sysctl.c
@@ -60,6 +60,7 @@ extern int proc_nr_files(ctl_table *tabl
 extern int C_A_D;
 extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
+extern int sysctl_oom_die;
 extern int max_threads;
 extern int sysrq_enabled;
 extern int core_uses_pid;
@@ -718,6 +719,14 @@ static ctl_table vm_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 	{
+		.ctl_name	= VM_OOM_DIE,
+		.procname	= "oom_die",
+		.data		= &sysctl_oom_die,
+		.maxlen		= sizeof(sysctl_oom_die),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= VM_OVERCOMMIT_RATIO,
 		.procname	= "overcommit_ratio",
 		.data		= &sysctl_overcommit_ratio,
Index: linux-2.6.17-rc1-mm2/mm/oom_kill.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/oom_kill.c
+++ linux-2.6.17-rc1-mm2/mm/oom_kill.c
@@ -23,7 +23,7 @@
 #include <linux/cpuset.h>
 
 /* #define DEBUG */
-
+int sysctl_oom_die = 0;
 /**
  * oom_badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
@@ -290,6 +290,12 @@ static struct mm_struct *oom_kill_proces
 	return oom_kill_task(p, message);
 }
 
+
+static void oom_die(void)
+{
+	panic("Panic: out of memory: oom_die is selected.");
+}
+
 /**
  * oom_kill - kill the "best" process when we run out of memory
  *
@@ -331,6 +337,8 @@ void out_of_memory(struct zonelist *zone
 
 	case CONSTRAINT_NONE:
 retry:
+		if (sysctl_oom_die)
+			oom_die();
 		/*
 		 * Rambo mode: Shoot down a process and hope it solves whatever
 		 * issues we may have.
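
For illustration only, here is a minimal user-space sketch of how an admin
tool might toggle the sysctl added above. The path "/proc/sys/vm/oom_die" is
an assumption derived from the .procname in the patch and exists only on a
kernel carrying it; write_int_file() and read_int_file() are hypothetical
helper names, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an integer to a sysctl-style file, e.g. "/proc/sys/vm/oom_die".
 * (Assumed path: it is only present when this patch is applied.)
 * Returns 0 on success, -1 on failure.
 */
int write_int_file(const char *path, int value)
{
	assert(path != NULL);

	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;

	int ok = fprintf(f, "%d\n", value) > 0;

	/* fclose() flushes the buffer; a write error may only show up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/* Read the integer back into *value. Returns 0 on success, -1 on failure. */
int read_int_file(const char *path, int *value)
{
	assert(path != NULL && value != NULL);

	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	int ok = fscanf(f, "%d", value) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

On a patched kernel, write_int_file("/proc/sys/vm/oom_die", 1) run as root
would select panic-at-OOM, and writing 0 would restore the normal OOM killer.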

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [RFC] [PATCH] support for oom_die
  2006-04-11  5:29 [RFC] [PATCH] support for oom_die KAMEZAWA Hiroyuki
@ 2006-04-11 17:28 ` Christoph Lameter
  2006-04-11 17:39   ` Om Narasimhan
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-04-11 17:28 UTC
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm

On Tue, 11 Apr 2006, KAMEZAWA Hiroyuki wrote:

> I think the 2.6 kernel is very robust against OOM situations, but they
> still occur sometimes. Yes, oom_kill works well enough to get the system
> out of an OOM situation, *when* the system wants to survive.
>
> First, crash-dump has been merged (to -mm?), so a panic at OOM can be a
> way to preserve *all* information at OOM. The current OOM killer kills
> processes with SIGKILL, which doesn't preserve any information about the
> OOM situation. Only the message log tells us something, and we have to
> guess what happened.
>
> Second, consider a clustering system that has a failover node-replacement
> mechanism. Because the oom_killer tends to kill the system slowly, one
> process at a time, detecting the OOM and deciding whether to fail over
> tends to be difficult (as far as I know). Panic at OOM is useful in such
> a system because the failover system can replace the node immediately.
>
> I'm sorry if this kind of discussion has already been settled in the past.

A user process can cause an oops by using too much memory? Would it not be 
better to terminate the rogue process instead? Otherwise any user can 
bring down the system?



* Re: [RFC] [PATCH] support for oom_die
  2006-04-11 17:28 ` Christoph Lameter
@ 2006-04-11 17:39   ` Om Narasimhan
  2006-04-12  0:37   ` David Chinner
  2006-04-12  1:11   ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 7+ messages in thread
From: Om Narasimhan @ 2006-04-11 17:39 UTC
  To: linux-mm

> A user process can cause an oops by using too much memory? Would it not be
> better to terminate the rogue process instead? Otherwise any user can
> bring down the system?
How can we differentiate between a rogue process requesting a huge amount
of memory and a legitimate process requesting a huge amount of memory? Or
do you mean that, regardless of its status, we kill the process that
requests huge amounts of memory?

Om.


* Re: [RFC] [PATCH] support for oom_die
  2006-04-11 17:28 ` Christoph Lameter
  2006-04-11 17:39   ` Om Narasimhan
@ 2006-04-12  0:37   ` David Chinner
  2006-04-12  1:11   ` KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 7+ messages in thread
From: David Chinner @ 2006-04-12  0:37 UTC
  To: Christoph Lameter; +Cc: KAMEZAWA Hiroyuki, linux-mm

On Tue, Apr 11, 2006 at 10:28:32AM -0700, Christoph Lameter wrote:
> On Tue, 11 Apr 2006, KAMEZAWA Hiroyuki wrote:
> 
> > I think the 2.6 kernel is very robust against OOM situations, but they
> > still occur sometimes. Yes, oom_kill works well enough to get the system
> > out of an OOM situation, *when* the system wants to survive.
> 
> A user process can cause an oops by using too much memory? Would it not be 
> better to terminate the rogue process instead? Otherwise any user can 
> bring down the system?

In a HA environment, the OOM killer can take out the failover daemon
or other services and the failover infrastructure may not be able to
handle this gracefully and services will become unavailable. This is
about the worst thing that can happen in this environment.

In these situations, it is better to panic the box on OOM and get a
clean failover of services than risk having the OOM killer
compromise your HA setup.

Also, you typically don't have Random J. User logging in and running
stuff on HA server clusters, so if you're in an OOM situation there
is already something wrong that needs fixing.....

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group


* Re: [RFC] [PATCH] support for oom_die
  2006-04-11 17:28 ` Christoph Lameter
  2006-04-11 17:39   ` Om Narasimhan
  2006-04-12  0:37   ` David Chinner
@ 2006-04-12  1:11   ` KAMEZAWA Hiroyuki
  2006-04-12  3:31     ` Rik van Riel
  2006-04-12  4:49     ` KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-12  1:11 UTC
  To: Christoph Lameter; +Cc: linux-mm

On Tue, 11 Apr 2006 10:28:32 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Tue, 11 Apr 2006, KAMEZAWA Hiroyuki wrote:
> 
> > I think the 2.6 kernel is very robust against OOM situations, but they
> > still occur sometimes. Yes, oom_kill works well enough to get the system
> > out of an OOM situation, *when* the system wants to survive.
> A user process can cause an oops by using too much memory?
Yes, if oom_die=1.

> Would it not be better to terminate the rogue process instead? Otherwise 
> any user can bring down the system?
> 
I thought so too, until I met system admins. And this panic happens only if
oom_die=1, which is set by the system admin. It is the admin's choice.

When an OOM kill occurs on a customer's system, I (we) usually get blamed by
them: "Why does the kernel kill processes? Please panic; then we can switch
the system over to the sub-system immediately, and the crash dump tells us
what happened."
(Note: RHELX has crashdump support.)

More description:
Why do they want a panic at OOM?

One reason is to take a crash dump at OOM. They just send the dump image
to the support team, and the support team can find out what happened.
The support team then has precise evidence of 'who is the rogue?'

Another is the failover system. Because they can replace the system
immediately on panic, they don't need oom_kill.

When implementing a failover system, there are two general approaches:
(1) driver-level heartbeat check.
(2) process-level heartbeat check (check whether a specified process is
    alive or not).

(1) cannot detect an OOM situation; the driver is always alive.
(2) can check which processes are alive (by kill -0), but sometimes this
    check is delayed, and checking hundreds of applications (all they need) is

If we panic at OOM, (1) can do everything we (and customers) want.

Third, we can catch an OOM caused by a kernel memory leak and track it
down from the dump. :)

Note:
I suggested that they use overcommit_memory=2, but that didn't work well
(because of Java and multithreaded systems.....)

We have oom_adj. I think it is very useful (only if used in a sane way..).
But it looks difficult to use.... When hundreds of applications run on the
server, they are all important. It's impossible to attach a valid oom_adj
value to all of them.
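
To show what per-process tuning involves, here is a minimal sketch that sets
oom_adj for a single pid through /proc/<pid>/oom_adj. It assumes the 2.6-era
interface with values from -17 (OOM_DISABLE) to 15; oom_adj_path() and
set_oom_adj() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define OOM_ADJ_MIN (-17)	/* OOM_DISABLE: never select this task */
#define OOM_ADJ_MAX 15		/* most likely to be selected */

/* Build "/proc/<pid>/oom_adj" in buf. Returns 0, or -1 if buf is too small. */
int oom_adj_path(pid_t pid, char *buf, size_t len)
{
	assert(buf != NULL);

	int n = snprintf(buf, len, "/proc/%ld/oom_adj", (long)pid);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Set oom_adj for one pid. Returns 0 on success, -1 on bad value or I/O error. */
int set_oom_adj(pid_t pid, int adj)
{
	char path[64];

	if (adj < OOM_ADJ_MIN || adj > OOM_ADJ_MAX)
		return -1;
	if (oom_adj_path(pid, path, sizeof(path)) != 0)
		return -1;

	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;

	int ok = fprintf(f, "%d\n", adj) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Someone has to choose a value for every one of the hundreds of processes and
reapply it whenever a process restarts under a new pid.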

Thanks,
-Kame


* Re: [RFC] [PATCH] support for oom_die
  2006-04-12  1:11   ` KAMEZAWA Hiroyuki
@ 2006-04-12  3:31     ` Rik van Riel
  2006-04-12  4:49     ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2006-04-12  3:31 UTC
  To: KAMEZAWA Hiroyuki; +Cc: Christoph Lameter, linux-mm

On Wed, 12 Apr 2006, KAMEZAWA Hiroyuki wrote:

> More description:
> Why do they want a panic at OOM?

> Another is the failover system. Because they can replace the system
> immediately on panic, they don't need oom_kill.

This makes perfect sense to me.  Of course, one of the guys
developing our cluster software sits in the cube next to me,
so I do get to see quite a bit of the cluster software ;)

-- 
All Rights Reversed


* Re: [RFC] [PATCH] support for oom_die
  2006-04-12  1:11   ` KAMEZAWA Hiroyuki
  2006-04-12  3:31     ` Rik van Riel
@ 2006-04-12  4:49     ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-12  4:49 UTC
  To: KAMEZAWA Hiroyuki; +Cc: clameter, linux-mm, dgc, riel

On Wed, 12 Apr 2006 10:11:54 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Why do they want a panic at OOM?
> 

David-san, Rik-san, thank you for your comments.
I'm convinced again that there are some cases where panic is better than kill.

I'll fix my broken English and post this patch to lkml with a proper description.

Thanks,
-Kame
