unnecessary oom killer panics in 2.6.38 (was Re: Linux 2.6.38)

From: David Rientjes <rientjes@google.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: unnecessary oom killer panics in 2.6.38 (was Re: Linux 2.6.38)
Date: Tue, 15 Mar 2011 16:32:58 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1103151618150.5985@chino.kir.corp.google.com>
In-Reply-To: <20110315210855.GI21640@redhat.com>

On Tue, 15 Mar 2011, Oleg Nesterov wrote:

> What I can't understand is what exactly the first patch tries to fix.
> When I ask you, you tell me that for_each_process() can miss the group
> leader because it can exit before sub-threads. This must not happen,
> or we have some serious bug triggered by your workload.
> 
> So, once again. Could you please explain the original problem and how
> this patch helps?
> 

[trimming cc list with a less worrisome subject line]

A process in a cpuset by itself (or with other processes that are 
OOM_DISABLE) runs out of memory while handling page faults.  It is 
selected as the last possible target by the oom killer and gets killed.  
All of its children are reparented to init (yet they have the same 
cpuset restrictions as the parent and are oom as well) and call do_exit().  
do_exit() happens to require memory while handling proc_exit_connector() 
and triggers an oom itself.  There are no eligible threads left to be found
in the for_each_process() loop, which results in a panic.  The remaining
children of the oom killed process spin in the page allocator because they 
cannot acquire the zone locks necessary for calling the oom killer 
themselves -- this isn't really important since they would panic the 
machine as well if they do call out_of_memory().

Instead, we want do_each_thread() to identify these threads that are 
eligible for oom kill because they have the same intersecting set of 
allowed nodes (regardless of whether they are reparented to init or not) 
and give them access to memory reserves so that they may finish allocating 
slab for proc_exit_connector() and exit.  Anything else will unnecessarily
panic the machine, and that's why
oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch fixes the issue.
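
To make the difference between the two iterations concrete, here is a
minimal userspace sketch.  It is a toy model, not kernel code: struct
toy_task, toy_eligible(), toy_scan() and the TOY_* verdicts are all
hypothetical names, and it assumes for illustration that a group leader
may already have detached its memory while a sub-thread is still exiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one victim scan in this toy model. */
enum toy_verdict { TOY_PANIC, TOY_GRANT_RESERVES, TOY_KILL };

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	int tgid;                    /* thread group id */
	bool is_leader;              /* true for the thread group leader */
	bool has_mm;                 /* false once memory has been detached */
	bool exiting;                /* task is already in its exit path */
	bool oom_disabled;           /* models an OOM_DISABLE task */
	unsigned long allowed_nodes; /* bitmask of allowed memory nodes */
};

/* A task is a candidate only if it still holds memory, is not protected,
 * and its allowed nodes intersect the nodes that are out of memory. */
static bool toy_eligible(const struct toy_task *t, unsigned long oom_nodes)
{
	return t->has_mm && !t->oom_disabled &&
	       (t->allowed_nodes & oom_nodes) != 0;
}

/* leaders_only == true models a per-process walk (group leaders only);
 * leaders_only == false models a walk over every thread. */
static enum toy_verdict toy_scan(const struct toy_task *tasks, size_t n,
				 unsigned long oom_nodes, bool leaders_only)
{
	enum toy_verdict verdict = TOY_PANIC; /* nothing eligible found yet */

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct toy_task *t = &tasks[i];

		if (leaders_only && !t->is_leader)
			continue;
		if (!toy_eligible(t, oom_nodes))
			continue;
		if (t->exiting)
			/* Already dying: let it use reserves and finish. */
			return TOY_GRANT_RESERVES;
		verdict = TOY_KILL;
	}
	return verdict;
}
```

Given a leader with has_mm == false, an exiting sub-thread that still has
memory on the oom node, and one oom_disabled task, toy_scan() returns
TOY_PANIC when leaders_only is true and TOY_GRANT_RESERVES when it is
false.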
