Re: Improving OOM killer - David Rientjes

linux-mm.kvack.org archive mirror

From: David Rientjes <rientjes@google.com>
To: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>, Lubos Lunak <l.lunak@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>, Jiri Kosina <jkosina@suse.cz>
Subject: Re: Improving OOM killer
Date: Wed, 3 Feb 2010 10:58:01 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1002031021190.14088@chino.kir.corp.google.com>
In-Reply-To: <20100203170127.GH19641@balbir.in.ibm.com>

On Wed, 3 Feb 2010, Balbir Singh wrote:

> > IIRC the child accumulating code was introduced to deal with
> > malicious code (fork bombs), but it makes things worse for the
> > (much more common) situation of a system without malicious
> > code simply running out of memory due to being very busy.
> >
> 
> For fork bombs, we could do a number of children number test and have
> a threshold before we consider a process and its children for
> badness().
> 

Yes, we could look for the number of children with separate mm's and then 
penalize those threads that have forked an egregious number of them, say, 
more than 500 tasks.  I think we should check for this threshold within the 
badness() heuristic to identify such forkbombs rather than limiting it to 
certain applications.
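
The child-counting check can be modeled outside the kernel.  The following 
is a minimal user-space sketch, not kernel code: struct sim_task, 
count_forked_children() and is_forkbomb() are hypothetical names, an mm is 
represented by a plain identifier (0 meaning "no mm"), and the threshold of 
500 is the example value suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Example threshold from the discussion ("say, 500 tasks"). */
#define FORKBOMB_THRESHOLD 500

/*
 * Hypothetical user-space model of a task: mm_id stands in for p->mm
 * (0 meaning "no mm"), and children stands in for p->children.
 */
struct sim_task {
	unsigned long mm_id;
	const struct sim_task *const *children;
	size_t nr_children;
};

/*
 * Count children that have an address space of their own.  Children that
 * share the parent's mm, or have no mm at all, are not counted.
 */
static size_t count_forked_children(const struct sim_task *p)
{
	size_t i, forkcount = 0;

	assert(p->nr_children == 0 || p->children != NULL);
	for (i = 0; i < p->nr_children; i++) {
		unsigned long child_mm = p->children[i]->mm_id;

		if (child_mm != 0 && child_mm != p->mm_id)
			forkcount++;
	}
	return forkcount;
}

/* Nonzero when the task has forked more children than the threshold. */
static int is_forkbomb(const struct sim_task *p)
{
	return count_forked_children(p) > FORKBOMB_THRESHOLD;
}
```

A parent with 501 children that each have their own mm is flagged, while one 
with exactly 500 is not; children that share the parent's mm or have none 
are never counted.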

My rewrite for the badness() heuristic is centered on the idea that scores 
should range from 0 to 1000, 0 meaning "never kill this task" and 1000 
meaning "kill this task first."  The baseline for a thread, p, may be 
something like this:

	unsigned int badness(struct task_struct *p,
					unsigned long totalram)
	{
		struct task_struct *child;
		struct mm_struct *mm;
		int forkcount = 0;
		long points;

		task_lock(p);
		mm = p->mm;
		if (!mm) {
			task_unlock(p);
			return 0;
		}
		points = (get_mm_rss(mm) +
				get_mm_counter(mm, MM_SWAPENTS)) * 1000 /
				totalram;
		task_unlock(p);

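		/*
		 * The caller holds tasklist_lock, which protects p->children.
		 * child->mm is only compared, never dereferenced, so
		 * task_lock(child) is not needed.
		 */
		list_for_each_entry(child, &p->children, sibling) {
			if (child->mm && child->mm != mm)
				forkcount++;
		}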

		/* Forkbombs are penalized by 10% of available RAM */
		if (forkcount > 500)
			points += 100;

		...

		/*
		 * /proc/pid/oom_adj ranges from -1000 to +1000 to either
		 * completely disable oom killing or always prefer it.
		 */
		points += p->signal->oom_adj;

		if (points < 0)
			return 0;
		return (points <= 1000) ? points : 1000;
	}

	static struct task_struct *select_bad_process(...,
						nodemask_t *nodemask)
	{
		struct task_struct *p;
		unsigned long totalram = 0;
		int nid;

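		/* Sum the pages present on every node the allocation may use */
		for_each_node_mask(nid, *nodemask)
			totalram += NODE_DATA(nid)->node_present_pages;

		for_each_process(p) {
			unsigned int points;

			...

			/* Skip tasks that cannot allocate on any node in the mask */
			if (!nodes_intersects(p->mems_allowed, *nodemask))
				continue;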

			...
			points = badness(p, totalram);
			...
		}
		...
	}
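
The scoring and selection in the example above can be exercised in user 
space with the sketch below.  It illustrates the arithmetic under stated 
assumptions and is not the kernel implementation: struct sim_proc and the 
sim_* functions are hypothetical, a nodemask is a plain unsigned long bitmap 
with one bit per node, and the elided parts of badness() and 
select_bad_process() are left out.  As in the proposal, a score of 0 means 
the task is never chosen.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a task; none of these are kernel APIs. */
struct sim_proc {
	unsigned long mems_allowed;	/* bit n set: task may allocate on node n */
	unsigned long rss, swapents;	/* pages, like get_mm_rss() and MM_SWAPENTS */
	int forkbomb;			/* nonzero if it forked more than 500 children */
	int oom_adj;			/* proposed range: -1000 .. +1000 */
};

/* Sum present pages over every node set in nodemask (cf. for_each_node_mask). */
static unsigned long sim_totalram(unsigned long nodemask,
				  const unsigned long *node_present_pages,
				  int nr_nodes)
{
	unsigned long total = 0;
	int nid;

	assert(nr_nodes > 0 && nr_nodes <= (int)(sizeof(unsigned long) * CHAR_BIT));
	for (nid = 0; nid < nr_nodes; nid++)
		if (nodemask & (1UL << nid))
			total += node_present_pages[nid];
	return total;
}

/*
 * The proposed score: share of allowed memory scaled to 0..1000, plus a
 * 100-point forkbomb penalty, plus oom_adj, clamped to 0..1000.
 */
static unsigned int sim_badness(const struct sim_proc *p, unsigned long totalram)
{
	long points;

	assert(totalram > 0);
	points = (long)((p->rss + p->swapents) * 1000 / totalram);
	if (p->forkbomb)
		points += 100;
	points += p->oom_adj;
	if (points < 0)
		return 0;
	return points <= 1000 ? (unsigned int)points : 1000;
}

/*
 * Index of the highest-scoring task whose mems_allowed intersects the
 * nodemask, or -1 if there is none.  A score of 0 ("never kill") is never
 * chosen; ties keep the earlier task, an assumption of this sketch.
 */
static long sim_select_bad_process(const struct sim_proc *procs, size_t nr,
				   unsigned long nodemask, unsigned long totalram)
{
	long chosen = -1;
	unsigned int best = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned int points;

		/* Skip tasks that cannot allocate on any node in the mask. */
		if (!(procs[i].mems_allowed & nodemask))
			continue;
		points = sim_badness(&procs[i], totalram);
		if (points > best) {
			chosen = (long)i;
			best = points;
		}
	}
	return chosen;
}
```

With nodes of 1000, 3000 and 4000 pages and a nodemask covering the first 
two, totalram is 4000 pages, so a task using 2000 pages there scores 500 and 
one using 400 pages with the forkbomb penalty scores 200.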

In this example, /proc/pid/oom_adj ranges from -1000 to +1000, with 
OOM_DISABLE being -1000.  Userspace can use it to polarize tasks toward or 
away from oom killing, or to identify a task that is leaking memory because 
it is using far more memory than it should.  Within the oom killer, the 
nodemask passed from the page allocator should be intersected with 
current->mems_allowed.  Since each score is then a proportion of the memory 
a task is actually allowed to use, userspace is fully aware of what counts 
as an egregious amount of RAM for a task to consume, including what it knows 
about the task's cpuset or mempolicy.  For example, a user could simply set 
an oom_adj of -500, which means "discount 50% of the task's allowed memory 
from the heuristic," or +500, which means "always allow all other tasks in 
the system, cpuset, or mempolicy to use at least 50% more of their allowed 
memory than this one."  In both cases, 50% corresponds to 500 of the 1000 
possible points.
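
To make the -500 and +500 cases concrete, here is a small worked sketch.  It 
assumes, as the badness() example does, that a task's memory use is already 
expressed in thousandths of its allowed memory; sim_adjusted_score() and 
sim_prefer_a() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical helper: combine a task's memory use, in thousandths of the
 * memory it is allowed to use (0..1000), with an oom_adj value in
 * [-1000, 1000], and clamp the result to the 0..1000 score range.
 */
static int sim_adjusted_score(int usage_permille, int oom_adj)
{
	int points;

	assert(usage_permille >= 0 && usage_permille <= 1000);
	assert(oom_adj >= -1000 && oom_adj <= 1000);

	points = usage_permille + oom_adj;
	if (points < 0)
		return 0;
	return points > 1000 ? 1000 : points;
}

/*
 * Nonzero if task A would be preferred over task B for oom killing.
 * Ties go to B here, which is an assumption of this sketch.
 */
static int sim_prefer_a(int usage_a, int adj_a, int usage_b, int adj_b)
{
	return sim_adjusted_score(usage_a, adj_a) >
	       sim_adjusted_score(usage_b, adj_b);
}
```

With +500, a task using 30% of its allowed memory scores 800, so a task with 
no adjustment is only preferred over it once that task uses more than 80% of 
its own allowed memory.  With -500, the first 50% of a task's allowed memory 
contributes nothing to its score.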

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


Thread overview: 53+ messages
2010-02-01 22:02 Lubos Lunak
2010-02-01 23:53 ` David Rientjes
2010-02-02 21:10   ` Lubos Lunak
2010-02-03  1:41     ` David Rientjes
2010-02-03  1:52       ` KAMEZAWA Hiroyuki
2010-02-03  2:12         ` David Rientjes
2010-02-03  2:12           ` KAMEZAWA Hiroyuki
2010-02-03  2:36             ` [patch] sysctl: clean up vm related variable declarations David Rientjes
2010-02-03  8:07               ` KOSAKI Motohiro
2010-02-03  8:17               ` Balbir Singh
2010-02-03 22:54       ` Improving OOM killer Lubos Lunak
2010-02-04  0:00         ` David Rientjes
2010-02-03  7:50 ` KOSAKI Motohiro
2010-02-03  9:40   ` David Rientjes
2010-02-03  8:57 ` Balbir Singh
2010-02-03 12:10   ` Lubos Lunak
2010-02-03 12:25     ` Balbir Singh
2010-02-03 15:00       ` Minchan Kim
2010-02-03 16:06         ` Minchan Kim
2010-02-03 21:22       ` Lubos Lunak
2010-02-03 14:49 ` Rik van Riel
2010-02-03 17:01   ` Balbir Singh
2010-02-03 18:58     ` David Rientjes [this message]
2010-02-03 19:29       ` Frans Pop
2010-02-03 19:52         ` David Rientjes
2010-02-03 20:12           ` Frans Pop
2010-02-03 20:26             ` David Rientjes
2010-02-03 22:55       ` Lubos Lunak
2010-02-04  0:05         ` David Rientjes
2010-02-04  0:18           ` Rik van Riel
2010-02-04 21:48             ` David Rientjes
2010-02-04 22:06               ` Rik van Riel
2010-02-04 22:14                 ` David Rientjes
2010-02-10 20:54                   ` Lubos Lunak
2010-02-10 21:10                     ` Rik van Riel
2010-02-10 21:29                       ` Lubos Lunak
2010-02-10 22:18                       ` Alan Cox
2010-02-10 22:31                         ` David Rientjes
2010-02-11  9:50                         ` Lubos Lunak
2010-02-04 22:31               ` Frans Pop
2010-02-04 22:53                 ` David Rientjes
2010-02-04  7:58           ` Lubos Lunak
2010-02-04 21:34             ` David Rientjes
2010-02-10 20:54               ` Lubos Lunak
2010-02-10 21:09                 ` Rik van Riel
2010-02-10 21:34                   ` Lubos Lunak
2010-02-10 22:25                 ` David Rientjes
2010-02-11 10:16                   ` Lubos Lunak
2010-02-11 21:17                     ` David Rientjes
2010-02-04  9:50           ` Jiri Kosina
2010-02-04 21:39             ` David Rientjes
2010-02-05  7:35               ` Oliver Neukum
2010-02-10  3:10                 ` David Rientjes
