From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Nick Piggin <npiggin@suse.de>,
	Oleg Nesterov <oleg@redhat.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [patch 09/18] oom: select task from tasklist for mempolicy ooms
Date: Tue, 8 Jun 2010 17:40:45 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1006081732080.19582@chino.kir.corp.google.com>
In-Reply-To: <20100608164325.a5fcdb39.akpm@linux-foundation.org>

On Tue, 8 Jun 2010, Andrew Morton wrote:

> > The oom killer presently kills current whenever there is no more memory
> > free or reclaimable on its mempolicy's nodes.  There is no guarantee that
> > current is a memory-hogging task or that killing it will free any
> > substantial amount of memory, however.
> 
> Well OK.  But we don't necessarily *want* to "free a substantial amount
> of memory".  We want to resolve the oom within `current'.  That's the
> sole responsibility of the oom-killer.  It doesn't have to free up
> large amounts of additional memory in the expectation that sometime in
> the future some other task will get an oom as well.  If the oom-killer
> is working well, we can defer those actions until the problem actually
> occurs.
> 

The oom killer has always attempted to kill a task that frees a large 
amount of memory: look at goal #2 in today's badness() heuristic (we 
recover a large amount of memory).  Doing so avoids endless loops in 
which anything we fork, or our bash shell, is constantly oom killed, and 
it avoids killing a large number of tasks that each free only a minimal 
amount of memory.  The current behavior of killing current rarely works 
as a single remedy; it usually has to be followed up by additional kills 
or user intervention.

> Plus: if `current' isn't using much memory then it's probably a
> short-lived or not-very-important process anyway.
> 

That potentially prevents anything bound to that mempolicy from ever 
getting forked.

> > In such situations, it is better to scan the tasklist for tasks that are
> > allowed to allocate on current's set of nodes and kill the task with the
> > highest badness() score.  This ensures that the most memory-hogging task,
> > or the one configured by the user with /proc/pid/oom_adj, is always
> > selected in such scenarios.
> 
> Well... *why* is it better?  Needs more justification/explanation IMO.
> 

This unifies mempolicy oom conditions with the behavior of cpuset and 
memcg oom conditions: we want to use the badness() heuristic to kill 
the best candidate task rather than nuke tons of processes for little 
benefit or, for instance, kill all other tasks sharing those same 
mempolicy nodes for the benefit of a memory hog.  Userspace has the 
ability to influence this heuristic (and even more powerfully with my 
heuristic rewrite coming later in this series), so it can better tune 
how the kernel reacts to mempolicy ooms, which is a key objective of 
this work.  Simply killing current leaves no room for userspace 
intervention and can kill meaningful (and innocent) tasks, losing work 
for no reason.
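
To make the selection concrete, here is a rough userspace sketch of the 
idea, not the patch itself.  Everything in it is a stand-in I made up for 
illustration: struct toy_task, toy_badness() and toy_select_victim() are 
hypothetical names, nodes are modeled as a plain 64-bit mask, and the 
score only loosely imitates how oom_adj shifts a task's points (with -17 
treated as "never kill").  It is not the kernel's badness().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OOM_DISABLE: tasks with this oom_adj are never chosen. */
#define TOY_OOM_DISABLE (-17)

/* Toy model of a task: NOT the kernel's task_struct. */
struct toy_task {
	int pid;
	unsigned long pages;     /* memory footprint, in pages */
	int oom_adj;             /* userspace bias, -17..15 */
	uint64_t allowed_nodes;  /* bit n set => may allocate on node n */
};

/* Toy score: footprint scaled by oom_adj.  An approximation for illustration only. */
static unsigned long toy_badness(const struct toy_task *t)
{
	unsigned long points = t->pages;

	if (t->oom_adj == TOY_OOM_DISABLE)
		return 0;
	if (t->oom_adj > 0) {
		if (!points)
			points = 1;
		points <<= t->oom_adj;
	} else {
		points >>= -t->oom_adj;
	}
	return points;
}

/*
 * Scan every task, skip those that cannot allocate on any of the nodes
 * that are out of memory, and return the one with the highest score.
 * Returns NULL if no eligible task has a non-zero score.
 */
static const struct toy_task *toy_select_victim(const struct toy_task *tasks,
						size_t n, uint64_t oom_nodes)
{
	const struct toy_task *victim = NULL;
	unsigned long best = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long points;

		if (!(tasks[i].allowed_nodes & oom_nodes))
			continue;	/* killing it would free nothing useful */
		points = toy_badness(&tasks[i]);
		if (points > best) {
			best = points;
			victim = &tasks[i];
		}
	}
	return victim;
}
```

With a small shell and a large hog both bound to node 0, and an even 
larger task bound only to node 1, an oom on node 0 picks the hog on 
node 0: the shell scores too low and the node 1 task is filtered out 
before scoring.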

> A long time ago Andrea changed the oom-killer so that it basically
> always killed `current', iirc.  I think that shipped in the Suse
> kernel.

You can do that for the entire oom killer by enabling 
/proc/sys/vm/oom_kill_allocating_task.  SGI wanted that to avoid these 
lengthy tasklist scans.
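
For anyone who wants to flip that from a program rather than a shell, a 
minimal sketch follows.  The helper names sysctl_write_int() and 
sysctl_read_int() are mine, not an existing API; they just treat the 
sysctl as a text file holding one integer, and writing to the real file 
needs root.

```c
#include <stdio.h>

/* Hypothetical helper: write one integer to a sysctl-style file.
 * Returns 0 on success, -1 on any error. */
static int sysctl_write_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: read one integer back from a sysctl-style file.
 * Returns 0 on success, -1 on any error. */
static int sysctl_read_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling sysctl_write_int("/proc/sys/vm/oom_kill_allocating_task", 1) 
would then select the kill-the-allocating-task behavior for the whole oom 
killer, and writing 0 returns to the tasklist scan.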

> Maybe it was only in the case where `current' got an oom when
> satisfying a pagefault, I forget the details.  But according to Andrea,
> this design provided a simple and practical solution to ooms.
> 

Right, VM_FAULT_OOM always killed current, and that was recently changed 
to invoke the pagefault oom handler.  Nick has now converted the 
remaining architectures that were not using it, so there is actually no 
difference for pagefaults anymore.  In an earlier revision of this 
rewrite, I wanted pagefault ooms to try killing current first if it was 
killable and then fall back to the tasklist scan and heuristic, but that 
was argued against for not conforming to other memory allocation failures.


