linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Nick Piggin <npiggin@suse.de>,
	Oleg Nesterov <oleg@redhat.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org
Subject: Re: [patch 16/18] oom: badness heuristic rewrite
Date: Wed, 16 Jun 2010 22:32:09 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.00.1006162215280.19549@chino.kir.corp.google.com> (raw)
In-Reply-To: <20100608155802.cdd4aff3.akpm@linux-foundation.org>

On Tue, 8 Jun 2010, Andrew Morton wrote:

> > This a complete rewrite of the oom killer's badness() heuristic which is
> > used to determine which task to kill in oom conditions.  The goal is to
> > make it as simple and predictable as possible so the results are better
> > understood and we end up killing the task which will lead to the most
> > memory freeing while still respecting the fine-tuning from userspace.
> 
> It's not obvious from this description that then end result is better! 

I think it's fairly obvious that predictablility is an important part of 
any heuristic that will determine whether your task survives or dies.

> Have you any testcases or scenarios which got improved?
> 

Yes, as cited below in the changelog with the KDE example.

> > Instead of basing the heuristic on mm->total_vm for each task, the task's
> > rss and swap space is used instead.  This is a better indication of the
> > amount of memory that will be freeable if the oom killed task is chosen
> > and subsequently exits.
> 
> Again, why should we optimise for the amount of memory which a killing
> will yield (if that's what you mean).  We only need to free enough
> memory to unblock the oom condition then proceed.
> 

That's what the oom killer has always done simply because we want to avoid 
subsequent oom conditions in the near future that will require additional 
tasks to be killed.  It seems far better to kill a large memory-hogging 
task[*] than ten smaller tasks that total the same amount of memory usage.

 [*] And, with this rewrite, "memory-hogging" can be defined for the first
     time from userspace with a tunable, oom_score_adj, that actually has
     units so that within a cpuset, for example, we can bias a task by
     25% of available memory or bias other tasks against it by 25%.  For
     the first time ever, we can say "this task should be able to use 25%
     more memory than other tasks without getting killed first."

> The last thing we want to do is to kill a process which has consumed
> 1000 CPU hours, or which is providing some system-critical service or
> whatever.  Amount-of-memory-freeable is a relatively minor criterion.
> 

What would you suggest otherwise?  Cputime?  Then we may never be able to 
fork our bash shell or ssh into our machines.

> >  This helps specifically in cases where KDE or
> > GNOME is chosen for oom kill on desktop systems instead of a memory
> > hogging task.
> 
> It helps how?  Examples and test cases?
> 

Because KDE and GNOME typically have very large mm->total_vm values but 
the amount of resident memory in RAM is consumed by other tasks, even 
memory leakers.  mm->total_vm is agreed to be a very poor heursitic 
baseline by just about everyone.

> > The baseline for the heuristic is a proportion of memory that each task is
> > currently using in memory plus swap compared to the amount of "allowable"
> > memory.
> 
> What does "swap" mean?  swapspace includes swap-backed swapcache,
> un-swap-backed swapcache and non-resident swap.  Which of all these is
> being used here and for what reason?
> 

This is swap cache, the number of swap entries for the task which could be 
freeable if the task is killed that could subsequently be used for page 
allocations that triggered the oom killer.  We want to add hints to the 
oom killer so that memory which cannot be used on blockable memory 
allocations may be freed so we don't call into the oom killer again in the 
near future.

> > /proc/pid/oom_adj is changed so that its meaning is rescaled into the
> > units used by /proc/pid/oom_score_adj, and vice versa.  Changing one of
> > these per-task tunables will rescale the value of the other to an
> > equivalent meaning.  Although /proc/pid/oom_adj was originally defined as
> > a bitshift on the badness score, it now shares the same linear growth as
> > /proc/pid/oom_score_adj but with different granularity.  This is required
> > so the ABI is not broken with userspace applications and allows oom_adj to
> > be deprecated for future removal.
> 
> It was a mistake to add oom_adj in the first place.  Because it's a
> user-visible knob which us tied to a particular in-kernel
> implementation.  As we're seeing now, the presence of that knob locks
> us into a particular implementation.
> 

Agreed.

> Given that oom_score_adj is just a rescaled version of oom_adj
> (correct?), I guess things haven't got a lot worse on that front as a
> result of these changes.
> 

No, it's not a rescaled version at all, we merely rescale oom_adj to 
oom_score_adj units because everyone objected to removing oom_adj without 
deprecation first.  oom_score_adj has units: a proportion of memory 
available to the application, meaning how much of the system, memcg, 
cpuset, or mempolicy it should be biased or favored by.  Please see the 
change to Documentation/filesystems/proc.txt which explain this pretty 
elaborately.

> General observation regarding the patch description: I'm not seeing a
> lot of reason for merging the patch!  What value does it bring to our
> users?  What problems got solved?
> 

It significantly improves the oom killer's predictability, it protects 
vital system tasks like KDE and GNOME on the desktop, it allows users to 
tune each task with a bias or preference in units they understand to 
affect its score, and it allows that interface to remain constant and 
valid even when those tasks are subsequently attached to a cgroup or bound 
to a mempolicy (or their limits or set of allowed nodes are changed).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-06-17  5:32 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-06 22:33 [patch 00/18] oom killer rewrite David Rientjes
2010-06-06 22:34 ` [patch 01/18] oom: check PF_KTHREAD instead of !mm to skip kthreads David Rientjes
2010-06-07 12:12   ` Balbir Singh
2010-06-07 19:50     ` David Rientjes
2010-06-08 19:33   ` Andrew Morton
2010-06-08 23:40     ` David Rientjes
2010-06-08 23:52       ` Andrew Morton
2010-06-06 22:34 ` [patch 02/18] oom: introduce find_lock_task_mm() to fix !mm false positives David Rientjes
2010-06-07 12:58   ` Balbir Singh
2010-06-07 13:49     ` Minchan Kim
2010-06-07 19:49       ` David Rientjes
2010-06-08 19:42   ` Andrew Morton
2010-06-08 20:14     ` Oleg Nesterov
2010-06-08 20:17       ` Oleg Nesterov
2010-06-08 21:34         ` Andrew Morton
2010-06-08 23:50     ` David Rientjes
2010-06-06 22:34 ` [patch 03/18] oom: dump_tasks use find_lock_task_mm too David Rientjes
2010-06-08 19:55   ` Andrew Morton
2010-06-09  0:06     ` David Rientjes
2010-06-06 22:34 ` [patch 04/18] oom: PF_EXITING check should take mm into account David Rientjes
2010-06-08 20:00   ` Andrew Morton
2010-06-06 22:34 ` [patch 05/18] oom: give current access to memory reserves if it has been killed David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:47     ` David Rientjes
2010-06-14 11:08       ` KOSAKI Motohiro
2010-06-08 20:12     ` Andrew Morton
2010-06-13 11:24       ` KOSAKI Motohiro
2010-06-08 20:08   ` Andrew Morton
2010-06-09  0:14     ` David Rientjes
2010-06-06 22:34 ` [patch 06/18] oom: avoid sending exiting tasks a SIGKILL David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:48     ` David Rientjes
2010-06-08 20:17   ` Andrew Morton
2010-06-08 20:26   ` Oleg Nesterov
2010-06-09  6:32     ` David Rientjes
2010-06-09 16:25       ` Oleg Nesterov
2010-06-09 19:44         ` David Rientjes
2010-06-09 20:14           ` Oleg Nesterov
2010-06-10  0:15             ` KAMEZAWA Hiroyuki
2010-06-10  1:21               ` Oleg Nesterov
2010-06-10  1:43                 ` KAMEZAWA Hiroyuki
2010-06-10  1:51                   ` Oleg Nesterov
2010-06-06 22:34 ` [patch 07/18] oom: filter tasks not sharing the same cpuset David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:51     ` David Rientjes
2010-06-08 19:27       ` Andrew Morton
2010-06-13 11:24         ` KOSAKI Motohiro
2010-07-02 22:35           ` Andrew Morton
2010-07-04 22:08             ` David Rientjes
2010-07-09  3:00             ` KOSAKI Motohiro
2010-06-08 20:23   ` Andrew Morton
2010-06-09  0:25     ` David Rientjes
2010-06-06 22:34 ` [patch 08/18] oom: sacrifice child with highest badness score for parent David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:53     ` David Rientjes
2010-06-08 20:33   ` Andrew Morton
2010-06-09  0:30     ` David Rientjes
2010-06-06 22:34 ` [patch 09/18] oom: select task from tasklist for mempolicy ooms David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 21:08   ` Andrew Morton
2010-06-08 21:17     ` Oleg Nesterov
2010-06-09  0:46     ` David Rientjes
2010-06-08 23:43   ` Andrew Morton
2010-06-09  0:40     ` David Rientjes
2010-06-06 22:34 ` [patch 10/18] oom: enable oom tasklist dump by default David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 18:56     ` David Rientjes
2010-06-08 21:13   ` Andrew Morton
2010-06-09  0:52     ` David Rientjes
2010-06-06 22:34 ` [patch 11/18] oom: avoid oom killer for lowmem allocations David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 21:19   ` Andrew Morton
2010-06-06 22:34 ` [patch 12/18] oom: extract panic helper function David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-06 22:34 ` [patch 13/18] oom: remove special handling for pagefault ooms David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 18:57     ` David Rientjes
2010-06-08 21:27   ` Andrew Morton
2010-06-06 22:34 ` [patch 14/18] oom: move sysctl declarations to oom.h David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-06 22:34 ` [patch 15/18] oom: remove unnecessary code and cleanup David Rientjes
2010-06-06 22:34 ` [patch 16/18] oom: badness heuristic rewrite David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 23:02     ` Andrew Morton
2010-06-13 11:24       ` KOSAKI Motohiro
2010-06-17  5:14       ` David Rientjes
2010-06-21 11:45         ` KOSAKI Motohiro
2010-06-21 20:47           ` David Rientjes
2010-06-30  9:26             ` KOSAKI Motohiro
2010-06-17  5:12     ` David Rientjes
2010-06-21 11:45       ` KOSAKI Motohiro
2010-06-08 22:58   ` Andrew Morton
2010-06-17  5:32     ` David Rientjes [this message]
2010-06-06 22:34 ` [patch 17/18] oom: add forkbomb penalty to badness heuristic David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 23:15   ` Andrew Morton
2010-06-06 22:35 ` [patch 18/18] oom: deprecate oom_adj tunable David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 19:00     ` David Rientjes
2010-06-08 23:18     ` Andrew Morton
2010-06-13 11:24       ` KOSAKI Motohiro
2010-06-17  3:36         ` David Rientjes
2010-06-21 11:45           ` KOSAKI Motohiro
2010-06-21 20:54             ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.1006162215280.19549@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox