From: David Rientjes <rientjes@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>, Oleg Nesterov <oleg@redhat.com>,
	Balbir Singh <balbir@in.ibm.com>,
	linux-mm@kvack.org
Subject: Re: [patch -mm 1/2] oom: badness heuristic rewrite
Date: Tue, 3 Aug 2010 00:23:32 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1008030016590.20849@chino.kir.corp.google.com>
In-Reply-To: <20100803133255.deb5c208.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, 3 Aug 2010, KAMEZAWA Hiroyuki wrote:

> In old behavior, oom_score order is synchronous both in the system and
> container. High-score one will be killed.
> IOW, oom_score has worked as oom_score.
> 

This isn't necessarily true, as I've already pointed out: the task with the
highest score as exported by /proc/pid/oom_score is not always killed,
because it may not be a candidate task: it may be in a disjoint memcg, for
example.  The highest-scoring _candidate_ task is killed, and that's
unchanged with my rewrite.

The current /proc/pid/oom_score is also not synchronous between the system 
and container at least in the cpuset case since we currently divide a 
task's score by 8 if it doesn't intersect current's mems_allowed, so 
that's not true either.
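
To illustrate the cpuset case, here is a minimal sketch of that scaling.
The function name and the set-of-node-ids representation are made up for
illustration; only the divide-by-8 for a disjoint mems_allowed comes from
the behavior described above.

```python
def cpuset_adjusted_score(score, task_mems, current_mems):
    """Hypothetical helper: scale a task's score the way described above.

    task_mems and current_mems are sets of NUMA node ids standing in for
    the mems_allowed of the task and of current (an assumption made for
    this sketch).
    """
    if not (set(task_mems) & set(current_mems)):
        # No overlap with current's mems_allowed: divide the score by 8.
        return score // 8
    return score


# Task A has the larger raw score but sits on a disjoint node.
a = cpuset_adjusted_score(800, task_mems={0}, current_mems={1})
b = cpuset_adjusted_score(200, task_mems={1}, current_mems={1})
print(a, b)
```

With these made-up numbers the raw ordering (A before B) flips once the
scaling is applied, which is why the ordering already depends on the
context of the oom.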

> But, after the patch, the user (of LXC et al.) can't trust oom_score.

Yes, they can, but they need to know the context in which the oom occurs.  
/proc/pid/oom_score cannot export multiple values although its kill 
ranking actually depends on whether it's a system oom, memcg oom, cpuset 
oom, etc.  It needs to export a single value as a function of the 
heuristic.  The user must then take those values at the time of 
collection and find how the various tasks rank relative to one another 
depending on MPOL_BIND, cpuset hierarchy, etc.  That's actually not that 
difficult: admins who don't use any cgroups typically only have 
system-wide ooms, where oom_score is always accurate, and admins who use 
cpusets, memcg, or mempolicies on large NUMA systems already know the set 
of tasks that are attached to them and want to prioritize the killing list 
specifically for those entities.

> Especially with memcg, it just shows a _broken_ value.
> 

Not at all, the user knows what tasks are attached to the memcg and can 
easily determine which task is going to be killed when it ooms: simply 
iterate through the memcg tasklist, check /proc/pid/oom_score, and sort.
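
Something along these lines would do it from userspace.  This is only a
sketch: the function name is made up, and it assumes the memcg's task list
is readable as a file with one id per line (as with the cgroup "tasks"
file) and that procfs is mounted at /proc.

```python
import os


def rank_memcg_tasks(tasks_path, proc_root="/proc"):
    """Return [(pid, oom_score)] for a memcg's tasks, highest score first.

    tasks_path: file listing one task id per line (assumed layout).
    proc_root:  where procfs is mounted; a parameter so it can be tested.
    """
    with open(tasks_path) as f:
        pids = [int(line) for line in f if line.strip()]

    ranked = []
    for pid in pids:
        try:
            with open(os.path.join(proc_root, str(pid), "oom_score")) as f:
                score = int(f.read().strip())
        except (OSError, ValueError):
            # The task exited (or is unreadable) since the list was read.
            continue
        ranked.append((pid, score))

    # Highest score first: the head of the list is the likely victim.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Calling it with the path to a memcg's task list (wherever the hierarchy
happens to be mounted) gives the ranking at the time of collection; the
scores can change afterwards, so it is a snapshot and not a guarantee.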

