Re: [patch -mm 1/2] oom: badness heuristic rewrite

From: David Rientjes <rientjes@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>, Oleg Nesterov <oleg@redhat.com>,
	Balbir Singh <balbir@in.ibm.com>,
	linux-mm@kvack.org
Subject: Re: [patch -mm 1/2] oom: badness heuristic rewrite
Date: Mon, 2 Aug 2010 18:02:48 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1008021742440.9569@chino.kir.corp.google.com>
In-Reply-To: <20100803093610.f4d30ca7.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, 3 Aug 2010, KAMEZAWA Hiroyuki wrote:

> > > Then, an application's oom_score on one host is quite different from that on
> > > another host. This operation is something very new rather than a simple interface update.
> > > This opinion was rejected.
> > > 
> > 
> > It wasn't rejected, I responded to your comment and you never wrote back.  
> > The idea 
> > 
> I just got tired to write the same thing in many times. And I don't have
> strong opinions. I _know_ your patch fixes X-server problem. That was enough
> for me.
> 

There are a couple of reasons why I disagree that oom_score_adj should have 
memory quantity units.

First, individual oom scores that come out of oom_badness() don't mean 
anything in isolation; they only mean something when compared to other 
candidate tasks.  All applications, whether attached to a cpuset, a 
mempolicy, a memcg, or none of these, have an allowed set of memory and a 
set of other applications competing for those shared resources.  To decide 
which application is the biggest memory hog, which is the one we want to 
kill, the candidates are ranked amongst themselves.  Using oom_score_adj as 
a proportion, we can say a particular application should be allowed 25% of 
resources, other applications should be allowed 5%, and others should be 
penalized 10%, for example.  This makes prioritization for oom kill rather 
simple.
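
To make the proportional ranking above concrete, here is a minimal sketch 
in Python.  It is a simplified model, not the kernel's oom_badness(): the 
function name badness(), the task names and sizes, and the choice of 
per-mille units (1000 = all of the allowed memory) are assumptions made 
for illustration only.

```python
def badness(usage_pages, allowed_pages, oom_score_adj=0):
    """Hypothetical model: usage as per-mille of allowed memory, shifted by adj."""
    points = usage_pages * 1000 // allowed_pages
    # Clamp so that an adjustment can never push a score outside 0..1000.
    return max(0, min(1000, points + oom_score_adj))


# All three tasks compete for the same allowed set of memory (assumed size).
ALLOWED_PAGES = 1000

# name -> (pages in use, oom_score_adj); values are made up for the example.
tasks = {
    "database": (500, -250),   # allowed an extra 25% of resources
    "webserver": (250, -50),   # allowed an extra 5%
    "batch": (200, 100),       # penalized 10%
}

scores = {
    name: badness(usage, ALLOWED_PAGES, adj)
    for name, (usage, adj) in tasks.items()
}

# The kill candidate is simply the task with the highest score.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this model "batch" ranks first even though it uses the least memory, 
because its penalty outweighs the discounts given to the other two.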

Second, we don't want to adjust oom_score_adj any time a task is attached 
to a cpuset, a mempolicy, or a memcg, or whenever the cpuset's mems 
change, the bound mempolicy nodemask changes, or the memcg limit changes.  
The application need not know what its set of allowed memory is, and the 
kernel should operate seamlessly regardless of what the attachment is.  
These are, in a sense, "virtualized" systems unto themselves: if a task is 
moved from a child cpuset to the root cpuset, its set of allowed memory 
may become much larger.  That action shouldn't require an equivalent 
change to /proc/pid/oom_score_adj: the priority of the task relative to 
its other competing tasks is the same.  That set of allowed memory may 
change, but the task's priority does not unless explicitly changed by the 
admin.
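
A small sketch of why a memory-quantity unit would need rewriting on every 
attachment change: the same proportional value corresponds to very 
different byte amounts depending on the allowed set.  The helper 
adj_as_bytes(), the 1/1000 scale, and the cpuset sizes are assumptions 
for illustration, not anything taken from the patch.

```python
GIB = 1 << 30


def adj_as_bytes(oom_score_adj, allowed_bytes):
    """Hypothetical helper: what a proportional adj means in bytes,
    assuming one unit is 1/1000 of the allowed memory."""
    return oom_score_adj * allowed_bytes // 1000


adj = -250  # "allowed 25% of resources"; never rewritten below

child_cpuset = 1 * GIB    # assumed size of the child cpuset's memory
root_cpuset = 16 * GIB    # assumed size after moving to the root cpuset

in_child = adj_as_bytes(adj, child_cpuset)
in_root = adj_as_bytes(adj, root_cpuset)

# Same adj, different absolute discount: 256 MiB versus 4 GiB.
print(in_child, in_root)
```

With byte units, the admin would have to change the stored value from one 
number to the other at the moment of the move; with a proportion, the 
value written once stays meaningful in both places.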

> > That would work if you want to setup individual memcgs for every 
> > application on your system, know what sane limits are for each one, and 
> > want to incur the significant memory expense of enabling 
> > CONFIG_CGROUP_MEM_RES_CTLR for its metadata.
> > 
> Usual distro already enables it.
> 

Yes, I'm well aware of my 40MB of lost memory on my laptop :)

> Simply put all applications in a group, disable oom, and set an oom_notifier. 
> Then,
>  - a "pop-up window" of the task list will ask the user "which one do you want to kill?"
>  - send a packet to ask an administration server system "which one is killable?"
>    or "increase memory limit" or "memory hot-add?" 
> 

Having user interaction at the time of oom would be nice, but it is 
impractical for us.  So we need some way to state the relative 
importance of a task to the kernel so that it can act on our behalf when 
we encounter such a condition.  I believe oom_score_adj does that quite 
effectively.
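
As a sketch of how an admin tool might state that importance ahead of 
time, the following writes a value to /proc/pid/oom_score_adj.  The 
function name set_oom_score_adj() and the proc_root parameter are 
hypothetical, and the accepted range of -1000 to 1000 is an assumption 
that this message does not itself state.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an oom_score_adj value for a task (hypothetical helper).

    proc_root exists only so the function can be pointed at a scratch
    directory; on a real system it would stay at /proc.
    """
    # Assumed valid range; reject anything outside it before touching /proc.
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

For example, set_oom_score_adj(os.getpid(), 100) would make the calling 
process a more likely candidate; lowering the value typically requires 
privilege.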

> Possible case will be
>    - send SIGSTOP to all apps at OOM.
>    - raise the limit to some extent, or move a killable one to a special group.
>    - wake up a killable one with SIGCONT.
>    - send SIGHUP to stop it safely.
> 

We use oom notifiers with cpusets, which in this case can be used 
identically to how you're imagining memcg can be used.  This particular 
change, however, only affects the oom killer: that is, its only scope is 
the case where the kernel can't do anything else, no userspace notifier is 
attached, and no memory freeing is otherwise going to occur.  I would love 
to see a per-cgroup oom notifier to allow userspace to respond to these 
conditions in more effective ways, but I still believe there is a general 
need for a simple and predictable oom killer heuristic that the user has 
full power over.

