From: David Rientjes <rientjes@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Nick Piggin <npiggin@suse.de>, Oleg Nesterov <oleg@redhat.com>,
Balbir Singh <balbir@in.ibm.com>,
linux-mm@kvack.org
Subject: Re: [patch -mm 1/2] oom: badness heuristic rewrite
Date: Mon, 2 Aug 2010 18:02:48 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1008021742440.9569@chino.kir.corp.google.com>
In-Reply-To: <20100803093610.f4d30ca7.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, 3 Aug 2010, KAMEZAWA Hiroyuki wrote:
> > > Then, an applications' oom_score on a host is quite different from on the other
> > > host. This operation is very new rather than a simple interface updates.
> > > This opinion was rejected.
> > >
> >
> > It wasn't rejected, I responded to your comment and you never wrote back.
> > The idea
> >
> I just got tired to write the same thing in many times. And I don't have
> strong opinions. I _know_ your patch fixes X-server problem. That was enough
> for me.
>
There are a couple of reasons why I disagree that oom_score_adj should be
expressed in units of memory.

First, the individual oom scores that come out of oom_badness() don't mean
anything in isolation; they only mean something when compared to other
candidate tasks. All applications, whether attached to a cpuset, a
mempolicy, a memcg, or none of these, have an allowed set of memory and a
set of applications competing for those shared resources. When deciding
which application is the biggest memory hog, which is the one we want to
kill, they are ranked amongst themselves. Using oom_score_adj as a
proportion, we can say a particular application should be allowed 25% of
resources, other applications should be allowed 5%, and others should be
penalized 10%, for example. This makes prioritization for oom kill rather
simple.
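
To make the proportional reading concrete, here is a minimal sketch; it is
not the kernel code. It assumes a task's baseline score is its share of the
allowed memory in units of 1/1000 and that oom_score_adj is added in the
same units, so -250 reads as "allowed 25% of resources" and +100 as
"penalized 10%". The function body, the task names, and the page counts are
all invented for illustration.

```python
def badness(task_pages, allowed_pages, oom_score_adj=0):
    """Hypothetical stand-in for oom_badness(), not the kernel code.

    Assumption: the score is the task's share of its allowed memory in
    units of 1/1000, with oom_score_adj added in the same units.
    """
    if not -1000 <= oom_score_adj <= 1000:
        raise ValueError("oom_score_adj is assumed to lie in [-1000, 1000]")
    share = task_pages * 1000 // allowed_pages
    # Clamp so that a discount can never produce a negative score.
    return max(0, min(1000, share + oom_score_adj))


ALLOWED_PAGES = 1_000_000  # made-up size of the shared set of allowed memory

# (pages in use, oom_score_adj); names and numbers are invented.
tasks = {
    "database": (400_000, -250),  # allowed 25% of resources
    "batch": (200_000, 100),      # penalized 10%
    "shell": (50_000, 0),         # no adjustment
}

# Scores only matter relative to each other: the highest one is killed.
scores = {name: badness(pages, ALLOWED_PAGES, adj)
          for name, (pages, adj) in tasks.items()}
victim = max(scores, key=scores.get)
print(scores, victim)
```

Under these assumptions the database task uses the most memory but is not
selected, because its 25% allowance is subtracted before the tasks are
ranked against each other.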

Second, we don't want to adjust oom_score_adj anytime a task is attached
to a cpuset, a mempolicy, or a memcg, or whenever that cpuset's mems
change, the bound mempolicy nodemask changes, or the memcg limit changes.
The application need not know what that set of allowed memory is, and the
kernel should operate seamlessly regardless of what the attachment is.
These are, in a sense, "virtualized" systems unto themselves: if a task is
moved from a child cpuset to the root cpuset, its set of allowed memory
may become much larger. That action shouldn't need to have an equivalent
change to /proc/pid/oom_score_adj: the priority of the task relative to
its other competing tasks is the same. That set of allowed memory may
change, but its priority does not unless explicitly changed by the admin.
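
The difference between the two kinds of unit can be sketched with a little
arithmetic. The helper below is hypothetical and only illustrates the
argument: it computes what fraction of the allowed memory a negative
adjustment discounts, once for an adjustment expressed as a proportion
(assumed here to be in units of 1/1000) and once for an adjustment
expressed in pages. The cpuset sizes are made-up numbers.

```python
def discounted_fraction(adj, allowed_pages, unit):
    """Fraction of the allowed memory that a negative adjustment discounts."""
    if unit == "proportion":
        # Assumption: adj is in 1/1000ths of the allowed memory.
        return -adj / 1000
    if unit == "pages":
        # Hypothetical alternative: adj is an absolute number of pages.
        return -adj / allowed_pages
    raise ValueError(f"unknown unit: {unit}")


CHILD_CPUSET_PAGES = 1_000_000  # invented size of a child cpuset's memory
ROOT_CPUSET_PAGES = 8_000_000   # invented size of the root cpuset's memory

for allowed in (CHILD_CPUSET_PAGES, ROOT_CPUSET_PAGES):
    proportional = discounted_fraction(-250, allowed, "proportion")
    absolute = discounted_fraction(-250_000, allowed, "pages")
    print(allowed, proportional, absolute)
```

With these numbers the proportional adjustment is worth 25% of the allowed
memory in both cpusets, while the page-based adjustment is worth 25% in the
child cpuset and about 3% after the move to the root cpuset, so it would
have to be rewritten to keep the same relative priority.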
> > That would work if you want to setup individual memcgs for every
> > application on your system, know what sane limits are for each one, and
> > want to incur the significant memory expense of enabling
> > CONFIG_CGROUP_MEM_RES_CTLR for its metadata.
> >
> Usual distro already enables it.
>
Yes, I'm well aware of my 40MB of lost memory on my laptop :)
> Simply puts all applications to a group and disable oom and set oom_notifier.
> Then,
> - a "pop-up window" of task list will ask the user "which one do you want to kill ?"
> - send a packet to ask an administration server system "which one is killable ?"
> or "increase memory limit" or "memory hot-add ?"
>
Having user interaction at the time of oom would certainly be nice, but it
is impractical for us. So we need some way to state the relative
importance of a task to the kernel so that it can act on our behalf when
we encounter such a condition. I believe oom_score_adj does that quite
effectively.
> Possible case will be
> - send SIGSTOP to all apps at OOM.
> - rise limit to some extent. or move a killable one to a special group.
> - wake up a killable one with SIGCONT.
> - send SIGHUP to stop it safely.
>
We use oom notifiers with cpusets, which in this case can be used
identically to how you're imagining memcg can be used. This particular
change, however, only affects the oom killer: that is, its scope is
limited to the case where the kernel can't do anything else, no userspace
notifier is attached, and no memory freeing is otherwise going to occur.
I would love to see a per-cgroup oom notifier to allow userspace to
respond to these conditions in more effective ways, but I still believe
there is a general need for a simple and predictable oom killer heuristic
that the user has full power over.