From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Nick Piggin <npiggin@suse.de>, Oleg Nesterov <oleg@redhat.com>,
Balbir Singh <balbir@in.ibm.com>,
linux-mm@kvack.org
Subject: Re: [patch -mm 1/2] oom: badness heuristic rewrite
Date: Tue, 3 Aug 2010 10:08:15 +0900
Message-ID: <20100803100815.11d10519.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <alpine.DEB.2.00.1008021742440.9569@chino.kir.corp.google.com>
On Mon, 2 Aug 2010 18:02:48 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:
> On Tue, 3 Aug 2010, KAMEZAWA Hiroyuki wrote:
>
> > > > Then an application's oom_score on one host is quite different from its
> > > > score on another host. This is a very new behavior rather than a simple
> > > > interface update. This opinion was rejected.
> > > >
> > >
> > > It wasn't rejected, I responded to your comment and you never wrote back.
> > > The idea
> > >
> > I just got tired of writing the same thing many times, and I don't have a
> > strong opinion. I _know_ your patch fixes the X-server problem. That was
> > enough for me.
> >
>
> There are a couple of reasons why I disagree that oom_score_adj should have
> memory quantity units.
>
> First, individual oom scores that come out of oom_badness() don't mean
> anything in isolation, they only mean something when compared to other
> candidate tasks. All applications, whether attached to a cpuset, a
> mempolicy, a memcg, or not, have an allowed set of memory and applications
> that are competing for those shared resources. When defining what
> application happens to be the most memory hogging, which is the one we
> want to kill, they are ranked amongst themselves. Using oom_score_adj as
> a proportion, we can say a particular application should be allowed 25% of
> resources, other applications should be allowed 5%, and others should be
> penalized 10%, for example. This makes prioritization for oom kill rather
> simple.
>
> Second, we don't want to adjust oom_score_adj anytime a task is attached
> to a cpuset, a mempolicy, or a memcg, or whenever the cpuset's mems
> change, the bound mempolicy nodemask changes, or the memcg limit changes.
> The application need not know what that set of allowed memory is, and the
> kernel should operate seamlessly regardless of what the attachment is.
> These are, in a sense, "virtualized" systems unto themselves: if a task is
> moved from a child cpuset to the root cpuset, its set of allowed memory
> may become much larger. That action shouldn't need to have an equivalent
> change to /proc/pid/oom_score_adj: the priority of the task relative to
> its other competing tasks is the same. That set of allowed memory may
> change, but its priority does not unless explicitly changed by the admin.
>
Hmm, then does oom_score show the values for all limitations, as an array?
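
The proportional scheme described in the quoted text can be illustrated with a small sketch. This is not the code from the patch: the helper name sketch_badness(), its parameters, and the 0..1000 score range with oom_score_adj expressed in thousandths of allowed memory are all assumptions made for illustration. It only shows why an adjustment expressed as a proportion keeps the same meaning when the set of allowed memory changes.

```c
/* Assumption: scores run from 0 to 1000, where 1000 means the task
 * uses all of the memory it is allowed to use. */
#define SKETCH_SCORE_MAX 1000

/*
 * Hypothetical helper, not the kernel's oom_badness().
 *
 * usage and allowed are in the same unit (for example, pages).
 * oom_score_adj is assumed to be in thousandths of allowed memory:
 * -250 discounts 25% of allowed memory, +100 penalizes the task by 10%.
 */
long sketch_badness(unsigned long long usage,
                    unsigned long long allowed,
                    int oom_score_adj)
{
    long points;

    if (allowed == 0)
        return 0;

    /* Baseline: share of the allowed memory that the task uses. */
    points = (long)(usage * SKETCH_SCORE_MAX / allowed);

    /* The adjustment is a proportion, so it needs no memory units. */
    points += oom_score_adj;

    /* Clamp to the assumed score range. */
    if (points < 0)
        points = 0;
    if (points > SKETCH_SCORE_MAX)
        points = SKETCH_SCORE_MAX;
    return points;
}
```

Under these assumptions, a task that uses half of its allowed memory with oom_score_adj set to -250 scores 250 whether its allowed set is 1024 pages (a small cpuset) or 100000 pages (the root cpuset), so the adjustment does not have to be rewritten when the task is moved.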
> > > That would work if you want to setup individual memcgs for every
> > > application on your system, know what sane limits are for each one, and
> > > want to incur the significant memory expense of enabling
> > > CONFIG_CGROUP_MEM_RES_CTLR for its metadata.
> > >
> > The usual distro already enables it.
> >
>
> Yes, I'm well aware of my 40MB of lost memory on my laptop :)
>
Very sorry ;)
But it's required to track memory usage from init...
> > Simply put all applications into a group, disable oom, and set an oom_notifier.
> > Then,
> > - a "pop-up window" with a task list can ask the user "which one do you want to kill?"
> > - a packet can be sent to ask an administration server "which one is killable?"
> > or "increase memory limit" or "memory hot-add?"
> >
>
> Having user interaction at the time of oom would be nice, but it is
> certainly impractical for us. So we need some way to state the relative
> importance of a task to the kernel so that it can act on our behalf when
> we encounter such a condition. I believe oom_score_adj does that quite
> effectively.
>
I don't disagree that we need some way. And please don't take my words as strong objections.
I repeatedly said "I like the patch", but just had small concerns.
And I already explained why I can ignore my concerns.
> > A possible case would be
> > - send SIGSTOP to all apps at OOM.
> > - raise the limit to some extent, or move a killable one to a special group.
> > - wake up a killable one with SIGCONT.
> > - send SIGHUP to stop it safely.
> >
>
> We use oom notifiers with cpusets, which in this case can be used
> identically to how you're imagining memcg can be used. This particular
> change, however, only affects the oom killer: that is, its only scope is
> that when the kernel can't do anything else, no userspace notifier is
> attached, and no memory freeing is going to otherwise occur. I would love
> to see a per-cgroup oom notifier to allow userspace to respond to these
> conditions in more effective ways, but I still believe there is a general
> need for a simple and predictable oom killer heuristic that the user has
> full power over.
>
Yes, the kernel's oom killer should work as the final backup.
And your new one works very well for the X-server case, which was an issue
for a long time.
Thanks,
-Kame