From: Michal Hocko <mhocko@kernel.org>
To: Roman Gushchin <guro@fb.com>, hannes@cmpxchg.org, tj@kernel.org
Cc: David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, akpm@linux-foundation.org,
gthelen@google.com
Subject: Re: cgroup-aware OOM killer, how to move forward
Date: Mon, 23 Jul 2018 16:17:48 +0200 [thread overview]
Message-ID: <20180723141748.GH31229@dhcp22.suse.cz> (raw)
In-Reply-To: <20180719170543.GA21770@castle.DHCP.thefacebook.com>
On Thu 19-07-18 10:05:47, Roman Gushchin wrote:
> On Thu, Jul 19, 2018 at 09:38:43AM +0200, Michal Hocko wrote:
> > On Wed 18-07-18 08:28:50, Roman Gushchin wrote:
> > > On Wed, Jul 18, 2018 at 10:12:30AM +0200, Michal Hocko wrote:
> > > > On Tue 17-07-18 13:06:42, Roman Gushchin wrote:
> > > > > On Tue, Jul 17, 2018 at 09:49:46PM +0200, Michal Hocko wrote:
> > > > > > On Tue 17-07-18 10:38:45, Roman Gushchin wrote:
> > > > > > [...]
> > > > > > > Let me show my proposal on examples. Let's say we have the following hierarchy,
> > > > > > > and the biggest process (or the process with highest oom_score_adj) is in D.
> > > > > > >
> > > > > > > /
> > > > > > > |
> > > > > > > A
> > > > > > > |
> > > > > > > B
> > > > > > > / \
> > > > > > > C D
> > > > > > >
> > > > > > > Let's look at different examples and intended behavior:
> > > > > > > 1) system-wide OOM
> > > > > > > - default settings: the biggest process is killed
> > > > > > > - D/memory.group_oom=1: all processes in D are killed
> > > > > > > - A/memory.group_oom=1: all processes in A are killed
> > > > > > > 2) memcg oom in B
> > > > > > > - default settings: the biggest process is killed
> > > > > > > - A/memory.group_oom=1: the biggest process is killed
> > > > > >
> > > > > > Huh? Why would you even consider A here when the oom is below it?
> > > > > > /me confused
> > > > >
> > > > > I do not.
> > > > > This is exactly a counter-example: A's memory.group_oom
> > > > > is not considered at all in this case,
> > > > > because A is above ooming cgroup.
> > > >
> > > > OK, it confused me.
> > > >
> > > > > >
> > > > > > > - B/memory.group_oom=1: all processes in B are killed
> > > > > >
> > > > > > - B/memory.group_oom=0 &&
> > > > > > > - D/memory.group_oom=1: all processes in D are killed
> > > > > >
> > > > > > What about?
> > > > > > - B/memory.group_oom=1 && D/memory.group_oom=0
> > > > >
> > > > > All tasks in B are killed.
> > > >
> > > > so essentially find a task, traverse the memcg hierarchy from the
> > > > victim's memcg up to the oom root as long as memcg.group_oom = 1?
> > > > If the resulting memcg.group_oom == 1 then kill the whole sub tree.
> > > > Right?
> > >
> > > Yes.
> > >
> > > >
> > > > > Group_oom set to 1 means that the workload can't tolerate
> > > > > killing of a random process, so in this case it's better
> > > > > to guarantee consistency for B.
> > > >
> > > > OK, but then if D itself is OOM then we do not care about consistency
> > > > all of the sudden? I have hard time to think about a sensible usecase.
> > >
> > > I mean if traversing the hierarchy up to the oom root we meet
> > > a memcg with group_oom set to 0, we shouldn't stop traversing.
> >
> > Well, I am still fighting with the semantic of group, no-group, group
> > configuration. Why does it make any sense? In other words when can we
> > consider a cgroup to be a indivisible workload for one oom context while
> > it is fine to lose head or arm from another?
>
> Hm, so the question is should we traverse up to the OOMing cgroup,
> or up to the first cgroup with memory.group_oom=0?
>
> I looked at an example, and it *might* be the latter is better,
> especially if we'll make the default value inheritable.
>
> Let's say we have a sub-tree with a workload and some control stuff.
> Workload is tolerable to OOM's (we can handle it in userspace, for
> example), but the control stuff is not.
> Then it probably makes no sense to kill the entire sub-tree,
> if a task in C has to be killed. But makes perfect sense if we
> have to kill a task in B.
>
> /
> |
> A, delegated sub-tree, group_oom=1
> / \
> B C, workload, group_oom=0
> ^
> some control stuff here, group_oom=1
>
> Does this makes sense?
I am not sure. If you are going to delegate then you are basically
losing control of the group_oom at A-level. Is this good? What if I
_want_ to tear down the whole thing if it starts misbehaving because I
do not trust it?
The more I think about it the more I am concluding that we should start
with a more contrained model and require that once parent is
group_oom == 1 then children have to as well. If we ever find a usecase
to require a different scheme we can weaker it later. We cannot do that
other way around.
Tejun, Johannes what do you think about that?
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2018-07-23 14:17 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-11 22:40 Roman Gushchin
2018-07-12 12:07 ` Michal Hocko
2018-07-12 15:55 ` Roman Gushchin
2018-07-13 21:34 ` David Rientjes
2018-07-13 22:16 ` Roman Gushchin
2018-07-13 22:39 ` David Rientjes
2018-07-13 23:05 ` Roman Gushchin
2018-07-13 23:11 ` David Rientjes
2018-07-13 23:16 ` Roman Gushchin
2018-07-17 4:19 ` David Rientjes
2018-07-17 12:41 ` Michal Hocko
2018-07-17 17:38 ` Roman Gushchin
2018-07-17 19:49 ` Michal Hocko
2018-07-17 20:06 ` Roman Gushchin
2018-07-17 20:41 ` David Rientjes
2018-07-17 20:52 ` Roman Gushchin
2018-07-20 8:30 ` David Rientjes
2018-07-20 11:21 ` Tejun Heo
2018-07-20 16:13 ` Roman Gushchin
2018-07-20 20:28 ` David Rientjes
2018-07-20 20:47 ` Roman Gushchin
2018-07-23 23:06 ` David Rientjes
2018-07-23 14:12 ` Michal Hocko
2018-07-18 8:19 ` Michal Hocko
2018-07-18 8:12 ` Michal Hocko
2018-07-18 15:28 ` Roman Gushchin
2018-07-19 7:38 ` Michal Hocko
2018-07-19 17:05 ` Roman Gushchin
2018-07-20 8:32 ` David Rientjes
2018-07-23 14:17 ` Michal Hocko [this message]
2018-07-23 15:09 ` Tejun Heo
2018-07-24 7:32 ` Michal Hocko
2018-07-24 13:08 ` Tejun Heo
2018-07-24 13:26 ` Michal Hocko
2018-07-24 13:31 ` Tejun Heo
2018-07-24 13:50 ` Michal Hocko
2018-07-24 13:55 ` Tejun Heo
2018-07-24 14:25 ` Michal Hocko
2018-07-24 14:28 ` Tejun Heo
2018-07-24 14:35 ` Tejun Heo
2018-07-24 14:43 ` Michal Hocko
2018-07-24 14:49 ` Tejun Heo
2018-07-24 15:52 ` Roman Gushchin
2018-07-25 12:00 ` Michal Hocko
2018-07-25 11:58 ` Michal Hocko
2018-07-30 8:03 ` Michal Hocko
2018-07-30 14:04 ` Tejun Heo
2018-07-30 15:29 ` Roman Gushchin
2018-07-24 11:59 ` Tetsuo Handa
2018-07-25 0:10 ` Roman Gushchin
2018-07-25 12:23 ` Tetsuo Handa
2018-07-25 13:01 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180723141748.GH31229@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=gthelen@google.com \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox