linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Roman Gushchin <guro@fb.com>, hannes@cmpxchg.org, tj@kernel.org
Cc: David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	gthelen@google.com
Subject: Re: cgroup-aware OOM killer, how to move forward
Date: Mon, 23 Jul 2018 16:17:48 +0200
Message-ID: <20180723141748.GH31229@dhcp22.suse.cz>
In-Reply-To: <20180719170543.GA21770@castle.DHCP.thefacebook.com>

On Thu 19-07-18 10:05:47, Roman Gushchin wrote:
> On Thu, Jul 19, 2018 at 09:38:43AM +0200, Michal Hocko wrote:
> > On Wed 18-07-18 08:28:50, Roman Gushchin wrote:
> > > On Wed, Jul 18, 2018 at 10:12:30AM +0200, Michal Hocko wrote:
> > > > On Tue 17-07-18 13:06:42, Roman Gushchin wrote:
> > > > > On Tue, Jul 17, 2018 at 09:49:46PM +0200, Michal Hocko wrote:
> > > > > > On Tue 17-07-18 10:38:45, Roman Gushchin wrote:
> > > > > > [...]
> > > > > > > Let me show my proposal on examples. Let's say we have the following hierarchy,
> > > > > > > and the biggest process (or the process with highest oom_score_adj) is in D.
> > > > > > > 
> > > > > > >   /
> > > > > > >   |
> > > > > > >   A
> > > > > > >   |
> > > > > > >   B
> > > > > > >  / \
> > > > > > > C   D
> > > > > > > 
> > > > > > > Let's look at different examples and intended behavior:
> > > > > > > 1) system-wide OOM
> > > > > > >   - default settings: the biggest process is killed
> > > > > > >   - D/memory.group_oom=1: all processes in D are killed
> > > > > > >   - A/memory.group_oom=1: all processes in A are killed
> > > > > > > 2) memcg oom in B
> > > > > > >   - default settings: the biggest process is killed
> > > > > > >   - A/memory.group_oom=1: the biggest process is killed
> > > > > > 
> > > > > > Huh? Why would you even consider A here when the oom is below it?
> > > > > > /me confused
> > > > > 
> > > > > I do not.
> > > > > This is exactly a counter-example: A's memory.group_oom
> > > > > is not considered at all in this case,
> > > > > because A is above ooming cgroup.
> > > > 
> > > > OK, it confused me.
> > > > 
> > > > > > 
> > > > > > >   - B/memory.group_oom=1: all processes in B are killed
> > > > > > 
> > > > > >     - B/memory.group_oom=0 &&
> > > > > > >   - D/memory.group_oom=1: all processes in D are killed
> > > > > > 
> > > > > > What about?
> > > > > >     - B/memory.group_oom=1 && D/memory.group_oom=0
> > > > > 
> > > > > All tasks in B are killed.
> > > > 
> > > > so essentially find a task, traverse the memcg hierarchy from the
> > > > victim's memcg up to the oom root as long as memcg.group_oom = 1?
> > > > If the resulting memcg.group_oom == 1 then kill the whole sub tree.
> > > > Right?
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > Group_oom set to 1 means that the workload can't tolerate
> > > > > killing of a random process, so in this case it's better
> > > > > to guarantee consistency for B.
> > > > 
> > > > OK, but then if D itself is OOM then we do not care about consistency
> > > > all of a sudden? I have a hard time thinking of a sensible usecase.
> > > 
> > > I mean if traversing the hierarchy up to the oom root we meet
> > > a memcg with group_oom set to 0, we shouldn't stop traversing.
> > 
> > Well, I am still fighting with the semantics of the group, no-group, group
> > configuration. Why does it make any sense? In other words when can we
> > consider a cgroup to be an indivisible workload for one oom context while
> > it is fine to lose a head or an arm from another?
> 
> Hm, so the question is should we traverse up to the OOMing cgroup,
> or up to the first cgroup with memory.group_oom=0?
> 
> I looked at an example, and it *might* be that the latter is better,
> especially if we make the default value inheritable.
> 
> Let's say we have a sub-tree with a workload and some control stuff.
> The workload is tolerant of OOMs (we can handle them in userspace, for
> example), but the control stuff is not.
> Then it probably makes no sense to kill the entire sub-tree,
> if a task in C has to be killed. But makes perfect sense if we
> have to kill a task in B.
> 
>   /
>   |
>   A, delegated sub-tree, group_oom=1
>  / \
> B   C, workload, group_oom=0
> ^
> some control stuff here, group_oom=1
> 
> Does this make sense?

I am not sure. If you are going to delegate then you are basically
losing control of the group_oom at A-level. Is this good? What if I
_want_ to tear down the whole thing if it starts misbehaving because I
do not trust it?

The more I think about it the more I am concluding that we should start
with a more constrained model and require that once a parent has
group_oom == 1 then its children have to as well. If we ever find a usecase
that requires a different scheme we can weaken it later. We cannot do that
the other way around.
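
To make the constrained model concrete, here is a rough sketch in Python
(not kernel code). The Memcg class and the set_group_oom() and
kill_domain() helpers are made up for illustration, and pushing the
setting down to existing children is only one assumed way to enforce the
rule. Under the constraint, the group_oom values seen while walking up
from the victim are a run of 1s followed by 0s, so "stop at the first
group_oom == 0" and "take the highest group_oom == 1 below the oom root"
pick the same memcg.

```python
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not struct mem_cgroup)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.group_oom = False
        self.children = []
        if parent is not None:
            parent.children.append(self)


def set_group_oom(memcg, value):
    """Constrained model: a child of a group_oom == 1 parent must stay at 1."""
    if not value and memcg.parent is not None and memcg.parent.group_oom:
        raise PermissionError(f"{memcg.name}: parent has group_oom == 1")
    memcg.group_oom = value
    if value:
        # Assumption: enabling it on a parent forces it on the whole subtree.
        for child in memcg.children:
            set_group_oom(child, True)


def kill_domain(victim_memcg, oom_root):
    """Walk up from the victim's memcg while group_oom == 1, up to the oom root.

    Returns the memcg whose whole subtree gets killed, or None when only
    the selected task should be killed.
    """
    domain = None
    memcg = victim_memcg
    while memcg is not None and memcg.group_oom:
        domain = memcg
        if memcg is oom_root:
            break
        memcg = memcg.parent
    return domain


# The hierarchy from the example earlier in the thread: / -> A -> B -> {C, D}
root = Memcg("/")
a = Memcg("A", root)
b = Memcg("B", a)
c = Memcg("C", b)
d = Memcg("D", b)

set_group_oom(a, True)          # B, C and D now have group_oom == 1 too
system_oom = kill_domain(d, root)   # system-wide OOM, victim task lives in D
memcg_oom = kill_domain(d, b)       # memcg OOM in B, victim task lives in D
```

With A set, a system-wide OOM tears down all of A, while a memcg OOM in B
tears down B only, because the walk never goes above the oom root.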

Tejun, Johannes, what do you think about that?
-- 
Michal Hocko
SUSE Labs
