Date: Fri, 13 Jul 2018 15:16:07 -0700
From: Roman Gushchin
Subject: Re: cgroup-aware OOM killer, how to move forward
Message-ID: <20180713221602.GA15005@castle.DHCP.thefacebook.com>
References: <20180711223959.GA13981@castle.DHCP.thefacebook.com>
To: David Rientjes
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@kernel.org,
    hannes@cmpxchg.org, tj@kernel.org, gthelen@google.com

On Fri, Jul 13, 2018 at 02:34:49PM -0700, David Rientjes wrote:
> On Wed, 11 Jul 2018, Roman Gushchin wrote:
> 
> > I was thinking about how to move forward with the cgroup-aware OOM killer.
> > It looks to me that we all agree on the "cleanup" part of the patchset:
> > it's a nice feature to be able to kill all tasks in the cgroup
> > to guarantee a consistent state of the workload.
> > All our disagreements are about the victim selection algorithm.
> > 
> > So I wonder if the right thing to do is to split the problem.
> > We can agree on the "cleanup" part, which is useful by itself,
> > merge it upstream, and then return to the victim selection
> > algorithm.
> > 
> > So, here is my proposal:
> > let's introduce a memory.group_oom knob with the following semantics:
> > if the knob is set, the OOM killer kills either none or all
> > tasks in the cgroup*.
> > It works perfectly with the current OOM killer (as a "cleanup" option)
> > and allows _any_ further approach to OOM victim selection.
> > It also doesn't require any mount/boot/tree-wide options.
> > 
> > How does it sound?
> 
> No objection, of course; this was always the mechanism vs. policy
> separation that I was referring to. Having the ability to kill all
> processes attached to the cgroup when one of its processes is selected
> is useful, and we have our own patches that do just that, with the
> exception that it's triggerable by the user.

Perfect! I'll prepare the patchset.

> One of the things that I really like about cgroup v2, though, is what
> appears to be an implicit, but rather apparent, goal to minimize the
> number of files for each controller. It's very clean. So I'd suggest
> that we consider memory.group_oom, or however it is named, to allow
> for future development.
> 
> For example, rather than simply being binary, we'd probably want the
> ability to kill all eligible processes attached directly to the
> victim's mem cgroup *or* all processes attached to its subtree as well.
> 
> I'd suggest it be implemented to accept a string, "default"/"process",
> "local" or "tree"/"hierarchy", or better names, to define the group
> oom mechanism for the mem cgroup that is oom when one of its processes
> is selected as a victim.

I would prefer to keep it boolean, to match the simplicity of the
cgroup v2 API. In the v2 hierarchy, processes can't be attached to
non-leaf cgroups, so I don't see a place for the third meaning.
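
To make the proposed semantics concrete, a minimal userspace sketch of
how the knob might be used, assuming it lands as a boolean
memory.group_oom file; the cgroup path is made up for the example, and
nothing here exists in a merged kernel yet:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical v2 cgroup holding a multi-process workload. */
	const char *knob = "/sys/fs/cgroup/workload/memory.group_oom";
	int fd = open(knob, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/*
	 * "1": if the OOM killer picks any task in this cgroup, it
	 * kills all tasks in the cgroup (except oom-disabled ones),
	 * keeping the workload in a consistent state.
	 */
	if (write(fd, "1", 1) != 1) {
		perror("write");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}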
> > * More precisely: if the OOM killer kills a task,
> > it will traverse the cgroup tree up to the OOM domain (the OOMing
> > memcg or root), looking for the highest-level cgroup with group_oom
> > set. Then it will kill all tasks in that cgroup, if one exists.
> 
> All such processes that are not oom disabled, yes.

Yep, of course.

Thanks!
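
For reference, the escalation rule in the footnote can be modeled in a
few lines of userspace C; the types and names below are illustrative
only, not the kernel's, and the real selection logic would live in
mm/oom_kill.c:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct cgroup {
	const char *name;
	struct cgroup *parent;
	bool group_oom;
};

/*
 * Walk from the victim's cgroup up to the OOM domain, remembering the
 * highest-level cgroup with group_oom set. NULL means no group kill:
 * only the single victim task dies.
 */
static struct cgroup *oom_kill_group(struct cgroup *victim_cg,
				     struct cgroup *oom_domain)
{
	struct cgroup *target = NULL;
	struct cgroup *cg;

	for (cg = victim_cg; cg; cg = cg->parent) {
		if (cg->group_oom)
			target = cg;	/* keep the highest-level match */
		if (cg == oom_domain)
			break;		/* never escape the OOM domain */
	}
	return target;
}

int main(void)
{
	struct cgroup root = { "root", NULL, false };
	struct cgroup job  = { "job", &root, true };	/* group_oom=1 */
	struct cgroup leaf = { "leaf", &job, false };
	struct cgroup *t = oom_kill_group(&leaf, &root);

	printf("kill whole group: %s\n",
	       t ? t->name : "(none, kill one task)");	/* -> "job" */
	return 0;
}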