From: Johannes Weiner <hannes@cmpxchg.org>
To: Roman Gushchin <guro@fb.com>
Cc: linux-mm@kvack.org, Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v10 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer
Date: Wed, 4 Oct 2017 16:04:53 -0400 [thread overview]
Message-ID: <20171004200453.GE1501@cmpxchg.org> (raw)
In-Reply-To: <20171004154638.710-6-guro@fb.com>
On Wed, Oct 04, 2017 at 04:46:37PM +0100, Roman Gushchin wrote:
> Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware
> OOM killer. If not set, the OOM selection is performed in
> a "traditional" per-process way.
>
> The behavior can be changed dynamically by remounting the cgroupfs.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> include/linux/cgroup-defs.h | 5 +++++
> kernel/cgroup/cgroup.c | 10 ++++++++++
> mm/memcontrol.c | 3 +++
> 3 files changed, 18 insertions(+)
>
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 3e55bbd31ad1..cae5343a8b21 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -80,6 +80,11 @@ enum {
> * Enable cpuset controller in v1 cgroup to use v2 behavior.
> */
> CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
> +
> + /*
> + * Enable cgroup-aware OOM killer.
> + */
> + CGRP_GROUP_OOM = (1 << 5),
> };
>
> /* cftype->flags */
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index c3421ee0d230..8d8aa46ff930 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1709,6 +1709,9 @@ static int parse_cgroup_root_flags(char *data, unsigned int *root_flags)
> if (!strcmp(token, "nsdelegate")) {
> *root_flags |= CGRP_ROOT_NS_DELEGATE;
> continue;
> + } else if (!strcmp(token, "groupoom")) {
> + *root_flags |= CGRP_GROUP_OOM;
> + continue;
> }
>
> pr_err("cgroup2: unknown option \"%s\"\n", token);
> @@ -1725,6 +1728,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
> cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
> else
> cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
> +
> + if (root_flags & CGRP_GROUP_OOM)
> + cgrp_dfl_root.flags |= CGRP_GROUP_OOM;
> + else
> + cgrp_dfl_root.flags &= ~CGRP_GROUP_OOM;
> }
> }
>
> @@ -1732,6 +1740,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
> {
> if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
> seq_puts(seq, ",nsdelegate");
> + if (cgrp_dfl_root.flags & CGRP_GROUP_OOM)
> + seq_puts(seq, ",groupoom");
> return 0;
> }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1fcd6cc353d5..2e82625bd354 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2865,6 +2865,9 @@ bool mem_cgroup_select_oom_victim(struct oom_control *oc)
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return false;
>
> + if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
> + return false;
That will silently ignore what the user writes to the memory.oom_group
control files across the system's cgroup tree.
We'll have a knob that lets the workload declare itself an indivisible
memory consumer, that it would like to get killed in one piece, and
it's silently ignored because of a mount option they forgot to pass.
That's not good from an interface perspective.
On the other hand, the only benefit of this patch is to shield users
from changes to the OOM killing heuristics. Yet, it's really hard to
imagine that modifying the victim selection process slightly could be
called a regression in any way. We have done that many times over,
without a second thought on backwards compatibility:
5e9d834a0e0c oom: sacrifice child with highest badness score for parent
a63d83f427fb oom: badness heuristic rewrite
778c14affaf9 mm, oom: base root bonus on current usage
Let's not make the userspace interface crap because of some misguided
idea that the OOM heuristic is a hard promise to userspace. It's never
been, and nobody has complained about changes in the past.
This case is doubly silly, as the behavior change only applies to
cgroup2, which doesn't exactly have a large base of legacy users yet.
Let's just drop this 5/6 patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-10-04 20:04 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-04 15:46 [v10 0/6] " Roman Gushchin
2017-10-04 15:46 ` [v10 1/6] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-10-04 19:14 ` Johannes Weiner
2017-10-04 15:46 ` [v10 2/6] mm: implement mem_cgroup_scan_tasks() for the root memory cgroup Roman Gushchin
2017-10-04 19:15 ` Johannes Weiner
2017-10-04 20:10 ` David Rientjes
2017-10-04 15:46 ` [v10 3/6] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-10-04 19:27 ` Johannes Weiner
2017-10-04 19:51 ` Roman Gushchin
2017-10-04 20:17 ` David Rientjes
2017-10-04 20:22 ` Roman Gushchin
2017-10-04 20:31 ` Johannes Weiner
2017-10-05 11:14 ` Michal Hocko
2017-10-04 19:48 ` Shakeel Butt
2017-10-04 20:15 ` Roman Gushchin
2017-10-04 21:24 ` Shakeel Butt
2017-10-05 10:27 ` Roman Gushchin
2017-10-05 11:12 ` Michal Hocko
2017-10-05 11:45 ` Roman Gushchin
2017-10-04 20:27 ` David Rientjes
2017-10-04 20:41 ` Johannes Weiner
2017-10-05 8:40 ` David Rientjes
2017-10-05 10:27 ` Johannes Weiner
2017-10-05 21:53 ` David Rientjes
2017-10-05 10:44 ` Roman Gushchin
2017-10-05 22:02 ` David Rientjes
2017-10-06 5:43 ` Michal Hocko
2017-10-05 11:40 ` Michal Hocko
2017-10-04 15:46 ` [v10 4/6] mm, oom: introduce memory.oom_group Roman Gushchin
2017-10-04 19:37 ` Johannes Weiner
2017-10-05 12:06 ` Michal Hocko
2017-10-05 12:32 ` Roman Gushchin
2017-10-05 12:58 ` Michal Hocko
2017-10-04 15:46 ` [v10 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer Roman Gushchin
2017-10-04 20:04 ` Johannes Weiner [this message]
2017-10-05 13:14 ` Michal Hocko
2017-10-05 13:41 ` Roman Gushchin
2017-10-05 14:10 ` Michal Hocko
2017-10-05 14:54 ` Johannes Weiner
2017-10-05 16:40 ` Michal Hocko
2017-10-05 15:51 ` Tejun Heo
2017-10-04 15:46 ` [v10 6/6] mm, oom, docs: describe the " Roman Gushchin
2017-10-04 20:08 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171004200453.GE1501@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=kernel-team@fb.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
--cc=tj@kernel.org \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox