From: David Rientjes <rientjes@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: linux-mm@kvack.org, Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v5 2/4] mm, oom: cgroup-aware OOM killer
Date: Sun, 20 Aug 2017 17:50:27 -0700 (PDT)
Message-ID: <alpine.DEB.2.10.1708201741330.117182@chino.kir.corp.google.com>
In-Reply-To: <20170816154325.GB29131@castle.DHCP.thefacebook.com>
On Wed, 16 Aug 2017, Roman Gushchin wrote:
> It's natural to expect that inside a container there are their own sshd,
> "activity manager" or some other stuff, which can play with oom_score_adj.
> If it can override the upper cgroup-level settings, the whole delegation model
> is broken.
>
I don't think any delegation model related to core cgroups or the memory
cgroup is broken; I think it comes down to how memory.oom_kill_all_tasks
is defined. It could very well behave as
memory.oom_kill_all_eligible_tasks when enacted upon.
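
To make the "eligible tasks" semantics concrete, here is a minimal sketch
of the selection rule, written as a userspace model rather than kernel
code. The Task type and the tasks_to_kill() helper are hypothetical names
used only for illustration; the one real constant is OOM_SCORE_ADJ_MIN
(-1000), which marks a task as oom disabled.

```python
from dataclasses import dataclass

# Real kernel constant: an oom_score_adj of -1000 marks a task as oom disabled.
OOM_SCORE_ADJ_MIN = -1000


@dataclass
class Task:
    """Hypothetical stand-in for a task attached to a memory cgroup."""
    pid: int
    comm: str
    oom_score_adj: int


def tasks_to_kill(cgroup_tasks):
    """Model of an 'oom_kill_all_eligible_tasks' policy (an assumption,
    not the patch's behavior): select every task in the cgroup except
    those that have been set to OOM_SCORE_ADJ_MIN."""
    return [t for t in cgroup_tasks if t.oom_score_adj != OOM_SCORE_ADJ_MIN]


tasks = [
    Task(101, "sshd", OOM_SCORE_ADJ_MIN),  # oom disabled, must survive
    Task(102, "worker", 0),
    Task(103, "batch", 500),
]
victims = tasks_to_kill(tasks)
```

With this rule, the cgroup-level knob still kills everything it can, but
a task that was explicitly made oom disabled keeps that protection.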
> You can think about the oom_kill_all_tasks like the panic_on_oom,
> but on a cgroup level. It should _guarantee_, that in case of oom
> the whole cgroup will be destroyed completely, and will not remain
> in a non-consistent state.
>
Only CAP_SYS_ADMIN has the ability to set /proc/pid/oom_score_adj to
OOM_SCORE_ADJ_MIN, so it preserves the ability to change that setting, if
needed, when it sets memory.oom_kill_all_tasks. If a user gains
permission to change memory.oom_kill_all_tasks, I disagree that it should
override the CAP_SYS_ADMIN setting of /proc/pid/oom_score_adj.

I would prefer not to split oom disabled processes out into their own
sibling cgroups, because they would require their own reservation with
cgroup v2 and it makes the single hierarchy model much more difficult to
arrange alongside cpusets, for example.
> The model you're describing is based on a trust given to these oom-unkillable
> processes on system level. But we can't really trust some unknown processes
> inside a cgroup that they will be able to do some useful work and finish
> in a reasonable time; especially in case of a global memory shortage.
Yes, we prefer to panic instead of having sshd, for example, oom killed.
We trust sshd, as well as our own activity manager and security daemons,
to do useful work, and we never want the kernel to kill them. I'm not
sure why you are describing processes that CAP_SYS_ADMIN has set to be
oom disabled as unknown processes.

I'd be interested in hearing the opinions of others related to a per-memcg
knob being allowed to override the setting of the sysadmin.