From: Michal Hocko <mhocko@kernel.org>
To: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov@tarantool.org>,
Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
Li Zefan <lizefan@huawei.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm, oom: cgroup-aware OOM-killer
Date: Tue, 23 May 2017 09:07:47 +0200 [thread overview]
Message-ID: <20170523070747.GF12813@dhcp22.suse.cz> (raw)
In-Reply-To: <20170522170116.GB22625@castle>
On Mon 22-05-17 18:01:16, Roman Gushchin wrote:
> On Sat, May 20, 2017 at 09:37:29PM +0300, Vladimir Davydov wrote:
> > Hello Roman,
>
> Hi Vladimir!
>
> >
> > On Thu, May 18, 2017 at 05:28:04PM +0100, Roman Gushchin wrote:
> > ...
> > > +5-2-4. Cgroup-aware OOM Killer
> > > +
> > > +Cgroup v2 memory controller implements a cgroup-aware OOM killer.
> > > +It means that it treats memory cgroups as memory consumers
> > > +rather then individual processes. Under the OOM conditions it tries
> > > +to find an elegible leaf memory cgroup, and kill all processes
> > > +in this cgroup. If it's not possible (e.g. all processes belong
> > > +to the root cgroup), it falls back to the traditional per-process
> > > +behaviour.
> >
> > I agree that the current OOM victim selection algorithm is totally
> > unfair in a system using containers and it has been crying for rework
> > for the last few years now, so it's great to see this finally coming.
> >
> > However, I don't reckon that killing a whole leaf cgroup is always the
> > best practice. It does make sense when cgroups are used for
> > containerizing services or applications, because a service is unlikely
> > to remain operational after one of its processes is gone, but one can
> > also use cgroups to containerize processes started by a user. Kicking a
> > user out for one of her process has gone mad doesn't sound right to me.
>
> I agree, that it's not always a best practise, if you're not allowed
> to change the cgroup configuration (e.g. create new cgroups).
> IMHO, this case is mostly covered by using the v1 cgroup interface,
> which remains unchanged.
But there are features which are v2 only and users might really want to
use it. So I really do not buy this v2-only argument.
> If you do have control over cgroups, you can put processes into
> separate cgroups, and obtain control over OOM victim selection and killing.
Usually you do not have that control because there is a global daemon
doing the placement for you.
> > Another example when the policy you're suggesting fails in my opinion is
> > in case a service (cgroup) consists of sub-services (sub-cgroups) that
> > run processes. The main service may stop working normally if one of its
> > sub-services is killed. So it might make sense to kill not just an
> > individual process or a leaf cgroup, but the whole main service with all
> > its sub-services.
>
> I agree, although I do not pretend for solving all possible
> userspace problems caused by an OOM.
>
> How to react on an OOM - is definitely a policy, which depends
> on the workload. Nothing is changing here from how it's working now,
> except now kernel will choose a victim cgroup, and kill the victim cgroup
> rather than a process.
There is a _big_ difference. The current implementation just tries
to recover from the OOM situation without carying much about the
consequences on the workload. This is the last resort and a services for
the _system_ to get back to sane state. You are trying to make it more
clever and workload aware and that is inevitable going to depend on the
specific workload. I really do think we cannot simply hardcode any
policy into the kernel for this purpose and that is why I would like to
see a discussion about how to do that in a more extensible way. This
might be harder to implement now but it I believe it will turn out
better longerm.
> > And both kinds of workloads (services/applications and individual
> > processes run by users) can co-exist on the same host - consider the
> > default systemd setup, for instance.
> >
> > IMHO it would be better to give users a choice regarding what they
> > really want for a particular cgroup in case of OOM - killing the whole
> > cgroup or one of its descendants. For example, we could introduce a
> > per-cgroup flag that would tell the kernel whether the cgroup can
> > tolerate killing a descendant or not. If it can, the kernel will pick
> > the fattest sub-cgroup or process and check it. If it cannot, it will
> > kill the whole cgroup and all its processes and sub-cgroups.
>
> The last thing we want to do, is to compare processes with cgroups.
> I agree, that we can have some option to disable the cgroup-aware OOM at all,
> mostly for backward-compatibility. But I don't think it should be a
> per-cgroup configuration option, which we will support forever.
I can clearly see a demand for "this is definitely more important
container than others so do not kill" usecases. I can also see demand
for "do not kill this container running for X days". And more are likely
to pop out.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-05-23 7:07 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-18 16:28 Roman Gushchin
2017-05-18 17:30 ` Michal Hocko
2017-05-18 18:11 ` Johannes Weiner
2017-05-19 8:02 ` Michal Hocko
2017-05-18 18:37 ` Balbir Singh
2017-05-18 19:20 ` Roman Gushchin
2017-05-18 19:41 ` Balbir Singh
2017-05-18 19:22 ` Johannes Weiner
2017-05-18 19:43 ` Balbir Singh
2017-05-18 20:15 ` Johannes Weiner
2017-05-20 18:37 ` Vladimir Davydov
2017-05-22 17:01 ` Roman Gushchin
2017-05-23 7:07 ` Michal Hocko [this message]
2017-05-23 13:25 ` Johannes Weiner
2017-05-25 15:38 ` Michal Hocko
2017-05-25 17:08 ` Johannes Weiner
2017-05-31 16:25 ` Michal Hocko
2017-05-31 18:01 ` Johannes Weiner
2017-06-02 8:43 ` Michal Hocko
2017-06-02 15:18 ` Roman Gushchin
2017-06-05 8:27 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170523070747.GF12813@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizefan@huawei.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=tj@kernel.org \
--cc=vdavydov@tarantool.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox