linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH] mm, memcg: introduce per memcg oom_score_adj
Date: Thu, 22 Aug 2019 12:59:18 +0200
Message-ID: <20190822105918.GH12785@dhcp22.suse.cz>
In-Reply-To: <CALOAHbAOH+Y+sN3ynAiBDm=JWrm4XpyUm8s3r9G=Oz4b0iNvCA@mail.gmail.com>

On Thu 22-08-19 17:34:54, Yafang Shao wrote:
> On Thu, Aug 22, 2019 at 5:19 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 22-08-19 04:56:29, Yafang Shao wrote:
> > > - Why do we need a per memcg oom_score_adj setting?
> > > It is easy to deploy and very convenient for containers.
> > > When we use containers, we always treat a memcg as a whole; with a per
> > > memcg oom_score_adj setting we don't need to set it process by process.
> >
> > Why cannot an initial process in the cgroup set the oom_score_adj and
> > other processes just inherit it from there? This sounds trivial to do
> > with a startup script.
> >
> 
> That is what we used to do before.
> But it can't be applied to containers that are already running.
> 
> 
> > > It is exhausting for the user to set it on every process in a memcg.
> >
> > Then let's have scripts to set it as they are less prone to exhaustion
> > ;)
> 
> That is not easy to deploy in a production environment.

What is hard about a simple loop over the task list exported by the
cgroup that applies a value to oom_score_adj?
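
A minimal sketch of such a loop, assuming the process list is read from
the cgroup's cgroup.procs file and the value is written to
/proc/<pid>/oom_score_adj. The function name and the proc_root parameter
are made up for illustration and are not part of any existing tool:

```python
import os


def set_cgroup_oom_score_adj(cgroup_dir, value, proc_root="/proc"):
    """Write `value` to oom_score_adj of every process in cgroup.procs.

    Returns the list of PIDs that were updated.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be in [-1000, 1000]")

    # cgroup.procs holds one PID per line.
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as procs:
        pids = [line.strip() for line in procs if line.strip()]

    updated = []
    for pid in pids:
        path = os.path.join(proc_root, pid, "oom_score_adj")
        try:
            with open(path, "w") as f:
                f.write(str(value))
        except (FileNotFoundError, ProcessLookupError):
            # The process exited after cgroup.procs was read; skip it.
            continue
        updated.append(int(pid))
    return updated
```

Something like set_cgroup_oom_score_adj("/sys/fs/cgroup/mycontainer", 500)
would then cover a running container, where the path is a hypothetical
example. Processes forked while the loop runs can be missed, so a second
pass may be needed.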

[...]

> > Besides that, what are the hierarchical semantics? Say you have the hierarchy
> >         A (oom_score_adj = 1000)
> >          \
> >           B (oom_score_adj = 500)
> >            \
> >             C (oom_score_adj = -1000)
> >
> > Put the summing up above aside for now and just focus on the memcg
> > adjustment.
> 
> I think there is no conflict between children's oom_score_adj values,
> which is different from memory.max.
> So it is not necessary to consider the parent's oom_score_adj.

Each exported cgroup tunable _has_ to be hierarchical so that an admin
can override the children's settings in order to safely delegate the
configuration.
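
To illustrate what hierarchical means for a limit such as memory.max: a
child's configured value only takes effect within what its ancestors
allow. The sketch below models that as a minimum over the ancestor
chain. The function name, the group names and the numbers are made up,
and this is not a proposal for how oom_score_adj should combine across
levels, which is exactly the open question:

```python
def effective_limit(limits, parent, group):
    """Return the smallest limit on the path from `group` up to the root.

    `limits` maps a group name to its configured limit and `parent` maps
    a group name to its parent (the root has no entry or maps to None).
    """
    effective = limits[group]
    while parent.get(group) is not None:
        group = parent[group]
        effective = min(effective, limits[group])
    return effective


# Hypothetical hierarchy A -> B -> C with limits in GiB.
limits = {"A": 4, "B": 8, "C": 2}
parent = {"A": None, "B": "A", "C": "B"}
```

Here B is configured with 8 but is effectively capped at 4 by A, so
whoever B is delegated to cannot exceed what the admin set on A.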

Last but not least, oom_score_adj has proven to be a terrible interface
that is close to unusable for anything outside of the extreme values
(-1000 and, very arguably, 1000). Making it cgroup aware without
changing oom victim selection to consider a cgroup as a whole will also
be a pain, so I am afraid that this is a dead end.

We can discuss cgroup aware oom victim selection for sure and there are
certainly reasonable usecases to back that functionality. Please refer
to the discussion from 2017/2018 (dubbed "cgroup-aware OOM killer"). But
be warned that this is a tricky area and there was a fundamental
disagreement on how things should be classified, without a clear way to
reach consensus. What we have right now is the only agreement we could
reach. It is quite possible that any more clever cgroup aware oom
selection has to be implemented in userspace with an understanding of
the specific workload.
-- 
Michal Hocko
SUSE Labs



Thread overview: 6+ messages
2019-08-22  8:56 Yafang Shao
2019-08-22  9:19 ` Michal Hocko
2019-08-22  9:34   ` Yafang Shao
2019-08-22 10:59     ` Michal Hocko [this message]
2019-08-22 22:46       ` Roman Gushchin
2019-08-23  1:26         ` Yafang Shao
