Re: [v10 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Vladimir Davydov <vdavydov.dev@gmail.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v10 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer
Date: Thu, 5 Oct 2017 10:54:01 -0400	[thread overview]
Message-ID: <20171005144803.GA5733@cmpxchg.org> (raw)
In-Reply-To: <20171005131419.4o6qynsl2qxomekb@dhcp22.suse.cz>

On Thu, Oct 05, 2017 at 03:14:19PM +0200, Michal Hocko wrote:
> On Wed 04-10-17 16:04:53, Johannes Weiner wrote:
> [...]
> > That will silently ignore what the user writes to the memory.oom_group
> > control files across the system's cgroup tree.
> > 
> > We'll have a knob that lets the workload declare itself an indivisible
> > memory consumer, that it would like to get killed in one piece, and
> > it's silently ignored because of a mount option they forgot to pass.
> > 
> > That's not good from an interface perspective.
> 
> Yes and that is why I think a boot time knob would be the most simple
> way. It will also open doors for more oom policies in future which I
> believe come sooner or later.

A boot time knob makes less sense to me than the mount option. It
doesn't require a reboot to change this behavior, we shouldn't force
the user to reboot when a runtime configuration is possible.

But I don't see how dropping this patch as part of this series would
prevent adding modular oom policies in the future?

That said, selectable OOM policies sound like a total deadend to
me. The kernel OOM happens way too late to be useful for any kind of
resource policy already. Even now it won't prevent you from thrashing
indefinitely, with only 5% of your workload's time spent productively.

What kind of service quality do you have at this point?

The *minority* of our OOM situations (in terms of "this isn't making
real progress anymore due to a lack of memory") is even *seeing* OOM
kills at this point. And it'll get worse as storage gets faster and
memory bigger.

How is that useful as a resource arbitration point?

Then there is the question of reliability. I mean, we still don't have
a global OOM killer that is actually free from deadlocks. We don't
have reserves measured to the exact requirements of reclaim that would
guarantee recovery, the OOM reaper requires a lock that we hope isn't
taken, etc. I wouldn't want any of my fleet to rely on this for
regular operation - I'm just glad that, when we do mess up and hit
this event, we don't have to reboot.

It makes much more sense to monitor memory pressure from userspace and
smartly intervene when things turn unproductive, which is a long way
from the point where the kernel is about to *deadlock* due to memory.

Global OOM kills can still happen, but their goal should really be 1)
to save the kernel, 2) respect the integrity of a memory consumer and
3) be comprehensible to userspace. (These patches are about 2 and 3.)

But abstracting such a rudimentary and fragile deadlock avoidance
mechanism into higher-level resource management, or co-opting it as a
policy enforcement tool, is crazy to me.

And it seems reckless to present it as those things to our users by
encoding any such elaborate policy interfaces.

> > On the other hand, the only benefit of this patch is to shield users
> > from changes to the OOM killing heuristics. Yet, it's really hard to
> > imagine that modifying the victim selection process slightly could be
> > called a regression in any way. We have done that many times over,
> > without a second thought on backwards compatibility:
> > 
> > 5e9d834a0e0c oom: sacrifice child with highest badness score for parent
> > a63d83f427fb oom: badness heuristic rewrite
> > 778c14affaf9 mm, oom: base root bonus on current usage
> 
> yes we have changed that without a deeper considerations. Some of those
> changes are arguable (e.g. child scarification). The oom badness
> heuristic rewrite has triggered quite some complains AFAIR (I remember
> Kosaki has made several attempts to revert it). I think that we are
> trying to be more careful about user visible changes than we used to be.

Whatever grumbling might have come up, it has not resulted in a revert
or a way to switch back to the old behavior. So I don't think this can
be considered an actual regression.

We change heuristics in the MM all the time. If you track for example
allocator behavior over different kernel versions, you can see how
much our caching policy, our huge page policy etc. fluctuates. The
impact of that is way bigger to regular workloads than how we go about
choosing an OOM victim.

We don't want to regress anybody, but let's also keep perspective here
and especially consider the userspace interfaces we are willing to put
in for at least the next few years, the promises we want to make, the
further fragmentation of the config space, for such a negligible risk.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2017-10-05 14:54 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-04 15:46 [v10 0/6] " Roman Gushchin
2017-10-04 15:46 ` [v10 1/6] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-10-04 19:14   ` Johannes Weiner
2017-10-04 15:46 ` [v10 2/6] mm: implement mem_cgroup_scan_tasks() for the root memory cgroup Roman Gushchin
2017-10-04 19:15   ` Johannes Weiner
2017-10-04 20:10   ` David Rientjes
2017-10-04 15:46 ` [v10 3/6] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-10-04 19:27   ` Johannes Weiner
2017-10-04 19:51     ` Roman Gushchin
2017-10-04 20:17       ` David Rientjes
2017-10-04 20:22         ` Roman Gushchin
2017-10-04 20:31         ` Johannes Weiner
2017-10-05 11:14           ` Michal Hocko
2017-10-04 19:48   ` Shakeel Butt
2017-10-04 20:15     ` Roman Gushchin
2017-10-04 21:24       ` Shakeel Butt
2017-10-05 10:27         ` Roman Gushchin
2017-10-05 11:12           ` Michal Hocko
2017-10-05 11:45             ` Roman Gushchin
2017-10-04 20:27   ` David Rientjes
2017-10-04 20:41     ` Johannes Weiner
2017-10-05  8:40       ` David Rientjes
2017-10-05 10:27         ` Johannes Weiner
2017-10-05 21:53           ` David Rientjes
2017-10-05 10:44         ` Roman Gushchin
2017-10-05 22:02           ` David Rientjes
2017-10-06  5:43             ` Michal Hocko
2017-10-05 11:40   ` Michal Hocko
2017-10-04 15:46 ` [v10 4/6] mm, oom: introduce memory.oom_group Roman Gushchin
2017-10-04 19:37   ` Johannes Weiner
2017-10-05 12:06   ` Michal Hocko
2017-10-05 12:32     ` Roman Gushchin
2017-10-05 12:58       ` Michal Hocko
2017-10-04 15:46 ` [v10 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer Roman Gushchin
2017-10-04 20:04   ` Johannes Weiner
2017-10-05 13:14     ` Michal Hocko
2017-10-05 13:41       ` Roman Gushchin
2017-10-05 14:10         ` Michal Hocko
2017-10-05 14:54       ` Johannes Weiner [this message]
2017-10-05 16:40         ` Michal Hocko
2017-10-05 15:51       ` Tejun Heo
2017-10-04 15:46 ` [v10 6/6] mm, oom, docs: describe the " Roman Gushchin
2017-10-04 20:08   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171005144803.GA5733@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox