From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org
Subject: Re: [patch] mm, memcg: add oom killer delay
Date: Wed, 12 Jun 2013 14:27:05 -0700 (PDT)
Message-ID: <alpine.DEB.2.02.1306121408490.24902@chino.kir.corp.google.com>
In-Reply-To: <20130612202348.GA17282@dhcp22.suse.cz>
On Wed, 12 Jun 2013, Michal Hocko wrote:
> But the objective is to handle oom deadlocks gracefully and you cannot
> possibly miss those as they are, well, _deadlocks_.
That's not at all the objective. The changelog quite explicitly states
that this is a deadlock caused by userspace having disabled the oom killer
so that its own userspace oom handler can resolve the condition, and that
handler then being unresponsive or unable to perform its job.
When you allow users to create their own memcgs, which we do and which is
possible by chowning the user's root memcg to that user, and to implement
their own userspace oom notifier, you must then rely on their
implementation to work 100% of the time; otherwise all those gigabytes of
memory go unfreed forever. What you're insisting on is that this
userspace is perfect: that it never allocates any memory (otherwise it
may oom its own user root memcg where the notifier is hosted) and that it
is always responsive and able to handle the situation. This is not reality.
This is why the kernel has its own oom killer and doesn't wait for a user
to go kill something. There's no option to disable the kernel oom
killer, because we don't want to leave the system in a state where no
progress can be made. The intention is the same for memcgs: they should
not be left in a state where no progress can be made, even if userspace
has the best intentions.
Your solution of a global entity to prevent these situations doesn't work
for the same reason we can't implement the kernel oom killer in userspace.
It's the exact same reason. We also want to push patches that allow
global oom conditions to trigger an eventfd notification on the root memcg
with the exact same semantics of a memcg oom: allow it time to respond but
step in and kill something if it fails to respond. Memcg happens to be
the perfect place to implement such a userspace policy and we want to have
a priority-based killing mechanism that is hierarchical and different from
oom_score_adj.
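To make the "allow it time to respond but step in" semantics concrete,
here is a userspace model of the decision only. It is not kernel code and
not the patch itself; the names, the enum, and the use of an eventfd as
the acknowledgement channel are all assumptions made for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum oom_outcome {
	OOM_HANDLED_BY_USERSPACE,	/* handler responded within the grace period */
	OOM_KERNEL_FALLBACK,		/* handler was silent: kill something */
};

/* Hypothetical acknowledgement channel between handler and watcher. */
int make_ack_fd(void)
{
	return eventfd(0, EFD_CLOEXEC);
}

/* Called by the (hypothetical) handler once it has resolved the oom. */
int ack_oom(int ack_fd)
{
	uint64_t one = 1;

	return write(ack_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Wait up to grace_ms for the handler to acknowledge.  Any timeout or
 * error falls back to the kill, so a broken handler cannot block
 * progress forever. */
enum oom_outcome wait_for_userspace(int ack_fd, int grace_ms)
{
	struct pollfd pfd = { .fd = ack_fd, .events = POLLIN };
	uint64_t count;

	/* A negative timeout would make poll() block indefinitely. */
	assert(grace_ms >= 0);

	if (poll(&pfd, 1, grace_ms) > 0 && (pfd.revents & POLLIN) &&
	    read(ack_fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
		return OOM_HANDLED_BY_USERSPACE;

	return OOM_KERNEL_FALLBACK;
}
```

The point of the shape is that the fallback path depends on nothing the
handler does: silence, a crash, or a hang all end in the same place.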
For that to work properly, the userspace handler cannot possibly allocate
memory, even on page fault, so it must be mlocked in memory and have
enough buffers to store the priorities of top-level memcgs. Asking a
global watchdog to sit there mlocked in memory and store the priorities,
last oom, timeouts, etc. of thousands of memcgs is a non-starter.
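As a rough sketch of what "mlocked with enough buffers" means for a
per-user handler: a fixed-size table reserved at build time and pinned
with mlockall(2), so nothing is allocated or faulted in at oom time. The
table size, the struct layout, and the "lowest value is killed first"
convention are assumptions for illustration, not our actual
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define MAX_MEMCGS	64	/* assumed bound on top-level memcgs per user */
#define NAME_LEN	64

struct memcg_prio {
	char name[NAME_LEN];
	int priority;		/* assumption: lowest value is killed first */
	int in_use;
};

/* Static storage: reserved up front, never malloc'd at oom time. */
static struct memcg_prio prio_table[MAX_MEMCGS];

/* Pin current and future mappings so the handler never page faults.
 * Can fail if RLIMIT_MEMLOCK is too low for the process. */
int lock_handler_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Record a priority in a preallocated slot.  Returns 0, or -1 if the
 * slot is out of range. */
int set_priority(size_t slot, const char *name, int priority)
{
	struct memcg_prio *e;

	assert(name != NULL);
	if (slot >= MAX_MEMCGS)
		return -1;

	e = &prio_table[slot];
	strncpy(e->name, name, NAME_LEN - 1);
	e->name[NAME_LEN - 1] = '\0';
	e->priority = priority;
	e->in_use = 1;
	return 0;
}

/* Pick the in-use entry with the lowest priority, or NULL if none. */
const struct memcg_prio *pick_victim(void)
{
	const struct memcg_prio *victim = NULL;
	size_t i;

	for (i = 0; i < MAX_MEMCGS; i++) {
		if (!prio_table[i].in_use)
			continue;
		if (!victim || prio_table[i].priority < victim->priority)
			victim = &prio_table[i];
	}
	return victim;
}
```

This is cheap for one user's handful of memcgs. Scaling the same table
to every memcg on the machine, plus per-memcg timeouts and history, is
what makes the single global watchdog unattractive.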
I don't buy your argument that we're pushing any interface to an extreme.
Users having the ability to manipulate their own memcgs and subcontainers
isn't extreme, it's explicitly allowed by cgroups! What we're asking for
is that this level of control for memcg be sane, and that if userspace is
unresponsive we don't lose gigabytes of memory forever. And since we
supported this type of functionality for cpusets even before memcg was
created, and have used and supported it for six years, I have no problem
supporting such a thing upstream.
I do understand that we're the largest user of memcg and use it unlike you
or others on this thread do, but that doesn't mean our usecase is any less
important or that we shouldn't aim for the most robust behavior possible.