From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
linux-mm@kvack.org
Subject: Re: [patch] memcg: add oom killer delay
Date: Mon, 7 Mar 2011 17:33:44 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1103071721330.25197@chino.kir.corp.google.com>
In-Reply-To: <20110307171853.c31ec416.akpm@linux-foundation.org>
On Mon, 7 Mar 2011, Andrew Morton wrote:
> > It could be, if users assign the handler to a different memcg; otherwise,
> > it's guaranteed.
>
> Putting the handler into the same container would be rather daft.
>
> If userspace is going to elect to take over a kernel function then it
> should be able to perform that function reliably. We don't have hacks
> in the kernel to stop runaway SCHED_FIFO tasks, either. If the oom
> handler has put itself into a memcg and then has permitted that memcg
> to go oom then userspace is busted.
>
We have a container specifically for daemons like this and have struggled
for years to accurately predict how much memory it needs and what to do
when it is oom. The problem, in this case, is that by the time it's oom it's
too late: the memcg is livelocked, no memory limit on the system has a
chance of being increased, and nothing in an oom memcg is guaranteed to
ever make forward progress again.
That's why I keep bringing up the point that this patch is not a bugfix:
it's an extension of a feature (memory.oom_control) to allow userspace a
period of time to respond to memcgs reaching their hard limit before
killing something. For our container with vital system daemons, this is
absolutely mandatory if something consumes a large amount of memory and
needs to be restarted; we want the logic in userspace to determine what to
do without killing vital tasks or panicking. We want to use the oom
killer only as a last resort and that can effectively be done with this
patch and not with memory.oom_control (and I think this is why Kame acked
it).
> My issue with this patch is that it extends the userspace API. This
> means we're committed to maintaining that interface *and its behaviour*
> for evermore. But the oom-killer and memcg are both areas of intense
> development and the former has a habit of getting ripped out and
> rewritten. Committing ourselves to maintaining an extension to the
> userspace interface is a big thing, especially as that extension is
> somewhat tied to internal implementation details and is most definitely
> tied to short-term inadequacies in userspace and in the kernel
> implementation.
>
The same could have been said for memory.oom_control, which disables the
oom killer entirely and now seems to have solidified as the only way to
influence oom killer behavior from the kernel. We're now locked into that
limitation because we don't want dual interfaces. I think this patch would
have been received much better prior to memory.oom_control, since it allows
for the same behavior with an infinite timeout. memory.oom_control is not
an option for us because we can't guarantee that any userspace daemon at
our scale will be responsive 100% of the time.
I don't think the idea of a userspace grace period when a memcg is oom is
that abstract, though. I think applications should have the opportunity
to free some of their own memory first when oom instead of abruptly
killing something and restarting it.
So, in the end, we may have to carry this patch internally forever, but I
think that as memcg becomes more popular there will be higher demand for
such a grace period.