Re: user defined OOM policies

From: David Rientjes <rientjes@google.com>
To: Luigi Semenzato <semenzato@google.com>
Cc: Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, Greg Thelen <gthelen@google.com>,
	Glauber Costa <glommer@gmail.com>, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, Joern Engel <joern@logfs.org>,
	Hugh Dickins <hughd@google.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: user defined OOM policies
Date: Mon, 25 Nov 2013 17:29:20 -0800 (PST)
Message-ID: <alpine.DEB.2.02.1311251717370.27270@chino.kir.corp.google.com>
In-Reply-To: <CAA25o9Q64eK5LHhrRyVn73kFz=Z7Jji=rYWS=9jWL_4y9ZGbQA@mail.gmail.com>

On Wed, 20 Nov 2013, Luigi Semenzato wrote:

> Yes, I agree that we can't always prevent OOM situations, and in fact
> we tolerate OOM kills, although they have a worse impact on the users
> than controlled freeing does.
> 

If the controlled freeing is actually able to free memory in time, before
hitting an oom condition, it should work pretty well.  That ability seems
to be highly dependent on sane thresholds for individual applications, and
I'm afraid we can never positively ensure that we wake up and are able to
free memory in time to avoid the oom condition.

> Well OK here it goes.  I hate to be a party-pooper, but the notion of
> a user-level OOM-handler scares me a bit for various reasons.
> 
> 1. Our custom notifier sends low-memory warnings well ahead of memory
> depletion.  If we don't have enough time to free memory then, what can
> the last-minute OOM handler do?
> 

The userspace oom handler doesn't necessarily guarantee that you can do
memory freeing; our use case wants to do priority-based oom killing, which
is different from the kernel oom killer's selection based on rss.  To do
that, you only really need to read certain proc files, and you can do
killing based on uptime, for example.  You can also do a hierarchical
traversal of memcgs based on a priority.
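
As a rough illustration of that kind of policy, here is a minimal Python
sketch, not the handler we actually run.  It assumes each candidate has a
priority assigned by userspace (the priority values and the tie-break rule
are hypothetical) and uses the start time from field 22 of /proc/<pid>/stat
as a stand-in for uptime.

```python
def parse_starttime(stat_line):
    """Return starttime (field 22 of /proc/<pid>/stat), in clock ticks since boot."""
    # comm (field 2) may itself contain spaces or parentheses, so split
    # after the last ')' rather than on whitespace from the start.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def read_starttime(pid):
    """Read the start time of a live process from procfs."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_starttime(f.read())


def pick_victim(candidates):
    """Pick a pid from an iterable of (pid, priority, starttime) tuples.

    Assumed policy (hypothetical): the lowest priority goes first; among
    equal priorities, the most recently started process is chosen.
    """
    return min(candidates, key=lambda c: (c[1], -c[2]))[0]
```

A handler would build the candidate list with read_starttime() for each pid
in its own priority table and then signal the pid that pick_victim() returns.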

We already have hooks in the kernel oom killer, things like 
/proc/sys/vm/oom_kill_allocating_task and /proc/sys/vm/panic_on_oom that 
implement different policies that could now trivially be done in userspace 
with memory reserves and a timeout.  The point is that we can't possibly 
encode every possible policy into the kernel and there's no reason why 
userspace can't do the kill itself.

> 2. In addition to the time factor, it's not trivial to do anything,
> including freeing memory, without allocating memory first, so we'll
> need a reserve, but how much, and who is allowed to use it?
> 

The reserve is configurable in the proposal as a memcg precharge, and its
size would depend on the memory needed by the userspace oom handler at
wakeup.  Only processes that are waiting on memory.oom_control have access
to the memory reserve.
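
For reference, waiting on memory.oom_control today means registering an
eventfd through cgroup.event_control in the cgroup v1 memory controller.
The Python sketch below shows only that existing notification mechanism,
not the proposed reserve.  It assumes Linux, Python 3.10 or later for
os.eventfd(), and a caller-supplied memcg directory; the helper names are
made up for illustration.

```python
import os
import struct


def registration_line(event_fd, control_fd):
    """Line written to cgroup.event_control: '<event_fd> <fd of memory.oom_control>'."""
    return f"{event_fd} {control_fd}"


def wait_for_oom(memcg_dir):
    """Block until the memcg at memcg_dir signals an oom event (cgroup v1)."""
    efd = os.eventfd(0)  # Linux only, Python 3.10+
    cfd = os.open(os.path.join(memcg_dir, "memory.oom_control"), os.O_RDONLY)
    try:
        with open(os.path.join(memcg_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(registration_line(efd, cfd))
        # Reading an eventfd blocks until its counter is nonzero and then
        # returns the counter as an 8-byte integer in host byte order.
        (count,) = struct.unpack("=Q", os.read(efd, 8))
        return count
    finally:
        os.close(cfd)
        os.close(efd)
```

A real handler would call wait_for_oom() in a loop from a process that has
already allocated and locked everything it needs to act on the event.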

> 3. How does one select the OOM-handler timeout?  If the freeing paths
> in the code are swapped out, the time needed to bring them in can be
> highly variable.
> 

The userspace oom handler itself is mlocked in memory.  You'd want to
select a timeout that is large enough to only react in situations where
userspace is known to be unresponsive; it's only meant as a failsafe to
avoid the memcg sitting around forever not making any forward progress.
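
The failsafe behaviour can be modelled in a few lines of Python.  This is
only a toy model of the idea, not the kernel implementation: the function
name, the callbacks and the return values are all hypothetical, and a thread
stands in for the userspace handler.

```python
import threading


def handle_oom(userspace_handler, timeout_s, kernel_fallback):
    """Give the userspace handler timeout_s seconds, then fall back.

    Returns "userspace" if the handler finished in time, otherwise calls
    kernel_fallback() and returns "kernel".
    """
    done = threading.Event()

    def run():
        userspace_handler()
        done.set()

    # Daemon thread, so an unresponsive handler cannot block interpreter exit.
    threading.Thread(target=run, daemon=True).start()

    if done.wait(timeout_s):
        return "userspace"
    # The handler made no forward progress within the timeout.
    kernel_fallback()
    return "kernel"
```

With a generous timeout the fallback should never run while the handler is
healthy, which is the property you want from the timeout you select.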

> 4. Why wouldn't the OOM-handler also do the killing itself?  (Which is
> essentially what we do.)  Then all we need is a low-memory notifier
> which can predict how quickly we'll run out of memory.
> 

It can, but predicting how quickly we'll run out of memory is nearly
impossible to get right for every class of application, and a timeout is
still required, after which the kernel steps in to resolve the situation.

> 5. The use case mentioned earlier (the fact that the killing of one
> process can make an entire group of processes useless) can be dealt
> with using OOM priorities and user-level code.
> 

It depends on the application being killed.

> I confess I am surprised that the OOM killer works as well as I think
> it does.  Adding a user-level component would bring a whole new level
> of complexity to code that's already hard to fully comprehend, and
> might not really address the fundamental issues.
> 

The kernel code that would be added certainly isn't complex, and I believe
it is better than the current functionality, which only allows you to
disable the memcg oom killer entirely in order to implement any userspace
policy.
