From: David Rientjes <rientjes@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Pekka Enberg <penberg@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Li Zefan <lizefan@huawei.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org
Subject: Re: [patch 7/8] mm, memcg: allow processes handling oom notifications to access reserves
Date: Thu, 5 Dec 2013 15:49:57 -0800 (PST)
Message-ID: <alpine.DEB.2.02.1312051537550.7717@chino.kir.corp.google.com>
In-Reply-To: <20131205025026.GA26777@htj.dyndns.org>

On Wed, 4 Dec 2013, Tejun Heo wrote:

> Hello,
> 

Tejun, how are you?

> Umm.. without delving into details, aren't you basically creating a
> memory cgroup inside a memory cgroup?  Doesn't sound like a
> particularly well thought-out plan to me.
> 

I agree that we wouldn't need such support if we were only addressing memcg 
oom conditions.  We could do things like A/memory.limit_in_bytes == 128M 
and A/b/memory.limit_in_bytes == 126M and then attach the process waiting 
on A/b/memory.oom_control to A, and that would work perfectly.
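
To make the arithmetic behind that layout concrete, here is a minimal
sketch in C.  It only computes how much room a handler attached to A
would have left once A/b hits its own limit; parse_size() and
handler_headroom() are hypothetical helper names for illustration, not
kernel or libcgroup functions, and the K/M/G suffix handling is an
assumption modelled on what memory.limit_in_bytes accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a size such as "128M" into bytes.  Accepts an optional K, M or G
 * suffix (powers of 1024).  Returns 0 for malformed input.
 */
uint64_t parse_size(const char *s)
{
	char *end;
	uint64_t v;

	assert(s != NULL);
	v = strtoull(s, &end, 10);
	if (end == s)
		return 0;

	switch (toupper((unsigned char)*end)) {
	case 'G':
		v <<= 10;
		/* fall through */
	case 'M':
		v <<= 10;
		/* fall through */
	case 'K':
		v <<= 10;
		end++;
		break;
	case '\0':
		break;
	default:
		return 0;
	}
	return *end == '\0' ? v : 0;
}

/*
 * Bytes still chargeable to the parent after the child has reached its
 * own limit.  This is the room left for a handler attached to the parent.
 */
uint64_t handler_headroom(const char *parent_limit, const char *child_limit)
{
	uint64_t parent = parse_size(parent_limit);
	uint64_t child = parse_size(child_limit);

	return parent > child ? parent - child : 0;
}
```

With the two limits from the example, handler_headroom("128M", "126M")
gives the handler 2MB to work with while A/b is oom.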

However, we also need to discuss system oom handling.  We have an interest 
in being able to allow userspace to handle system oom conditions since the 
policy will differ depending on machine and we can't encode every possible 
mechanism into the kernel.  For example, on system oom we want to kill a 
process from the lowest priority top-level memcg.  We lack that ability 
entirely in the kernel and since the sum of our top-level memcgs' 
memory.limit_in_bytes exceeds the amount of present RAM, we run into these 
oom conditions a _lot_.
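
As an illustration of the kind of policy meant here, a userspace handler
could pick its target memcg along these lines.  This is only a sketch:
struct memcg_info, its priority field, and pick_victim_memcg() are
hypothetical, since the kernel has no notion of memcg priority, and the
tie-break (first entry wins) is an arbitrary choice.

```c
#include <stddef.h>

/* Hypothetical userspace bookkeeping for one top-level memcg. */
struct memcg_info {
	const char *name;	/* e.g. directory name under the memcg mount */
	int priority;		/* lower value == less important */
};

/*
 * Return the index of the lowest priority memcg, or -1 if the array is
 * empty.  On ties the first entry wins.
 */
int pick_victim_memcg(const struct memcg_info *groups, size_t n)
{
	size_t i;
	int victim = -1;

	for (i = 0; i < n; i++)
		if (victim < 0 || groups[i].priority < groups[victim].priority)
			victim = (int)i;
	return victim;
}
```

The handler would then choose a process inside the selected memcg to
kill; how it does that is exactly the part that differs per machine.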

So the first step, in my opinion, is to add a system oom notification on 
the root memcg's memory.oom_control which currently allows registering an 
eventfd() notification but never actually triggers.  I did that in a patch 
and it was merged into -mm, but it was pulled out for later discussion.
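
For reference, registering for the notification from userspace looks
roughly like the following with the existing cgroup v1 interface: create
an eventfd, open memory.oom_control, and write "<event_fd> <control_fd>"
to cgroup.event_control in the same directory.  The function name
register_oom_eventfd() is made up for this sketch, and keeping the
control fd open in the caller is a conservative assumption rather than a
documented requirement.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Register an eventfd for oom notifications on the memcg mounted at
 * cgroup_dir.  Returns the eventfd, or -1 on failure.  The fd of
 * memory.oom_control is handed back through *control_fd so the caller
 * can keep it open while it waits.
 */
int register_oom_eventfd(const char *cgroup_dir, int *control_fd)
{
	char path[4096], line[64];
	int efd, cfd, ecfd, len;

	assert(cgroup_dir != NULL && control_fd != NULL);

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.oom_control", cgroup_dir);
	cfd = open(path, O_RDONLY);
	if (cfd < 0)
		goto err_efd;

	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	ecfd = open(path, O_WRONLY);
	if (ecfd < 0)
		goto err_cfd;

	/* The registration string is "<event_fd> <control_fd>". */
	len = snprintf(line, sizeof(line), "%d %d", efd, cfd);
	if (write(ecfd, line, len) != len) {
		close(ecfd);
		goto err_cfd;
	}
	close(ecfd);

	*control_fd = cfd;
	return efd;

err_cfd:
	close(cfd);
err_efd:
	close(efd);
	return -1;
}
```

A handler would then block in read() on the returned eventfd, reading an
8-byte counter, and act when it returns.  On the root memcg this
registration succeeds today but the read never completes, which is the
gap described above.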

Then, we need to ensure that the userspace that is registered to handle 
such events can actually respond to them, and that is difficult to do when 
the system is oom.  The 
proposal is to allow such processes, now marked as PF_OOM_HANDLER, to be 
able to access pre-defined per-zone memory reserves in the page allocator.  
The only special handling for PF_OOM_HANDLER in the page allocator itself 
would be under such oom conditions (memcg oom conditions have no problem 
allocating the memory, only charging it).  The amount of reserves would be 
defined as memory.oom_reserve_in_bytes from within the root memcg as 
defined by this patch, i.e. allow this amount of memory to be allocated in 
the page allocator for PF_OOM_HANDLER below the per-zone min watermarks.
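
The intended watermark behavior can be modelled in a few lines of C.
This is a simplified userspace model, not the page allocator code from
the patch: struct zone_model and zone_can_allocate() are hypothetical
names, everything is counted in pages, and the per-zone share of
memory.oom_reserve_in_bytes is taken as a given input because how the
reserve is split across zones is not covered here.

```c
#include <stdbool.h>

/* Simplified view of one zone, all values in pages. */
struct zone_model {
	unsigned long free_pages;	/* pages currently free */
	unsigned long min_wmark;	/* min watermark */
	unsigned long oom_reserve;	/* assumed per-zone oom reserve */
};

/*
 * Would an allocation of nr_pages leave the zone at or above the floor
 * that applies to this task?  Ordinary tasks must stay at or above the
 * min watermark; an oom handler may dip below it by oom_reserve pages.
 */
bool zone_can_allocate(const struct zone_model *z, unsigned long nr_pages,
		       bool oom_handler)
{
	unsigned long floor = z->min_wmark;

	if (oom_handler)
		floor = floor > z->oom_reserve ? floor - z->oom_reserve : 0;

	if (z->free_pages < nr_pages)
		return false;
	return z->free_pages - nr_pages >= floor;
}
```

With 100 free pages, a min watermark of 128 and a reserve of 64, an
ordinary task cannot allocate a single page, while an oom handler can
allocate up to 36 pages before it hits its own floor.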

This, I believe, is the cleanest interface for users who choose to use a 
non-default policy by setting memory.oom_reserve_in_bytes and constrains 
all of the code to memcg which you have to configure for such support.

The system oom condition is not addressed in this patch series, although 
the PF_OOM_HANDLER bit can be used for that purpose.  I didn't post that 
patch because the notification on the root memcg's memory.oom_control in 
such conditions is currently being debated, so we need to solve that issue 
first.

Your opinions and suggestions are more than helpful, thanks.


