From: Tim Hockin <thockin@hockin.org>
To: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Pekka Enberg <penberg@kernel.org>,
Christoph Lameter <cl@linux-foundation.org>,
Li Zefan <lizefan@huawei.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, Cgroups <cgroups@vger.kernel.org>
Subject: Re: [patch 7/8] mm, memcg: allow processes handling oom notifications to access reserves
Date: Wed, 11 Dec 2013 21:37:46 -0800
Message-ID: <CAAAKZwsmM-C=kLGV=RW=Y4Mq=BWpQzuPruW6zvEr9p0Xs4GD5g@mail.gmail.com>
In-Reply-To: <20131211124240.GA24557@htj.dyndns.org>

The immediate problem I see with setting aside reserves "off the top"
is that we don't really know a priori how much memory the kernel
itself is going to use, which could still land us in an overcommitted
state.
In other words, if I have your 128 MB machine, and I set aside 8 MB
for OOM handling and give 120 MB to jobs, I have not accounted for
the kernel at all. So instead I set aside 8 MB for OOM and 100 MB for
jobs, leaving 20 MB for the kernel. That should be enough, right?
Hell if I know, and nothing ensures that.
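To put numbers on it (a trivial sketch; the 128/8/100 MB figures are
just the hypothetical from above):

```shell
#!/bin/sh
# Hypothetical budget for the 128 MB machine from the example above.
TOTAL_MB=128
OOM_RESERVE_MB=8
JOBS_MB=100

# Whatever is left over is implicitly the kernel's budget -- but nothing
# in this scheme measures or enforces what the kernel actually uses.
KERNEL_SLACK_MB=$((TOTAL_MB - OOM_RESERVE_MB - JOBS_MB))
echo "implicit kernel budget: ${KERNEL_SLACK_MB} MB"
```

The 20 MB that falls out of the subtraction is a guess, not a guarantee.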
On Wed, Dec 11, 2013 at 4:42 AM, Tejun Heo <tj@kernel.org> wrote:
> Yo,
>
> On Tue, Dec 10, 2013 at 03:55:48PM -0800, David Rientjes wrote:
>> > Well, the gotcha there is that you won't be able to do that with
>> > system level OOM handler either unless you create a separately
>> > reserved memory, which, again, can be achieved using hierarchical
>> > memcg setup already. Am I missing something here?
>>
>> System oom conditions would only arise when the usage of memcgs A + B
>> above cause the page allocator to not be able to allocate memory without
>> oom killing something even though the limits of both A and B may not have
>> been reached yet. No userspace oom handler can allocate memory with
>> access to memory reserves in the page allocator in such a context; it's
>> vital that, if we are to handle system oom conditions in userspace, we
>> give them access to memory that other processes can't allocate. You
>> could attach a userspace system oom handler to any memcg in this scenario
>> with memory.oom_reserve_in_bytes and since it has PF_OOM_HANDLER it would
>> be able to allocate in reserves in the page allocator and overcharge in
>> its memcg to handle it. This isn't possible only with a hierarchical
>> memcg setup unless you ensure the sum of the limits of the top level
>> memcgs do not equal or exceed the sum of the min watermarks of all memory
>> zones, and we exceed that.
>
> Yes, exactly. If system memory is 128M, create top level memcgs w/
> 120M and 8M each (well, with some slack of course) and then overcommit
> the descendants of 120M while putting OOM handlers and friends under
> 8M without overcommitting.
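(For reference, the layout you're describing is roughly the sketch
below -- cgroup v1 memory controller, with a hypothetical mount point,
group names, and handler pid, run as root:)

```shell
# Sketch of the 120M/8M top-level split (cgroup v1; assumes the memory
# controller is mounted at this hypothetical path and we're running as root).
CG=/sys/fs/cgroup/memory

mkdir "$CG/jobs" "$CG/oom"

# 120M for jobs; its descendants may be overcommitted against it...
echo $((120 * 1024 * 1024)) > "$CG/jobs/memory.limit_in_bytes"

# ...and 8M, not overcommitted, for the OOM handlers and friends.
echo $((8 * 1024 * 1024)) > "$CG/oom/memory.limit_in_bytes"

# Move the (hypothetical) handler pid into the reserved group.
echo "$HANDLER_PID" > "$CG/oom/cgroup.procs"
```

But that split still says nothing about what the kernel itself
consumes, which is my point above.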
>
> ...
>> The stronger rationale is that you can't handle system oom in userspace
>> without this functionality and we need to do so.
>
> You're giving yourself an unreasonable precondition - overcommitting
> at root level and handling system OOM from userland - and then trying
> to contort everything to fit that. How can "overcommitting at the
> root level" possibly be a goal in and of itself? Please take a step back
> and look at and explain the *problem* you're trying to solve. You
> haven't explained why that *need*s to be the case at all.
>
> I wrote this at the start of the thread but you're still doing the
> same thing. You're trying to create a hidden memcg level inside a
> memcg. At the beginning of this thread, you were trying to do that
> for !root memcgs and now you're arguing that you *need* that for root
> memcg. Because there's no other limit we can make use of, you're
> suggesting the use of kernel reserve memory for that purpose. It
> seems like an absurd thing to do to me. It could be that you might
> not be able to achieve exactly the same thing that way, but the right
> thing to do would be improving memcg in general instead of adding yet
> another layer of half-baked complexity, right?
>
> Even if there are some inherent advantages of system userland OOM
> handling with a separate physical memory reserve, which AFAICS you
> haven't succeeded at showing yet, this is a very invasive change and,
> as you said before, something with an *extremely* narrow use case.
> Wouldn't it be a better idea to improve the existing mechanisms - be
> that memcg in general or kernel OOM handling - to fit the niche use
> case better? I mean, just think about all the corner cases. How are
> you gonna handle priority inversion through locked pages or
> allocations given out to other tasks through slab? You're suggesting
> opening a giant can of worms for an extremely narrow benefit which
> doesn't even seem to actually require opening said can.
>
> Thanks.
>
> --
> tejun