From: Shakeel Butt <shakeelb@google.com>
To: Tim Hockin <thockin@hockin.org>
Cc: Roman Gushchin <guro@fb.com>, Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, David Rientjes <rientjes@google.com>,
Linux MM <linux-mm@kvack.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Andrew Morton <akpm@linux-foundation.org>,
Cgroups <cgroups@vger.kernel.org>,
linux-doc@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Sun, 1 Oct 2017 16:29:48 -0700 [thread overview]
Message-ID: <CALvZod7iaOEeGmDJA0cZvJWpuzc-hMRn3PG2cfzcMniJtAjKqA@mail.gmail.com> (raw)
In-Reply-To: <CAAAKZwtApj-FgRc2V77nEb3BUd97Rwhgf-b-k0zhf1u+Y4fqxA@mail.gmail.com>
>
> Going back to Michal's example, say the user configured the following:
>
> root
> / \
> A D
> / \
> B C
>
> A global OOM event happens and we find this:
> - A > D
> - B, C, D are oomgroups
>
> What the user is telling us is that B, C, and D are compound memory
> consumers. They cannot be divided into their task parts from a memory
> point of view.
>
> However, the user doesn't say the same for A: the A subtree summarizes
> and controls aggregate consumption of B and C, but without groupoom
> set on A, the user says that A is in fact divisible into independent
> memory consumers B and C.
>
> If we don't have to kill all of A, but we'd have to kill all of D,
> does it make sense to compare the two?
>
I think Tim has given very clear explanation why comparing A & D makes
perfect sense. However I think the above example, a single user system
where a user has designed and created the whole hierarchy and then
attaches different jobs/applications to different nodes in this
hierarchy, is also a valid scenario. One solution I can think of, to
cater both scenarios, is to introduce a notion of 'bypass oom' or not
include a memcg for oom comparision and instead include its children
in the comparison.
So, in the same above example:
root
/ \
A(b) D
/ \
B C
A is marked as bypass and thus B and C are to be compared to D. So,
for the single user scenario, all the internal nodes are marked
'bypass oom comparison' and oom_priority of the leaves has to be set
to the same value.
Below is the pseudo code of select_victim_memcg() based on this idea
and David's previous pseudo code. The calculation of size of a memcg
is still not very well baked here yet. I am working on it and I plan
to have a patch based on Roman's v9 "mm, oom: cgroup-aware OOM killer"
patch.
struct mem_cgroup *memcg = root_mem_cgroup;
struct mem_cgroup *selected_memcg = root_mem_cgroup;
struct mem_cgroup *low_memcg;
unsigned long low_priority;
unsigned long prev_badness = memcg_oom_badness(memcg); // Roman's code
LIST_HEAD(queue);
next_level:
low_memcg = NULL;
low_priority = ULONG_MAX;
next:
for_each_child_of_memcg(it, memcg) {
unsigned long prio = it->oom_priority;
unsigned long badness = 0;
if (it->bypass_oom && !it->oom_group &&
memcg_has_children(it)) {
list_add(&it->oom_queue, &queue);
continue;
}
if (prio > low_priority)
continue;
if (prio == low_priority) {
badness = mem_cgroup_usage(it); // for
simplicity, need more thinking
if (badness < prev_badness)
continue;
}
low_memcg = it;
low_priority = prio;
prev_badness = badness ?: mem_cgroup_usage(it); //
for simplicity
}
if (!list_empty(&queue)) {
memcg = list_last_entry(&queue, struct mem_cgroup, oom_queue);
list_del(&memcg->oom_queue);
goto next;
}
if (low_memcg) {
selected_memcg = memcg = low_memcg;
prev_badness = 0;
if (!low_memcg->oom_group)
goto next_level;
}
if (selected_memcg->oom_group)
oom_kill_memcg(selected_memcg);
else
oom_kill_process_from_memcg(selected_memcg);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-10-01 23:29 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-11 13:17 Roman Gushchin
2017-09-11 13:17 ` [v8 1/4] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-09-11 20:51 ` David Rientjes
2017-09-14 13:42 ` Michal Hocko
2017-09-11 13:17 ` [v8 2/4] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-09-13 20:46 ` David Rientjes
2017-09-13 21:59 ` Roman Gushchin
2017-09-11 13:17 ` [v8 3/4] mm, oom: add cgroup v2 mount option for " Roman Gushchin
2017-09-11 20:48 ` David Rientjes
2017-09-12 20:01 ` Roman Gushchin
2017-09-12 20:23 ` David Rientjes
2017-09-13 12:23 ` Michal Hocko
2017-09-11 13:17 ` [v8 4/4] mm, oom, docs: describe the " Roman Gushchin
2017-09-11 20:44 ` [v8 0/4] " David Rientjes
2017-09-13 12:29 ` Michal Hocko
2017-09-13 20:46 ` David Rientjes
2017-09-14 13:34 ` Michal Hocko
2017-09-14 20:07 ` David Rientjes
2017-09-13 21:56 ` Roman Gushchin
2017-09-14 13:40 ` Michal Hocko
2017-09-14 16:05 ` Roman Gushchin
2017-09-15 10:58 ` Michal Hocko
2017-09-15 15:23 ` Roman Gushchin
2017-09-15 19:55 ` David Rientjes
2017-09-15 21:08 ` Roman Gushchin
2017-09-18 6:20 ` Michal Hocko
2017-09-18 15:02 ` Roman Gushchin
2017-09-21 8:30 ` David Rientjes
2017-09-19 20:54 ` David Rientjes
2017-09-20 22:24 ` Roman Gushchin
2017-09-21 8:27 ` David Rientjes
2017-09-18 6:16 ` Michal Hocko
2017-09-19 20:51 ` David Rientjes
2017-09-18 6:14 ` Michal Hocko
2017-09-20 21:53 ` Roman Gushchin
2017-09-25 12:24 ` Michal Hocko
2017-09-25 17:00 ` Johannes Weiner
2017-09-25 18:15 ` Roman Gushchin
2017-09-25 20:25 ` Michal Hocko
2017-09-26 10:59 ` Roman Gushchin
2017-09-26 11:21 ` Michal Hocko
2017-09-26 12:13 ` Roman Gushchin
2017-09-26 13:30 ` Michal Hocko
2017-09-26 17:26 ` Johannes Weiner
2017-09-27 3:37 ` Tim Hockin
2017-09-27 7:43 ` Michal Hocko
2017-09-27 10:19 ` Roman Gushchin
2017-09-27 15:35 ` Tim Hockin
2017-09-27 16:23 ` Roman Gushchin
2017-09-27 18:11 ` Tim Hockin
2017-10-01 23:29 ` Shakeel Butt [this message]
2017-10-02 11:56 ` Tetsuo Handa
2017-10-02 12:24 ` Michal Hocko
2017-10-02 12:47 ` Roman Gushchin
2017-10-02 14:29 ` Michal Hocko
2017-10-02 19:00 ` Shakeel Butt
2017-10-02 19:28 ` Michal Hocko
2017-10-02 19:45 ` Shakeel Butt
2017-10-02 19:56 ` Michal Hocko
2017-10-02 20:00 ` Tim Hockin
2017-10-02 20:08 ` Michal Hocko
2017-10-02 20:09 ` Shakeel Butt
2017-10-02 20:20 ` Shakeel Butt
2017-10-02 20:24 ` Shakeel Butt
2017-10-02 20:34 ` Johannes Weiner
2017-10-02 20:55 ` Michal Hocko
2017-09-25 22:21 ` David Rientjes
2017-09-26 8:46 ` Michal Hocko
2017-09-26 21:04 ` David Rientjes
2017-09-27 7:37 ` Michal Hocko
2017-09-27 9:57 ` Roman Gushchin
2017-09-21 14:21 ` Johannes Weiner
2017-09-21 21:17 ` David Rientjes
2017-09-21 21:51 ` Johannes Weiner
2017-09-22 20:53 ` David Rientjes
2017-09-22 15:44 ` Tejun Heo
2017-09-22 20:39 ` David Rientjes
2017-09-22 21:05 ` Tejun Heo
2017-09-23 8:16 ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALvZod7iaOEeGmDJA0cZvJWpuzc-hMRn3PG2cfzcMniJtAjKqA@mail.gmail.com \
--to=shakeelb@google.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=rientjes@google.com \
--cc=thockin@hockin.org \
--cc=tj@kernel.org \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox