From: Greg Thelen <gthelen@google.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
LKML <linux-kernel@vger.kernel.org>,
Ying Han <yinghan@google.com>, Hugh Dickins <hughd@google.com>,
Michel Lespinasse <walken@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Tejun Heo <tj@kernel.org>
Subject: Re: [RFC 0/4] memcg: Low-limit reclaim
Date: Thu, 30 Jan 2014 16:28:27 -0800
Message-ID: <xr931tzphu50.fsf@gthelen.mtv.corp.google.com>
In-Reply-To: <20140130123044.GB13509@dhcp22.suse.cz>
On Thu, Jan 30 2014, Michal Hocko wrote:
> On Wed 29-01-14 11:08:46, Greg Thelen wrote:
> [...]
>> The series looks useful. We (Google) have been using something similar.
>> In practice such a low_limit (or memory guarantee), doesn't nest very
>> well.
>>
>> Example:
>> - parent_memcg: limit 500, low_limit 500, usage 500
>> 1 privately charged non-reclaimable page (e.g. mlock, slab)
>> - child_memcg: limit 500, low_limit 500, usage 499
>
> I am not sure this is a good example. Your setup basically says that no
> single page should be reclaimed. I can imagine this might be useful in
> some cases and I would like to allow it but it sounds too extreme (e.g.
> a load which would start thrashing heavily once the reclaim starts and it
> makes more sense to start it again rather than crawl - think about some
> mathematical simulation which might diverge).
Pages will still be reclaimed when usage_in_bytes exceeds
limit_in_bytes. I see the low_limit as a way to tell the kernel: don't
reclaim my memory due to external pressure, but internal pressure is
different.
>> If a streaming file cache workload (e.g. sha1sum) starts gobbling up
>> page cache it will lead to an oom kill instead of reclaiming.
>
> Does it make any sense to protect all of such memory although it is
> easily reclaimable?
I think protection makes sense in this case. If I know my workload
needs 500 to operate well, then I reserve 500 using low_limit. My app
doesn't want to run with less than its reservation.
>> One could argue that this is working as intended because child_memcg
>> was promised 500 but can only get 499. So child_memcg is oom killed
>> rather than being forced to operate below its promised low limit.
>>
>> This has led to various internal workarounds like:
>> - don't charge any memory to interior tree nodes (e.g. parent_memcg);
>> only charge memory to cgroup leafs. This gets tricky when dealing
>> with reparented memory inherited to parent from child during cgroup
>> deletion.
>
> Do those need any protection at all?
Interior tree nodes don't need protection from their children. But
children and interior nodes need protection from siblings and parents.
>> - don't set low_limit on non leafs (e.g. do not set low limit on
>> parent_memcg). This constrains the cgroup layout a bit. Some
>> customers want to purchase $MEM and setup their workload with a few
>> child cgroups. A system daemon hands out $MEM by setting low_limit
>> for top-level containers (e.g. parent_memcg). Thereafter such
>> customers are able to partition their workload with sub memcg below
>> child_memcg. Example:
>>     parent_memcg
>>         \
>>          child_memcg
>>           /     \
>>       server   backup
>
> I think that the low_limit makes sense where you actually want to
> protect something from reclaim. And backup sounds like a bad fit for
> that.
The backup job would presumably have a small low_limit, but it may still
have a minimum working set required to make useful forward progress.
Example:
  parent_memcg
      \
       child_memcg    limit 500, low_limit 500, usage 500
        /   \
        |   backup    limit 10,  low_limit 10,  usage 10
        |
      server          limit 490, low_limit 490, usage 490
One could argue that problems appear when
server.low_limit + backup.low_limit = child_memcg.limit. So the safer
configuration is to leave some padding:
  server.low_limit + backup.low_limit + padding = child_memcg.limit
but this just defers the problem. As memory is reparented into the
parent, the padding must grow.
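To make the padding arithmetic concrete, here is a rough user-space
sketch. It is not kernel code: struct group_cfg and padding_left() are
names made up for illustration, and the units are arbitrary (the same
units as the example above). It computes what is left under the interior
group's limit once every child holds its full low_limit and anything
charged directly to the interior group (e.g. reparented memory) is
subtracted:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of an interior cgroup. */
struct group_cfg {
	long limit;		/* hard limit of the interior group */
	long private_usage;	/* charged directly to it, e.g. reparented */
};

/*
 * Headroom left under the parent's limit once every child holds its
 * full low_limit. A negative result means the guarantees cannot all
 * be honored at the same time.
 */
static long padding_left(const struct group_cfg *parent,
			 const long *child_low_limits, size_t nr_children)
{
	long sum = 0;
	size_t i;

	assert(parent != NULL);
	for (i = 0; i < nr_children; i++)
		sum += child_low_limits[i];

	return parent->limit - parent->private_usage - sum;
}
```

With limit 500 and child low_limits of 490 and 10 this yields 0. Once 5
units have been reparented into the interior group it yields -5, so any
padding picked up front only lasts until enough memory has been
reparented.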
>> Thereafter customers often want some weak isolation between server and
>> backup. To avoid undesired oom kills the server/backup isolation is
>> provided with a softer memory guarantee (e.g. soft_limit). The soft
>> limit acts like the low_limit until priority becomes desperate.
>
> Johannes was already suggesting that the low_limit should allow for a
> weaker semantic as well. I am not very much inclined to that but I can
> live with a knob which would say oom_on_lowlimit (on by default but
> allowed to be set to 0). We would fallback to the full reclaim if
> no groups turn out to be reclaimable.
I like the strong semantic of your low_limit at least at level:1 cgroups
(direct children of root). But I have also encountered situations where
a strict guarantee is too strict and a mere preference is desirable.
Perhaps the best plan is to continue with the proposed strict low_limit
and eventually provide an additional mechanism which provides weaker
guarantees (e.g. soft_limit or something else if soft_limit cannot be
altered). These two would offer good support for a variety of use
cases.
I am thinking of something like:
bool mem_cgroup_reclaim_eligible(struct mem_cgroup *memcg,
				 struct mem_cgroup *root,
				 int priority)
{
	do {
		/* the reclaim root itself gets no protection */
		if (memcg == root)
			break;
		/* strict: never reclaim a group within its low_limit */
		if (!res_counter_low_limit_excess(&memcg->res))
			return false;
		/* weak: honor soft_limit until priority gets desperate */
		if ((priority >= DEF_PRIORITY - 2) &&
		    !res_counter_soft_limit_exceed(&memcg->res))
			return false;
	} while ((memcg = parent_mem_cgroup(memcg)));
	return true;
}
But this soft_limit,priority extension can be added later.
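For experimenting with the idea outside the kernel, the walk can be
modeled in user space roughly as follows. This is only a sketch under
stated assumptions: struct model_memcg, over_low_limit(),
over_soft_limit() and model_reclaim_eligible() are made-up stand-ins and
not the kernel's res_counter API, and "excess" is assumed to mean simply
usage > limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12	/* same value as the kernel's DEF_PRIORITY */

/* Hypothetical stand-in for struct mem_cgroup; units are arbitrary. */
struct model_memcg {
	long usage;
	long low_limit;
	long soft_limit;
	struct model_memcg *parent;	/* NULL at the top of the tree */
};

static bool over_low_limit(const struct model_memcg *m)
{
	return m->usage > m->low_limit;
}

static bool over_soft_limit(const struct model_memcg *m)
{
	return m->usage > m->soft_limit;
}

/*
 * Walk from memcg up to (but not including) the reclaim root. Any
 * group on the way that is within its low_limit vetoes reclaim. While
 * priority is still high (reclaim not yet desperate), a group within
 * its soft_limit vetoes reclaim as well.
 */
static bool model_reclaim_eligible(const struct model_memcg *memcg,
				   const struct model_memcg *root,
				   int priority)
{
	assert(memcg != NULL);
	for (; memcg && memcg != root; memcg = memcg->parent) {
		if (!over_low_limit(memcg))
			return false;
		if (priority >= DEF_PRIORITY - 2 && !over_soft_limit(memcg))
			return false;
	}
	return true;
}
```

Plugging in the earlier example (parent_memcg usage 500, child_memcg
usage 499 with low_limit 500), child_memcg is not eligible when the
pressure comes from parent_memcg, but it is eligible when child_memcg is
itself the reclaim root, which is the external vs. internal pressure
distinction made above.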