From: Johannes Weiner <hannes@cmpxchg.org>
To: Ying Han <yinghan@google.com>
Cc: Michal Hocko <mhocko@suse.cz>, Mel Gorman <mel@csn.ul.ie>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>, Hillf Danton <dhillf@gmail.com>,
Hugh Dickins <hughd@google.com>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: Re: [PATCH V3 0/2] memcg softlimit reclaim rework
Date: Fri, 20 Apr 2012 00:51:33 +0200
Message-ID: <20120419225133.GB2536@cmpxchg.org>
In-Reply-To: <20120419223318.GA2536@cmpxchg.org>
On Fri, Apr 20, 2012 at 12:33:18AM +0200, Johannes Weiner wrote:
> On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
> > On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > > On Wed 18-04-12 11:00:40, Ying Han wrote:
> > >> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >> > On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
> > >> >> The "soft_limit" was introduced in memcg to support over-committing the
> > >> >> memory resource on the host. Each cgroup configures its "hard_limit" where
> > >> >> it will be throttled or OOM killed by going over the limit. However, the
> > >> >> cgroup can go above the "soft_limit" as long as there is no system-wide
> > >> >> memory contention. So, the "soft_limit" is the kernel mechanism for
> > >> >> re-distributing system spare memory among cgroups.
> > >> >>
> > >> >> This patch reworks the softlimit reclaim by hooking it into the new global
> > >> >> reclaim scheme. So the global reclaim path including direct reclaim and
> > >> >> background reclaim will respect the memcg softlimit.
> > >> >>
> > >> >> v3..v2:
> > >> >> 1. rebase the patch on 3.4-rc3
> > >> >> 2. squash the commits of replacing the old implementation with new
> > >> >> implementation into one commit. This is to make sure to leave the tree
> > >> >> in stable state between each commit.
> > >> >> 3. removed the commit which changes the nr_to_reclaim for global reclaim
> > >> >> case. The need of that patch is not obvious now.
> > >> >>
> > >> >> Note:
> > >> >> 1. the new implementation of softlimit reclaim is rather simple and first
> > >> >> step for further optimizations. there is no memory pressure balancing between
> > >> >> memcgs for each zone, and that is something we would like to add as follow-ups.
> > >> >>
> > >> >> 2. this patch is slightly different from the last one posted from Johannes
> > >> >> http://comments.gmane.org/gmane.linux.kernel.mm/72382
> > >> >> where his patch is closer to the reverted implementation by doing hierarchical
> > >> >> reclaim for each selected memcg. However, that is not expected behavior from
> > >> >> user perspective. Considering the following example:
> > >> >>
> > >> >> root (32G capacity)
> > >> >> --> A (hard limit 20G, soft limit 15G, usage 16G)
> > >> >>     --> A1 (soft limit 5G, usage 4G)
> > >> >>     --> A2 (soft limit 10G, usage 12G)
> > >> >> --> B (hard limit 20G, soft limit 10G, usage 16G)
> > >> >>
> > >> >> Under global reclaim, we shouldn't add pressure on A1 although its parent(A)
> > >> >> exceeds softlimit. This is what admin expects by setting softlimit to the
> > >> >> actual working set size and only reclaim pages under softlimit if system has
> > >> >> trouble to reclaim.
> > >> >
> > >> > Actually, this is exactly what the admin expects when creating a
> > >> > hierarchy, because she defines that A1 is a child of A and is
> > >> > responsible for the memory situation in its parent.
> > >
> > > Hmm, I guess that both approaches have cons and pros.
> > > * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
> > > soft limit memcg
> > > + it is consistent with the hard limit reclaim
> > Not sure why we want them to be consistent. Soft_limit is serving
> > different purpose and the one of the main purpose is to preserve the
> > working set of the cgroup.
>
> I'd argue, given the history of cgroups, one of the main purposes is
> having a machine of containers where you overcommit their hard limit
> and set the soft limit accordingly to provide fairness.
>
> Yes, we don't want to reclaim hierarchies that are below their soft
> limit as long as there are some in excess, of course. This is a flaw
> and needs fixing. But it's something completely different than
> changing how the soft limit is defined and suddenly allow child
> groups, which you may not trust, to override rules defined by parental
> groups.
>
> It bothers me that we should add something that will almost certainly
> bite us in the future while we are discussing on the cgroups list what
> would stand in the way of getting sane hierarchy semantics across
> controllers to provide consistency, nesting, etc.
>
> To support a single use case, which I feel we still have not discussed
> nearly enough to justify this change.
>
> For example, I get that you want 'meta-groups' that group together
> subgroups for common accounting and hard limiting. But I don't see
> why such meta-groups have their own processes. Conceptually, I mean,
> how does a process fit into A? Is it superior to the tasks in A1 and
> A2? Why can't it live in A3?
>
> So here is a proposal:
>
> Would it make sense to try to keep those meta groups always free of
> their own memory so that they don't /need/ soft limits with weird
> semantics? E.g. immediately free the unused memory on rmdir, OR add
> mechanisms to migrate the memory to a dedicated group:
>
> A
> A1 (soft-limited)
> A2 (soft-limited)
> B
> unused (soft-limited)
>
> Move all leftover memory from finished jobs to this 'unused' group.
> You could set its soft limit to 0 so that it sticks around only until
> you actually need the memory for something else.
>
> Then you would get the benefits of accounting and limiting A1 and A2
> under a single umbrella without the need for a soft limit in A. We
> could keep the consistent semantics for soft limits, because you would
> only have to set it on leaf nodes.
>
> Wouldn't this work for you?
Or, if the frequency of job creation and completion permits, just keep
the original groups around after completion, set their soft limit to
0, put a watch ("threshold notification") on their usage, and reap
them once global pressure has finally cleaned them out.
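
A rough userspace sketch of that last idea, purely as an illustration
and not part of any patch. It assumes the cgroup v1 memory controller
files memory.soft_limit_in_bytes and memory.usage_in_bytes, and it
simply re-reads the usage where a real setup would wait on the
threshold notification. The helper names (retire_group, read_usage,
reap_drained) and the remove callback are made up for this example.

```python
import os


def retire_group(path):
    """Set a finished job's soft limit to 0 so global reclaim targets it first."""
    with open(os.path.join(path, "memory.soft_limit_in_bytes"), "w") as f:
        f.write("0")


def read_usage(path):
    """Return the group's current usage in bytes."""
    with open(os.path.join(path, "memory.usage_in_bytes")) as f:
        return int(f.read().strip())


def reap_drained(paths, threshold=0, remove=os.rmdir):
    """Remove every retired group whose usage is at or below threshold.

    Returns the groups that still hold memory and must be checked again.
    Polling stands in for the threshold notification here.
    """
    remaining = []
    for path in paths:
        if read_usage(path) <= threshold:
            remove(path)  # on cgroupfs, rmdir of the group directory
        else:
            remaining.append(path)
    return remaining
```

A job manager would call retire_group() when a job completes and run
reap_drained() on the retired groups from time to time, keeping the
returned list for the next pass.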