linux-mm.kvack.org archive mirror
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Ying Han <yinghan@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Mel Gorman <mel@csn.ul.ie>,
	Rik van Riel <riel@redhat.com>, Hillf Danton <dhillf@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH V3 0/2] memcg softlimit reclaim rework
Date: Fri, 20 Apr 2012 17:21:41 +0900
Message-ID: <4F911C95.4040008@jp.fujitsu.com>
In-Reply-To: <CALWz4iy2==jYkYx98EGbqbM2Y7q4atJpv9sH_B7Fjr8aqq++JQ@mail.gmail.com>

(2012/04/20 16:37), Ying Han wrote:

> On Thu, Apr 19, 2012 at 3:33 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
>>> On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
>>>> On Wed 18-04-12 11:00:40, Ying Han wrote:
>>>>> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>>>> On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
>>>>>>> The "soft_limit" was introduced in memcg to support over-committing the
>>>>>>> memory resource on the host. Each cgroup configures its "hard_limit" where
>>>>>>> it will be throttled or OOM killed by going over the limit. However, the
>>>>>>> cgroup can go above the "soft_limit" as long as there is no system-wide
>>>>>>> memory contention. So, the "soft_limit" is the kernel mechanism for
>>>>>>> re-distributing system spare memory among cgroups.
>>>>>>>
>>>>>>> This patch reworks the softlimit reclaim by hooking it into the new global
>>>>>>> reclaim scheme. So the global reclaim path including direct reclaim and
>>>>>>> background reclaim will respect the memcg softlimit.
>>>>>>>
>>>>>>> v3..v2:
>>>>>>> 1. rebase the patch on 3.4-rc3
>>>>>>> 2. squash the commits of replacing the old implementation with new
>>>>>>> implementation into one commit. This is to make sure to leave the tree
>>>>>>> in stable state between each commit.
>>>>>>> 3. removed the commit which changes the nr_to_reclaim for global reclaim
>>>>>>> case. The need of that patch is not obvious now.
>>>>>>>
>>>>>>> Note:
>>>>>>> 1. the new implementation of softlimit reclaim is rather simple and is a first
>>>>>>> step toward further optimizations. There is no memory pressure balancing between
>>>>>>> memcgs for each zone, and that is something we would like to add as a follow-up.
>>>>>>>
>>>>>>> 2. this patch is slightly different from the last one posted from Johannes
>>>>>>> http://comments.gmane.org/gmane.linux.kernel.mm/72382
>>>>>>> where his patch is closer to the reverted implementation by doing hierarchical
>>>>>>> reclaim for each selected memcg. However, that is not the expected behavior from
>>>>>>> the user's perspective. Consider the following example:
>>>>>>>
>>>>>>> root (32G capacity)
>>>>>>> --> A (hard limit 20G, soft limit 15G, usage 16G)
>>>>>>>    --> A1 (soft limit 5G, usage 4G)
>>>>>>>    --> A2 (soft limit 10G, usage 12G)
>>>>>>> --> B (hard limit 20G, soft limit 10G, usage 16G)
>>>>>>>
>>>>>>> Under global reclaim, we shouldn't add pressure on A1 although its parent (A)
>>>>>>> exceeds its softlimit. This is what the admin expects when setting the softlimit
>>>>>>> to the actual working set size: only reclaim pages under the softlimit if the
>>>>>>> system has trouble reclaiming.
>>>>>>
>>>>>> Actually, this is exactly what the admin expects when creating a
>>>>>> hierarchy, because she defines that A1 is a child of A and is
>>>>>> responsible for the memory situation in its parent.
>>>>
>>>> Hmm, I guess that both approaches have cons and pros.
>>>> * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
>>>>  soft limit memcg
>>>>  + it is consistent with the hard limit reclaim
>>> Not sure why we want them to be consistent. Soft_limit serves a
>>> different purpose, and one of its main purposes is to preserve the
>>> working set of the cgroup.
>>
>> I'd argue, given the history of cgroups, one of the main purposes is
>> having a machine of containers where you overcommit their hard limit
>> and set the soft limit accordingly to provide fairness.
>>
>> Yes, we don't want to reclaim hierarchies that are below their soft
>> limit as long as there are some in excess, of course.  This is a flaw
>> and needs fixing.  But it's something completely different than
>> changing how the soft limit is defined and suddenly allow child
>> groups, which you may not trust, to override rules defined by parental
>> groups.
>>
>> It bothers me that we should add something that will almost certainly
>> bite us in the future while we are discussing on the cgroups list what
>> would stand in the way of getting sane hierarchy semantics across
>> controllers to provide consistency, nesting, etc.
> 
> I understand the concern here and I don't want the soft_limit reclaim
> to drift far away from the rest of the cgroup design down the
> road. On the other hand, I don't think the current implementation is
> totally against the hierarchy semantics. See the comment below :)
> 
>>
>> To support a single use case, which I feel we still have not discussed
>> nearly enough to justify this change.
>>
>> For example, I get that you want 'meta-groups' that group together
>> subgroups for common accounting and hard limiting.  But I don't see
>> why such meta-groups have their own processes.  Conceptually, I mean,
>> how does a process fit into A?  Is it superior to the tasks in A1 and
>> A2?  Why can't it live in A3?
> 
> For user processes, I can see that it is totally feasible to live in A3.
> The case I was thinking of is kernel threads, where 1) we don't want to
> limit their memory usage and 2) they serve the whole group, unlike
> individual jobs. Of course, we could put those kernel
> threads in A3 and leave that cgroup unlimited, but I am not sure if we
> should constrain ourselves to not having any processes running under A.
> 
>>
>> So here is a proposal:
>>
>> Would it make sense to try to keep those meta groups always free of
>> their own memory so that they don't /need/ soft limits with weird
>> semantics?  E.g. immediately free the unused memory on rmdir, OR add
>> mechanisms to migrate the memory to a dedicated group:
>>
>>     A
>>       A1 (soft-limited)
>>       A2 (soft-limited)
>>     B
>>     unused (soft-limited)
>>
>> Move all leftover memory from finished jobs to this 'unused' group.
>> You could set its soft limit to 0 so that it sticks around only until
>> you actually need the memory for something else.
>>
>> Then you would get the benefits of accounting and limiting A1 and A2
>> under a single umbrella without the need for a soft limit in A.  We
>> could keep the consistent semantics for soft limits, because you would
>> only have to set it on leaf nodes.
>>
>> Wouldn't this work for you?
> 
> To be frank, this sounds like a lot of extra work for the admin to manage the
> system, and we still cannot totally prevent pages from landing on A.
> 
> Back to the current proposal, there are two concerns that I can tell so far:
> 
> 1. skipping an untrusted cgroup in case it sets its soft_limit very high:
> Here, we don't always skip the untrusted cgroup. We do reclaim from
> it if not enough progress is made from other cgroups above their
> softlimit. So, I don't see a problem here.
> 
> 2. not reclaiming based on hierarchy:
> Here I am not checking the ancestor's soft_limit in
> should_reclaim_mem_cgroup(). And it will only make a difference if A is
> under soft_limit and A1 is above soft_limit. Now you do agree that we
> shouldn't reclaim from groups under their softlimit if there are
> cgroups that exceed their softlimit. That leads me to think of something like
> the following:
> 
> 1. for priority > DEF_PRIORITY - 3, only reclaim memcgs above their softlimit
> 2. for priority <= DEF_PRIORITY - 3, besides 1), also look at each memcg's
> ancestors. Reclaim memcgs that have an ancestor above its soft_limit
> 3. for priority == 0, reclaim everything.
> 
> Then it guarantees the softlimit to a certain degree while also
> considering hierarchical reclaim if the first few rounds don't
> fulfill the request.
> 
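
For reference, the three-tier rule quoted above can be modelled in a few lines. This is only a rough userspace sketch in Python, not the kernel code: the Memcg class, should_reclaim() and ancestor_over_soft_limit() are hypothetical names, a soft limit of None stands for "unlimited", DEF_PRIORITY is taken as 12 as in mm, and the numbers come from the A/A1/A2/B example earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional

DEF_PRIORITY = 12  # assumed to match the kernel's default; lower means more pressure


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup (not the kernel struct)."""
    name: str
    usage_gb: int
    soft_limit_gb: Optional[int] = None  # None models "no soft limit set"
    parent: Optional["Memcg"] = None

    def over_soft_limit(self) -> bool:
        return self.soft_limit_gb is not None and self.usage_gb > self.soft_limit_gb


def ancestor_over_soft_limit(cg: Memcg) -> bool:
    """True if any ancestor of cg exceeds its own soft limit."""
    p = cg.parent
    while p is not None:
        if p.over_soft_limit():
            return True
        p = p.parent
    return False


def should_reclaim(cg: Memcg, priority: int) -> bool:
    """Model of the proposed rule; priority runs from DEF_PRIORITY down to 0."""
    if priority == 0:
        return True                       # tier 3: reclaim everything
    if cg.over_soft_limit():
        return True                       # tier 1: group exceeds its own soft limit
    if priority <= DEF_PRIORITY - 3:
        return ancestor_over_soft_limit(cg)  # tier 2: an ancestor is in excess
    return False


# Example hierarchy from the thread. The root usage is a placeholder,
# since the thread only gives its 32G capacity.
root = Memcg("root", usage_gb=32)
a = Memcg("A", usage_gb=16, soft_limit_gb=15, parent=root)
a1 = Memcg("A1", usage_gb=4, soft_limit_gb=5, parent=a)
a2 = Memcg("A2", usage_gb=12, soft_limit_gb=10, parent=a)
b = Memcg("B", usage_gb=16, soft_limit_gb=10, parent=root)
groups = [a, a1, a2, b]

first_rounds = [cg.name for cg in groups if should_reclaim(cg, DEF_PRIORITY)]
later_rounds = [cg.name for cg in groups if should_reclaim(cg, DEF_PRIORITY - 3)]
```

In this model A1 is spared during the first rounds and only becomes a candidate once the priority drops to DEF_PRIORITY - 3, because its parent A is over its soft limit.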


This seems complicated. I vote for "Hierarchical soft limit reclaim".


If you need smart victim selection under a hierarchy, please implement
a victim scheduler which chooses A2 rather than A and A1. I think you
can do it.
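
As a rough illustration of what such victim selection could look like, here is a minimal Python sketch. It assumes one possible policy, "pick the group in the over-limit subtree with the largest excess over its own soft limit"; that policy and the names Memcg, subtree() and pick_victim() are hypothetical and are not taken from any patch in this thread.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup (not the kernel struct)."""
    name: str
    usage_gb: int
    soft_limit_gb: int
    children: List["Memcg"] = field(default_factory=list)

    def excess_gb(self) -> int:
        # Positive when the group is above its soft limit, negative when below.
        return self.usage_gb - self.soft_limit_gb


def subtree(cg: Memcg) -> Iterator[Memcg]:
    """Yield cg and all of its descendants, parents before children."""
    yield cg
    for child in cg.children:
        yield from subtree(child)


def pick_victim(over_limit_root: Memcg) -> Memcg:
    """Assumed policy: the group with the largest soft limit excess wins."""
    return max(subtree(over_limit_root), key=lambda cg: cg.excess_gb())


# Subtree A from the example earlier in the thread.
a1 = Memcg("A1", usage_gb=4, soft_limit_gb=5)
a2 = Memcg("A2", usage_gb=12, soft_limit_gb=10)
a = Memcg("A", usage_gb=16, soft_limit_gb=15, children=[a1, a2])

victim = pick_victim(a)
```

With these numbers A is 1G over, A1 is 1G under and A2 is 2G over, so the sketch picks A2 while reclaim is still driven hierarchically from A.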

Thanks,
-Kame

