Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Ying Han <yinghan@google.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim
Date: Fri, 13 Jan 2012 13:04:06 +0100	[thread overview]
Message-ID: <20120113120406.GC17060@tiehlicka.suse.cz> (raw)
In-Reply-To: <1326207772-16762-3-git-send-email-hannes@cmpxchg.org>

On Tue 10-01-12 16:02:52, Johannes Weiner wrote:
> Right now, memcg soft limits are implemented by having a sorted tree
> of memcgs that are in excess of their limits.  Under global memory
> pressure, kswapd first reclaims from the biggest excessor and then
> proceeds to do regular global reclaim.  The result of this is that
> pages are reclaimed from all memcgs, but more scanning happens against
> those above their soft limit.
> 
> With global reclaim doing memcg-aware hierarchical reclaim by default,
> this is a lot easier to implement: everytime a memcg is reclaimed
> from, scan more aggressively (per tradition with a priority of 0) if
> it's above its soft limit.  With the same end result of scanning
> everybody, but soft limit excessors a bit more.
> 
> Advantages:
> 
>   o smoother reclaim: soft limit reclaim is a separate stage before
>     global reclaim, whose result is not communicated down the line and
>     so overreclaim of the groups in excess is very likely.  After this
>     patch, soft limit reclaim is fully integrated into regular reclaim
>     and each memcg is considered exactly once per cycle.
> 
>   o true hierarchy support: soft limits are only considered when
>     kswapd does global reclaim, but after this patch, targetted
>     reclaim of a memcg will mind the soft limit settings of its child
>     groups.

Yes it makes sense. At first I was thinking that soft limit should be
considered only under global mem. pressure (at least documentation says
so) but now it makes sense.
We can push on over-soft limit groups more because they told us they
could sacrifice something...  Anyway documentation needs an update as
well.

But we have to be little bit careful here. I am still quite confuses how
we should handle hierarchies vs. subtrees. See bellow.

> 
>   o code size: soft limit reclaim requires a lot of code to maintain
>     the per-node per-zone rb-trees to quickly find the biggest
>     offender, dedicated paths for soft limit reclaim etc. while this
>     new implementation gets away without all that.

on my i386 pae setup (including swap extension enabled):
Before
   text    data     bss     dec     hex filename
    310086   29970   35372  375428   5ba84 mm/built-in.o
After
size mm/built-in.o 
   text    data     bss     dec     hex filename
    309048   30030   35372  374450   5b6b2 mm/built-in.o

I would expect a bigger difference but still good.

> Test:

Will look into results later.
[...]

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h |   18 +--
>  mm/memcontrol.c            |  412 ++++----------------------------------------
>  mm/vmscan.c                |   80 +--------
>  3 files changed, 48 insertions(+), 462 deletions(-)

Really nice to see

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 170dff4..d4f7ae5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -1318,6 +1123,36 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
>  	return margin >> PAGE_SHIFT;
>  }
>  
> +/**
> + * mem_cgroup_over_softlimit
> + * @root: hierarchy root
> + * @memcg: child of @root to test
> + *
> + * Returns %true if @memcg exceeds its own soft limit or contributes
> + * to the soft limit excess of one of its parents up to and including
> + * @root.
> + */
> +bool mem_cgroup_over_softlimit(struct mem_cgroup *root,
> +			       struct mem_cgroup *memcg)
> +{
> +	if (mem_cgroup_disabled())
> +		return false;
> +
> +	if (!root)
> +		root = root_mem_cgroup;
> +
> +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> +		/* root_mem_cgroup does not have a soft limit */
> +		if (memcg == root_mem_cgroup)
> +			break;
> +		if (res_counter_soft_limit_excess(&memcg->res))
> +			return true;
> +		if (memcg == root)
> +			break;
> +	}
> +	return false;
> +}

Well, this might be little bit tricky. We do not check whether memcg and
root are in a hierarchy (in terms of use_hierarchy) relation. 

If we are under global reclaim then we iterate over all memcgs and so
there is no guarantee that there is a hierarchical relation between the
given memcg and its parent. While, on the other hand, if we are doing
memcg reclaim then we have this guarantee.

Why should we punish a group (subtree) which is perfectly under its soft
limit just because some other subtree contributes to the common parent's
usage and makes it over its limit?
Should we check memcg->use_hierarchy here?

Does it even makes sense to setup soft limit on a parent group without
hierarchies?
Well I have to admit that hierarchies makes me headache.

> +
>  int mem_cgroup_swappiness(struct mem_cgroup *memcg)
>  {
>  	struct cgroup *cgrp = memcg->css.cgroup;
[...]
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e3fd8a7..4279549 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2121,8 +2121,16 @@ static void shrink_zone(int priority, struct zone *zone,
>  			.mem_cgroup = memcg,
>  			.zone = zone,
>  		};
> +		int epriority = priority;
> +		/*
> +		 * Put more pressure on hierarchies that exceed their
> +		 * soft limit, to push them back harder than their
> +		 * well-behaving siblings.
> +		 */
> +		if (mem_cgroup_over_softlimit(root, memcg))
> +			epriority = 0;

This sounds too aggressive to me. Shouldn't we just double the pressure
or something like that?
Previously we always had nr_to_reclaim == SWAP_CLUSTER_MAX when we did
memcg reclaim but this is not the case now. For the kswapd we have
nr_to_reclaim == ULONG_MAX so we will not break out of the reclaim early
and we have to scan a lot.
Direct reclaim (shrink or hard limit) shouldn't be affected here.

>  
> -		shrink_mem_cgroup_zone(priority, &mz, sc);
> +		shrink_mem_cgroup_zone(epriority, &mz, sc);
>  
>  		mem_cgroup_account_reclaim(root, memcg,
>  					   sc->nr_reclaimed - nr_reclaimed,

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2012-01-13 12:04 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-10 15:02 [patch 0/2] mm: memcg reclaim integration followups Johannes Weiner
2012-01-10 15:02 ` [patch 1/2] mm: memcg: per-memcg reclaim statistics Johannes Weiner
2012-01-10 23:54   ` Ying Han
2012-01-11  0:30     ` Johannes Weiner
2012-01-11 22:33       ` Ying Han
2012-01-12  9:17         ` Johannes Weiner
2012-01-10 15:02 ` [patch 2/2] mm: memcg: hierarchical soft limit reclaim Johannes Weiner
2012-01-11 21:42   ` Ying Han
2012-01-12  8:59     ` Johannes Weiner
2012-01-13 21:31       ` Ying Han
2012-01-13 22:44         ` Johannes Weiner
2012-01-17 14:22           ` Sha
2012-01-17 14:53             ` Johannes Weiner
2012-01-17 20:25               ` Ying Han
2012-01-17 21:56                 ` Johannes Weiner
2012-01-17 23:39                   ` Ying Han
2012-01-18  7:17               ` Sha
2012-01-18  9:25                 ` Johannes Weiner
2012-01-18 11:25                   ` Sha
2012-01-18 15:27                     ` Michal Hocko
2012-01-19  6:38                       ` Sha
2012-01-12  1:54   ` KAMEZAWA Hiroyuki
2012-01-13 12:16     ` Johannes Weiner
2012-01-18  5:26       ` KAMEZAWA Hiroyuki
2012-01-13 12:04   ` Michal Hocko [this message]
2012-01-13 15:50     ` Johannes Weiner
2012-01-13 16:34       ` Michal Hocko
2012-01-13 21:45         ` Ying Han
2012-01-18  9:45           ` Johannes Weiner
2012-01-18 20:38             ` Ying Han

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120113120406.GC17060@tiehlicka.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox