linux-mm.kvack.org archive mirror
From: Jeff Liu <jeff.liu@oracle.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Glauber Costa <glommer@parallels.com>,
	handai.szj@taobao.com
Subject: Re: [PATCH v2 2/6] memcg: bypass swap accounting for the root memcg
Date: Thu, 31 Jan 2013 00:01:28 +0800
Message-ID: <510943D8.9000902@oracle.com>
In-Reply-To: <20130129141318.GC29574@dhcp22.suse.cz>

On 01/29/2013 10:13 PM, Michal Hocko wrote:
> On Mon 28-01-13 18:54:38, Jeff Liu wrote:
>> Root memcg with swap cgroup is special since we only do tracking but
>> cannot set limits against it.  In order to facilitate the implementation
>> of the upcoming delayed allocation of swap cgroup structures, we can
>> bypass the default swap statistics for the root memcg and derive them
>> from the global stats instead, as below:
>>
>> root_memcg_swap_stat: total_swap_pages - nr_swap_pages - used_swap_pages_of_all_memcgs
> 
> How do you protect from races with swap{in,out}? Or are they tolerable?
To be honest, I had not taken races with swapin/swapout into consideration.

Yes, this patch can introduce a small error: it has to iterate over every memcg,
and the global counters can change during that walk.  The iteration also adds some
overhead that grows with the number of configured memcgs.

However, consider the current implementation of swap statistics: we account swap when
the swap cache is uncharged, but the swap slot may already have been allocated before that.
That is to say, there is already an inconsistent window in the swap accounting stats, IMHO.
Since this is a figure shown to humans, I think the error can be tolerated to some extent. :)
> 
>> memcg_total_swap_stats: root_memcg_swap_stat + other_memcg_swap_stats
> 
> I am not sure I understand and if I do then it is not true:
> root (swap = 10M, use_hierarchy = 0/1)
>  \
>   A (swap = 1M, use_hierarchy = 1)
>    \
>     B (swap = 2M)
> 
> total for A is 3M regardless of what root has "accounted" while
> total for root should be 10 for use_hierarchy = 0 and 13 for the other
I am not sure I catch your point, but I think the total for root should be 13 no matter
whether use_hierarchy is 0 or 1, and the current patch does just that.

Originally, for_each_mem_cgroup_tree(iter, memcg) computes the statistics by iterating
over all child memcgs, including the memcg itself.  But now, since we no longer account
swap for the root memcg (hence its stat is 0), we need to add the local swap stat of the
root memcg itself (10M) to memcg_total_swap_stats.  So we don't actually change the way
memcg_total_swap_stats is accounted.
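
The arithmetic for the example tree can be checked with a small userspace
model.  This is only a sketch: struct toy_memcg, in_subtree() and
total_swap_mb() are hypothetical names, the walk is always hierarchical
(use_hierarchy is not modelled), and the derived root figure is added only
when the queried group is the root itself.

```c
#include <stddef.h>

/* Toy model of a memcg: local swap stat (in MB) and a parent link. */
struct toy_memcg {
	const char *name;
	long local_swap_mb;		/* 0 for root: no longer accounted */
	const struct toy_memcg *parent;	/* NULL for root */
};

/* Returns 1 if @cg is @top or a descendant of @top, 0 otherwise. */
int in_subtree(const struct toy_memcg *cg, const struct toy_memcg *top)
{
	for (; cg; cg = cg->parent)
		if (cg == top)
			return 1;
	return 0;
}

/*
 * total_swap for @top: sum of the local stats over its subtree, plus
 * the derived root figure when (and only when) @top is the root.
 */
long total_swap_mb(const struct toy_memcg *top,
		   const struct toy_memcg *all, size_t n,
		   long root_swap_mb)
{
	long val = top->parent ? 0 : root_swap_mb;
	size_t i;

	for (i = 0; i < n; i++)
		if (in_subtree(&all[i], top))
			val += all[i].local_swap_mb;
	return val;
}
```

With root (derived swap 10M), A (1M) and B (2M) chained as in the example,
this gives 13M for root and 3M for A.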

> case (this is btw. broken in the tree already now because
> for_each_mem_cgroup_tree resp. mem_cgroup_iter doesn't honor
> use_hierarchy for the root cgroup - this is a separate topic though).
Yes, I noticed that for_each_mem_cgroup_tree() resp. mem_cgroup_iter()
don't take root->use_hierarchy into consideration for the root cgroup, since the
iterator has the following logic:
if (!root->use_hierarchy && root != root_mem_cgroup) {
	if (prev)
		return NULL;
	return root;
}

Since I don't change for_each_mem_cgroup_tree(), the patch stays in accordance with the
original behavior.
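
A small userspace model of that early return may make the behavior easier
to see.  It is a sketch under loud assumptions: struct toy_cg, toy_iter()
and toy_count() are made-up names, and the tree is a single chain so that a
precomputed "next" pointer can stand in for the real pre-order walk.

```c
#include <stddef.h>

/* Toy cgroup on a single chain: root -> A -> B -> NULL. */
struct toy_cg {
	int use_hierarchy;
	const struct toy_cg *next;	/* pre-order successor, NULL at end */
};

/*
 * Mirrors the quoted check: a non-hierarchical group that is not the
 * global root yields only itself; anything else walks the chain.
 */
const struct toy_cg *toy_iter(const struct toy_cg *top,
			      const struct toy_cg *global_root,
			      const struct toy_cg *prev)
{
	if (!top->use_hierarchy && top != global_root) {
		if (prev)
			return NULL;
		return top;
	}
	return prev ? prev->next : top;
}

/* Number of groups visited by a walk starting at @top. */
int toy_count(const struct toy_cg *top, const struct toy_cg *global_root)
{
	const struct toy_cg *it = NULL;
	int n = 0;

	while ((it = toy_iter(top, global_root, it)) != NULL)
		n++;
	return n;
}
```

In this model a walk from the root visits every group even when the root's
use_hierarchy is 0, while a walk from a non-hierarchical child visits only
that child.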

>> In this way, we'll return an invalid CSS_ID (generally, it's 0) from the swap
>> cgroup related tracking infrastructure if only the root memcg is alive.
>> That is to say, we have not yet allocated the swap cgroup structures.
>> As a result, the per-page swapin/swapout stats against the root
>> memcg should be ZERO.
>>
>> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
>> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
>> CC: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Mel Gorman <mgorman@suse.de>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>>
>> ---
>>  mm/memcontrol.c |   35 ++++++++++++++++++++++++++++++-----
>>  1 file changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 09255ec..afe5e86 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5231,12 +5231,34 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
>>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>>  	struct mem_cgroup *mi;
>>  	unsigned int i;
>> +	long long root_swap_stat = 0;
>>
>>  	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>> -		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>> -			continue;
>> +		long val = 0;
>> +
>> +		if (i != MEM_CGROUP_STAT_SWAP)
>> +			val = mem_cgroup_read_stat(memcg, i);
>> +		else {
>> +			if (!do_swap_account)
>> +				continue;
> 
> 
>> +			if (!mem_cgroup_is_root(memcg))
>> +				val = mem_cgroup_read_stat(memcg, i);
>> +			else {
>> +				/*
>> +				 * The corresponding stat number of swap for
>> +				 * root_mem_cgroup is 0 since we don't account
>> +				 * it in any case.  Instead, we can fake the
>> +				 * root number via: total_swap_pages -
>> +				 * nr_swap_pages - total_swap_pages_of_all_memcg
>> +				 */
>> +				for_each_mem_cgroup(mi)
>> +					val += mem_cgroup_read_stat(mi, i);
>> +				val = root_swap_stat = (total_swap_pages -
>> +							nr_swap_pages - val);
>> +			}
> 
> This calls for a helper.
Yes, Sir.
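
A rough userspace sketch of what such a helper might look like.  The name
root_memcg_swap_pages() and its array argument are hypothetical stand-ins
for the for_each_mem_cgroup() walk, and clamping at zero is an assumption
added here to tolerate the swapin/swapout race discussed earlier; the patch
itself does not clamp.

```c
#include <stddef.h>

/*
 * Hypothetical helper: derive the root memcg's swap usage (in pages)
 * from the global counters instead of per-root accounting:
 *
 *   root = total_swap_pages - nr_swap_pages - sum(memcg swap pages)
 *
 * The inputs may be sampled at slightly different times, so the raw
 * difference can be transiently negative; clamp it to 0 (assumption).
 */
long root_memcg_swap_pages(long total_swap_pages, long nr_swap_pages,
			   const long *memcg_swap, size_t nr_memcgs)
{
	long used = total_swap_pages - nr_swap_pages;
	long charged = 0;
	size_t i;

	/* Sum the swap pages charged to all non-root memcgs. */
	for (i = 0; i < nr_memcgs; i++)
		charged += memcg_swap[i];

	return used > charged ? used - charged : 0;
}
```

The result is in pages, so a caller printing bytes would still multiply by
PAGE_SIZE as memcg_stat_show() does.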
> 
>> +		}
>>  		seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
>> -			   mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
>> +			   val * PAGE_SIZE);
>>  	}
>>  
>>  	for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
>> @@ -5260,8 +5282,11 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
>>  	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
>>  		long long val = 0;
>>  
>> -		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
>> -			continue;
>> +		if (i == MEM_CGROUP_STAT_SWAP) {
>> +			if (!do_swap_account)
>> +				continue;
>> +			val += root_swap_stat * PAGE_SIZE;
>> +		}
> 
> This doesn't seem right because you are adding root swap amount to _all_
> groups. This should be done only if (memcg == root_mem_cgroup).
Ah, I'm too dumb!

Thanks,
-Jeff
> 
>>  		for_each_mem_cgroup_tree(mi, memcg)
>>  			val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
>>  		seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);

