From: Michal Hocko <mhocko@suse.cz>
To: Sha Zhengju <handai.szj@gmail.com>
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, akpm@linux-foundation.org,
kamezawa.hiroyu@jp.fujitsu.com, gthelen@google.com,
fengguang.wu@intel.com, glommer@parallels.com,
Sha Zhengju <handai.szj@taobao.com>
Subject: Re: [PATCH V3 6/8] memcg: Don't account root_mem_cgroup page statistics
Date: Wed, 2 Jan 2013 13:27:12 +0100 [thread overview]
Message-ID: <20130102122712.GE22160@dhcp22.suse.cz> (raw)
In-Reply-To: <1356456447-14740-1-git-send-email-handai.szj@taobao.com>
On Wed 26-12-12 01:27:27, Sha Zhengju wrote:
> From: Sha Zhengju <handai.szj@taobao.com>
>
> If memcg is enabled and no non-root memcg exists, all allocated pages
> belongs to root_mem_cgroup and go through root memcg statistics routines
> which brings some overheads. So for the sake of performance, we can give
> up accounting stats of root memcg for MEM_CGROUP_STAT_FILE_MAPPED/FILE_DIRTY
> /WRITEBACK
I do not like this selective approach. We should handle all the stat
types in the same way. SWAP is not a hot path but RSS and CACHE should
be optimized as well. It seems that thresholds events might be a
complication here but it shouldn't be that a big deal (mem_cgroup_usage
would need some treat).
> and instead we pay special attention while showing root
> memcg numbers in memcg_stat_show(): as we don't account root memcg stats
> anymore, the root_mem_cgroup->stat numbers are actually 0.
Yes, this is reasonable.
> But because of hierachy, figures of root_mem_cgroup may just represent
> numbers of pages used by its own tasks(not belonging to any other
> child cgroup).
I am not sure what the above means. root might have use_hierarchy set to
1 as well.
> So here we fake these root numbers by using stats of global state and
> all other memcg. That is for root memcg:
> nr(MEM_CGROUP_STAT_FILE_MAPPED) = global_page_state(NR_FILE_MAPPED) -
> sum_of_all_memcg(MEM_CGROUP_STAT_FILE_MAPPED);
> Dirty/Writeback pages accounting are in the similar way.
>
> Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
I like the approach but I do not like the implementation. See details
bellow.
> ---
> mm/memcontrol.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 68 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fc20ac9..728349d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -5396,18 +5406,70 @@ static inline void mem_cgroup_lru_names_not_uptodate(void)
> BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
> }
>
> +long long root_memcg_local_stat(unsigned int i, long long val,
> + long long nstat[])
Function should be static
also
nstat parameter is ugly because this can be done by the caller
and also expecting that the caller already calculated val is not
nice (and undocumented). This approach is really hackish and error
prone. Why should we define a specific function rather than hooking into
mem_cgroup_read_stat and doing all the stuff there? I think that would
be much more maintainable.
> +{
> + long long res = 0;
> +
> + switch (i) {
> + case MEM_CGROUP_STAT_FILE_MAPPED:
> + res = global_page_state(NR_FILE_MAPPED);
> + break;
> + case MEM_CGROUP_STAT_FILE_DIRTY:
> + res = global_page_state(NR_FILE_DIRTY);
> + break;
> + case MEM_CGROUP_STAT_WRITEBACK:
> + res = global_page_state(NR_WRITEBACK);
> + break;
> + default:
> + break;
> + }
> +
> + res = (res <= val) ? 0 : (res - val) * PAGE_SIZE;
> + nstat[i] = res;
> +
> + return res;
> +}
> +
> static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
> struct seq_file *m)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> struct mem_cgroup *mi;
> unsigned int i;
> + long long nstat[MEM_CGROUP_STAT_NSTATS] = {0};
s/nstat/root_stat/
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> + long long val = 0, res = 0;
> +
> if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> continue;
> - seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
> - mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> + if (i == MEM_CGROUP_STAT_SWAP || i == MEM_CGROUP_STAT_CACHE ||
> + i == MEM_CGROUP_STAT_RSS) {
This is plain ugly. If nothing else it asks for a comment why those are
special.
> + seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
> + mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> + continue;
> + }
> +
> + /* As we don't account root memcg stats anymore, the
> + * root_mem_cgroup->stat numbers are actually 0. But because of
> + * hierachy, figures of root_mem_cgroup may just represent
> + * numbers of pages used by its own tasks(not belonging to any
> + * other child cgroup). So here we fake these root numbers by
> + * using stats of global state and all other memcg. That is for
> + * root memcg:
> + * nr(MEM_CGROUP_STAT_FILE_MAPPED) = global_page_state(NR_FILE_
> + * MAPPED) - sum_of_all_memcg(MEM_CGROUP_STAT_FILE_MAPPED)
> + * Dirty/Writeback pages accounting are in the similar way.
> + */
> + if (memcg == root_mem_cgroup) {
> + for_each_mem_cgroup(mi)
> + val += mem_cgroup_read_stat(mi, i);
> + res = root_memcg_local_stat(i, val, nstat);
> + } else
> + res = mem_cgroup_read_stat(memcg, i) * PAGE_SIZE;
> +
> + seq_printf(m, "%s %lld\n", mem_cgroup_stat_names[i], res);
> }
>
> for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> @@ -5435,6 +5497,10 @@ static int memcg_stat_show(struct cgroup *cont, struct cftype *cft,
> continue;
> for_each_mem_cgroup_tree(mi, memcg)
> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
> +
> + /* Adding local stats of root memcg */
> + if (memcg == root_mem_cgroup)
> + val += nstat[i];
> seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
> }
>
> --
> 1.7.9.5
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-01-02 12:27 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-25 17:18 [PATCH V3 0/8] Per-cgroup page stat accounting Sha Zhengju
2012-12-25 17:20 ` [PATCH V3 1/8] memcg: remove MEMCG_NR_FILE_MAPPED Sha Zhengju
2012-12-25 17:22 ` [PATCH V3 2/8] Make TestSetPageDirty and dirty page accounting in one func Sha Zhengju
2012-12-28 0:39 ` Kamezawa Hiroyuki
2013-01-05 2:34 ` Sha Zhengju
2013-01-02 9:08 ` Michal Hocko
2013-01-05 2:49 ` Sha Zhengju
2013-01-05 10:45 ` Michal Hocko
2012-12-25 17:24 ` [PATCH V3 3/8] use vfs __set_page_dirty interface instead of doing it inside filesystem Sha Zhengju
2012-12-28 0:41 ` Kamezawa Hiroyuki
2012-12-25 17:26 ` [PATCH V3 4/8] memcg: add per cgroup dirty pages accounting Sha Zhengju
2013-01-02 10:44 ` Michal Hocko
2013-01-05 4:48 ` Sha Zhengju
2013-01-06 20:02 ` Hugh Dickins
2013-01-07 7:49 ` Kamezawa Hiroyuki
2013-01-09 5:15 ` Hugh Dickins
2013-01-09 7:24 ` Kamezawa Hiroyuki
2013-01-09 14:35 ` Sha Zhengju
2013-01-09 14:47 ` Michal Hocko
2013-01-07 7:25 ` Kamezawa Hiroyuki
2013-01-09 15:02 ` Sha Zhengju
2013-01-10 2:16 ` Kamezawa Hiroyuki
2013-01-10 4:26 ` Sha Zhengju
2013-01-10 5:03 ` Kamezawa Hiroyuki
2013-01-10 8:28 ` Sha Zhengju
2013-05-03 9:11 ` Michal Hocko
2013-05-03 9:59 ` Sha Zhengju
2013-01-06 20:07 ` Greg Thelen
2013-01-09 9:45 ` Sha Zhengju
2012-12-25 17:26 ` [PATCH V3 5/8] memcg: add per cgroup writeback " Sha Zhengju
2012-12-28 0:52 ` Kamezawa Hiroyuki
2013-01-02 11:15 ` Michal Hocko
2013-01-06 20:07 ` Greg Thelen
2013-01-09 9:08 ` Sha Zhengju
2012-12-25 17:27 ` [PATCH V3 6/8] memcg: Don't account root_mem_cgroup page statistics Sha Zhengju
2012-12-28 1:04 ` Kamezawa Hiroyuki
2013-01-05 7:38 ` Sha Zhengju
2013-01-02 12:27 ` Michal Hocko [this message]
2013-01-05 10:52 ` Sha Zhengju
2013-01-09 12:57 ` Michal Hocko
2012-12-25 17:27 ` [PATCH V3 7/8] memcg: disable memcg page stat accounting code when not in use Sha Zhengju
2012-12-28 1:06 ` Kamezawa Hiroyuki
2012-12-28 1:45 ` Kamezawa Hiroyuki
2013-01-05 11:06 ` Sha Zhengju
2013-01-02 13:35 ` Michal Hocko
2012-12-25 17:28 ` [PATCH V3 8/8] memcg: Document cgroup dirty/writeback memory statistics Sha Zhengju
2012-12-28 1:10 ` Kamezawa Hiroyuki
2013-01-06 2:55 ` Sha Zhengju
2013-01-02 13:36 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130102122712.GE22160@dhcp22.suse.cz \
--to=mhocko@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=fengguang.wu@intel.com \
--cc=glommer@parallels.com \
--cc=gthelen@google.com \
--cc=handai.szj@gmail.com \
--cc=handai.szj@taobao.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox