From: Ying Han <yinghan@google.com>
To: Johannes Weiner <jweiner@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
Balbir Singh <bsingharora@gmail.com>,
Andrew Brestic <abrestic@google.com>,
Michal Hocko <mhocko@suse.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] Revert "memcg: add memory.vmscan_stat"
Date: Wed, 31 Aug 2011 23:05:51 -0700
Message-ID: <CALWz4iyXbrgcrZEOsgvvW9mu6fr7Qwbn2d1FR_BVw6R_pMZPsQ@mail.gmail.com>
In-Reply-To: <20110830084245.GC13061@redhat.com>
On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
> On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
>> On Tue, 30 Aug 2011 09:04:24 +0200
>> Johannes Weiner <jweiner@redhat.com> wrote:
>>
>> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
>> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
>> > > spin_lock(&memcg->scanstat.lock);
>> > > __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
>> > > spin_unlock(&memcg->scanstat.lock);
>> > > -
>> > > - memcg = rec->root;
>> > > - spin_lock(&memcg->scanstat.lock);
>> > > - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
>> > > - spin_unlock(&memcg->scanstat.lock);
>> > > + cgroup = memcg->css.cgroup;
>> > > + do {
>> > > + spin_lock(&memcg->scanstat.lock);
>> > > + __mem_cgroup_record_scanstat(
>> > > + memcg->scanstat.hierarchy_stats[context], rec);
>> > > + spin_unlock(&memcg->scanstat.lock);
>> > > + if (!cgroup->parent)
>> > > + break;
>> > > + cgroup = cgroup->parent;
>> > > + memcg = mem_cgroup_from_cont(cgroup);
>> > > + } while (memcg->use_hierarchy && memcg != rec->root);
>> >
>> > Okay, so this looks correct, but it sums up all parents after each
>> > memcg scanned, which could have a performance impact. Usually,
>> > hierarchy statistics are only summed up when a user reads them.
>> >
>> Hmm. But sum-at-read doesn't work.
>>
>> Assume 3 cgroups in a hierarchy.
>>
>> A
>> /
>> B
>> /
>> C
>>
>> C's scan contains 3 causes.
>> C's scan caused by limit of A.
>> C's scan caused by limit of B.
>> C's scan caused by limit of C.
>>
>> If we make hierarchy sum at read, we think
>> B's scan_stat = B's scan_stat + C's scan_stat
>> But to be precise, this is
>>
>> B's scan_stat = B's scan_stat caused by B +
>> B's scan_stat caused by A +
>> C's scan_stat caused by C +
>> C's scan_stat caused by B +
>> C's scan_stat caused by A.
>>
>> In the original version,
>> B's scan_stat = B's scan_stat caused by B +
>> C's scan_stat caused by B
>>
>> After this patch,
>> B's scan_stat = B's scan_stat caused by B +
>> B's scan_stat caused by A +
>> C's scan_stat caused by C +
>> C's scan_stat caused by B +
>> C's scan_stat caused by A.
>>
>> Hmm...removing hierarchy part completely seems fine to me.
>
> I see.
>
> You want to look at A and see whether its limit was responsible for
> reclaim scans in any children. IMO, that is asking the question
> backwards. Instead, there is a cgroup under reclaim and one wants to
> find out the cause for that. Not the other way round.
>
> In my original proposal I suggested differentiating reclaim caused by
> internal pressure (due to own limit) and reclaim caused by
> external/hierarchical pressure (due to limits from parents).
>
> If you want to find out why C is under reclaim, look at its reclaim
> statistics. If the _limit numbers are high, C's limit is the problem.
> If the _hierarchical numbers are high, the problem is B, A, or
> physical memory, so you check B for _limit and _hierarchical as well,
> then move on to A.
>
> Implementing this would be as easy as passing not only the memcg to
> scan (victim) to the reclaim code, but also the memcg /causing/ the
> reclaim (root_mem):
>
> root_mem == victim -> account to victim as _limit
> root_mem != victim -> account to victim as _hierarchical
>
> This would make things much simpler and more natural, both the code
> and the way of tracking down a problem, IMO.
These are pretty much the stats I am currently using to debug the
reclaim patches. For example:
scanned_pages_by_system 0
scanned_pages_by_system_under_hierarchy 50989
scanned_pages_by_limit 0
scanned_pages_by_limit_under_hierarchy 0
"_system" counts pages scanned under global reclaim, and "_limit" counts
pages scanned under per-memcg reclaim.
"_under_hierarchy" is added if the memcg being scanned is not the one
triggering the pressure.
So in the previous example:
> A (root)
> /
> B
> /
> C
For cgroup C:
scanned_pages_by_system:
scanned_pages_by_system_under_hierarchy: # of pages scanned under global memory pressure
scanned_pages_by_limit: # of pages scanned while C hits the limit
scanned_pages_by_limit_under_hierarchy: # of pages scanned while B hits the limit
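
As a rough sketch of the accounting rule described above (not code from
any posted patch): the struct, enum, and function names below are all
hypothetical, and the sketch assumes that global reclaim is modelled by
passing the root cgroup as root_mem together with a global_reclaim flag.

```c
/* Hypothetical counter indices, mirroring the four stats listed above. */
enum scan_stat {
	SCANNED_BY_SYSTEM,
	SCANNED_BY_SYSTEM_UNDER_HIERARCHY,
	SCANNED_BY_LIMIT,
	SCANNED_BY_LIMIT_UNDER_HIERARCHY,
	NR_SCAN_STATS,
};

/* Hypothetical stand-in for struct mem_cgroup; only what the sketch needs. */
struct memcg {
	const char *name;
	unsigned long scan_stats[NR_SCAN_STATS];
};

/*
 * Pick the counter for one scan of @victim.
 * @root_mem is the memcg whose pressure triggered reclaim (assumed to be
 * the root cgroup for global reclaim). "_under_hierarchy" is used whenever
 * the victim is not the memcg that triggered the pressure.
 */
static enum scan_stat classify_scan(const struct memcg *root_mem,
				    const struct memcg *victim,
				    int global_reclaim)
{
	if (global_reclaim)
		return victim == root_mem ? SCANNED_BY_SYSTEM
					  : SCANNED_BY_SYSTEM_UNDER_HIERARCHY;
	return victim == root_mem ? SCANNED_BY_LIMIT
				  : SCANNED_BY_LIMIT_UNDER_HIERARCHY;
}

/* Account @nr_scanned pages to the victim only; no walk up the parents. */
static void record_scan(const struct memcg *root_mem, struct memcg *victim,
			int global_reclaim, unsigned long nr_scanned)
{
	victim->scan_stats[classify_scan(root_mem, victim, global_reclaim)] +=
		nr_scanned;
}
```

With A, B and C as above, record_scan(&b, &c, 0, n) lands in C's
scanned_pages_by_limit_under_hierarchy, and record_scan(&c, &c, 0, n)
lands in C's scanned_pages_by_limit. Only the victim's counters are
touched, so nothing is summed over the parents at scan time.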
--Ying
>
>> > I don't get why this has to be done completely differently from the way
>> > we usually do things, without any justification whatsoever.
>> >
>> > Why do you want to pass a recording structure down the reclaim stack?
>>
>> Just for reducing number of passed variables.
>
> It's still sitting on bottom of the reclaim stack the whole time.
>
> With my proposal, you would only need to pass the extra root_mem
> pointer.
>