From: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>
To: balbir@linux.vnet.ibm.com
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org, Sudhir Kumar <skumar@linux.vnet.ibm.com>,
YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
Bharata B Rao <bharata@in.ibm.com>,
Paul Menage <menage@google.com>,
lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
David Rientjes <rientjes@google.com>,
Pavel Emelianov <xemul@openvz.org>,
Dhaval Giani <dhaval@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/4] Memory controller soft limit patches (v3)
Date: Mon, 2 Mar 2009 23:04:34 +0900 (JST) [thread overview]
Message-ID: <c31ccd23cb41f0f7594b3f56b20f0165.squirrel@webmail-b.css.fujitsu.com> (raw)
In-Reply-To: <20090302124210.GK11421@balbir.in.ibm.com>
Balbir Singh wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02
> 16:06:02]:
>
>> On Mon, 2 Mar 2009 12:06:49 +0530
>> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> > OK, I get your point, but whay does that make RB-Tree data structure
>> non-sense?
>> >
>>
>> 1. Until memory-shortage, rb-tree is kept to be updated and the
>> users(kernel)
>> has to pay its maintainace/check cost, whici is unnecessary.
>> Considering trade-off, paying cost only when memory-shortage happens
>> tend to
>> be reasonable way.
> As you've seen in the code, the cost is only at an interval HZ/2
> currently. The other overhead is the calculation of excess, I can try
> and see if we can get rid of it.
>
>>
>> 2. Current "exceed" just shows "How much we got over my soft limit" but
>> doesn't
>> tell any information per-node/zone. Considering this, this rb-tree
>> information will not be able to help kswapd (on NUMA).
>> But maintain per-node information uses too much resource.
>
> Yes, kswapd is per-node and we try to free all pages belonging to a
> zonelist as specified by pgdat->node_zonelists for the memory control
> groups that are over their soft limit. Keeping this information per
> node makes no sense (exceeds information).
>
>>
>> Considering above 2, it's not bad to find victim by proper logic
>> from balance_pgdat() by using mem_cgroup_select_victim().
>> like this:
>> ==
>> struct mem_cgroup *select_vicitim_at_soft_limit_via_balance_pgdat(int
>> nid, int zid)
>> {
>> while (?) {
>> vitcim = mem_cgroup_select_victim(init_mem_cgroup); #need some
>> modification.
>> if (victim is not over soft-limit)
>> continue;
>> /* Ok this is candidate */
>> usage = mem_cgroup_nid_zid_usage(mem, nid, zid); #get sum of
>> active/inactive
>> if (usage_is_enough_big)
>> return victim;
>
> We currently track overall usage, so we split into per nid, zid
> information and use that? Is that your suggestion?
My suggestion is that current per-zone statistics interface of memcg
already holds all necessary information. And aggregate usage information
is not worth to be tracked becauset it's no help for kswapd.
> The soft limit is
> also an aggregate limit, how do we define usage_is_big_enough or
> usage_is_enough_big? Through some heuristics?
>
I think that if memcg/zone's page usage is not 0, it's enough big.
(and round robin rotation as hierachical reclaim can be used.)
There maybe some threshold to try.
For example)
need_to_reclaim = zone->high - zone->free.
if (usage_in_this_zone_of_memcg > need_to_reclaim/4)
select this.
Maybe we can adjust that later.
>> }
>> }
>> balance_pgdat()
>> ...... find target zone....
>> ...
>> mem = select_victime_at_soft_limit_via_balance_pgdat(nid, zid)
>> if (mem)
>> sc->mem = mem;
>> shrink_zone();
>> if (mem) {
>> sc->mem = NULL;
>> css_put(&mem->css);
>> }
>> ==
>>
>> We have to pay scan cost but it will not be too big(if there are not
>> thousands of memcg.)
>> Under above, round-robin rotation is used rather than sort.
>
> Yes, we sort, but not frequently at every page-fault but at a
> specified interval.
>
>> Maybe I can show you sample.....(but I'm a bit busy.)
>>
>
> Explanation and review is good, but I don't see how not-sorting will
> help? I need something that can help me point to the culprits quickly
> enough during soft limit reclaim and RB-Tree works very well for me.
>
I don't think "tracking memcg which exceeds soft limit" is not worth
to do in synchronous way. It can be done in lazy way when it's necessary
in simpler logic.
BTW, did you do set-softlimit-zero and rmdir() test ?
At quick review, memcg will never be removed from RB tree because
force_empty moves account from children to parent. But no tree ops there.
plz see mem_cgroup_move_account().
I'm sorry if I missed something.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-03-02 14:04 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-01 6:29 Balbir Singh
2009-03-01 6:30 ` [PATCH 1/4] Memory controller soft limit documentation (v3) Balbir Singh
2009-03-01 6:30 ` [PATCH 2/4] Memory controller soft limit interface (v3) Balbir Singh
2009-03-02 2:03 ` KAMEZAWA Hiroyuki
2009-03-02 4:46 ` Balbir Singh
2009-03-02 5:35 ` KAMEZAWA Hiroyuki
2009-03-02 6:07 ` Balbir Singh
2009-03-02 6:19 ` KAMEZAWA Hiroyuki
2009-03-02 6:29 ` Balbir Singh
2009-03-01 6:30 ` [PATCH 3/4] Memory controller soft limit organize cgroups (v3) Balbir Singh
2009-03-01 6:30 ` [PATCH 4/4] Memory controller soft limit reclaim on contention (v3) Balbir Singh
2009-03-02 3:08 ` KOSAKI Motohiro
2009-03-02 4:44 ` Balbir Singh
2009-03-03 2:43 ` KOSAKI Motohiro
2009-03-03 11:17 ` Balbir Singh
2009-03-04 0:07 ` KOSAKI Motohiro
2009-03-02 0:24 ` [PATCH 0/4] Memory controller soft limit patches (v3) KAMEZAWA Hiroyuki
2009-03-02 4:40 ` Balbir Singh
2009-03-02 5:32 ` KAMEZAWA Hiroyuki
2009-03-02 6:05 ` Balbir Singh
2009-03-02 6:18 ` KAMEZAWA Hiroyuki
2009-03-02 17:52 ` Balbir Singh
2009-03-03 0:03 ` KAMEZAWA Hiroyuki
2009-03-03 11:23 ` Balbir Singh
2009-03-02 6:21 ` KAMEZAWA Hiroyuki
2009-03-02 6:36 ` Balbir Singh
2009-03-02 7:06 ` KAMEZAWA Hiroyuki
2009-03-02 7:17 ` KAMEZAWA Hiroyuki
2009-03-02 12:42 ` Balbir Singh
2009-03-02 14:04 ` KAMEZAWA Hiroyuki [this message]
2009-03-02 17:41 ` Balbir Singh
2009-03-02 23:59 ` KAMEZAWA Hiroyuki
2009-03-03 11:12 ` Balbir Singh
2009-03-03 11:50 ` KAMEZAWA Hiroyuki
2009-03-03 13:14 ` Balbir Singh
2009-03-05 9:04 ` KAMEZAWA Hiroyuki
2009-03-05 9:13 ` KAMEZAWA Hiroyuki
2009-03-05 15:26 ` Balbir Singh
2009-03-05 23:53 ` KAMEZAWA Hiroyuki
2009-03-06 3:23 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c31ccd23cb41f0f7594b3f56b20f0165.squirrel@webmail-b.css.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=bharata@in.ibm.com \
--cc=dhaval@linux.vnet.ibm.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizf@cn.fujitsu.com \
--cc=menage@google.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=skumar@linux.vnet.ibm.com \
--cc=xemul@openvz.org \
--cc=yamamoto@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox