linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, Sudhir Kumar <skumar@linux.vnet.ibm.com>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Bharata B Rao <bharata@in.ibm.com>,
	Paul Menage <menage@google.com>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelianov <xemul@openvz.org>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/4] Memory controller soft limit patches (v3)
Date: Mon, 2 Mar 2009 23:11:56 +0530	[thread overview]
Message-ID: <20090302174156.GM11421@balbir.in.ibm.com> (raw)
In-Reply-To: <c31ccd23cb41f0f7594b3f56b20f0165.squirrel@webmail-b.css.fujitsu.com>

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02 23:04:34]:

> Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02
> > 16:06:02]:
> >
> >> On Mon, 2 Mar 2009 12:06:49 +0530
> >> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >> > OK, I get your point, but whay does that make RB-Tree data structure
> >> non-sense?
> >> >
> >>
> >>  1. Until memory-shortage, rb-tree is kept to be updated and the
> >> users(kernel)
> >>     has to pay its maintainace/check cost, whici is unnecessary.
> >>     Considering trade-off, paying cost only when memory-shortage happens
> >> tend to
> >>     be reasonable way.
> > As you've seen in the code, the cost is only at an interval HZ/2
> > currently. The other overhead is the calculation of excess, I can try
> > and see if we can get rid of it.
> >
> >>
> >>  2. Current "exceed" just shows "How much we got over my soft limit" but
> >> doesn't
> >>     tell any information per-node/zone. Considering this, this rb-tree
> >>     information will not be able to help kswapd (on NUMA).
> >>     But maintain per-node information uses too much resource.
> >
> > Yes, kswapd is per-node and we try to free all pages belonging to a
> > zonelist as specified by pgdat->node_zonelists for the memory control
> > groups that are over their soft limit. Keeping this information per
> > node makes no sense (exceeds information).
> >
> >>
> >>  Considering above 2, it's not bad to find victim by proper logic
> >>  from balance_pgdat() by using mem_cgroup_select_victim().
> >>  like this:
> >> ==
> >>  struct mem_cgroup *select_vicitim_at_soft_limit_via_balance_pgdat(int
> >> nid, int zid)
> >>  {
> >>      while (?) {
> >>         vitcim = mem_cgroup_select_victim(init_mem_cgroup);  #need some
> >> modification.
> >>         if (victim is not over soft-limit)
> >>              continue;
> >>         /* Ok this is candidate */
> >>         usage = mem_cgroup_nid_zid_usage(mem, nid, zid); #get sum of
> >> active/inactive
> >>         if (usage_is_enough_big)
> >>               return victim;
> >
> > We currently track overall usage, so we split into per nid, zid
> > information and use that? Is that your suggestion?
> 
> My suggestion is that current per-zone statistics interface of memcg
> already holds all necessary information. And aggregate usage information
> is not worth to be tracked becauset it's no help for kswapd.
>

We have that data, but we need aggregate data to see who exceeded the
limit.
 
> >  The soft limit is
> > also an aggregate limit, how do we define usage_is_big_enough or
> > usage_is_enough_big? Through some heuristics?
> >
> I think that if memcg/zone's page usage is not 0, it's enough big.
> (and round robin rotation as hierachical reclaim can be used.)
> 
> There maybe some threshold to try.
> 
> For example)
>    need_to_reclaim = zone->high - zone->free.
>    if (usage_in_this_zone_of_memcg > need_to_reclaim/4)
>          select this.
> 
> Maybe we can adjust that later.
> 

No... this looks broken by design. Even if the administrator sets a
large enough limit and no soft limits, the cgroup gets reclaimed from?


> >>      }
> >>  }
> >>  balance_pgdat()
> >>  ...... find target zone....
> >>  ...
> >>  mem = select_victime_at_soft_limit_via_balance_pgdat(nid, zid)
> >>  if (mem)
> >>    sc->mem = mem;
> >>  shrink_zone();
> >>  if (mem) {
> >>    sc->mem = NULL;
> >>    css_put(&mem->css);
> >>  }
> >> ==
> >>
> >>  We have to pay scan cost but it will not be too big(if there are not
> >> thousands of memcg.)
> >>  Under above, round-robin rotation is used rather than sort.
> >
> > Yes, we sort, but not frequently at every page-fault but at a
> > specified interval.
> >
> >>  Maybe I can show you sample.....(but I'm a bit busy.)
> >>
> >
> > Explanation and review is good, but I don't see how not-sorting will
> > help? I need something that can help me point to the culprits quickly
> > enough during soft limit reclaim and RB-Tree works very well for me.
> >
> 
> I don't think "tracking memcg which exceeds soft limit" is not worth
> to do in synchronous way. It can be done in lazy way when it's necessary
> in simpler logic.
>

The synchronous way can be harmful if we do it every page fault. THe
current logic is quite simple....no?
 
> BTW, did you do set-softlimit-zero and rmdir() test ?
> At quick review, memcg will never be removed from RB tree because
> force_empty moves account from children to parent. But no tree ops there.
> plz see mem_cgroup_move_account().
> 

__mme_cgroup_free() has tree ops, shouldn't that catch this scenario?

> I'm sorry if I missed something.
>

Thanks for the review and discussion. 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-03-02 17:42 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-01  6:29 Balbir Singh
2009-03-01  6:30 ` [PATCH 1/4] Memory controller soft limit documentation (v3) Balbir Singh
2009-03-01  6:30 ` [PATCH 2/4] Memory controller soft limit interface (v3) Balbir Singh
2009-03-02  2:03   ` KAMEZAWA Hiroyuki
2009-03-02  4:46     ` Balbir Singh
2009-03-02  5:35       ` KAMEZAWA Hiroyuki
2009-03-02  6:07         ` Balbir Singh
2009-03-02  6:19           ` KAMEZAWA Hiroyuki
2009-03-02  6:29             ` Balbir Singh
2009-03-01  6:30 ` [PATCH 3/4] Memory controller soft limit organize cgroups (v3) Balbir Singh
2009-03-01  6:30 ` [PATCH 4/4] Memory controller soft limit reclaim on contention (v3) Balbir Singh
2009-03-02  3:08   ` KOSAKI Motohiro
2009-03-02  4:44     ` Balbir Singh
2009-03-03  2:43       ` KOSAKI Motohiro
2009-03-03 11:17         ` Balbir Singh
2009-03-04  0:07           ` KOSAKI Motohiro
2009-03-02  0:24 ` [PATCH 0/4] Memory controller soft limit patches (v3) KAMEZAWA Hiroyuki
2009-03-02  4:40   ` Balbir Singh
2009-03-02  5:32     ` KAMEZAWA Hiroyuki
2009-03-02  6:05       ` Balbir Singh
2009-03-02  6:18         ` KAMEZAWA Hiroyuki
2009-03-02 17:52           ` Balbir Singh
2009-03-03  0:03             ` KAMEZAWA Hiroyuki
2009-03-03 11:23               ` Balbir Singh
2009-03-02  6:21         ` KAMEZAWA Hiroyuki
2009-03-02  6:36           ` Balbir Singh
2009-03-02  7:06             ` KAMEZAWA Hiroyuki
2009-03-02  7:17               ` KAMEZAWA Hiroyuki
2009-03-02 12:42               ` Balbir Singh
2009-03-02 14:04                 ` KAMEZAWA Hiroyuki
2009-03-02 17:41                   ` Balbir Singh [this message]
2009-03-02 23:59                     ` KAMEZAWA Hiroyuki
2009-03-03 11:12                       ` Balbir Singh
2009-03-03 11:50                         ` KAMEZAWA Hiroyuki
2009-03-03 13:14                           ` Balbir Singh
2009-03-05  9:04                         ` KAMEZAWA Hiroyuki
2009-03-05  9:13                           ` KAMEZAWA Hiroyuki
2009-03-05 15:26                           ` Balbir Singh
2009-03-05 23:53                             ` KAMEZAWA Hiroyuki
2009-03-06  3:23                             ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090302174156.GM11421@balbir.in.ibm.com \
    --to=balbir@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@in.ibm.com \
    --cc=dhaval@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=menage@google.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=skumar@linux.vnet.ibm.com \
    --cc=xemul@openvz.org \
    --cc=yamamoto@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox