linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, Sudhir Kumar <skumar@linux.vnet.ibm.com>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Bharata B Rao <bharata@in.ibm.com>,
	Paul Menage <menage@google.com>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelianov <xemul@openvz.org>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/4] Memory controller soft limit patches (v3)
Date: Mon, 2 Mar 2009 18:12:10 +0530	[thread overview]
Message-ID: <20090302124210.GK11421@balbir.in.ibm.com> (raw)
In-Reply-To: <20090302160602.521928a5.kamezawa.hiroyu@jp.fujitsu.com>

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02 16:06:02]:

> On Mon, 2 Mar 2009 12:06:49 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02 15:21:28]:
> > 
> > > On Mon, 2 Mar 2009 11:35:19 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > 
> > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02 14:32:50]:
> > > > 
> > > > > On Mon, 2 Mar 2009 10:10:43 +0530
> > > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > > 
> > > > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-02 09:24:04]:
> > > > > > 
> > > > > > > On Sun, 01 Mar 2009 11:59:59 +0530
> > > > > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > > > > 
> > > > > > > > 
> > > > > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > > > 
> > > > > > > 
> > > > > > > At first, it's said "When cgroup people adds something, the kernel gets slow".
> > > > > > > This is my start point of reviewing. Below is comments to this version of patch.
> > > > > > > 
> > > > > > >  1. I think it's bad to add more hooks to res_counter. It's enough slow to give up
> > > > > > >     adding more fancy things..
> > > > > > 
> > > > > > res_counters was desgined to be extensible, why is adding anything to
> > > > > > it going to make it slow, unless we turn on soft_limits?
> > > > > > 
> > > > > You inserted new "if" logic in the core loop.
> > > > > (What I want to say here is not that this is definitely bad but that "isn't there
> > > > >  any alternatives which is less overhead.)
> > > > > 
> > > > > 
> > > > > > > 
> > > > > > >  2. please avoid to add hooks to hot-path. In your patch, especially a hook to
> > > > > > >     mem_cgroup_uncharge_common() is annoying me.
> > > > > > 
> > > > > > If soft limits are not enabled, the function does a small check and
> > > > > > leaves. 
> > > > > > 
> > > > > &soft_fail_res is passed always even if memory.soft_limit==ULONG_MAX
> > > > > res_counter_soft_limit_excess() adds one more function call and spinlock, and irq-off.
> > > > >
> > > > 
> > > > OK, I see that overhead.. I'll figure out a way to work around it.
> > > >  
> > > > > > > 
> > > > > > >  3. please avoid to use global spinlock more. 
> > > > > > >     no lock is best. mutex is better, maybe.
> > > > > > > 
> > > > > > 
> > > > > > No lock to update a tree which is update concurrently?
> > > > > > 
> > > > > Using tree/sort itself is nonsense, I believe.
> > > > > 
> > > > 
> > > > I tried using prio trees in the past, but they are not easy to update
> > > > either. I won't mind asking for suggestions for a data structure that
> > > > can scaled well, allow quick insert/delete and search.
> > > > 
> > > Now, because the routine is called by kswapd() not by try_to_free.....
> > > 
> > > It's not necessary to be very very fast. That's my point.
> > >
> > 
> > OK, I get your point, but whay does that make RB-Tree data structure non-sense?
> >  
> 
>  1. Until memory-shortage, rb-tree is kept to be updated and the users(kernel)
>     has to pay its maintainace/check cost, whici is unnecessary.
>     Considering trade-off, paying cost only when memory-shortage happens tend to
>     be reasonable way.
As you've seen in the code, the cost is only at an interval HZ/2
currently. The other overhead is the calculation of excess, I can try
and see if we can get rid of it.

> 
>  2. Current "exceed" just shows "How much we got over my soft limit" but doesn't
>     tell any information per-node/zone. Considering this, this rb-tree
>     information will not be able to help kswapd (on NUMA).
>     But maintain per-node information uses too much resource.

Yes, kswapd is per-node and we try to free all pages belonging to a
zonelist as specified by pgdat->node_zonelists for the memory control
groups that are over their soft limit. Keeping this information per
node makes no sense (exceeds information).

> 
>  Considering above 2, it's not bad to find victim by proper logic
>  from balance_pgdat() by using mem_cgroup_select_victim().
>  like this:
> ==
>  struct mem_cgroup *select_vicitim_at_soft_limit_via_balance_pgdat(int nid, int zid)
>  {
>      while (?) {
>         vitcim = mem_cgroup_select_victim(init_mem_cgroup);  #need some modification.
>         if (victim is not over soft-limit)
>              continue;
>         /* Ok this is candidate */
>         usage = mem_cgroup_nid_zid_usage(mem, nid, zid); #get sum of active/inactive
>         if (usage_is_enough_big)
>               return victim;

We currently track overall usage, so we split into per nid, zid
information and use that? Is that your suggestion? The soft limit is
also an aggregate limit, how do we define usage_is_big_enough or
usage_is_enough_big? Through some heuristics?

>      }
>  }
>  balance_pgdat()
>  ...... find target zone....
>  ...
>  mem = select_victime_at_soft_limit_via_balance_pgdat(nid, zid)
>  if (mem)
>    sc->mem = mem;
>  shrink_zone();
>  if (mem) {
>    sc->mem = NULL;
>    css_put(&mem->css);
>  }
> ==
> 
>  We have to pay scan cost but it will not be too big(if there are not thousands of memcg.)
>  Under above, round-robin rotation is used rather than sort.

Yes, we sort, but not frequently at every page-fault but at a
specified interval.

>  Maybe I can show you sample.....(but I'm a bit busy.)
>

Explanation and review is good, but I don't see how not-sorting will
help? I need something that can help me point to the culprits quickly
enough during soft limit reclaim and RB-Tree works very well for me. 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2009-03-02 12:42 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-01  6:29 Balbir Singh
2009-03-01  6:30 ` [PATCH 1/4] Memory controller soft limit documentation (v3) Balbir Singh
2009-03-01  6:30 ` [PATCH 2/4] Memory controller soft limit interface (v3) Balbir Singh
2009-03-02  2:03   ` KAMEZAWA Hiroyuki
2009-03-02  4:46     ` Balbir Singh
2009-03-02  5:35       ` KAMEZAWA Hiroyuki
2009-03-02  6:07         ` Balbir Singh
2009-03-02  6:19           ` KAMEZAWA Hiroyuki
2009-03-02  6:29             ` Balbir Singh
2009-03-01  6:30 ` [PATCH 3/4] Memory controller soft limit organize cgroups (v3) Balbir Singh
2009-03-01  6:30 ` [PATCH 4/4] Memory controller soft limit reclaim on contention (v3) Balbir Singh
2009-03-02  3:08   ` KOSAKI Motohiro
2009-03-02  4:44     ` Balbir Singh
2009-03-03  2:43       ` KOSAKI Motohiro
2009-03-03 11:17         ` Balbir Singh
2009-03-04  0:07           ` KOSAKI Motohiro
2009-03-02  0:24 ` [PATCH 0/4] Memory controller soft limit patches (v3) KAMEZAWA Hiroyuki
2009-03-02  4:40   ` Balbir Singh
2009-03-02  5:32     ` KAMEZAWA Hiroyuki
2009-03-02  6:05       ` Balbir Singh
2009-03-02  6:18         ` KAMEZAWA Hiroyuki
2009-03-02 17:52           ` Balbir Singh
2009-03-03  0:03             ` KAMEZAWA Hiroyuki
2009-03-03 11:23               ` Balbir Singh
2009-03-02  6:21         ` KAMEZAWA Hiroyuki
2009-03-02  6:36           ` Balbir Singh
2009-03-02  7:06             ` KAMEZAWA Hiroyuki
2009-03-02  7:17               ` KAMEZAWA Hiroyuki
2009-03-02 12:42               ` Balbir Singh [this message]
2009-03-02 14:04                 ` KAMEZAWA Hiroyuki
2009-03-02 17:41                   ` Balbir Singh
2009-03-02 23:59                     ` KAMEZAWA Hiroyuki
2009-03-03 11:12                       ` Balbir Singh
2009-03-03 11:50                         ` KAMEZAWA Hiroyuki
2009-03-03 13:14                           ` Balbir Singh
2009-03-05  9:04                         ` KAMEZAWA Hiroyuki
2009-03-05  9:13                           ` KAMEZAWA Hiroyuki
2009-03-05 15:26                           ` Balbir Singh
2009-03-05 23:53                             ` KAMEZAWA Hiroyuki
2009-03-06  3:23                             ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090302124210.GK11421@balbir.in.ibm.com \
    --to=balbir@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@in.ibm.com \
    --cc=dhaval@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=menage@google.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=skumar@linux.vnet.ibm.com \
    --cc=xemul@openvz.org \
    --cc=yamamoto@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox