Date: Tue, 30 Nov 2010 09:03:33 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: Question about cgroup hierarchy and reducing memory limit
Message-Id: <20101130090333.0c8c1728.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20101129140233.GA4199@balbir.in.ibm.com>
References: <20101124094736.3c4ba760.kamezawa.hiroyu@jp.fujitsu.com>
	<20101125100428.24920cd3.kamezawa.hiroyu@jp.fujitsu.com>
	<20101129155858.6af29381.kamezawa.hiroyu@jp.fujitsu.com>
	<20101129140233.GA4199@balbir.in.ibm.com>
To: balbir@linux.vnet.ibm.com
Cc: Evgeniy Ivanov, linux-mm@kvack.org, "nishimura@mxp.nes.nec.co.jp", gthelen@google.com

On Mon, 29 Nov 2010 19:32:33 +0530
Balbir Singh wrote:

> * KAMEZAWA Hiroyuki [2010-11-29 15:58:58]:
>
> > On Thu, 25 Nov 2010 13:51:06 +0300
> > Evgeniy Ivanov wrote:
> >
> > > That would be great, thanks!
> > > For now we decided either to use decreasing limits in a script
> > > with a timeout, or to control the limit only from the root group.
> >
> > I wrote a patch, below, but I also found that "success" in shrinking
> > the limit means an easy OOM kill, because we don't have
> > wait-for-writeback logic.
> >
> > Right now, -EBUSY seems to be a safeguard against OOM kills.
> > I'd like to wait for the merge of the dirty_ratio logic and then
> > test this again. I hope it helps.
> >
> > Thanks,
> > -Kame
> > ==
> > When changing the limit of a memory cgroup, we see many -EBUSY
> > returns when
> >  1. the cgroup is small.
> >  2. some tasks are accessing pages very frequently.
> >
> > This is not very convenient. This patch puts the memcg into a
> > "shrinking" mode while the limit is being reduced. In this mode,
> > the patch
> >
> >  a) blocks new allocations.
> >  b) ignores the page reference bit during reclaim.
> >
> > The admin should know what he is doing...
> >
> > Still needed:
> >  - dirty_ratio, to avoid OOM.
> >  - Documentation update.
> >
> > Note:
> >  - Sudden shrinking of the memory limit tends to cause OOM.
> >    We need the dirty_ratio patch before merging this.
> >
> > Reported-by: Evgeniy Ivanov
> > Signed-off-by: KAMEZAWA Hiroyuki
> > ---
> >  include/linux/memcontrol.h |    6 +++++
> >  mm/memcontrol.c            |   48 +++++++++++++++++++++++++++++++++
> >  mm/vmscan.c                |    2 +
> >  3 files changed, 56 insertions(+)
> >
> > Index: mmotm-1117/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-1117.orig/mm/memcontrol.c
> > +++ mmotm-1117/mm/memcontrol.c
> > @@ -239,6 +239,7 @@ struct mem_cgroup {
> >  	unsigned int	swappiness;
> >  	/* OOM-Killer disable */
> >  	int		oom_kill_disable;
> > +	atomic_t	shrinking;
> >
> >  	/* set when res.limit == memsw.limit */
> >  	bool		memsw_is_minimum;
> > @@ -1814,6 +1815,25 @@ static int __cpuinit memcg_cpu_hotplug_c
> >  	return NOTIFY_OK;
> >  }
> >
> > +static DECLARE_WAIT_QUEUE_HEAD(memcg_shrink_waitq);
> > +
> > +bool mem_cgroup_shrinking(struct mem_cgroup *mem)
>
> I prefer is_mem_cgroup_shrinking
>

Hmm, ok.

> > +{
> > +	return atomic_read(&mem->shrinking) > 0;
> > +}
> > +
> > +void mem_cgroup_shrink_wait(struct mem_cgroup *mem)
> > +{
> > +	wait_queue_t wait;
> > +
> > +	init_wait(&wait);
> > +	prepare_to_wait(&memcg_shrink_waitq, &wait, TASK_INTERRUPTIBLE);
> > +	smp_rmb();
>
> Why the rmb?
>

My fault.

> > +	if (mem_cgroup_shrinking(mem))
> > +		schedule();
>
> We need to check for signals if we sleep with TASK_INTERRUPTIBLE, but
> that complicates the entire path as well. Maybe the question to ask
> is: why is this TASK_INTERRUPTIBLE, and what is the expected delay?
> Could this be a fairness issue as well?
>

The signal check is done in do_charge() automatically.

> > +	finish_wait(&memcg_shrink_waitq, &wait);
> > +}
> > +
> >
> >  /* See __mem_cgroup_try_charge() for details */
> >  enum {
> > @@ -1832,6 +1852,17 @@ static int __mem_cgroup_do_charge(struct
> >  	unsigned long flags = 0;
> >  	int ret;
> >
> > +	/*
> > +	 * If shrinking is set, the admin is now reducing the limit of
> > +	 * this memcg and reclaiming memory eagerly. A _new_ charge would
> > +	 * increase usage and prevent the system from setting the new
> > +	 * limit, so add a delay here to make reducing the size easier.
> > +	 */
> > +	if (unlikely(mem_cgroup_shrinking(mem)) && (gfp_mask & __GFP_WAIT)) {
> > +		mem_cgroup_shrink_wait(mem);
> > +		return CHARGE_RETRY;
> > +	}
> > +
>
> Oh! oh! I'd hate to do this in the fault path
>

Why? We have the per-cpu charge stock now, so the influence of this
check is minimal; we never hit it in practice. If it becomes a problem,
I'll use a per-cpu value, but that seems to be overkill.

Thanks,
-Kame
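==
The hunks quoted above show only the waiter side of the mechanism:
nothing in the quoted patch sets mem->shrinking or wakes the queue.
Below is a minimal sketch of the resize-side pairing the charge-path
check implies, assuming the atomic_inc()/atomic_dec() bracket lives in
mem_cgroup_resize_limit() and reuses the mmotm-era helpers
res_counter_set_limit() and mem_cgroup_hierarchical_reclaim(). This is
a hypothetical reconstruction, not part of the posted patch.

	/*
	 * Hypothetical sketch, not from the posted patch: bracket the
	 * resize path's reclaim loop with the "shrinking" counter so
	 * that concurrent __GFP_WAIT charges sleep in
	 * mem_cgroup_shrink_wait() until the new limit is in place.
	 */
	static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
					   unsigned long long val)
	{
		int retries = MEM_CGROUP_RECLAIM_RETRIES;
		int ret;

		/* Enter shrinking mode: new __GFP_WAIT charges sleep. */
		atomic_inc(&memcg->shrinking);

		ret = res_counter_set_limit(&memcg->res, val);
		while (ret == -EBUSY && retries--) {
			/* Push usage down under the requested limit. */
			mem_cgroup_hierarchical_reclaim(memcg, NULL,
					GFP_KERNEL,
					MEM_CGROUP_RECLAIM_SHRINK);
			ret = res_counter_set_limit(&memcg->res, val);
		}

		/* Leave shrinking mode; release every sleeping charger. */
		atomic_dec(&memcg->shrinking);
		wake_up_all(&memcg_shrink_waitq);

		return ret;
	}

Note the ordering: mem_cgroup_shrink_wait() re-checks the counter after
prepare_to_wait(), so a charger either sees shrinking == 0 and skips the
sleep, or is already on the queue before wake_up_all() and gets released.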
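==
The diffstat also lists two hunks that are not quoted above: 6 lines in
include/linux/memcontrol.h and 2 lines in mm/vmscan.c, the latter being
point (b), ignoring the page reference bit while shrinking. A guess at
their shape, hedged: the header hunk would declare the two new helpers
(the real 6-line hunk presumably also carries stubs for the
!CONFIG_CGROUP_MEM_RES_CTLR case), and the vmscan hunk would
short-circuit page_check_references() for a shrinking memcg. Hunk
positions and context lines here are assumptions, not the posted code.

	Index: mmotm-1117/include/linux/memcontrol.h
	===================================================================
	--- mmotm-1117.orig/include/linux/memcontrol.h
	+++ mmotm-1117/include/linux/memcontrol.h
	@@ ... @@
	+/* limit-shrinking mode, see mm/memcontrol.c */
	+bool mem_cgroup_shrinking(struct mem_cgroup *mem);
	+void mem_cgroup_shrink_wait(struct mem_cgroup *mem);

	Index: mmotm-1117/mm/vmscan.c
	===================================================================
	--- mmotm-1117.orig/mm/vmscan.c
	+++ mmotm-1117/mm/vmscan.c
	@@ ... @@ static enum page_references page_check_references(
	 	referenced_ptes = page_referenced(page, 1, sc->mem_cgroup,
	 					  &vm_flags);
	+	if (sc->mem_cgroup && mem_cgroup_shrinking(sc->mem_cgroup))
	+		return PAGEREF_RECLAIM;

With something like this, reclaim treats every page in the shrinking
memcg as cold instead of rotating recently referenced pages back onto
the LRU lists, which is what lets an aggressive limit reduction converge.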