From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19])
	by kanga.kvack.org (Postfix) with ESMTP id 1A4726B0088
	for ; Tue, 3 Mar 2009 06:17:24 -0500 (EST)
Received: from d23relay01.au.ibm.com (d23relay01.au.ibm.com [202.81.31.243])
	by e23smtp04.au.ibm.com (8.13.1/8.13.1) with ESMTP id n23BFSSL015370
	for ; Tue, 3 Mar 2009 22:15:28 +1100
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139])
	by d23relay01.au.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n23BHcXx434280
	for ; Tue, 3 Mar 2009 22:17:38 +1100
Received: from d23av04.au.ibm.com (loopback [127.0.0.1])
	by d23av04.au.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n23BHJUP010983
	for ; Tue, 3 Mar 2009 22:17:20 +1100
Date: Tue, 3 Mar 2009 16:47:13 +0530
From: Balbir Singh
Subject: Re: [PATCH 4/4] Memory controller soft limit reclaim on contention (v3)
Message-ID: <20090303111713.GQ11421@balbir.in.ibm.com>
Reply-To: balbir@linux.vnet.ibm.com
References: <20090302120052.6FEC.A69D9226@jp.fujitsu.com>
	<20090302044406.GD11421@balbir.in.ibm.com>
	<20090303095833.D9FC.A69D9226@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <20090303095833.D9FC.A69D9226@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org
To: KOSAKI Motohiro
Cc: linux-mm@kvack.org, Sudhir Kumar, YAMAMOTO Takashi, Bharata B Rao,
	Paul Menage, lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	David Rientjes, Pavel Emelianov, Dhaval Giani, Rik van Riel,
	Andrew Morton, KAMEZAWA Hiroyuki
List-ID:

* KOSAKI Motohiro [2009-03-03 11:43:49]:

> > * KOSAKI Motohiro [2009-03-02 12:08:01]:
> >
> > > Hi Balbir,
> > >
> > > > @@ -2015,9 +2016,12 @@ static int kswapd(void *p)
> > > >  		finish_wait(&pgdat->kswapd_wait, &wait);
> > > >
> > > >  		if (!try_to_freeze()) {
> > > > +			struct zonelist *zl = pgdat->node_zonelists;
> > > >  			/* We can speed up thawing tasks if we don't call
> > > >  			 * balance_pgdat after returning from the refrigerator
> > > >  			 */
> > > > +			if (!order)
> > > > +				mem_cgroup_soft_limit_reclaim(zl, GFP_KERNEL);
> > > >  			balance_pgdat(pgdat, order);
> > > >  		}
> > > >  	}
> > >
> > > kswapd's role is to increase free pages up to zone->pages_high in its
> > > "own node". mem_cgroup_soft_limit_reclaim() frees one (or more) excess
> > > pages in any node.
> > >
> > > Oh, well.
> > > I think that is not consistent.
> > >
> > > I would be glad if mem_cgroup_soft_limit_reclaim() were aware of the
> > > target node and its pages_high.
> >
> > Yes, correct: the role of kswapd is to keep increasing free pages until
> > zone->pages_high, and the first set of pages to consider is those of
> > memory cgroups over their soft limits. We pass the zonelist to ensure
> > that while doing soft reclaim, we focus on the zonelist associated with
> > the node. Kamezawa had concerns over calling the soft limit reclaim from
> > __alloc_pages_internal(); do you prefer that call path?
>
> I read your patch again.
> So it seems better to call mem_cgroup_soft_limit_reclaim() from
> balance_pgdat().
>
> Please imagine the worst scenario.
> CPU0 (kswapd) keeps on shrinking.
> CPU1 runs another activity and charges the memcg continuously.
> In that case balance_pgdat() does not exit for a very long time, so
> mem_cgroup_soft_limit_reclaim() is never called.
>

Yes, true... that is why I added the hooks in __alloc_pages_internal()
in the first two revisions, but Kamezawa objected to them. In the
scenario you mention, where balance_pgdat() is busy, if we are under
global system memory pressure then even after freeing memory from
soft-limited cgroups we don't have sufficient free memory. We need to
reclaim from the whole system. An administrator can easily avoid the
above scenario by using hard limits on the cgroup running on CPU1.

> Ideally, if another CPU takes another charge, kswapd should shrink
> the soft limit again.
>

Could you please elaborate further?

>
> btw, I don't like the "if (!order)" condition. The memcg soft limit
> should always be shrunk, whatever the order, because the
> wakeup_kswapd() argument is merely a hint.
>
> Another process may want another order.
>

Agreed, I'll remove the check.

>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>

-- 
	Balbir
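
As a rough illustration of the point agreed above (dropping the "if (!order)" check), here is a small user-space sketch in C. It is not the kernel code and not the next revision of the patch: the *_stub functions and the two counters are hypothetical stand-ins for mem_cgroup_soft_limit_reclaim() and balance_pgdat(), assumed only so that the two variants of the kswapd loop body can be compared side by side.

```c
/*
 * User-space model only. The stubs below are hypothetical stand-ins for
 * the kernel functions discussed in this thread; they just count calls.
 */
static int soft_limit_calls;	/* times soft limit reclaim was invoked */
static int balance_calls;	/* times balance_pgdat was invoked */

static void soft_limit_reclaim_stub(void)
{
	soft_limit_calls++;
}

static void balance_pgdat_stub(int order)
{
	(void)order;		/* the model does not use the order */
	balance_calls++;
}

/* Loop body as posted in v3: soft limit reclaim only when order == 0. */
static void kswapd_pass_v3(int order)
{
	if (!order)
		soft_limit_reclaim_stub();
	balance_pgdat_stub(order);
}

/* Loop body with the check removed: soft limit reclaim on every pass,
 * since the order handed to wakeup_kswapd() is only a hint. */
static void kswapd_pass_no_check(int order)
{
	soft_limit_reclaim_stub();
	balance_pgdat_stub(order);
}
```

In this model, a wakeup with a non-zero order skips soft limit reclaim in the v3 variant but not in the variant without the check, which is the behaviour difference raised in the review. It does not address the other concern, that a long-running balance_pgdat() delays the next soft limit pass.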