Date: Mon, 23 Mar 2009 09:42:53 +0530
From: Balbir Singh
Reply-To: balbir@linux.vnet.ibm.com
Subject: Re: [PATCH 5/5] Memory controller soft limit reclaim on contention (v7)
Message-ID: <20090323041253.GH24227@balbir.in.ibm.com>
References: <20090319165713.27274.94129.sendpatchset@localhost.localdomain>
 <20090319165752.27274.36030.sendpatchset@localhost.localdomain>
 <20090320130630.8b9ac3c7.kamezawa.hiroyu@jp.fujitsu.com>
 <20090322142748.GC24227@balbir.in.ibm.com>
 <20090323090205.49fc95d0.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090323090205.49fc95d0.kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, YAMAMOTO Takashi, lizf@cn.fujitsu.com,
 KOSAKI Motohiro, Rik van Riel, Andrew Morton

* KAMEZAWA Hiroyuki [2009-03-23 09:02:05]:

> On Sun, 22 Mar 2009 19:57:48 +0530
> Balbir Singh wrote:
>
> > * KAMEZAWA Hiroyuki [2009-03-20 13:06:30]:
> >
> > > On Thu, 19 Mar 2009 22:27:52 +0530
> > > Balbir Singh wrote:
> > >
> > > > Feature: Implement reclaim from groups over their soft limit
> > > >
> > > > From: Balbir Singh
> > > >
> > > > Changelog v7...v6
> > > > 1. Refactored out reclaim_options patch into a separate patch
> > > > 2. Added additional checks for the all-swap-off condition in
> > > >    mem_cgroup_hierarchical_reclaim()
> > > >
> > > > -	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
> > > > +	/*
> > > > +	 * Try to free up some pages from the memory controller's soft
> > > > +	 * limit queue.
> > > > +	 */
> > > > +	did_some_progress = mem_cgroup_soft_limit_reclaim(zonelist, gfp_mask);
> > > > +	if (order || !did_some_progress)
> > > > +		did_some_progress += try_to_free_pages(zonelist, order,
> > > > +							gfp_mask);
> > >
> > > Anyway, my biggest concern is here, always.
> > >
> > > By this,
> > > if (order > 1), try_to_free_pages() is called twice.
> >
> > try_to_free_mem_cgroup_pages() and try_to_free_pages() are called.
> >
> > > Hmm...how about
> > >
> > > if (!pages_reclaimed && !(gfp_mask & __GFP_NORETRY)) { # this is the first loop or noretry
> > > 	did_some_progress = mem_cgroup_soft_limit_reclaim(zonelist, gfp_mask);
> >
> > OK, I see what you mean.. but mem_cgroup_soft_limit_reclaim() is
> > really a low-overhead call, which bails out very quickly if nothing
> > is over its soft limit.
>
> My point is the "something is over its soft limit" case. Memory is
> reclaimed twice. My code above tries to avoid calling memory reclaim
> twice.

We reclaim twice only when order > 0, when soft limit reclaim fails, or
when there is nothing to soft limit reclaim.
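To spell those cases out, the flow in the patch is roughly the sketch
below; this is only an annotated restatement of the hunk quoted above,
not new code:

	/*
	 * Soft limit reclaim always runs first and bails out quickly
	 * when no group is over its soft limit.
	 */
	did_some_progress = mem_cgroup_soft_limit_reclaim(zonelist, gfp_mask);

	/*
	 * Global reclaim runs as well when the allocation is higher
	 * order or when soft limit reclaim made no progress.
	 */
	if (order || !did_some_progress)
		did_some_progress += try_to_free_pages(zonelist, order,
						       gfp_mask);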
> Even if order > 0, mem_cgroup_try_to_free_pages() may be able to
> recover the situation. Maybe it's better to allow lumpy-reclaim even
> when !scanning_global_lru().

If order > 0, we let the global reclaim path handle it (scan the global
LRU). I think the chance of success is higher through that path. Having
said that, I have not experimented with allowing lumpy reclaim from the
memory cgroup LRUs; I think that should be a separate effort from this
one.

> > Even if we retry, we do a simple check for soft-limit-reclaim; if
> > there is really something to be reclaimed, we reclaim from there
> > first.
>
> That means you reclaim memory twice ;)
> AFAIK,
> - fork() -> task_struct/stack
> - page tables in x86 PAE mode
> require order-1 pages very frequently, and this "call twice" approach
> will kill application performance very effectively.

Yes, it would if this were the only way to allocate pages. But look at
reality: with kswapd running in the background, how frequently do you
expect to hit the direct reclaim path? Could you clarify what you mean
by order-1 (2^1)? If so, soft limit reclaim is not invoked and it
should not hurt performance. What am I missing?

> > > if (!did_some_progress)
> > > 	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
> > > } else
> > > 	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
> > >
> > > maybe a bit more conservative.
> > >
> > > And I wonder whether "nodemask" should be checked or not..
> > > softlimit reclaim doesn't seem to work well with nodemask...
> >
> > Doesn't the zonelist take care of the nodemask?
> >
> Not sure, but I think there is no check. Hmm, a BUG in vmscan.c?
>

The zonelist is built using policy_zonelist(), which handles the
nodemask as well. That should keep the zonelist and nodemask in sync..
no?

-- 
	Balbir
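P.S. Just to confirm that I am reading your suggestion correctly:
putting together the two fragments quoted above, the flow you propose
would look roughly like the sketch below. This is only my reading of
it; it assumes pages_reclaimed is the retry counter the allocator's
slow path already maintains:

	/* pages_reclaimed: assumed to be the existing retry counter */
	if (!pages_reclaimed && !(gfp_mask & __GFP_NORETRY)) {
		/* this is the first loop or noretry */
		did_some_progress = mem_cgroup_soft_limit_reclaim(zonelist,
								  gfp_mask);
		if (!did_some_progress)
			did_some_progress = try_to_free_pages(zonelist, order,
							      gfp_mask);
	} else
		did_some_progress = try_to_free_pages(zonelist, order,
						      gfp_mask);

Is that the structure you have in mind?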