Date: Mon, 6 Dec 2010 18:29:39 -0800
Subject: Re: [RFC][PATCH 0/4] memcg: per cgroup background reclaim
From: Ying Han
To: balbir@linux.vnet.ibm.com
Cc: KOSAKI Motohiro, Daisuke Nishimura, KAMEZAWA Hiroyuki, Andrew Morton,
    Mel Gorman, Johannes Weiner, Christoph Lameter, Wu Fengguang, Andi Kleen,
    Hugh Dickins, Rik van Riel, Tejun Heo, linux-mm@kvack.org
In-Reply-To: <20101202144132.GR2746@balbir.in.ibm.com>

On Thu, Dec 2, 2010 at 6:41 AM, Balbir Singh wrote:
> * Ying Han [2010-11-29 23:03:31]:
>
>> On Mon, Nov 29, 2010 at 10:54 PM, KOSAKI Motohiro wrote:
>> >> The current implementation of memcg only supports direct reclaim, and this
>> >> patchset adds support for background reclaim. Per-cgroup background reclaim
>> >> is needed because it spreads the memory pressure over a longer period of
>> >> time and smooths out system performance.
>> >>
>> >> The current implementation is not a stable version, and it crashes sometimes
>> >> on my NUMA machine. Before going further with debugging, I would like to
>> >> start the discussion and hear feedback on the initial design.
>> >
>> > I haven't read your code at all. However, I agree with your claim that memcg
>> > also needs background reclaim.
>>
>> Thanks for your comment.
>>
>> > So if you post a high-level design memo, I'm happy.
>>
>> My high-level design is spread out across the individual patches, so
>> here is the consolidated version. It is essentially the commit messages
>> of the following patches glued together.
>>
>> "
>> The current implementation of memcg only supports direct reclaim, and this
>> patchset adds support for background reclaim. Per-cgroup background reclaim
>> is needed because it spreads the memory pressure over a longer period of
>> time and smooths out system performance.
>>
>> There is a kswapd kernel thread for each memory node. We add a separate kswapd
>> for each cgroup. The kswapd sleeps on the wait queue headed at the kswapd_wait
>> field of a kswapd descriptor. The kswapd descriptor stores information about
>> the node or cgroup, and it allows the global and per-cgroup background reclaim
>> to share common reclaim algorithms. The per-cgroup kswapd is invoked from
>> mem_cgroup_charge when the cgroup's memory usage rises above a
>> threshold--low_wmark. The kswapd thread then starts to reclaim pages in a
>> priority loop similar to the global algorithm. The kswapd is done once the
>> usage falls below a threshold--high_wmark.
>>
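To make the wmark part of this a bit more concrete, here is a tiny user-space
model of the flow described above. The names (limit_in_bytes, low_wmark,
high_wmark, a min_free tunable modeled after memory.min_free_kbytes,
mem_cgroup_charge) follow the design notes; the wmark formula and the
synchronous reclaim call are illustrative assumptions, not the actual patch:

/*
 * Toy user-space model of the per-cgroup watermark flow; NOT the real
 * kernel code.  The wmark formula below is an illustrative assumption.
 */
#include <stdio.h>

struct memcg {
	unsigned long limit_in_bytes;	/* hard limit of the cgroup */
	unsigned long usage_in_bytes;	/* bytes currently charged */
	unsigned long min_free;		/* headroom tunable, cf. memory.min_free_kbytes */
	unsigned long low_wmark;	/* usage above this wakes the per-cgroup kswapd */
	unsigned long high_wmark;	/* kswapd is done once usage drops below this */
};

/* Re-run whenever limit_in_bytes (or the min_free tunable) changes. */
static void setup_per_memcg_wmarks(struct memcg *mem)
{
	/*
	 * The wake threshold (low_wmark) sits closer to the limit than the
	 * stop threshold (high_wmark), so reclaim builds some headroom
	 * before kswapd goes back to sleep.
	 */
	mem->low_wmark  = mem->limit_in_bytes - mem->min_free;
	mem->high_wmark = mem->limit_in_bytes - 2 * mem->min_free;
}

/* Background reclaim: run until usage_in_bytes is back under high_wmark. */
static void memcg_kswapd(struct memcg *mem)
{
	while (mem->usage_in_bytes >= mem->high_wmark)
		mem->usage_in_bytes -= 1UL << 20;	/* pretend 1MB got reclaimed */
	printf("kswapd done, usage=%luMB\n", mem->usage_in_bytes >> 20);
}

/* Charge path: the wakeup check piggybacks on the charge. */
static void charge(struct memcg *mem, unsigned long bytes)
{
	mem->usage_in_bytes += bytes;
	if (mem->usage_in_bytes > mem->low_wmark) {
		printf("usage=%luMB above low_wmark=%luMB, waking kswapd\n",
		       mem->usage_in_bytes >> 20, mem->low_wmark >> 20);
		memcg_kswapd(mem);	/* in the kernel this is an asynchronous wakeup */
	}
}

int main(void)
{
	struct memcg mem = {
		.limit_in_bytes = 256UL << 20,	/* 256MB hard limit */
		.min_free       = 16UL << 20,	/* 16MB of headroom */
	};

	setup_per_memcg_wmarks(&mem);
	for (int i = 0; i < 300; i++)
		charge(&mem, 1UL << 20);	/* charge 1MB at a time */
	return 0;
}

The only point of the model is the two checks: the charge path compares the
usage against low_wmark to decide whether to wake the per-cgroup kswapd, and
the kswapd loop compares against high_wmark to decide when it is done.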
>
> So the logic is per-node/per-zone/per-cgroup right?

Thanks, Balbir, for your comments. The kswapd thread is per-cgroup, and the
scanning is per-node and per-zone. The watermarks are calculated based on the
per-cgroup limit_in_bytes, and kswapd is done whenever usage_in_bytes is back
under the watermarks.

>
>> The per-cgroup background reclaim is based on the per-cgroup LRU and also adds
>> per-cgroup watermarks. There are two watermarks, "low_wmark" and "high_wmark",
>> and they are calculated based on the limit_in_bytes (hard_limit) for each
>> cgroup. Each time the hard_limit is changed, the corresponding wmarks are
>> re-calculated. Since the memory controller charges only user pages, there is
>
> What about memsw limits, do they impact anything, I presume not.
>
>> no need for a "min_wmark". The current calculation of the wmarks is a function
>> of "memory.min_free_kbytes", which can be adjusted by writing different values
>> into the new API. This is added mainly for debugging purposes.
>
> When you say debugging, can you elaborate?

I am not sure whether we would like to keep memory.min_free_kbytes, which is
used to adjust the calculation of the per-cgroup wmarks, in the final version.
For now, I am adding it for performance-testing purposes.

>
>>
>> The kswapd() function is now shared between the global and per-cgroup kswapd
>> threads. It is passed the kswapd descriptor, which contains the information
>> about either the node or the cgroup. The new function balance_mem_cgroup_pgdat
>> is then invoked if it is a per-cgroup kswapd thread. balance_mem_cgroup_pgdat
>> performs a priority loop similar to global reclaim. In each iteration it
>> invokes balance_pgdat_node for all nodes on the system, a new function that
>> performs background reclaim per node. After reclaiming each node, it checks
>> mem_cgroup_watermark_ok() and breaks the priority loop if it returns true.
>> A per-memcg zone is marked as "unreclaimable" if the scanning rate is much
>> greater than the reclaiming rate on the per-cgroup LRU. The bit is cleared
>> when a page charged to the cgroup is freed. Kswapd breaks the priority loop
>> if all the zones are marked as "unreclaimable".
>> "
>>
>> Also, I am happy to add more descriptions if anything is not clear :)

Sure. :)

--Ying

>>
>
> Thanks for explaining this in detail, it makes the review easier.
>
> --
>        Three Cheers,
>        Balbir
>
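P.S. For anyone skimming the thread, below is a compact user-space mock of the
priority loop described in the design notes above. The function names
(balance_mem_cgroup_pgdat, balance_pgdat_node, mem_cgroup_watermark_ok) and
DEF_PRIORITY follow the description and the existing global-reclaim
convention; the bodies are simplified stand-ins, not the actual patch:

/*
 * Self-contained mock of the balance_mem_cgroup_pgdat() priority loop
 * described above.  Node/zone handling and the "unreclaimable" heuristic
 * are simplified stand-ins, not the real kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define DEF_PRIORITY	12	/* same starting priority global kswapd uses */
#define NR_NODES	2
#define NR_ZONES	3

struct memcg {
	unsigned long usage;
	unsigned long high_wmark;
	/* Set when scanned >> reclaimed; cleared again when a page charged
	 * to this cgroup is freed. */
	bool zone_unreclaimable[NR_NODES][NR_ZONES];
};

static bool mem_cgroup_watermark_ok(struct memcg *mem)
{
	return mem->usage < mem->high_wmark;
}

/* Stand-in for balance_pgdat_node(): scan this node's per-memcg LRUs. */
static void balance_pgdat_node(struct memcg *mem, int nid, int priority)
{
	for (int zid = 0; zid < NR_ZONES; zid++) {
		if (mem->zone_unreclaimable[nid][zid])
			continue;
		/*
		 * The real code shrinks this zone's per-memcg LRU, scanning
		 * more aggressively as the priority drops, and marks the zone
		 * unreclaimable when pages scanned vastly exceed pages
		 * reclaimed.  Here we just fake some progress.
		 */
		unsigned long nr = (unsigned long)(DEF_PRIORITY - priority) + 1;
		mem->usage = mem->usage > nr ? mem->usage - nr : 0;
	}
}

static bool all_zones_unreclaimable(struct memcg *mem)
{
	for (int nid = 0; nid < NR_NODES; nid++)
		for (int zid = 0; zid < NR_ZONES; zid++)
			if (!mem->zone_unreclaimable[nid][zid])
				return false;
	return true;
}

/* Priority loop, mirroring the description of balance_mem_cgroup_pgdat(). */
static void balance_mem_cgroup_pgdat(struct memcg *mem)
{
	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		for (int nid = 0; nid < NR_NODES; nid++) {
			balance_pgdat_node(mem, nid, priority);
			if (mem_cgroup_watermark_ok(mem))
				return;		/* back under high_wmark, done */
		}
		if (all_zones_unreclaimable(mem))
			return;			/* no forward progress, give up */
	}
}

int main(void)
{
	struct memcg mem = { .usage = 40, .high_wmark = 20 };

	balance_mem_cgroup_pgdat(&mem);
	printf("usage after background reclaim: %lu\n", mem.usage);
	return 0;
}

The two exit conditions mirror the description: the loop returns as soon as the
usage is back under high_wmark, or gives up once every per-memcg zone has been
marked unreclaimable.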