Date: Tue, 10 Mar 2009 14:04:16 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [BUGFIX][PATCH] memcg: charge swapcache to proper memcg
Message-Id: <20090310140416.ecf5ba18.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090310134757.38e37444.nishimura@mxp.nes.nec.co.jp>
References: <20090310100707.e0640b0b.nishimura@mxp.nes.nec.co.jp> <20090310133316.b56d3319.kamezawa.hiroyu@jp.fujitsu.com> <20090310134757.38e37444.nishimura@mxp.nes.nec.co.jp>
To: Daisuke Nishimura
Cc: Andrew Morton, linux-mm, Balbir Singh, Li Zefan, Hugh Dickins

On Tue, 10 Mar 2009 13:47:57 +0900
Daisuke Nishimura wrote:

> On Tue, 10 Mar 2009 13:33:16 +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 10 Mar 2009 10:07:07 +0900
> > Daisuke Nishimura wrote:
> >
> > > From: Daisuke Nishimura
> > >
> > > memcg_test.txt says at 4.1:
> > >
> > > 	This swap-in is one of the most complicated work. In do_swap_page(),
> > > 	following events occur when pte is unchanged.
> > >
> > > 	(1) the page (SwapCache) is looked up.
> > > 	(2) lock_page()
> > > 	(3) try_charge_swapin()
> > > 	(4) reuse_swap_page() (may call delete_swap_cache())
> > > 	(5) commit_charge_swapin()
> > > 	(6) swap_free().
> > >
> > > 	Considering following situation for example.
> > >
> > > 	(A) The page has not been charged before (2) and reuse_swap_page()
> > > 	    doesn't call delete_from_swap_cache().
> > > 	(B) The page has not been charged before (2) and reuse_swap_page()
> > > 	    calls delete_from_swap_cache().
> > > 	(C) The page has been charged before (2) and reuse_swap_page() doesn't
> > > 	    call delete_from_swap_cache().
> > > 	(D) The page has been charged before (2) and reuse_swap_page() calls
> > > 	    delete_from_swap_cache().
> > >
> > > 	memory.usage/memsw.usage changes to this page/swp_entry will be
> > > 	 Case          (A)      (B)      (C)      (D)
> > > 	 Event
> > > 	 Before (2)    0/ 1     0/ 1     1/ 1     1/ 1
> > > 	 ===========================================
> > > 	 (3)          +1/+1    +1/+1    +1/+1    +1/+1
> > > 	 (4)            -       0/ 0      -      -1/ 0
> > > 	 (5)           0/-1     0/ 0    -1/-1     0/ 0
> > > 	 (6)            -       0/-1      -       0/-1
> > > 	 ===========================================
> > > 	 Result        1/ 1     1/ 1     1/ 1     1/ 1
> > >
> > > 	In any cases, charges to this page should be 1/ 1.
> > >
> > > In case of (D), mem_cgroup_try_get_from_swapcache() returns NULL
> > > (because lookup_swap_cgroup() returns NULL), so "+1/+1" at (3) means
> > > charges to the memcg("foo") to which the "current" belongs.
> > > OTOH, "-1/0" at (4) and "0/-1" at (6) means uncharges from the memcg("baa")
> > > to which the page has been charged.
> > >
> > > So, if the "foo" and "baa" is different(for example because of task move),
> > > this charge will be moved from "baa" to "foo".
> > >
> > > I think this is an unexpected behavior.
> > >
> > > This patch fixes this by modifying mem_cgroup_try_get_from_swapcache()
> > > to return the memcg to which the swapcache has been charged if PCG_USED bit
> > > is set.
> > > IIUC, checking PCG_USED bit of swapcache is safe under page lock.
> > >
> > >
> > > Signed-off-by: Daisuke Nishimura
> > > ---
> > >  mm/memcontrol.c |   15 +++++++++++++--
> > >  1 files changed, 13 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 73c51c8..f2efbc0 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -909,13 +909,24 @@ nomem:
> > >  static struct mem_cgroup *try_get_mem_cgroup_from_swapcache(struct page *page)
> > >  {
> > >  	struct mem_cgroup *mem;
> > > +	struct page_cgroup *pc;
> > >  	swp_entry_t ent;
> > >
> > > +	VM_BUG_ON(!PageLocked(page));
> > > +
> > >  	if (!PageSwapCache(page))
> > >  		return NULL;
> > >
> > > -	ent.val = page_private(page);
> > > -	mem = lookup_swap_cgroup(ent);
> > > +	pc = lookup_page_cgroup(page);
> > > +	/*
> > > +	 * Used bit of swapcache is solid under page lock.
> > > +	 */
> > > +	if (PageCgroupUsed(pc))
> > > +		mem = pc->mem_cgroup;
> >
> > I've already acked but how about returning NULL here ?
> >
> Returning NULL here means try_charge_swapin charges "current" memcg("foo"
> in the patch description above).
> So, it doesn't change current behavior at all.
>
OK, maybe try_charge_swapin() should check the Used bit and set ptr=NULL
before reaching here.

Can't we move this
+	pc = lookup_page_cgroup(page);
+	if (PageCgroupUsed(pc))
check to try_charge_swapin()?
(I think this is safe because the page is locked.)
That way, we can avoid extra work in commit_charge().

-Kame

> Thanks,
> Daisuke Nishimura.
>
> > Thanks
> > -Kame
> >
> > > +	else {
> > > +		ent.val = page_private(page);
> > > +		mem = lookup_swap_cgroup(ent);
> > > +	}
> > >  	if (!mem)
> > >  		return NULL;
> > >  	if (!css_tryget(&mem->css))
> > >
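
For readers following the accounting argument, case (D) from the quoted description can be replayed in a small user-space model. This is only a sketch, not kernel code: struct memcg_model, adjust(), replay_case_d() and print_usage() are hypothetical names invented for illustration, and the per-step deltas are copied from the (D) column of the table, assuming they stay the same whichever memcg is charged at step (3).

```c
#include <assert.h>
#include <stdio.h>

/* Toy counters for ONE page/swp_entry: memory.usage and memsw.usage. */
struct memcg_model {
	const char *name;
	int mem;	/* contribution to memory.usage */
	int memsw;	/* contribution to memsw.usage */
};

/* Apply one "mem/memsw" delta from the table to a model memcg. */
void adjust(struct memcg_model *cg, int dmem, int dmemsw)
{
	cg->mem += dmem;
	cg->memsw += dmemsw;
}

/*
 * Replay case (D): the page is already charged to "baa" before (2) and
 * reuse_swap_page() calls delete_from_swap_cache().
 *
 * charge_owner == 0: step (3) charges "foo", the memcg of "current"
 *                    (the behaviour the patch description objects to).
 * charge_owner != 0: step (3) charges "baa", the memcg that already owns
 *                    the page (what the patch aims for).
 */
void replay_case_d(struct memcg_model *foo, struct memcg_model *baa,
		   int charge_owner)
{
	struct memcg_model *target = charge_owner ? baa : foo;

	/* Precondition of case (D): "Before (2)" is 1/1 on the owner. */
	assert(baa->mem == 1 && baa->memsw == 1);

	adjust(target, +1, +1);	/* (3) try_charge_swapin()            */
	adjust(baa, -1, 0);	/* (4) delete_from_swap_cache() path  */
				/* (5) commit_charge_swapin(): 0/0    */
	adjust(baa, 0, -1);	/* (6) swap_free()                    */
}

/* Print one memcg in the table's "mem/memsw" notation. */
void print_usage(const struct memcg_model *cg)
{
	printf("%s: %d/%d\n", cg->name, cg->mem, cg->memsw);
}
```

Starting from foo = 0/0 and baa = 1/1, calling replay_case_d() with charge_owner == 0 and printing both memcgs shows the 1/1 charge ending up on "foo", which is the unexpected move described above. With charge_owner != 0 the charge stays on "baa".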