linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	Greg Thelen <gthelen@google.com>,
	hannes@cmpxchg.org, aarcange@redhat.com,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: Re: [PATCH 4/4] [BUGFIX] fix account leak at force_empty, rmdir with THP
Date: Mon, 17 Jan 2011 09:25:29 +0900	[thread overview]
Message-ID: <20110117092529.0708bc97.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20110117091533.7fe2d819.nishimura@mxp.nes.nec.co.jp>

On Mon, 17 Jan 2011 09:15:33 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> Hi, thank you for your great works!
> 
> I've not read this series in detail, but one quick comment for move_parent.
> 
> > @@ -2245,6 +2253,7 @@ static int mem_cgroup_move_parent(struct
> >  	struct cgroup *cg = child->css.cgroup;
> >  	struct cgroup *pcg = cg->parent;
> >  	struct mem_cgroup *parent;
> > +	int charge_size = PAGE_SIZE;
> >  	int ret;
> >  
> >  	/* Is ROOT ? */
> > @@ -2256,16 +2265,19 @@ static int mem_cgroup_move_parent(struct
> >  		goto out;
> >  	if (isolate_lru_page(page))
> >  		goto put;
> > +	/* The page is isolated from LRU and we have no race with splitting */
> > +	if (PageTransHuge(page))
> > +		charge_size = PAGE_SIZE << compound_order(page);
> >  
> >  	parent = mem_cgroup_from_cont(pcg);
> >  	ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false,
> > -				      PAGE_SIZE);
> > +				      charge_size);
> >  	if (ret || !parent)
> >  		goto put_back;
> >  
> > -	ret = mem_cgroup_move_account(pc, child, parent, true);
> > +	ret = mem_cgroup_move_account(pc, child, parent, true, charge_size);
> >  	if (ret)
> > -		mem_cgroup_cancel_charge(parent, PAGE_SIZE);
> > +		mem_cgroup_cancel_charge(parent, charge_size);
> >  put_back:
> >  	putback_lru_page(page);
> >  put:
> I think there is possibility that the page is split after "if (PageTransHuge(page))".
> 
> In RHEL6, this part looks like:
> 
>    1598         if (PageTransHuge(page))
>    1599                 page_size = PAGE_SIZE << compound_order(page);
>    1600
>    1601         ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, page,
>    1602                                       page_size);
>    1603         if (ret || !parent)
>    1604                 return ret;
>    1605
>    1606         if (!get_page_unless_zero(page)) {
>    1607                 ret = -EBUSY;
>    1608                 goto uncharge;
>    1609         }
>    1610
>    1611         ret = isolate_lru_page(page);
>    1612
>    1613         if (ret)
>    1614                 goto cancel;
>    1615
>    1616         compound_lock_irqsave(page, &flags);
>    1617         ret = mem_cgroup_move_account(pc, child, parent, page_size);
>    1618         compound_unlock_irqrestore(page, flags);
>    1619
> 
> In fact, I found a bug of res_counter underflow around here, and I've already send
> a patch to RedHat.
> 

Okay, I'll take care of that in the next version.

Thanks,
-Kame


> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> In mem_cgroup_move_parent(), the page can be split by other context after we
> check PageTransHuge() and before hold the compound_lock of the page later.
> 
> This means a race can happen like:
> 
> 	__split_huge_page_refcount()		mem_cgroup_move_parent()
>     ---------------------------------------------------------------------------
> 						if (PageTransHuge())
> 						-> true
> 						-> set "page_size" to huge page
> 						   size.
> 						__mem_cgroup_try_charge()
> 						-> charge "page_size" to the
> 						   parent.
> 	compound_lock()
> 	mem_cgroup_split_hugepage_commit()
> 	-> commit all the tail pages to the
> 	   "current"(i.e. child) cgroup.
> 	   iow, pc->mem_cgroup of tail pages
> 	   point to the child.
> 	ClearPageCompound()
> 	compound_unlock()
> 						compound_lock()
> 						mem_cgroup_move_account()
> 						-> make pc->mem_cgroup of the
> 						   head page point to the parent.
> 						-> uncharge "page_size" from
> 						   the child.
> 						compound_unlock()
> 
> This can causes at least 2 problems.
> 
> 1. Tail pages are linked to LRU of the child, even though usages(res_counter) of
>    them have been already uncharged from the chilid. This causes res_counter
>    underflow at removing the child directory.
> 2. Usage of the parent is increased by the huge page size at moving charge of
>    the head page, but usage will be decreased only by the normal page size when
>    the head page is uncharged later because it is not PageTransHuge() anymore.
>    This means the parent doesn't have enough pages on its LRU to decrease the
>    usage to 0 and it cannot be rmdir'ed.
> 
> This patch fixes this problem by re-checking PageTransHuge() again under the
> compound_lock.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> diff -uprN linux-2.6.32.x86_64.org/mm/memcontrol.c linux-2.6.32.x86_64/mm/memcontrol.c
> --- linux-2.6.32.x86_64.org/mm/memcontrol.c	2010-07-15 16:44:57.000000000 +0900
> +++ linux-2.6.32.x86_64/mm/memcontrol.c	2010-07-15 17:34:12.000000000 +0900
> @@ -1608,6 +1608,17 @@ static int mem_cgroup_move_parent(struct
>  		goto cancel;
>  
>  	compound_lock_irqsave(page, &flags);
> +	/* re-check under compound_lock because the page might be split */
> +	if (unlikely(page_size != PAGE_SIZE && !PageTransHuge(page))) {
> +		unsigned long extra = page_size - PAGE_SIZE;
> +		/* uncharge extra charges from parent */
> +		if (!mem_cgroup_is_root(parent)) {
> +			res_counter_uncharge(&parent->res, extra);
> +			if (do_swap_account)
> +				res_counter_uncharge(&parent->memsw, extra);
> +		}
> +		page_size = PAGE_SIZE;
> +	}
>  	ret = mem_cgroup_move_account(pc, child, parent, page_size);
>  	compound_unlock_irqrestore(page, flags);
>  
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      reply	other threads:[~2011-01-17  0:31 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-14 10:04 [PATCH 0/4] [BUGFIX] thp vs memcg fix KAMEZAWA Hiroyuki
2011-01-14 10:06 ` [PATCH 1/4] [BUGFIX] enhance charge_statistics function for fixising issues KAMEZAWA Hiroyuki
2011-01-14 11:51   ` Johannes Weiner
2011-01-14 12:07     ` Hiroyuki Kamezawa
2011-01-14 10:09 ` [PATCH 2/4] [BUGFIX] dont set USED bit on tail pages KAMEZAWA Hiroyuki
2011-01-14 12:09   ` Johannes Weiner
2011-01-14 12:23     ` Hiroyuki Kamezawa
2011-01-14 10:10 ` [PATCH 3/4] [BUGFIX] fix memcgroup LRU stat with THP KAMEZAWA Hiroyuki
2011-01-14 12:14   ` Johannes Weiner
2011-01-14 12:24     ` Hiroyuki Kamezawa
2011-01-14 10:15 ` [PATCH 4/4] [BUGFIX] fix account leak at force_empty, rmdir " KAMEZAWA Hiroyuki
2011-01-14 12:21   ` Johannes Weiner
2011-01-14 12:28     ` Hiroyuki Kamezawa
2011-01-17  0:15   ` Daisuke Nishimura
2011-01-17  0:25     ` KAMEZAWA Hiroyuki [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110117092529.0708bc97.kamezawa.hiroyu@jp.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox