From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19])
        by kanga.kvack.org (Postfix) with ESMTP id D83F96B0169
        for ; Fri, 12 Aug 2011 13:08:25 -0400 (EDT)
Received: from wpaz17.hot.corp.google.com (wpaz17.hot.corp.google.com [172.24.198.81])
        by smtp-out.google.com with ESMTP id p7CH8KVS006596
        for ; Fri, 12 Aug 2011 10:08:21 -0700
Received: from qyk34 (qyk34.prod.google.com [10.241.83.162])
        by wpaz17.hot.corp.google.com with ESMTP id p7CH7Uqu028701
        (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT)
        for ; Fri, 12 Aug 2011 10:08:19 -0700
Received: by qyk34 with SMTP id 34so441417qyk.10
        for ; Fri, 12 Aug 2011 10:08:19 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20110812083458.GB6916@cmpxchg.org>
References: <1306909519-7286-1-git-send-email-hannes@cmpxchg.org>
        <1306909519-7286-9-git-send-email-hannes@cmpxchg.org>
        <20110812083458.GB6916@cmpxchg.org>
Date: Fri, 12 Aug 2011 10:08:18 -0700
Message-ID:
Subject: Re: [patch 8/8] mm: make per-memcg lru lists exclusive
From: Ying Han
Content-Type: multipart/alternative; boundary=00163628429ec252a704aa51f55b
Sender: owner-linux-mm@kvack.org
List-ID:
To: Johannes Weiner
Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, Michal Hocko,
        Andrew Morton, Rik van Riel, Minchan Kim, KOSAKI Motohiro,
        Mel Gorman, Greg Thelen, Michel Lespinasse, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org

--00163628429ec252a704aa51f55b
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Aug 12, 2011 at 1:34 AM, Johannes Weiner wrote:

> On Thu, Aug 11, 2011 at 01:33:05PM -0700, Ying Han wrote:
> > > Johannes, I wonder if we should include the following patch:
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 674823e..1513deb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -832,7 +832,7 @@ static void
> > mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> >          * Forget old LRU when this page_cgroup is *not* used. This Used bit
> >          * is guarded by lock_page() because the page is SwapCache.
> >          */
> > -       if (!PageCgroupUsed(pc))
> > +       if (PageLRU(page) && !PageCgroupUsed(pc))
> >                 del_page_from_lru(zone, page);
> >         spin_unlock_irqrestore(&zone->lru_lock, flags);
> > :
>
> Yes, as the first PageLRU check is outside the lru_lock, PageLRU may
> indeed go away before grabbing the lock.  The page will already be
> unlinked and the LRU accounting will be off.
>

For some reason, the first PageLRU check was removed by some commit in my
source tree, and I don't know why. I guess I have to double-check that.

>
> The deeper problem, however, is that del_page_from_lru is wrong.  We
> can not keep the page off the LRU while leaving PageLRU set, or it
> won't be very meaningful after the commit, anyway.

Yes, leaving the LRU bit set while the page is not linked to an LRU will
cause various problems. This is what it looks like in my tree:

-       if (!PageCgroupUsed(pc))
+       if (PageLRU(page) && !PageCgroupUsed(pc)) {
+               ClearPageLRU(page);
                del_page_from_lru(zone, page);
+       }
        spin_unlock_irqrestore(&zone->lru_lock, flags);

We are working on the patch to break up zone->lru_lock, and without this
change the system crashes when running some swap tests. Sorry I didn't post
the full patch at the beginning; I was not sure whether the second "+" line
was related to the lru_lock patch or not.

> And in reality, we only care about properly memcg-unaccounting the old
> lru state before we change pc->mem_cgroup, so this becomes
>
>        if (!PageLRU(page))
>                return;
>        spin_lock_irqsave(&zone->lru_lock, flags);
>        if (!PageCgroupUsed(pc))
>                mem_cgroup_lru_del(page);
>        spin_unlock_irqrestore(&zone->lru_lock, flags);
>
> I don't see why we should care if the page stays physically linked to
> the list.

Can you clarify that?

> The PageLRU check outside the lock is still fine as the
> accounting has been done already if !PageLRU and a putback without
> PageCgroupUsed will not re-account to pc->mem_cgroup, as the comment
> above this code explains nicely.
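The before-commit sequence quoted above (unlocked PageLRU test as a fast path, then the Used-bit test under the lock) can be modelled in userspace roughly as follows. This is only a sketch: every `model_*` name is hypothetical, a spinning `atomic_flag` stands in for `zone->lru_lock`, and it assumes that `mem_cgroup_lru_del()` unaccounts a page at most once, which the thread does not spell out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct model_zone {
	atomic_flag lru_lock;   /* models zone->lru_lock */
	long memcg_lru_pages;   /* models the memcg's LRU page count */
};

struct model_page {
	atomic_bool on_lru;     /* models PG_lru, readable without the lock */
	bool used;              /* models PCG_USED */
	bool acct_lru;          /* models PCG_ACCT_LRU */
};

static void model_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;  /* spin until the holder releases the lock */
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Assumed behaviour of mem_cgroup_lru_del(): unaccount at most once. */
static void model_lru_unaccount(struct model_zone *zone, struct model_page *page)
{
	if (!page->acct_lru)
		return;
	page->acct_lru = false;
	zone->memcg_lru_pages--;
}

static void model_del_before_commit(struct model_zone *zone,
				    struct model_page *page)
{
	/* Unlocked fast path: an off-LRU page was unaccounted already. */
	if (!atomic_load(&page->on_lru))
		return;

	model_lock(&zone->lru_lock);
	if (!page->used)
		model_lru_unaccount(zone, page);
	model_unlock(&zone->lru_lock);
}
```

Note that the model never clears `on_lru` or unlinks anything: only the accounting changes, and the page stays physically on its list.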
>
> The handling after committing the charge becomes this:
>
> -       if (likely(!PageLRU(page)))
> -               return;
>         spin_lock_irqsave(&zone->lru_lock, flags);
>         lru = page_lru(page);
>         if (PageLRU(page) && !PageCgroupAcctLRU(pc)) {
>                 del_page_from_lru_list(zone, page, lru);
>                 add_page_to_lru_list(zone, page, lru);
>         }
>
> If the page is not on the LRU, someone else will put it there and link
> it up properly.  If it is on the LRU and already memcg-accounted then
> it must be on the right lruvec as setting pc->mem_cgroup and PCG_USED
> is properly ordered.  Otherwise, it has to be physically moved to the
> correct lruvec and memcg-accounted for.
>

While working on the zone->lru_lock patch, I have been puzzling over the
PageLRU and PageCgroupAcctLRU bits. Here is my question:

It looks to me like PageLRU indicates that the page is linked to the
per-zone LRU list, and PageCgroupAcctLRU indicates that the page is charged
to a memcg and also linked to that memcg's private LRU list. This all works
nicely when we have both global and private (per-memcg) LRU lists, but I
cannot fit the two together after this patch.

Now a page is always linked to a private LRU, either a memcg's or root's.
While linked to either LRU list, the page could be uncharged (like
swapcache). Either way, I am wondering whether we can get rid of the
AcctLRU bit in pc and use only the LRU bit here.

I haven't had a chance to put together a patch doing that, and at the same
time I wonder whether I missed something?

> The old unlocked PageLRU check in after_commit is no longer possible
> because setting PG_lru is not ordered against setting the list head,
> which means the page could be linked to the wrong lruvec while this
> CPU would not yet observe PG_lru and do the relink.  So this needs
> strong ordering.  Given that this code is hairy enough as it is, I
> just removed the preliminary check for now and do the check only under
> the lock instead of adding barriers here and to the lru linking sites.
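The after-commit handling described above, with the test done only under the lock, can be sketched in the same userspace style. Again this is a model under stated assumptions, not kernel code: all `model_*` names are hypothetical, the two-entry `nr_linked` array stands in for two lruvecs (root and one memcg), and setting `acct_lru` on relink is an assumption about what add_page_to_lru_list() ends up doing.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { MODEL_NR_LRUVECS = 2 };  /* hypothetical: 0 = root, 1 = one memcg */

struct model_zone {
	atomic_flag lru_lock;              /* models zone->lru_lock */
	long nr_linked[MODEL_NR_LRUVECS];  /* pages linked on each lruvec */
};

struct model_page {
	bool on_lru;    /* models PG_lru; only read under the lock here */
	bool acct_lru;  /* models PCG_ACCT_LRU */
	int linked_on;  /* lruvec the page is currently linked to */
	int owner;      /* models pc->mem_cgroup after the commit */
};

static void model_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;  /* spin until the holder releases the lock */
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void model_add_after_commit(struct model_zone *zone,
				   struct model_page *page)
{
	/* No unlocked pre-check: the flags are only trusted under the lock. */
	model_lock(&zone->lru_lock);
	if (page->on_lru && !page->acct_lru) {
		zone->nr_linked[page->linked_on]--;  /* del_page_from_lru_list() */
		page->linked_on = page->owner;
		zone->nr_linked[page->linked_on]++;  /* add_page_to_lru_list() */
		page->acct_lru = true;               /* assumed accounting step */
	}
	model_unlock(&zone->lru_lock);
}
```

Taking the lock unconditionally costs the fast path, but it avoids having to order the `on_lru` store against the list linkage, which is the trade-off the quoted paragraph describes.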
>
> Thanks for making me write this out, few things put one's
> understanding of a problem to the test like this.
>
> Let's hope it helped :-)
>

Thank you for the detailed information :)

--Ying

--00163628429ec252a704aa51f55b
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

--00163628429ec252a704aa51f55b--