linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ying Han <yinghan@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>, Greg Thelen <gthelen@google.com>,
	Michel Lespinasse <walken@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>
Subject: Re: [patch 2/8] mm: memcg-aware global reclaim
Date: Mon, 29 Aug 2011 13:36:48 -0700	[thread overview]
Message-ID: <CALWz4ix1X8=L0HzQpdGd=XVbjZuMCtYngzdG+hLMoeJJCUEjrg@mail.gmail.com> (raw)
In-Reply-To: <20110829190426.GC1434@cmpxchg.org>

[-- Attachment #1: Type: text/plain, Size: 6984 bytes --]

On Mon, Aug 29, 2011 at 12:04 PM, Johannes Weiner <hannes@cmpxchg.org>wrote:

> On Mon, Aug 29, 2011 at 12:22:02AM -0700, Ying Han wrote:
> > On Mon, Aug 29, 2011 at 12:15 AM, Ying Han <yinghan@google.com> wrote:
> > > On Thu, Aug 11, 2011 at 2:09 PM, Johannes Weiner <hannes@cmpxchg.org>
> wrote:
> > >>
> > >> On Thu, Aug 11, 2011 at 01:39:45PM -0700, Ying Han wrote:
> > >> > Please consider including the following patch for the next post. It
> causes
> > >> > crash on some of the tests where sc->mem_cgroup is NULL (global
> kswapd).
> > >> >
> > >> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > >> > index b72a844..12ab25d 100644
> > >> > --- a/mm/vmscan.c
> > >> > +++ b/mm/vmscan.c
> > >> > @@ -2768,7 +2768,8 @@ loop_again:
> > >> >                          * Do some background aging of the anon
> list, to
> > >> > give
> > >> >                          * pages a chance to be referenced before
> > >> > reclaiming.
> > >> >                          */
> > >> > -                       if (inactive_anon_is_low(zone, &sc))
> > >> > +                       if (scanning_global_lru(&sc) &&
> > >> > +                                       inactive_anon_is_low(zone,
> &sc))
> > >> >                                 shrink_active_list(SWAP_CLUSTER_MAX,
> zone,
> > >> >                                                         &sc,
> priority, 0);
> > >>
> > >> Thanks!  I completely overlooked this one and only noticed it after
> > >> changing the arguments to shrink_active_list().
> > >>
> > >> On memcg configurations, scanning_global_lru() will essentially never
> > >> be true again, so I moved the anon pre-aging to a separate function
> > >> that also does a hierarchy loop to preage the per-memcg anon lists.
> > >>
> > >> I hope to send out the next revision soon.
> > >
> > > Also, please consider to fold in the following patch as well. It fixes
> > > the root cgroup lru accounting and we could easily trigger OOM while
> > > doing some swapoff test w/o it.
> > >
> > > mm:fix the lru accounting for root cgroup.
> > >
> > > This patch is applied on top of:
> > > "
> > > mm: memcg-aware global reclaim
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > "
> > >
> > > This patch fixes the lru accounting for root cgroup.
> > >
> > > After the "memcg-aware global reclaim" patch, one of the changes is to
> have
> > > lru pages linked back to root. Under the global memory pressure, we
> start from
> > > the root cgroup lru and walk through the memcg hierarchy of the system.
> For
> > > each memcg, we reclaim pages based on the its lru size.
> > >
> > > However for root cgroup, we used not having a seperate lru and only
> counting
> > > the pages charged to root as part of root lru size. Without this patch,
> all
> > > the pages which are linked to root lru but not charged to root like
> swapcache
> > > readahead are not visible to page reclaim code and we are easily to get
> OOM.
> > >
> > > After this patch, all the pages linked under root lru are counted in
> the lru
> > > size, including Used and !Used.
> > >
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > Signed-off-by: Ying Han <yinghan@google.com>
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 5518f54..f6c5f29 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -888,19 +888,21 @@ void mem_cgroup_del_lru_list(struct page *page,
> > > enum lru_list lru)
> > >  {
> > >  >------struct page_cgroup *pc;
> > >  >------struct mem_cgroup_per_zone *mz;
> > > +>------struct mem_cgroup *mem;
> > > ·
> > >  >------if (mem_cgroup_disabled())
> > >  >------>-------return;
> > >  >------pc = lookup_page_cgroup(page);
> > > ->------/* can happen while we handle swapcache. */
> > > ->------if (!TestClearPageCgroupAcctLRU(pc))
> > > ->------>-------return;
> > > ->------VM_BUG_ON(!pc->mem_cgroup);
> > > ->------/*
> > > ->------ * We don't check PCG_USED bit. It's cleared when the "page" is
> finally
> > > ->------ * removed from global LRU.
> > > ->------ */
> > > ->------mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
> > > +
> > > +>------if (TestClearPageCgroupAcctLRU(pc) || PageCgroupUsed(pc)) {
>
> This PageCgroupUsed part confuses me.  A page that is being isolated
> shortly after being charged while on the LRU may reach here, and then
> it is unaccounted from pc->mem_cgroup, which it never was accounted
> to.
>
> Could you explain why you added it?
>

To be honest, i don't have very good reason for that. The PageCgroupUsed
check is put there after running some tests and some fixes seems help the
test, including this one.

The one case I can think of for page !AcctLRU | Used is in the pagevec.
However, we shouldn't get to the mem_cgroup_del_lru_list() for a page in
pagevec at the first place.

I now made it so that PageCgroupAcctLRU on the LRU means accounted to
> pc->mem_cgroup,


this is the same logic currently.


> and !PageCgroupAcctLRU on the LRU means accounted to
> and babysitted by root_mem_cgroup.


this seems to be different from what it is now, especially for swapcache
page. So, the page here is linked to root cgroup LRU or not?

Anyway, the AcctLRU flags still seems confusing to me:

what this flag tells me is that whether or not the page is on a PRIVATE lru
and being accounted, i used private here to differentiate from the per zone
lru, where it also has PageLRU flag.  The two flags are separate since pages
could be on one lru not the other ( I guess ) , but this is changed after
having the root cgroup lru back. For example, AcctLRU is used to keep track
of the accounted lru pages, especially for root ( we didn't account the
!Used pages to root like readahead swapcache). Now we account the full size
of lru list of root including Used and !Used, but only mark the Used pages
w/ AcctLRU flag.

So in general, i am wondering we should be able to replace that eventually
with existing Used and LRU bit.  Sorry this seems to be something we like to
consider later, not necessarily now :)

Always.  Which also means that

> before_commit now ensures an LRU page is moved to root_mem_cgroup for
> babysitting during the charge, so that concurrent isolations/putbacks
> are always accounted correctly.  Is this what you had in mind?  Did I
> miss something?
>

In my tree, the before->commit->after protocol is folded into one function.
I didn't post it since I know you also have patch doing that.  So guess I
don't understand why we need to move the page to root while it is gonna be
charged to a memcg by commit_charge shortly after.

My understanding is that in before_commit, we uncharge the page from
previous memcg lru if AcctLRU was set, then in the commit_charge we update
the new owner of it. And in after_commit we update the memcg lru for the new
owner after linking the page in the lru.

--Ying

[-- Attachment #2: Type: text/html, Size: 9801 bytes --]

  reply	other threads:[~2011-08-29 20:36 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-01  6:25 [patch 0/8] mm: memcg naturalization -rc2 Johannes Weiner
2011-06-01  6:25 ` [patch 1/8] memcg: remove unused retry signal from reclaim Johannes Weiner
2011-06-01  6:25 ` [patch 2/8] mm: memcg-aware global reclaim Johannes Weiner
2011-06-02 13:59   ` Hiroyuki Kamezawa
2011-06-02 15:01     ` Johannes Weiner
2011-06-02 16:14       ` Hiroyuki Kamezawa
2011-06-02 17:29         ` Johannes Weiner
2011-06-09 14:01           ` Michal Hocko
2011-06-07 12:25   ` Christoph Hellwig
2011-06-08  9:30     ` Johannes Weiner
2011-06-09  9:26       ` Christoph Hellwig
2011-06-09 16:57         ` Johannes Weiner
2011-06-09 13:12   ` Michal Hocko
2011-06-09 13:45     ` Johannes Weiner
2011-06-09 15:48   ` Minchan Kim
2011-06-09 17:23     ` Johannes Weiner
2011-06-09 23:41       ` Minchan Kim
2011-06-09 23:47         ` Minchan Kim
2011-06-10  0:34           ` Johannes Weiner
2011-06-10  0:48             ` Minchan Kim
2011-08-11 20:39   ` Ying Han
2011-08-11 21:09     ` Johannes Weiner
2011-08-29  7:15       ` Ying Han
2011-08-29  7:22         ` Ying Han
2011-08-29  7:57           ` Johannes Weiner
2011-08-30  6:08             ` Ying Han
2011-08-29 19:04           ` Johannes Weiner
2011-08-29 20:36             ` Ying Han [this message]
2011-08-29 21:05               ` Johannes Weiner
2011-08-30  7:07                 ` Ying Han
2011-08-30 15:14                   ` Johannes Weiner
2011-08-31 22:58                     ` Ying Han
2011-09-21  8:44                       ` Johannes Weiner
2011-08-29  8:07         ` Johannes Weiner
2011-06-01  6:25 ` [patch 3/8] memcg: reclaim statistics Johannes Weiner
2011-06-01  6:25 ` [patch 4/8] memcg: rework soft limit reclaim Johannes Weiner
2011-06-02  5:37   ` Ying Han
2011-06-02 21:55   ` Ying Han
2011-06-03  5:25     ` Ying Han
2011-06-09 15:00       ` Michal Hocko
2011-06-10  7:36         ` Michal Hocko
2011-06-15 22:57           ` Ying Han
2011-06-16  0:33             ` Ying Han
2011-06-16 11:45             ` Michal Hocko
2011-06-15 22:48         ` Ying Han
2011-06-16 11:41           ` Michal Hocko
2011-06-01  6:25 ` [patch 5/8] memcg: remove unused soft limit code Johannes Weiner
2011-06-13  9:26   ` Michal Hocko
2011-06-01  6:25 ` [patch 6/8] vmscan: change zone_nr_lru_pages to take memcg instead of scan control Johannes Weiner
2011-06-02 13:30   ` Hiroyuki Kamezawa
2011-06-02 14:28     ` Johannes Weiner
2011-06-13  9:29   ` Michal Hocko
2011-06-01  6:25 ` [patch 7/8] vmscan: memcg-aware unevictable page rescue scanner Johannes Weiner
2011-06-02 13:27   ` Hiroyuki Kamezawa
2011-06-02 14:27     ` Johannes Weiner
2011-06-02 21:02     ` Ying Han
2011-06-02 22:01       ` Hiroyuki Kamezawa
2011-06-02 22:19         ` Johannes Weiner
2011-06-02 23:15           ` Hiroyuki Kamezawa
2011-06-03  5:08           ` Ying Han
2011-06-13  9:42   ` Michal Hocko
2011-06-13 10:30     ` Johannes Weiner
2011-06-13 11:18       ` Michal Hocko
2011-07-19 22:47   ` Ying Han
2011-07-20  0:36     ` Johannes Weiner
2011-08-29  7:28       ` Ying Han
2011-08-29  7:59         ` Johannes Weiner
2011-06-01  6:25 ` [patch 8/8] mm: make per-memcg lru lists exclusive Johannes Weiner
2011-06-02 13:16   ` Hiroyuki Kamezawa
2011-06-02 14:24     ` Johannes Weiner
2011-06-02 15:54       ` Hiroyuki Kamezawa
2011-06-02 17:57         ` Johannes Weiner
2011-06-08 15:04           ` Michal Hocko
2011-06-07 12:42   ` Christoph Hellwig
2011-06-08  8:54     ` Johannes Weiner
2011-06-09  9:23       ` Christoph Hellwig
2011-08-11 20:33   ` Ying Han
2011-08-12  8:34     ` Johannes Weiner
2011-08-12 17:08       ` Ying Han
2011-08-12 19:17         ` Johannes Weiner
2011-08-15  3:01           ` Ying Han
2011-08-15  1:34       ` Ying Han
2011-08-15  9:39         ` Johannes Weiner
2011-06-01 23:52 ` [patch 0/8] mm: memcg naturalization -rc2 Hiroyuki Kamezawa
2011-06-02  0:35   ` Greg Thelen
2011-06-09  1:13     ` Rik van Riel
2011-06-02  4:05   ` Ying Han
2011-06-02  7:50     ` Johannes Weiner
2011-06-02 15:51       ` Ying Han
2011-06-02 17:51         ` Johannes Weiner
2011-06-08  3:45           ` Ying Han
2011-06-08  3:53           ` Ying Han
2011-06-08 15:32             ` Johannes Weiner
2011-06-09  3:52               ` Ying Han
2011-06-09  8:35                 ` Johannes Weiner
2011-06-09 17:36                   ` Ying Han
2011-06-09 18:36                     ` Johannes Weiner
2011-06-09 21:38                       ` Ying Han
2011-06-09 22:30                       ` Ying Han
2011-06-09 23:31                         ` Johannes Weiner
2011-06-10  0:17                           ` Ying Han
2011-06-02  7:33   ` Johannes Weiner
2011-06-02  9:06     ` Hiroyuki Kamezawa
2011-06-02 10:00       ` Johannes Weiner
2011-06-02 12:59         ` Hiroyuki Kamezawa
2011-06-09  1:15           ` Rik van Riel
2011-06-09  8:43             ` Johannes Weiner
2011-06-09  9:31               ` Christoph Hellwig
2011-06-13  9:47 ` Michal Hocko
2011-06-13 10:35   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALWz4ix1X8=L0HzQpdGd=XVbjZuMCtYngzdG+hLMoeJJCUEjrg@mail.gmail.com' \
    --to=yinghan@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=minchan.kim@gmail.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=riel@redhat.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox