linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Ying Han <yinghan@google.com>, Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>, Greg Thelen <gthelen@google.com>,
	Michel Lespinasse <walken@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 2/8] mm: memcg-aware global reclaim
Date: Thu, 2 Jun 2011 17:01:23 +0200	[thread overview]
Message-ID: <20110602150123.GE28684@cmpxchg.org> (raw)
In-Reply-To: <BANLkTikKHq=NBAPOXJVDM7ZEc9CkW+HdmQ@mail.gmail.com>

On Thu, Jun 02, 2011 at 10:59:01PM +0900, Hiroyuki Kamezawa wrote:
> 2011/6/1 Johannes Weiner <hannes@cmpxchg.org>:
> > @@ -1381,6 +1373,97 @@ u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
> >        return min(limit, memsw);
> >  }
> >
> > +/**
> > + * mem_cgroup_hierarchy_walk - iterate over a memcg hierarchy
> > + * @root: starting point of the hierarchy
> > + * @prev: previous position or NULL
> > + *
> > + * Caller must hold a reference to @root.  While this function will
> > + * return @root as part of the walk, it will never increase its
> > + * reference count.
> > + *
> > + * Caller must clean up with mem_cgroup_stop_hierarchy_walk() when it
> > + * stops the walk potentially before the full round trip.
> > + */
> > +struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *root,
> > +                                            struct mem_cgroup *prev)
> > +{
> > +       struct mem_cgroup *mem;
> > +
> > +       if (mem_cgroup_disabled())
> > +               return NULL;
> > +
> > +       if (!root)
> > +               root = root_mem_cgroup;
> > +       /*
> > +        * Even without hierarchy explicitely enabled in the root
> > +        * memcg, it is the ultimate parent of all memcgs.
> > +        */
> > +       if (!(root == root_mem_cgroup || root->use_hierarchy))
> > +               return root;
> 
> Hmm, because ROOT cgroup has no limit and control, if root=root_mem_cgroup,
> we do full hierarchy scan always. Right ?

What it essentially means is that all existing memcgs in the system
contribute to the usage of root_mem_cgroup.

If there is global memory pressure, we need to consider reclaiming
from every single memcg in the system.

> > +static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,
> > +                                       gfp_t gfp_mask,
> > +                                       unsigned long flags)
> > +{
> > +       unsigned long total = 0;
> > +       bool noswap = false;
> > +       int loop;
> > +
> > +       if ((flags & MEM_CGROUP_RECLAIM_NOSWAP) || mem->memsw_is_minimum)
> > +               noswap = true;
> > +       for (loop = 0; loop < MEM_CGROUP_MAX_RECLAIM_LOOPS; loop++) {
> > +               drain_all_stock_async();
> 
> In recent patch, I removed this call here because this wakes up
> kworker too much.
> I will post that patch as a bugfix. So, please adjust this call
> somewhere which is
> not called frequently.

Okay, please CC me when you send out the bugfix.

> > @@ -1927,8 +1980,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
> >        if (!(gfp_mask & __GFP_WAIT))
> >                return CHARGE_WOULDBLOCK;
> >
> > -       ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
> > -                                             gfp_mask, flags);
> > +       ret = mem_cgroup_reclaim(mem_over_limit, gfp_mask, flags);
> >        if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> >                return CHARGE_RETRY;
> >        /*
> 
> It seems this clean-up around hierarchy and softlimit can be in an
> independent patch, no ?

Hm, why do you think it's a cleanup?  The hierarchical target reclaim
code is moved to vmscan.c and as a result the entry points for hard
limit and soft limit reclaim differ.  This is why the original
function, mem_cgroup_hierarchical_reclaim() has to be split into two
parts.

> > @@ -1943,6 +1976,31 @@ restart:
> >        throttle_vm_writeout(sc->gfp_mask);
> >  }
> >
> > +static void shrink_zone(int priority, struct zone *zone,
> > +                       struct scan_control *sc)
> > +{
> > +       unsigned long nr_reclaimed_before = sc->nr_reclaimed;
> > +       struct mem_cgroup *root = sc->target_mem_cgroup;
> > +       struct mem_cgroup *first, *mem = NULL;
> > +
> > +       first = mem = mem_cgroup_hierarchy_walk(root, mem);
> 
> Hmm, I think we should add some scheduling here, later.
> (as select a group over softlimit or select a group which has
>  easily reclaimable pages on this zone.)
> 
> This name as hierarchy_walk() sounds like "full scan in round-robin, always".
> Could you find better name ?

Okay, I'll try.

> > +       for (;;) {
> > +               unsigned long nr_reclaimed;
> > +
> > +               sc->mem_cgroup = mem;
> > +               do_shrink_zone(priority, zone, sc);
> > +
> > +               nr_reclaimed = sc->nr_reclaimed - nr_reclaimed_before;
> > +               if (nr_reclaimed >= sc->nr_to_reclaim)
> > +                       break;
> 
> what this calculation means ?  Shouldn't we do this quit based on the
> number of "scan"
> rather than "reclaimed" ?

It aborts the loop once sc->nr_to_reclaim pages have been reclaimed
from that zone during that hierarchy walk, to prevent overreclaim.

If you have unbalanced sizes of memcgs in the system, it is not
desirable to have every reclaimer scan all memcgs, but let those quit
early that have made some progress on the bigger memcgs.

It's essentially a forward progagation of the same check in
do_shrink_zone().  It trades absolute fairness for average reclaim
latency.

Note that kswapd sets the reclaim target to infinity, so this
optimization applies only to direct reclaimers.

> > +               mem = mem_cgroup_hierarchy_walk(root, mem);
> > +               if (mem == first)
> > +                       break;
> 
> Why we quit loop  ?

get_scan_count() for traditional global reclaim returns the scan
target for the zone.

With this per-memcg reclaimer, get_scan_count() will return scan
targets for the respective per-memcg zone subsizes.

So once we have gone through all memcgs, we should have scanned the
amount of pages that global reclaim would have deemed sensible for
that zone at that priority level.

As such, this is the exit condition based on scan count you referred
to above.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-06-02 15:01 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-01  6:25 [patch 0/8] mm: memcg naturalization -rc2 Johannes Weiner
2011-06-01  6:25 ` [patch 1/8] memcg: remove unused retry signal from reclaim Johannes Weiner
2011-06-01  6:25 ` [patch 2/8] mm: memcg-aware global reclaim Johannes Weiner
2011-06-02 13:59   ` Hiroyuki Kamezawa
2011-06-02 15:01     ` Johannes Weiner [this message]
2011-06-02 16:14       ` Hiroyuki Kamezawa
2011-06-02 17:29         ` Johannes Weiner
2011-06-09 14:01           ` Michal Hocko
2011-06-07 12:25   ` Christoph Hellwig
2011-06-08  9:30     ` Johannes Weiner
2011-06-09  9:26       ` Christoph Hellwig
2011-06-09 16:57         ` Johannes Weiner
2011-06-09 13:12   ` Michal Hocko
2011-06-09 13:45     ` Johannes Weiner
2011-06-09 15:48   ` Minchan Kim
2011-06-09 17:23     ` Johannes Weiner
2011-06-09 23:41       ` Minchan Kim
2011-06-09 23:47         ` Minchan Kim
2011-06-10  0:34           ` Johannes Weiner
2011-06-10  0:48             ` Minchan Kim
2011-08-11 20:39   ` Ying Han
2011-08-11 21:09     ` Johannes Weiner
2011-08-29  7:15       ` Ying Han
2011-08-29  7:22         ` Ying Han
2011-08-29  7:57           ` Johannes Weiner
2011-08-30  6:08             ` Ying Han
2011-08-29 19:04           ` Johannes Weiner
2011-08-29 20:36             ` Ying Han
2011-08-29 21:05               ` Johannes Weiner
2011-08-30  7:07                 ` Ying Han
2011-08-30 15:14                   ` Johannes Weiner
2011-08-31 22:58                     ` Ying Han
2011-09-21  8:44                       ` Johannes Weiner
2011-08-29  8:07         ` Johannes Weiner
2011-06-01  6:25 ` [patch 3/8] memcg: reclaim statistics Johannes Weiner
2011-06-01  6:25 ` [patch 4/8] memcg: rework soft limit reclaim Johannes Weiner
2011-06-02  5:37   ` Ying Han
2011-06-02 21:55   ` Ying Han
2011-06-03  5:25     ` Ying Han
2011-06-09 15:00       ` Michal Hocko
2011-06-10  7:36         ` Michal Hocko
2011-06-15 22:57           ` Ying Han
2011-06-16  0:33             ` Ying Han
2011-06-16 11:45             ` Michal Hocko
2011-06-15 22:48         ` Ying Han
2011-06-16 11:41           ` Michal Hocko
2011-06-01  6:25 ` [patch 5/8] memcg: remove unused soft limit code Johannes Weiner
2011-06-13  9:26   ` Michal Hocko
2011-06-01  6:25 ` [patch 6/8] vmscan: change zone_nr_lru_pages to take memcg instead of scan control Johannes Weiner
2011-06-02 13:30   ` Hiroyuki Kamezawa
2011-06-02 14:28     ` Johannes Weiner
2011-06-13  9:29   ` Michal Hocko
2011-06-01  6:25 ` [patch 7/8] vmscan: memcg-aware unevictable page rescue scanner Johannes Weiner
2011-06-02 13:27   ` Hiroyuki Kamezawa
2011-06-02 14:27     ` Johannes Weiner
2011-06-02 21:02     ` Ying Han
2011-06-02 22:01       ` Hiroyuki Kamezawa
2011-06-02 22:19         ` Johannes Weiner
2011-06-02 23:15           ` Hiroyuki Kamezawa
2011-06-03  5:08           ` Ying Han
2011-06-13  9:42   ` Michal Hocko
2011-06-13 10:30     ` Johannes Weiner
2011-06-13 11:18       ` Michal Hocko
2011-07-19 22:47   ` Ying Han
2011-07-20  0:36     ` Johannes Weiner
2011-08-29  7:28       ` Ying Han
2011-08-29  7:59         ` Johannes Weiner
2011-06-01  6:25 ` [patch 8/8] mm: make per-memcg lru lists exclusive Johannes Weiner
2011-06-02 13:16   ` Hiroyuki Kamezawa
2011-06-02 14:24     ` Johannes Weiner
2011-06-02 15:54       ` Hiroyuki Kamezawa
2011-06-02 17:57         ` Johannes Weiner
2011-06-08 15:04           ` Michal Hocko
2011-06-07 12:42   ` Christoph Hellwig
2011-06-08  8:54     ` Johannes Weiner
2011-06-09  9:23       ` Christoph Hellwig
2011-08-11 20:33   ` Ying Han
2011-08-12  8:34     ` Johannes Weiner
2011-08-12 17:08       ` Ying Han
2011-08-12 19:17         ` Johannes Weiner
2011-08-15  3:01           ` Ying Han
2011-08-15  1:34       ` Ying Han
2011-08-15  9:39         ` Johannes Weiner
2011-06-01 23:52 ` [patch 0/8] mm: memcg naturalization -rc2 Hiroyuki Kamezawa
2011-06-02  0:35   ` Greg Thelen
2011-06-09  1:13     ` Rik van Riel
2011-06-02  4:05   ` Ying Han
2011-06-02  7:50     ` Johannes Weiner
2011-06-02 15:51       ` Ying Han
2011-06-02 17:51         ` Johannes Weiner
2011-06-08  3:45           ` Ying Han
2011-06-08  3:53           ` Ying Han
2011-06-08 15:32             ` Johannes Weiner
2011-06-09  3:52               ` Ying Han
2011-06-09  8:35                 ` Johannes Weiner
2011-06-09 17:36                   ` Ying Han
2011-06-09 18:36                     ` Johannes Weiner
2011-06-09 21:38                       ` Ying Han
2011-06-09 22:30                       ` Ying Han
2011-06-09 23:31                         ` Johannes Weiner
2011-06-10  0:17                           ` Ying Han
2011-06-02  7:33   ` Johannes Weiner
2011-06-02  9:06     ` Hiroyuki Kamezawa
2011-06-02 10:00       ` Johannes Weiner
2011-06-02 12:59         ` Hiroyuki Kamezawa
2011-06-09  1:15           ` Rik van Riel
2011-06-09  8:43             ` Johannes Weiner
2011-06-09  9:31               ` Christoph Hellwig
2011-06-13  9:47 ` Michal Hocko
2011-06-13 10:35   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110602150123.GE28684@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=gthelen@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kamezawa.hiroyuki@gmail.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=minchan.kim@gmail.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=riel@redhat.com \
    --cc=walken@google.com \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox