From: Ying Han <yinghan@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Li Zefan <lizf@cn.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Greg Thelen <gthelen@google.com>,
	Minchan Kim <minchan.kim@gmail.com>, Mel Gorman <mel@csn.ul.ie>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Michal Hocko <mhocko@suse.cz>, Zhu Yanhai <zhu.yanhai@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [RFC][PATCH] memcg: isolate pages in memcg lru from global lru
Date: Wed, 30 Mar 2011 22:41:51 -0700
Message-ID: <BANLkTimiwObEvRLv8pmmcy8v31FN2y_VOg@mail.gmail.com>
In-Reply-To: <20110331112532.82ed25ad.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, Mar 30, 2011 at 7:25 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 30 Mar 2011 17:48:18 -0700
> Ying Han <yinghan@google.com> wrote:
>
> > In the memory controller, we do both targeted reclaim and global reclaim.
> > The latter walks through the global lru, which links all the allocated
> > pages on the system. It breaks memory isolation since pages are evicted
> > regardless of their memcg owners. This patch takes pages off the global
> > lru as long as they are added to a per-memcg lru.
> >
> > Memcg and cgroup together provide the solution of memory isolation where
> > multiple cgroups run in parallel without interfering with each other. In
> > vm, memory isolation requires changes in both page allocation and page
> > reclaim. The current memcg provides good user page accounting, but needs
> > more work on page reclaim.
> >
> > In an over-committed machine w/ 32G ram, here is the configuration:
> >
> > cgroup-A/  -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
> > cgroup-B/  -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
> >
> > 1) limit_in_bytes is the hard limit; a process going over it is
> > throttled or OOM killed.
> > 2) memory between soft_limit and limit_in_bytes is best-effort.
> > soft_limit provides a "guarantee" in some sense.
> >
> > Then, it is easy to generate the following scenario:
> >
> > cgroup-A/  -- usage_in_bytes = 20G
> > cgroup-B/  -- usage_in_bytes = 12G
> >
> > Global memory pressure triggers while cgroup-A keeps allocating memory.
> > At this point, pages belonging to cgroup-B can be evicted from the
> > global LRU.
> >
> > We do have per-memcg targeted reclaim, including per-memcg background
> > reclaim and soft_limit reclaim. Both of them need some improvement, and
> > regardless we still need this patch, since global lru reclaim breaks
> > isolation.
> >
> > Besides, here is the to-do list I have on memcg page reclaim, and the
> > items are sorted.
> > a) per-memcg background reclaim, to reclaim pages proactively
>
> agree,
>
> > b) skipping global lru reclaim if soft_limit reclaim does enough work.
> > this is both for global background reclaim and global ttfp reclaim.
>
> agree. but zone-balancing cannot be avoided for now. So, I think we need
> inter-zone page migration to balance memory between zones...if necessary.
>

Thank you for your comments. Can you clarify a bit on this? Actually I
was thinking about zone balancing within memcg, but haven't thought it
through yet. I would like to learn more about the cases where we cannot
avoid global zone balancing entirely.

>
>
> > c) improve the soft_limit reclaim to be efficient.
>
> must be done.
>

The current design of soft_limit focuses more on correctness than on
efficiency. If we are talking about improving the efficiency of target
reclaim, there is quite a lot to change. The first thing might be improving
the per-zone RB trees. They are currently keyed on the per-memcg excess
(usage - soft_limit), regardless of how many pages landed on the zone.
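
To illustrate the RB tree point, here is a minimal Python sketch (not kernel
code). It uses the usage and soft_limit numbers from the scenario quoted
above. The per-zone page split and the names soft_limit_excess and
zone_candidates are made up for illustration, and it assumes that only
memcgs over their soft limit are candidates and that the same excess key is
used on every zone.

```python
GIB = 1 << 30

# Usage and soft limits from the scenario quoted earlier in this thread.
memcgs = {
    "cgroup-A": {"usage": 20 * GIB, "soft_limit": 15 * GIB},
    "cgroup-B": {"usage": 12 * GIB, "soft_limit": 15 * GIB},
}

# Hypothetical per-zone split of each memcg's pages (invented for illustration).
pages_on_zone = {
    "cgroup-A": {"DMA32": 1 * GIB, "Normal": 19 * GIB},
    "cgroup-B": {"DMA32": 3 * GIB, "Normal": 9 * GIB},
}


def soft_limit_excess(memcg):
    """Bytes of usage above the soft limit, clamped at zero."""
    return max(memcg["usage"] - memcg["soft_limit"], 0)


def zone_candidates(zone):
    """Memcgs over their soft limit, largest excess first.

    The zone argument is deliberately unused: the ordering key is the
    per-memcg excess, so every zone sees the same order.
    """
    over = [
        (name, soft_limit_excess(memcg))
        for name, memcg in memcgs.items()
        if soft_limit_excess(memcg) > 0
    ]
    return sorted(over, key=lambda item: item[1], reverse=True)


for zone in ("DMA32", "Normal"):
    for name, excess in zone_candidates(zone):
        on_zone = pages_on_zone[name][zone]
        print(f"{zone}: {name} excess={excess // GIB}G on_zone={on_zone // GIB}G")
```

In this made-up split, cgroup-A is the first candidate on DMA32 with a 5G
excess even though only 1G of its pages sit on that zone, which is the
inefficiency described above.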


>
> > d) isolate pages in memcg from global list since it breaks memory
> > isolation.
>
> I never agree this until about a),b),c) is fixed and we can go nowhere.
>
> BTW, in other POV, for reducing size of page_cgroup, we must remove ->lru
> on page_cgroup. If divide-and-conquer memory reclaim works enough,
> we can do that. But this is a big global VM change, so we need enough
> justification.
>

I can agree on that. The change looks big, especially without efficient
target reclaim. However, I do believe we need this to have an isolation
guarantee.

>
> > I have done some basic tests on this patch, and more tests are
> > definitely needed:
> >
> > Functional:
> > two memcgs under root. cgroup-A is reading 20g file with 2g limit,
> > cgroup-B is running random stuff with 500m limit. Check the counters for
> > per-memcg lru and global lru, and they should add up.
> >
> > 1) total file pages
> > $ cat /proc/meminfo | grep Cache
> > Cached:          6032128 kB
> >
> > 2) file lru on global lru
> > $ cat /proc/vmstat | grep file
> > nr_inactive_file 0
> > nr_active_file 963131
> >
> > 3) file lru on root cgroup
> > $ cat /dev/cgroup/memory.stat | grep file
> > inactive_file 0
> > active_file 0
> >
> > 4) file lru on cgroup-A
> > $ cat /dev/cgroup/A/memory.stat | grep file
> > inactive_file 2145759232
> > active_file 0
> >
> > 5) file lru on cgroup-B
> > $ cat /dev/cgroup/B/memory.stat | grep file
> > inactive_file 401408
> > active_file 143360
> >
> > Performance:
> > run the page fault test (pft) with 16 threads faulting in 15G of anon
> > pages in a 16G cgroup. No regression was noticed in "flt/cpu/s".
> >
>
> You need a fix for /proc/meminfo, /proc/vmstat to count memcg's ;)
>

Yes. :) Since this is an RFC prototype, I took a shortcut by reusing the
existing stats and counting only the pages on the global LRU.
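
For illustration only, the add-up check from the functional test can be
sketched in Python with the counters quoted above. It assumes 4 KiB pages
(so /proc/vmstat page counts convert to kB) and that the memory.stat file
counters are in bytes; lru_total_kb is a hypothetical helper name.

```python
PAGE_KB = 4  # assumption: 4 KiB pages

# Counters copied from the functional test output quoted above.
cached_kb = 6032128                 # /proc/meminfo "Cached"
global_file_pages = 0 + 963131      # nr_inactive_file + nr_active_file
memcg_file_bytes = {
    "root": 0 + 0,                  # inactive_file + active_file
    "A": 2145759232 + 0,
    "B": 401408 + 143360,
}


def lru_total_kb(global_pages, per_memcg_bytes, page_kb=PAGE_KB):
    """File LRU size in kB: global LRU pages plus all per-memcg LRU bytes."""
    return global_pages * page_kb + sum(per_memcg_bytes.values()) // 1024


total_kb = lru_total_kb(global_file_pages, memcg_file_bytes)
print(f"file LRU total: {total_kb} kB")
print(f"Cached - total: {cached_kb - total_kb} kB")
```

With these numbers the LRU counters sum to 5948524 kB, which is 83604 kB
below Cached. The snapshots were not taken atomically and Cached is not
defined purely in terms of file LRU pages, so an exact match is not expected.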

>
> Anyway, this seems too aggressive to me, for now. Please do a), b), c)
> first.
>


>
> IIUC, this patch itself can cause a livelock when softlimit is
> misconfigured.
> What is the protection against wrong softlimit ?
>

Hmm, can you help clarify that?

>



> If we do this kind of LRU isolation, we'll need some limitation on the sum
> of the limits of all memcgs to avoid wrong configuration. That may change
> the UI dramatically.
> (As RT-class cpu limiting cgroup does.....)
>

This sounds related to the question above, so I will just wait for my
question to be answered :)


> Anyway, thank you for data.
>

sure

--Ying


> Thanks,
> -Kame
>
>
>


