From: Johannes Weiner <hannes@cmpxchg.org>
To: Ying Han <yinghan@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Tejun Heo <tj@kernel.org>, Pavel Emelyanov <xemul@openvz.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Li Zefan <lizf@cn.fujitsu.com>, Mel Gorman <mel@csn.ul.ie>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Zhu Yanhai <zhu.yanhai@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH V6 00/10] memcg: per cgroup background reclaim
Date: Wed, 27 Apr 2011 23:37:37 +0200
Message-ID: <20110427213737.GD12437@cmpxchg.org>
In-Reply-To: <BANLkTikuEm6NjMpoDC_Wy3r061+rdhApFA@mail.gmail.com>

On Wed, Apr 27, 2011 at 10:41:47AM -0700, Ying Han wrote:
> On Wed, Apr 27, 2011 at 12:36 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > On Fri, Apr 22, 2011 at 08:33:58PM -0700, Ying Han wrote:
> > > On Fri, Apr 22, 2011 at 7:34 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > > On Fri, Apr 22, 2011 at 07:10:25PM -0700, Ying Han wrote:
> > > > > However, I still think there is a need from the admin to have some
> > > > > controls of which memcg to do background reclaim proactively
> > > > > (before global memory pressure) and that was the initial logic
> > > > > behind the API.
> > > >
> > > > That sounds more interesting.  Do you have a specific use case
> > > > that requires this?
> > >
> > > There might be more interesting use cases there, and here is one I
> > > can think of:
> > >
> > > Let's say we have three jobs A, B and C, and one host with 32G of RAM. We
> > > configure each job's hard_limit as their peak memory usage.
> > > A: 16G
> > > B: 16G
> > > C: 10G
> > >
> > > 1. we start running A with hard_limit 15G, and start running B with
> > > hard_limit 15G.
> > > 2. we set A and B's soft_limit based on their "hot" memory. Let's say
> > > setting A's soft_limit 10G and B's soft_limit 10G.
> > > (The soft_limit will be changing based on their runtime memory usage)
> > >
> > > If no more jobs are running on the system, A and B will easily fill up
> > > the whole system with pagecache pages. Since we are not over-committing
> > > the machine with their hard_limit, there will be no pressure to push
> > > their memory usage down to soft_limit.
> > >
> > > Now we would like to launch another job C, since we know there is
> > > A(16G - 10G) + B(16G - 10G) = 12G of "cold" memory that can be reclaimed
> > > (w/o impacting A's and B's performance). So what will happen:
> > >
> > > 1. Start running C on the host, which triggers global memory pressure
> > > right away. If the reclaim is fast, C starts growing with the free pages
> > > from A and B.
> > >
> > > However, it might be possible that the reclaim cannot catch up with the
> > > job's page allocation. We end up with either an OOM condition or a
> > > performance spike on any of the running jobs.
> >
> > If background reclaim can not catch up, C will go into direct reclaim,
> > which will have exactly the same effect, only that C will have to do
> > the work itself.
> >
> > > One way to improve it is to set a wmark on either A/B to be proactively
> > > reclaiming pages before launching C. The global memory pressure won't
> > > help much here since we won't trigger that.
> >
> > Ok, so you want to use the watermarks to push back and limit the usage
> > of A and B to make room for C.  Isn't this exactly what the hard limit
> > is for?
> 
> Similar, but not exactly the same. There is no need to hard cap the memory
> usage for A and B in that case. What we need is some period of time during
> which A and B slowly reclaim pages and leave some room to launch C smoothly.

I think we are going in circles now.

Since starting with C the machine is overcommitted, the problem is no
longer memcg-internal latency but latency of global memory scarcity.
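
As a rough illustration of the arithmetic, using the numbers from the
scenario quoted above: the variable names are made up for this sketch, and
it assumes the 16G/16G/10G peak usages are the configured hard limits.

```python
# Numbers from the quoted scenario, in GiB. Names are illustrative only.
HOST_RAM = 32
hard_limit = {"A": 16, "B": 16, "C": 10}   # assumed equal to peak usage
soft_limit = {"A": 10, "B": 10}            # "hot" memory of A and B


def total(limits, jobs):
    """Sum the limits of the given jobs."""
    return sum(limits[j] for j in jobs)


# With only A and B, the hard limits exactly fit the host: no overcommit.
before_c = total(hard_limit, ["A", "B"])
# Adding C pushes the sum of the hard limits past physical RAM.
after_c = total(hard_limit, ["A", "B", "C"])
# Cold memory: what A and B hold above their soft limits.
cold = sum(hard_limit[j] - soft_limit[j] for j in soft_limit)

print(before_c, after_c, after_c - HOST_RAM, cold)  # prints: 32 42 10 12
```

So once C is running, the hard limits add up to 10G more than the host
has, and no single memcg's limit describes the shortage any more.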

My suggestion to that was, and still is, to fix global background
reclaim, which should apply pressure equally to all memcgs until the
_global_ watermarks are met again.

This would do the right thing for this case: C starts up, the global
watermark is breached sooner or later and background reclaim will push
back A and B, hopefully before anyone has to go into direct
reclaim. ('Hopefully' because the allocations may still happen faster
than background reclaim can keep up freeing pages.  But this applies
to your scenario as well.)
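
A toy model of that behaviour, only to make the idea concrete: this is not
the kernel's reclaim algorithm. The function name, the fixed per-memcg
batch, and the single "high watermark" number are all assumptions made for
the sketch.

```python
def background_reclaim(usage, free, high_wmark, batch=1):
    """Toy model: take `batch` units from every memcg in turn until
    `free` reaches `high_wmark` or there is nothing left to reclaim.

    usage      -- dict of memcg name -> reclaimable memory (hypothetical)
    free       -- currently free memory on the host
    high_wmark -- global target at which background reclaim stops
    """
    usage = dict(usage)  # do not modify the caller's dict
    while free < high_wmark and any(usage.values()):
        for name in usage:
            if free >= high_wmark:
                break
            taken = min(batch, usage[name])
            usage[name] -= taken
            free += taken
    return usage, free


# A and B fill the 32G host; reclaim until 10G (room for C) is free.
print(background_reclaim({"A": 16, "B": 16}, free=0, high_wmark=10))
# prints: ({'A': 11, 'B': 11}, 10)
```

In this model A and B are pushed back by the same amount, and nothing in
userspace had to pick which memcg to reclaim from.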

I think this should work out of the box, without tweaking obscure
knobs from userspace.

Anyway, at this point I can only repeat myself, so I will shut up now.

	Hannes


