Date: Thu, 21 Apr 2011 22:53:19 -0700
From: Ying Han <yinghan@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura, Balbir Singh, Tejun Heo, Pavel Emelyanov, Andrew Morton, Li Zefan, Mel Gorman, Christoph Lameter, Johannes Weiner, Rik van Riel, Hugh Dickins, Michal Hocko, Dave Hansen, Zhu Yanhai, linux-mm@kvack.org
Subject: Re: [PATCH V7 4/9] Add memcg kswapd thread pool

On Thu, Apr 21, 2011 at 10:00 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 21 Apr 2011 21:49:04 -0700
> Ying Han <yinghan@google.com> wrote:
>
> > On Thu, Apr 21, 2011 at 9:36 PM, KAMEZAWA Hiroyuki <
> > kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > On Thu, 21 Apr 2011 21:24:15 -0700
> > > Ying Han <yinghan@google.com> wrote:
> > >
> > > > This patch creates a thread pool for memcg-kswapd. Every memcg that
> > > > needs background reclaim is linked onto a list, and memcg-kswapd picks
> > > > a memcg from the list and runs reclaim on it.
> > > >
> > > > The concern with a per-memcg kswapd thread is the system overhead,
> > > > including memory and cputime.
> > > >
> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > Signed-off-by: Ying Han <yinghan@google.com>
> > >
> > > Thank you for merging. This seems ok to me.
> > >
> > > Further development may make this better or switch from thread pools
> > > (to something else), but I think this is good enough.
> >
> > Thank you for reviewing and acking. At the same time, I still have doubts
> > about the thread-pool model, which I raised in the cover letter :)
> >
> > The per-memcg-per-kswapd model
> > Pros:
> > 1. memory overhead per thread; the memory consumption would be
> > 8k * 1000 = 8M with 1k cgroups.
> > 2. we see lots of threads in 'ps -elf'.
> >
> > Cons:
> > 1. the implementation is simple and straightforward.
> > 2. we can easily isolate the background reclaim overhead between cgroups.
> > 3. better latency from memory pressure to actually starting reclaim.
> >
> > The thread-pool model
> > Pros:
> > 1. there is no isolation between memcg background reclaim, since the
> > memcg threads are shared.
> > 2. it is hard for visibility and debuggability. I have seen many cases
> > where some kswapd runs crazy and we need a straightforward way to
> > identify which cgroup is causing the reclaim.
> > 3. potential starvation for some memcgs: if one work item gets stuck,
> > the rest of the work won't proceed.
> >
> > Cons:
> > 1. saves some memory resources.
> >
> > In general, the per-memcg-per-kswapd implementation looks sane to me at
> > this point; especially, the shared memcg thread model will make debugging
> > issues very hard later.
> >
> > Comments?
>
> Pros <-> Cons ?
>
> My idea is adding trace points for memcg-kswapd and seeing what it's
> doing now. (We have too few trace points in memcg...)
>
> I don't think it's sane to create a kthread per memcg, because we know
> there are users who make hundreds or thousands of memcgs.
>
> And I think that creating more threads doing the same job than the number
> of cpus will cause much more difficult starvation and priority-inversion
> issues. Keeping the scheduling knobs/chances of jobs in memcg is important.
> I don't want to give hints to the scheduler because of memcg-internal
> issues.
>
> And even if memcg-kswapd doesn't exist, memcg works (well?).
> memcg-kswapd just helps make things better; it does not do any critical
> job, so it's okay to have this as a best-effort service.
> Of course, a better scheduling idea for picking up a memcg is welcome.
> It's now round-robin.

Hmm. The concern I have is debuggability. Let's say I am running a system
and find memcg-3 running crazy. Is there a way to find out which memcg it
is trying to reclaim pages from? Also, how do we account the cputime of the
shared kswapd threads to the memcgs, if we wanted to?

--Ying

> Thanks,
> -Kame
