From: Ying Han <yinghan@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Johannes Weiner <jweiner@redhat.com>,
"minchan.kim@gmail.com" <minchan.kim@gmail.com>,
Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 0/7] memcg background reclaim , yet another one.
Date: Mon, 25 Apr 2011 15:21:21 -0700
Message-ID: <BANLkTikYeV8JpMHd1Lvh7kRXXpLyQEOw4w@mail.gmail.com>
In-Reply-To: <20110425191437.d881ee68.kamezawa.hiroyu@jp.fujitsu.com>
Kame:
Thank you for taking the time to implement the patch. I think it is
definitely a good idea to have the two alternatives on the table, since
people have asked about them. Before going down this track, I thought
about the two approaches and also discussed them with Greg and Hugh
(cc'ed), and I would like to clarify some of the pros and cons of both
approaches. In general, I think the workqueue is not the right answer
for this purpose.
The thread-pool model
Cons:
1. There is no isolation between memcgs' background reclaim, since the
worker threads are shared. Isolation here covers all the resources
that per-memcg background reclaim needs to access, such as cpu
time. One thing the shared worker model lacks is the ability to
schedule cpu individually. We need to be able to isolate and account
the resource consumption per memcg, including how much cputime the
per-memcg kswapd thread uses and where it runs.
2. It is hard on visibility and debuggability. We have had plenty of
experience with kswapds running crazy, and we need a
straightforward way to identify which cgroup is causing the reclaim.
Yes, we can add more per-memcg stats to give some of that visibility,
but I can tell they come with more overhead. Why introduce the
overhead if the per-memcg kswapd thread can offer that naturally?
3. Potential priority inversion for some memcgs. Say we have two
memcgs A and B on a single-core machine, where A has a big chunk of
work and B has a small chunk of work, and B's work is queued up after
A's. In the workqueue model, we won't process B until we finish A's
work, since we only have one worker on the single-core host. In the
per-memcg kswapd model, however, B gets a chance to run when A calls
cond_resched(). Well, we might not have exactly this problem if we
don't constrain the number of workers, but then in the worst case
we'll have as many workers as memcgs. If so, it would be the same
model as per-memcg kswapd.
4. The kswapd threads are created and destroyed dynamically. Are we
talking about allocating 8k of stack for kswapd when we are under
memory pressure? In the other case, all the memory is preallocated.
5. The workqueue is scary and might introduce issues sooner or later.
Also, why do we think background reclaim fits the workqueue model
and, more specifically, how does it share the same logic as the other
parts of the system that use workqueues?
Pros:
1. It saves SOME memory.
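To make point 3 above concrete, here is a toy model of the single-core
case. It is only a sketch: the work sizes (100 and 2 units), the
one-unit time slice, and the function names are all made up for
illustration, and strict round-robin is a stand-in for A yielding at
cond_resched(), not a description of how the scheduler really behaves.

```python
from collections import deque


def fifo_single_worker(jobs):
    """One shared worker runs each queued job to completion, in order."""
    clock = 0
    finish = {}
    for name, units in jobs:
        clock += units
        finish[name] = clock
    return finish


def per_memcg_threads(jobs, slice_units=1):
    """One thread per memcg on one core. Each thread gives up the cpu
    after slice_units of work (hypothetical stand-in for cond_resched())."""
    clock = 0
    finish = {}
    runnable = deque(jobs)
    while runnable:
        name, remaining = runnable.popleft()
        step = min(slice_units, remaining)
        clock += step
        remaining -= step
        if remaining:
            runnable.append((name, remaining))  # not done, go to the back
        else:
            finish[name] = clock
    return finish


# A has a big chunk of work, B a small one, and B is queued after A.
jobs = [("A", 100), ("B", 2)]
shared = fifo_single_worker(jobs)
dedicated = per_memcg_threads(jobs)
print("shared worker finish times:    ", shared)
print("per-memcg thread finish times: ", dedicated)
```

In this model B has to wait for all of A's work behind the shared
worker, while with its own thread it finishes after a few time units
and A's completion time stays the same.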
The per-memcg-per-kswapd model
Cons:
1. Memory overhead per thread. The memory consumption would be
8k*1000 = 8M with 1k cgroups. This is NOT a problem; at least we
haven't seen it in our production. We have cases where 2k kernel
threads are created, and we haven't noticed that causing resource
consumption problems or performance issues. On those systems, we
might have ~100 cgroups running at a time.
2. We see lots of threads in 'ps -elf'. Well, is that really a problem
big enough that we need to change the threading model?
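For reference, the stack arithmetic from point 1 can be spelled out as
below. This is a rough sketch that counts only the 8k stack per thread
quoted above; task_struct and other per-thread state are ignored, and
the helper name is hypothetical.

```python
STACK_BYTES = 8 * 1024  # 8k of stack per kernel thread, the figure used above


def kswapd_stack_overhead(num_threads, stack_bytes=STACK_BYTES):
    """Total stack memory, in bytes, for num_threads kernel threads."""
    return num_threads * stack_bytes


# ~100 cgroups (typical above), 1k cgroups, and the 2k-thread case.
for n in (100, 1000, 2000):
    total = kswapd_stack_overhead(n)
    print(f"{n:5d} threads -> {total / 2**20:.2f} MiB of stack")
```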
Overall, the per-memcg-per-kswapd thread model is simple enough and
provides better isolation (predictability & debuggability). The number
of threads we might potentially have on the system is not a real
problem. We already have systems running that many threads (even
more) and we haven't seen problems from that. Also, I can imagine it
will make our life easier for some other extensions of the memcg work.
For now, I would like to stick with the simple model. At the same time
I am willing to look into changes and fixes once we have seen
problems later.
Comments?
Thanks
--Ying
On Mon, Apr 25, 2011 at 3:14 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Mon, 25 Apr 2011 18:25:29 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>
>> 2) == hard limit 500M/ hi_watermark = 400M ==
>> [root@rhel6-test hilow]# time cp ./tmpfile xxx
>>
>> real 0m6.421s
>> user 0m0.059s
>> sys 0m2.707s
>>
>
> When doing this, we see usage changes as
> (sec) (bytes)
> 0: 401408 <== cp start
> 1: 98603008
> 2: 262705152
> 3: 433491968 <== wmark reclaim triggerd.
> 4: 486502400
> 5: 507748352
> 6: 524189696 <== cp ends (and hit limits)
> 7: 501231616
> 8: 499511296
> 9: 477118464
> 10: 417980416 <== usage goes below watermark.
> 11: 417980416
> .....
>
> If we have dirty_ratio, this result will be some different.
> (and flusher thread will work sooner...)
>
>
> Thanks,
> -Kame
>
>