Subject: Re: [PATCH 0/2] memcg: improving scalability by reducing lock
 contention at charge/uncharge
From: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>
Date: Sun, 11 Oct 2009 11:33:50 +0900 (JST)
To: Ying Han
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    akpm@linux-foundation.org, balbir@linux.vnet.ibm.com,
    nishimura@mxp.nes.nec.co.jp
In-Reply-To: <604427e00910091737s52e11ce9p256c95d533dc2837@mail.gmail.com>
References: <20091002135531.3b5abf5c.kamezawa.hiroyu@jp.fujitsu.com>
            <604427e00910091737s52e11ce9p256c95d533dc2837@mail.gmail.com>

Ying Han wrote:
> Hi KAMEZAWA-san:
>
> I tested your patch set based on 2.6.32-rc3, but I don't see much
> improvement in the page-fault rate. Here are the numbers I got:
>
> [Before]
>  Performance counter stats for './runpause.sh 10' (5 runs):
>
>     226272.271246  task-clock-msecs  #     3.768 CPUs    ( +-   0.193% )
>              4424  context-switches  #     0.000 M/sec   ( +-  14.418% )
>                25  CPU-migrations    #     0.000 M/sec   ( +-  23.077% )
>          80499059  page-faults       #     0.356 M/sec   ( +-   2.586% )
>      499246232482  cycles            #  2206.396 M/sec   ( +-   0.055% )
>      193036122022  instructions      #     0.387 IPC     ( +-   0.281% )
>       76548856038  cache-references  #   338.304 M/sec   ( +-   0.832% )
>         480196860  cache-misses      #     2.122 M/sec   ( +-   2.741% )
>
>      60.051646892  seconds time elapsed   ( +-   0.010% )
>
> [After]
>  Performance counter stats for './runpause.sh 10' (5 runs):
>
>     226491.338475  task-clock-msecs  #     3.772 CPUs    ( +-   0.176% )
>              3377  context-switches  #     0.000 M/sec   ( +-  14.713% )
>                12  CPU-migrations    #     0.000 M/sec   ( +-  23.077% )
>          81867014  page-faults       #     0.361 M/sec   ( +-   3.201% )
>      499835798750  cycles            #  2206.865 M/sec   ( +-   0.036% )
>      196685031865  instructions      #     0.393 IPC     ( +-   0.286% )
>       81143829910  cache-references  #   358.265 M/sec   ( +-   0.428% )
>         119362559  cache-misses      #     0.527 M/sec   ( +-   5.291% )
>
>      60.048917062  seconds time elapsed   ( +-   0.010% )
>
> I ran it on a 4-core machine with 16G of RAM, and I modified
> runpause.sh to fork 4 page-fault processes instead of 8. I mounted
> cgroup with only the memory subsystem and ran the test in the root
> cgroup.
>
> I believe we might have different running environments, including the
> cgroup configuration. Any suggestions?
>
This patch series is only for "child" (non-root) cgroups. Sorry, I should
have written that more clearly. It has no effect on the root cgroup.

Regards,
-Kame

> --Ying
>
> On Thu, Oct 1, 2009 at 9:55 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>> Hi,
>>
>> This patch series is against mmotm + the softlimit fix patches
>> (which are now in the -rc git tree).
>>
>> In the latest -rc series, the kernel avoids accessing res_counter when
>> the cgroup is the root cgroup. This helps scalability when memcg is not
>> used.
>>
>> It's necessary to improve scalability even when memcg is used, and this
>> patch series is for that. Balbir's previous work shows that the biggest
>> obstacle to better scalability is memcg's res_counter. Then there are
>> two ways:
>>
>>  (1) make the counter itself scale well.
>>  (2) avoid accessing the core counter as much as possible.
>>
>> My first direction was (1). But no, there is no counter that is free
>> from false sharing when it needs system-wide fine-grained
>> synchronization. And res_counter carries several other functions...this
>> makes (1) difficult. A spin_lock (in the slow path) around the counter
>> means tons of cache-line invalidations even when we only read the
>> counter without modifying it.
>>
>> This patch series is for (2). It implements charge/uncharge in a
>> batched manner, coalescing accesses to res_counter at charge/uncharge
>> by exploiting the locality of access.
>>
>> Tested for a month, and I got good reports from Balbir and Nishimura,
>> thanks. One concern is that this adds some members to the bottom of
>> task_struct. A better idea is welcome.
>>
>> The following is the test result of continuous page faults on my 8-cpu
>> box (x86-64).
>>
>> A loop like this runs on all cpus in parallel for 60 secs:
>> ==
>>         while (1) {
>>                 x = mmap(NULL, MEGA, PROT_READ|PROT_WRITE,
>>                          MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>>
>>                 for (off = 0; off < MEGA; off += PAGE_SIZE)
>>                         x[off] = 0;
>>                 munmap(x, MEGA);
>>         }
>> ==
>> Please see the # of page faults. I think this is a good improvement.
>>
>> [Before]
>>  Performance counter stats for './runpause.sh' (5 runs):
>>
>>     474539.756944  task-clock-msecs  #     7.890 CPUs    ( +-   0.015% )
>>             10284  context-switches  #     0.000 M/sec   ( +-   0.156% )
>>                12  CPU-migrations    #     0.000 M/sec   ( +-   0.000% )
>>          18425800  page-faults       #     0.039 M/sec   ( +-   0.107% )
>>     1486296285360  cycles            #  3132.080 M/sec   ( +-   0.029% )
>>      380334406216  instructions      #     0.256 IPC     ( +-   0.058% )
>>        3274206662  cache-references  #     6.900 M/sec   ( +-   0.453% )
>>        1272947699  cache-misses      #     2.682 M/sec   ( +-   0.118% )
>>
>>      60.147907341  seconds time elapsed   ( +-   0.010% )
>>
>> [After]
>>  Performance counter stats for './runpause.sh' (5 runs):
>>
>>     474658.997489  task-clock-msecs  #     7.891 CPUs    ( +-   0.006% )
>>             10250  context-switches  #     0.000 M/sec   ( +-   0.020% )
>>                11  CPU-migrations    #     0.000 M/sec   ( +-   0.000% )
>>          33177858  page-faults       #     0.070 M/sec   ( +-   0.152% )
>>     1485264748476  cycles            #  3129.120 M/sec   ( +-   0.021% )
>>      409847004519  instructions      #     0.276 IPC     ( +-   0.123% )
>>        3237478723  cache-references  #     6.821 M/sec   ( +-   0.574% )
>>        1182572827  cache-misses      #     2.491 M/sec   ( +-   0.179% )
>>
>>      60.151786309  seconds time elapsed   ( +-   0.014% )
>>
>> Regards,
>> -Kame
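
The batching described in the quoted posting can be illustrated with a
small user-space sketch. This is not the actual memcg patch; all names
here (res_counter, task_cache, charge_one_page, CHARGE_BATCH) are made up
for illustration. It only shows the general idea of coalescing: each task
keeps a local "precharge", so the shared, lock-protected counter is
touched once per batch instead of once per page fault.
==
/* Illustrative sketch only -- not the kernel implementation. */
#include <pthread.h>
#include <stddef.h>

#define PAGE_SIZE    4096UL
#define CHARGE_BATCH (32UL * PAGE_SIZE)   /* charge 32 pages at a time */

struct res_counter {
        pthread_mutex_t lock;             /* the contended shared lock */
        unsigned long   usage;
};

struct task_cache {
        struct res_counter *target;       /* counter being cached for  */
        unsigned long       precharge;    /* bytes charged but unused  */
};

/* Slow path: touch the shared counter under its lock. */
static void res_counter_charge(struct res_counter *rc, unsigned long bytes)
{
        pthread_mutex_lock(&rc->lock);
        rc->usage += bytes;
        pthread_mutex_unlock(&rc->lock);
}

/*
 * Fast path: consume from the task-local precharge when possible and
 * refill it in CHARGE_BATCH chunks, so consecutive faults against the
 * same group hit the shared lock only once every 32 pages.
 */
static void charge_one_page(struct task_cache *tc, struct res_counter *rc)
{
        if (tc->target != rc || tc->precharge < PAGE_SIZE) {
                /* different group or cache empty: refill from counter */
                res_counter_charge(rc, CHARGE_BATCH);
                tc->target = rc;
                tc->precharge = CHARGE_BATCH;
        }
        tc->precharge -= PAGE_SIZE;
}

int main(void)
{
        struct res_counter memcg = { PTHREAD_MUTEX_INITIALIZER, 0 };
        struct task_cache  tc    = { NULL, 0 };
        unsigned long i;

        /* 1024 page "faults": the shared lock is taken only 32 times */
        for (i = 0; i < 1024; i++)
                charge_one_page(&tc, &memcg);

        return memcg.usage == 1024 * PAGE_SIZE ? 0 : 1;
}
==
A real implementation also has to drain the cached precharge when the
task migrates to another cgroup or exits, handle charge failures at the
limit, and batch uncharge the same way; the sketch ignores all of that.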
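
Since the series only affects non-root cgroups, re-running Ying Han's
test means moving the benchmark into a child memcg first. Below is a
minimal sketch of that setup, assuming the cgroup v1 interface of this
era with the memory controller already mounted; the mount point /cgroup
and the group name "test" are illustrative, not taken from the thread.
==
/* Illustrative only: attach ourselves to a child memcg, then run the
 * benchmark driver, so charges go through res_counter rather than the
 * root-cgroup bypass. */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        FILE *f;

        /* create the child group; EEXIST means it is already there */
        if (mkdir("/cgroup/test", 0755) && errno != EEXIST) {
                perror("mkdir /cgroup/test");
                return 1;
        }

        /* move the calling process (and its future children) into it */
        f = fopen("/cgroup/test/tasks", "w");
        if (!f) {
                perror("open /cgroup/test/tasks");
                return 1;
        }
        fprintf(f, "%d\n", getpid());
        fclose(f);

        /* now run the actual page-fault benchmark inside the group */
        execl("/bin/sh", "sh", "./runpause.sh", "10", (char *)NULL);
        perror("execl");
        return 1;
}
==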