From: Andrea Righi <arighi@develer.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Vivek Goyal <vgoyal@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
Suleiman Souhlal <suleiman@google.com>,
Greg Thelen <gthelen@google.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Andrew Morton <akpm@linux-foundation.org>,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v6)
Date: Wed, 10 Mar 2010 00:00:31 +0100
Message-ID: <1268175636-4673-1-git-send-email-arighi@develer.com>
Control the maximum amount of dirty pages a cgroup can have at any given time.

A per-cgroup dirty limit caps the amount of dirty (hard to reclaim) page cache
that any cgroup can use. With multiple cgroup writers, no cgroup can consume
more than its designated share of dirty pages, and each is forced to perform
write-out once it crosses that limit.
The overall design is as follows:
 - account dirty pages per cgroup
 - limit the number of dirty pages via memory.dirty_ratio / memory.dirty_bytes
   and memory.dirty_background_ratio / memory.dirty_background_bytes in
   cgroupfs
 - start write-out (in the background or actively) when the cgroup limits are
   exceeded
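
The three design points above can be sketched as a small user-space model. This is only an illustration, not the kernel code from the patch set: the class and function names are hypothetical, the default values are arbitrary, the ratios are assumed to be percentages of the memory available to the cgroup, and a non-zero *_bytes value is assumed to take precedence over the matching *_ratio, by analogy with the global vm.dirty_* sysctls.

```python
from dataclasses import dataclass


@dataclass
class CgroupDirtyState:
    """Hypothetical model of one cgroup's dirty-page knobs and counters."""
    memory_bytes: int                 # memory available to the cgroup (assumed base for ratios)
    dirty_bytes_now: int = 0          # dirty page cache currently accounted to the cgroup
    dirty_ratio: int = 20             # percent; illustrative default only
    dirty_bytes: int = 0              # 0 means "use dirty_ratio"
    dirty_background_ratio: int = 10  # percent; illustrative default only
    dirty_background_bytes: int = 0   # 0 means "use dirty_background_ratio"


def threshold(limit_bytes: int, ratio: int, memory_bytes: int) -> int:
    """Return a limit in bytes; a non-zero byte value wins over the ratio."""
    if limit_bytes:
        return limit_bytes
    return memory_bytes * ratio // 100


def writeback_action(cg: CgroupDirtyState) -> str:
    """Decide which kind of write-out the accounted dirty memory calls for."""
    hard = threshold(cg.dirty_bytes, cg.dirty_ratio, cg.memory_bytes)
    background = threshold(
        cg.dirty_background_bytes, cg.dirty_background_ratio, cg.memory_bytes
    )
    if cg.dirty_bytes_now > hard:
        return "active"      # the writer itself is forced to write out
    if cg.dirty_bytes_now > background:
        return "background"  # background write-out is started
    return "none"
```

In this model a cgroup whose dirty memory sits between the two thresholds only triggers background write-out, while one above the hard threshold makes the writer do the work itself.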
This feature is meant to work closely with any underlying IO controller
implementation, so that we can stop dirty pages from growing in the VM layer
and enforce write-out before any cgroup consumes the global amount of dirty
pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes and
/proc/sys/vm/dirty_background_ratio|dirty_background_bytes limits.
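
For reference, the global limits mentioned above can be read from user space. The following is a minimal sketch, assuming a Linux system that exposes the four files under /proc/sys/vm; the helper name and the vm_dir parameter are hypothetical and exist only to keep the example testable. Only one knob of each ratio/bytes pair is in effect at a time, and the other reads back as 0.

```python
from pathlib import Path

# The four global knobs named in the text above.
GLOBAL_KNOBS = (
    "dirty_ratio",
    "dirty_bytes",
    "dirty_background_ratio",
    "dirty_background_bytes",
)


def read_global_dirty_limits(vm_dir: str = "/proc/sys/vm") -> dict:
    """Return the current value of each global dirty-limit knob as an int."""
    limits = {}
    for name in GLOBAL_KNOBS:
        limits[name] = int((Path(vm_dir) / name).read_text().strip())
    return limits
```

Calling read_global_dirty_limits() with no argument reads the live system values; passing another directory makes it possible to try the helper against sample files.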
Changelog (v5 -> v6)
~~~~~~~~~~~~~~~~~~~~
 * always disable/enable IRQs in lock/unlock_page_cgroup(): this lets us drop
   the previous complicated locking scheme in favor of a simpler one, even
   though this obviously adds some overhead (see the results below)
 * drop FUSE and NILFS2 dirty page accounting for now (this depends on
   charging bounce pages per cgroup)
Results
~~~~~~~
As a test case, I ran a kernel build (2.6.33, x86_64_defconfig) on an
Intel Core 2 @ 1.2GHz under different kernels:
 - mmotm "vanilla"
 - mmotm with cgroup-dirty-memory using the previous "complex" locking scheme
   (my previous patchset + the fixes reported by Kame-san and Daisuke-san)
 - mmotm with cgroup-dirty-memory using the simple locking scheme
   (lock_page_cgroup() with IRQs disabled)

The results follow:
<before>
- mmotm "vanilla", root cgroup: 11m51.983s
- mmotm "vanilla", child cgroup: 11m56.596s
<after>
- mmotm, "complex" locking scheme, root cgroup: 11m53.037s
- mmotm, "complex" locking scheme, child cgroup: 11m57.896s
- mmotm, lock_page_cgroup+irq_disabled, root cgroup: 12m5.499s
- mmotm, lock_page_cgroup+irq_disabled, child cgroup: 12m9.920s
With the "complex" locking solution, the overhead introduced by the
cgroup dirty memory accounting is minimal (0.14%), compared with the overhead
introduced by the lock_page_cgroup+irq_disabled solution (1.90%).

The overhead is not huge with either solution, but the impact on performance
is smaller still with the more complicated one.
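
The overhead percentages can be rechecked from the timings listed above. The sketch below is not part of the patch set and its helper names are hypothetical. It assumes the quoted figures were derived from the root-cgroup runs, which is an inference from the arithmetic rather than something stated explicitly.

```python
def to_seconds(stamp: str) -> float:
    """Convert a time(1)-style stamp such as '11m51.983s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def overhead_percent(baseline: str, measured: str) -> float:
    """Relative slowdown of `measured` over `baseline`, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(measured) - base) / base * 100.0


# Root-cgroup timings copied from the results above.
VANILLA_ROOT = "11m51.983s"
COMPLEX_ROOT = "11m53.037s"
IRQ_DISABLED_ROOT = "12m5.499s"

if __name__ == "__main__":
    complex_pct = overhead_percent(VANILLA_ROOT, COMPLEX_ROOT)
    irq_pct = overhead_percent(VANILLA_ROOT, IRQ_DISABLED_ROOT)
    print(f"complex locking:      {complex_pct:.3f}%")
    print(f"irq-disabled locking: {irq_pct:.3f}%")
```

The child-cgroup timings can be plugged in the same way; they come out at roughly 0.18% and 1.86%, so the picture is the same in both cgroups.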
Maybe we can go ahead with the simplest implementation for now and start
thinking about an alternative implementation of the page_cgroup locking and
charge/uncharge of pages.

If someone is interested or wants to repeat the tests (maybe on a bigger
machine), I can also post the other version of the patchset. Just let me know.
-Andrea
 Documentation/cgroups/memory.txt |   36 +++
 fs/nfs/write.c                   |    4 +
 include/linux/memcontrol.h       |   87 +++++++-
 include/linux/page_cgroup.h      |   42 ++++-
 include/linux/writeback.h        |    2 -
 mm/filemap.c                     |    1 +
 mm/memcontrol.c                  |  475 +++++++++++++++++++++++++++++++++-----
 mm/page-writeback.c              |  215 +++++++++++-------
 mm/rmap.c                        |    4 +-
 mm/truncate.c                    |    1 +
 10 files changed, 722 insertions(+), 145 deletions(-)