From: Greg Thelen <gthelen@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [PATCH v4 00/11] memcg: per cgroup dirty page accounting
Date: Sat, 30 Oct 2010 14:46:09 -0700
Message-ID: <xr93sjzne4m6.fsf@ninji.mtv.corp.google.com>
In-Reply-To: <20101029131946.5905d244.akpm@linux-foundation.org> (Andrew Morton's message of "Fri, 29 Oct 2010 13:19:46 -0700")

Andrew Morton <akpm@linux-foundation.org> writes:

> On Fri, 29 Oct 2010 00:09:03 -0700
> Greg Thelen <gthelen@google.com> wrote:
>
> This is cool stuff - it's been a long haul.  One day we'll be
> nearly-finished and someone will write a book telling people how to use
> it all and lots of people will go "holy crap".  I hope.
>
>> Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
>> page cache used by a cgroup.  So, in case of multiple cgroup writers, they will
>> not be able to consume more than their designated share of dirty pages and will
>> be forced to perform write-out if they cross that limit.
>> 
>> The patches are based on a series proposed by Andrea Righi in Mar 2010.
>> 
>> Overview:
>> - Add page_cgroup flags to record when pages are dirty, in writeback, or nfs
>>   unstable.
>> 
>> - Extend mem_cgroup to record the total number of pages in each of the 
>>   interesting dirty states (dirty, writeback, unstable_nfs).  
>> 
>> - Add dirty parameters similar to the system-wide /proc/sys/vm/dirty_*
>>   limits to mem_cgroup.  The mem_cgroup dirty parameters are accessible
>>   via cgroupfs control files.
>
> Curious minds will want to know what the default values are set to and
> how they were determined.

When a memcg is created, its dirty limits are set to a copy of the
parent's limits.  If the new cgroup is a top level cgroup, then it
inherits from the system parameters (/proc/sys/vm/dirty_*).
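
To make that concrete, here is a rough sketch of the inheritance (the
struct, field, and helper names are illustrative, not necessarily the
exact code in this series; the globals are the /proc/sys/vm/dirty_*
knobs):

struct vm_dirty_param {
	int dirty_ratio;
	int dirty_background_ratio;
	unsigned long dirty_bytes;
	unsigned long dirty_background_bytes;
};

static void memcg_init_dirty_param(struct vm_dirty_param *param,
				   struct vm_dirty_param *parent)
{
	if (parent) {
		/* child memcg: start with a copy of the parent's limits */
		*param = *parent;
	} else {
		/* top-level memcg: mirror the system-wide settings */
		param->dirty_ratio = vm_dirty_ratio;
		param->dirty_bytes = vm_dirty_bytes;
		param->dirty_background_ratio = dirty_background_ratio;
		param->dirty_background_bytes = dirty_background_bytes;
	}
}

The parameters remain tunable per cgroup afterwards via the cgroupfs
control files mentioned above.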

>> - Consider both system and per-memcg dirty limits in page writeback when
>>   deciding to queue background writeback or block for foreground writeback.
>> 
>> Known shortcomings:
>> - When a cgroup dirty limit is exceeded, then bdi writeback is employed to
>>   writeback dirty inodes.  Bdi writeback considers inodes from any cgroup, not
>>   just inodes contributing dirty pages to the cgroup exceeding its limit.  
>
> yup.  Some broader discussion of the implications of this shortcoming
> is needed.  I'm not sure where it would be placed, though. 
> Documentation/ for now, until you write that book.

Fair enough.  I can add more text to Documentation/ describing the
behavior and issue in more detail.

>> - When memory.use_hierarchy is set, then dirty limits are disabled.  This is an
>>   implementation detail.
>
> So this is unintentional, and forced upon us by the present implementation?

Yes, this is not ideal.  I chose not to address this particular issue in
this series to keep the series smaller.

>>  An enhanced implementation is needed to check the
>>   chain of parents to ensure that no dirty limit is exceeded.
>
> How important is it that this be fixed?

I am not sure how much interest there is in hierarchical per-memcg
dirty limits, so I don't think this needs to be fixed immediately.  But
the fact that it doesn't work is unexpected; it would be nice if it
just worked.  I'll look into making it work.

> And how feasible would that fix be?  A linear walk up the hierarchy
> list?  More than that?

I think it should be a simple matter of enhancing
mem_cgroup_dirty_info() to walk up the hierarchy looking for the cgroup
closest to its dirty limit.  The only tricky part is that there are
really two limits (foreground/throttling limit, and a background limit)
that need to be considered when finding the memcg that most deserves
inspection by balance_dirty_pages().
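
If it helps, here is a rough sketch of that walk (the names, fields,
and the simple headroom heuristic are illustrative only, not the code
in this series; a real version would also have to weigh the background
limit at each level, which is the tricky part above):

struct dirty_info {
	unsigned long nr_dirty;			/* dirty + writeback + unstable_nfs */
	unsigned long dirty_thresh;		/* foreground/throttle limit */
	unsigned long background_thresh;	/* background writeback limit */
};

static struct mem_cgroup *memcg_nearest_dirty_limit(struct mem_cgroup *memcg)
{
	struct mem_cgroup *iter, *nearest = NULL;
	long min_headroom = LONG_MAX;

	for (iter = memcg; iter; iter = parent_mem_cgroup(iter)) {
		struct dirty_info info;
		long headroom;

		if (!mem_cgroup_dirty_info(iter, &info))
			continue;	/* no dirty limit at this level */

		/* distance to this ancestor's foreground limit */
		headroom = (long)info.dirty_thresh - (long)info.nr_dirty;
		if (headroom < min_headroom) {
			min_headroom = headroom;
			nearest = iter;
		}
	}
	return nearest;		/* NULL if no ancestor sets a limit */
}

balance_dirty_pages() would then inspect whichever memcg this returns.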

>> Performance data:
>> - A page fault microbenchmark workload was used to measure performance, which
>>   can be called in read or write mode:
>>         f = open(foo.$cpu)
>>         truncate(f, 4096)
>>         alarm(60)
>>         while (1) {
>>                 p = mmap(f, 4096)
>>                 if (write)
>>                         *p = 1
>>                 else
>>                         x = *p
>>                 munmap(p)
>>         }
>> 
>> - The workload was called for several points in the patch series in different
>>   modes:
>>   - s_read is a single threaded reader
>>   - s_write is a single threaded writer
>>   - p_read is a 16 thread reader, each operating on a different file
>>   - p_write is a 16 thread writer, each operating on a different file
>> 
>> - Measurements were collected on a 16 core non-numa system using "perf stat
>>   --repeat 3".  The -a option was used for parallel (p_*) runs.
>> 
>> - All numbers are page fault rate (M/sec).  Higher is better.
>> 
>> - To compare the performance of a kernel without memcg, compare the first and
>>   last rows; neither has memcg configured.  The first row does not include
>>   any of these memcg patches, while the last row does.
>> 
>> - To compare the performance of using memcg dirty limits, compare the
>>   baseline (2nd row, titled "w/ memcg") with the code and memcg enabled
>>   (2nd to last row, titled "all patches").
>> 
>>                            root_cgroup                    child_cgroup
>>                  s_read s_write p_read p_write   s_read s_write p_read p_write
>> mmotm w/o memcg   0.428  0.390   0.429  0.388
>> mmotm w/ memcg    0.411  0.378   0.391  0.362     0.412  0.377   0.385  0.363
>> all patches       0.384  0.360   0.370  0.348     0.381  0.363   0.368  0.347
>> all patches       0.431  0.402   0.427  0.395
>>   w/o memcg
>
> afaict this benchmark has demonstrated that the changes do not cause an
> appreciable performance regression in terms of CPU loading, yes?

Using the mmap() workload, which is a fault heavy workload...

When memcg is not configured, there is no significant performance
change: depending on the workload, performance is 0% to 3% faster,
which is likely just noise.

When memcg is configured, performance drops by 4% to 8%.  Some of this
might be noise, but memcg faults are expected to get slower because
there is more code in the fault path.
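
For reference, a standalone C approximation of the benchmark loop
quoted above (error handling trimmed; the per-process file naming and
the read/write switch are my simplifications):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int do_write = argc > 1 && argv[1][0] == 'w';
	char name[64];
	volatile char x;

	snprintf(name, sizeof(name), "foo.%d", getpid());
	int fd = open(name, O_CREAT | O_RDWR, 0644);
	ftruncate(fd, 4096);
	alarm(60);			/* SIGALRM ends the 60 second run */

	for (;;) {
		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (do_write)
			*p = 1;		/* write fault: dirties the page */
		else
			x = *p;		/* read fault only */
		munmap(p, 4096);
	}
	return 0;
}

perf stat --repeat 3 reports the page fault rate externally, so the
loop itself does not need to count anything.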

> Can we come up with any tests which demonstrate the _benefits_ of the
> feature?

Here is a test script that shows a situation where memcg dirty limits
are beneficial.  The script runs two programs: a dirty page background
antagonist (dd) and an interactive foreground process (tar).  If the
script's argument is false, then both processes run together in the
root cgroup, sharing system-wide dirty memory in the classic fashion.
If the script is given a true argument, then a cgroup is used to
contain dd's dirty page consumption.

---[start]---
#!/bin/bash
# dirty.sh - dirty limit performance test script
echo use_cgroup: $1

# start antagonist
if $1; then    # if using cgroup to contain 'dd'...
  mkdir /dev/cgroup/A
  echo 400M > /dev/cgroup/A/memory.dirty_limit_in_bytes
  (echo $BASHPID > /dev/cgroup/A/tasks; \
   dd if=/dev/zero of=big.file count=10k bs=1M) &
else
  dd if=/dev/zero of=big.file count=10k bs=1M &
fi

sleep 10

time tar -xzf linux-2.6.36.tar.gz
wait
$1 && rmdir /dev/cgroup/A
---[end]---

dirty.sh false : dd 59.7MB/s stddev 7.442%, tar 12.2s stddev 25.720%
  # both in root_cgroup
dirty.sh true  : dd 55.4MB/s stddev 0.958%, tar  3.8s stddev  0.250%
  # tar in root_cgroup, dd in cgroup

Confining dd to a cgroup with its own dirty limit reserved dirty memory
for the rest of the system's processes (tar in this case).  The tar
process had faster and more predictable performance.  memcg dirty
ratios might be useful for serving different task classes (interactive
vs batch).  A past discussion touched on this:
http://lkml.org/lkml/2010/5/20/136
