linux-mm.kvack.org archive mirror
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Greg Thelen <gthelen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Balbir Singh <bsingharora@gmail.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Dave Chinner <david@fromorbit.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Andrea Righi <andrea@betterlinux.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v9 00/13] memcg: per cgroup dirty page limiting
Date: Thu, 18 Aug 2011 09:35:57 +0900
Message-ID: <20110818093557.7994e309.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <1313597705-6093-1-git-send-email-gthelen@google.com>

On Wed, 17 Aug 2011 09:14:52 -0700
Greg Thelen <gthelen@google.com> wrote:

> This patch series provides the ability for each cgroup to have independent dirty
> page usage limits.  Limiting dirty memory fixes the max amount of dirty (hard to
> reclaim) page cache used by a cgroup.  This allows for better per cgroup memory
> isolation and fewer memcg OOMs.
> 

Thank you for your patient work! I really want this feature.
(Hopefully before we tune vmscan)

I hope this patch series will not have heavy hunks.

Thanks,
-Kame


> Three features are included in this patch series:
>   1. memcg dirty page accounting
>   2. memcg writeback
>   3. memcg dirty page limiting
> 
> 
> 1. memcg dirty page accounting
> 
> Each memcg maintains a dirty page count and dirty page limit.  Previous
> iterations of this patch series have refined this logic.  The interface is
> similar to the procfs interface: /proc/sys/vm/dirty_*.  It is possible to
> configure a limit to trigger throttling of a dirtier or queue background
> writeback.  The root cgroup memory.dirty_* control files are read-only and match
> the contents of the /proc/sys/vm/dirty_* files.
> 
> 
> 2. memcg writeback
> 
> Having per cgroup dirty memory limits is not very interesting unless writeback
> is also cgroup aware.  There is not much isolation if cgroups have to writeback data
> from outside the affected cgroup to get below the cgroup dirty memory threshold.
> 
> Per-memcg dirty limits are provided to support isolation and thus cross cgroup
> inode sharing is not a priority.  This allows the code to be simpler.
> 
> To add cgroup awareness to writeback, this series adds an i_memcg field to
> struct address_space to allow writeback to isolate inodes for a particular
> cgroup.  When an inode is marked dirty, i_memcg is set to the current cgroup.
> When inode pages are marked dirty the i_memcg field is compared against the
> page's cgroup.  If they differ, then the inode is marked as shared by setting
> i_memcg to a special shared value (zero).
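
The tagging rule described above can be sketched roughly as follows. This is
only an illustration, not the code from the series: struct sketch_mapping,
I_MEMCG_SHARED and both function names are hypothetical, and only the i_memcg
field and the "zero means shared" convention come from the text. It also
assumes that real cgroup ids are never zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the "special shared value (zero)" from the text. */
#define I_MEMCG_SHARED 0u

/* Hypothetical stand-in for struct address_space; only i_memcg is real. */
struct sketch_mapping {
	uint16_t i_memcg;	/* id of the owning cgroup, or I_MEMCG_SHARED */
};

/* The inode is marked dirty: tag it with the dirtying task's cgroup. */
static void sketch_mark_inode_dirty(struct sketch_mapping *m,
				    uint16_t current_memcg_id)
{
	/* Assumption: zero is reserved and never used as a cgroup id. */
	assert(current_memcg_id != I_MEMCG_SHARED);
	m->i_memcg = current_memcg_id;
}

/* A page of the inode is marked dirty: compare against the page's cgroup. */
static void sketch_mark_page_dirty(struct sketch_mapping *m,
				   uint16_t page_memcg_id)
{
	/* A mismatch means more than one cgroup dirties this inode. */
	if (m->i_memcg != page_memcg_id)
		m->i_memcg = I_MEMCG_SHARED;
}
```

In this sketch an inode that has become shared stays shared while further
pages are dirtied, because zero never matches a real cgroup id.
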
> 
> When performing per-memcg writeback, move_expired_inodes() scans the per bdi
> b_dirty list using each inode's i_memcg and the global over-limit memcg bitmap
> to determine if the inode should be written.  This inode scan may involve
> skipping many unrelated inodes from other cgroups.  To test the scanning
> overhead, I created two cgroups (cgroup_A with 100,000 dirty inodes under A's
> dirty limit, cgroup_B with 1 inode over B's dirty limit).  The writeback code
> then had to skip 100,000 inodes when balancing cgroup_B to find the one inode
> that needed writing.  This scanning took 58 msec to skip 100,000 foreign inodes.
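
For scale, 58 msec over 100,000 skipped inodes is about 0.58 usec per inode.
The filter described above might look roughly like the sketch below. It is
not the move_expired_inodes() change from the series: the bitmap size, the
helper names and the flat array standing in for the b_dirty list are all
assumptions made for illustration, and the include_shared switch only mirrors
the fallback to shared inodes (i_memcg=0) mentioned in the cover letter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MEMCG 1024	/* assumed upper bound on cgroup ids */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Global over-limit bitmap, indexed by cgroup id. */
static unsigned long sketch_over_limit[SKETCH_MAX_MEMCG / SKETCH_WORD_BITS + 1];

static void sketch_set_over_limit(unsigned int id)
{
	assert(id < SKETCH_MAX_MEMCG);
	sketch_over_limit[id / SKETCH_WORD_BITS] |= 1UL << (id % SKETCH_WORD_BITS);
}

static bool sketch_is_over_limit(unsigned int id)
{
	if (id >= SKETCH_MAX_MEMCG)
		return false;
	return sketch_over_limit[id / SKETCH_WORD_BITS] &
	       (1UL << (id % SKETCH_WORD_BITS));
}

/*
 * Walk the i_memcg tags of n dirty inodes and count those that would be
 * selected for writeback.  Inodes owned by cgroups that are not over
 * their limit are skipped; shared inodes (tag 0) are taken only when
 * include_shared is set.
 */
static size_t sketch_select_inodes(const unsigned short *i_memcg, size_t n,
				   bool include_shared)
{
	size_t selected = 0;

	for (size_t i = 0; i < n; i++) {
		if (i_memcg[i] == 0) {
			if (include_shared)
				selected++;
			continue;
		}
		if (sketch_is_over_limit(i_memcg[i]))
			selected++;
	}
	return selected;
}
```

The walk is linear in the number of dirty inodes on the list, which is why
the 100,000 foreign inodes in the test above all have to be visited before
the single cgroup_B inode is found.
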
> 
> 
> 3. memcg dirty page limiting
> 
> balance_dirty_pages() calls mem_cgroup_balance_dirty_pages(), which checks the
> dirty usage vs dirty thresholds for the current cgroup and its parents.  As
> cgroups exceed their background limit, they are marked in a global over-limit
> bitmap (indexed by cgroup id) and the bdi flusher is woken.  As a cgroup hits its
> foreground limit, the task is throttled while performing foreground writeback on
> inodes owned by the over-limit cgroup.  If mem_cgroup_balance_dirty_pages() is
> unable to get below the dirty page threshold writing per-memcg inodes, then it
> downshifts to also writing shared inodes (i_memcg=0).
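
The check against the current cgroup and its parents could be pictured as
below. This is a sketch under stated assumptions, not the body of
mem_cgroup_balance_dirty_pages(): the struct, its field names and the enum
are hypothetical, "exceeds" is taken to mean strictly greater than, and each
parent's nr_dirty is assumed to already include its children's dirty pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cgroup state; field names are illustrative only. */
struct sketch_memcg {
	unsigned int id;
	unsigned long nr_dirty;			/* dirty pages charged here */
	unsigned long background_thresh;	/* wake the flusher above this */
	unsigned long dirty_thresh;		/* throttle the dirtier above this */
	struct sketch_memcg *parent;		/* NULL at the root */
};

enum sketch_action {
	SKETCH_NONE,		/* under every limit on the path to the root */
	SKETCH_BACKGROUND,	/* mark over-limit and wake the bdi flusher */
	SKETCH_THROTTLE,	/* throttle and do foreground writeback */
};

/* Walk from the current cgroup up to the root, keeping the worst result. */
static enum sketch_action sketch_check_dirty(const struct sketch_memcg *cg)
{
	enum sketch_action worst = SKETCH_NONE;

	for (; cg != NULL; cg = cg->parent) {
		/* Assumption: the background limit never exceeds the hard limit. */
		assert(cg->background_thresh <= cg->dirty_thresh);

		if (cg->nr_dirty > cg->dirty_thresh)
			return SKETCH_THROTTLE;
		if (cg->nr_dirty > cg->background_thresh)
			worst = SKETCH_BACKGROUND;
	}
	return worst;
}
```

A child that is well under its own limits can therefore still trigger
background writeback, or be throttled, because an ancestor is over its limit.
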
> 
> I know that there are some significant IO-less balance_dirty_pages() changes.  I
> am not trying to derail that effort.  I have done moderate functional testing of
> the newly proposed features.
> 
> The memcg aspects of this patch are pretty mature.  The writeback aspects are
> still fairly new and need feedback from the writeback community.  These features
> are linked, so it's not clear which branch to send the changes to (the writeback
> development branch or mmotm).
> 
> Here is an example of the memcg OOM that is avoided with this patch series:
> 	# mkdir /dev/cgroup/memory/x
> 	# echo 100M > /dev/cgroup/memory/x/memory.limit_in_bytes
> 	# echo $$ > /dev/cgroup/memory/x/tasks
> 	# dd if=/dev/zero of=/data/f1 bs=1k count=1M &
> 	# dd if=/dev/zero of=/data/f2 bs=1k count=1M &
> 	# wait
> 	[1]-  Killed                  dd if=/dev/zero of=/data/f1 bs=1k count=1M
> 	[2]+  Killed                  dd if=/dev/zero of=/data/f2 bs=1k count=1M
> 
> Changes since -v8:
> - Reordered patches for better readability.
> 
> - No longer passing struct writeback_control into memcontrol functions.  Instead
>   the needed attributes (memcg_id, etc.) are explicitly passed in.  Therefore no
>   more field additions to struct writeback_control.
> 
> - Replaced 'Andrea Righi <arighi@develer.com>' with 
>   'Andrea Righi <andrea@betterlinux.com>' in commit descriptions.
> 
> - Rebased to mmotm-2011-08-02-16-19
> 
> Greg Thelen (13):
>   memcg: document cgroup dirty memory interfaces
>   memcg: add page_cgroup flags for dirty page tracking
>   memcg: add dirty page accounting infrastructure
>   memcg: add kernel calls for memcg dirty page stats
>   memcg: add mem_cgroup_mark_inode_dirty()
>   memcg: add dirty limits to mem_cgroup
>   memcg: add cgroupfs interface to memcg dirty limits
>   memcg: dirty page accounting support routines
>   memcg: create support routines for writeback
>   writeback: pass wb_writeback_work into move_expired_inodes()
>   writeback: make background writeback cgroup aware
>   memcg: create support routines for page writeback
>   memcg: check memcg dirty limits in page writeback
> 
>  Documentation/cgroups/memory.txt  |   70 ++++
>  fs/buffer.c                       |    2 +-
>  fs/fs-writeback.c                 |  113 ++++--
>  fs/inode.c                        |    3 +
>  fs/nfs/write.c                    |    4 +
>  fs/sync.c                         |    2 +-
>  include/linux/cgroup.h            |    1 +
>  include/linux/fs.h                |    9 +
>  include/linux/memcontrol.h        |   64 +++-
>  include/linux/page_cgroup.h       |   23 ++
>  include/linux/writeback.h         |    9 +-
>  include/trace/events/memcontrol.h |  207 ++++++++++
>  kernel/cgroup.c                   |    1 -
>  mm/backing-dev.c                  |    3 +-
>  mm/filemap.c                      |    1 +
>  mm/memcontrol.c                   |  760 ++++++++++++++++++++++++++++++++++++-
>  mm/page-writeback.c               |   44 ++-
>  mm/truncate.c                     |    1 +
>  mm/vmscan.c                       |    5 +-
>  19 files changed, 1265 insertions(+), 57 deletions(-)
>  create mode 100644 include/trace/events/memcontrol.h
> 
> -- 
> 1.7.3.1
> 
> 


