From: Ciju Rajan K <ciju@linux.vnet.ibm.com>
To: Greg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>
Subject: Re: [PATCH v2][memcg+dirtylimit] Fix  overwriting global vm dirty limit setting by memcg (Re: [PATCH v3 00/11] memcg: per cgroup dirty page accounting
Date: Mon, 25 Oct 2010 12:33:23 +0530	[thread overview]
Message-ID: <4CC52BBB.8010407@linux.vnet.ibm.com> (raw)
In-Reply-To: <xr937hh7sa5l.fsf@ninji.mtv.corp.google.com>

Greg Thelen wrote:
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
>
>   
>> Fixed one here.
>> ==
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> Now, when calculating the dirty limit, vm_dirty_param() is called.
>> This function returns dirty-limit related parameters considering
>> memory cgroup settings.
>>
>> Now, assume that vm_dirty_bytes=100M (the global dirty limit) and that a
>> memory cgroup has 1G of pages, a dirty_ratio of 40, and 500MB of
>> dirtyable memory.
>>
>> In this case, global_dirty_limits will compute the dirty limit as
>> 500MB * 0.4 = 200MB. This is bad... the memory cgroup is not a back door
>> around the global limit.
>>
>> This patch limits the return value of vm_dirty_param(), considering the
>> global settings.
>>
>> Changelog:
>>  - changed the "mem" argument from int to u64
>>  - use the global dirtyable memory to cap the memcg's values.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>  include/linux/memcontrol.h |    5 +++--
>>  mm/memcontrol.c            |   30 +++++++++++++++++++++++++++++-
>>  mm/page-writeback.c        |    3 ++-
>>  3 files changed, 34 insertions(+), 4 deletions(-)
>>
>> Index: dirty_limit_new/mm/memcontrol.c
>> ===================================================================
>> --- dirty_limit_new.orig/mm/memcontrol.c
>> +++ dirty_limit_new/mm/memcontrol.c
>> @@ -1171,9 +1171,11 @@ static void __mem_cgroup_dirty_param(str
>>   * can be moved after our access and writeback tends to take long time.  At
>>   * least, "memcg" will not be freed while holding rcu_read_lock().
>>   */
>> -void vm_dirty_param(struct vm_dirty_param *param)
>> +void vm_dirty_param(struct vm_dirty_param *param,
>> +	 u64 mem, u64 global)
>>  {
>>  	struct mem_cgroup *memcg;
>> +	u64 limit, bglimit;
>>  
>>  	if (mem_cgroup_disabled()) {
>>  		global_vm_dirty_param(param);
>> @@ -1183,6 +1185,32 @@ void vm_dirty_param(struct vm_dirty_para
>>  	rcu_read_lock();
>>  	memcg = mem_cgroup_from_task(current);
>>  	__mem_cgroup_dirty_param(param, memcg);
>> +	/*
>> +	 * A limitation under memory cgroup is under global vm, too.
>> +	 */
>> +	if (vm_dirty_ratio)
>> +		limit = global * vm_dirty_ratio / 100;
>> +	else
>> +		limit = vm_dirty_bytes;
>> +	if (param->dirty_ratio) {
>> +		param->dirty_bytes = mem * param->dirty_ratio / 100;
>> +		param->dirty_ratio = 0;
>> +	}
>> +	if (param->dirty_bytes > limit)
>> +		param->dirty_bytes = limit;
>> +
>> +	if (dirty_background_ratio)
>> +		bglimit = global * dirty_background_ratio / 100;
>> +	else
>> +		bglimit = dirty_background_bytes;
>> +
>> +	if (param->dirty_background_ratio) {
>> +		param->dirty_background_bytes =
>> +			mem * param->dirty_background_ratio / 100;
>> +		param->dirty_background_ratio = 0;
>> +	}
>> +	if (param->dirty_background_bytes > bglimit)
>> +		param->dirty_background_bytes = bglimit;
>>  	rcu_read_unlock();
>>  }
>>  
>> Index: dirty_limit_new/include/linux/memcontrol.h
>> ===================================================================
>> --- dirty_limit_new.orig/include/linux/memcontrol.h
>> +++ dirty_limit_new/include/linux/memcontrol.h
>> @@ -171,7 +171,7 @@ static inline void mem_cgroup_dec_page_s
>>  }
>>  
>>  bool mem_cgroup_has_dirty_limit(void);
>> -void vm_dirty_param(struct vm_dirty_param *param);
>> +void vm_dirty_param(struct vm_dirty_param *param, u64 mem, u64 global);
>>  s64 mem_cgroup_page_stat(enum mem_cgroup_nr_pages_item item);
>>  
>>  unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
>> @@ -360,7 +360,8 @@ static inline bool mem_cgroup_has_dirty_
>>  	return false;
>>  }
>>  
>> -static inline void vm_dirty_param(struct vm_dirty_param *param)
>> +static inline void vm_dirty_param(struct vm_dirty_param *param,
>> +		u64 mem, u64 global)
>>  {
>>  	global_vm_dirty_param(param);
>>  }
>> Index: dirty_limit_new/mm/page-writeback.c
>> ===================================================================
>> --- dirty_limit_new.orig/mm/page-writeback.c
>> +++ dirty_limit_new/mm/page-writeback.c
>> @@ -466,7 +466,8 @@ void global_dirty_limits(unsigned long *
>>  	struct task_struct *tsk;
>>  	struct vm_dirty_param dirty_param;
>>  
>> -	vm_dirty_param(&dirty_param);
>> +	vm_dirty_param(&dirty_param,
>> +		available_memory, global_dirtyable_memory());
>>  
>>  	if (dirty_param.dirty_bytes)
>>  		dirty = DIV_ROUND_UP(dirty_param.dirty_bytes, PAGE_SIZE);
>>     
>
> I think there is a problem with the patch above.  In the patch
> vm_dirty_param() sets param->dirty_[background_]bytes to the smaller of
> the memcg and global limits.  Assuming the current
> task is in a memcg, then the memcg dirty (not system-wide) usage is
> always compared to the selected limits (which may be per-memcg or
> system).  The problem is that if:
> a) the per-memcg dirty limit is smaller than the system limit, so
>    vm_dirty_param() will select the per-memcg limit, and
> b) the per-memcg dirty usage is well below the memcg dirty limit, and
> c) the system usage is at the system limit,
> then the above patch will not trigger writeback.  Example with two
> memcgs:
>          sys
>         B   C
>       
>       limit  usage
>   sys  10     10
>    B    7      6
>    C    5      4
>
> If B wants to grow, the system will exceed the system limit of 10 and should
> be throttled.  However, the smaller limit (7) will be selected and
> applied to memcg usage (6), which indicates no need to throttle, so the
> system could get as bad as:
>
>       limit  usage
>   sys  10     12
>    B    7      7
>    C    5      5
>
> In this case the system usage exceeds the system limit because the
> per-memcg checks each see no per-memcg problem.
>
>   
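Sketching the check Greg describes above in C (illustrative only, not code
from the patch series; the identifiers are placeholders), the v2 patch
effectively does:

	/*
	 * Effective v2 behaviour, with the numbers from the example above.
	 * memcg_dirty_limit, system_dirty_limit, memcg_dirty_usage and
	 * throttle_writeback() are placeholder names, not kernel symbols.
	 */
	unsigned long thresh = min(memcg_dirty_limit, system_dirty_limit);
						/* min(7, 10) = 7 */
	if (memcg_dirty_usage > thresh)		/* 6 > 7 is false ... */
		throttle_writeback();		/* ... so B is never throttled,
						 * even though system usage (10)
						 * is at the system limit (10). */
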
What about the following scenarios?
a)
      limit  usage
  sys   9      7
   B    8      6
   A    4      1

Now assume B dirties 2 more. B's total reaches 8 (its memcg limit) and
the system total reaches 9 (the global limit). The scenario then becomes:

      limit  usage
  sys   9      9
   B    8      8
   A    4      1

In this case, group A is not getting a fair chance to utilize its limit.
Do we need to consider this case also?

b) Even though we are defining a per-cgroup dirty limit, it does not
actually behave as one.
Is it indirectly dependent on the total system-wide limit in this
implementation?

-Ciju
> To solve this I propose we create a new structure to aggregate both
> dirty limit and usage data:
> 	struct dirty_limits {
> 	       unsigned long dirty_thresh;
> 	       unsigned long background_thresh;
> 	       unsigned long nr_reclaimable;
> 	       unsigned long nr_writeback;
> 	};
>
> global_dirty_limits() would then query both the global and memcg limits,
> and use the dirty usage of whichever one is closest to its limit.  This
> change makes global_dirty_limits() look like:
>
> void global_dirty_limits(struct dirty_limits *limits)
> {
> 	unsigned long background;
> 	unsigned long dirty;
> 	unsigned long nr_reclaimable;
> 	unsigned long nr_writeback;
> 	unsigned long available_memory = determine_dirtyable_memory();
> 	struct task_struct *tsk;
>
> 	if (vm_dirty_bytes)
> 		dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
> 	else
> 		dirty = (vm_dirty_ratio * available_memory) / 100;
>
> 	if (dirty_background_bytes)
> 		background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
> 	else
> 		background = (dirty_background_ratio * available_memory) / 100;
>
> 	nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> 				global_page_state(NR_UNSTABLE_NFS);
> 	nr_writeback = global_page_state(NR_WRITEBACK);
>
> 	if (mem_cgroup_dirty_limits(available_memory, limits) &&
> 	    dirty_available(limits->dirty_thresh, limits->nr_reclaimable,
> 			    limits->nr_writeback) <
> 	    dirty_available(dirty, nr_reclaimable, nr_writeback)) {
> 		dirty = min(dirty, limits->dirty_thresh);
> 		background = min(background, limits->background_thresh);
> 	} else {
> 		limits->nr_reclaimable = nr_reclaimable;
> 		limits->nr_writeback = nr_writeback;
> 	}
>
> 	if (background >= dirty)
> 		background = dirty / 2;
> 	tsk = current;
> 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
> 		background += background / 4;
> 		dirty += dirty / 4;
> 	}
> 	limits->background_thresh = background;
> 	limits->dirty_thresh = dirty;
> }
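
dirty_available() and mem_cgroup_dirty_limits() are not quoted in this
message; a minimal sketch of dirty_available(), under the assumption that it
returns the remaining headroom (in pages) below a threshold, which is how the
code above picks whichever set of limits is closest to being exceeded:

	/*
	 * Assumed helper (not shown in this mail): pages that may still be
	 * dirtied before the given threshold is reached.
	 */
	static unsigned long dirty_available(unsigned long dirty_thresh,
					     unsigned long nr_reclaimable,
					     unsigned long nr_writeback)
	{
		unsigned long usage = nr_reclaimable + nr_writeback;

		return dirty_thresh > usage ? dirty_thresh - usage : 0;
	}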
>
> Because this approach considers both the memcg and system limits, the
> problem described above is avoided.
>
> I have this change integrated into the memcg dirty limit series (-v3 was
> the last post; v4 is almost ready with this change).  I will post -v4
> with this approach if there is no strong objection.
>
> --
> Greg
>   
