linux-mm.kvack.org archive mirror
* [RFC][PATCH 0/2] memcg: hierarchy support (v3)
@ 2008-06-04  4:58 KAMEZAWA Hiroyuki
  2008-06-04  5:01 ` [RFC][PATCH 1/2] memcg: res_counter hierarchy KAMEZAWA Hiroyuki
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-04  4:58 UTC (permalink / raw)
  To: linux-mm; +Cc: LKML, balbir, menage, xemul, yamamoto

Hi, this is the third version.

While the code changes themselves are small, the whole _tone_ of the code
has changed. I'm in no hurry; any comments are welcome.

Based on 2.6.26-rc2-mm1 + the memcg patches in the -mm queue.

Changes from v2:
 - Named it the HardWall policy.
 - Rewrote the code for readability; renamed some functions.
 - Added documentation text.
 - Added support for the hierarchy_model parameter.
   Now, no_hierarchy and hardwall_hierarchy are implemented.

HardWall Policy:
  - Designed for strict resource isolation under a hierarchy.
    Automatic load balancing between cgroups can break the user's
    assumptions even if it is implemented very well.
  - The parent's charge covers all of its children:
      parent->usage = resource used by itself + resource moved to children.
      Of course, parent->limit >= parent->usage.
  - When a child's limit is set, the resource moves.
  - There is no automatic resource moving between parent <-> child.

Example)
  1) Assume a cgroup with a 1GB limit (and no tasks belong to it yet).
     - group_A limit=1G,usage=0M.

  2) create group B, C under A.
     - group A limit=1G, usage=0M
          - group B limit=0M, usage=0M.
          - group C limit=0M, usage=0M.

  3) increase group B's limit to 300M.
     - group A limit=1G, usage=300M.
          - group B limit=300M, usage=0M.
          - group C limit=0M, usage=0M.

  4) increase group C's limit to 500M
     - group A limit=1G, usage=800M.
          - group B limit=300M, usage=0M.
          - group C limit=500M, usage=0M.

  5) reduce group B's limit to 100M
     - group A limit=1G, usage=600M.
          - group B limit=100M, usage=0M.
          - group C limit=500M, usage=0M.


Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
  2008-06-01  0:35               ` kamezawa.hiroyu
@ 2008-06-02  6:16 Balbir Singh
  2008-05-31 17:18 ` Balbir Singh
  0 siblings, 1 reply; 38+ messages in thread
From: Balbir Singh @ 2008-06-02  6:16 UTC (permalink / raw)
  To: kamezawa.hiroyu; +Cc: linux-mm, LKML, xemul, menage, yamamoto, lizf

Hi, Kamezawa-san,

kamezawa.hiroyu@jp.fujitsu.com wrote:
> 
> It's not a problem. We're not developing a world-wide ecosystem.
> It's good that there are several development groups; that's a path to evolution.
> Something popular will become the de facto standard.
> What we have to do is provide proper interfaces that allow a fair race.
> 

I did not claim that we were developing an ecosystem either :)
My point is that we should not confuse *Linux* users. Let's do the common/useful
stuff in the kernel and make it easy for users to use the cgroup subsystem.

>>> Here is an example. (just an example...)
>>> Please point out if I'm misunderstanding "share".
>>>
>>> root_level/                   = limit 1G.
>>>           /child_A = share=30
>>>           /child_B = share=15
>>>           /child_C = share=5
>>> (and assume there is no process under root_level, to make the explanation
>>> easy.)
>>> 0. At first, before starting to use memory, set all kernel_memory_limit.
>>> root_level.limit = 1G
>>>   child_A.limit=64M,usage=0
>>>   child_B.limit=64M,usage=0
>>>   child_C.limit=64M,usage=0
>>>   free_resource=808M 
>>>
>> This sounds incorrect, since the limits should be proportional to shares.
>> If the maximum shares in the root were 100 (ideally we want higher
>> resolution than that), then
>>
>> child_A.limit = .3 * 1G
>> child_B.limit = .15 * 1G
>>
>> and so on
>>
> The above just shows the parameters passed to the kernel.
> From the user's view, the memory limitation is A:B:C = 6:3:1 if memory is
> fully used. (In the above case, usage=0.)
> 
> In general, "share" works only when the total usage reaches the limit.
> (See how the cpu scheduler works.)
> When the usage doesn't reach the limit, there is no limitation.
> 

If you are implying that shares imply a soft limit, I agree. But the only
parameter in the kernel right now is the hard limit. We need to add soft
limit support.

>>> 1. Next, a process in child_C starts to run and uses 600M of memory.
>>> root_level.limit = 1G
>>>   child_A.limit=64M
>>>   child_B.limit=64M
>>>   child_C.limit=600M,usage=600M
>>>   free_resource=272M
>>>
>> How is that feasible? Its limit was 64M; how did it bump up to 600M? If you
>> want something like that, child_C should have no limits.
> 
> The middleware just acts when child_C.failcnt rises:
> echo 64M > child_C/memory.limit_in_bytes
> and periodically checks A, B, and C, allowing C to use what it wants because
> A and B don't want memory.
> 
>>> 2. Now, a process in child_A starts to run and uses 800M of memory.
>>>   child_A.limit=800M,usage=800M
>>>   child_B.limit=64M,usage=0M
>>>   child_C.limit=136M,usage=136M
>>>   free_resource=0, A:C=6:1
>>>
>> Not sure I understand this step
>>
> The middleware notices that usage in A is growing and moves resources to A:
> 
> echo (current child_C's limit - 64M) > child_C
> echo (current child_A's limit + 64M) > child_A
> 
> Do the above step by step, in a loop, until A:C = 6:1.
> (64M is just an example.)
> 
>>> 3. Finally, a process in child_B starts and uses 500M of memory.
>>>   child_A.limit=600M,usage=600M
>>>   child_B.limit=300M,usage=300M
>>>   child_C.limit=100M,usage=100M
>>>   free_resource=0, A:B:C=6:3:1
>>>
>> Not sure I understand this step
>>
> echo (current child_C's limit - 64M) > child_C
> echo (current child_A's limit - 64M) > child_A
> echo (current child_B's limit + 64M) > child_B
> Do the above step by step, in a loop, until A:B:C = 6:3:1.
> 
> 
>>> 4. One more: a process in A exits.
>>>   child_A.limit=64M, usage=0M
>>>   child_B.limit=500M, usage=500M
>>>   child_C.limit=436M, usage=436M
>>>   free_resource=0, B:C=3:1 (but B just wants to use 500M)
>>>
>> Not sure I understand this step
>>
> The middleware can notice that memory pressure from child_A has dropped:
> 
> echo (current child_A's limit - 64M) > child_A
> echo (current child_C's limit + 64M) > child_C
> echo (current child_B's limit + 64M) > child_B
> Do the above step by step, in a loop, until B:C = 3:1, while avoiding
> wasted resources.
> 
> 
> 
>>> This is only an example; the middleware can do more precise "limit"
>>> control by checking the statistics of the memory controller hierarchy,
>>> based on its own policy.
>>>
>>> What I am thinking about now is what kind of statistics/notifiers/controls
>>> are necessary to implement shares in middleware. How precisely and quickly
>>> the middleware can work depends on the interfaces.
>>> Maybe the middleware should know "how fast the application runs now" via
>>> some kind of check or a co-operative interface with the application.
>>> But I'm not sure how the kernel can help with that.
>> I am not sure if I understand your proposal at this point.
>>
> 
> The most important point is that memory.limit_in_bytes is _just_ a
> notification asking the kernel to limit the memory usage of a process
> group temporarily. It changes often.
> Based on the user's request to the middleware (a share or a limit),
> the middleware sets limit_in_bytes to a suitable value and changes it
> dynamically and periodically.
> 

Why don't we add soft limits, so that we don't have to go to the kernel and
change limits frequently? One missing piece in the memory controller is that we
don't shrink memory when limits change or when tasks move. I think soft limits
are a better solution.

Thanks for patiently explaining all of this.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][PATCH 1/2] memcg: res_counter hierarchy
  2008-05-30  1:45         ` [RFC][PATCH 1/2] memcg: res_counter hierarchy KAMEZAWA Hiroyuki
@ 2008-06-02  2:15 YAMAMOTO Takashi
  1 sibling, 0 replies; 38+ messages in thread
From: YAMAMOTO Takashi @ 2008-06-02  2:15 UTC (permalink / raw)
  To: kamezawa.hiroyu; +Cc: linux-mm, linux-kernel, balbir, xemul, menage, lizf

> @@ -135,13 +138,118 @@ ssize_t res_counter_write(struct res_cou
>  		if (*end != '\0')
>  			goto out_free;
>  	}
> -	spin_lock_irqsave(&counter->lock, flags);
> -	val = res_counter_member(counter, member);
> -	*val = tmp;
> -	spin_unlock_irqrestore(&counter->lock, flags);
> -	ret = nbytes;
> +	if (member != RES_LIMIT || !callback) {

is there any reason to check member != RES_LIMIT here,
rather than in callers?

> +/*
> + * Move resource to its parent.
> + *   child->limit -= val.
> + *   parent->usage -= val.
> + *   parent->limit -= val.

s/limit/for_children/

> + */
> +
> +int res_counter_repay_resource(struct res_counter *child,
> +				struct res_counter *parent,
> +				unsigned long long val,
> +				res_shrink_callback_t callback, int retry)

Can you reduce the gratuitous differences between
res_counter_borrow_resource and res_counter_repay_resource?
E.g. 'success' vs 'done', and how 'retry' is decremented.

YAMAMOTO Takashi



end of thread, other threads:[~2008-06-12  5:00 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-04  4:58 [RFC][PATCH 0/2] memcg: hierarchy support (v3) KAMEZAWA Hiroyuki
2008-06-04  5:01 ` [RFC][PATCH 1/2] memcg: res_counter hierarchy KAMEZAWA Hiroyuki
2008-06-04  6:54   ` Li Zefan
2008-06-04  7:03     ` KAMEZAWA Hiroyuki
2008-06-04  7:20   ` YAMAMOTO Takashi
2008-06-04  7:32     ` KAMEZAWA Hiroyuki
2008-06-04  8:59   ` Paul Menage
2008-06-04  9:18     ` KAMEZAWA Hiroyuki
2008-06-09  9:48   ` Balbir Singh
2008-06-09 10:20     ` KAMEZAWA Hiroyuki
2008-06-09 10:37       ` Balbir Singh
2008-06-09 12:02       ` kamezawa.hiroyu
2008-06-11 23:24   ` Randy Dunlap
2008-06-12  4:59     ` KAMEZAWA Hiroyuki
2008-06-04  5:03 ` [RFC][PATCH 2/2] memcg: hardwall hierarhcy for memcg KAMEZAWA Hiroyuki
2008-06-04  6:42   ` Li Zefan
2008-06-04  6:54     ` KAMEZAWA Hiroyuki
2008-06-04  8:59   ` Paul Menage
2008-06-04  9:26     ` KAMEZAWA Hiroyuki
2008-06-04 12:53       ` Daisuke Nishimura
2008-06-04 12:32   ` Daisuke Nishimura
2008-06-05  0:04     ` KAMEZAWA Hiroyuki
2008-06-09 10:56   ` Balbir Singh
2008-06-09 12:09   ` kamezawa.hiroyu
2008-06-11 23:24   ` Randy Dunlap
2008-06-12  5:00     ` KAMEZAWA Hiroyuki
2008-06-04  8:59 ` [RFC][PATCH 0/2] memcg: hierarchy support (v3) Paul Menage
2008-06-04  9:15   ` KAMEZAWA Hiroyuki
2008-06-04  9:15     ` Paul Menage
2008-06-04  9:31       ` KAMEZAWA Hiroyuki
2008-06-09  9:30 ` Balbir Singh
2008-06-09  9:55   ` KAMEZAWA Hiroyuki
2008-06-09 10:33     ` Balbir Singh
  -- strict thread matches above, loose matches on Subject: below --
2008-06-02  6:16 [RFC][PATCH 1/2] memcg: res_counter hierarchy Balbir Singh
2008-05-31 17:18 ` Balbir Singh
2008-05-31 11:20   ` Balbir Singh
2008-05-30 22:20     ` Balbir Singh
2008-05-30  1:43       ` [RFC][PATCH 0/2] memcg: simple hierarchy (v2) KAMEZAWA Hiroyuki
2008-05-30  1:45         ` [RFC][PATCH 1/2] memcg: res_counter hierarchy KAMEZAWA Hiroyuki
2008-05-31  1:59           ` kamezawa.hiroyu
2008-05-31 14:47             ` kamezawa.hiroyu
2008-06-01  0:35               ` kamezawa.hiroyu
2008-06-02  9:48                 ` kamezawa.hiroyu
2008-06-02  9:52           ` kamezawa.hiroyu
2008-06-02  2:15 YAMAMOTO Takashi
