From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Paul Menage <menage@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC 0/5] Memory controller soft limit introduction (v3)
Date: Mon, 30 Jun 2008 09:11:19 +0530
Message-ID: <486855DF.2070100@linux.vnet.ibm.com>
In-Reply-To: <20080630102054.ee214765.kamezawa.hiroyu@jp.fujitsu.com>

KAMEZAWA Hiroyuki wrote:
> On Sun, 29 Jun 2008 10:32:03 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>> I have a couple of comments.
>>>
>>> 1. Why do you add soft_limit to res_counter?
>>>    Is there any other controller which uses a soft-limit?
>>>    I'll move watermark handling to memcg from res_counter because it's
>>>    required only by memcg.
>>>
>> I expect soft_limits to be controller independent. The same thing can be applied
>> to an io-controller for example, right?
>>
> 
> I can't imagine how a soft-limit would work on an i/o controller. Could you explain?
> 

An io-controller could have the same concept: a hard-limit on the bandwidth,
plus a soft-limit that a group is allowed to exceed provided there is no i/o
bandwidth congestion.
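
As a rough sketch of that idea (userspace C, not kernel code; struct io_group,
io_group_may_issue() and the "congested" flag are hypothetical names used only
for illustration and are not part of this patch set), the admission check
could look something like this:

```c
#include <stdbool.h>

/* Hypothetical per-group bandwidth accounting; not taken from the patches. */
struct io_group {
	unsigned long usage;      /* bandwidth currently in use */
	unsigned long soft_limit; /* may be exceeded while there is no congestion */
	unsigned long hard_limit; /* never exceeded */
};

/*
 * Return true if the group may use 'request' more units of bandwidth.
 * Without congestion only the hard limit applies; under congestion the
 * group is held to its soft limit.
 */
static bool io_group_may_issue(const struct io_group *g,
			       unsigned long request, bool congested)
{
	unsigned long limit = congested ? g->soft_limit : g->hard_limit;

	/* A soft limit above the hard limit makes no sense; clamp it. */
	if (limit > g->hard_limit)
		limit = g->hard_limit;
	if (request > limit)
		return false;
	/* Written this way to avoid overflow in usage + request. */
	return g->usage <= limit - request;
}
```

With usage=600, soft_limit=700 and hard_limit=1000, a request of 200 would be
allowed on an idle device and refused under congestion.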

> 
>>> 2. *please* handle NUMA
>>>    There is a fundamental difference between global VMM and memcg.
>>>      global VMM - reclaim memory at memory shortage.
>>>      memcg      - reclaim memory at memory limit.
>>>    So, memcg wasn't required to handle place-of-memory when hitting the limit.
>>>    *Just reducing the usage* was enough.
>>>    In this set, you try to handle memory shortage.
>>>    So, please handle NUMA, i.e. "what node do you want to reclaim memory from ?"
>>>    If not, 
>>>     - memory placement of Apps can be terrible.
>>>     - cannot work well with cpuset. (I think)
>>>
>> try_to_free_mem_cgroup_pages() handles NUMA right? We start with the
>> node_zonelists of the current node on which we are executing.  I can pass on the
>> zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
>> there anything else you had in mind?
>>
> Assume the following case: a host with 2 nodes and the following mount style.
> 
> mount -t cgroup -o memory,cpuset none /opt/cgroup/
> 
>   
>   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
>   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
>   ....
>   /Groupxxxx
> 
> Assume the following state after some workload:
> 
>   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
>   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M
> 
> *And* memory of node "1" is in shortage and the kernel has to reclaim
> memory from node "1".
> 
> Your routine tries to reclaim memory from a group which exceeds its soft-limit,
> i.e. Group1. But that is no help, because Group1 doesn't contain any memory in Node1.
> To make it worse, your routine doesn't call try_to_free_pages() on the global
> LRU when your soft-limit reclaim frees some memory. So, if a task in Group1 continues
> to allocate memory at some speed, the memory shortage in Group2 will not be
> recovered easily.
> 
> This includes 2 aspects of trouble.
>  - Group1's memory is reclaimed, but that is the wrong target.
>  - Group2's try_to_free_pages() may take a very long time.
> 
> (Current page shrinking under cpuset seems to scan all nodes.
>  This seems not to be quick, but it works because it scans all.
>  This will be another problem, anyway ;).
> 
> 
> BTW, currently try_to_free_mem_cgroup_pages() always assumes
> GFP_HIGHUSER_MOVABLE.
> ==
> unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>                                                 gfp_t gfp_mask)
> {
>         struct scan_control sc = {
>                 .may_writepage = !laptop_mode,
>                 .may_swap = 1,
>                 .swap_cluster_max = SWAP_CLUSTER_MAX,
>                 .swappiness = vm_swappiness,
>                 .order = 0,
>                 .mem_cgroup = mem_cont,
>                 .isolate_pages = mem_cgroup_isolate_pages,
>         };
>         struct zonelist *zonelist;
> 
>         sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>                         (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>         zonelist = NODE_DATA(numa_node_id())->node_zonelists;
>         return do_try_to_free_pages(zonelist, &sc);
> }
> ==
> Please select an appropriate zonelist here.
> 

We do have zonelist information in __alloc_pages_internal(), so it should be
easy to pass the zonelist in, or to fall back to a good default (the current
one) if no zonelist is provided to the routine.
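
Something along these lines, as a userspace model only: struct zonelist here
is a stub, and node_zonelists[], current_node (standing in for numa_node_id())
and pick_reclaim_zonelist() are hypothetical names, not what the patch will
necessarily use:

```c
#include <stddef.h>

#define MAX_NODES 2

/* Stub standing in for the kernel's struct zonelist. */
struct zonelist {
	int node; /* node this zonelist prefers */
};

/* Hypothetical per-node default zonelists. */
static struct zonelist node_zonelists[MAX_NODES] = { { 0 }, { 1 } };

/* Stand-in for numa_node_id(): the node we are currently executing on. */
static int current_node;

/*
 * Use the zonelist handed down by the allocator if there is one;
 * otherwise fall back to the zonelist of the current node, which is
 * what the routine does unconditionally today.
 */
static struct zonelist *pick_reclaim_zonelist(struct zonelist *from_allocator)
{
	if (from_allocator)
		return from_allocator;
	return &node_zonelists[current_node];
}
```

Callers that hit the limit without an allocation context would pass NULL and
keep the existing behaviour.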


> 
>>> 3. I think it is unclear when "mem_cgroup_reclaim_on_contention" exits.
>>>    Please add an explanation of the algorithm. Does it return when some
>>>    pages are reclaimed?
>>>
>> Sure, I will do that.
>>
>>> 4. When a swap-full cgroup, which tends to contain tons of memory, is on
>>>    the top of the heap, a lot of cpu time will be wasted.
>>>    Can we add an "ignore me" flag?
>>>
>> Could you elaborate on swap-full cgroup please? Are you referring to changes
>> introduced by the memcg-handle-swap-cache patch? I don't mind adding an
>> "ignore me" flag, but I guess we need to figure out when a cgroup is swap full.
>>
> No. I mean the no-available-swap, or all-swap-is-used, situation.
> 
> This situation will happen very easily once the swap-controller comes.

We'll definitely deal with it when the swap-controller comes in.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

