Re: [RFC 0/5] Memory controller soft limit introduction (v3)

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: balbir@linux.vnet.ibm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Paul Menage <menage@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC 0/5] Memory controller soft limit introduction (v3)
Date: Mon, 30 Jun 2008 10:20:54 +0900
Message-ID: <20080630102054.ee214765.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <4867174B.3090005@linux.vnet.ibm.com>

On Sun, 29 Jun 2008 10:32:03 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > I have a couple of comments.
> > 
> > 1. Why do you add soft_limit to res_counter?
> >    Is there any other controller which uses a soft limit?
> >    I'll move watermark handling from res_counter to memcg because it's
> >    required only by memcg.
> > 
> 
> I expect soft_limits to be controller independent. The same thing can be applied
> to an io-controller for example, right?
> 

I can't imagine how a soft limit would work on an I/O controller. Could you explain?


> > 2. *Please* handle NUMA.
> >    There is a fundamental difference between the global VMM and memcg:
> >      global VMM - reclaims memory on memory shortage.
> >      memcg      - reclaims memory on hitting the memory limit.
> >    So memcg was not required to consider where memory is placed when
> >    hitting the limit; *just reducing the usage* was enough.
> >    In this set, you try to handle memory shortage.
> >    So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
> >    If not,
> >     - memory placement of apps can be terrible.
> >     - it cannot work well with cpuset (I think).
> > 
> 
> try_to_free_mem_cgroup_pages() handles NUMA right? We start with the
> node_zonelists of the current node on which we are executing.  I can pass on the
> zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
> there anything else you had in mind?
> 
Assume the following case: a host with 2 nodes, with cgroups mounted like this:

mount -t cgroup -o memory,cpuset none /opt/cgroup/

  
  /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
  /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
  ....
  /Groupxxxx

Assume the following state after some workload:

  /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
  /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M

*And* node 1 is short of memory, so the kernel has to reclaim
memory from node 1.

Your routine tries to reclaim memory from a group that exceeds its soft limit,
i.e. Group1. But that is no help, because Group1 doesn't hold any memory on node 1.
To make it worse, your routine doesn't call try_to_free_pages() on the global
LRU when soft-limit reclaim frees some memory. So, if a task in Group1 continues
to allocate memory at some speed, the memory shortage in Group2 will not be
recovered easily.

This causes two kinds of trouble:
 - Group1's memory is reclaimed, but that is the wrong target.
 - Group2's try_to_free_pages() may take a very long time.
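
To make the scenario above concrete, here is a minimal userspace sketch, not
kernel code. The struct, the function names and the per-node usage split are
all made up for illustration; the numbers are the ones from the example
(Group1 entirely on node 0, Group2 entirely on node 1). It compares picking a
victim purely by soft-limit excess with picking one that also holds memory on
the node that is short.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2

/* Hypothetical userspace model of a memory cgroup; not a kernel structure. */
struct group_model {
	const char *name;
	unsigned long soft_limit_mb;
	unsigned long usage_mb[NR_NODES];	/* usage charged on each node */
};

/* Amount (in MB) by which a group exceeds its soft limit, or 0. */
static unsigned long soft_limit_excess(const struct group_model *g)
{
	unsigned long usage = 0;
	int n;

	for (n = 0; n < NR_NODES; n++)
		usage += g->usage_mb[n];
	return usage > g->soft_limit_mb ? usage - g->soft_limit_mb : 0;
}

/* Node-blind choice: the group with the largest excess, or -1 if none. */
static int pick_victim_node_blind(const struct group_model *groups, size_t nr)
{
	unsigned long best_excess = 0;
	int best = -1;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long excess = soft_limit_excess(&groups[i]);

		if (excess > best_excess) {
			best_excess = excess;
			best = (int)i;
		}
	}
	return best;
}

/* Node-aware choice: as above, but skip groups with no memory on 'node'. */
static int pick_victim_on_node(const struct group_model *groups, size_t nr,
			       int node)
{
	unsigned long best_excess = 0;
	int best = -1;
	size_t i;

	assert(node >= 0 && node < NR_NODES);
	for (i = 0; i < nr; i++) {
		unsigned long excess = soft_limit_excess(&groups[i]);

		if (groups[i].usage_mb[node] == 0)
			continue;	/* reclaiming here cannot help 'node' */
		if (excess > best_excess) {
			best_excess = excess;
			best = (int)i;
		}
	}
	return best;
}

/* The two groups from the example above (usage in MB, per node). */
static const struct group_model example_groups[] = {
	{ "Group1", 700, { 990, 0 } },
	{ "Group2", 700, { 0, 400 } },
};
```

With these numbers, the node-blind choice returns Group1 even though it has
nothing on node 1, and the node-aware choice for node 1 finds no candidate at
all, so in this model the caller would have to fall back to global reclaim.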

(Current page shrinking under cpuset seems to scan all nodes.
 This does not seem to be quick, but it works because it scans them all.
 This will be another problem, anyway ;)


BTW, try_to_free_mem_cgroup_pages() currently always assumes
GFP_HIGHUSER_MOVABLE.
==
unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
                                                gfp_t gfp_mask)
{
        struct scan_control sc = {
                .may_writepage = !laptop_mode,
                .may_swap = 1,
                .swap_cluster_max = SWAP_CLUSTER_MAX,
                .swappiness = vm_swappiness,
                .order = 0,
                .mem_cgroup = mem_cont,
                .isolate_pages = mem_cgroup_isolate_pages,
        };
        struct zonelist *zonelist;

        sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
                        (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
        zonelist = NODE_DATA(numa_node_id())->node_zonelists;
        return do_try_to_free_pages(zonelist, &sc);
}
==
Please select an appropriate zonelist here.
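
The mask arithmetic in the snippet above can be tried out in userspace. This
is only a simplified model: the M_* bit values and the contents of the two
masks are invented for illustration and are not the real kernel definitions.
It shows that only the reclaim-behaviour bits of the caller's mask survive,
while every other bit is taken from the HIGHUSER_MOVABLE constant.

```c
#include <assert.h>

typedef unsigned int gfp_model_t;

/* Illustrative bit values only; the real kernel flags differ. */
#define M_GFP_WAIT	0x01u
#define M_GFP_IO	0x02u
#define M_GFP_FS	0x04u
#define M_GFP_HIGHMEM	0x10u
#define M_GFP_MOVABLE	0x20u
#define M_GFP_DMA	0x40u

/* Simplified stand-ins for GFP_RECLAIM_MASK and GFP_HIGHUSER_MOVABLE. */
#define M_RECLAIM_MASK		(M_GFP_WAIT | M_GFP_IO | M_GFP_FS)
#define M_HIGHUSER_MOVABLE	(M_GFP_WAIT | M_GFP_IO | M_GFP_FS | \
				 M_GFP_HIGHMEM | M_GFP_MOVABLE)

/* Same expression as in the snippet above, on the model flags. */
static gfp_model_t reclaim_gfp_mask(gfp_model_t caller_mask)
{
	/* Guard for the model: reject bits outside the flags defined here. */
	assert((caller_mask & ~(M_HIGHUSER_MOVABLE | M_GFP_DMA)) == 0);
	return (caller_mask & M_RECLAIM_MASK) |
	       (M_HIGHUSER_MOVABLE & ~M_RECLAIM_MASK);
}
```

In this model a caller that passes M_GFP_DMA | M_GFP_WAIT gets back
M_GFP_WAIT | M_GFP_HIGHMEM | M_GFP_MOVABLE: its zone request is dropped, in
the same way that the zonelist is taken from numa_node_id() rather than from
the caller.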


> 
> > 3. I think it is unclear when "mem_cgroup_reclaim_on_contention" exits.
> >    Please add an explanation of the algorithm. Does it return when some
> >    pages are reclaimed?
> > 
> 
> Sure, I will do that.
> 
> > 4. When a swap-full cgroup, which tends to contain tons of memory, is at
> >    the top of the heap, a large amount of CPU time will be wasted.
> >    Can we add an "ignore me" flag?
> > 
> 
> Could you elaborate on swap-full cgroup please? Are you referring to changes
> introduced by the memcg-handle-swap-cache patch? I don't mind adding an ignore me
> flag, but I guess we need to figure out when a cgroup is swap full.
> 
No. I mean a no-available-swap, or all-swap-is-used, situation.

This situation will happen very easily once the swap controller arrives.
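
As a rough sketch of what I mean by an "ignore me" flag: the struct, the flag
name and the function below are hypothetical and are not taken from the patch
set. The idea is only that a group which cannot make progress (for example,
because no swap is left) is skipped when choosing a reclaim candidate, instead
of being scanned again and again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry in the soft-limit heap. */
struct heap_entry_model {
	unsigned long excess_mb;	/* usage above the soft limit */
	bool ignore;			/* hypothetical "ignore me" flag */
};

/*
 * Return the index of the entry with the largest excess that is not
 * flagged, or -1 if there is no usable candidate.
 */
static int pick_reclaim_candidate(const struct heap_entry_model *e, size_t nr)
{
	int best = -1;
	size_t i;

	assert(e != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (e[i].ignore || e[i].excess_mb == 0)
			continue;	/* flagged, or not over its soft limit */
		if (best < 0 || e[i].excess_mb > e[(size_t)best].excess_mb)
			best = (int)i;
	}
	return best;
}
```

When to set and clear such a flag is the open question; this sketch does not
try to answer it.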

Thanks,
-Kame
