From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: balbir@linux.vnet.ibm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
Paul Menage <menage@google.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC 0/5] Memory controller soft limit introduction (v3)
Date: Mon, 30 Jun 2008 12:57:37 +0900
Message-ID: <20080630125737.4b14785f.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <486855DF.2070100@linux.vnet.ibm.com>
On Mon, 30 Jun 2008 09:11:19 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Sun, 29 Jun 2008 10:32:03 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >>> I have a couple of comments.
> >>>
> >>> 1. Why do you add soft_limit to res_counter?
> >>> Is there any other controller that uses a soft limit?
> >>> I'll move watermark handling from res_counter to memcg because it's
> >>> required only by memcg.
> >>>
> >> I expect soft limits to be controller-independent. The same thing could be
> >> applied to an I/O controller, for example, right?
> >>
> >
> > I can't imagine how a soft limit would work on an I/O controller. Could you explain?
> >
>
> An I/O controller could have the same concept: a hard limit on bandwidth, and
> a soft limit that a group is allowed to exceed, provided there is no I/O
> bandwidth congestion.
>
Hmm, that is the case where "shares" work well. Why a soft limit?
Doesn't the I/O controller support shares? (I don't know, sorry.)
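To make the hard-limit/soft-limit distinction discussed above concrete, here is a minimal userspace sketch of a resource counter that carries both limits. It is not the res_counter code from the patch set; the struct and function names (`sketch_counter`, `sketch_charge`, `sketch_soft_excess`) are hypothetical. A charge fails only when it would cross the hard limit; crossing the soft limit is allowed and merely reported, so that some other mechanism (memory reclaim, or an I/O scheduler) could act on it under contention.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counter with a hard and a soft limit (units arbitrary, e.g. MB). */
struct sketch_counter {
	unsigned long usage;
	unsigned long limit;      /* hard limit: never exceeded */
	unsigned long soft_limit; /* may be exceeded when there is no contention */
};

/* Charge @val units; fail with -ENOMEM only if the hard limit would be crossed. */
static int sketch_charge(struct sketch_counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -ENOMEM;
	c->usage += val;
	assert(c->usage <= c->limit); /* invariant: hard limit always holds */
	return 0;
}

/* How far usage is above the soft limit (0 if at or below it). */
static unsigned long sketch_soft_excess(const struct sketch_counter *c)
{
	return c->usage > c->soft_limit ? c->usage - c->soft_limit : 0;
}
```

With limit=1024 and soft_limit=700, charging 990 succeeds and leaves an excess of 290, while a further charge of 100 is refused by the hard limit.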
> >
> >>> 2. *Please* handle NUMA.
> >>> There is a fundamental difference between the global VMM and memcg:
> >>> global VMM - reclaims memory on memory shortage.
> >>> memcg - reclaims memory on hitting the memory limit.
> >>> So memcg didn't need to care where memory is placed when it hit the limit;
> >>> *just reducing the usage* was enough.
> >>> In this set, you try to handle memory shortage as well.
> >>> So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
> >>> If not,
> >>> - memory placement of apps can be terrible.
> >>> - it cannot work well with cpuset. (I think)
> >>>
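The difference described in point 2 can be sketched as two different reclaim triggers. In this simplified model (hypothetical names, plain page counts instead of real zone watermarks), global reclaim is triggered by one particular node running short, so "which node?" is part of the trigger itself, whereas memcg reclaim is triggered only by a group's usage reaching its limit, so any page of the group is an equally valid victim.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-node state: free pages and a low watermark. */
struct sketch_node {
	unsigned long free_pages;
	unsigned long low_watermark;
};

/* Global VMM: reclaim is needed because a specific node is below its watermark. */
static bool sketch_node_needs_reclaim(const struct sketch_node *n)
{
	return n->free_pages < n->low_watermark;
}

/*
 * memcg: reclaim is needed because charging would push the group over its
 * limit. Nothing here depends on which node the group's pages live on.
 */
static bool sketch_memcg_needs_reclaim(unsigned long usage, unsigned long charge,
				       unsigned long limit)
{
	return usage + charge > limit;
}
```

Soft-limit reclaim on contention is driven by the first kind of trigger, which is why it inherits the node question.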
> >> try_to_free_mem_cgroup_pages() handles NUMA, right? We start with the
> >> node_zonelists of the current node on which we are executing. I can pass the
> >> zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
> >> there anything else you had in mind?
> >>
> > Assume the following case: a host with 2 nodes and the following mount style.
> >
> > mount -t cgroup -o memory,cpuset none /opt/cgroup/
> >
> >
> > /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
> > /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
> > ....
> > /Groupxxxx
> >
> > Assume the following state after some workload:
> >
> > /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
> > /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M
> >
> > *And* memory on node "1" is short and the kernel has to reclaim
> > memory from node "1".
> >
> > Your routine tries to reclaim memory from a group that exceeds its soft limit
> > (i.e. Group1). But that doesn't help, because Group1 doesn't contain any memory
> > on node 1. To make it worse, your routine doesn't call try_to_free_pages() on
> > the global LRU once soft-limit reclaim has freed some memory. So, if a task in
> > Group1 continues to allocate memory at some rate, the memory shortage in Group2
> > will not be recovered easily.
> >
> > This causes 2 kinds of trouble:
> > - Group1's memory is reclaimed, which is wrong.
> > - Group2's try_to_free_pages() may take a very long time.
> >
> > (Current page shrinking under cpuset seems to scan all nodes;
> > this seems not to be quick, but it works because it scans all of them.
> > That is another problem, anyway ;).
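A minimal sketch of the problem above, assuming per-group, per-node usage accounting (which the patch set does not necessarily have; `sketch_group`, `pick_victim_naive` and `pick_victim_for_node` are hypothetical). With the numbers from the example, a victim chosen purely by soft-limit excess is Group1, even though it holds nothing on node 1. Restricting the choice to groups that actually hold pages on the short node finds no candidate, which would tell the caller to fall back to global reclaim (try_to_free_pages()).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_NODES 2

/* Hypothetical group: soft limit and per-node usage, both in MB. */
struct sketch_group {
	const char *name;
	unsigned long soft_limit;
	unsigned long node_usage[SKETCH_NR_NODES];
};

static unsigned long sketch_excess(const struct sketch_group *g)
{
	unsigned long total = 0;

	for (int nid = 0; nid < SKETCH_NR_NODES; nid++)
		total += g->node_usage[nid];
	return total > g->soft_limit ? total - g->soft_limit : 0;
}

/* Pick the group furthest over its soft limit, ignoring where its pages are. */
static const struct sketch_group *
pick_victim_naive(const struct sketch_group *groups, size_t n)
{
	const struct sketch_group *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (sketch_excess(&groups[i]) > 0 &&
		    (!best || sketch_excess(&groups[i]) > sketch_excess(best)))
			best = &groups[i];
	return best;
}

/*
 * Same, but only consider groups holding memory on node @nid, the node under
 * pressure. NULL means "no useful soft-limit victim; do global reclaim".
 */
static const struct sketch_group *
pick_victim_for_node(const struct sketch_group *groups, size_t n, int nid)
{
	const struct sketch_group *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct sketch_group *g = &groups[i];

		if (sketch_excess(g) == 0 || g->node_usage[nid] == 0)
			continue;
		if (!best || sketch_excess(g) > sketch_excess(best))
			best = g;
	}
	return best;
}
```

Here Group1 is { soft_limit 700, node usage {990, 0} } and Group2 is { soft_limit 700, node usage {0, 400} }: the naive pick is Group1 (excess 290M), while the node-1-aware pick returns NULL.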
> >
> >
> > BTW, currently try_to_free_mem_cgroup_pages() always assumes
> > GFP_HIGHUSER_MOVABLE.
> > ==
> > unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> > 						gfp_t gfp_mask)
> > {
> > 	struct scan_control sc = {
> > 		.may_writepage = !laptop_mode,
> > 		.may_swap = 1,
> > 		.swap_cluster_max = SWAP_CLUSTER_MAX,
> > 		.swappiness = vm_swappiness,
> > 		.order = 0,
> > 		.mem_cgroup = mem_cont,
> > 		.isolate_pages = mem_cgroup_isolate_pages,
> > 	};
> > 	struct zonelist *zonelist;
> >
> > 	/* caller's reclaim flags; all other bits come from GFP_HIGHUSER_MOVABLE */
> > 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> > 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> > 	/* always the local node's zonelist, whichever node is actually short */
> > 	zonelist = NODE_DATA(numa_node_id())->node_zonelists;
> > 	return do_try_to_free_pages(zonelist, &sc);
> > }
> > ==
> > Please select an appropriate zonelist here.
> >
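The gfp_mask line in the quoted code keeps only the caller's reclaim-behaviour bits and takes everything else, including the zone modifiers that decide which zones are eligible, from GFP_HIGHUSER_MOVABLE. The sketch below shows that masking with made-up bit values; the `SK_` constants and `sk_memcg_reclaim_mask()` are hypothetical stand-ins, not the kernel's real GFP flag values, and the real GFP_RECLAIM_MASK contains more flags than shown here.

```c
#include <assert.h>

typedef unsigned int sk_gfp_t;

/* Hypothetical bit layout: low bits are zone modifiers, higher bits reclaim flags. */
#define SK_GFP_DMA32		0x01u
#define SK_GFP_HIGHMEM		0x02u
#define SK_GFP_MOVABLE		0x04u
#define SK_GFP_WAIT		0x10u
#define SK_GFP_IO		0x20u
#define SK_GFP_FS		0x40u

#define SK_GFP_RECLAIM_MASK	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_HIGHUSER_MOVABLE	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS | \
				 SK_GFP_HIGHMEM | SK_GFP_MOVABLE)

/*
 * Same composition as in try_to_free_mem_cgroup_pages(): reclaim bits from
 * the caller, every other bit (the zone modifiers) forced from HIGHUSER_MOVABLE.
 */
static sk_gfp_t sk_memcg_reclaim_mask(sk_gfp_t gfp_mask)
{
	return (gfp_mask & SK_GFP_RECLAIM_MASK) |
	       (SK_GFP_HIGHUSER_MOVABLE & ~SK_GFP_RECLAIM_MASK);
}
```

For a caller asking for WAIT | IO | DMA32, the result is WAIT | IO | HIGHMEM | MOVABLE: the caller's "no FS" restriction survives, but its zone restriction is replaced.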
>
> We do have zonelist information in __alloc_pages_internal(); it should be easy
> to pass the zonelist down, or to fall back to a good default (the current node's)
> if no zonelist is provided to the routine.
>
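The fallback suggested above can be sketched with a simplified model in which a zonelist is just an ordered list of node IDs. `sk_zonelist`, `sk_node_zonelists` and `sk_select_zonelist()` are hypothetical; in the kernel the default would correspond to NODE_DATA(numa_node_id())->node_zonelists as in the quoted code, and the explicit zonelist would be the one __alloc_pages_internal() already has.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NR_NODES 2

/* Hypothetical, simplified zonelist: nodes in fallback order. */
struct sk_zonelist {
	int nodes[SK_NR_NODES];
};

/* Per-node default zonelists: each node prefers itself, then the other node. */
static const struct sk_zonelist sk_node_zonelists[SK_NR_NODES] = {
	{ { 0, 1 } },
	{ { 1, 0 } },
};

/*
 * Use the allocator's zonelist when one is passed down; otherwise fall back
 * to the default zonelist of the node we are currently running on.
 */
static const struct sk_zonelist *
sk_select_zonelist(const struct sk_zonelist *requested, int current_node)
{
	if (requested)
		return requested;
	return &sk_node_zonelists[current_node];
}
```

With this fallback, callers that do not yet pass a zonelist keep the current behaviour, while an allocation that failed on node 1 can start reclaim from node 1 even when the reclaiming task runs on node 0.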
Yes. What I want to say is that you should take care of this.
Anyway, I think you should revisit the whole memory reclaim path and fix the
small bugs in it that don't fit with soft limits.
Thanks,
-Kame