linux-mm.kvack.org archive mirror
* [PATCH] memcg: reclaim memory from nodes in round robin
@ 2011-04-27  2:57 KAMEZAWA Hiroyuki
  2011-04-27  3:52 ` Ying Han
  2011-04-27  5:20 ` Daisuke Nishimura
  0 siblings, 2 replies; 4+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-27  2:57 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, balbir, Daisuke Nishimura, kosaki.motohiro, Ying Han

Now, memory cgroup's direct reclaim frees memory from the current node.
But this has some problems. Usually, when a set of threads works in a
cooperative way, they tend to be on the same node. So, if they hit the
limit under a memcg, reclaim will take memory from the threads
themselves, which may be their active working set.

For example, assume a 2-node system with Node 0 and Node 1,
and a memcg with a 1G limit. After some work, file cache remains and
the usages are
   Node 0:  1M
   Node 1:  998M.

If an application then runs on Node 0, it will eat its own working set
before freeing the unnecessary file caches on Node 1.

This patch adds round-robin node selection for NUMA and applies equal
reclaim pressure to each node. When using cpuset's memory spread
feature, this will work very well.

But yes, a better algorithm is appreciated.
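[Editorial note: the round-robin walk described above can be sketched in plain
userspace C. The names below (next_set, first_set, select_victim_node, the
8-node mask) are simplified stand-ins for the kernel's nodemask API, not the
real implementation in the patch that follows.]

```c
/* Illustrative userspace model of the patch's round-robin victim-node
 * selection. A set bit in `online` means that node has memory. */
#define MAX_NUMNODES 8

/* First node id greater than prev whose bit is set in mask, or
 * MAX_NUMNODES when there is none (the caller then wraps around). */
static int next_set(int prev, unsigned int mask)
{
	int n;

	for (n = prev + 1; n < MAX_NUMNODES; n++)
		if (mask & (1u << n))
			return n;
	return MAX_NUMNODES;
}

static int first_set(unsigned int mask)
{
	return next_set(-1, mask);
}

struct memcg_sketch {
	int last_scanned_node;	/* starts at MAX_NUMNODES, as in the patch */
};

static int select_victim_node(struct memcg_sketch *mem, unsigned int online)
{
	int node = next_set(mem->last_scanned_node, online);

	if (node == MAX_NUMNODES)	/* walked past the end: wrap */
		node = first_set(online);
	mem->last_scanned_node = node;
	return node;
}
```

With nodes {0, 2, 5} holding memory, successive calls return 0, 2, 5, 0, ...
so every node with memory sees equal reclaim pressure over time.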

From: Ying Han <yinghan@google.com>
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |    1 +
 mm/memcontrol.c            |   25 +++++++++++++++++++++++++
 mm/vmscan.c                |    9 ++++++++-
 3 files changed, 34 insertions(+), 1 deletion(-)

Index: memcg/include/linux/memcontrol.h
===================================================================
--- memcg.orig/include/linux/memcontrol.h
+++ memcg/include/linux/memcontrol.h
@@ -108,6 +108,7 @@ extern void mem_cgroup_end_migration(str
  */
 int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg);
 int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg);
+int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
 				       struct zone *zone,
 				       enum lru_list lru);
Index: memcg/mm/memcontrol.c
===================================================================
--- memcg.orig/mm/memcontrol.c
+++ memcg/mm/memcontrol.c
@@ -237,6 +237,7 @@ struct mem_cgroup {
 	 * reclaimed from.
 	 */
 	int last_scanned_child;
+	int last_scanned_node;
 	/*
 	 * Should the accounting and control be hierarchical, per subtree?
 	 */
@@ -1472,6 +1473,29 @@ mem_cgroup_select_victim(struct mem_cgro
 }
 
 /*
+ * Select a node to start reclaim from. Because all we need is to reduce
+ * the usage counter, starting from anywhere is OK. Reclaiming from the
+ * current node has pros and cons: freeing memory from the current node
+ * means freeing memory from a node which we will use or have used, so
+ * it may hurt that node's LRU. And if several threads hit their limits,
+ * they will contend on one node. But freeing from a remote node costs
+ * more because of memory access latency.
+ *
+ * For now, we use round-robin. A better algorithm is welcomed.
+ */
+int mem_cgroup_select_victim_node(struct mem_cgroup *mem)
+{
+	int node;
+
+	node = next_node(mem->last_scanned_node, node_states[N_HIGH_MEMORY]);
+	if (node == MAX_NUMNODES)
+		node = first_node(node_states[N_HIGH_MEMORY]);
+
+	mem->last_scanned_node = node;
+	return node;
+}
+
+/*
  * Scan the hierarchy if needed to reclaim memory. We remember the last child
  * we reclaimed from, so that we don't end up penalizing one child extensively
  * based on its position in the children list.
@@ -4678,6 +4702,7 @@ mem_cgroup_create(struct cgroup_subsys *
 		res_counter_init(&mem->memsw, NULL);
 	}
 	mem->last_scanned_child = 0;
+	mem->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&mem->oom_notify);
 
 	if (parent)
Index: memcg/mm/vmscan.c
===================================================================
--- memcg.orig/mm/vmscan.c
+++ memcg/mm/vmscan.c
@@ -2198,6 +2198,7 @@ unsigned long try_to_free_mem_cgroup_pag
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
+	int nid;
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
 		.may_unmap = 1,
@@ -2208,10 +2209,16 @@ unsigned long try_to_free_mem_cgroup_pag
 		.mem_cgroup = mem_cont,
 		.nodemask = NULL, /* we don't care the placement */
 	};
+	/*
+	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
+	 * care where the freed memory comes from. So, the node where we
+	 * start the scan does not need to be the current node.
+	 */
+	nid = mem_cgroup_select_victim_node(mem_cont);
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
-	zonelist = NODE_DATA(numa_node_id())->node_zonelists;
+	zonelist = NODE_DATA(nid)->node_zonelists;
 
 	trace_mm_vmscan_memcg_reclaim_begin(0,
 					    sc.may_writepage,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <dont@kvack.org>


* Re: [PATCH] memcg: reclaim memory from nodes in round robin
  2011-04-27  2:57 [PATCH] memcg: reclaim memory from nodes in round robin KAMEZAWA Hiroyuki
@ 2011-04-27  3:52 ` Ying Han
  2011-04-27  4:28   ` KAMEZAWA Hiroyuki
  2011-04-27  5:20 ` Daisuke Nishimura
  1 sibling, 1 reply; 4+ messages in thread
From: Ying Han @ 2011-04-27  3:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, balbir, Daisuke Nishimura, kosaki.motohiro

On Tue, Apr 26, 2011 at 7:57 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> [snip: quoted patch]
> +       /*
> +        * Unlike direct reclaim via allo_pages(), memcg's reclaim
> +        * don't take care from where we get free resouce. So, the node where
> +        * we need to start scan is not need to be current node.
> +        */
Sorry, some typos: alloc_pages() instead of allo_pages(). And "free resource"
instead of "free resouce".

--Ying



* Re: [PATCH] memcg: reclaim memory from nodes in round robin
  2011-04-27  3:52 ` Ying Han
@ 2011-04-27  4:28   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 4+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-27  4:28 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, linux-kernel, akpm, balbir, Daisuke Nishimura, kosaki.motohiro

On Tue, 26 Apr 2011 20:52:39 -0700
Ying Han <yinghan@google.com> wrote:

> [snip: quoted patch]
> Sorry, some typos: alloc_pages() instead of allo_pages(). And "free resource".
> 
ok, will fix. Thank you for pointing out.

Thanks,
-Kame



* Re: [PATCH] memcg: reclaim memory from nodes in round robin
  2011-04-27  2:57 [PATCH] memcg: reclaim memory from nodes in round robin KAMEZAWA Hiroyuki
  2011-04-27  3:52 ` Ying Han
@ 2011-04-27  5:20 ` Daisuke Nishimura
  1 sibling, 0 replies; 4+ messages in thread
From: Daisuke Nishimura @ 2011-04-27  5:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, balbir, kosaki.motohiro, Ying Han,
	Daisuke Nishimura

Hi,

On Wed, 27 Apr 2011 11:57:18 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> [snip: quoted patch description]
>
At first, I thought the process might be oom-killed easily if we have many
NUMA nodes and we try to reclaim only from nodes where no processes in the
memcg allocate memory. But considering it more, node_zonelists contains zones
from the other NUMA nodes IIUC, so that doesn't happen.

Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Except for some typos which have already been pointed out.

Thanks,
Daisuke Nishimura.

P.S.
I'm very sorry for my laziness these days. We have long holidays in Japan
starting this weekend, so I hope I can review the recent patches about
bgreclaim etc. at home.




end of thread, other threads:[~2011-04-27  5:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
2011-04-27  2:57 [PATCH] memcg: reclaim memory from nodes in round robin KAMEZAWA Hiroyuki
2011-04-27  3:52 ` Ying Han
2011-04-27  4:28   ` KAMEZAWA Hiroyuki
2011-04-27  5:20 ` Daisuke Nishimura
