linux-mm.kvack.org archive mirror
* [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock.
@ 2008-12-02  7:11 KOSAKI Motohiro
  2008-12-02  7:13 ` [PATCH 2/2] memcg: make memory.swappiness file KOSAKI Motohiro
  2008-12-02  7:15 ` [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2008-12-02  7:11 UTC (permalink / raw)
  To: LKML, linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton
  Cc: kosaki.motohiro


Currently, mem_cgroup has no lock of its own, and most of its members
don't need one (e.g. info is protected by the zone lock, and stat is a
per-cpu variable).

However, there is one explicit exception: mem_cgroup->prev_priority
needs a lock, but is not protected by one.
Luckily, this is NOT a bug, because prev_priority isn't used by the
current reclaim code.

However, we plan to use prev_priority again in the future, so it is
better to fix this now.


In addition, we plan to reuse this lock for other members.
So "misc_lock" is a better name than "prev_priority_lock".



Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/memcontrol.c |   20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

Index: b/mm/memcontrol.c
===================================================================
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -142,6 +142,13 @@ struct mem_cgroup {
 	 */
 	struct mem_cgroup_lru_info info;
 
+	/*
+	 * Most mem_cgroup members don't need a lock (e.g. info is
+	 * protected by the zone lock, stat is a per-cpu variable).
+	 * The few remaining members need this explicit lock.
+	 */
+	spinlock_t misc_lock;
+
 	int	prev_priority;	/* for recording reclaim priority */
 
 	/*
@@ -393,18 +400,28 @@ int mem_cgroup_calc_mapped_ratio(struct 
  */
 int mem_cgroup_get_reclaim_priority(struct mem_cgroup *mem)
 {
-	return mem->prev_priority;
+	int prev_priority;
+
+	spin_lock(&mem->misc_lock);
+	prev_priority = mem->prev_priority;
+	spin_unlock(&mem->misc_lock);
+
+	return prev_priority;
 }
 
 void mem_cgroup_note_reclaim_priority(struct mem_cgroup *mem, int priority)
 {
+	spin_lock(&mem->misc_lock);
 	if (priority < mem->prev_priority)
 		mem->prev_priority = priority;
+	spin_unlock(&mem->misc_lock);
 }
 
 void mem_cgroup_record_reclaim_priority(struct mem_cgroup *mem, int priority)
 {
+	spin_lock(&mem->misc_lock);
 	mem->prev_priority = priority;
+	spin_unlock(&mem->misc_lock);
 }
 
 /*
@@ -1967,6 +1984,7 @@ mem_cgroup_create(struct cgroup_subsys *
 	}
 
 	mem->last_scanned_child = NULL;
+	spin_lock_init(&mem->misc_lock);
 
 	return &mem->css;
 free_out:


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 2/2] memcg: make memory.swappiness file
  2008-12-02  7:11 [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KOSAKI Motohiro
@ 2008-12-02  7:13 ` KOSAKI Motohiro
  2008-12-02  7:15 ` [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2008-12-02  7:13 UTC (permalink / raw)
  To: LKML, linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton
  Cc: kosaki.motohiro

Currently, /proc/sys/vm/swappiness can tune the swappiness ratio for
global reclaim. However, memcg reclaim has no tuning parameter of its
own.

In general, the optimal swappiness depends on the workload
(e.g. HPC workloads need a lower swappiness than others).

Thus, per-cgroup swappiness improves tunability for administrators.
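
Example usage (a sketch; the mount point and group name below are
hypothetical):

	# mount -t cgroup -o memory none /cgroups
	# mkdir /cgroups/grp_hpc
	# echo 10 > /cgroups/grp_hpc/memory.swappiness
	# cat /cgroups/grp_hpc/memory.swappiness
	10

A write larger than 100 fails with -EINVAL, and writing the root
cgroup's file fails with -EBUSY; the root cgroup always uses
/proc/sys/vm/swappiness.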



Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 Documentation/controllers/memory.txt |    6 ++
 include/linux/swap.h                 |    3 -
 mm/memcontrol.c                      |   72 ++++++++++++++++++++++++++++++++---
 mm/vmscan.c                          |    7 +--
 4 files changed, 78 insertions(+), 10 deletions(-)

Index: b/mm/memcontrol.c
===================================================================
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -163,6 +163,9 @@ struct mem_cgroup {
 	unsigned long	last_oom_jiffies;
 	int		obsolete;
 	atomic_t	refcnt;
+
+	unsigned int	swappiness;
+
 	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
@@ -588,6 +591,22 @@ done:
 	return ret;
 }
 
+static unsigned int get_swappiness(struct mem_cgroup *memcg)
+{
+	struct cgroup *cgrp = memcg->css.cgroup;
+	unsigned int swappiness;
+
+	/* root ? */
+	if (cgrp->parent == NULL)
+		return vm_swappiness;
+
+	spin_lock(&memcg->misc_lock);
+	swappiness = memcg->swappiness;
+	spin_unlock(&memcg->misc_lock);
+
+	return swappiness;
+}
+
 /*
  * Dance down the hierarchy if needed to reclaim memory. We remember the
  * last child we reclaimed from, so that we don't end up penalizing
@@ -608,7 +627,8 @@ static int mem_cgroup_hierarchical_recla
 	 * but there might be left over accounting, even after children
 	 * have left.
 	 */
-	ret = try_to_free_mem_cgroup_pages(root_mem, gfp_mask, noswap);
+	ret = try_to_free_mem_cgroup_pages(root_mem, gfp_mask, noswap,
+					   get_swappiness(root_mem));
 	if (res_counter_check_under_limit(&root_mem->res))
 		return 0;
 
@@ -622,7 +642,8 @@ static int mem_cgroup_hierarchical_recla
 			cgroup_unlock();
 			continue;
 		}
-		ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap);
+		ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap,
+						   get_swappiness(next_mem));
 		if (res_counter_check_under_limit(&root_mem->res))
 			return 0;
 		cgroup_lock();
@@ -1350,7 +1371,8 @@ int mem_cgroup_shrink_usage(struct mm_st
 	rcu_read_unlock();
 
 	do {
-		progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true);
+		progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true,
+							get_swappiness(mem));
 		progress += res_counter_check_under_limit(&mem->res);
 	} while (!progress && --retry);
 
@@ -1395,7 +1417,9 @@ static int mem_cgroup_resize_limit(struc
 			break;
 
 		progress = try_to_free_mem_cgroup_pages(memcg,
-				GFP_HIGHUSER_MOVABLE, false);
+							GFP_HIGHUSER_MOVABLE,
+							false,
+							get_swappiness(memcg));
 		if (!progress)
 			retry_count--;
 	}
 	return ret;
@@ -1435,7 +1459,8 @@ int mem_cgroup_resize_memsw_limit(struct
 			break;
 
 		oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
-		try_to_free_mem_cgroup_pages(memcg, GFP_HIGHUSER_MOVABLE, true);
+		try_to_free_mem_cgroup_pages(memcg, GFP_HIGHUSER_MOVABLE, true,
+					     get_swappiness(memcg));
 		curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
 		if (curusage >= oldusage)
 			retry_count--;
@@ -1567,7 +1592,9 @@ try_to_free:
 			goto out;
 		}
 		progress = try_to_free_mem_cgroup_pages(mem,
-						  GFP_HIGHUSER_MOVABLE, false);
+							GFP_HIGHUSER_MOVABLE,
+							false,
+							get_swappiness(mem));
 		if (!progress) {
 			nr_retries--;
 			/* maybe some writeback is necessary */
@@ -1757,6 +1784,31 @@ static int mem_control_stat_show(struct 
 	return 0;
 }
 
+static u64 mem_cgroup_swappiness_read(struct cgroup *cgrp, struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+
+	return get_swappiness(memcg);
+}
+
+static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
+				       u64 val)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+
+	if (val > 100)
+		return -EINVAL;
+
+	if (cgrp->parent == NULL)
+		return -EBUSY;
+
+	spin_lock(&memcg->misc_lock);
+	memcg->swappiness = val;
+	spin_unlock(&memcg->misc_lock);
+
+	return 0;
+}
+
 
 static struct cftype mem_cgroup_files[] = {
 	{
@@ -1795,6 +1847,11 @@ static struct cftype mem_cgroup_files[] 
 		.write_u64 = mem_cgroup_hierarchy_write,
 		.read_u64 = mem_cgroup_hierarchy_read,
 	},
+	{
+		.name = "swappiness",
+		.read_u64 = mem_cgroup_swappiness_read,
+		.write_u64 = mem_cgroup_swappiness_write,
+	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -1986,6 +2043,9 @@ mem_cgroup_create(struct cgroup_subsys *
 	mem->last_scanned_child = NULL;
 	spin_lock_init(&mem->misc_lock);
 
+	if (parent)
+		mem->swappiness = get_swappiness(parent);
+
 	return &mem->css;
 free_out:
 	for_each_node_state(node, N_POSSIBLE)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1716,14 +1716,15 @@ unsigned long try_to_free_pages(struct z
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
-						gfp_t gfp_mask,
-					   bool noswap)
+					   gfp_t gfp_mask,
+					   bool noswap,
+					   unsigned int swappiness)
 {
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
 		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
-		.swappiness = vm_swappiness,
+		.swappiness = swappiness,
 		.order = 0,
 		.mem_cgroup = mem_cont,
 		.isolate_pages = mem_cgroup_isolate_pages,
Index: b/include/linux/swap.h
===================================================================
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -214,7 +214,8 @@ static inline void lru_cache_add_active_
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap);
+						  gfp_t gfp_mask, bool noswap,
+						  unsigned int swappiness);
 extern int __isolate_lru_page(struct page *page, int mode, int file);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
Index: b/Documentation/controllers/memory.txt
===================================================================
--- a/Documentation/controllers/memory.txt
+++ b/Documentation/controllers/memory.txt
@@ -289,6 +289,12 @@ will be charged as a new owner of it.
   Because rmdir() moves all pages to parent, some out-of-use page caches can be
   moved to the parent. If you want to avoid that, force_empty will be useful.
 
+5.2 swappiness
+  Similar to /proc/sys/vm/swappiness, but affecting only this group.
+
+  This parameter cannot be changed for the root cgroup;
+  it always uses /proc/sys/vm/swappiness internally.
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.




* Re: [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock.
  2008-12-02  7:11 [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KOSAKI Motohiro
  2008-12-02  7:13 ` [PATCH 2/2] memcg: make memory.swappiness file KOSAKI Motohiro
@ 2008-12-02  7:15 ` KAMEZAWA Hiroyuki
  2008-12-02  7:19   ` KOSAKI Motohiro
  1 sibling, 1 reply; 4+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-02  7:15 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Balbir Singh, Andrew Morton

On Tue,  2 Dec 2008 16:11:07 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> 
> Currently, mem_cgroup has no lock of its own, and most of its members
> don't need one (e.g. info is protected by the zone lock, and stat is a
> per-cpu variable).
> 
> However, there is one explicit exception: mem_cgroup->prev_priority
> needs a lock, but is not protected by one.
> Luckily, this is NOT a bug, because prev_priority isn't used by the
> current reclaim code.
> 
> However, we plan to use prev_priority again in the future, so it is
> better to fix this now.
> 
> 
> In addition, we plan to reuse this lock for other members.
> So "misc_lock" is a better name than "prev_priority_lock".
> 
please use a better name... reclaim_param_lock or something?

-Kame


> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> Index: b/mm/memcontrol.c
> ===================================================================
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -142,6 +142,13 @@ struct mem_cgroup {
>  	 */
>  	struct mem_cgroup_lru_info info;
>  
> +	/*
> +	 * Most mem_cgroup members don't need a lock (e.g. info is
> +	 * protected by the zone lock, stat is a per-cpu variable).
> +	 * The few remaining members need this explicit lock.
> +	 */
> +	spinlock_t misc_lock;
> +
>  	int	prev_priority;	/* for recording reclaim priority */
>  
>  	/*
> @@ -393,18 +400,28 @@ int mem_cgroup_calc_mapped_ratio(struct 
>   */
>  int mem_cgroup_get_reclaim_priority(struct mem_cgroup *mem)
>  {
> -	return mem->prev_priority;
> +	int prev_priority;
> +
> +	spin_lock(&mem->misc_lock);
> +	prev_priority = mem->prev_priority;
> +	spin_unlock(&mem->misc_lock);
> +
> +	return prev_priority;
>  }
>  
>  void mem_cgroup_note_reclaim_priority(struct mem_cgroup *mem, int priority)
>  {
> +	spin_lock(&mem->misc_lock);
>  	if (priority < mem->prev_priority)
>  		mem->prev_priority = priority;
> +	spin_unlock(&mem->misc_lock);
>  }
>  
>  void mem_cgroup_record_reclaim_priority(struct mem_cgroup *mem, int priority)
>  {
> +	spin_lock(&mem->misc_lock);
>  	mem->prev_priority = priority;
> +	spin_unlock(&mem->misc_lock);
>  }
>  
>  /*
> @@ -1967,6 +1984,7 @@ mem_cgroup_create(struct cgroup_subsys *
>  	}
>  
>  	mem->last_scanned_child = NULL;
> +	spin_lock_init(&mem->misc_lock);
>  
>  	return &mem->css;
>  free_out:
> 
> 
> 



* Re: [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock.
  2008-12-02  7:15 ` [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KAMEZAWA Hiroyuki
@ 2008-12-02  7:19   ` KOSAKI Motohiro
  0 siblings, 0 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2008-12-02  7:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, LKML, linux-mm, Balbir Singh, Andrew Morton

> On Tue,  2 Dec 2008 16:11:07 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > 
> > Currently, mem_cgroup has no lock of its own, and most of its members
> > don't need one (e.g. info is protected by the zone lock, and stat is
> > a per-cpu variable).
> > 
> > However, there is one explicit exception: mem_cgroup->prev_priority
> > needs a lock, but is not protected by one.
> > Luckily, this is NOT a bug, because prev_priority isn't used by the
> > current reclaim code.
> > 
> > However, we plan to use prev_priority again in the future, so it is
> > better to fix this now.
> > 
> > 
> > In addition, we plan to reuse this lock for other members.
> > So "misc_lock" is a better name than "prev_priority_lock".
> > 
> please use a better name... reclaim_param_lock or something?

good idea :)

Will fix.





end of thread

Thread overview: 4+ messages
2008-12-02  7:11 [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KOSAKI Motohiro
2008-12-02  7:13 ` [PATCH 2/2] memcg: make memory.swappiness file KOSAKI Motohiro
2008-12-02  7:15 ` [PATCH 1/2] memcg: mem_cgroup->prev_priority protected by lock KAMEZAWA Hiroyuki
2008-12-02  7:19   ` KOSAKI Motohiro
