* Re: [-mm patch] Show memcg information during OOM (v3)
@ 2009-02-03 17:21 Balbir Singh
2009-02-03 22:46 ` Andrew Morton
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Balbir Singh @ 2009-02-03 17:21 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki, Andrew Morton; +Cc: linux-kernel, nishimura, lizf, linux-mm
Description: Add RSS and swap to OOM output from memcg
From: Balbir Singh <balbir@linux.vnet.ibm.com>
Changelog v3..v2:
1. Use static char arrays of size PATH_MAX in order to make
the OOM message more reliable.
Changelog v2..v1:
1. Add more information about the task's memcg and the memcg
over its limit
2. Print data in KB
3. Move the print routine outside task_lock()
4. Use rcu_read_lock() around cgroup_path, strictly speaking it
is not required, but relying on the current memcg implementation
is not a good idea.
This patch displays memcg values like failcnt, usage and limit
when an OOM occurs due to memcg.
Thanks go out to Johannes Weiner, Li Zefan, David Rientjes,
Kamezawa Hiroyuki, Daisuke Nishimura and KOSAKI Motohiro for
review.
Sample output
-------------
Task in /a/x killed as a result of limit of /a
memory: usage 1048576kB, limit 1048576kB, failcnt 4183
memory+swap: usage 1400964kB, limit 9007199254740991kB, failcnt 0
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
include/linux/memcontrol.h | 6 ++++
mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
mm/oom_kill.c | 1 +
3 files changed, 70 insertions(+), 0 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 326f45c..f9a6e78 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -104,6 +104,8 @@ struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
struct zone *zone);
struct zone_reclaim_stat*
mem_cgroup_get_reclaim_stat_from_page(struct page *page);
+extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
+ struct task_struct *p);
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
extern int do_swap_account;
@@ -270,6 +272,10 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
return NULL;
}
+void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+{
+}
+
#endif /* CONFIG_CGROUP_MEM_CONT */
#endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8e4be9c..44e053b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -27,6 +27,7 @@
#include <linux/backing-dev.h>
#include <linux/bit_spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/limits.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/swap.h>
@@ -813,6 +814,68 @@ bool mem_cgroup_oom_called(struct task_struct *task)
rcu_read_unlock();
return ret;
}
+
+/**
+ * mem_cgroup_print_mem_info: Called from OOM with tasklist_lock held in
+ * read mode.
+ * @memcg: The memory cgroup that went over limit
+ * @p: Task that is going to be killed
+ *
+ * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
+ * enabled
+ */
+void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+{
+ struct cgroup *task_cgrp;
+ struct cgroup *mem_cgrp;
+ /*
+ * Need a buffer on stack, can't rely on allocations. The code relies
+ * on the assumption that OOM is serialized for memory controller.
+ * If this assumption is broken, revisit this code.
+ */
+ static char task_memcg_name[PATH_MAX];
+ static char memcg_name[PATH_MAX];
+ int ret;
+
+ if (!memcg)
+ return;
+
+ mem_cgrp = memcg->css.cgroup;
+ task_cgrp = mem_cgroup_from_task(p)->css.cgroup;
+
+ rcu_read_lock();
+ ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
+ if (ret < 0) {
+ /*
+ * Unfortunately, we are unable to convert to a useful name
+ * But we'll still print out the usage information
+ */
+ rcu_read_unlock();
+ goto done;
+ }
+ ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
+ if (ret < 0) {
+ rcu_read_unlock();
+ goto done;
+ }
+
+ rcu_read_unlock();
+
+ printk(KERN_INFO "Task in %s killed as a result of limit of %s\n",
+ task_memcg_name, memcg_name);
+done:
+
+ printk(KERN_INFO "memory: usage %llukB, limit %llukB, failcnt %llu\n",
+ res_counter_read_u64(&memcg->res, RES_USAGE) >> 10,
+ res_counter_read_u64(&memcg->res, RES_LIMIT) >> 10,
+ res_counter_read_u64(&memcg->res, RES_FAILCNT));
+ printk(KERN_INFO "memory+swap: usage %llukB, limit %llukB, "
+ "failcnt %llu\n",
+ res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
+ res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
+ res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
+}
+
/*
* Unlike exported interface, "oom" parameter is added. if oom==true,
* oom-killer can be invoked.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d3b9bac..2f3166e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -394,6 +394,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
+ mem_cgroup_print_oom_info(mem, current);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
--
Balbir
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-03 17:21 [-mm patch] Show memcg information during OOM (v3) Balbir Singh
@ 2009-02-03 22:46 ` Andrew Morton
2009-02-04 3:36 ` Balbir Singh
2009-02-05 21:55 ` Andrew Morton
2009-02-04 0:53 ` Li Zefan
From: Andrew Morton @ 2009-02-03 22:46 UTC (permalink / raw)
To: balbir; +Cc: kamezawa.hiroyu, linux-kernel, nishimura, lizf, linux-mm
On Tue, 3 Feb 2009 22:51:35 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> Description: Add RSS and swap to OOM output from memcg
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> Changelog v3..v2
> 1. Use static char arrays of size PATH_MAX in order to make
> the OOM message more reliable.
>
> Changelog v2..v1:
>
> 1. Add more information about the task's memcg and the memcg
> over its limit
> 2. Print data in KB
> 3. Move the print routine outside task_lock()
> 4. Use rcu_read_lock() around cgroup_path, strictly speaking it
> is not required, but relying on the current memcg implementation
> is not a good idea.
>
>
> This patch displays memcg values like failcnt, usage and limit
> when an OOM occurs due to memcg.
>
> Thanks go out to Johannes Weiner, Li Zefan, David Rientjes,
> Kamezawa Hiroyuki, Daisuke Nishimura and KOSAKI Motohiro for
> review.
>
> Sample output
> -------------
>
> Task in /a/x killed as a result of limit of /a
> memory: usage 1048576kB, limit 1048576kB, failcnt 4183
> memory+swap: usage 1400964kB, limit 9007199254740991kB, failcnt 0
>
>
> ...
>
> +/**
> + * mem_cgroup_print_mem_info: Called from OOM with tasklist_lock held in
> + * read mode.
> + * @memcg: The memory cgroup that went over limit
> + * @p: Task that is going to be killed
> + *
> + * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> + * enabled
> + */
> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +{
> + struct cgroup *task_cgrp;
> + struct cgroup *mem_cgrp;
> + /*
> + * Need a buffer on stack, can't rely on allocations. The code relies
> + * on the assumption that OOM is serialized for memory controller.
> + * If this assumption is broken, revisit this code.
> + */
> + static char task_memcg_name[PATH_MAX];
> + static char memcg_name[PATH_MAX];
I don't think we need both of these. With a bit of shuffling we could
reuse a single buffer?
fixlets..
- kerneldoc requires that the function description be a single line
- unmunge whitespace
--- a/mm/memcontrol.c~memcg-show-memcg-information-during-oom-fix
+++ a/mm/memcontrol.c
@@ -724,8 +724,7 @@ static int mem_cgroup_count_children_cb(
}
/**
- * mem_cgroup_print_mem_info: Called from OOM with tasklist_lock held in
- * read mode.
+ * mem_cgroup_print_mem_info: Called from OOM with tasklist_lock held in read mode.
* @memcg: The memory cgroup that went over limit
* @p: Task that is going to be killed
*
@@ -762,7 +761,7 @@ void mem_cgroup_print_oom_info(struct me
goto done;
}
ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
- if (ret < 0) {
+ if (ret < 0) {
rcu_read_unlock();
goto done;
}
diff -puN mm/oom_kill.c~memcg-show-memcg-information-during-oom-fix mm/oom_kill.c
_
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-03 17:21 [-mm patch] Show memcg information during OOM (v3) Balbir Singh
2009-02-03 22:46 ` Andrew Morton
@ 2009-02-04 0:53 ` Li Zefan
2009-02-04 1:35 ` Li Zefan
2009-02-04 3:37 ` Balbir Singh
2009-02-05 4:00 ` Lai Jiangshan
2009-02-06 3:02 ` Li Zefan
From: Li Zefan @ 2009-02-04 0:53 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, linux-mm
> @@ -104,6 +104,8 @@ struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
> struct zone *zone);
> struct zone_reclaim_stat*
> mem_cgroup_get_reclaim_stat_from_page(struct page *page);
> +extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> + struct task_struct *p);
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> extern int do_swap_account;
> @@ -270,6 +272,10 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
> return NULL;
> }
>
> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +{
should be static inline, otherwise it won't compile if CONFIG_CGROUP_MEM_CONT=n
> +}
> +
> #endif /* CONFIG_CGROUP_MEM_CONT */
>
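The pattern Li is pointing at looks like this (a sketch of the header layout, not the actual memcontrol.h hunk): when the controller is compiled out, the `extern` declaration must be replaced by an empty `static inline`, otherwise each file including the header would either miss the symbol at link time or, with a plain non-inline definition, define it multiple times.

```c
struct mem_cgroup;
struct task_struct;

#ifdef CONFIG_CGROUP_MEM_CONT
extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
				      struct task_struct *p);
#else
/* static inline: every includer gets its own inlined no-op rather
 * than a duplicate external definition, so =n kernels still link. */
static inline void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
					     struct task_struct *p)
{
}
#endif
```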
> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +{
> + struct cgroup *task_cgrp;
> + struct cgroup *mem_cgrp;
> + /*
> + * Need a buffer on stack, can't rely on allocations. The code relies
I think it's in the .bss section, not on the stack, and it's better to
explain in the comment why the static buffer is safe.
> + * on the assumption that OOM is serialized for memory controller.
> + * If this assumption is broken, revisit this code.
> + */
> + static char task_memcg_name[PATH_MAX];
> + static char memcg_name[PATH_MAX];
> + int ret;
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 0:53 ` Li Zefan
@ 2009-02-04 1:35 ` Li Zefan
2009-02-04 3:37 ` Balbir Singh
From: Li Zefan @ 2009-02-04 1:35 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, linux-mm
>> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
>> +{
>> + struct cgroup *task_cgrp;
>> + struct cgroup *mem_cgrp;
>> + /*
>> + * Need a buffer on stack, can't rely on allocations. The code relies
>
> I think it's in .bss section, but not on stack
> and it's better to explain why
> the static buffer is safe in the comment.
>
Sorry, I just saw the below comment.
>> + * on the assumption that OOM is serialized for memory controller.
>> + * If this assumption is broken, revisit this code.
>> + */
>> + static char task_memcg_name[PATH_MAX];
>> + static char memcg_name[PATH_MAX];
>> + int ret;
>
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-03 22:46 ` Andrew Morton
@ 2009-02-04 3:36 ` Balbir Singh
2009-02-05 21:55 ` Andrew Morton
From: Balbir Singh @ 2009-02-04 3:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: kamezawa.hiroyu, linux-kernel, nishimura, lizf, linux-mm
* Andrew Morton <akpm@linux-foundation.org> [2009-02-03 14:46:47]:
> - if (ret < 0) {
> + if (ret < 0) {
Andrew, sorry about the whitespace issues. I ran checkpatch and it did
not show up there, but I clearly see it in the patch. Do you want me
to send you a fixed patch?
--
Balbir
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 0:53 ` Li Zefan
2009-02-04 1:35 ` Li Zefan
@ 2009-02-04 3:37 ` Balbir Singh
2009-02-04 5:24 ` KAMEZAWA Hiroyuki
From: Balbir Singh @ 2009-02-04 3:37 UTC (permalink / raw)
To: Li Zefan
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, linux-mm
* Li Zefan <lizf@cn.fujitsu.com> [2009-02-04 08:53:59]:
> > @@ -104,6 +104,8 @@ struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
> > struct zone *zone);
> > struct zone_reclaim_stat*
> > mem_cgroup_get_reclaim_stat_from_page(struct page *page);
> > +extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> > + struct task_struct *p);
> >
> > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > extern int do_swap_account;
> > @@ -270,6 +272,10 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
> > return NULL;
> > }
> >
> > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > +{
>
> should be static inline, otherwise it won't compile if CONFIG_CGROUP_MEM_CONT=n
>
Oh! yes.
> > +}
> > +
> > #endif /* CONFIG_CGROUP_MEM_CONT */
> >
>
> > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > +{
> > + struct cgroup *task_cgrp;
> > + struct cgroup *mem_cgrp;
> > + /*
> > + * Need a buffer on stack, can't rely on allocations. The code relies
>
> I think it's in .bss section, but not on stack, and it's better to explain why
> the static buffer is safe in the comment.
>
Yes, it is no longer on the stack; in the original patch it was. I'll
send an updated patch.
--
Balbir
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 3:37 ` Balbir Singh
@ 2009-02-04 5:24 ` KAMEZAWA Hiroyuki
2009-02-04 6:42 ` Balbir Singh
From: KAMEZAWA Hiroyuki @ 2009-02-04 5:24 UTC (permalink / raw)
To: balbir; +Cc: Li Zefan, Andrew Morton, linux-kernel, nishimura, linux-mm
On Wed, 4 Feb 2009 09:07:50 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > +}
> > > +
> > > #endif /* CONFIG_CGROUP_MEM_CONT */
> > >
> >
> > > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > > +{
> > > + struct cgroup *task_cgrp;
> > > + struct cgroup *mem_cgrp;
> > > + /*
> > > + * Need a buffer on stack, can't rely on allocations. The code relies
> >
> > I think it's in .bss section, but not on stack, and it's better to explain why
> > the static buffer is safe in the comment.
> >
>
> Yes, it is no longer on stack, in the original patch it was. I'll send
> an updated patch
>
In the newest mmotm, the OOM kill message looks like the following.
==
Feb 4 13:16:28 localhost kernel: [ 249.338911] malloc2 invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Feb 4 13:16:28 localhost kernel: [ 249.339018] malloc2 cpuset=/ mems_allowed=0
Feb 4 13:16:28 localhost kernel: [ 249.339023] Pid: 3459, comm: malloc2 Not tainted 2.6.29-rc3-mm1 #1
Feb 4 13:16:28 localhost kernel: [ 249.339185] Call Trace:
Feb 4 13:16:28 localhost kernel: [ 249.339202] [<ffffffff8148dda6>] ? _spin_unlock+0x26/0x2a
Feb 4 13:16:28 localhost kernel: [ 249.339210] [<ffffffff8108d48d>] oom_kill_process+0x99/0x272
Feb 4 13:16:28 localhost kernel: [ 249.339214] [<ffffffff8108d918>] ? select_bad_process+0x9d/0xfa
Feb 4 13:16:28 localhost kernel: [ 249.339219] [<ffffffff8108dc8f>] mem_cgroup_out_of_memory+0x65/0x82
Feb 4 13:16:28 localhost kernel: [ 249.339224] [<ffffffff810bd457>] __mem_cgroup_try_charge+0x14c/0x196
Feb 4 13:16:28 localhost kernel: [ 249.339229] [<ffffffff810bdffa>] mem_cgroup_charge_common+0x47/0x72
Feb 4 13:16:28 localhost kernel: [ 249.339234] [<ffffffff810be063>] mem_cgroup_newpage_charge+0x3e/0x4f
Feb 4 13:16:28 localhost kernel: [ 249.339239] [<ffffffff810a05f9>] handle_mm_fault+0x214/0x761
Feb 4 13:16:28 localhost kernel: [ 249.339244] [<ffffffff8149062d>] do_page_fault+0x248/0x25f
Feb 4 13:16:28 localhost kernel: [ 249.339249] [<ffffffff8148e64f>] page_fault+0x1f/0x30
Feb 4 13:16:28 localhost kernel: [ 249.339260] Task in /group_A/01 killed as a result of limit of /group_A
Feb 4 13:16:28 localhost kernel: [ 249.339264] memory: usage 39168kB, limit 40960kB, failcnt 1
Feb 4 13:16:28 localhost kernel: [ 249.339266] memory+swap: usage 40960kB, limit 40960kB, failcnt 15
==
Task in /group_A/01 is killed by mem+swap limit of /group_A.
Yeah, very nice look :) thank you.
BTW, I wonder whether we could show the mount point in the path?
/group_A/01 is /cgroup/group_A/01 and /group_A/ is /cgroup/group_A/ on this system.
Is that very difficult?
Thanks,
-Kame
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 5:24 ` KAMEZAWA Hiroyuki
@ 2009-02-04 6:42 ` Balbir Singh
2009-02-04 6:48 ` Li Zefan
2009-02-04 6:50 ` KAMEZAWA Hiroyuki
From: Balbir Singh @ 2009-02-04 6:42 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Li Zefan, Andrew Morton, linux-kernel, nishimura, linux-mm
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-02-04 14:24:55]:
> On Wed, 4 Feb 2009 09:07:50 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > > > +}
> > > > +
> > > > #endif /* CONFIG_CGROUP_MEM_CONT */
> > > >
> > >
> > > > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > > > +{
> > > > + struct cgroup *task_cgrp;
> > > > + struct cgroup *mem_cgrp;
> > > > + /*
> > > > + * Need a buffer on stack, can't rely on allocations. The code relies
> > >
> > > I think it's in .bss section, but not on stack, and it's better to explain why
> > > the static buffer is safe in the comment.
> > >
> >
> > Yes, it is no longer on stack, in the original patch it was. I'll send
> > an updated patch
> >
> In the newest mmotm, OOM kill message is following.
> ==
> Feb 4 13:16:28 localhost kernel: [ 249.338911] malloc2 invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
> Feb 4 13:16:28 localhost kernel: [ 249.339018] malloc2 cpuset=/ mems_allowed=0
> Feb 4 13:16:28 localhost kernel: [ 249.339023] Pid: 3459, comm: malloc2 Not tainted 2.6.29-rc3-mm1 #1
> Feb 4 13:16:28 localhost kernel: [ 249.339185] Call Trace:
> Feb 4 13:16:28 localhost kernel: [ 249.339202] [<ffffffff8148dda6>] ? _spin_unlock+0x26/0x2a
> Feb 4 13:16:28 localhost kernel: [ 249.339210] [<ffffffff8108d48d>] oom_kill_process+0x99/0x272
> Feb 4 13:16:28 localhost kernel: [ 249.339214] [<ffffffff8108d918>] ? select_bad_process+0x9d/0xfa
> Feb 4 13:16:28 localhost kernel: [ 249.339219] [<ffffffff8108dc8f>] mem_cgroup_out_of_memory+0x65/0x82
> Feb 4 13:16:28 localhost kernel: [ 249.339224] [<ffffffff810bd457>] __mem_cgroup_try_charge+0x14c/0x196
> Feb 4 13:16:28 localhost kernel: [ 249.339229] [<ffffffff810bdffa>] mem_cgroup_charge_common+0x47/0x72
> Feb 4 13:16:28 localhost kernel: [ 249.339234] [<ffffffff810be063>] mem_cgroup_newpage_charge+0x3e/0x4f
> Feb 4 13:16:28 localhost kernel: [ 249.339239] [<ffffffff810a05f9>] handle_mm_fault+0x214/0x761
> Feb 4 13:16:28 localhost kernel: [ 249.339244] [<ffffffff8149062d>] do_page_fault+0x248/0x25f
> Feb 4 13:16:28 localhost kernel: [ 249.339249] [<ffffffff8148e64f>] page_fault+0x1f/0x30
> Feb 4 13:16:28 localhost kernel: [ 249.339260] Task in /group_A/01 killed as a result of limit of /group_A
> Feb 4 13:16:28 localhost kernel: [ 249.339264] memory: usage 39168kB, limit 40960kB, failcnt 1
> Feb 4 13:16:28 localhost kernel: [ 249.339266] memory+swap: usage 40960kB, limit 40960kB, failcnt 15
> ==
> Task in /group_A/01 is killed by mem+swap limit of /group_A.
>
> Yeah, very nice look :) thank you.
>
Welcome! Thanks for the good suggestion earlier.
> BTW, I wonder can't we show the path of mount point ?
> /group_A/01 is /cgroup/group_A/01 and /group_A/ is /cgroup/group_A/ on this system.
> Very difficult ?
>
No, it is not very difficult; we would just need to prepend the mount
point. The reason for not doing it is consistency with the output of
/proc/<pid>/cgroup and other places where cgroup_path prints the path
relative to the mount point. Since we are talking about memory, the
administrator should know where it is mounted. Do you strongly feel
the need to add the mount point? My concern is consistency with other
cgroup output (look at /proc/sched_debug, for example).
--
Balbir
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 6:42 ` Balbir Singh
@ 2009-02-04 6:48 ` Li Zefan
2009-02-04 6:52 ` KAMEZAWA Hiroyuki
2009-02-04 6:50 ` KAMEZAWA Hiroyuki
From: Li Zefan @ 2009-02-04 6:48 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, linux-mm
>> BTW, I wonder can't we show the path of mount point ?
>> /group_A/01 is /cgroup/group_A/01 and /group_A/ is /cgroup/group_A/ on this system.
>> Very difficult ?
>>
>
> No, it is not very difficult, we just need to append the mount point.
> The reason for not doing it is consistency with output of
> /proc/<pid>/cgroup and other places where cgroup_path prints the path
> relative to the mount point. Since we are talking about memory, the
> administrator should know where it is mounted. Do you strongly feel
> the need to add mount point? My concern is consistency with other
> cgroup output (look at /proc/sched_debug) for example.
>
Another reason not to do so is that a given hierarchy can be mounted
at multiple mount points:
# mount -t cgroup -o memory none /mnt
# mount -t cgroup -o memory none /cgroup
# mkdir /mnt/0
Now /mnt/0 is the same directory as /cgroup/0.
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 6:42 ` Balbir Singh
2009-02-04 6:48 ` Li Zefan
@ 2009-02-04 6:50 ` KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki @ 2009-02-04 6:50 UTC (permalink / raw)
To: balbir; +Cc: Li Zefan, Andrew Morton, linux-kernel, nishimura, linux-mm
On Wed, 4 Feb 2009 12:12:49 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-02-04 14:24:55]:
>
> > On Wed, 4 Feb 2009 09:07:50 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > > > +}
> > > > > +
> > > > > #endif /* CONFIG_CGROUP_MEM_CONT */
> > > > >
> > > >
> > > > > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > > > > +{
> > > > > + struct cgroup *task_cgrp;
> > > > > + struct cgroup *mem_cgrp;
> > > > > + /*
> > > > > + * Need a buffer on stack, can't rely on allocations. The code relies
> > > >
> > > > I think it's in .bss section, but not on stack, and it's better to explain why
> > > > the static buffer is safe in the comment.
> > > >
> > >
> > > Yes, it is no longer on stack, in the original patch it was. I'll send
> > > an updated patch
> > >
> > In the newest mmotm, OOM kill message is following.
> > ==
> > Feb 4 13:16:28 localhost kernel: [ 249.338911] malloc2 invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
> > Feb 4 13:16:28 localhost kernel: [ 249.339018] malloc2 cpuset=/ mems_allowed=0
> > Feb 4 13:16:28 localhost kernel: [ 249.339023] Pid: 3459, comm: malloc2 Not tainted 2.6.29-rc3-mm1 #1
> > Feb 4 13:16:28 localhost kernel: [ 249.339185] Call Trace:
> > Feb 4 13:16:28 localhost kernel: [ 249.339202] [<ffffffff8148dda6>] ? _spin_unlock+0x26/0x2a
> > Feb 4 13:16:28 localhost kernel: [ 249.339210] [<ffffffff8108d48d>] oom_kill_process+0x99/0x272
> > Feb 4 13:16:28 localhost kernel: [ 249.339214] [<ffffffff8108d918>] ? select_bad_process+0x9d/0xfa
> > Feb 4 13:16:28 localhost kernel: [ 249.339219] [<ffffffff8108dc8f>] mem_cgroup_out_of_memory+0x65/0x82
> > Feb 4 13:16:28 localhost kernel: [ 249.339224] [<ffffffff810bd457>] __mem_cgroup_try_charge+0x14c/0x196
> > Feb 4 13:16:28 localhost kernel: [ 249.339229] [<ffffffff810bdffa>] mem_cgroup_charge_common+0x47/0x72
> > Feb 4 13:16:28 localhost kernel: [ 249.339234] [<ffffffff810be063>] mem_cgroup_newpage_charge+0x3e/0x4f
> > Feb 4 13:16:28 localhost kernel: [ 249.339239] [<ffffffff810a05f9>] handle_mm_fault+0x214/0x761
> > Feb 4 13:16:28 localhost kernel: [ 249.339244] [<ffffffff8149062d>] do_page_fault+0x248/0x25f
> > Feb 4 13:16:28 localhost kernel: [ 249.339249] [<ffffffff8148e64f>] page_fault+0x1f/0x30
> > Feb 4 13:16:28 localhost kernel: [ 249.339260] Task in /group_A/01 killed as a result of limit of /group_A
> > Feb 4 13:16:28 localhost kernel: [ 249.339264] memory: usage 39168kB, limit 40960kB, failcnt 1
> > Feb 4 13:16:28 localhost kernel: [ 249.339266] memory+swap: usage 40960kB, limit 40960kB, failcnt 15
> > ==
> > Task in /group_A/01 is killed by mem+swap limit of /group_A.
> >
> > Yeah, very nice look :) thank you.
> >
>
> Welcome! Thanks for the good suggestion earlier.
>
> > BTW, I wonder can't we show the path of mount point ?
> > /group_A/01 is /cgroup/group_A/01 and /group_A/ is /cgroup/group_A/ on this system.
> > Very difficult ?
> >
>
> No, it is not very difficult, we just need to append the mount point.
> The reason for not doing it is consistency with output of
> /proc/<pid>/cgroup and other places where cgroup_path prints the path
> relative to the mount point. Since we are talking about memory, the
> administrator should know where it is mounted. Do you strongly feel
> the need to add mount point? My concern is consistency with other
> cgroup output (look at /proc/sched_debug) for example.
>
No, just curious :)
Thanks a lot. Consistency is more important.
-Kame
> --
> Balbir
>
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-04 6:48 ` Li Zefan
@ 2009-02-04 6:52 ` KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki @ 2009-02-04 6:52 UTC (permalink / raw)
To: Li Zefan; +Cc: balbir, Andrew Morton, linux-kernel, nishimura, linux-mm
On Wed, 04 Feb 2009 14:48:58 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> >> BTW, I wonder can't we show the path of mount point ?
> >> /group_A/01 is /cgroup/group_A/01 and /group_A/ is /cgroup/group_A/ on this system.
> >> Very difficult ?
> >>
> >
> > No, it is not very difficult, we just need to append the mount point.
> > The reason for not doing it is consistency with output of
> > /proc/<pid>/cgroup and other places where cgroup_path prints the path
> > relative to the mount point. Since we are talking about memory, the
> > administrator should know where it is mounted. Do you strongly feel
> > the need to add mount point? My concern is consistency with other
> > cgroup output (look at /proc/sched_debug) for example.
> >
>
> Another reason to not do so is, we can mount a specific hierarchy to
> multiple mount points.
> # mount -t cgroup -o memory /mnt
> # mount -t cgroup -o memory /cgroup
> # mkdir /mnt/0
> Now, /mnt/0 is the same with /cgroup/0.
>
Thank you for the clarification. The current logic seems right.
Thanks.
-Kame
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-03 17:21 [-mm patch] Show memcg information during OOM (v3) Balbir Singh
2009-02-03 22:46 ` Andrew Morton
2009-02-04 0:53 ` Li Zefan
@ 2009-02-05 4:00 ` Lai Jiangshan
2009-02-05 4:55 ` KAMEZAWA Hiroyuki
2009-02-06 3:02 ` Li Zefan
From: Lai Jiangshan @ 2009-02-05 4:00 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, lizf,
linux-mm
Balbir Singh wrote:
> Description: Add RSS and swap to OOM output from memcg
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> Changelog v3..v2
> 1. Use static char arrays of size PATH_MAX in order to make
> the OOM message more reliable.
>
> Changelog v2..v1:
>
> 1. Add more information about the task's memcg and the memcg
> over its limit
> 2. Print data in KB
> 3. Move the print routine outside task_lock()
> 4. Use rcu_read_lock() around cgroup_path, strictly speaking it
> is not required, but relying on the current memcg implementation
> is not a good idea.
>
>
> This patch displays memcg values like failcnt, usage and limit
> when an OOM occurs due to memcg.
>
> Thanks go out to Johannes Weiner, Li Zefan, David Rientjes,
> Kamezawa Hiroyuki, Daisuke Nishimura and KOSAKI Motohiro for
> review.
>
> Sample output
> -------------
>
> Task in /a/x killed as a result of limit of /a
> memory: usage 1048576kB, limit 1048576kB, failcnt 4183
> memory+swap: usage 1400964kB, limit 9007199254740991kB, failcnt 0
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
> include/linux/memcontrol.h | 6 ++++
> mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
> mm/oom_kill.c | 1 +
> 3 files changed, 70 insertions(+), 0 deletions(-)
>
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 326f45c..f9a6e78 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -104,6 +104,8 @@ struct zone_reclaim_stat *mem_cgroup_get_reclaim_stat(struct mem_cgroup *memcg,
> struct zone *zone);
> struct zone_reclaim_stat*
> mem_cgroup_get_reclaim_stat_from_page(struct page *page);
> +extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> + struct task_struct *p);
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> extern int do_swap_account;
> @@ -270,6 +272,10 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
> return NULL;
> }
>
> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +{
> +}
> +
> #endif /* CONFIG_CGROUP_MEM_CONT */
>
> #endif /* _LINUX_MEMCONTROL_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8e4be9c..44e053b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -27,6 +27,7 @@
> #include <linux/backing-dev.h>
> #include <linux/bit_spinlock.h>
> #include <linux/rcupdate.h>
> +#include <linux/limits.h>
> #include <linux/mutex.h>
> #include <linux/slab.h>
> #include <linux/swap.h>
> @@ -813,6 +814,68 @@ bool mem_cgroup_oom_called(struct task_struct *task)
> rcu_read_unlock();
> return ret;
> }
> +
> +/**
> > + * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in
> + * read mode.
> + * @memcg: The memory cgroup that went over limit
> + * @p: Task that is going to be killed
> + *
> + * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> + * enabled
> + */
> +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> +{
> + struct cgroup *task_cgrp;
> + struct cgroup *mem_cgrp;
> + /*
> + * Need a buffer on stack, can't rely on allocations. The code relies
> + * on the assumption that OOM is serialized for memory controller.
> + * If this assumption is broken, revisit this code.
> + */
> + static char task_memcg_name[PATH_MAX];
> + static char memcg_name[PATH_MAX];
Is there any lock which protects this static data?
> + int ret;
> +
> + if (!memcg)
> + return;
> +
> + mem_cgrp = memcg->css.cgroup;
> + task_cgrp = mem_cgroup_from_task(p)->css.cgroup;
> +
> + rcu_read_lock();
> + ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
> + if (ret < 0) {
> + /*
> + * Unfortunately, we are unable to convert to a useful name
> + * But we'll still print out the usage information
> + */
> + rcu_read_unlock();
> + goto done;
> + }
> + ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
> + if (ret < 0) {
> + rcu_read_unlock();
> + goto done;
> + }
> +
> + rcu_read_unlock();
IIRC, a preempt_enable() will add about 50 bytes to the kernel size.
I think these lines are also good for readability:
rcu_read_lock();
ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
if (ret >= 0)
ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
rcu_read_unlock();
if (ret < 0) {
/*
* Unfortunately, we are unable to convert to a useful name
* But we'll still print out the usage information
*/
goto done;
}
Lai
> +
> + printk(KERN_INFO "Task in %s killed as a result of limit of %s\n",
> + task_memcg_name, memcg_name);
> +done:
> +
> + printk(KERN_INFO "memory: usage %llukB, limit %llukB, failcnt %llu\n",
> + res_counter_read_u64(&memcg->res, RES_USAGE) >> 10,
> + res_counter_read_u64(&memcg->res, RES_LIMIT) >> 10,
> + res_counter_read_u64(&memcg->res, RES_FAILCNT));
> + printk(KERN_INFO "memory+swap: usage %llukB, limit %llukB, "
> + "failcnt %llu\n",
> + res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
> + res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
> + res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
> +}
> +
> /*
> * Unlike exported interface, "oom" parameter is added. if oom==true,
> * oom-killer can be invoked.
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index d3b9bac..2f3166e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -394,6 +394,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> cpuset_print_task_mems_allowed(current);
> task_unlock(current);
> dump_stack();
> + mem_cgroup_print_oom_info(mem, current);
> show_mem();
> if (sysctl_oom_dump_tasks)
> dump_tasks(mem);
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-05 4:00 ` Lai Jiangshan
@ 2009-02-05 4:55 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-02-05 4:55 UTC (permalink / raw)
To: Lai Jiangshan
Cc: balbir, Andrew Morton, linux-kernel, nishimura, lizf, linux-mm
On Thu, 05 Feb 2009 12:00:05 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
> > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > +{
> > + struct cgroup *task_cgrp;
> > + struct cgroup *mem_cgrp;
> > + /*
> > + * Need a buffer on stack, can't rely on allocations. The code relies
> > + * on the assumption that OOM is serialized for memory controller.
> > + * If this assumption is broken, revisit this code.
> > + */
> > + static char task_memcg_name[PATH_MAX];
> > + static char memcg_name[PATH_MAX];
>
> Is there any lock which protects this static data?
>
As commented, it is serialized by the memory cgroup; see the memcg_tasklist mutex.
> > + int ret;
> > +
> > + if (!memcg)
> > + return;
> > +
> > + mem_cgrp = memcg->css.cgroup;
> > + task_cgrp = mem_cgroup_from_task(p)->css.cgroup;
> > +
> > + rcu_read_lock();
> > + ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
> > + if (ret < 0) {
> > + /*
> > + * Unfortunately, we are unable to convert to a useful name
> > + * But we'll still print out the usage information
> > + */
> > + rcu_read_unlock();
> > + goto done;
> > + }
> > + ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
> > + if (ret < 0) {
> > + rcu_read_unlock();
> > + goto done;
> > + }
> > +
> > + rcu_read_unlock();
>
> IIRC, a preempt_enable() will add about 50 bytes to kernel size.
>
I think the compiler does a good job here...
> I think these lines are also good for readability:
>
> rcu_read_lock();
> ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
> if (ret >= 0)
> ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
> rcu_read_unlock();
>
In the mmotm set, there are not two buffers, just one.
Sorry, if you have comments, a patch against mmotm is welcome.
Thanks,
-Kame
> if (ret < 0) {
> /*
> * Unfortunately, we are unable to convert to a useful name
> * But we'll still print out the usage information
> */
> goto done;
> }
>
> Lai
>
> > +
> > + printk(KERN_INFO "Task in %s killed as a result of limit of %s\n",
> > + task_memcg_name, memcg_name);
> > +done:
> > +
> > + printk(KERN_INFO "memory: usage %llukB, limit %llukB, failcnt %llu\n",
> > + res_counter_read_u64(&memcg->res, RES_USAGE) >> 10,
> > + res_counter_read_u64(&memcg->res, RES_LIMIT) >> 10,
> > + res_counter_read_u64(&memcg->res, RES_FAILCNT));
> > + printk(KERN_INFO "memory+swap: usage %llukB, limit %llukB, "
> > + "failcnt %llu\n",
> > + res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
> > + res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
> > + res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
> > +}
> > +
> > /*
> > * Unlike exported interface, "oom" parameter is added. if oom==true,
> > * oom-killer can be invoked.
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index d3b9bac..2f3166e 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -394,6 +394,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> > cpuset_print_task_mems_allowed(current);
> > task_unlock(current);
> > dump_stack();
> > + mem_cgroup_print_oom_info(mem, current);
> > show_mem();
> > if (sysctl_oom_dump_tasks)
> > dump_tasks(mem);
> >
>
>
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-03 22:46 ` Andrew Morton
2009-02-04 3:36 ` Balbir Singh
@ 2009-02-05 21:55 ` Andrew Morton
2009-02-06 2:26 ` Balbir Singh
2009-02-06 7:01 ` Balbir Singh
1 sibling, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2009-02-05 21:55 UTC (permalink / raw)
To: balbir, kamezawa.hiroyu, linux-kernel, nishimura, lizf, linux-mm
On Tue, 3 Feb 2009 14:46:47 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:
> > +/**
> > + * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in
> > + * read mode.
> > + * @memcg: The memory cgroup that went over limit
> > + * @p: Task that is going to be killed
> > + *
> > + * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> > + * enabled
> > + */
> > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > +{
> > + struct cgroup *task_cgrp;
> > + struct cgroup *mem_cgrp;
> > + /*
> > + * Need a buffer on stack, can't rely on allocations. The code relies
> > + * on the assumption that OOM is serialized for memory controller.
> > + * If this assumption is broken, revisit this code.
> > + */
> > + static char task_memcg_name[PATH_MAX];
> > + static char memcg_name[PATH_MAX];
>
> I don't think we need both of these. With a bit of shuffling we could
> reuse the single buffer?
ping?
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-05 21:55 ` Andrew Morton
@ 2009-02-06 2:26 ` Balbir Singh
2009-02-06 7:01 ` Balbir Singh
1 sibling, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2009-02-06 2:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: kamezawa.hiroyu, linux-kernel, nishimura, lizf, linux-mm
* Andrew Morton <akpm@linux-foundation.org> [2009-02-05 13:55:54]:
> On Tue, 3 Feb 2009 14:46:47 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > > +/**
> > > + * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in
> > > + * read mode.
> > > + * @memcg: The memory cgroup that went over limit
> > > + * @p: Task that is going to be killed
> > > + *
> > > + * NOTE: @memcg and @p's mem_cgroup can be different when hierarchy is
> > > + * enabled
> > > + */
> > > +void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> > > +{
> > > + struct cgroup *task_cgrp;
> > > + struct cgroup *mem_cgrp;
> > > + /*
> > > + * Need a buffer on stack, can't rely on allocations. The code relies
> > > + * on the assumption that OOM is serialized for memory controller.
> > > + * If this assumption is broken, revisit this code.
> > > + */
> > > + static char task_memcg_name[PATH_MAX];
> > > + static char memcg_name[PATH_MAX];
> >
> > I don't think we need both of these. With a bit of shuffling we could
> > reuse the single buffer?
>
> ping?
>
We can use a single buffer; I'll post a patch to fix it.
--
Balbir
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-03 17:21 [-mm patch] Show memcg information during OOM (v3) Balbir Singh
` (2 preceding siblings ...)
2009-02-05 4:00 ` Lai Jiangshan
@ 2009-02-06 3:02 ` Li Zefan
2009-02-06 3:10 ` Li Zefan
3 siblings, 1 reply; 18+ messages in thread
From: Li Zefan @ 2009-02-06 3:02 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, linux-mm
> + mem_cgrp = memcg->css.cgroup;
> + task_cgrp = mem_cgroup_from_task(p)->css.cgroup;
I just noticed that since v2, the task's cgroup is also printed. Two issues here:
1. this is better: task_cgrp = task_subsys_state(p, mem_cgroup_subsys_id);
2. getting cgroup from a task should be protected by task_lock or rcu_read_lock,
so we can put the above statement inside rcu_read_lock below.
> +
> + rcu_read_lock();
> + ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
> + if (ret < 0) {
> + /*
> + * Unfortunately, we are unable to convert to a useful name
> + * But we'll still print out the usage information
> + */
> + rcu_read_unlock();
> + goto done;
> + }
> + ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
> + if (ret < 0) {
> + rcu_read_unlock();
> + goto done;
> + }
> +
> + rcu_read_unlock();
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-06 3:02 ` Li Zefan
@ 2009-02-06 3:10 ` Li Zefan
0 siblings, 0 replies; 18+ messages in thread
From: Li Zefan @ 2009-02-06 3:10 UTC (permalink / raw)
To: balbir
Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel, nishimura, linux-mm
Li Zefan wrote:
>> + mem_cgrp = memcg->css.cgroup;
>> + task_cgrp = mem_cgroup_from_task(p)->css.cgroup;
>
> I just noticed that since v2, the task's cgroup is also printed. Two issues here:
>
> 1. this is better: task_cgrp = task_subsys_state(p, mem_cgroup_subsys_id);
sorry, s/task_subsys_state/task_cgroup/
> 2. getting cgroup from a task should be protected by task_lock or rcu_read_lock,
> so we can put the above statement inside rcu_read_lock below.
* Re: [-mm patch] Show memcg information during OOM (v3)
2009-02-05 21:55 ` Andrew Morton
2009-02-06 2:26 ` Balbir Singh
@ 2009-02-06 7:01 ` Balbir Singh
1 sibling, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2009-02-06 7:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: kamezawa.hiroyu, linux-kernel, nishimura, lizf, linux-mm
* Andrew Morton <akpm@linux-foundation.org> [2009-02-05 13:55:54]:
>
> ping?
>
Andrew,
This patch fixes issues reported with the OOM printing patches.
From: Balbir Singh <balbir@linux.vnet.ibm.com>
1. It reduces the static buffers from 2 to 1
2. It fixes comments that incorrectly indicate that the buffer is on stack
This patch fails checkpatch.pl due to the split printk message;
I could not find an easy way to avoid that.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
mm/memcontrol.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 839258e..9180702 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -736,22 +736,23 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
struct cgroup *task_cgrp;
struct cgroup *mem_cgrp;
/*
- * Need a buffer on stack, can't rely on allocations. The code relies
+ * Need a buffer in BSS, can't rely on allocations. The code relies
* on the assumption that OOM is serialized for memory controller.
* If this assumption is broken, revisit this code.
*/
- static char task_memcg_name[PATH_MAX];
static char memcg_name[PATH_MAX];
int ret;
if (!memcg)
return;
- mem_cgrp = memcg->css.cgroup;
- task_cgrp = mem_cgroup_from_task(p)->css.cgroup;
rcu_read_lock();
- ret = cgroup_path(task_cgrp, task_memcg_name, PATH_MAX);
+
+ mem_cgrp = memcg->css.cgroup;
+ task_cgrp = task_cgroup(p, mem_cgroup_subsys_id);
+
+ ret = cgroup_path(task_cgrp, memcg_name, PATH_MAX);
if (ret < 0) {
/*
* Unfortunately, we are unable to convert to a useful name
@@ -760,16 +761,22 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
rcu_read_unlock();
goto done;
}
+ rcu_read_unlock();
+
+ printk(KERN_INFO "Task in %s killed", memcg_name);
+
+ rcu_read_lock();
ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
if (ret < 0) {
rcu_read_unlock();
goto done;
}
-
rcu_read_unlock();
- printk(KERN_INFO "Task in %s killed as a result of limit of %s\n",
- task_memcg_name, memcg_name);
+ /*
+ * Continues from above, so we don't need a KERN_ level
+ */
+ printk(" as a result of limit of %s\n", memcg_name);
done:
printk(KERN_INFO "memory: usage %llukB, limit %llukB, failcnt %llu\n",
--
Balbir
end of thread, other threads:[~2009-02-06 7:01 UTC | newest]
Thread overview: 18+ messages
2009-02-03 17:21 [-mm patch] Show memcg information during OOM (v3) Balbir Singh
2009-02-03 22:46 ` Andrew Morton
2009-02-04 3:36 ` Balbir Singh
2009-02-05 21:55 ` Andrew Morton
2009-02-06 2:26 ` Balbir Singh
2009-02-06 7:01 ` Balbir Singh
2009-02-04 0:53 ` Li Zefan
2009-02-04 1:35 ` Li Zefan
2009-02-04 3:37 ` Balbir Singh
2009-02-04 5:24 ` KAMEZAWA Hiroyuki
2009-02-04 6:42 ` Balbir Singh
2009-02-04 6:48 ` Li Zefan
2009-02-04 6:52 ` KAMEZAWA Hiroyuki
2009-02-04 6:50 ` KAMEZAWA Hiroyuki
2009-02-05 4:00 ` Lai Jiangshan
2009-02-05 4:55 ` KAMEZAWA Hiroyuki
2009-02-06 3:02 ` Li Zefan
2009-02-06 3:10 ` Li Zefan