linux-mm.kvack.org archive mirror
* [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
@ 2019-02-20  3:22 Stepan Bujnak
  2019-02-20  4:09 ` Randy Dunlap
  2019-02-20  6:49 ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Stepan Bujnak @ 2019-02-20  3:22 UTC
  To: linux-mm; +Cc: corbet, mcgrof, hannes, stepan

When oom_dump_tasks is enabled, this option will try to display task
cmdline instead of the command name in the system-wide task dump.

This is useful in some cases, e.g. on a postgres server. If the OOM killer
is invoked, the dump shows a bunch of tasks called 'postgres'. With this
option enabled, it shows additional information such as the database
user, the database name and what each process is currently doing.

Another example is python. Instead of just 'python', the dump also shows
the name of the script currently being executed.

Signed-off-by: Stepan Bujnak <stepan@pex.com>
---
 Documentation/sysctl/vm.txt | 10 ++++++++++
 include/linux/oom.h         |  1 +
 kernel/sysctl.c             |  7 +++++++
 mm/oom_kill.c               | 20 ++++++++++++++++++--
 4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 187ce4f599a2..74278c8c30d2 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -50,6 +50,7 @@ Currently, these files are in /proc/sys/vm:
 - nr_trim_pages         (only if CONFIG_MMU=n)
 - numa_zonelist_order
 - oom_dump_tasks
+- oom_dump_task_cmdline
 - oom_kill_allocating_task
 - overcommit_kbytes
 - overcommit_memory
@@ -639,6 +640,15 @@ The default value is 1 (enabled).
 
 ==============================================================
 
+oom_dump_task_cmdline
+
+When oom_dump_tasks is enabled, this option will try to display task cmdline
+instead of the command name in the system-wide task dump.
+
+The default value is 0 (disabled).
+
+==============================================================
+
 oom_kill_allocating_task
 
 This enables or disables killing the OOM-triggering task in
diff --git a/include/linux/oom.h b/include/linux/oom.h
index d07992009265..461b15b3b695 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -125,6 +125,7 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
 
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
+extern int sysctl_oom_dump_task_cmdline;
 extern int sysctl_oom_kill_allocating_task;
 extern int sysctl_panic_on_oom;
 #endif /* _INCLUDE_LINUX_OOM_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ba4d9e85feb8..4edc5f8e6cf9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1288,6 +1288,13 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "oom_dump_task_cmdline",
+		.data		= &sysctl_oom_dump_task_cmdline,
+		.maxlen		= sizeof(sysctl_oom_dump_task_cmdline),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{
 		.procname	= "overcommit_ratio",
 		.data		= &sysctl_overcommit_ratio,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 26ea8636758f..736fa0a6ab8d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -41,6 +41,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/string_helpers.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -52,6 +53,7 @@
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
+int sysctl_oom_dump_task_cmdline;
 
 /*
  * Serializes oom killer invocations (out_of_memory()) from all contexts to
@@ -404,9 +406,18 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
 	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
 	rcu_read_lock();
 	for_each_process(p) {
+		char *name, *cmd = NULL;
+
 		if (oom_unkillable_task(p, memcg, nodemask))
 			continue;
 
+		/*
+		 * This needs to be done before calling find_lock_task_mm()
+		 * since both grab a task lock which would result in deadlock.
+		 */
+		if (sysctl_oom_dump_task_cmdline)
+			cmd = kstrdup_quotable_cmdline(p, GFP_KERNEL);
+
 		task = find_lock_task_mm(p);
 		if (!task) {
 			/*
@@ -414,16 +425,21 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
 			 * detached their mm's.  There's no need to report
 			 * them; they can't be oom killed anyway.
 			 */
-			continue;
+			goto done;
 		}
 
+		name = cmd ? cmd : task->comm;
+
 		pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
 			task->pid, from_kuid(&init_user_ns, task_uid(task)),
 			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 			mm_pgtables_bytes(task->mm),
 			get_mm_counter(task->mm, MM_SWAPENTS),
-			task->signal->oom_score_adj, task->comm);
+			task->signal->oom_score_adj, name);
 		task_unlock(task);
+
+done:
+		kfree(cmd);
 	}
 	rcu_read_unlock();
 }
-- 
2.17.1



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  3:22 [PATCH] mm/oom: added option 'oom_dump_task_cmdline' Stepan Bujnak
@ 2019-02-20  4:09 ` Randy Dunlap
  2019-02-20  4:30   ` Bujnak, Stepan
  2019-02-20  6:49 ` Michal Hocko
  1 sibling, 1 reply; 8+ messages in thread
From: Randy Dunlap @ 2019-02-20  4:09 UTC
  To: Stepan Bujnak, linux-mm; +Cc: corbet, mcgrof, hannes

Hi,

Spell it out correctly (2 places):


On 2/19/19 7:22 PM, Stepan Bujnak wrote:
> When oom_dump_tasks is enabled, this option will try to display task

  When oom_dump_task_cmdline is enabled,

> cmdline instead of the command name in the system-wide task dump.
> 
> This is useful in some cases, e.g. on a postgres server. If the OOM killer
> is invoked, the dump shows a bunch of tasks called 'postgres'. With this
> option enabled, it shows additional information such as the database
> user, the database name and what each process is currently doing.
>
> Another example is python. Instead of just 'python', the dump also shows
> the name of the script currently being executed.
> 
> Signed-off-by: Stepan Bujnak <stepan@pex.com>
> ---
>  Documentation/sysctl/vm.txt | 10 ++++++++++
>  include/linux/oom.h         |  1 +
>  kernel/sysctl.c             |  7 +++++++
>  mm/oom_kill.c               | 20 ++++++++++++++++++--
>  4 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 187ce4f599a2..74278c8c30d2 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -50,6 +50,7 @@ Currently, these files are in /proc/sys/vm:
>  - nr_trim_pages         (only if CONFIG_MMU=n)
>  - numa_zonelist_order
>  - oom_dump_tasks
> +- oom_dump_task_cmdline
>  - oom_kill_allocating_task
>  - overcommit_kbytes
>  - overcommit_memory
> @@ -639,6 +640,15 @@ The default value is 1 (enabled).
>  
>  ==============================================================
>  
> +oom_dump_task_cmdline
> +
> +When oom_dump_tasks is enabled, this option will try to display task cmdline

   When oom_dump_task_cmdline is enabled,

> +instead of the command name in the system-wide task dump.
> +
> +The default value is 0 (disabled).
> +
> +==============================================================
> +
>  oom_kill_allocating_task
>  
>  This enables or disables killing the OOM-triggering task in
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index d07992009265..461b15b3b695 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -125,6 +125,7 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>  
>  /* sysctls */
>  extern int sysctl_oom_dump_tasks;
> +extern int sysctl_oom_dump_task_cmdline;
>  extern int sysctl_oom_kill_allocating_task;
>  extern int sysctl_panic_on_oom;
>  #endif /* _INCLUDE_LINUX_OOM_H */
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ba4d9e85feb8..4edc5f8e6cf9 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1288,6 +1288,13 @@ static struct ctl_table vm_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +	{
> +		.procname	= "oom_dump_task_cmdline",
> +		.data		= &sysctl_oom_dump_task_cmdline,
> +		.maxlen		= sizeof(sysctl_oom_dump_task_cmdline),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
>  	{
>  		.procname	= "overcommit_ratio",
>  		.data		= &sysctl_overcommit_ratio,


thanks.
-- 
~Randy



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  4:09 ` Randy Dunlap
@ 2019-02-20  4:30   ` Bujnak, Stepan
  2019-02-20  5:56     ` Randy Dunlap
  0 siblings, 1 reply; 8+ messages in thread
From: Bujnak, Stepan @ 2019-02-20  4:30 UTC
  To: Randy Dunlap; +Cc: linux-mm, Jonathan Corbet, mcgrof, hannes

On Wed, Feb 20, 2019 at 5:10 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> Hi,
>
> Spell it out correctly (2 places):
This is not a typo. It refers to the oom_dump_tasks option: when that
option is enabled, this option (oom_dump_task_cmdline) makes the dump
display the task cmdline instead of the task name.



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  4:30   ` Bujnak, Stepan
@ 2019-02-20  5:56     ` Randy Dunlap
  0 siblings, 0 replies; 8+ messages in thread
From: Randy Dunlap @ 2019-02-20  5:56 UTC
  To: Bujnak, Stepan; +Cc: linux-mm, Jonathan Corbet, mcgrof, hannes

On 2/19/19 8:30 PM, Bujnak, Stepan wrote:
> On Wed, Feb 20, 2019 at 5:10 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>> Hi,
>>
>> Spell it out correctly (2 places):
> This is not a typo. It refers to the oom_dump_tasks option: when that
> option is enabled, this option (oom_dump_task_cmdline) makes the dump
> display the task cmdline instead of the task name.
>>

OK, thanks for clarifying.



-- 
~Randy



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  3:22 [PATCH] mm/oom: added option 'oom_dump_task_cmdline' Stepan Bujnak
  2019-02-20  4:09 ` Randy Dunlap
@ 2019-02-20  6:49 ` Michal Hocko
  2019-02-20  8:37   ` Bujnak, Stepan
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2019-02-20  6:49 UTC
  To: Stepan Bujnak; +Cc: linux-mm, corbet, mcgrof, hannes

On Wed 20-02-19 04:22:45, Stepan Bujnak wrote:
> When oom_dump_tasks is enabled, this option will try to display task
> cmdline instead of the command name in the system-wide task dump.
> 
> This is useful in some cases, e.g. on a postgres server. If the OOM killer
> is invoked, the dump shows a bunch of tasks called 'postgres'. With this
> option enabled, it shows additional information such as the database
> user, the database name and what each process is currently doing.
>
> Another example is python. Instead of just 'python', the dump also shows
> the name of the script currently being executed.

The OOM report is already quite large, and this would add much more
output for some workloads. Printing from this context is already quite
a problem.
 
> Signed-off-by: Stepan Bujnak <stepan@pex.com>
> ---
>  Documentation/sysctl/vm.txt | 10 ++++++++++
>  include/linux/oom.h         |  1 +
>  kernel/sysctl.c             |  7 +++++++
>  mm/oom_kill.c               | 20 ++++++++++++++++++--
>  4 files changed, 36 insertions(+), 2 deletions(-)
> 
[...]
> @@ -404,9 +406,18 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
>  	rcu_read_lock();
>  	for_each_process(p) {
> +		char *name, *cmd = NULL;
> +
>  		if (oom_unkillable_task(p, memcg, nodemask))
>  			continue;
>  
> +		/*
> +		 * This needs to be done before calling find_lock_task_mm()
> +		 * since both grab a task lock which would result in deadlock.
> +		 */
> +		if (sysctl_oom_dump_task_cmdline)
> +			cmd = kstrdup_quotable_cmdline(p, GFP_KERNEL);
> +
>  		task = find_lock_task_mm(p);
>  		if (!task) {
>  			/*
You are trying to allocate from the OOM context. That is a big no-no.
Not to mention that this is deadlock prone, because get_cmdline() needs
mmap_sem and the allocating context might already hold the lock. So the
patch is simply wrong.

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  6:49 ` Michal Hocko
@ 2019-02-20  8:37   ` Bujnak, Stepan
  2019-02-20 10:00     ` Tetsuo Handa
  2019-02-20 10:01     ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Bujnak, Stepan @ 2019-02-20  8:37 UTC
  To: Michal Hocko; +Cc: linux-mm, Jonathan Corbet, mcgrof, hannes

On Wed, Feb 20, 2019 at 7:49 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 20-02-19 04:22:45, Stepan Bujnak wrote:
> > When oom_dump_tasks is enabled, this option will try to display task
> > cmdline instead of the command name in the system-wide task dump.
> >
> > This is useful in some cases, e.g. on a postgres server. If the OOM killer
> > is invoked, the dump shows a bunch of tasks called 'postgres'. With this
> > option enabled, it shows additional information such as the database
> > user, the database name and what each process is currently doing.
> >
> > Another example is python. Instead of just 'python', the dump also shows
> > the name of the script currently being executed.
>
> The OOM report is already quite large, and this would add much more
> output for some workloads. Printing from this context is already quite
> a problem.
>

The option defaults to 0 (disabled), so most workloads wouldn't be
affected. As an alternative, the cmdline could be printed only for the
victim task in the OOM summary.

> > Signed-off-by: Stepan Bujnak <stepan@pex.com>
> > ---
> >  Documentation/sysctl/vm.txt | 10 ++++++++++
> >  include/linux/oom.h         |  1 +
> >  kernel/sysctl.c             |  7 +++++++
> >  mm/oom_kill.c               | 20 ++++++++++++++++++--
> >  4 files changed, 36 insertions(+), 2 deletions(-)
> >
> [...]
> > @@ -404,9 +406,18 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
> >       pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
> >       rcu_read_lock();
> >       for_each_process(p) {
> > +             char *name, *cmd = NULL;
> > +
> >               if (oom_unkillable_task(p, memcg, nodemask))
> >                       continue;
> >
> > +             /*
> > +              * This needs to be done before calling find_lock_task_mm()
> > +              * since both grab a task lock which would result in deadlock.
> > +              */
> > +             if (sysctl_oom_dump_task_cmdline)
> > +                     cmd = kstrdup_quotable_cmdline(p, GFP_KERNEL);
> > +
> >               task = find_lock_task_mm(p);
> >               if (!task) {
> >                       /*
> You are trying to allocate from the OOM context. That is a big no-no.
> Not to mention that this is deadlock prone, because get_cmdline() needs
> mmap_sem and the allocating context might already hold the lock. So the
> patch is simply wrong.
>

Thanks for the notes. I understand how allocating from the OOM context
is a problem. However, I still believe that this would be helpful
for debugging OOM kills, since task->comm is often not descriptive
enough. Would it help if, instead of calling kstrdup_quotable_cmdline(),
which allocates the buffer on the heap, I called get_cmdline() directly,
passing it a stack-allocated buffer of a fixed size, e.g. 256?

> --
> Michal Hocko
> SUSE Labs



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  8:37   ` Bujnak, Stepan
@ 2019-02-20 10:00     ` Tetsuo Handa
  2019-02-20 10:01     ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Tetsuo Handa @ 2019-02-20 10:00 UTC
  To: Bujnak, Stepan; +Cc: Michal Hocko, linux-mm, Jonathan Corbet, mcgrof, hannes

On 2019/02/20 17:37, Bujnak, Stepan wrote:
>>> @@ -404,9 +406,18 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>>>       pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
>>>       rcu_read_lock();
>>>       for_each_process(p) {
>>> +             char *name, *cmd = NULL;
>>> +
>>>               if (oom_unkillable_task(p, memcg, nodemask))
>>>                       continue;
>>>
>>> +             /*
>>> +              * This needs to be done before calling find_lock_task_mm()
>>> +              * since both grab a task lock which would result in deadlock.
>>> +              */
>>> +             if (sysctl_oom_dump_task_cmdline)
>>> +                     cmd = kstrdup_quotable_cmdline(p, GFP_KERNEL);
>>> +
>>>               task = find_lock_task_mm(p);
>>>               if (!task) {
>>>                       /*
>> You are trying to allocate from the OOM context. That is a big no-no.
>> Not to mention that this is deadlock prone, because get_cmdline() needs
>> mmap_sem and the allocating context might already hold the lock. So the
>> patch is simply wrong.
>>
> 
> Thanks for the notes. I understand how allocating from the OOM context
> is a problem. However, I still believe that this would be helpful
> for debugging OOM kills, since task->comm is often not descriptive
> enough. Would it help if, instead of calling kstrdup_quotable_cmdline(),
> which allocates the buffer on the heap, I called get_cmdline() directly,
> passing it a stack-allocated buffer of a fixed size, e.g. 256?

You made three errors. The first is that a GFP_KERNEL allocation inside
rcu_read_lock()/rcu_read_unlock() is not permitted. The second is that a
GFP_KERNEL allocation with oom_lock held is not permitted. The third is
that somebody might already be holding p->mm->mmap_sem for write when
get_cmdline() tries to take it for read. That is, your patch can't work
(even if you update it to use a static buffer).



* Re: [PATCH] mm/oom: added option 'oom_dump_task_cmdline'
  2019-02-20  8:37   ` Bujnak, Stepan
  2019-02-20 10:00     ` Tetsuo Handa
@ 2019-02-20 10:01     ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2019-02-20 10:01 UTC
  To: Bujnak, Stepan; +Cc: linux-mm, Jonathan Corbet, mcgrof, hannes

On Wed 20-02-19 09:37:56, Bujnak, Stepan wrote:
> On Wed, Feb 20, 2019 at 7:49 AM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > You are trying to allocate from the OOM context. That is a big no-no.
> > Not to mention that this is deadlock prone, because get_cmdline() needs
> > mmap_sem and the allocating context might already hold the lock. So the
> > patch is simply wrong.
> >
> 
> Thanks for the notes. I understand how allocating from the OOM context
> is a problem. However, I still believe that this would be helpful
> for debugging OOM kills, since task->comm is often not descriptive
> enough. Would it help if, instead of calling kstrdup_quotable_cmdline(),
> which allocates the buffer on the heap, I called get_cmdline() directly,
> passing it a stack-allocated buffer of a fixed size, e.g. 256?

No, it wouldn't, because get_cmdline() takes the mmap_sem lock, as
already pointed out.

Please also note that the cmdline might be considered security/privacy
sensitive information, and dumping it to the log sounds like a bad idea
in general.
-- 
Michal Hocko
SUSE Labs

