From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Nick Piggin <npiggin@suse.de>,
Oleg Nesterov <oleg@redhat.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-mm@kvack.org
Subject: Re: [patch 09/18] oom: select task from tasklist for mempolicy ooms
Date: Tue, 8 Jun 2010 17:46:45 -0700 (PDT) [thread overview]
Message-ID: <alpine.DEB.2.00.1006081741150.19582@chino.kir.corp.google.com> (raw)
In-Reply-To: <20100608140818.b413c335.akpm@linux-foundation.org>
On Tue, 8 Jun 2010, Andrew Morton wrote:
> > The oom killer presently kills current whenever there is no more memory
> > free or reclaimable on its mempolicy's nodes. There is no guarantee that
> > current is a memory-hogging task or that killing it will free any
> > substantial amount of memory, however.
> >
> > In such situations, it is better to scan the tasklist for nodes that are
> > allowed to allocate on current's set of nodes and kill the task with the
> > highest badness() score. This ensures that the most memory-hogging task,
> > or the one configured by the user with /proc/pid/oom_adj, is always
> > selected in such scenarios.
> >
> >
> > ...
> >
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -27,6 +27,7 @@
> > #include <linux/module.h>
> > #include <linux/notifier.h>
> > #include <linux/memcontrol.h>
> > +#include <linux/mempolicy.h>
> > #include <linux/security.h>
> >
> > int sysctl_panic_on_oom;
> > @@ -36,20 +37,36 @@ static DEFINE_SPINLOCK(zone_scan_lock);
> > /* #define DEBUG */
> >
> > /*
> > - * Is all threads of the target process nodes overlap ours?
> > + * Do all threads of the target process overlap our allowed nodes?
> > + * @tsk: task struct of which task to consider
> > + * @mask: nodemask passed to page allocator for mempolicy ooms
>
> The comment uses kerneldoc annotation but isn't a kerneldoc comment.
>
I'll fix it.
> > */
> > -static int has_intersects_mems_allowed(struct task_struct *tsk)
> > +static bool has_intersects_mems_allowed(struct task_struct *tsk,
> > + const nodemask_t *mask)
> > {
> > - struct task_struct *t;
> > + struct task_struct *start = tsk;
> >
> > - t = tsk;
> > do {
> > - if (cpuset_mems_allowed_intersects(current, t))
> > - return 1;
> > - t = next_thread(t);
> > - } while (t != tsk);
> > -
> > - return 0;
> > + if (mask) {
> > + /*
> > + * If this is a mempolicy constrained oom, tsk's
> > + * cpuset is irrelevant. Only return true if its
> > + * mempolicy intersects current, otherwise it may be
> > + * needlessly killed.
> > + */
> > + if (mempolicy_nodemask_intersects(tsk, mask))
> > + return true;
>
> The comment refers to `current' but the code does not?
>
mempolicy_nodemask_intersects() compares tsk's mempolicy to current's, we
don't need to pass current into the function (and we optimize for that
since we don't need to do task_lock(current): nothing else can change its
mempolicy).
> > + } else {
> > + /*
> > + * This is not a mempolicy constrained oom, so only
> > + * check the mems of tsk's cpuset.
> > + */
>
> The comment doesn't refer to `current', but the code does. Confused.
>
This simply compares the cpuset mems_allowed of both tasks passed into the
function.
> > + if (cpuset_mems_allowed_intersects(current, tsk))
> > + return true;
> > + }
> > + tsk = next_thread(tsk);
>
> hm, next_thread() uses list_entry_rcu(). What are the locking rules
> here? It's one of both of rcu_read_lock() and read_lock(&tasklist_lock),
> I think?
>
Oleg addressed this in his response.
> > + } while (tsk != start);
> > + return false;
> > }
>
> This is all bloat and overhead for non-NUMA builds. I doubt if gcc is
> able to eliminate the task_struct walk (although I didn't check).
>
> The function isn't oom-killer-specific at all - give it a better name
> then move it to mempolicy.c or similar? If so, the text "oom"
> shouldn't appear in the comments.
>
It's the only place where we want to filter tasks based on whether they
share mempolicy nodes or cpuset mems, though, so I think it's
appropriately placed in mm/oom_kill.c. I agree that we can add a
#ifndef CONFIG_NUMA variant and I'll do so, thanks.
> >
> > ...
> >
> > @@ -676,24 +699,19 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> > */
> > constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
> > read_lock(&tasklist_lock);
> > -
> > - switch (constraint) {
> > - case CONSTRAINT_MEMORY_POLICY:
> > - oom_kill_process(current, gfp_mask, order, 0, NULL,
> > - "No available memory (MPOL_BIND)");
> > - break;
> > -
> > - case CONSTRAINT_NONE:
> > - if (sysctl_panic_on_oom) {
> > + if (unlikely(sysctl_panic_on_oom)) {
> > + /*
> > + * panic_on_oom only affects CONSTRAINT_NONE, the kernel
> > + * should not panic for cpuset or mempolicy induced memory
> > + * failures.
> > + */
>
> This wasn't changelogged?
>
It's not a functional change, sysctl_panic_on_oom == 2 is already handled
earlier in the function. This was intended to elaborate on why we're only
concerned about CONSTRAINT_NONE here since the switch statement was
removed.
> > + if (constraint == CONSTRAINT_NONE) {
> > dump_header(NULL, gfp_mask, order, NULL);
> > - panic("out of memory. panic_on_oom is selected\n");
> > + read_unlock(&tasklist_lock);
> > + panic("Out of memory: panic_on_oom is enabled\n");
> > }
> > - /* Fall-through */
> > - case CONSTRAINT_CPUSET:
> > - __out_of_memory(gfp_mask, order);
> > - break;
> > }
> > -
> > + __out_of_memory(gfp_mask, order, constraint, nodemask);
> > read_unlock(&tasklist_lock);
> >
> > /*
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-06-09 0:46 UTC|newest]
Thread overview: 104+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-06-06 22:33 [patch 00/18] oom killer rewrite David Rientjes
2010-06-06 22:34 ` [patch 01/18] oom: check PF_KTHREAD instead of !mm to skip kthreads David Rientjes
2010-06-07 12:12 ` Balbir Singh
2010-06-07 19:50 ` David Rientjes
2010-06-08 19:33 ` Andrew Morton
2010-06-08 23:40 ` David Rientjes
2010-06-08 23:52 ` Andrew Morton
2010-06-06 22:34 ` [patch 02/18] oom: introduce find_lock_task_mm() to fix !mm false positives David Rientjes
2010-06-07 12:58 ` Balbir Singh
2010-06-07 13:49 ` Minchan Kim
2010-06-07 19:49 ` David Rientjes
2010-06-08 19:42 ` Andrew Morton
2010-06-08 20:14 ` Oleg Nesterov
2010-06-08 20:17 ` Oleg Nesterov
2010-06-08 21:34 ` Andrew Morton
2010-06-08 23:50 ` David Rientjes
2010-06-06 22:34 ` [patch 03/18] oom: dump_tasks use find_lock_task_mm too David Rientjes
2010-06-08 19:55 ` Andrew Morton
2010-06-09 0:06 ` David Rientjes
2010-06-06 22:34 ` [patch 04/18] oom: PF_EXITING check should take mm into account David Rientjes
2010-06-08 20:00 ` Andrew Morton
2010-06-06 22:34 ` [patch 05/18] oom: give current access to memory reserves if it has been killed David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 18:47 ` David Rientjes
2010-06-14 11:08 ` KOSAKI Motohiro
2010-06-08 20:12 ` Andrew Morton
2010-06-13 11:24 ` KOSAKI Motohiro
2010-06-08 20:08 ` Andrew Morton
2010-06-09 0:14 ` David Rientjes
2010-06-06 22:34 ` [patch 06/18] oom: avoid sending exiting tasks a SIGKILL David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 18:48 ` David Rientjes
2010-06-08 20:17 ` Andrew Morton
2010-06-08 20:26 ` Oleg Nesterov
2010-06-09 6:32 ` David Rientjes
2010-06-09 16:25 ` Oleg Nesterov
2010-06-09 19:44 ` David Rientjes
2010-06-09 20:14 ` Oleg Nesterov
2010-06-10 0:15 ` KAMEZAWA Hiroyuki
2010-06-10 1:21 ` Oleg Nesterov
2010-06-10 1:43 ` KAMEZAWA Hiroyuki
2010-06-10 1:51 ` Oleg Nesterov
2010-06-06 22:34 ` [patch 07/18] oom: filter tasks not sharing the same cpuset David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 18:51 ` David Rientjes
2010-06-08 19:27 ` Andrew Morton
2010-06-13 11:24 ` KOSAKI Motohiro
2010-07-02 22:35 ` Andrew Morton
2010-07-04 22:08 ` David Rientjes
2010-07-09 3:00 ` KOSAKI Motohiro
2010-06-08 20:23 ` Andrew Morton
2010-06-09 0:25 ` David Rientjes
2010-06-06 22:34 ` [patch 08/18] oom: sacrifice child with highest badness score for parent David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 18:53 ` David Rientjes
2010-06-08 20:33 ` Andrew Morton
2010-06-09 0:30 ` David Rientjes
2010-06-06 22:34 ` [patch 09/18] oom: select task from tasklist for mempolicy ooms David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 21:08 ` Andrew Morton
2010-06-08 21:17 ` Oleg Nesterov
2010-06-09 0:46 ` David Rientjes [this message]
2010-06-08 23:43 ` Andrew Morton
2010-06-09 0:40 ` David Rientjes
2010-06-06 22:34 ` [patch 10/18] oom: enable oom tasklist dump by default David Rientjes
2010-06-08 11:42 ` KOSAKI Motohiro
2010-06-08 18:56 ` David Rientjes
2010-06-08 21:13 ` Andrew Morton
2010-06-09 0:52 ` David Rientjes
2010-06-06 22:34 ` [patch 11/18] oom: avoid oom killer for lowmem allocations David Rientjes
2010-06-08 11:42 ` KOSAKI Motohiro
2010-06-08 21:19 ` Andrew Morton
2010-06-06 22:34 ` [patch 12/18] oom: extract panic helper function David Rientjes
2010-06-08 11:42 ` KOSAKI Motohiro
2010-06-06 22:34 ` [patch 13/18] oom: remove special handling for pagefault ooms David Rientjes
2010-06-08 11:42 ` KOSAKI Motohiro
2010-06-08 18:57 ` David Rientjes
2010-06-08 21:27 ` Andrew Morton
2010-06-06 22:34 ` [patch 14/18] oom: move sysctl declarations to oom.h David Rientjes
2010-06-08 11:42 ` KOSAKI Motohiro
2010-06-06 22:34 ` [patch 15/18] oom: remove unnecessary code and cleanup David Rientjes
2010-06-06 22:34 ` [patch 16/18] oom: badness heuristic rewrite David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 23:02 ` Andrew Morton
2010-06-13 11:24 ` KOSAKI Motohiro
2010-06-17 5:14 ` David Rientjes
2010-06-21 11:45 ` KOSAKI Motohiro
2010-06-21 20:47 ` David Rientjes
2010-06-30 9:26 ` KOSAKI Motohiro
2010-06-17 5:12 ` David Rientjes
2010-06-21 11:45 ` KOSAKI Motohiro
2010-06-08 22:58 ` Andrew Morton
2010-06-17 5:32 ` David Rientjes
2010-06-06 22:34 ` [patch 17/18] oom: add forkbomb penalty to badness heuristic David Rientjes
2010-06-08 11:41 ` KOSAKI Motohiro
2010-06-08 23:15 ` Andrew Morton
2010-06-06 22:35 ` [patch 18/18] oom: deprecate oom_adj tunable David Rientjes
2010-06-08 11:42 ` KOSAKI Motohiro
2010-06-08 19:00 ` David Rientjes
2010-06-08 23:18 ` Andrew Morton
2010-06-13 11:24 ` KOSAKI Motohiro
2010-06-17 3:36 ` David Rientjes
2010-06-21 11:45 ` KOSAKI Motohiro
2010-06-21 20:54 ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.00.1006081741150.19582@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=oleg@redhat.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox