From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Nick Piggin <npiggin@suse.de>,
	Oleg Nesterov <oleg@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	linux-mm@kvack.org
Subject: [patch -mm 05/18] oom: remove special handling for pagefault ooms
Date: Tue, 1 Jun 2010 00:18:32 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1006010014080.29202@chino.kir.corp.google.com>
In-Reply-To: <alpine.DEB.2.00.1006010008410.29202@chino.kir.corp.google.com>

The special pagefault oom handler can be removed by oom locking all
system zones and then calling directly into out_of_memory().

ZONE_OOM_LOCKED must be acquired for every populated zone before the oom
killer is called.  If any populated zone already has it set, a parallel
oom kill is in progress that will eventually free memory, so there is no
need to kill another task.  The context in which the pagefault is
allocating memory is unknown to the oom killer, so the locking is done
system-wide.

If a task has already been oom killed and hasn't fully exited yet, the
call is a no-op since select_bad_process() recognizes tasks across the
system with TIF_MEMDIE set.

Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
---
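Not for the changelog: a minimal userspace sketch of the all-or-nothing
zone locking and the TIF_MEMDIE no-op described in the changelog, for
anyone who wants to poke at the idea outside the kernel.  All names here
(struct zone_model, NR_ZONES, victim_pending, kills, and every *_model
function) are hypothetical stand-ins, and a C11 atomic_flag spinlock is
assumed in place of zone_scan_lock.  It is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_ZONES 3	/* hypothetical zone count for this model */

struct zone_model {
	bool populated;
	bool oom_locked;	/* models ZONE_OOM_LOCKED */
};

struct zone_model zones[NR_ZONES] = {
	{ .populated = true },
	{ .populated = true },
	{ .populated = false },
};

bool victim_pending;	/* models "some task has TIF_MEMDIE set" */
int kills;		/* number of tasks the model has killed */

/* Assumed stand-in for zone_scan_lock: a tiny C11 spinlock. */
static atomic_flag zone_scan_lock_model = ATOMIC_FLAG_INIT;

static void scan_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&zone_scan_lock_model,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void scan_unlock(void)
{
	atomic_flag_clear_explicit(&zone_scan_lock_model,
				   memory_order_release);
}

/* Returns 1 and locks every populated zone, or 0 and changes nothing. */
int try_set_system_oom_model(void)
{
	int ret = 1;

	scan_lock();
	for (int i = 0; i < NR_ZONES; i++)
		if (zones[i].populated && zones[i].oom_locked) {
			ret = 0;	/* parallel oom kill in progress */
			goto out;
		}
	for (int i = 0; i < NR_ZONES; i++)
		if (zones[i].populated)
			zones[i].oom_locked = true;
out:
	scan_unlock();
	return ret;
}

void clear_system_oom_model(void)
{
	scan_lock();
	for (int i = 0; i < NR_ZONES; i++)
		if (zones[i].populated)
			zones[i].oom_locked = false;
	scan_unlock();
}

/* Kills one task unless an earlier victim has not exited yet. */
void out_of_memory_model(void)
{
	/* The caller must hold the oom lock on every populated zone. */
	for (int i = 0; i < NR_ZONES; i++)
		assert(!zones[i].populated || zones[i].oom_locked);

	if (victim_pending)
		return;		/* no-op: let the earlier victim exit */
	kills++;
	victim_pending = true;
}

/* Returns 1 if the oom killer was entered, 0 if a parallel kill won. */
int pagefault_oom_model(void)
{
	if (!try_set_system_oom_model())
		return 0;
	out_of_memory_model();
	clear_system_oom_model();
	return 1;
}
```

Calling pagefault_oom_model() twice in a row kills only once in this
model, because the second call finds victim_pending set; setting
oom_locked on any populated zone beforehand makes the call back off
without touching the other zones.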
mm/oom_kill.c | 86 +++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 57 insertions(+), 29 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -603,6 +603,44 @@ void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_mask)
 }
 
 /*
+ * Try to acquire the oom killer lock for all system zones. Returns zero if a
+ * parallel oom killing is taking place, otherwise locks all zones and returns
+ * non-zero.
+ */
+static int try_set_system_oom(void)
+{
+	struct zone *zone;
+	int ret = 1;
+
+	spin_lock(&zone_scan_lock);
+	for_each_populated_zone(zone)
+		if (zone_is_oom_locked(zone)) {
+			ret = 0;
+			goto out;
+		}
+	for_each_populated_zone(zone)
+		zone_set_flag(zone, ZONE_OOM_LOCKED);
+out:
+	spin_unlock(&zone_scan_lock);
+	return ret;
+}
+
+/*
+ * Clears ZONE_OOM_LOCKED for all system zones so that failed allocation
+ * attempts or page faults may now recall the oom killer, if necessary.
+ */
+static void clear_system_oom(void)
+{
+	struct zone *zone;
+
+	spin_lock(&zone_scan_lock);
+	for_each_populated_zone(zone)
+		zone_clear_flag(zone, ZONE_OOM_LOCKED);
+	spin_unlock(&zone_scan_lock);
+}
+
+
+/*
  * Must be called with tasklist_lock held for read.
  */
 static void __out_of_memory(gfp_t gfp_mask, int order,
@@ -637,33 +675,6 @@ retry:
 		goto retry;
 }
 
-/*
- * pagefault handler calls into here because it is out of memory but
- * doesn't know exactly how or why.
- */
-void pagefault_out_of_memory(void)
-{
-	unsigned long freed = 0;
-
-	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-	if (freed > 0)
-		/* Got some memory back in the last second. */
-		return;
-
-	check_panic_on_oom(CONSTRAINT_NONE, 0, 0);
-	read_lock(&tasklist_lock);
-	/* unknown gfp_mask and order */
-	__out_of_memory(0, 0, CONSTRAINT_NONE, NULL);
-	read_unlock(&tasklist_lock);
-
-	/*
-	 * Give "p" a good chance of killing itself before we
-	 * retry to allocate memory.
-	 */
-	if (!test_thread_flag(TIF_MEMDIE))
-		schedule_timeout_uninterruptible(1);
-}
-
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @zonelist: zonelist pointer
@@ -680,7 +691,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *nodemask)
 {
 	unsigned long freed = 0;
-	enum oom_constraint constraint;
+	enum oom_constraint constraint = CONSTRAINT_NONE;
 
 	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
 	if (freed > 0)
@@ -691,7 +702,8 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 * Check if there were limitations on the allocation (only relevant for
 	 * NUMA) that may require different handling.
 	 */
-	constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
+	if (zonelist)
+		constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
 	check_panic_on_oom(constraint, gfp_mask, order);
 	read_lock(&tasklist_lock);
 	__out_of_memory(gfp_mask, order, constraint, nodemask);
@@ -704,3 +716,19 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	if (!test_thread_flag(TIF_MEMDIE))
 		schedule_timeout_uninterruptible(1);
 }
+
+/*
+ * The pagefault handler calls here because it is out of memory, so kill a
+ * memory-hogging task. If a populated zone has ZONE_OOM_LOCKED set, a parallel
+ * oom killing is already in progress so do nothing. If a task is found with
+ * TIF_MEMDIE set, it has been killed so do nothing and allow it to exit.
+ */
+void pagefault_out_of_memory(void)
+{
+	if (try_set_system_oom()) {
+		out_of_memory(NULL, 0, 0, NULL);
+		clear_system_oom();
+	}
+	if (!test_thread_flag(TIF_MEMDIE))
+		schedule_timeout_uninterruptible(1);
+}