linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Subject: [PATCH 09 of 24] fallback killing more tasks if tif-memdie doesn't go away
Date: Wed, 22 Aug 2007 14:48:56 +0200	[thread overview]
Message-ID: <9bf6a66eab3c52327daa.1187786936@v2.random> (raw)
In-Reply-To: <patchbomb.1187786927@v2.random>

# HG changeset patch
# User Andrea Arcangeli <andrea@suse.de>
# Date 1187778125 -7200
# Node ID 9bf6a66eab3c52327daa831ef101d7802bc71791
# Parent  ffdc30241856d7155ceedd4132eef684f7cc7059
fallback killing more tasks if tif-memdie doesn't go away

Waiting indefinitely for a TIF_MEMDIE task to go away will deadlock. Two
tasks reading from the same inode at the same time and both going out of
memory inside a read(largebuffer) syscall, will even deadlock through
contention over the PG_locked bitflag. The task holding the semaphore
detects oom but the oom killer decides to kill the task blocked in
wait_on_page_locked(). The task holding the semaphore will hang inside
alloc_pages that will never return because it will wait the TIF_MEMDIE
task to go away, but the TIF_MEMDIE task can't go away until the task
holding the semaphore is killed in the first place.

It's quite unpractical to teach the oom killer the locking dependencies
across running tasks, so the feasible fix is to develop a logic that
after waiting a long time for a TIF_MEMDIE tasks goes away, fallbacks
on killing one more task. This also eliminates the possibility of
suprious oom killage (i.e. two tasks killed despite only one had to be
killed). It's not a math guarantee because we can't demonstrate that if
a TIF_MEMDIE SIGKILLED task didn't mange to complete do_exit within
10sec, it never will. But the current probability of suprious oom
killing is sure much higher than the probability of suprious oom killing
with this patch applied.

The whole locking is around the tasklist_lock. On one side do_exit reads
TIF_MEMDIE and clears VM_is_OOM under the lock, on the other side the
oom killer accesses VM_is_OOM and TIF_MEMDIE under the lock. This is a
read_lock in the oom killer but it's actually a write lock thanks to the
OOM_lock semaphore running one oom killer at once (the locking rule is,
either use write_lock_irq or read_lock+OOM_lock).

Signed-off-by: Andrea Arcangeli <andrea@suse.de>

diff --git a/kernel/exit.c b/kernel/exit.c
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -849,6 +849,15 @@ static void exit_notify(struct task_stru
 	if (tsk->exit_signal == -1 && likely(!tsk->ptrace))
 		state = EXIT_DEAD;
 	tsk->exit_state = state;
+
+	/*
+	 * Read TIF_MEMDIE and set VM_is_OOM to 0 atomically inside
+	 * the tasklist_lock_lock.
+	 */
+	if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE))) {
+		extern unsigned long VM_is_OOM;
+		clear_bit(0, &VM_is_OOM);
+	}
 
 	write_unlock_irq(&tasklist_lock);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -29,6 +29,9 @@ int sysctl_panic_on_oom;
 int sysctl_panic_on_oom;
 /* #define DEBUG */
 
+unsigned long VM_is_OOM;
+static unsigned long last_tif_memdie_jiffies;
+
 /**
  * badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
@@ -226,21 +229,14 @@ static struct task_struct *select_bad_pr
 		if (is_init(p))
 			continue;
 
-		/*
-		 * This task already has access to memory reserves and is
-		 * being killed. Don't allow any other task access to the
-		 * memory reserve.
-		 *
-		 * Note: this may have a chance of deadlock if it gets
-		 * blocked waiting for another task which itself is waiting
-		 * for memory. Is there a better alternative?
-		 *
-		 * Better not to skip PF_EXITING tasks, since they
-		 * don't have access to the PF_MEMALLOC pool until
-		 * we select them here first.
-		 */
-		if (test_tsk_thread_flag(p, TIF_MEMDIE))
-			return ERR_PTR(-1UL);
+		if (unlikely(test_tsk_thread_flag(p, TIF_MEMDIE))) {
+			/*
+			 * Either we already waited long enough,
+			 * or exit_mm already run, so we must
+			 * try to kill another task.
+			 */
+			continue;
+		}
 
 		if (p->oomkilladj == OOM_DISABLE)
 			continue;
@@ -277,13 +273,16 @@ static void __oom_kill_task(struct task_
 	if (verbose)
 		printk(KERN_ERR "Killed process %d (%s)\n", p->pid, p->comm);
 
+	if (!test_and_set_tsk_thread_flag(p, TIF_MEMDIE)) {
+		last_tif_memdie_jiffies = jiffies;
+		set_bit(0, &VM_is_OOM);
+	}
 	/*
 	 * We give our sacrificial lamb high priority and access to
 	 * all the memory it needs. That way it should be able to
 	 * exit() and clear out its resources quickly...
 	 */
 	p->time_slice = HZ;
-	set_tsk_thread_flag(p, TIF_MEMDIE);
 
 	force_sig(SIGKILL, p);
 }
@@ -420,6 +419,18 @@ void out_of_memory(struct zonelist *zone
 	constraint = constrained_alloc(zonelist, gfp_mask);
 	cpuset_lock();
 	read_lock(&tasklist_lock);
+
+	/*
+	 * This holds the down(OOM_lock)+read_lock(tasklist_lock), so it's
+	 * equivalent to write_lock_irq(tasklist_lock) as far as VM_is_OOM
+	 * is concerned.
+	 */
+	if (unlikely(test_bit(0, &VM_is_OOM))) {
+		if (time_before(jiffies, last_tif_memdie_jiffies + 10*HZ))
+			goto out;
+		printk("detected probable OOM deadlock, so killing another task\n");
+		last_tif_memdie_jiffies = jiffies;
+	}
 
 	switch (constraint) {
 	case CONSTRAINT_MEMORY_POLICY:
@@ -441,10 +452,6 @@ retry:
 		 * issues we may have.
 		 */
 		p = select_bad_process(&points);
-
-		if (PTR_ERR(p) == -1UL)
-			goto out;
-
 		/* Found nothing?!?! Either we hang forever, or we panic. */
 		if (!p) {
 			read_unlock(&tasklist_lock);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2007-08-22 12:48 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-22 12:48 [PATCH 00 of 24] OOM related fixes Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 01 of 24] remove nr_scan_inactive/active Andrea Arcangeli
2007-09-12 11:44   ` Andrew Morton
2008-01-02 17:50     ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 02 of 24] avoid oom deadlock in nfs_create_request Andrea Arcangeli
2007-09-12 23:54   ` Christoph Lameter
2007-08-22 12:48 ` [PATCH 03 of 24] prevent oom deadlocks during read/write operations Andrea Arcangeli
2007-09-12 11:56   ` Andrew Morton
2007-09-12  2:18     ` Nick Piggin
2008-01-03  0:53     ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 04 of 24] serialize oom killer Andrea Arcangeli
2007-09-12 12:02   ` Andrew Morton
2007-09-12 12:04     ` Andrew Morton
2007-09-12 12:11       ` Andrea Arcangeli
2008-01-03  0:55     ` Andrea Arcangeli
2007-09-13  0:09   ` Christoph Lameter
2007-09-13 18:32     ` David Rientjes
2007-09-13 18:37       ` Christoph Lameter
2007-09-13 18:46         ` David Rientjes
2007-09-13 18:53           ` Christoph Lameter
2007-09-14  0:36             ` David Rientjes
2007-09-14  2:31               ` Christoph Lameter
2007-09-14  3:33                 ` David Rientjes
2007-09-18 16:44                   ` David Rientjes
2007-09-18 16:44                     ` [patch 1/4] oom: move prototypes to appropriate header file David Rientjes
2007-09-18 16:44                       ` [patch 2/4] oom: move constraints to enum David Rientjes
2007-09-18 16:44                         ` [patch 3/4] oom: save zonelist pointer for oom killer calls David Rientjes
2007-09-18 16:44                           ` [patch 4/4] oom: serialize out of memory calls David Rientjes
2007-09-18 19:54                             ` Christoph Lameter
2007-09-18 19:56                               ` David Rientjes
2007-09-18 20:01                                 ` Christoph Lameter
2007-09-18 20:06                                   ` David Rientjes
2007-09-18 20:23                                     ` [patch 5/4] oom: rename serialization helper functions David Rientjes
2007-09-18 20:26                                       ` Christoph Lameter
2007-09-18 20:39                                         ` [patch 5/4 v2] " David Rientjes
2007-09-18 20:59                                           ` Christoph Lameter
2007-09-18 19:57                           ` [patch 3/4] oom: save zonelist pointer for oom killer calls Christoph Lameter
2007-09-18 20:13                             ` David Rientjes
2007-09-18 20:16                               ` Christoph Lameter
2007-09-18 20:47                                 ` [patch 6/4] oom: pass null to kfree if zonelist is not cleared David Rientjes
2007-09-18 21:01                                   ` Christoph Lameter
2007-09-18 21:13                                     ` David Rientjes
2007-09-18 21:25                                       ` Christoph Lameter
2007-09-18 22:16                                         ` David Rientjes
2007-09-19 17:09                                           ` Paul Jackson
2007-09-19 18:21                                             ` David Rientjes
2007-09-18 19:55                         ` [patch 2/4] oom: move constraints to enum Christoph Lameter
2007-08-22 12:48 ` [PATCH 05 of 24] avoid selecting already killed tasks Andrea Arcangeli
2007-09-13  0:13   ` Christoph Lameter
2007-08-22 12:48 ` [PATCH 06 of 24] reduce the probability of an OOM livelock Andrea Arcangeli
2007-09-12 12:17   ` Andrew Morton
2008-01-03  1:03     ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 07 of 24] balance_pgdat doesn't return the number of pages freed Andrea Arcangeli
2007-09-12 12:18   ` Andrew Morton
2007-09-13  0:26     ` Christoph Lameter
2007-08-22 12:48 ` [PATCH 08 of 24] don't depend on PF_EXITING tasks to go away Andrea Arcangeli
2007-09-12 12:20   ` Andrew Morton
2008-01-03  0:56     ` Andrea Arcangeli
2007-08-22 12:48 ` Andrea Arcangeli [this message]
2007-09-12 12:30   ` [PATCH 09 of 24] fallback killing more tasks if tif-memdie doesn't " Andrew Morton
2007-09-12 12:34     ` Andrew Morton
2008-01-03  1:06     ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 10 of 24] stop useless vm trashing while we wait the TIF_MEMDIE task to exit Andrea Arcangeli
2007-09-12 12:42   ` Andrew Morton
2007-09-13  0:36     ` Christoph Lameter
2007-09-21 19:10   ` David Rientjes
2008-01-03  1:08     ` Andrea Arcangeli
2007-08-22 12:48 ` [PATCH 11 of 24] the oom schedule timeout isn't needed with the VM_is_OOM logic Andrea Arcangeli
2007-09-12 12:44   ` Andrew Morton
2007-08-22 12:48 ` [PATCH 12 of 24] show mem information only when a task is actually being killed Andrea Arcangeli
2007-09-12 12:49   ` Andrew Morton
2007-08-22 12:49 ` [PATCH 13 of 24] simplify oom heuristics Andrea Arcangeli
2007-09-12 12:52   ` Andrew Morton
2007-09-12 13:40     ` Andrea Arcangeli
2007-09-12 20:52       ` Andrew Morton
2007-08-22 12:49 ` [PATCH 14 of 24] oom select should only take rss into account Andrea Arcangeli
2007-09-13  0:43   ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 15 of 24] limit reclaim if enough pages have been freed Andrea Arcangeli
2007-09-12 12:57   ` Andrew Morton
2008-01-03  1:12     ` Andrea Arcangeli
2007-09-12 12:58   ` Andrew Morton
2007-09-12 13:38     ` Andrea Arcangeli
2007-08-22 12:49 ` [PATCH 16 of 24] avoid some lock operation in vm fast path Andrea Arcangeli
2007-09-12 12:59   ` Andrew Morton
2007-09-13  0:49     ` Christoph Lameter
2007-09-13  1:16       ` Andrew Morton
2007-09-13  1:33         ` Christoph Lameter
2007-09-13  1:41           ` KAMEZAWA Hiroyuki
2007-09-13  1:44           ` Andrew Morton
2007-08-22 12:49 ` [PATCH 17 of 24] apply the anti deadlock features only to global oom Andrea Arcangeli
2007-09-12 13:02   ` Andrew Morton
2007-09-13  0:53     ` Christoph Lameter
2007-09-13  0:52   ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 18 of 24] run panic the same way in both places Andrea Arcangeli
2007-09-13  0:54   ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 19 of 24] cacheline align VM_is_OOM to prevent false sharing Andrea Arcangeli
2007-09-12 13:02   ` Andrew Morton
2007-09-12 13:36     ` Andrea Arcangeli
2007-09-13  0:55       ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 20 of 24] extract deadlock helper function Andrea Arcangeli
2007-08-22 12:49 ` [PATCH 21 of 24] select process to kill for cpusets Andrea Arcangeli
2007-09-12 13:05   ` Andrew Morton
2007-09-13  0:59     ` Christoph Lameter
2007-09-13  5:13       ` David Rientjes
2007-09-13 17:55         ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 22 of 24] extract select helper function Andrea Arcangeli
2007-08-22 12:49 ` [PATCH 23 of 24] serialize for cpusets Andrea Arcangeli
2007-09-12 13:10   ` Andrew Morton
2007-09-12 13:34     ` Andrea Arcangeli
2007-09-12 19:08     ` David Rientjes
2007-09-13  1:02     ` Christoph Lameter
2007-08-22 12:49 ` [PATCH 24 of 24] add oom_kill_asking_task flag Andrea Arcangeli
2007-09-12 13:11   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9bf6a66eab3c52327daa.1187786936@v2.random \
    --to=andrea@suse.de \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox