[patch 0/9] oom: various fixes and improvements for 2.6.18-rc2


* [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2
@ 2006-07-28  7:20 Nick Piggin
  2006-07-28  7:20 ` [patch 1/9] oom: use unreclaimable info Nick Piggin
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

These are some various OOM killer fixes that I have accumulated. Some of
the more important ones are in SLES10, and were developed in response to
issues coming up in stress testing.

The other small fixes haven't been widely tested, but they're issues I
spotted when working in this area.

Comments?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [patch 1/9] oom: use unreclaimable info
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
@ 2006-07-28  7:20 ` Nick Piggin
  2006-07-28  7:21 ` [patch 2/9] oom: reclaim_mapped on oom Nick Piggin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

__alloc_pages currently starts shooting if page reclaim has failed to free up
swap_cluster_max pages in one run through the priorities. This is not always a
good indicator on its own, so make use of the all_unreclaimable logic as
well: don't consider going OOM until all zones we're interested in are
unreclaimable.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -62,6 +62,8 @@ struct scan_control {
 	int swap_cluster_max;
 
 	int swappiness;
+
+	int all_unreclaimable;
 };
 
 /*
@@ -925,6 +927,7 @@ static unsigned long shrink_zones(int pr
 	unsigned long nr_reclaimed = 0;
 	int i;
 
+	sc->all_unreclaimable = 1;
 	for (i = 0; zones[i] != NULL; i++) {
 		struct zone *zone = zones[i];
 
@@ -941,6 +944,8 @@ static unsigned long shrink_zones(int pr
 		if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 			continue;	/* Let kswapd poll it */
 
+		sc->all_unreclaimable = 0;
+
 		nr_reclaimed += shrink_zone(priority, zone, sc);
 	}
 	return nr_reclaimed;
@@ -1021,6 +1026,9 @@ unsigned long try_to_free_pages(struct z
 		if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
 			blk_congestion_wait(WRITE, HZ/10);
 	}
+	/* top priority shrink_caches still had more to do? don't OOM, then */
+	if (!sc.all_unreclaimable)
+		ret = 1;
 out:
 	for (i = 0; zones[i] != 0; i++) {
 		struct zone *zone = zones[i];
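
The effect of the new flag can be sketched outside the kernel. This is a minimal model, not the kernel code: `struct model_zone`, `model_all_unreclaimable` and `model_reclaim_result` are hypothetical names, and the real shrink_zones() has further skip conditions that are left out here.

```c
#include <assert.h>
#include <stddef.h>

#define DEF_PRIORITY 12	/* the kernel's default (least urgent) scan priority */

/* Hypothetical, stripped-down stand-in for the kernel's struct zone. */
struct model_zone {
	int all_unreclaimable;	/* set once scanning the zone stopped paying off */
};

/*
 * Model of the flag computed in shrink_zones(): start pessimistic and
 * clear the flag as soon as one zone is actually handed to reclaim.
 * Returns 1 only if every zone in the NULL-terminated list was skipped.
 */
int model_all_unreclaimable(struct model_zone **zones, int priority)
{
	int all_unreclaimable = 1;
	size_t i;

	assert(zones != NULL);
	for (i = 0; zones[i] != NULL; i++) {
		if (zones[i]->all_unreclaimable && priority != DEF_PRIORITY)
			continue;	/* skipped, so the flag stays set */
		all_unreclaimable = 0;	/* this zone would be scanned */
	}
	return all_unreclaimable;
}

/*
 * Model of the tail of try_to_free_pages(): a zero result is what lets
 * the allocator consider OOM, so report progress unless every zone was
 * unreclaimable on the last pass.
 */
int model_reclaim_result(int ret, int all_unreclaimable)
{
	if (!all_unreclaimable)
		ret = 1;
	return ret;
}
```

In this model a single zone that is still scanned is enough to keep the allocator away from the OOM path, which is the behaviour the description asks for.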


* [patch 2/9] oom: reclaim_mapped on oom
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
  2006-07-28  7:20 ` [patch 1/9] oom: use unreclaimable info Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  7:21 ` [patch 3/9] cpuset: oom panic fix Nick Piggin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

Potentially it takes several scans of the lru lists before we can even start
reclaiming pages.

Mapped pages with young ptes can take two passes over the active list plus one
over the inactive list. But reclaim_mapped may not always kick in instantly, so
it could take even more than that.

Raise the threshold for marking a zone as all_unreclaimable from a factor of
4 times the pages in the zone to 6.  Introduce a mechanism to force
reclaim_mapped if we've reached a factor of 3 and still haven't made progress.

Previously, a customer doing stress testing was able to easily OOM the box
after using only a small fraction of its swap (~100MB). After the patches, it
would only OOM after having used up all swap (~800MB).

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -697,6 +697,11 @@ done:
 	return nr_reclaimed;
 }
 
+static inline int zone_is_near_oom(struct zone *zone)
+{
+	return zone->pages_scanned >= (zone->nr_active + zone->nr_inactive)*3;
+}
+
 /*
  * This moves pages from the active list to the inactive list.
  *
@@ -732,6 +737,9 @@ static void shrink_active_list(unsigned 
 		long distress;
 		long swap_tendency;
 
+		if (zone_is_near_oom(zone))
+			goto force_reclaim_mapped;
+
 		/*
 		 * `distress' is a measure of how much trouble we're having
 		 * reclaiming pages.  0 -> no problems.  100 -> great trouble.
@@ -767,6 +775,7 @@ static void shrink_active_list(unsigned 
 		 * memory onto the inactive list.
 		 */
 		if (swap_tendency >= 100)
+force_reclaim_mapped:
 			reclaim_mapped = 1;
 	}
 
@@ -1161,7 +1170,7 @@ scan:
 			if (zone->all_unreclaimable)
 				continue;
 			if (nr_slab == 0 && zone->pages_scanned >=
-				    (zone->nr_active + zone->nr_inactive) * 4)
+				    (zone->nr_active + zone->nr_inactive) * 6)
 				zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and


* [patch 3/9] cpuset: oom panic fix
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
  2006-07-28  7:20 ` [patch 1/9] oom: use unreclaimable info Nick Piggin
  2006-07-28  7:21 ` [patch 2/9] oom: reclaim_mapped on oom Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  7:29   ` Nick Piggin
  2006-07-28  9:06   ` Paul Jackson
  2006-07-28  7:21 ` [patch 4/9] oom: cpuset hint Nick Piggin
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

cpuset_excl_nodes_overlap always returns 0 if current is exiting. This caused
a customer's systems to panic in the OOM killer when processes were having
trouble getting memory for the final put_user in mm_release, even though there
were lots of processes to kill.

Change to returning 0 in this case. This achieves parity with !CONFIG_CPUSETS
case, and was observed to fix the problem.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/kernel/cpuset.c
===================================================================
--- linux-2.6.orig/kernel/cpuset.c
+++ linux-2.6/kernel/cpuset.c
@@ -2369,7 +2369,7 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node
 int cpuset_excl_nodes_overlap(const struct task_struct *p)
 {
 	const struct cpuset *cs1, *cs2;	/* my and p's cpuset ancestors */
-	int overlap = 0;		/* do cpusets overlap? */
+	int overlap = 1;		/* do cpusets overlap? */
 
 	task_lock(current);
 	if (current->flags & PF_EXITING) {


* Re: [patch 3/9] cpuset: oom panic fix
  2006-07-28  7:21 ` [patch 3/9] cpuset: oom panic fix Nick Piggin
@ 2006-07-28  7:29   ` Nick Piggin
  2006-07-28  9:06   ` Paul Jackson
  1 sibling, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Memory Management, Paul Jackson

On Fri, Jul 28, 2006 at 09:21:11AM +0200, Nick Piggin wrote:
> cpuset_excl_nodes_overlap always returns 0 if current is exiting. This caused
> customer's systems to panic in the OOM killer when processes were having
> trouble getting memory for the final put_user in mm_release. Even though there
> were lots of processes to kill.
> 
> Change to returning 0 in this case. This achieves parity with !CONFIG_CPUSETS
> case, and was observed to fix the problem.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>

I forgot to mention, I think this one was also Acked-by: Paul Jackson.
CCing him...

> 
> Index: linux-2.6/kernel/cpuset.c
> ===================================================================
> --- linux-2.6.orig/kernel/cpuset.c
> +++ linux-2.6/kernel/cpuset.c
> @@ -2369,7 +2369,7 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node
>  int cpuset_excl_nodes_overlap(const struct task_struct *p)
>  {
>  	const struct cpuset *cs1, *cs2;	/* my and p's cpuset ancestors */
> -	int overlap = 0;		/* do cpusets overlap? */
> +	int overlap = 1;		/* do cpusets overlap? */
>  
>  	task_lock(current);
>  	if (current->flags & PF_EXITING) {


* Re: [patch 3/9] cpuset: oom panic fix
  2006-07-28  7:21 ` [patch 3/9] cpuset: oom panic fix Nick Piggin
  2006-07-28  7:29   ` Nick Piggin
@ 2006-07-28  9:06   ` Paul Jackson
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Jackson @ 2006-07-28  9:06 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-mm

Nick wrote:
> Change to returning 0 in this case.

I think that comment is a typo, and should be instead:

> Change to returning 1 in this case.

Other than that nit:

Acked-by: Paul Jackson <pj@sgi.com>

I haven't actually seen a test case in hand for this one,
but it sure seems like "the right thing to do (tm)", and
I understand Nick has seen it fix a real problem.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* [patch 4/9] oom: cpuset hint
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (2 preceding siblings ...)
  2006-07-28  7:21 ` [patch 3/9] cpuset: oom panic fix Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  9:07   ` Paul Jackson
  2006-07-28  7:21 ` [patch 5/9] oom: handle current exiting Nick Piggin
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

cpuset_excl_nodes_overlap does not always indicate that killing a task will
not free any memory for us. For example, we may be asking for an allocation
from _anywhere_ in the machine, or the task in question may be pinning memory
that is outside its cpuset.  Fix this by just causing cpuset_excl_nodes_overlap
to reduce the badness rather than exclude the task from selection.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/oom_kill.c
===================================================================
--- linux-2.6.orig/mm/oom_kill.c
+++ linux-2.6/mm/oom_kill.c
@@ -127,6 +127,14 @@ unsigned long badness(struct task_struct
 		points /= 4;
 
 	/*
+	 * If p's nodes don't overlap ours, it may still help to kill p
+	 * because p may have allocated or otherwise mapped memory on
+	 * this node before. However it will be less likely.
+	 */
+	if (!cpuset_excl_nodes_overlap(p))
+		points /= 8;
+
+	/*
 	 * Adjust the score by oomkilladj.
 	 */
 	if (p->oomkilladj) {
@@ -196,9 +204,6 @@ static struct task_struct *select_bad_pr
 			continue;
 		if (p->oomkilladj == OOM_DISABLE)
 			continue;
-		/* If p's nodes don't overlap ours, it won't help to kill p. */
-		if (!cpuset_excl_nodes_overlap(p))
-			continue;
 
 		/*
 		 * This is in the process of releasing memory so wait for it


* Re: [patch 4/9] oom: cpuset hint
  2006-07-28  7:21 ` [patch 4/9] oom: cpuset hint Nick Piggin
@ 2006-07-28  9:07   ` Paul Jackson
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Jackson @ 2006-07-28  9:07 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-mm

I don't have a test case for this in hand, but
it sure makes sense.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* [patch 5/9] oom: handle current exiting
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (3 preceding siblings ...)
  2006-07-28  7:21 ` [patch 4/9] oom: cpuset hint Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  7:21 ` [patch 6/9] oom: handle oom_disable exiting Nick Piggin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

If current *is* exiting, it should actually be allowed to access reserved
memory rather than OOM kill something else. Can't do this via a straight check
in page_alloc.c because that would allow multiple tasks to use up reserves.
Instead cause current to OOM-kill itself which will mark it as TIF_MEMDIE.

The current procedure of simply aborting the OOM-kill if a task is exiting
can lead to OOM deadlocks.

In the case of killing a PF_EXITING task, don't make a lot of noise about it.
This becomes more important in future patches, where we can "kill" OOM_DISABLE
tasks.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/oom_kill.c
===================================================================
--- linux-2.6.orig/mm/oom_kill.c
+++ linux-2.6/mm/oom_kill.c
@@ -208,11 +208,26 @@ static struct task_struct *select_bad_pr
 		/*
 		 * This is in the process of releasing memory so wait for it
 		 * to finish before killing some other task by mistake.
+		 *
+		 * However, if p is the current task, we allow the 'kill' to
+		 * go ahead if it is exiting: this will simply set TIF_MEMDIE,
+		 * which will allow it to gain access to memory reserves in
+		 * the process of exiting and releasing its resources.
+		 * Otherwise we could get an OOM deadlock.
 		 */
 		releasing = test_tsk_thread_flag(p, TIF_MEMDIE) ||
 						p->flags & PF_EXITING;
-		if (releasing && !(p->flags & PF_DEAD))
+		if (releasing) {
+			/* PF_DEAD tasks have already released their mm */
+			if (p->flags & PF_DEAD)
+				continue;
+			if (p->flags & PF_EXITING && p == current) {
+				chosen = p;
+				*ppoints = ULONG_MAX;
+				break;
+			}
 			return ERR_PTR(-1UL);
+		}
 		if (p->flags & PF_SWAPOFF)
 			return p;
 
@@ -246,8 +261,11 @@ static void __oom_kill_task(struct task_
 		return;
 	}
 	task_unlock(p);
-	printk(KERN_ERR "%s: Killed process %d (%s).\n",
+
+	if (message) {
+		printk(KERN_ERR "%s: Killed process %d (%s).\n",
 				message, p->pid, p->comm);
+	}
 
 	/*
 	 * We give our sacrificial lamb high priority and access to
@@ -298,8 +316,17 @@ static int oom_kill_process(struct task_
 	struct task_struct *c;
 	struct list_head *tsk;
 
-	printk(KERN_ERR "Out of Memory: Kill process %d (%s) score %li and "
-		"children.\n", p->pid, p->comm, points);
+	/*
+	 * If the task is already exiting, don't alarm the sysadmin or kill
+	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 */
+	if (p->flags & PF_EXITING) {
+		__oom_kill_task(p, NULL);
+		return 0;
+	}
+
+	printk(KERN_ERR "Out of Memory: Kill process %d (%s) score %li"
+			" and children.\n", p->pid, p->comm, points);
 	/* Try to kill a child first */
 	list_for_each(tsk, &p->children) {
 		c = list_entry(tsk, struct task_struct, sibling);


* [patch 6/9] oom: handle oom_disable exiting
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (4 preceding siblings ...)
  2006-07-28  7:21 ` [patch 5/9] oom: handle current exiting Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  7:21 ` [patch 7/9] oom: swapoff tasks tweak Nick Piggin
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

Having the oomkilladj == OOM_DISABLE check before the releasing check means
that oomkilladj == OOM_DISABLE tasks exiting will not stop the OOM killer.

Moving the test down will give the desired behaviour. Also: it will allow
them to "OOM-kill" themselves if they are exiting. As per the previous patch,
this is required to prevent OOM killer deadlocks (and they don't actually
get killed, because they're already exiting -- they're simply allowed access
to memory reserves).

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/oom_kill.c
===================================================================
--- linux-2.6.orig/mm/oom_kill.c
+++ linux-2.6/mm/oom_kill.c
@@ -202,8 +202,6 @@ static struct task_struct *select_bad_pr
 		/* skip the init task with pid == 1 */
 		if (p->pid == 1)
 			continue;
-		if (p->oomkilladj == OOM_DISABLE)
-			continue;
 
 		/*
 		 * This is in the process of releasing memory so wait for it
@@ -228,6 +226,8 @@ static struct task_struct *select_bad_pr
 			}
 			return ERR_PTR(-1UL);
 		}
+		if (p->oomkilladj == OOM_DISABLE)
+			continue;
 		if (p->flags & PF_SWAPOFF)
 			return p;
 


* [patch 7/9] oom: swapoff tasks tweak
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (5 preceding siblings ...)
  2006-07-28  7:21 ` [patch 6/9] oom: handle oom_disable exiting Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  7:21 ` [patch 8/9] oom: kthread infinite loop fix Nick Piggin
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

PF_SWAPOFF processes currently cause select_bad_process to return straight
away. Instead, give them high priority so that we will kill them first, but
only after ensuring that no parallel OOM kills are happening at the same time.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/oom_kill.c
===================================================================
--- linux-2.6.orig/mm/oom_kill.c
+++ linux-2.6/mm/oom_kill.c
@@ -58,6 +58,12 @@ unsigned long badness(struct task_struct
 	}
 
 	/*
+	 * swapoff can easily use up all memory, so kill those first.
+	 */
+	if (p->flags & PF_SWAPOFF)
+		return ULONG_MAX;
+
+	/*
 	 * The memory size of the process is the basis for the badness.
 	 */
 	points = mm->total_vm;
@@ -228,8 +234,6 @@ static struct task_struct *select_bad_pr
 		}
 		if (p->oomkilladj == OOM_DISABLE)
 			continue;
-		if (p->flags & PF_SWAPOFF)
-			return p;
 
 		points = badness(p, uptime.tv_sec);
 		if (points > *ppoints || !chosen) {
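
The difference between returning early and scoring ULONG_MAX shows up when a later task in the list is already releasing memory. The following is a simplified model with hypothetical names (`struct model_task`, `model_badness`, `model_select`, `MODEL_ABORT`, `MODEL_NONE`); the real releasing check has more cases than the single flag used here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_ABORT (-1)	/* another task is already releasing memory */
#define MODEL_NONE  (-2)	/* no candidate found */

struct model_task {
	unsigned long total_vm;	/* basis of the score */
	int swapoff;		/* stands in for PF_SWAPOFF */
	int releasing;		/* exiting or already being OOM-killed */
};

/* Model of badness() with the new PF_SWAPOFF shortcut. */
unsigned long model_badness(const struct model_task *p)
{
	assert(p != NULL);
	if (p->swapoff)
		return ULONG_MAX;
	return p->total_vm;
}

/*
 * Model of the selection loop: every task is visited, so a releasing
 * task anywhere in the list aborts the kill, even when a swapoff task
 * was seen earlier. Returns the index of the chosen task.
 */
int model_select(const struct model_task *tasks, size_t n)
{
	unsigned long best = 0;
	int chosen = MODEL_NONE;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long points;

		if (tasks[i].releasing)
			return MODEL_ABORT;
		points = model_badness(&tasks[i]);
		if (points > best || chosen == MODEL_NONE) {
			chosen = (int)i;
			best = points;
		}
	}
	return chosen;
}
```

In the model the swapoff task still wins whenever a kill goes ahead, but it no longer short-circuits the scan past a task that is already on its way out.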


* [patch 8/9] oom: kthread infinite loop fix
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (6 preceding siblings ...)
  2006-07-28  7:21 ` [patch 7/9] oom: swapoff tasks tweak Nick Piggin
@ 2006-07-28  7:21 ` Nick Piggin
  2006-07-28  7:22 ` [patch 9/9] oom: more printk Nick Piggin
  2006-07-28  7:44 ` [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Andrew Morton
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

Skip kernel threads, rather than having them return 0 from badness.
Theoretically, badness might truncate all results to 0, thus a kernel thread
might be picked first, causing an infinite loop.

Index: linux-2.6/mm/oom_kill.c
===================================================================
--- linux-2.6.orig/mm/oom_kill.c
+++ linux-2.6/mm/oom_kill.c
@@ -205,6 +205,9 @@ static struct task_struct *select_bad_pr
 		unsigned long points;
 		int releasing;
 
+		/* skip kernel threads */
+		if (!p->mm)
+			continue;
 		/* skip the init task with pid == 1 */
 		if (p->pid == 1)
 			continue;
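
The failure mode described above can be reproduced in a toy selection loop. This is a sketch with hypothetical names (`struct model_task`, `model_select`); it assumes, as the description says, that a task without an mm scores 0 and that the first task seen is taken when nothing has been chosen yet.

```c
#include <assert.h>
#include <stddef.h>

struct model_task {
	int has_mm;		/* 0 for a kernel thread */
	unsigned long points;	/* score a user task would get */
};

/*
 * Model of the selection loop. With skip_kthreads == 0 a kernel thread
 * is scored as 0 but can still be picked if it comes first and every
 * other score has also been truncated to 0.
 */
int model_select(const struct model_task *tasks, size_t n, int skip_kthreads)
{
	unsigned long best = 0;
	int chosen = -1;	/* -1: nothing chosen yet */
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned long points;

		if (skip_kthreads && !tasks[i].has_mm)
			continue;
		points = tasks[i].has_mm ? tasks[i].points : 0;
		if (points > best || chosen < 0) {
			chosen = (int)i;
			best = points;
		}
	}
	return chosen;
}
```

With a kernel thread listed first and a user task whose score has truncated to 0, the unpatched ordering picks the kernel thread, while skipping kernel threads up front picks the user task.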


* [patch 9/9] oom: more printk
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (7 preceding siblings ...)
  2006-07-28  7:21 ` [patch 8/9] oom: kthread infinite loop fix Nick Piggin
@ 2006-07-28  7:22 ` Nick Piggin
  2006-07-28  7:44 ` [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Andrew Morton
  9 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  7:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, Linux Memory Management

Print the name of the task invoking the OOM killer. Could make debugging
easier.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/oom_kill.c
===================================================================
--- linux-2.6.orig/mm/oom_kill.c
+++ linux-2.6/mm/oom_kill.c
@@ -359,8 +359,9 @@ void out_of_memory(struct zonelist *zone
 	unsigned long points = 0;
 
 	if (printk_ratelimit()) {
-		printk("oom-killer: gfp_mask=0x%x, order=%d\n",
-			gfp_mask, order);
+		printk(KERN_WARNING "%s invoked oom-killer: "
+			"gfp_mask=0x%x, order=%d, oomkilladj=%d\n",
+			current->comm, gfp_mask, order, current->oomkilladj);
 		dump_stack();
 		show_mem();
 	}


* Re: [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2
  2006-07-28  7:20 [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Nick Piggin
                   ` (8 preceding siblings ...)
  2006-07-28  7:22 ` [patch 9/9] oom: more printk Nick Piggin
@ 2006-07-28  7:44 ` Andrew Morton
  2006-07-28  9:28   ` Nick Piggin
  9 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2006-07-28  7:44 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-mm

On Fri, 28 Jul 2006 09:20:44 +0200 (CEST)
Nick Piggin <npiggin@suse.de> wrote:

> These are some various OOM killer fixes that I have accumulated. Some of
> the more important ones are in SLES10, and were developed in response to
> issues coming up in stress testing.
> 
> The other small fixes haven't been widely tested, but they're issues I
> spotted when working in this area.
> 
> Comments?

They all look good to me (although I haven't grappled with the cpuset ones
yet).

The "oom: reclaim_mapped on oom" one is kinda funny.  Back in 2.5.early I
decided that we were probably doing too much scanning before declaring oom
so I randomly reduced it by a factor of, iirc, four.  Under the assumption
that someone would start hitting early ooms and would get in there and tune
it for real.  It took five years ;)

Which of these patches have been well-tested and which are the more
speculative ones?


* Re: [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2
  2006-07-28  7:44 ` [patch 0/9] oom: various fixes and improvements for 2.6.18-rc2 Andrew Morton
@ 2006-07-28  9:28   ` Nick Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-07-28  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

On Fri, Jul 28, 2006 at 12:44:10AM -0700, Andrew Morton wrote:
> On Fri, 28 Jul 2006 09:20:44 +0200 (CEST)
> Nick Piggin <npiggin@suse.de> wrote:
> 
> > These are some various OOM killer fixes that I have accumulated. Some of
> > the more important ones are in SLES10, and were developed in response to
> > issues coming up in stress testing.
> > 
> > The other small fixes haven't been widely tested, but they're issues I
> > spotted when working in this area.
> > 
> > Comments?
> 
> They all look good to me (although I haven't grappled with the cpuset ones
> yet).

OK.

> 
> The "oom: reclaim_mapped on oom" one is kinda funny.  Back in 2.5.early I
> decided that we were probably doing too much scanning before declaring oom
> so I randomly reduced it by a factor of, iirc, four.  Under the assumption
> that someone would start hitting early ooms and would get in there and tune
> it for real.  It took five years ;)

Well, I guess it *can* make the machine less responsive during OOM, but
I guess it is probably reasonable to trade "OOM throughput" for a system
that is more conservative about killing tasks.

The workload involved was semi-realistic I guess, involving apache/mysql
servers in a hypervisor guest. After patch 1, it was still killing early;
patch 2 seemed to be the minimum required to get it to use up all swap
first.

> 
> Which of these patches have been well-tested and which are the more
> speculative ones?

1,2,3 are in SLES10, and tested/confirmed to fix things. The others I
guess are more edge cases, but I hope that together they can make things
a little more robust.

