linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage
@ 2004-11-05 20:01 Marcelo Tosatti
  2004-11-05 23:32 ` Jesse Barnes
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Marcelo Tosatti @ 2004-11-05 20:01 UTC (permalink / raw)
  To: Andrew Morton, Nick Piggin; +Cc: linux-kernel, linux-mm

Hi,

As you know the OOM is very problematic in 2.6 right now - so I went
to investigate it. 

Currently the oom killer is invoked from the task reclaim 
code (try_to_free_pages), which IMO is fundamentally broken, 
because its non deterministic - the chance the OOM killer 
will be triggered increases as the number of tasks inside 
reclaiming increases. And kswapd is freeing pages in parallel, 
which is completly ignored by this approach.

In my opinion the correct approach is to trigger the OOM killer 
when kswapd is unable to free pages. Once that is done, the number 
of tasks inside page reclaim is irrelevant. 

So the following patch moves the out_of_memory() call to 
balance_pgdat(), and makes it dependant on success reclaiming pages 
when work has actually been done, and on priority reaching zero.

It also removes the "If it's been a long time since last failure dont 
OOM kill" logic which for me just tries to paper over a bigger issue. 

Relying on this information (kswapd failure after DEF_PRIORITY passes) 
to trigger the OOM killer seems to be very reliable - it needs some 
more testing though.

With this in place I can't see spurious OOM kills - just need to guarantee
that it reliably OOM kills when we are really out of memory.

While doing this, I noticed that kswapd will happily go to sleep 
if all zones have all_unreclaimable set. I bet this is the reason 
for the page allocation failures we are seeing. So the patch 
also makes balance_pgdat() NOT return and go to "loop_again" 
instead in case of page shortage - even if all_unreclaimable is set.

Basically the "loop_again" logic IS NOT WORKING! 

Comments?

My wife is almost killing me, its Friday night and I've been telling her 
"just another minute" for hours. Have to run.

diff -Nur --show-c-function --exclude='*.orig' linux-2.6.10-rc1-mm2.orig/mm/oom_kill.c linux-2.6.10-rc1-mm2/mm/oom_kill.c
--- linux-2.6.10-rc1-mm2.orig/mm/oom_kill.c	2004-11-04 22:50:50.000000000 -0200
+++ linux-2.6.10-rc1-mm2/mm/oom_kill.c	2004-11-05 18:33:29.918459072 -0200
@@ -240,23 +240,23 @@ void out_of_memory(int gfp_mask)
 	 * If it's been a long time since last failure,
 	 * we're not oom.
 	 */
-	if (since > 5*HZ)
-		goto reset;
+	//if (since > 5*HZ)
+	//	goto reset;
 
 	/*
 	 * If we haven't tried for at least one second,
 	 * we're not really oom.
 	 */
-	since = now - first;
-	if (since < HZ)
-		goto out_unlock;
+	//since = now - first;
+	//if (since < HZ)
+	//	goto out_unlock;
 
 	/*
 	 * If we have gotten only a few failures,
 	 * we're not really oom. 
 	 */
-	if (++count < 10)
-		goto out_unlock;
+//	if (++count < 10)
+//		goto out_unlock;
 
 	/*
 	 * If we just killed a process, wait a while
diff -Nur --show-c-function --exclude='*.orig' linux-2.6.10-rc1-mm2.orig/mm/vmscan.c linux-2.6.10-rc1-mm2/mm/vmscan.c
--- linux-2.6.10-rc1-mm2.orig/mm/vmscan.c	2004-11-04 22:50:50.000000000 -0200
+++ linux-2.6.10-rc1-mm2/mm/vmscan.c	2004-11-05 19:53:35.915836416 -0200
@@ -952,8 +952,6 @@ int try_to_free_pages(struct zone **zone
 		if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
 			blk_congestion_wait(WRITE, HZ/10);
 	}
-	if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
-		out_of_memory(gfp_mask);
 out:
 	for (i = 0; zones[i] != 0; i++) {
 		struct zone *zone = zones[i];
@@ -997,13 +995,15 @@ static int balance_pgdat(pg_data_t *pgda
 	int all_zones_ok;
 	int priority;
 	int i;
-	int total_scanned, total_reclaimed;
+	int total_scanned, total_reclaimed, worked;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
 
+
 loop_again:
 	total_scanned = 0;
 	total_reclaimed = 0;
+	worked = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
 	sc.nr_mapped = read_page_state(nr_mapped);
@@ -1033,6 +1033,10 @@ loop_again:
 				if (zone->present_pages == 0)
 					continue;
 
+				if (!zone_watermark_ok(zone, order, 
+						zone->pages_high, 0, 0, 0))
+					all_zones_ok = 0;
+
 				if (zone->all_unreclaimable &&
 						priority != DEF_PRIORITY)
 					continue;
@@ -1072,6 +1076,9 @@ scan:
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 
+			if (priority == 0)
+				worked = 1;
+
 			if (nr_pages == 0) {	/* Not software suspend */
 				if (!zone_watermark_ok(zone, order,
 						zone->pages_high, end_zone, 0, 0))
@@ -1129,6 +1136,9 @@ out:
 		zone->prev_priority = zone->temp_priority;
 	}
 	if (!all_zones_ok) {
+		if (priority == 0 && !total_reclaimed && worked)
+			out_of_memory(GFP_KERNEL);
+
 		cond_resched();
 		goto loop_again;
 	}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2004-12-10  6:02 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-05 20:01 [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage Marcelo Tosatti
2004-11-05 23:32 ` Jesse Barnes
2004-11-05 23:47   ` Thomas Gleixner
2004-11-06  1:20   ` Andrea Arcangeli
2004-11-06  1:26     ` Nick Piggin
2004-11-06  1:36       ` Jesse Barnes
2004-11-06  1:50       ` Andrea Arcangeli
2004-11-06  9:47         ` Hugh Dickins
2004-11-06 10:53           ` Nick Piggin
2004-11-06 15:29             ` Andrea Arcangeli
2004-11-06 15:29           ` Andrea Arcangeli
2004-11-06 16:21             ` Hugh Dickins
2004-12-10  6:02               ` William Lee Irwin III
2004-11-06 11:37         ` Nikita Danilov
2004-11-06 15:32           ` Andrea Arcangeli
2004-11-06 16:54             ` Nikita Danilov
2004-11-06 17:44               ` Andrea Arcangeli
2004-11-06 19:24                 ` Nikita Danilov
2004-11-07  1:16                   ` Andrea Arcangeli
2004-11-06 10:11       ` Marcelo Tosatti
2004-11-06  1:55     ` Thomas Gleixner
2004-11-06 10:28       ` Marcelo Tosatti
2004-11-17 22:54       ` Werner Almesberger
2004-11-17 23:27         ` Chris Ross
2004-11-18  0:04           ` Werner Almesberger
2004-11-18  0:28             ` Chris Ross
2004-11-18  1:14               ` Werner Almesberger
2004-11-18  8:20                 ` Chris Ross
2004-11-18 10:01                   ` Werner Almesberger
2004-11-18 14:44                     ` Thomas Gleixner
2004-11-18 15:10                       ` Chris Friesen
2004-11-06 10:05     ` Marcelo Tosatti
2004-11-06 15:44       ` Andrea Arcangeli
2004-11-06 15:52         ` Arjan van de Ven
2004-11-06 17:09         ` Marcelo Tosatti
2004-11-07  0:48           ` Andrea Arcangeli
2004-11-07 11:21             ` Marcelo Tosatti
2004-11-06 12:53 ` [PATCH] Remove OOM killer Andries Brouwer
2004-11-06 10:41   ` Marcelo Tosatti
2004-11-07  9:26   ` Marko Macek
2004-11-08 16:27 ` [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox