linux-mm.kvack.org archive mirror
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Rik van Riel <riel@redhat.com>,
	nikita@clusterfs.com, linux-mm@kvack.org,
	Nick Piggin <piggin@cyberone.com.au>
Subject: Re: [PATCH] ignore referenced pages on reclaim when OOM
Date: Wed, 10 Nov 2004 16:41:34 -0200
Message-ID: <20041110184134.GC12867@logos.cnet>
In-Reply-To: <20041108142837.307029fc.akpm@osdl.org>

On Mon, Nov 08, 2004 at 02:28:37PM -0800, Andrew Morton wrote:
> Rik van Riel <riel@redhat.com> wrote:
> >
> > On Tue, 9 Nov 2004, Nikita Danilov wrote:
> > 
> > >  > Speeds up extreme load performance on Rik's tests.
> > > 
> > > I recently tested quite a similar thing, the only difference being that in
> > > my case references bit started being ignored when scanning priority
> > > reached 2 rather than 0.
> > > 
> > > I found that it _degrades_ performance in the loads when there is a lot
> > > of file system write-back going from tail of the inactive list (like
> > > dirtying huge file through mmap in a loop).
> > 
> > Well yeah, when you reach priority 2, you've only scanned
> > 1/4 of memory.  On the other hand, when you reach priority
> > 0, you've already scanned all pages once - beyond that point
> > the referenced bit really doesn't buy you much any more.
> > 
> 
> But we have to scan active, referenced pages two times to move them onto
> the inactive list.  A bit more, really, because nowadays
> refill_inactive_zone() doesn't even run page_referenced() until it starts
> to reach higher scanning priorities.
> 
> So it could be that we're just not scanning enough.

You know, all_unreclaimable has drawbacks.

It's hard to know whether you have "scanned enough to consider the box OOM
and trigger the OOM killer" when all_unreclaimable prevents the system
from "scanning enough".

I'm trying to improve the OOM-kill-from-kswapd patch, but z->all_unreclaimable
is currently the biggest "rock in the shoe": we need some way to detect that
the zones have been scanned enough to be able to say
"OK, I have scanned enough and no freeable pages have appeared, it's time
to trigger the OOM killer".

So z->all_unreclaimable logic and "OOM detection" are conflicting goals.

There must be some way to combine both effectively.

This is my current patch. It avoids spurious OOM kills, but it obviously
fails to set "worked_dma" and "worked_norm" because of the all_unreclaimable
logic, resulting in a livelock when swap space is exhausted.

Ideas are welcome.


--- vmscan.c.orig	2004-11-09 16:38:04.000000000 -0200
+++ vmscan.c	2004-11-10 18:59:43.098090736 -0200
@@ -878,6 +878,8 @@
 		shrink_zone(zone, sc);
 	}
 }
+
+int task_looping_oom = 0;
  
 /*
  * This is the main entry point to direct page reclaim.
@@ -952,8 +954,8 @@
 		if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
 			blk_congestion_wait(WRITE, HZ/10);
 	}
 	if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
-		out_of_memory(gfp_mask);
+		task_looping_oom = 1;
 out:
 	for (i = 0; zones[i] != 0; i++) {
 		struct zone *zone = zones[i];
@@ -963,6 +965,8 @@
 
 		zone->prev_priority = zone->temp_priority;
 	}
+	if (ret)
+		task_looping_oom = 0;
 	return ret;
 }
 
@@ -997,13 +1001,17 @@
 	int all_zones_ok;
 	int priority;
 	int i;
-	int total_scanned, total_reclaimed;
+	int total_scanned, total_reclaimed, low_reclaimed;
+	int worked_norm, worked_dma;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
 
+
 loop_again:
 	total_scanned = 0;
 	total_reclaimed = 0;
+	low_reclaimed = 0;
+	worked_norm = worked_dma = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
 	sc.nr_mapped = read_page_state(nr_mapped);
@@ -1072,6 +1080,17 @@
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 
+			/* if we're scanning dma or normal, and priority 
+			 * reached zero, set "worked_dma" or "worked_norm" 
+			 * accordingly.
+			 */
+			if (i <= 1 && priority == 0) {
+				if (!i) 
+					worked_dma = 1;
+				else
+					worked_norm = 1;
+			}
+
 			if (nr_pages == 0) {	/* Not software suspend */
 				if (!zone_watermark_ok(zone, order,
 						zone->pages_high, end_zone, 0, 0))
@@ -1088,6 +1107,10 @@
 			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
+
+			if (i <= 1)
+				low_reclaimed += sc.nr_reclaimed;
+
 			if (zone->all_unreclaimable)
 				continue;
 			if (zone->pages_scanned >= (zone->nr_active +
@@ -1128,6 +1151,29 @@
 
 		zone->prev_priority = zone->temp_priority;
 	}
+
+
+	if (!low_reclaimed && worked_dma && worked_norm && task_looping_oom) {
+
+		printk(KERN_ERR "kswp: pri:%d tot_recl:%d wrkd_dma:%d "
+				"wrkd_norm:%d tsk_loop_oom:%d\n",
+			priority, total_reclaimed, worked_dma, worked_norm,
+				task_looping_oom);
+
+		/* 
+		 * Only kill if ZONE_NORMAL/ZONE_DMA are both below
+		 * pages_min
+		 */
+		for (i = pgdat->nr_zones - 2; i >= 0; i--) {
+			struct zone *zone = pgdat->node_zones + i;
+
+			if (zone->free_pages > zone->pages_min)
+				return 0;
+		}
+		out_of_memory(GFP_KERNEL);
+		task_looping_oom = 0;
+	}
+
 	if (!all_zones_ok) {
 		cond_resched();
 		goto loop_again;
