[PATCH] 2.2.14 VM fix #3

* [PATCH] 2.2.14 VM fix #3
@ 2000-01-21  4:07 Rik van Riel
  2000-01-21 13:34 ` Andrea Arcangeli
  0 siblings, 1 reply; 5+ messages in thread
From: Rik van Riel @ 2000-01-21  4:07 UTC
  To: Alan Cox; +Cc: Andrea Arcangeli, Linux MM, Linux Kernel

Hi Alan, Andrea,

Here is my third patch for the VM troubles. It merges
parts of Andrea's patch with my patch and makes some extra
improvements.

Most notably:
- int low_on_memory removed, now using freepages.* for
  hysteresis
- when we get below freepages.min, only __GFP_HIGH
  allocations are allowed to succeed (this was always
  the case and is exactly how it is documented; it
  reduces the chance of the system running out of
  memory, and code that calls with GFP_KERNEL can handle
  a failed allocation)
- kswapd does the 1-second sleep and background freeing
  between freepages.low and freepages.high
- below freepages.low, kswapd is immediately woken up,
  __GFP_WAIT processes do a schedule() in case they
  might be lower priority than kswapd, otherwise kswapd
  will have to free memory when they get out of the way
- once we reach freepages.min, processes will actively
  try to free memory themselves and get refused their
  memory if they don't free any
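
As a rough illustration of the list above (not the kernel code
itself), the post-patch decision in __get_free_pages() can be
modelled in user-space C. The names watermarks, alloc_decision and
decide_alloc, and the boolean parameters, are hypothetical stand-ins
for freepages.*, __GFP_WAIT, __GFP_HIGH and the result of
do_try_to_free_pages(); the PF_MEMALLOC bypass is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's freepages.{min,low,high}. */
struct watermarks {
	int min;
	int low;
	int high;
};

/* Outcome of one allocation attempt in this simplified model. */
struct alloc_decision {
	bool wake_kswapd;  /* at or below low: kswapd is woken immediately */
	bool yield_cpu;    /* __GFP_WAIT callers schedule() so kswapd can run */
	bool must_reclaim; /* at or below min: caller tries to free pages itself */
	bool allowed;      /* whether the allocation may proceed */
};

struct alloc_decision decide_alloc(int nr_free, struct watermarks wm,
				   bool can_wait, bool high_prio,
				   bool reclaim_freed)
{
	struct alloc_decision d = { false, false, false, true };
	bool freed;

	assert(wm.min <= wm.low && wm.low <= wm.high);

	if (nr_free <= wm.low) {
		d.wake_kswapd = true;
		d.yield_cpu = can_wait;
	}
	if (nr_free > wm.min)
		return d;

	/*
	 * Mirrors try_to_free_pages() in the patch: callers without
	 * __GFP_WAIT do no reclaim and are treated as having succeeded.
	 */
	d.must_reclaim = can_wait;
	freed = can_wait ? reclaim_freed : true;

	/* Only a caller that freed something, or __GFP_HIGH, gets memory. */
	d.allowed = freed || high_prio;
	return d;
}
```

With min/low/high set to 10/20/30, a __GFP_WAIT caller without
__GFP_HIGH at 5 free pages is refused in this model unless its own
reclaim attempt freed something.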

In short, this patch brings the code back to the most
obvious possible code path and reverts back to old
trusted behaviour. I know this behaviour works because
we've been running that way for years...

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.


--- linux-2.2.15-pre3/mm/vmscan.c.orig	Wed Jan 19 21:18:54 2000
+++ linux-2.2.15-pre3/mm/vmscan.c	Fri Jan 21 04:24:48 2000
@@ -485,41 +485,26 @@
 		 * the processes needing more memory will wake us
 		 * up on a more timely basis.
 		 */
-		interruptible_sleep_on_timeout(&kswapd_wait, HZ);
 		while (nr_free_pages < freepages.high)
 		{
-			if (do_try_to_free_pages(GFP_KSWAPD))
-			{
-				if (tsk->need_resched)
-					schedule();
-				continue;
-			}
-			tsk->state = TASK_INTERRUPTIBLE;
-			schedule_timeout(10*HZ);
+			if (!do_try_to_free_pages(GFP_KSWAPD))
+				break;
+			if (tsk->need_resched)
+				schedule();
 		}
+		run_task_queue(&tq_disk);
+		interruptible_sleep_on_timeout(&kswapd_wait, HZ);
 	}
 }
 
 /*
- * Called by non-kswapd processes when they want more
- * memory.
- *
- * In a perfect world, this should just wake up kswapd
- * and return. We don't actually want to swap stuff out
- * from user processes, because the locking issues are
- * nasty to the extreme (file write locks, and MM locking)
- *
- * One option might be to let kswapd do all the page-out
- * and VM page table scanning that needs locking, and this
- * process thread could do just the mmap shrink stage that
- * can be done by just dropping cached pages without having
- * any deadlock issues.
+ * Called by non-kswapd processes when kswapd really cannot
+ * keep up with the demand for free memory.
  */
 int try_to_free_pages(unsigned int gfp_mask)
 {
 	int retval = 1;
 
-	wake_up_interruptible(&kswapd_wait);
 	if (gfp_mask & __GFP_WAIT)
 		retval = do_try_to_free_pages(gfp_mask);
 	return retval;
--- linux-2.2.15-pre3/mm/page_alloc.c.orig	Wed Jan 19 21:32:05 2000
+++ linux-2.2.15-pre3/mm/page_alloc.c	Fri Jan 21 05:02:13 2000
@@ -20,6 +20,7 @@
 
 int nr_swap_pages = 0;
 int nr_free_pages = 0;
+extern struct wait_queue * kswapd_wait;
 
 /*
  * Free area management
@@ -184,8 +185,6 @@
 	atomic_set(&map->count, 1); \
 } while (0)
 
-int low_on_memory = 0;
-
 unsigned long __get_free_pages(int gfp_mask, unsigned long order)
 {
 	unsigned long flags;
@@ -212,21 +211,21 @@
 	if (!(current->flags & PF_MEMALLOC)) {
 		int freed;
 
-		if (nr_free_pages > freepages.min) {
-			if (!low_on_memory)
-				goto ok_to_allocate;
-			if (nr_free_pages >= freepages.high) {
-				low_on_memory = 0;
-				goto ok_to_allocate;
-			}
+		if (nr_free_pages <= freepages.low) {
+			wake_up_interruptible(&kswapd_wait);
+			/* a bit of defensive programming */
+			if (gfp_mask & __GFP_WAIT)
+				schedule();
 		}
+		if (nr_free_pages > freepages.min)
+			goto ok_to_allocate;
 
-		low_on_memory = 1;
+		/* Danger, danger! Do something or fail */
 		current->flags |= PF_MEMALLOC;
 		freed = try_to_free_pages(gfp_mask);
 		current->flags &= ~PF_MEMALLOC;
 
-		if (!freed && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
+		if (!freed && !(gfp_mask & __GFP_HIGH))
 			goto nopage;
 	}
 ok_to_allocate:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: [PATCH] 2.2.14 VM fix #3
  2000-01-21  4:07 [PATCH] 2.2.14 VM fix #3 Rik van Riel
@ 2000-01-21 13:34 ` Andrea Arcangeli
  2000-01-24 19:22   ` Stephen C. Tweedie
  0 siblings, 1 reply; 5+ messages in thread
From: Andrea Arcangeli @ 2000-01-21 13:34 UTC
  To: Rik van Riel; +Cc: Alan Cox, Linux MM, Linux Kernel

On Fri, 21 Jan 2000, Rik van Riel wrote:

>Hi Alan, Andrea,
>
>Here is my third patch for the VM troubles. It merges
>parts of Andrea's patch with my patch and makes some extra
>improvements.

Sorry, but I will never agree with your patch. The GFP_KERNEL change is not
something for 2.2.x. We have major deadlocks in getblk, for example, and you
may trigger them more easily by preventing __GFP_MED allocations from
succeeding. I don't really see why you are making these changes. What problem
did you have on your machine related to that? Such a change surely won't help
atomic allocations. Your change only makes a difference if we are OOM.

Also, killing low_on_memory will harm performance. You don't seem to
see what that bit (which should be a per-process thing) is good for.
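
For context, the hysteresis that low_on_memory provided can be modelled
in user-space C as follows. This is a simplified sketch of the lines the
patch removes from __get_free_pages(), not the kernel code itself; the
function name must_reclaim_old and its parameters are hypothetical, and
the flag is global here, as in the removed code.

```c
#include <assert.h>
#include <stdbool.h>

/* One global flag shared by every caller, as in the removed code. */
static bool low_on_memory = false;

/*
 * Returns true when the caller has to try freeing pages before it may
 * allocate. Once free memory has dropped to min, callers keep
 * reclaiming until free memory is back at high.
 */
bool must_reclaim_old(int nr_free, int min, int high)
{
	assert(min <= high);

	if (nr_free > min) {
		if (!low_on_memory)
			return false;
		if (nr_free >= high) {
			low_on_memory = false;
			return false;
		}
	}
	low_on_memory = true;
	return true;
}
```

With min = 10 and high = 30, a call at 20 free pages returns false
before memory has ever dropped to 10, but returns true after it has,
until a call sees 30 or more free pages.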

And the 1-second polling loop has to be killed, since it makes no sense.

>- below freepages.low, kswapd is immediately woken up,

Yes, using freepages.low is way better than my original freepages.high. I
noticed that last night after posting the patch. Anyway, it's a
performance-only issue (see my other email, where I provide an
incremental patch and a new version of my patch).

Andrea

* Re: [PATCH] 2.2.14 VM fix #3
  2000-01-21 13:34 ` Andrea Arcangeli
@ 2000-01-24 19:22   ` Stephen C. Tweedie
  2000-01-24 22:38     ` Rik van Riel
  2000-01-25  9:08     ` Andrea Arcangeli
  0 siblings, 2 replies; 5+ messages in thread
From: Stephen C. Tweedie @ 2000-01-24 19:22 UTC
  To: Andrea Arcangeli
  Cc: Rik van Riel, Alan Cox, Linux MM, Linux Kernel, Stephen Tweedie

Hi,

On Fri, 21 Jan 2000 14:34:14 +0100 (CET), Andrea Arcangeli
<andrea@suse.de> said:

> Sorry, but I will never agree with your patch. The GFP_KERNEL change is not
> something for 2.2.x. We have major deadlocks in getblk, for example, and you
> may trigger them more easily by preventing __GFP_MED allocations from
> succeeding.

Agreed, definitely.

> Also, killing low_on_memory will harm performance. You don't seem to
> see what that bit (which should be a per-process thing) is good for.

Also agreed --- removing the per-process flag will just penalise _all_
processes when we enter thrashing.

> And the 1-second polling loop has to be killed, since it makes no sense.

Actually, that probably isn't too bad, as long as we make sure we wake
up kswapd on GFP_ATOMIC allocations when the free page count gets below
freepages.min, even if the allocation succeeded (and Rik's patch does
do that).

--Stephen

* Re: [PATCH] 2.2.14 VM fix #3
  2000-01-24 19:22   ` Stephen C. Tweedie
@ 2000-01-24 22:38     ` Rik van Riel
  2000-01-25  9:08     ` Andrea Arcangeli
  1 sibling, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2000-01-24 22:38 UTC
  To: Stephen C. Tweedie; +Cc: Andrea Arcangeli, Alan Cox, Linux MM, Linux Kernel

On Mon, 24 Jan 2000, Stephen C. Tweedie wrote:
> On Fri, 21 Jan 2000 14:34:14 +0100 (CET), Andrea Arcangeli
> <andrea@suse.de> said:
> 
> > Sorry, but I will never agree with your patch. The GFP_KERNEL change is not
> > something for 2.2.x. We have major deadlocks in getblk, for example, and you
> > may trigger them more easily by preventing __GFP_MED allocations from
> > succeeding.
> 
> Agreed, definitely.

OTOH, 2.2.1{3,4} have seen deadlocks because GFP_KERNEL
allocations had eaten up all of memory and a PF_MEMALLOC
allocation couldn't get through. It has also DoSed some
servers where the network driver got temporarily confused
when a GFP_ATOMIC allocation failed.

> > Also, killing low_on_memory will harm performance. You don't seem to
> > see what that bit (which should be a per-process thing) is good for.
> 
> Also agreed --- removing the per-process flag will just penalise
> _all_ processes when we enter thrashing.

Except that it never was a per-process flag...
(so we didn't lose anything there)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

* Re: [PATCH] 2.2.14 VM fix #3
  2000-01-24 19:22   ` Stephen C. Tweedie
  2000-01-24 22:38     ` Rik van Riel
@ 2000-01-25  9:08     ` Andrea Arcangeli
  1 sibling, 0 replies; 5+ messages in thread
From: Andrea Arcangeli @ 2000-01-25  9:08 UTC
  To: Stephen C. Tweedie; +Cc: Alan Cox, Linux MM, Linux Kernel

On Mon, 24 Jan 2000, Stephen C. Tweedie wrote:

>> And the 1-second polling loop has to be killed, since it makes no sense.
>
>Actually, that probably isn't too bad, as long as we make sure we wake

Agreed. It definitely isn't too bad. But as far as I can tell, it shouldn't
help in real life either, and performance would be better without it. The
point of the 1-second polling loop is basically to refill the freelist from
the low to the high watermark even if the last allocation didn't cause the
free page count to go below the "low" level.

I think it would be better to make sure that kswapd does a full (high - low)
pages of work on each run, rather than an uninteresting 2 or 3 pages of work
(for obvious icache-line reasons). And kswapd is so fast at freeing the
(high - low) pages that 1 second is too long an interval to make a real-life
difference. We just made sure not to block on allocations before we go below
the "min" level, so kswapd will have all the time it needs to do its work
before we block (if the memory load is not heavy; and if the load is heavy,
the 1-second polling loop was just a no-op in the first place ;).
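
To make the batch-size argument concrete, here is a toy user-space model
of one kswapd wakeup after the patch. It is only a sketch under stated
assumptions: kswapd_run, per_pass and reclaimable are hypothetical names,
and each successful do_try_to_free_pages() pass is assumed to free a
fixed number of pages.

```c
#include <assert.h>

/*
 * One wakeup of the modelled kswapd: keep freeing until the free page
 * count reaches `high` or nothing more can be reclaimed.
 * Returns the number of pages freed during this wakeup.
 */
int kswapd_run(int *nr_free, int high, int per_pass, int *reclaimable)
{
	int freed = 0;

	assert(per_pass > 0);

	while (*nr_free < high) {
		int got = *reclaimable < per_pass ? *reclaimable : per_pass;

		if (got == 0)
			break; /* models do_try_to_free_pages() failing */
		*reclaimable -= got;
		*nr_free += got;
		freed += got;
	}
	return freed;
}
```

In this model, a wakeup at the low watermark (20 free, high = 30, 4 pages
per pass) does three passes, while a timer wakeup at 28 free pages does a
single small pass, which is the small-batch case described above.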

Andrea
