* [patch *] VM deadlock fix
@ 2000-09-21 16:44 Rik van Riel
2000-09-21 20:28 ` Roger Larsson
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Rik van Riel @ 2000-09-21 16:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, linux-mm
Hi,
I've found and fixed the deadlocks in the new VM. They turned out
to be single-cpu only bugs, which explains why they didn't crash my
SMP test box ;)
They have to do with the fact that processes schedule away while
holding IO locks after waking up kswapd. At that point kswapd
spends its time spinning on the IO locks and single-cpu systems
will die...
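To make the failure mode concrete: on a 2.4 uniprocessor kernel there is no kernel preemption, so a kernel thread busy-waiting on a lock never yields the only CPU back to the sleeping lock holder. A minimal sketch of the interaction described above, with stand-in names and a stand-in lock flag (not the actual mm/vmscan.c or fs/buffer.c code):

static volatile int io_lock = 0;	/* stand-in for a held IO lock */

void allocating_process(void)
{
	io_lock = 1;		/* take an IO lock ...               */
	wakeup_kswapd(0);	/* ... wake kswapd for more memory ...*/
	schedule();		/* ... and schedule away, lock held   */
}

void kswapd_work(void)
{
	while (io_lock)		/* with one CPU and no preemption this
				   loop never lets the holder run again
				   to release the lock: the box dies   */
		/* spin */ ;
	/* ... reclaim pages ... */
}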
Due to bad connectivity I'm not attaching this patch but have only
put it online on my home page:
http://www.surriel.com/patches/2.4.0-t9p2-vmpatch
(yes, I'm at a conference now ... the worst beating this patch
has had is a full night in 'make bzImage' with mem=8m)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel @ 2000-09-21 20:28 ` Roger Larsson 2000-09-21 23:31 ` Problem remains - page_launder? (Was: Re: [patch *] VM deadlock fix) Roger Larsson 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller 2000-09-22 12:16 ` [patch *] VM deadlock fix Martin Diehl 2 siblings, 1 reply; 20+ messages in thread From: Roger Larsson @ 2000-09-21 20:28 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel, linux-mm [-- Attachment #1: Type: text/plain, Size: 1805 bytes --] Hi, Tried your patch on 2.4.0-test9-pre4 with the included debug patch applied. Rebooted, started mmap002 After a while it starts outputting (magic did not work this time - usually does): - - - "VM: try_to_free_pages (result: 1) try_again # 12345" "VM: try_to_free_pages (result: 1) try_again # 12346" - - - My interpretation: 1) try_to_free_pages succeeds (or returns ok when it did not work) 2) __alloc_pages still can't alloc Maybe it is different limits, try_to_free_pages requires less to succeed than __alloc_pages_limit requires. or a bug in __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim) Note: 12345 is an example, it loops to over 30000... /RogerL Rik van Riel wrote: > > Hi, > > I've found and fixed the deadlocks in the new VM. They turned out > to be single-cpu only bugs, which explains why they didn't crash my > SMP test box ;) > > They have to do with the fact that processes schedule away while > holding IO locks after waking up kswapd. At that point kswapd > spends its time spinning on the IO locks and single-cpu systems > will die... > > Due to bad connectivity I'm not attaching this patch but have only > put it online on my home page: > > http://www.surriel.com/patches/2.4.0-t9p2-vmpatch > > (yes, I'm at a conference now ... the worst beating this patch > has had is a full night in 'make bzImage' with mem=8m) > > regards, > > Rik > -- > "What you're running that piece of shit Gnome?!?!" > -- Miguel de Icaza, UKUUG 2000 > > http://www.conectiva.com/ http://www.surriel.com/ > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > Please read the FAQ at http://www.tux.org/lkml/ -- Home page: http://www.norran.net/nra02596/ [-- Attachment #2: vmdebug.patch --] [-- Type: text/plain, Size: 1167 bytes --] --- mm/page_alloc.c.orig Thu Sep 21 20:02:54 2000 +++ mm/page_alloc.c Thu Sep 21 20:49:35 2000 @@ -295,6 +295,7 @@ int direct_reclaim = 0; unsigned int gfp_mask = zonelist->gfp_mask; struct page * page = NULL; + int try_again_loops = 0; /* * Allocations put pressure on the VM subsystem. */ @@ -320,8 +321,10 @@ /* * Are we low on inactive pages? */ - if (inactive_shortage() > inactive_target / 2 && free_shortage()) + if (inactive_shortage() > inactive_target / 2 && free_shortage()) { + printk("VM: inactive shortage wake kswapd\n"); wakeup_kswapd(0); + } try_again: /* @@ -410,6 +413,7 @@ * piece of free memory. */ if (order > 0 && (gfp_mask & __GFP_WAIT)) { + printk("VM: higher order\n"); zone = zonelist->zones; /* First, clean some dirty pages. */ page_launder(gfp_mask, 1); @@ -444,7 +448,9 @@ * processes, etc). 
*/ if (gfp_mask & __GFP_WAIT) { - try_to_free_pages(gfp_mask); + int success = try_to_free_pages(gfp_mask); + printk("VM: try_to_free_pages (result: %d) try_again # %d\n", + success, ++try_again_loops); memory_pressure++; goto try_again; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Problem remains - page_launder? (Was: Re: [patch *] VM deadlock fix) 2000-09-21 20:28 ` Roger Larsson @ 2000-09-21 23:31 ` Roger Larsson 0 siblings, 0 replies; 20+ messages in thread From: Roger Larsson @ 2000-09-21 23:31 UTC (permalink / raw) To: Rik van Riel, linux-kernel, linux-mm Hi again, Further hints. More testing (printks in refill_inactive and page_launder) reveals that refill_inactive works ok (16 pages) but page_launder never succeeds in my lockup state... (WHY) alloc fails since there is no inactive_clean and free is less than MIN. And then when page_launder fails... /RogerL Roger Larsson wrote: > > Hi, > > Tried your patch on 2.4.0-test9-pre4 > with the included debug patch applied. > > Rebooted, started mmap002 > > After a while it starts outputting (magic did not work > this time - usually does): > > - - - > "VM: try_to_free_pages (result: 1) try_again # 12345" > "VM: try_to_free_pages (result: 1) try_again # 12346" > - - - > > My interpretation: > 1) try_to_free_pages succeeds (or returns ok when it did not work) > 2) __alloc_pages still can't alloc > > Maybe it is different limits, > try_to_free_pages requires less to succeed than > __alloc_pages_limit requires. > or a bug in > __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim) > > Note: > 12345 is an example, it loops to over 30000... > > /RogerL > > Rik van Riel wrote: > > > > Hi, > > > > I've found and fixed the deadlocks in the new VM. They turned out > > to be single-cpu only bugs, which explains why they didn't crash my > > SMP test box ;) > > > > They have to do with the fact that processes schedule away while > > holding IO locks after waking up kswapd. At that point kswapd > > spends its time spinning on the IO locks and single-cpu systems > > will die... > > > > Due to bad connectivity I'm not attaching this patch but have only > > put it online on my home page: > > > > http://www.surriel.com/patches/2.4.0-t9p2-vmpatch > > > > (yes, I'm at a conference now ... the worst beating this patch > > has had is a full night in 'make bzImage' with mem=8m) > > > > regards, > > > > Rik > > -- > > "What you're running that piece of shit Gnome?!?!" > > -- Miguel de Icaza, UKUUG 2000 > > > > http://www.conectiva.com/ http://www.surriel.com/ > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > Please read the FAQ at http://www.tux.org/lkml/ > > -- > Home page: > http://www.norran.net/nra02596/ > > ------------------------------------------------------------------------ > Name: vmdebug.patch > vmdebug.patch Type: Plain Text (text/plain) > Encoding: 7bit -- Home page: http://www.norran.net/nra02596/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel 2000-09-21 20:28 ` Roger Larsson @ 2000-09-21 22:23 ` David S. Miller 2000-09-22 0:18 ` Andrea Arcangeli 2000-09-22 8:39 ` Rik van Riel 2000-09-22 12:16 ` [patch *] VM deadlock fix Martin Diehl 2 siblings, 2 replies; 20+ messages in thread From: David S. Miller @ 2000-09-21 22:23 UTC (permalink / raw) To: riel; +Cc: torvalds, linux-kernel, linux-mm How did you get away with adding a new member to task_struct yet not updating the INIT_TASK() macro appropriately? :-) Does it really compile? Later, David S. Miller davem@redhat.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller @ 2000-09-22 0:18 ` Andrea Arcangeli 2000-09-21 23:57 ` David S. Miller 2000-09-22 8:39 ` Rik van Riel 1 sibling, 1 reply; 20+ messages in thread From: Andrea Arcangeli @ 2000-09-22 0:18 UTC (permalink / raw) To: David S. Miller; +Cc: riel, torvalds, linux-kernel, linux-mm On Thu, Sep 21, 2000 at 03:23:17PM -0700, David S. Miller wrote: > > How did you get away with adding a new member to task_struct yet not > updating the INIT_TASK() macro appropriately? :-) Does it really > compile? As long as sleep_time is ok to be set to zero, its missing initialization is fine. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-22 0:18 ` Andrea Arcangeli @ 2000-09-21 23:57 ` David S. Miller 0 siblings, 0 replies; 20+ messages in thread From: David S. Miller @ 2000-09-21 23:57 UTC (permalink / raw) To: andrea; +Cc: riel, torvalds, linux-kernel, linux-mm As long as sleep_time is ok to be set to zero, its missing initialization is fine. Indeed. Later, David S. Miller davem@redhat.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller 2000-09-22 0:18 ` Andrea Arcangeli @ 2000-09-22 8:39 ` Rik van Riel 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 1 sibling, 1 reply; 20+ messages in thread From: Rik van Riel @ 2000-09-22 8:39 UTC (permalink / raw) To: David S. Miller; +Cc: torvalds, linux-kernel, linux-mm On Thu, 21 Sep 2000, David S. Miller wrote: > How did you get away with adding a new member to task_struct yet > not updating the INIT_TASK() macro appropriately? :-) Does it > really compile? There are a lot of fields in the task_struct which are not initialized in the INIT_TASK macro. They seem to be set to zero by default. regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
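Rik's observation rests on a C guarantee rather than luck: when a static aggregate is initialized with fewer initializers than it has members, the C standard requires the remaining members to be zero-initialized, so any task_struct field omitted from INIT_TASK() starts out as zero in init_task. A small, self-contained userspace illustration (the struct and member names here are made up, not the kernel's):

#include <stdio.h>

struct task_like {
	long counter;
	long nice;
	unsigned long sleep_time;	/* new member, absent from the initializer */
};

/* Like INIT_TASK(): only the first members are given explicitly;
 * the rest of the aggregate is zero-filled by the compiler. */
static struct task_like init_task_like = { 10, 0 };

int main(void)
{
	printf("sleep_time = %lu\n", init_task_like.sleep_time);	/* prints 0 */
	return 0;
}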
* test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:39 ` Rik van Riel @ 2000-09-22 8:54 ` Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 8:54 UTC (permalink / raw) To: Rik van Riel; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm i'm still getting VM related lockups during heavy write load, in test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your last VM related fix-patch, correct?). Here is a histogram of such a lockup: 1 Trace; 4010a720 <__switch_to+38/e8> 5 Trace; 4010a74b <__switch_to+63/e8> 13 Trace; 4010abc4 <poll_idle+10/2c> 819 Trace; 4010abca <poll_idle+16/2c> 1806 Trace; 4010abce <poll_idle+1a/2c> 1 Trace; 4010abd0 <poll_idle+1c/2c> 2 Trace; 4011af51 <schedule+45/884> 1 Trace; 4011af77 <schedule+6b/884> 1 Trace; 4011b010 <schedule+104/884> 3 Trace; 4011b018 <schedule+10c/884> 1 Trace; 4011b02d <schedule+121/884> 1 Trace; 4011b051 <schedule+145/884> 1 Trace; 4011b056 <schedule+14a/884> 2 Trace; 4011b05c <schedule+150/884> 3 Trace; 4011b06d <schedule+161/884> 4 Trace; 4011b076 <schedule+16a/884> 537 Trace; 4011b2bb <schedule+3af/884> 2 Trace; 4011b2c6 <schedule+3ba/884> 1 Trace; 4011b2c9 <schedule+3bd/884> 4 Trace; 4011b2d5 <schedule+3c9/884> 31 Trace; 4011b31a <schedule+40e/884> 1 Trace; 4011b31d <schedule+411/884> 1 Trace; 4011b32a <schedule+41e/884> 1 Trace; 4011b346 <schedule+43a/884> 11 Trace; 4011b378 <schedule+46c/884> 2 Trace; 4011b381 <schedule+475/884> 5 Trace; 4011b3f8 <schedule+4ec/884> 17 Trace; 4011b404 <schedule+4f8/884> 9 Trace; 4011b43f <schedule+533/884> 1 Trace; 4011b450 <schedule+544/884> 1 Trace; 4011b457 <schedule+54b/884> 2 Trace; 4011b48c <schedule+580/884> 1 Trace; 4011b49c <schedule+590/884> 428 Trace; 4011b4cd <schedule+5c1/884> 6 Trace; 4011b4f7 <schedule+5eb/884> 4 Trace; 4011b500 <schedule+5f4/884> 2 Trace; 4011b509 <schedule+5fd/884> 1 Trace; 4011b560 <schedule+654/884> 1 Trace; 4011b809 <__wake_up+79/3f0> 1 Trace; 4011b81b <__wake_up+8b/3f0> 8 Trace; 4011b81e <__wake_up+8e/3f0> 310 Trace; 4011ba90 <__wake_up+300/3f0> 1 Trace; 4011bb7b <__wake_up+3eb/3f0> 2 Trace; 4011c32b <interruptible_sleep_on_timeout+283/290> 244 Trace; 4011d40e <add_wait_queue+14e/154> 1 Trace; 4011d411 <add_wait_queue+151/154> 1 Trace; 4011d56c <remove_wait_queue+8/d0> 618 Trace; 4011d62e <remove_wait_queue+ca/d0> 2 Trace; 40122f28 <do_softirq+48/88> 2 Trace; 40126c3c <del_timer_sync+6c/78> 1 Trace; 401377ab <wakeup_kswapd+7/254> 1 Trace; 401377c8 <wakeup_kswapd+24/254> 5 Trace; 401377cc <wakeup_kswapd+28/254> 15 Trace; 401377d4 <wakeup_kswapd+30/254> 11 Trace; 401377dc <wakeup_kswapd+38/254> 2 Trace; 401377e0 <wakeup_kswapd+3c/254> 6 Trace; 401377ee <wakeup_kswapd+4a/254> 8 Trace; 4013783c <wakeup_kswapd+98/254> 1 Trace; 401378f8 <wakeup_kswapd+154/254> 3 Trace; 4013792d <wakeup_kswapd+189/254> 2 Trace; 401379af <wakeup_kswapd+20b/254> 2 Trace; 401379f3 <wakeup_kswapd+24f/254> 1 Trace; 40138524 <__alloc_pages+7c/4b8> 1 Trace; 4013852b <__alloc_pages+83/4b8> (first column is the number of profiling hits, taken on all CPUs.) unfortunately i haven't captured which processes are running. This is an 8-CPU SMP box, 8 write-intensive processes are running, they create new 1k-1MB files in new directories - a total of many gigabytes. this lockup happens both during vanilla test9-pre5 and with 2.4.0-t9p2-vmpatch. Your patch makes the lockup happen a bit later than before, but it still happens. 
During the lockup all dirty buffers are written out to disk until it reaches such a state: 2162688 pages of RAM 1343488 pages of HIGHMEM 116116 reserved pages 652826 pages shared 0 pages swap cached 0 pages in page table cache Buffer memory: 52592kB CLEAN: 664 buffers, 2302 kbyte, 5 used (last=93), 0 locked, 0 protected, 0 dirty LOCKED: 661752 buffers, 2646711 kbyte, 37 used (last=661397), 0 locked, 0 protected, 0 dirty DIRTY: 17 buffers, 26 kbyte, 1 used (last=1), 0 locked, 0 protected, 17 dirty no disk IO happens anymore, but the lockup persists. The histogram was taken after all disk IO has stopped. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo @ 2000-09-22 9:00 ` Molnar Ingo 2000-09-22 9:08 ` Rik van Riel 2000-09-22 17:39 ` Linus Torvalds 2 siblings, 0 replies; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 9:00 UTC (permalink / raw) To: Rik van Riel; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm btw. - no swapdevice here. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo @ 2000-09-22 9:08 ` Rik van Riel 2000-09-22 9:14 ` Molnar Ingo 2000-09-22 17:39 ` Linus Torvalds 2 siblings, 1 reply; 20+ messages in thread From: Rik van Riel @ 2000-09-22 9:08 UTC (permalink / raw) To: Molnar Ingo; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm On Fri, 22 Sep 2000, Molnar Ingo wrote: > i'm still getting VM related lockups during heavy write load, in > test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your > last VM related fix-patch, correct?). Here is a histogram of such a > lockup: > this lockup happens both during vanilla test9-pre5 and with > 2.4.0-t9p2-vmpatch. Your patch makes the lockup happen a bit > later than before, but it still happens. During the lockup all > dirty buffers are written out to disk until it reaches such a > state: It seems that conference life has taken its toll; I seem to have reversed the logic in the test of whether we can reschedule in refill_inactive() ;( In mm/vmscan.c, please remove the `!' in the following fragment of code: 894 if (current->need_resched && !(gfp_mask & __GFP_IO)) { 895 __set_current_state(TASK_RUNNING); 896 schedule(); 897 } The idea was to not allow processes which have IO locks to schedule away, but as you can see, the check is reversed ... With the above fix, can you still lock it up? And if you can, does it lock up in the same way or in a new and exciting way? ;) regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
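To spell out the one-character fix (both forms are taken from this thread; only the comments are added here):

/* Broken: yields the CPU exactly when __GFP_IO is clear, i.e. when the
 * caller may be holding IO locks -- the one case where it must not sleep. */
if (current->need_resched && !(gfp_mask & __GFP_IO)) {
	__set_current_state(TASK_RUNNING);
	schedule();
}

/* Fixed (as it appears in Ingo's follow-up patch below): yield only when
 * the caller is allowed to do IO, and therefore holds no IO locks that
 * kswapd could end up spinning on. */
if (current->need_resched && (gfp_mask & __GFP_IO)) {
	__set_current_state(TASK_RUNNING);
	schedule();
}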
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 9:08 ` Rik van Riel @ 2000-09-22 9:14 ` Molnar Ingo 2000-09-22 9:34 ` Molnar Ingo 0 siblings, 1 reply; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 9:14 UTC (permalink / raw) To: Rik van Riel Cc: Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm On Fri, 22 Sep 2000, Rik van Riel wrote: > 894 if (current->need_resched && !(gfp_mask & __GFP_IO)) { > 895 __set_current_state(TASK_RUNNING); > 896 schedule(); > 897 } > The idea was to not allow processes which have IO locks > to schedule away, but as you can see, the check is > reversed ... thanks ... sounds good. Will have this tested in about 15 mins. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 9:14 ` Molnar Ingo @ 2000-09-22 9:34 ` Molnar Ingo 2000-09-22 10:27 ` Rik van Riel 0 siblings, 1 reply; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 9:34 UTC (permalink / raw) To: Rik van Riel Cc: Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm [-- Attachment #1: Type: TEXT/PLAIN, Size: 145 bytes --] yep this has done the trick, the deadlock is gone. I've attached the full VM-fixes patch (this fix included) against vanilla test9-pre5. Ingo [-- Attachment #2: Type: TEXT/PLAIN, Size: 6536 bytes --] --- linux/fs/buffer.c.orig Fri Sep 22 02:31:07 2000 +++ linux/fs/buffer.c Fri Sep 22 02:31:13 2000 @@ -706,9 +706,7 @@ static void refill_freelist(int size) { if (!grow_buffers(size)) { - balance_dirty(NODEV); - wakeup_kswapd(0); /* We can't wait because of __GFP_IO */ - schedule(); + try_to_free_pages(GFP_BUFFER); } } --- linux/mm/filemap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/filemap.c Fri Sep 22 02:31:13 2000 @@ -255,7 +255,7 @@ * up kswapd. */ age_page_up(page); - if (inactive_shortage() > (inactive_target * 3) / 4) + if (inactive_shortage() > inactive_target / 2 && free_shortage()) wakeup_kswapd(0); not_found: return page; --- linux/mm/page_alloc.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/page_alloc.c Fri Sep 22 02:31:13 2000 @@ -444,7 +444,8 @@ * processes, etc). */ if (gfp_mask & __GFP_WAIT) { - wakeup_kswapd(1); + try_to_free_pages(gfp_mask); + memory_pressure++; goto try_again; } } --- linux/mm/swap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/swap.c Fri Sep 22 02:31:13 2000 @@ -233,27 +233,11 @@ spin_lock(&pagemap_lru_lock); if (!PageLocked(page)) BUG(); - /* - * Heisenbug Compensator(tm) - * This bug shouldn't trigger, but for unknown reasons it - * sometimes does. If there are no signs of list corruption, - * we ignore the problem. Else we BUG()... - */ - if (PageActive(page) || PageInactiveDirty(page) || - PageInactiveClean(page)) { - struct list_head * page_lru = &page->lru; - if (page_lru->next->prev != page_lru) { - printk("VM: lru_cache_add, bit or list corruption..\n"); - BUG(); - } - printk("VM: lru_cache_add, page already in list!\n"); - goto page_already_on_list; - } + DEBUG_ADD_PAGE add_page_to_active_list(page); /* This should be relatively rare */ if (!page->age) deactivate_page_nolock(page); -page_already_on_list: spin_unlock(&pagemap_lru_lock); } --- linux/mm/vmscan.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/vmscan.c Fri Sep 22 02:31:27 2000 @@ -377,7 +377,7 @@ #define SWAP_SHIFT 5 #define SWAP_MIN 8 -static int swap_out(unsigned int priority, int gfp_mask) +static int swap_out(unsigned int priority, int gfp_mask, unsigned long idle_time) { struct task_struct * p; int counter; @@ -407,6 +407,7 @@ struct mm_struct *best = NULL; int pid = 0; int assign = 0; + int found_task = 0; select: read_lock(&tasklist_lock); p = init_task.next_task; @@ -416,6 +417,11 @@ continue; if (mm->rss <= 0) continue; + /* Skip tasks which haven't slept long enough yet when idle-swapping. */ + if (idle_time && !assign && (!(p->state & TASK_INTERRUPTIBLE) || + time_before(p->sleep_time + idle_time * HZ, jiffies))) + continue; + found_task++; /* Refresh swap_cnt? */ if (assign == 1) { mm->swap_cnt = (mm->rss >> SWAP_SHIFT); @@ -430,7 +436,7 @@ } read_unlock(&tasklist_lock); if (!best) { - if (!assign) { + if (!assign && found_task > 0) { assign = 1; goto select; } @@ -691,9 +697,9 @@ * Now the page is really freeable, so we * move it to the inactive_clean list. 
*/ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_inactive_clean_list(page); + UnlockPage(page); cleaned_pages++; } else { /* @@ -701,9 +707,9 @@ * It's no use keeping it here, so we move it to * the active list. */ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_active_list(page); + UnlockPage(page); } } spin_unlock(&pagemap_lru_lock); @@ -860,6 +866,7 @@ static int refill_inactive(unsigned int gfp_mask, int user) { int priority, count, start_count, made_progress; + unsigned long idle_time; count = inactive_shortage() + free_shortage(); if (user) @@ -869,16 +876,28 @@ /* Always trim SLAB caches when memory gets low. */ kmem_cache_reap(gfp_mask); + /* + * Calculate the minimum time (in seconds) a process must + * have slept before we consider it for idle swapping. + * This must be the number of seconds it takes to go through + * all of the cache. Doing this idle swapping makes the VM + * smoother once we start hitting swap. + */ + idle_time = atomic_read(&page_cache_size); + idle_time += atomic_read(&buffermem_pages); + idle_time /= (inactive_target + 1); + priority = 6; do { made_progress = 0; - if (current->need_resched) { + if (current->need_resched && (gfp_mask & __GFP_IO)) { __set_current_state(TASK_RUNNING); schedule(); } - while (refill_inactive_scan(priority, 1)) { + while (refill_inactive_scan(priority, 1) || + swap_out(priority, gfp_mask, idle_time)) { made_progress = 1; if (!--count) goto done; @@ -913,7 +932,7 @@ /* * Then, try to page stuff out.. */ - while (swap_out(priority, gfp_mask)) { + while (swap_out(priority, gfp_mask, 0)) { made_progress = 1; if (!--count) goto done; @@ -963,7 +982,8 @@ * before we get around to moving them to the other * list, so this is a relatively cheap operation. */ - if (free_shortage()) + if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() + + nr_inactive_clean_pages()) ret += page_launder(gfp_mask, user); /* @@ -1070,9 +1090,12 @@ run_task_queue(&tq_disk); /* - * If we've either completely gotten rid of the - * free page shortage or the inactive page shortage - * is getting low, then stop eating CPU time. + * We go to sleep if either the free page shortage + * or the inactive page shortage is gone. We do this + * because: + * 1) we need no more free pages or + * 2) the inactive pages need to be flushed to disk, + * it wouldn't help to eat CPU time now ... * * We go to sleep for one second, but if it's needed * we'll be woken up earlier... --- linux/include/linux/sched.h.orig Fri Sep 22 02:31:04 2000 +++ linux/include/linux/sched.h Fri Sep 22 02:31:13 2000 @@ -298,6 +298,7 @@ * that's just fine.) */ struct list_head run_list; + unsigned long sleep_time; struct task_struct *next_task, *prev_task; struct mm_struct *active_mm; @@ -818,6 +819,7 @@ static inline void del_from_runqueue(struct task_struct * p) { nr_running--; + p->sleep_time = jiffies; list_del(&p->run_list); p->run_list.next = NULL; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 9:34 ` Molnar Ingo @ 2000-09-22 10:27 ` Rik van Riel 2000-09-22 13:10 ` André Dahlqvist 0 siblings, 1 reply; 20+ messages in thread From: Rik van Riel @ 2000-09-22 10:27 UTC (permalink / raw) To: Molnar Ingo; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm [-- Attachment #1: Type: TEXT/PLAIN, Size: 543 bytes --] On Fri, 22 Sep 2000, Molnar Ingo wrote: > yep this has done the trick, the deadlock is gone. I've attached the full > VM-fixes patch (this fix included) against vanilla test9-pre5. Linus, could you please include this patch in the next pre patch? (in the mean time, I'll go back to looking at the balancing thing with shared memory ... which is unrelated to this deadlock problem) thanks, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ [-- Attachment #2: Type: TEXT/PLAIN, Size: 6536 bytes --] --- linux/fs/buffer.c.orig Fri Sep 22 02:31:07 2000 +++ linux/fs/buffer.c Fri Sep 22 02:31:13 2000 @@ -706,9 +706,7 @@ static void refill_freelist(int size) { if (!grow_buffers(size)) { - balance_dirty(NODEV); - wakeup_kswapd(0); /* We can't wait because of __GFP_IO */ - schedule(); + try_to_free_pages(GFP_BUFFER); } } --- linux/mm/filemap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/filemap.c Fri Sep 22 02:31:13 2000 @@ -255,7 +255,7 @@ * up kswapd. */ age_page_up(page); - if (inactive_shortage() > (inactive_target * 3) / 4) + if (inactive_shortage() > inactive_target / 2 && free_shortage()) wakeup_kswapd(0); not_found: return page; --- linux/mm/page_alloc.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/page_alloc.c Fri Sep 22 02:31:13 2000 @@ -444,7 +444,8 @@ * processes, etc). */ if (gfp_mask & __GFP_WAIT) { - wakeup_kswapd(1); + try_to_free_pages(gfp_mask); + memory_pressure++; goto try_again; } } --- linux/mm/swap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/swap.c Fri Sep 22 02:31:13 2000 @@ -233,27 +233,11 @@ spin_lock(&pagemap_lru_lock); if (!PageLocked(page)) BUG(); - /* - * Heisenbug Compensator(tm) - * This bug shouldn't trigger, but for unknown reasons it - * sometimes does. If there are no signs of list corruption, - * we ignore the problem. Else we BUG()... - */ - if (PageActive(page) || PageInactiveDirty(page) || - PageInactiveClean(page)) { - struct list_head * page_lru = &page->lru; - if (page_lru->next->prev != page_lru) { - printk("VM: lru_cache_add, bit or list corruption..\n"); - BUG(); - } - printk("VM: lru_cache_add, page already in list!\n"); - goto page_already_on_list; - } + DEBUG_ADD_PAGE add_page_to_active_list(page); /* This should be relatively rare */ if (!page->age) deactivate_page_nolock(page); -page_already_on_list: spin_unlock(&pagemap_lru_lock); } --- linux/mm/vmscan.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/vmscan.c Fri Sep 22 02:31:27 2000 @@ -377,7 +377,7 @@ #define SWAP_SHIFT 5 #define SWAP_MIN 8 -static int swap_out(unsigned int priority, int gfp_mask) +static int swap_out(unsigned int priority, int gfp_mask, unsigned long idle_time) { struct task_struct * p; int counter; @@ -407,6 +407,7 @@ struct mm_struct *best = NULL; int pid = 0; int assign = 0; + int found_task = 0; select: read_lock(&tasklist_lock); p = init_task.next_task; @@ -416,6 +417,11 @@ continue; if (mm->rss <= 0) continue; + /* Skip tasks which haven't slept long enough yet when idle-swapping. 
*/ + if (idle_time && !assign && (!(p->state & TASK_INTERRUPTIBLE) || + time_before(p->sleep_time + idle_time * HZ, jiffies))) + continue; + found_task++; /* Refresh swap_cnt? */ if (assign == 1) { mm->swap_cnt = (mm->rss >> SWAP_SHIFT); @@ -430,7 +436,7 @@ } read_unlock(&tasklist_lock); if (!best) { - if (!assign) { + if (!assign && found_task > 0) { assign = 1; goto select; } @@ -691,9 +697,9 @@ * Now the page is really freeable, so we * move it to the inactive_clean list. */ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_inactive_clean_list(page); + UnlockPage(page); cleaned_pages++; } else { /* @@ -701,9 +707,9 @@ * It's no use keeping it here, so we move it to * the active list. */ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_active_list(page); + UnlockPage(page); } } spin_unlock(&pagemap_lru_lock); @@ -860,6 +866,7 @@ static int refill_inactive(unsigned int gfp_mask, int user) { int priority, count, start_count, made_progress; + unsigned long idle_time; count = inactive_shortage() + free_shortage(); if (user) @@ -869,16 +876,28 @@ /* Always trim SLAB caches when memory gets low. */ kmem_cache_reap(gfp_mask); + /* + * Calculate the minimum time (in seconds) a process must + * have slept before we consider it for idle swapping. + * This must be the number of seconds it takes to go through + * all of the cache. Doing this idle swapping makes the VM + * smoother once we start hitting swap. + */ + idle_time = atomic_read(&page_cache_size); + idle_time += atomic_read(&buffermem_pages); + idle_time /= (inactive_target + 1); + priority = 6; do { made_progress = 0; - if (current->need_resched) { + if (current->need_resched && (gfp_mask & __GFP_IO)) { __set_current_state(TASK_RUNNING); schedule(); } - while (refill_inactive_scan(priority, 1)) { + while (refill_inactive_scan(priority, 1) || + swap_out(priority, gfp_mask, idle_time)) { made_progress = 1; if (!--count) goto done; @@ -913,7 +932,7 @@ /* * Then, try to page stuff out.. */ - while (swap_out(priority, gfp_mask)) { + while (swap_out(priority, gfp_mask, 0)) { made_progress = 1; if (!--count) goto done; @@ -963,7 +982,8 @@ * before we get around to moving them to the other * list, so this is a relatively cheap operation. */ - if (free_shortage()) + if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() + + nr_inactive_clean_pages()) ret += page_launder(gfp_mask, user); /* @@ -1070,9 +1090,12 @@ run_task_queue(&tq_disk); /* - * If we've either completely gotten rid of the - * free page shortage or the inactive page shortage - * is getting low, then stop eating CPU time. + * We go to sleep if either the free page shortage + * or the inactive page shortage is gone. We do this + * because: + * 1) we need no more free pages or + * 2) the inactive pages need to be flushed to disk, + * it wouldn't help to eat CPU time now ... * * We go to sleep for one second, but if it's needed * we'll be woken up earlier... --- linux/include/linux/sched.h.orig Fri Sep 22 02:31:04 2000 +++ linux/include/linux/sched.h Fri Sep 22 02:31:13 2000 @@ -298,6 +298,7 @@ * that's just fine.) */ struct list_head run_list; + unsigned long sleep_time; struct task_struct *next_task, *prev_task; struct mm_struct *active_mm; @@ -818,6 +819,7 @@ static inline void del_from_runqueue(struct task_struct * p) { nr_running--; + p->sleep_time = jiffies; list_del(&p->run_list); p->run_list.next = NULL; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 10:27 ` Rik van Riel @ 2000-09-22 13:10 ` André Dahlqvist 2000-09-22 14:10 ` André Dahlqvist 2000-09-22 16:20 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Mohammad A. Haque 0 siblings, 2 replies; 20+ messages in thread From: André Dahlqvist @ 2000-09-22 13:10 UTC (permalink / raw) To: Rik van Riel Cc: Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm On Fri, Sep 22, 2000 at 07:27:30AM -0300, Rik van Riel wrote: > Linus, > > could you please include this patch in the next > pre patch? Rik, I just had an oops with this patch applied. I ran into BUG at buffer.c:730. The machine was not under load when the oops occurred, I was just reading e-mail in Mutt. I had to type the oops down by hand, but I will provide ksymoops output soon if you need it. -- // Andre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 13:10 ` André Dahlqvist @ 2000-09-22 14:10 ` André Dahlqvist 2000-09-22 16:38 ` test9-pre3+t9p2-vmpatch VM deadlock during socket I/O Yuri Pudgorodsky 1 sibling, 1 reply; 20+ messages in thread From: André Dahlqvist @ 2000-09-22 14:10 UTC (permalink / raw) To: Rik van Riel, Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm > I had to type the oops down by hand, but I will provide ksymoops > output soon if you need it. Let's hope I typed down the oops from the screen without mistakes. Here is the ksymoops output: ksymoops 2.3.4 on i586 2.4.0-test9. Options used -V (default) -k 20000922143001.ksyms (specified) -l 20000922143001.modules (specified) -o /lib/modules/2.4.0-test9/ (default) -m /boot/System.map-2.4.0-test9 (default) invalid operand: 0000 CPU: 0 EIP: 0010:[<c012c1be>] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010086 eax: 0000001c ebx: c31779e0 ecx: 00000000 edx: 00000082 esi: c11f6f80 edi: 00000008 ebp: 00000001 esp: c01f3eec ds: 0018 es: 0018 ss: 0018 Process swapper (pid:0, stackpage=c01f3000) Stack: c01bb465 c01bb79a 000002da c0150d3f e31779e0 00000001 c11f6480 00000046 c1168360 c0248460 c01684e3 c11f6f80 00000001 c0248584 00000000 c11f6f80 c02484a0 c016e563 00000001 c1168360 c02484a0 c1168360 00000286 c0169cc7 Call Trace: [<c01bb4b5>] [<c01bb79a>] [<c0150d3f>] [<c01684e3>] [<c016e563>] [<c0169cc7>] [<c016e500>] [<c010a02c>] [<c010a18e>] [<c0107120>] [<c0108de0>] [<c0107120>] [<c0107143>] [<c01071a7>] [<c0105000>] [<c0100192>] Code: 0f 0b 83 c4 0c c3 57 56 53 86 74 24 10 8b 54 24 14 85 d2 74 >>EIP; c012c1be <end_buffer_io_bad+42/48> <===== Trace; c01bb4b5 <tvecs+36dd/cde8> Trace; c01bb79a <tvecs+39c2/cde8> Trace; c0150d3f <end_that_request_first+5f/b8> Trace; c01684e3 <ide_end_request+27/74> Trace; c016e563 <ide_dma_intr+63/9c> Trace; c0169cc7 <ide_intr+fb/150> Trace; c016e500 <ide_dma_intr+0/9c> Trace; c010a02c <handle_IRQ_event+30/5c> Trace; c010a18e <do_IRQ+6e/b0> Trace; c0107120 <default_idle+0/28> Trace; c0108de0 <ret_from_intr+0/20> Trace; c0107120 <default_idle+0/28> Trace; c0107143 <default_idle+23/28> Trace; c01071a7 <cpu_idle+3f/54> Trace; c0105000 <empty_bad_page+0/1000> Trace; c0100192 <L6+0/2> Code; c012c1be <end_buffer_io_bad+42/48> 00000000 <_EIP>: Code; c012c1be <end_buffer_io_bad+42/48> <===== 0: 0f 0b ud2a <===== Code; c012c1c0 <end_buffer_io_bad+44/48> 2: 83 c4 0c add $0xc,%esp Code; c012c1c3 <end_buffer_io_bad+47/48> 5: c3 ret Code; c012c1c4 <end_buffer_io_async+0/b4> 6: 57 push %edi Code; c012c1c5 <end_buffer_io_async+1/b4> 7: 56 push %esi Code; c012c1c6 <end_buffer_io_async+2/b4> 8: 53 push %ebx Code; c012c1c7 <end_buffer_io_async+3/b4> 9: 86 74 24 10 xchg %dh,0x10(%esp,1) Code; c012c1cb <end_buffer_io_async+7/b4> d: 8b 54 24 14 mov 0x14(%esp,1),%edx Code; c012c1cf <end_buffer_io_async+b/b4> 11: 85 d2 test %edx,%edx Code; c012c1d1 <end_buffer_io_async+d/b4> 13: 74 00 je 15 <_EIP+0x15> c012c1d3 <end_buffer_io_async+f/b4> Aiee, killing interrupt handler Kernel panic: Attempted to kill the idle task! -- // Andre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* test9-pre3+t9p2-vmpatch VM deadlock during socket I/O 2000-09-22 14:10 ` André Dahlqvist @ 2000-09-22 16:38 ` Yuri Pudgorodsky 0 siblings, 0 replies; 20+ messages in thread From: Yuri Pudgorodsky @ 2000-09-22 16:38 UTC (permalink / raw) To: André Dahlqvist Cc: Rik van Riel, Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm I also encounter instant lockup of test9-pre3 + t9p2-vmpatch / SMP (two CPUs) under high I/O via UNIX domain sockets: - running 10 simple tasks doing #define BUFFERSIZE 204800 for (j = 0; ; j++) { if (socketpair(PF_LOCAL, SOCK_STREAM, 0, p) == -1) { exit(1); } fcntl(p[0], F_SETFL, O_NONBLOCK); fcntl(p[1], F_SETFL, O_NONBLOCK); write(p[0], crap, BUFFERSIZE); write(p[1], crap, BUFFERSIZE); } So it looks like swap_out() cannot obtain lock_kernel() held by a swap_out() on a second CPU.... See below. Call trace (looks very similar on both CPUs): Trace; c020aa3e <stext_lock+18a6/8848> (called from c0133eb4 <swap_out+0x28>) Trace; c0133eb4 <swap_out+28/228> args (6, 3, 0) Trace; c0134e50 <refill_inactive+c8/170> args (3, 1) Trace; c0134f75 <do_try_to_free_pages+7d/9c> args (3,1) Trace; c0135168 <wakeup_kswapd+84/bc> Trace; c0135d72 <__alloc_pages+1d6/264> Trace; c0135e17 <__get_free_pages+17/28> Trace; c01322ce <kmem_cache_grow+e2/264> .... Under lockup, memory map looks like: Active: 121 Inactive_dirty: 12217 Inactive_clean: 0 free: 12210 (256 512 768) and does not change over time. Most frequent EIP locations (from Alt-SysRq/P): Trace; c0133f74 <swap_out+e8/228> Trace; c0133f23 <swap_out+97/228> Trace; c0134039 <swap_out+1ad/228> Trace; c020aa37 <stext_lock+189f/8848> Trace; c020aa3e <stext_lock+18a6/8848> In hope of a quick fix, Yuri Pudgorodsky -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
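The test-program fragment above is not compilable as posted (it is missing its declarations and includes). A self-contained reconstruction follows; main(), the includes, and the crap buffer are assumptions filled in here, while the loop body is verbatim from the mail. Yuri ran ten of these concurrently; note that the socket descriptors are never closed, so every iteration leaves another socketpair and its buffered data behind, steadily eating kernel memory -- which is what pushes the box into reclaim:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFERSIZE 204800

int main(void)
{
	static char crap[BUFFERSIZE];	/* contents are irrelevant */
	int p[2];
	long j;

	for (j = 0; ; j++) {
		if (socketpair(PF_LOCAL, SOCK_STREAM, 0, p) == -1) {
			exit(1);
		}
		/* non-blocking, so the writes below just fill the
		 * kernel's socket buffers and return */
		fcntl(p[0], F_SETFL, O_NONBLOCK);
		fcntl(p[1], F_SETFL, O_NONBLOCK);
		write(p[0], crap, BUFFERSIZE);
		write(p[1], crap, BUFFERSIZE);
	}
}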
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 13:10 ` André Dahlqvist 2000-09-22 14:10 ` André Dahlqvist @ 2000-09-22 16:20 ` Mohammad A. Haque 1 sibling, 0 replies; 20+ messages in thread From: Mohammad A. Haque @ 2000-09-22 16:20 UTC (permalink / raw) To: André Dahlqvist Cc: Rik van Riel, Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm If the process that barfed is swapper then this is the oops that I got in test9-pre4 w/o any patches. http://marc.theaimsgroup.com/?l=linux-kernel&m=96936789621245&w=2 On Fri, 22 Sep 2000, Andre Dahlqvist wrote: > On Fri, Sep 22, 2000 at 07:27:30AM -0300, Rik van Riel wrote: > > > Linus, > > > > could you please include this patch in the next > > pre patch? > > Rik, > > I just had an oops with this patch applied. I ran into BUG at > buffer.c:730. The machine was not under load when the oops occurred, I > was just reading e-mail in Mutt. I had to type the oops down by hand, > but I will provide ksymoops output soon if you need it. > -- ===================================================================== Mohammad A. Haque http://www.haque.net/ mhaque@haque.net "Alcohol and calculus don't mix. Project Lead Don't drink and derive." --Unknown http://wm.themes.org/ batmanppc@themes.org ===================================================================== -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo 2000-09-22 9:08 ` Rik van Riel @ 2000-09-22 17:39 ` Linus Torvalds 2000-09-25 13:47 ` Rik van Riel 2 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2000-09-22 17:39 UTC (permalink / raw) To: Molnar Ingo; +Cc: Rik van Riel, David S. Miller, linux-kernel, linux-mm On Fri, 22 Sep 2000, Molnar Ingo wrote: > > i'm still getting VM related lockups during heavy write load, in > test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your > last VM related fix-patch, correct?). Here is a histogram of such a > lockup: Rik, those VM patches are going away RSN if these issues do not get fixed. I'm really disappointed, and suspect that it would be easier to go back to the old VM with just page aging added, not your new code that seems to be full of deadlocks everywhere. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 17:39 ` Linus Torvalds @ 2000-09-25 13:47 ` Rik van Riel 0 siblings, 0 replies; 20+ messages in thread From: Rik van Riel @ 2000-09-25 13:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Molnar Ingo, David S. Miller, linux-kernel, linux-mm On Fri, 22 Sep 2000, Linus Torvalds wrote: > On Fri, 22 Sep 2000, Molnar Ingo wrote: > > > > i'm still getting VM related lockups during heavy write load, in > > test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your > > last VM related fix-patch, correct?). Here is a histogram of such a > > lockup: > > those VM patches are going away RSN if these issues do not get > fixed. I'm really disappointed, and suspect that it would be > easier to go back to the old VM with just page aging added, not > your new code that seems to be full of deadlocks everywhere. I've been away at a conference last week, so I haven't had much chance to take a look at the code after you integrated it and the test base got increased ;( Among the things I discovered are some UP-only deadlocks and the page ping-pong thing, which I am fixing right now. If I had a choice, I'd have chosen /next/ week as the time to integrate the code ... doing this while I'm away at a conference was really inconvenient ;) I'm looking into the email backlog and the bug reports right now (today, tuesday and wednesday I'm at /another/ conference and thursday will be the next opportunity). It looks like there are no fundamental issues left, just a bunch of small thinkos that can be fixed in a (few?) week(s). regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel 2000-09-21 20:28 ` Roger Larsson 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller @ 2000-09-22 12:16 ` Martin Diehl 2 siblings, 0 replies; 20+ messages in thread From: Martin Diehl @ 2000-09-22 12:16 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel, linux-mm On Thu, 21 Sep 2000, Rik van Riel wrote: > I've found and fixed the deadlocks in the new VM. They turned out > to be single-cpu only bugs, which explains why they didn't crash my > SMP test box ;) Hi, tried > http://www.surriel.com/patches/2.4.0-t9p2-vmpatch applied to 2.4.0-t9p4 on a UP box booted with mem=8M. The deadlock behaviour appears to be somehow different compared to vanilla 2.4.0-t9p4 - however, for me it makes things even worse: I booted into singleuser and used dd if=/dev/urandom of=/dev/null count=1 bs=x to trigger the issue by increasing bs-values. As soon as bs is big enough to force swapping (about 3M in my case) the box "deadlocks". What has become worse is that SysRq+e (or k) doesn't help anymore with this patch applied. So I had to SysRq+b and ended fscking (but no fs-corruption). Without the patch this was not a problem. Some more points I've noticed: * apparently, the deadlock happens when the box begins to swap. I never found any used swapspace with the new VM from 2.4.0-t9p*. If memory requests force the use of swapspace, the machine deadlocks. * when, after deadlocking, I pressed SysRq+t several times I found - either dd or kswapd being current task in vanilla 2.4.0-t9p4 - neither dd nor kswapd ever being current with this patch * as a printk() in the main loop shows, kreclaimd *never* awoke * My impression was similar to what somebody has already reported: it seems something related to refill_inactive_scan() is recursing to infinity when the "deadlock" happens. * the behaviour of kswapd without this last patch differs significantly before and after the first deadlock happens (and was released by SysRq+e): only *after* pressing SysRq+e (or k) kswapd awoke once per second on the idle box. This is strange since it should sleep with timeout=HZ in its main loop. Especially the last point suggests to me there might be a problem at initialization. I'm not sure whether everything called from kswapd is properly initialized at the time when the kswapd-thread is created. To check this, I've tentatively added an additional interruptible_sleep_on_timeout() before kswapd's main loop to delay it until initialization has finished. Probably it would be more "Right" to move the sleep from the end of the main loop to its beginning - however, I just tried a quick hack and did not check if the *_shortage() stuff is ready to be called at init time. The additional sleep before kswapd enters its main loop was a major improvement for me: * my dd-tests did not deadlock anymore - even with bs=100M and mem=8M * swap space was really used now. * I was able to advance beyond singleuser with 2.4.0-t9p* and mem=8M for the very first time (always deadlocked in the init-scripts) * I was even able to make bzImage - but it dumped core after about 15 Min for unknown reason (probably out of memory) but without any deadlock. Box was at av. load 3 and 15M swap used at this time. * I found kreclaimd *was* awoken several times. * however, kswapd still does not wake every second after a fresh boot. It begins to wake as soon as real swapping starts. So, my conclusion is the "deadlock" issue might be mainly an initialization problem. Probably some more special handling is needed at swapon later. Currently my guess is there is an initialization problem when kswapd starts and some kind of blocking when refill_inactive_scan() is called before swapon. Comments? Will do some more tests (including your latest patch). Regards Martin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2000-09-25 13:47 UTC | newest] Thread overview: 20+ messages 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel 2000-09-21 20:28 ` Roger Larsson 2000-09-21 23:31 ` Problem remains - page_launder? (Was: Re: [patch *] VM deadlock fix) Roger Larsson 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller 2000-09-22 0:18 ` Andrea Arcangeli 2000-09-21 23:57 ` David S. Miller 2000-09-22 8:39 ` Rik van Riel 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo 2000-09-22 9:08 ` Rik van Riel 2000-09-22 9:14 ` Molnar Ingo 2000-09-22 9:34 ` Molnar Ingo 2000-09-22 10:27 ` Rik van Riel 2000-09-22 13:10 ` André Dahlqvist 2000-09-22 14:10 ` André Dahlqvist 2000-09-22 16:38 ` test9-pre3+t9p2-vmpatch VM deadlock during socket I/O Yuri Pudgorodsky 2000-09-22 16:20 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Mohammad A. Haque 2000-09-22 17:39 ` Linus Torvalds 2000-09-25 13:47 ` Rik van Riel 2000-09-22 12:16 ` [patch *] VM deadlock fix Martin Diehl