* [patch *] VM deadlock fix
@ 2000-09-21 16:44 Rik van Riel
2000-09-21 20:28 ` Roger Larsson
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Rik van Riel @ 2000-09-21 16:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, linux-mm
Hi,
I've found and fixed the deadlocks in the new VM. They turned out
to be single-cpu only bugs, which explains why they didn't crash my
SMP test box ;)
They have to do with the fact that processes schedule away while
holding IO locks after waking up kswapd. At that point kswapd
spends its time spinning on the IO locks and single-cpu systems
will die...
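To make the failure mode concrete: on a 2.4 uniprocessor kernel there is no kernel preemption, so a kernel thread busy-waiting on a lock never yields the only CPU back to the sleeping lock holder. A minimal sketch of the interaction described above, with stand-in names and a stand-in lock flag (not the actual mm/vmscan.c or fs/buffer.c code):

static volatile int io_lock = 0;	/* stand-in for a held IO lock */

void allocating_process(void)
{
	io_lock = 1;		/* take an IO lock ...               */
	wakeup_kswapd(0);	/* ... wake kswapd for more memory ...*/
	schedule();		/* ... and schedule away, lock held   */
}

void kswapd_work(void)
{
	while (io_lock)		/* with one CPU and no preemption this
				   loop never lets the holder run again
				   to release the lock: the box dies   */
		/* spin */ ;
	/* ... reclaim pages ... */
}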
Due to bad connectivity I'm not attaching this patch but have only
put it online on my home page:
http://www.surriel.com/patches/2.4.0-t9p2-vmpatch
(yes, I'm at a conference now ... the worst beating this patch
has had is a full night in 'make bzImage' with mem=8m)
regards,
Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel @ 2000-09-21 20:28 ` Roger Larsson 2000-09-21 23:31 ` Problem remains - page_launder? (Was: Re: [patch *] VM deadlock fix) Roger Larsson 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller 2000-09-22 12:16 ` [patch *] VM deadlock fix Martin Diehl 2 siblings, 1 reply; 20+ messages in thread From: Roger Larsson @ 2000-09-21 20:28 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel, linux-mm [-- Attachment #1: Type: text/plain, Size: 1805 bytes --] Hi, Tried your patch on 2.4.0-test9-pre4 with the included debug patch applied. Rebooted, started mmap002 After a while it starts outputting (magic did not work this time - usually does): - - - "VM: try_to_free_pages (result: 1) try_again # 12345" "VM: try_to_free_pages (result: 1) try_again # 12346" - - - My interpretation: 1) try_to_free_pages succeeds (or returns ok when it did not work) 2) __alloc_pages still can't alloc Maybe it is different limits, try_to_free_pages requires less to succeed than __alloc_pages_limit requires. or a bug in __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim) Note: 12345 is an example, it loops to over 30000... /RogerL Rik van Riel wrote: > > Hi, > > I've found and fixed the deadlocks in the new VM. They turned out > to be single-cpu only bugs, which explains why they didn't crash my > SMP test box ;) > > They have to do with the fact that processes schedule away while > holding IO locks after waking up kswapd. At that point kswapd > spends its time spinning on the IO locks and single-cpu systems > will die... > > Due to bad connectivity I'm not attaching this patch but have only > put it online on my home page: > > http://www.surriel.com/patches/2.4.0-t9p2-vmpatch > > (yes, I'm at a conference now ... the worst beating this patch > has had is a full night in 'make bzImage' with mem=8m) > > regards, > > Rik > -- > "What you're running that piece of shit Gnome?!?!" > -- Miguel de Icaza, UKUUG 2000 > > http://www.conectiva.com/ http://www.surriel.com/ > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > Please read the FAQ at http://www.tux.org/lkml/ -- Home page: http://www.norran.net/nra02596/ [-- Attachment #2: vmdebug.patch --] [-- Type: text/plain, Size: 1167 bytes --] --- mm/page_alloc.c.orig Thu Sep 21 20:02:54 2000 +++ mm/page_alloc.c Thu Sep 21 20:49:35 2000 @@ -295,6 +295,7 @@ int direct_reclaim = 0; unsigned int gfp_mask = zonelist->gfp_mask; struct page * page = NULL; + int try_again_loops = 0; /* * Allocations put pressure on the VM subsystem. */ @@ -320,8 +321,10 @@ /* * Are we low on inactive pages? */ - if (inactive_shortage() > inactive_target / 2 && free_shortage()) + if (inactive_shortage() > inactive_target / 2 && free_shortage()) { + printk("VM: inactive shortage wake kswapd\n"); wakeup_kswapd(0); + } try_again: /* @@ -410,6 +413,7 @@ * piece of free memory. */ if (order > 0 && (gfp_mask & __GFP_WAIT)) { + printk("VM: higher order\n"); zone = zonelist->zones; /* First, clean some dirty pages. */ page_launder(gfp_mask, 1); @@ -444,7 +448,9 @@ * processes, etc). 
*/ if (gfp_mask & __GFP_WAIT) { - try_to_free_pages(gfp_mask); + int success = try_to_free_pages(gfp_mask); + printk("VM: try_to_free_pages (result: %d) try_again # %d\n", + success, ++try_again_loops); memory_pressure++; goto try_again; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Problem remains - page_launder? (Was: Re: [patch *] VM deadlock fix) 2000-09-21 20:28 ` Roger Larsson @ 2000-09-21 23:31 ` Roger Larsson 0 siblings, 0 replies; 20+ messages in thread From: Roger Larsson @ 2000-09-21 23:31 UTC (permalink / raw) To: Rik van Riel, linux-kernel, linux-mm Hi again, Further hints. More testing (printks in refill_inactive and page_launder) reveals that refill_inactive works ok (16 pages) but page_launder never succeeds in my lockup state... (WHY) alloc fails since there is no inactive_clean and free is less than MIN. And then when page_launder fails... /RogerL Roger Larsson wrote: > > Hi, > > Tried your patch on 2.4.0-test9-pre4 > with the included debug patch applied. > > Rebooted, started mmap002 > > After a while it starts outputting (magic did not work > this time - usually does): > > - - - > "VM: try_to_free_pages (result: 1) try_again # 12345" > "VM: try_to_free_pages (result: 1) try_again # 12346" > - - - > > My interpretation: > 1) try_to_free_pages succeeds (or returns ok when it did not work) > 2) __alloc_pages still can't alloc > > Maybe it is different limits, > try_to_free_pages requires less to succeed than > __alloc_pages_limit requires. > or a bug in > __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim) > > Note: > 12345 is an example, it loops to over 30000... > > /RogerL > > Rik van Riel wrote: > > > > Hi, > > > > I've found and fixed the deadlocks in the new VM. They turned out > > to be single-cpu only bugs, which explains why they didn't crash my > > SMP test box ;) > > > > They have to do with the fact that processes schedule away while > > holding IO locks after waking up kswapd. At that point kswapd > > spends its time spinning on the IO locks and single-cpu systems > > will die... > > > > Due to bad connectivity I'm not attaching this patch but have only > > put it online on my home page: > > > > http://www.surriel.com/patches/2.4.0-t9p2-vmpatch > > > > (yes, I'm at a conference now ... the worst beating this patch > > has had is a full night in 'make bzImage' with mem=8m) > > > > regards, > > > > Rik > > -- > > "What you're running that piece of shit Gnome?!?!" > > -- Miguel de Icaza, UKUUG 2000 > > > > http://www.conectiva.com/ http://www.surriel.com/ > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > Please read the FAQ at http://www.tux.org/lkml/ > > -- > Home page: > http://www.norran.net/nra02596/ > > ------------------------------------------------------------------------ > Name: vmdebug.patch > vmdebug.patch Type: Plain Text (text/plain) > Encoding: 7bit -- Home page: http://www.norran.net/nra02596/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel 2000-09-21 20:28 ` Roger Larsson @ 2000-09-21 22:23 ` David S. Miller 2000-09-22 0:18 ` Andrea Arcangeli 2000-09-22 8:39 ` Rik van Riel 2000-09-22 12:16 ` [patch *] VM deadlock fix Martin Diehl 2 siblings, 2 replies; 20+ messages in thread From: David S. Miller @ 2000-09-21 22:23 UTC (permalink / raw) To: riel; +Cc: torvalds, linux-kernel, linux-mm How did you get away with adding a new member to task_struct yet not updating the INIT_TASK() macro appropriately? :-) Does it really compile? Later, David S. Miller davem@redhat.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller @ 2000-09-22 0:18 ` Andrea Arcangeli 2000-09-21 23:57 ` David S. Miller 2000-09-22 8:39 ` Rik van Riel 1 sibling, 1 reply; 20+ messages in thread From: Andrea Arcangeli @ 2000-09-22 0:18 UTC (permalink / raw) To: David S. Miller; +Cc: riel, torvalds, linux-kernel, linux-mm On Thu, Sep 21, 2000 at 03:23:17PM -0700, David S. Miller wrote: > > How did you get away with adding a new member to task_struct yet not > updating the INIT_TASK() macro appropriately? :-) Does it really > compile? As long as sleep_time is ok to be set to zero, its missing initialization is fine. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-22 0:18 ` Andrea Arcangeli @ 2000-09-21 23:57 ` David S. Miller 0 siblings, 0 replies; 20+ messages in thread From: David S. Miller @ 2000-09-21 23:57 UTC (permalink / raw) To: andrea; +Cc: riel, torvalds, linux-kernel, linux-mm As long as sleep_time is ok to be set to zero, its missing initialization is fine. Indeed. Later, David S. Miller davem@redhat.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller 2000-09-22 0:18 ` Andrea Arcangeli @ 2000-09-22 8:39 ` Rik van Riel 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 1 sibling, 1 reply; 20+ messages in thread From: Rik van Riel @ 2000-09-22 8:39 UTC (permalink / raw) To: David S. Miller; +Cc: torvalds, linux-kernel, linux-mm On Thu, 21 Sep 2000, David S. Miller wrote: > How did you get away with adding a new member to task_struct yet > not updating the INIT_TASK() macro appropriately? :-) Does it > really compile? There are a lot of fields in the task_struct which are not initialized in the INIT_TASK macro. They seem to be set to zero by default. regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
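Rik's observation rests on a C guarantee rather than luck: when a static aggregate is initialized with fewer initializers than it has members, the C standard requires the remaining members to be zero-initialized, so any task_struct field omitted from INIT_TASK() starts out as zero in init_task. A small, self-contained userspace illustration (the struct and member names here are made up, not the kernel's):

#include <stdio.h>

struct task_like {
	long counter;
	long nice;
	unsigned long sleep_time;	/* new member, absent from the initializer */
};

/* Like INIT_TASK(): only the first members are given explicitly;
 * the rest of the aggregate is zero-filled by the compiler. */
static struct task_like init_task_like = { 10, 0 };

int main(void)
{
	printf("sleep_time = %lu\n", init_task_like.sleep_time);	/* prints 0 */
	return 0;
}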
* test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:39 ` Rik van Riel @ 2000-09-22 8:54 ` Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 8:54 UTC (permalink / raw) To: Rik van Riel; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm i'm still getting VM related lockups during heavy write load, in test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your last VM related fix-patch, correct?). Here is a histogram of such a lockup: 1 Trace; 4010a720 <__switch_to+38/e8> 5 Trace; 4010a74b <__switch_to+63/e8> 13 Trace; 4010abc4 <poll_idle+10/2c> 819 Trace; 4010abca <poll_idle+16/2c> 1806 Trace; 4010abce <poll_idle+1a/2c> 1 Trace; 4010abd0 <poll_idle+1c/2c> 2 Trace; 4011af51 <schedule+45/884> 1 Trace; 4011af77 <schedule+6b/884> 1 Trace; 4011b010 <schedule+104/884> 3 Trace; 4011b018 <schedule+10c/884> 1 Trace; 4011b02d <schedule+121/884> 1 Trace; 4011b051 <schedule+145/884> 1 Trace; 4011b056 <schedule+14a/884> 2 Trace; 4011b05c <schedule+150/884> 3 Trace; 4011b06d <schedule+161/884> 4 Trace; 4011b076 <schedule+16a/884> 537 Trace; 4011b2bb <schedule+3af/884> 2 Trace; 4011b2c6 <schedule+3ba/884> 1 Trace; 4011b2c9 <schedule+3bd/884> 4 Trace; 4011b2d5 <schedule+3c9/884> 31 Trace; 4011b31a <schedule+40e/884> 1 Trace; 4011b31d <schedule+411/884> 1 Trace; 4011b32a <schedule+41e/884> 1 Trace; 4011b346 <schedule+43a/884> 11 Trace; 4011b378 <schedule+46c/884> 2 Trace; 4011b381 <schedule+475/884> 5 Trace; 4011b3f8 <schedule+4ec/884> 17 Trace; 4011b404 <schedule+4f8/884> 9 Trace; 4011b43f <schedule+533/884> 1 Trace; 4011b450 <schedule+544/884> 1 Trace; 4011b457 <schedule+54b/884> 2 Trace; 4011b48c <schedule+580/884> 1 Trace; 4011b49c <schedule+590/884> 428 Trace; 4011b4cd <schedule+5c1/884> 6 Trace; 4011b4f7 <schedule+5eb/884> 4 Trace; 4011b500 <schedule+5f4/884> 2 Trace; 4011b509 <schedule+5fd/884> 1 Trace; 4011b560 <schedule+654/884> 1 Trace; 4011b809 <__wake_up+79/3f0> 1 Trace; 4011b81b <__wake_up+8b/3f0> 8 Trace; 4011b81e <__wake_up+8e/3f0> 310 Trace; 4011ba90 <__wake_up+300/3f0> 1 Trace; 4011bb7b <__wake_up+3eb/3f0> 2 Trace; 4011c32b <interruptible_sleep_on_timeout+283/290> 244 Trace; 4011d40e <add_wait_queue+14e/154> 1 Trace; 4011d411 <add_wait_queue+151/154> 1 Trace; 4011d56c <remove_wait_queue+8/d0> 618 Trace; 4011d62e <remove_wait_queue+ca/d0> 2 Trace; 40122f28 <do_softirq+48/88> 2 Trace; 40126c3c <del_timer_sync+6c/78> 1 Trace; 401377ab <wakeup_kswapd+7/254> 1 Trace; 401377c8 <wakeup_kswapd+24/254> 5 Trace; 401377cc <wakeup_kswapd+28/254> 15 Trace; 401377d4 <wakeup_kswapd+30/254> 11 Trace; 401377dc <wakeup_kswapd+38/254> 2 Trace; 401377e0 <wakeup_kswapd+3c/254> 6 Trace; 401377ee <wakeup_kswapd+4a/254> 8 Trace; 4013783c <wakeup_kswapd+98/254> 1 Trace; 401378f8 <wakeup_kswapd+154/254> 3 Trace; 4013792d <wakeup_kswapd+189/254> 2 Trace; 401379af <wakeup_kswapd+20b/254> 2 Trace; 401379f3 <wakeup_kswapd+24f/254> 1 Trace; 40138524 <__alloc_pages+7c/4b8> 1 Trace; 4013852b <__alloc_pages+83/4b8> (first column is the number of profiling hits, taken on all CPUs.) unfortunately i haven't captured which processes are running. This is an 8-CPU SMP box, 8 write-intensive processes are running, they create new 1k-1MB files in new directories - a total of many gigabytes. this lockup happens both during vanilla test9-pre5 and with 2.4.0-t9p2-vmpatch. Your patch makes the lockup happen a bit later than before, but it still happens. 
During the lockup all dirty buffers are written out to disk until it reaches such a state: 2162688 pages of RAM 1343488 pages of HIGHMEM 116116 reserved pages 652826 pages shared 0 pages swap cached 0 pages in page table cache Buffer memory: 52592kB CLEAN: 664 buffers, 2302 kbyte, 5 used (last=93), 0 locked, 0 protected, 0 dirty LOCKED: 661752 buffers, 2646711 kbyte, 37 used (last=661397), 0 locked, 0 protected, 0 dirty DIRTY: 17 buffers, 26 kbyte, 1 used (last=1), 0 locked, 0 protected, 17 dirty no disk IO happens anymore, but the lockup persists. The histogram was taken after all disk IO has stopped. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo @ 2000-09-22 9:00 ` Molnar Ingo 2000-09-22 9:08 ` Rik van Riel 2000-09-22 17:39 ` Linus Torvalds 2 siblings, 0 replies; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 9:00 UTC (permalink / raw) To: Rik van Riel; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm btw. - no swapdevice here. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo @ 2000-09-22 9:08 ` Rik van Riel 2000-09-22 9:14 ` Molnar Ingo 2000-09-22 17:39 ` Linus Torvalds 2 siblings, 1 reply; 20+ messages in thread From: Rik van Riel @ 2000-09-22 9:08 UTC (permalink / raw) To: Molnar Ingo; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm On Fri, 22 Sep 2000, Molnar Ingo wrote: > i'm still getting VM related lockups during heavy write load, in > test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your > last VM related fix-patch, correct?). Here is a histogram of such a > lockup: > this lockup happens both during vanilla test9-pre5 and with > 2.4.0-t9p2-vmpatch. Your patch makes the lockup happen a bit > later than before, but it still happens. During the lockup all > dirty buffers are written out to disk until it reaches such a > state: It seems that conference life has taken its toll; I seem to have reversed the logic in the test of whether we can reschedule in refill_inactive() ;( In mm/vmscan.c, please remove the `!' in the following fragment of code: 894 if (current->need_resched && !(gfp_mask & __GFP_IO)) { 895 __set_current_state(TASK_RUNNING); 896 schedule(); 897 } The idea was to not allow processes which have IO locks to schedule away, but as you can see, the check is reversed ... With the above fix, can you still lock it up? And if you can, does it lock up in the same way or in a new and exciting way? ;) regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
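To spell out the one-character fix (both forms are taken from this thread; only the comments are added here):

/* Broken: yields the CPU exactly when __GFP_IO is clear, i.e. when the
 * caller may be holding IO locks -- the one case where it must not sleep. */
if (current->need_resched && !(gfp_mask & __GFP_IO)) {
	__set_current_state(TASK_RUNNING);
	schedule();
}

/* Fixed (as it appears in Ingo's follow-up patch below): yield only when
 * the caller is allowed to do IO, and therefore holds no IO locks that
 * kswapd could end up spinning on. */
if (current->need_resched && (gfp_mask & __GFP_IO)) {
	__set_current_state(TASK_RUNNING);
	schedule();
}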
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 9:08 ` Rik van Riel @ 2000-09-22 9:14 ` Molnar Ingo 2000-09-22 9:34 ` Molnar Ingo 0 siblings, 1 reply; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 9:14 UTC (permalink / raw) To: Rik van Riel Cc: Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm On Fri, 22 Sep 2000, Rik van Riel wrote: > 894 if (current->need_resched && !(gfp_mask & __GFP_IO)) { > 895 __set_current_state(TASK_RUNNING); > 896 schedule(); > 897 } > The idea was to not allow processes which have IO locks > to schedule away, but as you can see, the check is > reversed ... thanks ... sounds good. Will have this tested in about 15 mins. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 9:14 ` Molnar Ingo @ 2000-09-22 9:34 ` Molnar Ingo 2000-09-22 10:27 ` Rik van Riel 0 siblings, 1 reply; 20+ messages in thread From: Molnar Ingo @ 2000-09-22 9:34 UTC (permalink / raw) To: Rik van Riel Cc: Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm [-- Attachment #1: Type: TEXT/PLAIN, Size: 145 bytes --] yep this has done the trick, the deadlock is gone. I've attached the full VM-fixes patch (this fix included) against vanilla test9-pre5. Ingo [-- Attachment #2: Type: TEXT/PLAIN, Size: 6536 bytes --] --- linux/fs/buffer.c.orig Fri Sep 22 02:31:07 2000 +++ linux/fs/buffer.c Fri Sep 22 02:31:13 2000 @@ -706,9 +706,7 @@ static void refill_freelist(int size) { if (!grow_buffers(size)) { - balance_dirty(NODEV); - wakeup_kswapd(0); /* We can't wait because of __GFP_IO */ - schedule(); + try_to_free_pages(GFP_BUFFER); } } --- linux/mm/filemap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/filemap.c Fri Sep 22 02:31:13 2000 @@ -255,7 +255,7 @@ * up kswapd. */ age_page_up(page); - if (inactive_shortage() > (inactive_target * 3) / 4) + if (inactive_shortage() > inactive_target / 2 && free_shortage()) wakeup_kswapd(0); not_found: return page; --- linux/mm/page_alloc.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/page_alloc.c Fri Sep 22 02:31:13 2000 @@ -444,7 +444,8 @@ * processes, etc). */ if (gfp_mask & __GFP_WAIT) { - wakeup_kswapd(1); + try_to_free_pages(gfp_mask); + memory_pressure++; goto try_again; } } --- linux/mm/swap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/swap.c Fri Sep 22 02:31:13 2000 @@ -233,27 +233,11 @@ spin_lock(&pagemap_lru_lock); if (!PageLocked(page)) BUG(); - /* - * Heisenbug Compensator(tm) - * This bug shouldn't trigger, but for unknown reasons it - * sometimes does. If there are no signs of list corruption, - * we ignore the problem. Else we BUG()... - */ - if (PageActive(page) || PageInactiveDirty(page) || - PageInactiveClean(page)) { - struct list_head * page_lru = &page->lru; - if (page_lru->next->prev != page_lru) { - printk("VM: lru_cache_add, bit or list corruption..\n"); - BUG(); - } - printk("VM: lru_cache_add, page already in list!\n"); - goto page_already_on_list; - } + DEBUG_ADD_PAGE add_page_to_active_list(page); /* This should be relatively rare */ if (!page->age) deactivate_page_nolock(page); -page_already_on_list: spin_unlock(&pagemap_lru_lock); } --- linux/mm/vmscan.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/vmscan.c Fri Sep 22 02:31:27 2000 @@ -377,7 +377,7 @@ #define SWAP_SHIFT 5 #define SWAP_MIN 8 -static int swap_out(unsigned int priority, int gfp_mask) +static int swap_out(unsigned int priority, int gfp_mask, unsigned long idle_time) { struct task_struct * p; int counter; @@ -407,6 +407,7 @@ struct mm_struct *best = NULL; int pid = 0; int assign = 0; + int found_task = 0; select: read_lock(&tasklist_lock); p = init_task.next_task; @@ -416,6 +417,11 @@ continue; if (mm->rss <= 0) continue; + /* Skip tasks which haven't slept long enough yet when idle-swapping. */ + if (idle_time && !assign && (!(p->state & TASK_INTERRUPTIBLE) || + time_before(p->sleep_time + idle_time * HZ, jiffies))) + continue; + found_task++; /* Refresh swap_cnt? */ if (assign == 1) { mm->swap_cnt = (mm->rss >> SWAP_SHIFT); @@ -430,7 +436,7 @@ } read_unlock(&tasklist_lock); if (!best) { - if (!assign) { + if (!assign && found_task > 0) { assign = 1; goto select; } @@ -691,9 +697,9 @@ * Now the page is really freeable, so we * move it to the inactive_clean list. 
*/ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_inactive_clean_list(page); + UnlockPage(page); cleaned_pages++; } else { /* @@ -701,9 +707,9 @@ * It's no use keeping it here, so we move it to * the active list. */ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_active_list(page); + UnlockPage(page); } } spin_unlock(&pagemap_lru_lock); @@ -860,6 +866,7 @@ static int refill_inactive(unsigned int gfp_mask, int user) { int priority, count, start_count, made_progress; + unsigned long idle_time; count = inactive_shortage() + free_shortage(); if (user) @@ -869,16 +876,28 @@ /* Always trim SLAB caches when memory gets low. */ kmem_cache_reap(gfp_mask); + /* + * Calculate the minimum time (in seconds) a process must + * have slept before we consider it for idle swapping. + * This must be the number of seconds it takes to go through + * all of the cache. Doing this idle swapping makes the VM + * smoother once we start hitting swap. + */ + idle_time = atomic_read(&page_cache_size); + idle_time += atomic_read(&buffermem_pages); + idle_time /= (inactive_target + 1); + priority = 6; do { made_progress = 0; - if (current->need_resched) { + if (current->need_resched && (gfp_mask & __GFP_IO)) { __set_current_state(TASK_RUNNING); schedule(); } - while (refill_inactive_scan(priority, 1)) { + while (refill_inactive_scan(priority, 1) || + swap_out(priority, gfp_mask, idle_time)) { made_progress = 1; if (!--count) goto done; @@ -913,7 +932,7 @@ /* * Then, try to page stuff out.. */ - while (swap_out(priority, gfp_mask)) { + while (swap_out(priority, gfp_mask, 0)) { made_progress = 1; if (!--count) goto done; @@ -963,7 +982,8 @@ * before we get around to moving them to the other * list, so this is a relatively cheap operation. */ - if (free_shortage()) + if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() + + nr_inactive_clean_pages()) ret += page_launder(gfp_mask, user); /* @@ -1070,9 +1090,12 @@ run_task_queue(&tq_disk); /* - * If we've either completely gotten rid of the - * free page shortage or the inactive page shortage - * is getting low, then stop eating CPU time. + * We go to sleep if either the free page shortage + * or the inactive page shortage is gone. We do this + * because: + * 1) we need no more free pages or + * 2) the inactive pages need to be flushed to disk, + * it wouldn't help to eat CPU time now ... * * We go to sleep for one second, but if it's needed * we'll be woken up earlier... --- linux/include/linux/sched.h.orig Fri Sep 22 02:31:04 2000 +++ linux/include/linux/sched.h Fri Sep 22 02:31:13 2000 @@ -298,6 +298,7 @@ * that's just fine.) */ struct list_head run_list; + unsigned long sleep_time; struct task_struct *next_task, *prev_task; struct mm_struct *active_mm; @@ -818,6 +819,7 @@ static inline void del_from_runqueue(struct task_struct * p) { nr_running--; + p->sleep_time = jiffies; list_del(&p->run_list); p->run_list.next = NULL; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 9:34 ` Molnar Ingo @ 2000-09-22 10:27 ` Rik van Riel 2000-09-22 13:10 ` André Dahlqvist 0 siblings, 1 reply; 20+ messages in thread From: Rik van Riel @ 2000-09-22 10:27 UTC (permalink / raw) To: Molnar Ingo; +Cc: David S. Miller, torvalds, linux-kernel, linux-mm [-- Attachment #1: Type: TEXT/PLAIN, Size: 543 bytes --] On Fri, 22 Sep 2000, Molnar Ingo wrote: > yep this has done the trick, the deadlock is gone. I've attached the full > VM-fixes patch (this fix included) against vanilla test9-pre5. Linus, could you please include this patch in the next pre patch? (in the mean time, I'll go back to looking at the balancing thing with shared memory ... which is unrelated to this deadlock problem) thanks, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ [-- Attachment #2: Type: TEXT/PLAIN, Size: 6536 bytes --] --- linux/fs/buffer.c.orig Fri Sep 22 02:31:07 2000 +++ linux/fs/buffer.c Fri Sep 22 02:31:13 2000 @@ -706,9 +706,7 @@ static void refill_freelist(int size) { if (!grow_buffers(size)) { - balance_dirty(NODEV); - wakeup_kswapd(0); /* We can't wait because of __GFP_IO */ - schedule(); + try_to_free_pages(GFP_BUFFER); } } --- linux/mm/filemap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/filemap.c Fri Sep 22 02:31:13 2000 @@ -255,7 +255,7 @@ * up kswapd. */ age_page_up(page); - if (inactive_shortage() > (inactive_target * 3) / 4) + if (inactive_shortage() > inactive_target / 2 && free_shortage()) wakeup_kswapd(0); not_found: return page; --- linux/mm/page_alloc.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/page_alloc.c Fri Sep 22 02:31:13 2000 @@ -444,7 +444,8 @@ * processes, etc). */ if (gfp_mask & __GFP_WAIT) { - wakeup_kswapd(1); + try_to_free_pages(gfp_mask); + memory_pressure++; goto try_again; } } --- linux/mm/swap.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/swap.c Fri Sep 22 02:31:13 2000 @@ -233,27 +233,11 @@ spin_lock(&pagemap_lru_lock); if (!PageLocked(page)) BUG(); - /* - * Heisenbug Compensator(tm) - * This bug shouldn't trigger, but for unknown reasons it - * sometimes does. If there are no signs of list corruption, - * we ignore the problem. Else we BUG()... - */ - if (PageActive(page) || PageInactiveDirty(page) || - PageInactiveClean(page)) { - struct list_head * page_lru = &page->lru; - if (page_lru->next->prev != page_lru) { - printk("VM: lru_cache_add, bit or list corruption..\n"); - BUG(); - } - printk("VM: lru_cache_add, page already in list!\n"); - goto page_already_on_list; - } + DEBUG_ADD_PAGE add_page_to_active_list(page); /* This should be relatively rare */ if (!page->age) deactivate_page_nolock(page); -page_already_on_list: spin_unlock(&pagemap_lru_lock); } --- linux/mm/vmscan.c.orig Fri Sep 22 02:31:07 2000 +++ linux/mm/vmscan.c Fri Sep 22 02:31:27 2000 @@ -377,7 +377,7 @@ #define SWAP_SHIFT 5 #define SWAP_MIN 8 -static int swap_out(unsigned int priority, int gfp_mask) +static int swap_out(unsigned int priority, int gfp_mask, unsigned long idle_time) { struct task_struct * p; int counter; @@ -407,6 +407,7 @@ struct mm_struct *best = NULL; int pid = 0; int assign = 0; + int found_task = 0; select: read_lock(&tasklist_lock); p = init_task.next_task; @@ -416,6 +417,11 @@ continue; if (mm->rss <= 0) continue; + /* Skip tasks which haven't slept long enough yet when idle-swapping. 
*/ + if (idle_time && !assign && (!(p->state & TASK_INTERRUPTIBLE) || + time_before(p->sleep_time + idle_time * HZ, jiffies))) + continue; + found_task++; /* Refresh swap_cnt? */ if (assign == 1) { mm->swap_cnt = (mm->rss >> SWAP_SHIFT); @@ -430,7 +436,7 @@ } read_unlock(&tasklist_lock); if (!best) { - if (!assign) { + if (!assign && found_task > 0) { assign = 1; goto select; } @@ -691,9 +697,9 @@ * Now the page is really freeable, so we * move it to the inactive_clean list. */ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_inactive_clean_list(page); + UnlockPage(page); cleaned_pages++; } else { /* @@ -701,9 +707,9 @@ * It's no use keeping it here, so we move it to * the active list. */ - UnlockPage(page); del_page_from_inactive_dirty_list(page); add_page_to_active_list(page); + UnlockPage(page); } } spin_unlock(&pagemap_lru_lock); @@ -860,6 +866,7 @@ static int refill_inactive(unsigned int gfp_mask, int user) { int priority, count, start_count, made_progress; + unsigned long idle_time; count = inactive_shortage() + free_shortage(); if (user) @@ -869,16 +876,28 @@ /* Always trim SLAB caches when memory gets low. */ kmem_cache_reap(gfp_mask); + /* + * Calculate the minimum time (in seconds) a process must + * have slept before we consider it for idle swapping. + * This must be the number of seconds it takes to go through + * all of the cache. Doing this idle swapping makes the VM + * smoother once we start hitting swap. + */ + idle_time = atomic_read(&page_cache_size); + idle_time += atomic_read(&buffermem_pages); + idle_time /= (inactive_target + 1); + priority = 6; do { made_progress = 0; - if (current->need_resched) { + if (current->need_resched && (gfp_mask & __GFP_IO)) { __set_current_state(TASK_RUNNING); schedule(); } - while (refill_inactive_scan(priority, 1)) { + while (refill_inactive_scan(priority, 1) || + swap_out(priority, gfp_mask, idle_time)) { made_progress = 1; if (!--count) goto done; @@ -913,7 +932,7 @@ /* * Then, try to page stuff out.. */ - while (swap_out(priority, gfp_mask)) { + while (swap_out(priority, gfp_mask, 0)) { made_progress = 1; if (!--count) goto done; @@ -963,7 +982,8 @@ * before we get around to moving them to the other * list, so this is a relatively cheap operation. */ - if (free_shortage()) + if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() + + nr_inactive_clean_pages()) ret += page_launder(gfp_mask, user); /* @@ -1070,9 +1090,12 @@ run_task_queue(&tq_disk); /* - * If we've either completely gotten rid of the - * free page shortage or the inactive page shortage - * is getting low, then stop eating CPU time. + * We go to sleep if either the free page shortage + * or the inactive page shortage is gone. We do this + * because: + * 1) we need no more free pages or + * 2) the inactive pages need to be flushed to disk, + * it wouldn't help to eat CPU time now ... * * We go to sleep for one second, but if it's needed * we'll be woken up earlier... --- linux/include/linux/sched.h.orig Fri Sep 22 02:31:04 2000 +++ linux/include/linux/sched.h Fri Sep 22 02:31:13 2000 @@ -298,6 +298,7 @@ * that's just fine.) */ struct list_head run_list; + unsigned long sleep_time; struct task_struct *next_task, *prev_task; struct mm_struct *active_mm; @@ -818,6 +819,7 @@ static inline void del_from_runqueue(struct task_struct * p) { nr_running--; + p->sleep_time = jiffies; list_del(&p->run_list); p->run_list.next = NULL; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 10:27 ` Rik van Riel @ 2000-09-22 13:10 ` André Dahlqvist 2000-09-22 14:10 ` André Dahlqvist 2000-09-22 16:20 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Mohammad A. Haque 0 siblings, 2 replies; 20+ messages in thread From: André Dahlqvist @ 2000-09-22 13:10 UTC (permalink / raw) To: Rik van Riel Cc: Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm On Fri, Sep 22, 2000 at 07:27:30AM -0300, Rik van Riel wrote: > Linus, > > could you please include this patch in the next > pre patch? Rik, I just had an oops with this patch applied. I ran into BUG at buffer.c:730. The machine was not under load when the oops occurred, I was just reading e-mail in Mutt. I had to type the oops down by hand, but I will provide ksymoops output soon if you need it. -- // Andre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 13:10 ` André Dahlqvist @ 2000-09-22 14:10 ` André Dahlqvist 2000-09-22 16:38 ` test9-pre3+t9p2-vmpatch VM deadlock during socket I/O Yuri Pudgorodsky 1 sibling, 1 reply; 20+ messages in thread From: André Dahlqvist @ 2000-09-22 14:10 UTC (permalink / raw) To: Rik van Riel, Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm > I had to type the oops down by hand, but I will provide ksymoops > output soon if you need it. Let's hope I typed down the oops from the screen without mistakes. Here is the ksymoops output: ksymoops 2.3.4 on i586 2.4.0-test9. Options used -V (default) -k 20000922143001.ksyms (specified) -l 20000922143001.modules (specified) -o /lib/modules/2.4.0-test9/ (default) -m /boot/System.map-2.4.0-test9 (default) invalid operand: 0000 CPU: 0 EIP: 0010:[<c012c1be>] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010086 eax: 0000001c ebx: c31779e0 ecx: 00000000 edx: 00000082 esi: c11f6f80 edi: 00000008 ebp: 00000001 esp: c01f3eec ds: 0018 es: 0018 ss: 0018 Process swapper (pid:0, stackpage=c01f3000) Stack: c01bb465 c01bb79a 000002da c0150d3f e31779e0 00000001 c11f6480 00000046 c1168360 c0248460 c01684e3 c11f6f80 00000001 c0248584 00000000 c11f6f80 c02484a0 c016e563 00000001 c1168360 c02484a0 c1168360 00000286 c0169cc7 Call Trace: [<c01bb4b5>] [<c01bb79a>] [<c0150d3f>] [<c01684e3>] [<c016e563>] [<c0169cc7>] [<c016e500>] [<c010a02c>] [<c010a18e>] [<c0107120>] [<c0108de0>] [<c0107120>] [<c0107143>] [<c01071a7>] [<c0105000>] [<c0100192>] Code: 0f 0b 83 c4 0c c3 57 56 53 86 74 24 10 8b 54 24 14 85 d2 74 >>EIP; c012c1be <end_buffer_io_bad+42/48> <===== Trace; c01bb4b5 <tvecs+36dd/cde8> Trace; c01bb79a <tvecs+39c2/cde8> Trace; c0150d3f <end_that_request_first+5f/b8> Trace; c01684e3 <ide_end_request+27/74> Trace; c016e563 <ide_dma_intr+63/9c> Trace; c0169cc7 <ide_intr+fb/150> Trace; c016e500 <ide_dma_intr+0/9c> Trace; c010a02c <handle_IRQ_event+30/5c> Trace; c010a18e <do_IRQ+6e/b0> Trace; c0107120 <default_idle+0/28> Trace; c0108de0 <ret_from_intr+0/20> Trace; c0107120 <default_idle+0/28> Trace; c0107143 <default_idle+23/28> Trace; c01071a7 <cpu_idle+3f/54> Trace; c0105000 <empty_bad_page+0/1000> Trace; c0100192 <L6+0/2> Code; c012c1be <end_buffer_io_bad+42/48> 00000000 <_EIP>: Code; c012c1be <end_buffer_io_bad+42/48> <===== 0: 0f 0b ud2a <===== Code; c012c1c0 <end_buffer_io_bad+44/48> 2: 83 c4 0c add $0xc,%esp Code; c012c1c3 <end_buffer_io_bad+47/48> 5: c3 ret Code; c012c1c4 <end_buffer_io_async+0/b4> 6: 57 push %edi Code; c012c1c5 <end_buffer_io_async+1/b4> 7: 56 push %esi Code; c012c1c6 <end_buffer_io_async+2/b4> 8: 53 push %ebx Code; c012c1c7 <end_buffer_io_async+3/b4> 9: 86 74 24 10 xchg %dh,0x10(%esp,1) Code; c012c1cb <end_buffer_io_async+7/b4> d: 8b 54 24 14 mov 0x14(%esp,1),%edx Code; c012c1cf <end_buffer_io_async+b/b4> 11: 85 d2 test %edx,%edx Code; c012c1d1 <end_buffer_io_async+d/b4> 13: 74 00 je 15 <_EIP+0x15> c012c1d3 <end_buffer_io_async+f/b4> Aiee, killing interrupt handler Kernel panic: Attempted to kill the idle task! -- // Andre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* test9-pre3+t9p2-vmpatch VM deadlock during socket I/O 2000-09-22 14:10 ` André Dahlqvist @ 2000-09-22 16:38 ` Yuri Pudgorodsky 0 siblings, 0 replies; 20+ messages in thread From: Yuri Pudgorodsky @ 2000-09-22 16:38 UTC (permalink / raw) To: André Dahlqvist Cc: Rik van Riel, Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm I also encounter instant lockup of test9-pre3 + t9p2-vmpatch / SMP (two CPUs) under high I/O via UNIX domain sockets: - running 10 simple tasks doing #define BUFFERSIZE 204800 for (j = 0; ; j++) { if (socketpair(PF_LOCAL, SOCK_STREAM, 0, p) == -1) { exit(1); } fcntl(p[0], F_SETFL, O_NONBLOCK); fcntl(p[1], F_SETFL, O_NONBLOCK); write(p[0], crap, BUFFERSIZE); write(p[1], crap, BUFFERSIZE); } So it looks like swap_out() cannot obtain lock_kernel() held by a swap_out() on a second CPU.... See below. Call trace (looks very similar on both CPUs): Trace; c020aa3e <stext_lock+18a6/8848> (called from c0133eb4 <swap_out+0x28>) Trace; c0133eb4 <swap_out+28/228> args (6, 3, 0) Trace; c0134e50 <refill_inactive+c8/170> args (3, 1) Trace; c0134f75 <do_try_to_free_pages+7d/9c> args (3,1) Trace; c0135168 <wakeup_kswapd+84/bc> Trace; c0135d72 <__alloc_pages+1d6/264> Trace; c0135e17 <__get_free_pages+17/28> Trace; c01322ce <kmem_cache_grow+e2/264> .... Under lockup, memory map looks like: Active: 121 Inactive_dirty: 12217 Inactive_clean: 0 free: 12210 (256 512 768) and does not change over time. Most frequent EIP locations (from Alt-SysRq/P): Trace; c0133f74 <swap_out+e8/228> Trace; c0133f23 <swap_out+97/228> Trace; c0134039 <swap_out+1ad/228> Trace; c020aa37 <stext_lock+189f/8848> Trace; c020aa3e <stext_lock+18a6/8848> In hope of a quick fix, Yuri Pudgorodsky -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
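The test-program fragment above is not compilable as posted (it is missing its declarations and includes). A self-contained reconstruction follows; main(), the includes, and the crap buffer are assumptions filled in here, while the loop body is verbatim from the mail. Yuri ran ten of these concurrently; note that the socket descriptors are never closed, so every iteration leaves another socketpair and its buffered data behind, steadily eating kernel memory -- which is what pushes the box into reclaim:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFERSIZE 204800

int main(void)
{
	static char crap[BUFFERSIZE];	/* contents are irrelevant */
	int p[2];
	long j;

	for (j = 0; ; j++) {
		if (socketpair(PF_LOCAL, SOCK_STREAM, 0, p) == -1) {
			exit(1);
		}
		/* non-blocking, so the writes below just fill the
		 * kernel's socket buffers and return */
		fcntl(p[0], F_SETFL, O_NONBLOCK);
		fcntl(p[1], F_SETFL, O_NONBLOCK);
		write(p[0], crap, BUFFERSIZE);
		write(p[1], crap, BUFFERSIZE);
	}
}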
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 13:10 ` André Dahlqvist 2000-09-22 14:10 ` André Dahlqvist @ 2000-09-22 16:20 ` Mohammad A. Haque 1 sibling, 0 replies; 20+ messages in thread From: Mohammad A. Haque @ 2000-09-22 16:20 UTC (permalink / raw) To: André Dahlqvist Cc: Rik van Riel, Molnar Ingo, David S. Miller, torvalds, linux-kernel, linux-mm If the process that barfed is swapper then this is the oops that I got in test9-pre4 w/o any patches. http://marc.theaimsgroup.com/?l=linux-kernel&m=96936789621245&w=2 On Fri, 22 Sep 2000, Andre Dahlqvist wrote: > On Fri, Sep 22, 2000 at 07:27:30AM -0300, Rik van Riel wrote: > > > Linus, > > > > could you please include this patch in the next > > pre patch? > > Rik, > > I just had an oops with this patch applied. I ran into BUG at > buffer.c:730. The machine was not under load when the oops occurred, I > was just reading e-mail in Mutt. I had to type the oops down by hand, > but I will provide ksymoops output soon if you need it. > -- ===================================================================== Mohammad A. Haque http://www.haque.net/ mhaque@haque.net "Alcohol and calculus don't mix. Project Lead Don't drink and derive." --Unknown http://wm.themes.org/ batmanppc@themes.org ===================================================================== -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo 2000-09-22 9:08 ` Rik van Riel @ 2000-09-22 17:39 ` Linus Torvalds 2000-09-25 13:47 ` Rik van Riel 2 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2000-09-22 17:39 UTC (permalink / raw) To: Molnar Ingo; +Cc: Rik van Riel, David S. Miller, linux-kernel, linux-mm On Fri, 22 Sep 2000, Molnar Ingo wrote: > > i'm still getting VM related lockups during heavy write load, in > test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your > last VM related fix-patch, correct?). Here is a histogram of such a > lockup: Rik, those VM patches are going away RSN if these issues do not get fixed. I'm really disappointed, and suspect that it would be easier to go back to the old VM with just page aging added, not your new code that seems to be full of deadlocks everywhere. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload 2000-09-22 17:39 ` Linus Torvalds @ 2000-09-25 13:47 ` Rik van Riel 0 siblings, 0 replies; 20+ messages in thread From: Rik van Riel @ 2000-09-25 13:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Molnar Ingo, David S. Miller, linux-kernel, linux-mm On Fri, 22 Sep 2000, Linus Torvalds wrote: > On Fri, 22 Sep 2000, Molnar Ingo wrote: > > > > i'm still getting VM related lockups during heavy write load, in > > test9-pre5 + your 2.4.0-t9p2-vmpatch (which i understand as being your > > last VM related fix-patch, correct?). Here is a histogram of such a > > lockup: > > those VM patches are going away RSN if these issues do not get > fixed. I'm really disappointed, and suspect that it would be > easier to go back to the old VM with just page aging added, not > your new code that seems to be full of deadlocks everywhere. I've been away at a conference last week, so I haven't had much chance to take a look at the code after you integrated it and the test base got increased ;( Among the things I discovered are some UP-only deadlocks and the page ping-pong thing, which I am fixing right now. If I had a choice, I'd have chosen /next/ week as the time to integrate the code ... doing this while I'm away at a conference was really inconvenient ;) I'm looking into the email backlog and the bug reports right now (today, tuesday and wednesday I'm at /another/ conference and thursday will be the next opportunity). It looks like there are no fundamental issues left, just a bunch of small thinkos that can be fixed in a (few?) week(s). regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [patch *] VM deadlock fix 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel 2000-09-21 20:28 ` Roger Larsson 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller @ 2000-09-22 12:16 ` Martin Diehl 2 siblings, 0 replies; 20+ messages in thread From: Martin Diehl @ 2000-09-22 12:16 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel, linux-mm On Thu, 21 Sep 2000, Rik van Riel wrote: > I've found and fixed the deadlocks in the new VM. They turned out > to be single-cpu only bugs, which explains why they didn't crash my > SMP test box ;) Hi, tried > http://www.surriel.com/patches/2.4.0-t9p2-vmpatch applied to 2.4.0-t9p4 on a UP box booted with mem=8M. The deadlock behaviour appears to be somehow different compared to vanilla 2.4.0-t9p4 - however, for me it makes things even worse: I booted into singleuser and used dd if=/dev/urandom of=/dev/null count=1 bs=x to trigger the issue by increasing bs-values. As soon as bs is big enough to force swapping (about 3M in my case) the box "deadlocks". What has become worse is that SysRq+e (or k) doesn't help anymore with this patch applied. So I had to SysRq+b and ended fscking (but no fs-corruption). Without the patch this was not a problem. Some more points I've noticed: * apparently, the deadlock happens when the box begins to swap. I never found any used swapspace with the new VM from 2.4.0-t9p*. If memory requests force the use of swapspace, the machine deadlocks. * when, after deadlocking, I pressed SysRq+t several times I found - either dd or kswapd being current task in vanilla 2.4.0-t9p4 - neither dd nor kswapd ever being current with this patch * as a printk() in the main loop shows, kreclaimd *never* awoke * My impression was similar to what somebody has already reported: it seems something related to refill_inactive_scan() is recursing to infinity when the "deadlock" happens. * the behaviour of kswapd without this last patch differs significantly before and after the first deadlock happens (and was released by SysRq+e): only *after* pressing SysRq+e (or k) kswapd awoke once per second on the idle box. This is strange since it should sleep with timeout=HZ in its main loop. Especially the last point suggests to me there might be a problem at initialization. I'm not sure whether everything called from kswapd is properly initialized at the time when the kswapd-thread is created. To check this, I've tentatively added an additional interruptible_sleep_on_timeout() before kswapd's main loop to delay it until initialization has finished. Probably it would be more "Right" to move the sleep from the end of the main loop to its beginning - however, I just tried a quick hack and did not check if the *_shortage() stuff is ready to be called at init time. The additional sleep before kswapd enters its main loop was a major improvement for me: * my dd-tests did not deadlock anymore - even with bs=100M and mem=8M * swap space was really used now. * I was able to advance beyond singleuser with 2.4.0-t9p* and mem=8M for the very first time (always deadlocked in the init-scripts) * I was even able to make bzImage - but it dumped core after about 15 Min for unknown reason (probably out of memory) but without any deadlock. Box was at av. load 3 and 15M swap used at this time. * I found kreclaimd *was* awoken several times. * however, kswapd still does not wake every second after a fresh boot. It begins to wake as soon as real swapping starts. So, my conclusion is the "deadlock" issue might be mainly an initialization problem. Probably some more special handling is needed at swapon later. Currently my guess is there is an initialization problem when kswapd starts and some kind of blocking when refill_inactive_scan() is called before swapon. Comments? Will do some more tests (including your latest patch). Regards Martin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2000-09-25 13:47 UTC | newest] Thread overview: 20+ messages 2000-09-21 16:44 [patch *] VM deadlock fix Rik van Riel 2000-09-21 20:28 ` Roger Larsson 2000-09-21 23:31 ` Problem remains - page_launder? (Was: Re: [patch *] VM deadlock fix) Roger Larsson 2000-09-21 22:23 ` [patch *] VM deadlock fix David S. Miller 2000-09-22 0:18 ` Andrea Arcangeli 2000-09-21 23:57 ` David S. Miller 2000-09-22 8:39 ` Rik van Riel 2000-09-22 8:54 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Molnar Ingo 2000-09-22 9:00 ` Molnar Ingo 2000-09-22 9:08 ` Rik van Riel 2000-09-22 9:14 ` Molnar Ingo 2000-09-22 9:34 ` Molnar Ingo 2000-09-22 10:27 ` Rik van Riel 2000-09-22 13:10 ` André Dahlqvist 2000-09-22 14:10 ` André Dahlqvist 2000-09-22 16:38 ` test9-pre3+t9p2-vmpatch VM deadlock during socket I/O Yuri Pudgorodsky 2000-09-22 16:20 ` test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload Mohammad A. Haque 2000-09-22 17:39 ` Linus Torvalds 2000-09-25 13:47 ` Rik van Riel 2000-09-22 12:16 ` [patch *] VM deadlock fix Martin Diehl