* VM problem with 2.4.8-ac9 (fwd)
From: Rik van Riel @ 2001-08-22 19:25 UTC
To: Alan Cox; +Cc: linux-mm, Marcelo Tosatti, Jari Ruusu

Hi Alan,

Another report of tasks dying on recent 2.4 kernels.
Suspect code would be:
- tlb optimisations in recent -ac (tasks dying with segfault)
- swapfile.c, especially sys_swapoff (known race condition, marcelo?)

What would cause the swap map badness below I wouldn't know,
maybe marcelo is more familiar with the swapfile.c code...

regards,

Rik
--
IA64: a worthy successor to the i860.

---------- Forwarded message ----------
Date: Wed, 22 Aug 2001 20:37:01 +0300
From: Jari Ruusu <jari.ruusu@pp.inet.fi>
To: Rik van Riel <riel@conectiva.com.br>
Subject: VM problem with 2.4.8-ac9

Unused swap offset entry in swap_dup 00519e00
VM: Bad swap entry 00519e00
Unused swap offset entry in swap_count 00519e00
Unused swap offset entry in swap_count 00519e00
VM: Bad swap entry 00519e00
Unused swap offset entry in swap_dup 006b8a00
VM: Bad swap entry 006b8a00
Unused swap offset entry in swap_dup 006b8a00
VM: killing process nscd
Unused swap offset entry in swap_dup 006b8a00
VM: killing process nscd
VM: Bad swap entry 006b8a00
Unused swap offset entry in swap_dup 005e6900
VM: Bad swap entry 005e6900
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Kernel panic: Attempted to kill init!

Linux debian 2.4.8-ac9 #1 Wed Aug 22 16:04:25 EEST 2001 i686 unknown

Gnu C                  2.95.3
Gnu make               3.79.1
binutils               2.9.5.0.37
mount                  2.11g
modutils               2.4.6
e2fsprogs              1.18
PPP                    2.3.11
Linux C Library        2.1.3
ldd: version 1.9.11
Procps                 2.0.6
Net-tools              1.54
Console-tools          0.2.3
Sh-utils               2.0

I get a repeatable VM failure with recent 2.4 kernels, tested with
2.4.8-ac[789] on x86 architecture. My VM torture test consists of following:
boot the kernel with "mem=16M" parameter, start X11 and a couple xterms
running kernel compile, glibc compile, bzip2 decompressor + tar, and top.
Also xosview was running. Working memory need of such setup is way over
available RAM, and swap use was about 20-35 MB (of 190 MB available swap),
and swapping activity was _continuous_. Kernel 2.2.19aa2 survives the
torture (everything else being same), and memtest-86 does not find any
errors, so it is unlikely to be hardware failure.

Anyway, the box dies after about 1-3 hours of torture. Sometimes it just
kills some random process. I captured above info using serial console. If
you need more info (.config, System.map, whatever) just ask for it. I am
willing to do more testing, just tell me what you need done.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
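
The hex values in the report above are raw swap entries. As a reading aid, here is a minimal sketch that splits such a value into a swap-area index and a page offset. The bit layout is an assumption (the 2.4-era i386 encoding as commonly described: bit 0 left for the present bit, six bits of swap-area type, page offset from bit 8 upward). The struct and function names are illustrative and not copied from the kernel headers.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-in for the kernel's swp_entry_t. */
struct swp_entry {
	unsigned long val;
};

/* ASSUMED layout: bits 1-6 select the swap area ("type"). */
static unsigned long swp_type(struct swp_entry e)
{
	return (e.val >> 1) & 0x3f;
}

/* ASSUMED layout: bits 8 and up hold the page offset in that area. */
static unsigned long swp_offset(struct swp_entry e)
{
	return e.val >> 8;
}

/* Print one decoded entry and return its page offset. */
static unsigned long describe_entry(unsigned long raw)
{
	struct swp_entry e = { raw };

	printf("entry %08lx: swap area %lu, page offset 0x%lx\n",
	       raw, swp_type(e), swp_offset(e));
	return swp_offset(e);
}
```

Under that assumed layout, every entry in the report (00519e00, 006b8a00, 005e6900) would decode to swap area 0, which matches a machine with a single swap area. The decoding says nothing about why the corresponding swap map slot was unused.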
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Marcelo Tosatti @ 2001-08-22 18:27 UTC
To: Rik van Riel; +Cc: Alan Cox, linux-mm, Jari Ruusu

On Wed, 22 Aug 2001, Rik van Riel wrote:

> Hi Alan,
>
> Another report of tasks dying on recent 2.4 kernels.
> Suspect code would be:
> - tlb optimisations in recent -ac (tasks dying with segfault)
> - swapfile.c, especially sys_swapoff (known race condition, marcelo?)

There are known races on swapoff, but Jari is not running swapoff...

>
> What would cause the swap map badness below I wouldn't know,
> maybe marcelo is more familiar with the swapfile.c code...

Jari,

1) Are you using an SMP kernel?
2) Did you try with older kernels or 2.4.9?

>
> Rik
> --
> IA64: a worthy successor to the i860.
> ---------- Forwarded message ----------
> Date: Wed, 22 Aug 2001 20:37:01 +0300
> From: Jari Ruusu <jari.ruusu@pp.inet.fi>
> To: Rik van Riel <riel@conectiva.com.br>
> Subject: VM problem with 2.4.8-ac9
>
> Unused swap offset entry in swap_dup 00519e00
> VM: Bad swap entry 00519e00
> Unused swap offset entry in swap_count 00519e00
> Unused swap offset entry in swap_count 00519e00
> VM: Bad swap entry 00519e00
> Unused swap offset entry in swap_dup 006b8a00
> VM: Bad swap entry 006b8a00
> Unused swap offset entry in swap_dup 006b8a00
> VM: killing process nscd
> Unused swap offset entry in swap_dup 006b8a00
> VM: killing process nscd
> VM: Bad swap entry 006b8a00
> Unused swap offset entry in swap_dup 005e6900
> VM: Bad swap entry 005e6900
> Unused swap offset entry in swap_dup 005e6900
> VM: killing process init
> Unused swap offset entry in swap_dup 005e6900
> VM: killing process init
> Unused swap offset entry in swap_dup 005e6900
> VM: killing process init
> Unused swap offset entry in swap_dup 005e6900
> VM: killing process init
> Kernel panic: Attempted to kill init!
>
> Linux debian 2.4.8-ac9 #1 Wed Aug 22 16:04:25 EEST 2001 i686 unknown
> Gnu C 2.95.3
> Gnu make 3.79.1
> binutils 2.9.5.0.37
> mount 2.11g
> modutils 2.4.6
> e2fsprogs 1.18
> PPP 2.3.11
> Linux C Library 2.1.3
> ldd: version 1.9.11
> Procps 2.0.6
> Net-tools 1.54
> Console-tools 0.2.3
> Sh-utils 2.0
>
> I get a repeatable VM failure with recent 2.4 kernels, tested with
> 2.4.8-ac[789] on x86 architecture. My VM torture test consists of following:
> boot the kernel with "mem=16M" parameter, start X11 and a couple xterms
> running kernel compile, glibc compile, bzip2 decompressor + tar, and top.
> Also xosview was running. Working memory need of such setup is way over
> available RAM, and swap use was about 20-35 MB (of 190 MB available swap),
> and swapping activity was _continuous_. Kernel 2.2.19aa2 survives the
> torture (everything else being same), and memtest-86 does not find any
> errors, so it is unlikely to be hardware failure.
>
> Anyway, the box dies after about 1-3 hours of torture. Sometimes it just
> kills some random process. I captured above info using serial console. If
> you need more info (.config, System.map, whatever) just ask for it. I am
> willing to do more testing, just tell me what you need done.
>
> Regards,
> Jari Ruusu <jari.ruusu@pp.inet.fi>
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Jari Ruusu @ 2001-08-23 8:25 UTC
To: Marcelo Tosatti; +Cc: Rik van Riel, Alan Cox, linux-mm

Marcelo Tosatti wrote:
> On Wed, 22 Aug 2001, Rik van Riel wrote:
>
> > Hi Alan,
> >
> > Another report of tasks dying on recent 2.4 kernels.
> > Suspect code would be:
> > - tlb optimisations in recent -ac (tasks dying with segfault)
> > - swapfile.c, especially sys_swapoff (known race condition, marcelo?)
>
> There are known races on swapoff, but Jari is not running swapoff...

Correct, not running swapoff.

> >
> > What would cause the swap map badness below I wouldn't know,
> > maybe marcelo is more familiar with the swapfile.c code...
>
> Jari,
>
> 1) Are you using an SMP kernel?

No.

> 2) Did you try with older kernels or 2.4.9?

Linus' 2.4.9 survived about 7 hours of VM torture, and then I got ext2
filesystem corruption (just once, dunno if it is repeatable). No "swap
offset" problems with 2.4.9 so far. Haven't tortured older kernels yet.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Jari Ruusu @ 2001-08-23 17:35 UTC
To: Marcelo Tosatti; +Cc: Rik van Riel, Alan Cox, linux-mm

Jari Ruusu wrote:
> Marcelo Tosatti wrote:
> > 2) Did you try with older kernels or 2.4.9?
>
> Linus' 2.4.9 survived about 7 hours of VM torture, and then I got ext2
> filesystem corruption (just once, dunno if it is repeatable). No "swap
> offset" problems with 2.4.9 so far. Haven't tortured older kernels yet.

Update:

Stock Linus' 2.4.7
~~~~~~~~~~~~~~~~~~
Box didn't die but I stopped VM torture test after this appeared:

Unused swap offset entry in swap_dup 003d6b00
VM: Bad swap entry 003d6b00
Unused swap offset entry in swap_count 003d6b00
VM: Bad swap entry 003d6b00

Stock Linus' 2.4.8
~~~~~~~~~~~~~~~~~~
bzip2 decompress + tar failed (twice, but at different place):

> tar: Skipping to next header
>
> bzip2: Caught a SIGSEGV or SIGBUS whilst decompressing,
> which probably indicates that the compressed data
> is corrupted.
> Input file = (stdin), output file = (stdout)
>
> It is possible that the compressed file(s) have become corrupted.
> You can use the -tvv option to test integrity of such files.
>
> You can use the `bzip2recover' program to *attempt* to recover
> data from undamaged sections of corrupted files.
>
> tar: 360 garbage bytes ignored at end of archive
> tar: Child returned status 2
> tar: Error exit delayed from previous errors

glibc compile failed:

> make[2]: *** [math/subdir_install] Segmentation fault

Note: previously mentioned ext2 filesystem corruption (with kernel 2.4.9)
was in bzip2 decompress + tar restored directory hierarchy, so above bzip2
decompress + tar failure can explain that too. Maybe something went
similarly wrong and bzip2 outputted garbage to tar instead of terminating
with SIGSEGV.

6 hours of VM torture, 3 incidents where a process died with SIGSEGV. No
"swap offset" problems with 2.4.8 so far.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Hugh Dickins @ 2001-08-23 20:24 UTC
To: Jari Ruusu
Cc: Marcelo Tosatti, Rik van Riel, Alan Cox, Jeremy Linton, linux-mm

On Thu, 23 Aug 2001, Jari Ruusu wrote:
>
> Box didn't die but I stopped VM torture test after this appeared:
>
> Unused swap offset entry in swap_dup 003d6b00
> VM: Bad swap entry 003d6b00
> Unused swap offset entry in swap_count 003d6b00
> VM: Bad swap entry 003d6b00

Don't stop your test when such messages appear, under heavy swapping
they can appear even when the system is proceeding correctly, not only
when doing swapoff. (In a separate, swapoff patch I've suppressed them,
but won't bother you with that here.)

Alan has intentionally been avoiding many of the VM "fixes" in Linus'
tree, Rik has been feeding him some of the less controversial ones,
but I believe there are important ones missing (unrelated to aging
and tuning etc.). Looking no further than mm/memory.c, patch below
to bring 2.4.8-ac9 in line with 2.4.9 there:

1. lock_kiovec page unwind fix (velizarb@pirincom.com)
2. copy_cow_page & clear_user_highpage can block in kmap
   (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)
3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

The first has no relevance to your issues, but should be in -ac.
The second is rarely needed, but I wouldn't want to run a torture
test on a highmem machine (okay, yours is far from that!) without it.
The third is probably the fix to the process killing you saw near
the Unused swap and Bad swap messages. The fourth is a correction
Linus made to Rik's swap freeing, which might also have some bearing.

(Alan, some doubts in do_wp_page: the additional PageReserved test may
be redundant in view of your earlier ZERO_PAGE test, but I felt safer
to include it; and I was dubious about the additional set_pte and
ptep_get_and_clear in your version, but don't know the history and
let them stay.)

Hugh

--- 2.4.8-ac9/mm/memory.c	Thu Aug 23 12:31:32 2001
+++ linux/mm/memory.c	Thu Aug 23 19:56:55 2001
@@ -611,9 +611,9 @@
 			if (TryLockPage(page)) {
 				while (j--) {
-					page = *(--ppage);
-					if (page)
-						UnlockPage(page);
+					struct page *tmp = *--ppage;
+					if (tmp)
+						UnlockPage(tmp);
 				}
 				goto retry;
 			}
@@ -856,10 +856,9 @@
 /*
  * We hold the mm semaphore for reading and vma->vm_mm->page_table_lock
  */
-static inline void break_cow(struct vm_area_struct * vma, struct page * old_page, struct page * new_page, unsigned long address,
+static inline void break_cow(struct vm_area_struct * vma, struct page * new_page, unsigned long address,
 		pte_t *page_table)
 {
-	copy_cow_page(old_page,new_page,address);
 	flush_page_to_ram(new_page);
 	flush_cache_page(vma, address);
 	establish_pte(vma, address, page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
@@ -923,6 +922,8 @@
 				break;
 			/* FallThrough */
 		case 1:
+			if (PageReserved(old_page))
+				break;
 			flush_cache_page(vma, address);
 			establish_pte(vma, address, page_table, pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
 			return 1;	/* Minor fault */
@@ -932,16 +933,20 @@
 	 * Ok, we need to copy. Oh, well..
 	 */
 copy:
-	set_pte(page_table, pte);
+	set_pte(page_table, pte);
+	page_cache_get(old_page);
 	spin_unlock(&mm->page_table_lock);
+
 	new_page = alloc_page(GFP_HIGHUSER);
-	spin_lock(&mm->page_table_lock);
 	if (!new_page)
-		return -1;
+		goto no_mem;
+	copy_cow_page(old_page,new_page,address);
+	page_cache_release(old_page);
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
+	spin_lock(&mm->page_table_lock);
 	if (pte_same(*page_table, pte)) {
 		/* We are changing the pte, so get rid of the old
 		 * one to avoid races with the hardware, this really
@@ -950,7 +955,7 @@
 		pte = ptep_get_and_clear(page_table);
 		if (PageReserved(old_page))
 			++mm->rss;
-		break_cow(vma, old_page, new_page, address, page_table);
+		break_cow(vma, new_page, address, page_table);
 		/* Free the old page.. */
 		new_page = old_page;
@@ -961,6 +966,10 @@
 bad_wp_page:
 	printk("do_wp_page: bogus page at address %08lx (page 0x%lx)\n",address,(unsigned long)old_page);
 	return -1;
+no_mem:
+	page_cache_release(old_page);
+	spin_lock(&mm->page_table_lock);
+	return -1;
 }
 static void vmtruncate_list(struct vm_area_struct *mpnt, unsigned long pgoff)
@@ -1099,9 +1108,10 @@
  */
 static int do_swap_page(struct mm_struct * mm,
 	struct vm_area_struct * vma, unsigned long address,
-	pte_t * page_table, swp_entry_t entry, int write_access)
+	pte_t * page_table, pte_t orig_pte, int write_access)
 {
 	struct page *page;
+	swp_entry_t entry = pte_to_swp_entry(orig_pte);
 	pte_t pte;
 	int ret = 1;
@@ -1114,7 +1124,11 @@
 		unlock_kernel();
 		if (!page) {
 			spin_lock(&mm->page_table_lock);
-			return -1;
+			/*
+			 * Back out if somebody else faulted in this pte while
+			 * we released the page table lock.
+			 */
+			return pte_same(*page_table, orig_pte) ? -1 : 1;
 		}
 		/* Had to read the page from swap area: Major fault */
@@ -1133,7 +1147,7 @@
 	 * released the page table lock.
 	 */
 	spin_lock(&mm->page_table_lock);
-	if (pte_present(*page_table)) {
+	if (!pte_same(*page_table, orig_pte)) {
 		UnlockPage(page);
 		page_cache_release(page);
 		return 1;
@@ -1144,21 +1158,13 @@
 	pte = mk_pte(page, vma->vm_page_prot);
 	swap_free(entry);
-	if (write_access && exclusive_swap_page(page))
-		pte = pte_mkwrite(pte_mkdirty(pte));
-
-	/*
-	 * If swap space is getting low and we were the last user
-	 * of this piece of swap space, we free this space so
-	 * somebody else can be swapped out.
-	 *
-	 * We are protected against try_to_swap_out() because the
-	 * page is locked and against do_fork() because we have
-	 * read_lock(&mm->mmap_sem).
-	 */
-	if (vm_swap_full() && exclusive_swap_page(page)) {
-		delete_from_swap_cache_nolock(page);
-		pte = pte_mkwrite(pte_mkdirty(pte));
+	if (exclusive_swap_page(page)) {
+		if (write_access)
+			pte = pte_mkwrite(pte_mkdirty(pte));
+		if (vm_swap_full()) {
+			delete_from_swap_cache_nolock(page);
+			pte = pte_mkdirty(pte);
+		}
 	}
 	UnlockPage(page);
@@ -1189,16 +1195,18 @@
 		/* Allocate our own private page. */
 		spin_unlock(&mm->page_table_lock);
+
 		page = alloc_page(GFP_HIGHUSER);
-		spin_lock(&mm->page_table_lock);
 		if (!page)
-			return -1;
+			goto no_mem;
+		clear_user_highpage(page, addr);
+
+		spin_lock(&mm->page_table_lock);
 		if (!pte_none(*page_table)) {
 			page_cache_release(page);
 			return 1;
 		}
 		mm->rss++;
-		clear_user_highpage(page, addr);
 		flush_page_to_ram(page);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 	}
@@ -1208,6 +1216,10 @@
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, addr, entry);
 	return 1;	/* Minor fault */
+
+no_mem:
+	spin_lock(&mm->page_table_lock);
+	return -1;
 }
 /*
@@ -1327,7 +1339,7 @@
 		 */
 		if (pte_none(entry))
 			return do_no_page(mm, vma, address, write_access, pte);
-		return do_swap_page(mm, vma, address, pte, pte_to_swp_entry(entry), write_access);
+		return do_swap_page(mm, vma, address, pte, entry, write_access);
 	}
 	entry = ptep_get_and_clear(pte);
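
Items 2 and 3 in Hugh's list share one pattern: drop the page table lock before anything that may block, retake it afterwards, and compare the entry against the value seen before the lock was dropped. The following is a minimal user-space sketch of that pattern only, not kernel code. `struct slot`, `fault_in`, `other_thread_wins` and `FAKE_PRESENT` are hypothetical names, and a pthread mutex stands in for `page_table_lock`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for one page-table slot and the lock guarding it. */
struct slot {
	pthread_mutex_t lock;
	unsigned long pte;	/* opaque entry value */
	void *page;		/* backing memory once the entry is "present" */
};

#define FAKE_PRESENT 1UL	/* illustrative "present" entry value */

/*
 * Caller holds s->lock on entry and on return.
 * Returns 1 if the fault was resolved (by us or by someone else),
 * -1 only if the allocation failed AND the entry is still unchanged.
 * `racer` optionally simulates another thread running while unlocked.
 */
static int fault_in(struct slot *s, unsigned long orig_pte, int alloc_fails,
		    void (*racer)(struct slot *))
{
	void *page;

	pthread_mutex_unlock(&s->lock);	/* the work below may block */
	page = alloc_fails ? NULL : malloc(4096);
	if (racer)
		racer(s);
	pthread_mutex_lock(&s->lock);	/* retake before looking at s->pte */

	if (!page)			/* back out quietly if someone else won */
		return s->pte == orig_pte ? -1 : 1;
	if (s->pte != orig_pte) {	/* entry changed while we were unlocked */
		free(page);
		return 1;
	}
	s->page = page;
	s->pte = FAKE_PRESENT;
	return 1;
}

/* Simulated competing fault: installs an entry under the lock. */
static void other_thread_wins(struct slot *s)
{
	pthread_mutex_lock(&s->lock);
	s->pte = FAKE_PRESENT;
	pthread_mutex_unlock(&s->lock);
}
```

The point of comparing against the original value, rather than testing for "present", is that any change to the entry while the lock was dropped counts as someone else having handled the fault, so a failed allocation in that window is not reported as an error.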
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Alan Cox @ 2001-08-23 20:29 UTC
To: Hugh Dickins
Cc: Jari Ruusu, Marcelo Tosatti, Rik van Riel, Alan Cox, Jeremy Linton, linux-mm

> Alan has intentionally been avoiding many of the VM "fixes" in Linus'
> tree, Rik has been feeding him some of the less controversial ones,
> but I believe there are important ones missing (unrelated to aging
> and tuning etc.). Looking no further than mm/memory.c, patch below
> to bring 2.4.8-ac9 in line with 2.4.9 there:
>
> 1. lock_kiovec page unwind fix (velizarb@pirincom.com)
> 2. copy_cow_page & clear_user_highpage can block in kmap
> (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)
> 3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
> 4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

I've been avoiding the Linus paging ones because they seem to make my
machines all crash repeatedly under any kind of serious test load. Plus the
fact that after about 3 days they needed rebooting to get back from 386
speed. Rik traced down some more vm races so hopefully ac11 will have some
more progress on this one.

Alan
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Rik van Riel @ 2001-08-23 20:37 UTC
To: Alan Cox
Cc: Hugh Dickins, Jari Ruusu, Marcelo Tosatti, Jeremy Linton, linux-mm

On Thu, 23 Aug 2001, Alan Cox wrote:

> > 1. lock_kiovec page unwind fix (velizarb@pirincom.com)
> > 2. copy_cow_page & clear_user_highpage can block in kmap
> > (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)

I don't know enough about this code to properly fix it...

> > 3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
> > 4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

I'll look at these.

I'll carefully merge some of Linus' stuff here. Careful because
Linus seems to be getting various other things in eg. memory.c
wrong and stripped off the comments of some pieces of code ;)

cheers,

Rik
--
IA64: a worthy successor to the i860.

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Jari Ruusu @ 2001-08-24 17:23 UTC
To: Hugh Dickins
Cc: Marcelo Tosatti, Rik van Riel, Alan Cox, Jeremy Linton, linux-mm

Hugh Dickins wrote:
> 1. lock_kiovec page unwind fix (velizarb@pirincom.com)
> 2. copy_cow_page & clear_user_highpage can block in kmap
> (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)
> 3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
> 4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

VM torture results of 2.4.8-ac9 + Hugh's patch (version 23 Aug 2001
21:24:50), 8 hours of torture. 1 incident where a process died with SIGSEGV.
No "swap offset" messages.

glibc compile failed:

make[2]: *** [math/subdir_lib] Segmentation fault

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Alan Cox @ 2001-08-24 17:41 UTC
To: Jari Ruusu
Cc: Hugh Dickins, Marcelo Tosatti, Rik van Riel, Alan Cox, Jeremy Linton, linux-mm

> VM torture results of 2.4.8-ac9 + Hugh's patch (version 23 Aug 2001
> 21:24:50), 8 hours of torture. 1 incident where a process died with SIGSEGV.
> No "swap offset" messages.

Great - Hugh can you forward me a copy of the patch.

Alan
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Marcelo Tosatti @ 2001-08-24 18:40 UTC
To: Alan Cox; +Cc: Jari Ruusu, Hugh Dickins, Rik van Riel, Jeremy Linton, linux-mm

On Fri, 24 Aug 2001, Alan Cox wrote:

> > VM torture results of 2.4.8-ac9 + Hugh's patch (version 23 Aug 2001
> > 21:24:50), 8 hours of torture. 1 incident where a process died with SIGSEGV.
> > No "swap offset" messages.
>
> Great - Hugh can you forward me a copy of the patch.

Wait, I do not feel comfortable with random SIGSEGV messages.

If we are getting those, I suspect there is still some broken thing in
do_swap_page().
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Rik van Riel @ 2001-08-24 20:11 UTC
To: Marcelo Tosatti
Cc: Alan Cox, Jari Ruusu, Hugh Dickins, Jeremy Linton, linux-mm

On Fri, 24 Aug 2001, Marcelo Tosatti wrote:

> I do not feel comfortable with random SIGSEGV messages.

You wuss ;)

> If we are getting those, I suspect there is still some broken
> thing in do_swap_page().

True, but note that even while Hugh's stuff doesn't fix
everything, it sure seems to do away with some bugs.

The code looks fine to me...

regards,

Rik
--
IA64: a worthy successor to the i860.

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Jari Ruusu @ 2001-08-25 13:10 UTC
To: Alan Cox
Cc: Rik van Riel, Marcelo Tosatti, Hugh Dickins, Jeremy Linton, linux-mm

VM torture results of 2.4.8-ac11, 4 hours of torture. 1 incident where a
process died with SIGSEGV.

Got these on serial console a lot earlier than the SIGSEGV happened, so
they are probably unrelated to the SIGSEGV:

> Unused swap offset entry in swap_count 003cba00
> VM: Bad swap entry 003cba00

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Jari Ruusu @ 2001-08-28 15:49 UTC
To: Alan Cox
Cc: Rik van Riel, Marcelo Tosatti, Hugh Dickins, Jeremy Linton, linux-mm

2.4.8-ac12
~~~~~~~~~~
5 hours of VM torture. 1 incident where a process died with SIGSEGV.
Got these on serial console:

> Unused swap offset entry in swap_dup 0007e400
> VM: Bad swap entry 0007e400
> Unused swap offset entry in swap_count 0007e400
> Unused swap offset entry in swap_count 0007e400
> Unused swap offset entry in swap_count 0007e400
> VM: Bad swap entry 0007e400

2.4.9-ac1
~~~~~~~~~
13 hours of VM torture. 2 incidents where a process died with SIGSEGV. No
"swap offset" messages. Both SIGSEGV incidents appeared to happen
simultaneously, suggesting that one erratic behavior caused both.

2.4.9-ac3
~~~~~~~~~
Kernel compiled with -fno-strength-reduce. 3 hours of VM torture. 2
incidents where a process died with SIGSEGV. No "swap offset" messages. Both
SIGSEGV incidents appeared to happen simultaneously, suggesting that one
erratic behavior caused both.

2.4.10-pre1
~~~~~~~~~~~
2 hours of VM torture. 1 incident where a process died with SIGSEGV. No
"swap offset" messages.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Alan Cox @ 2001-08-22 21:14 UTC
To: Rik van Riel; +Cc: Alan Cox, linux-mm, Marcelo Tosatti, Jari Ruusu

> Suspect code would be:
> - tlb optimisations in recent -ac (tasks dying with segfault)

Um the tlb optimisations go back to about 2.4.1-ac 8)

My guess would be the vm changes you and marcelo did
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Rik van Riel @ 2001-08-22 21:28 UTC
To: Alan Cox; +Cc: linux-mm, Marcelo Tosatti, Jari Ruusu

On Wed, 22 Aug 2001, Alan Cox wrote:

> > Suspect code would be:
> > - tlb optimisations in recent -ac (tasks dying with segfault)
>
> Um the tlb optimisations go back to about 2.4.1-ac 8)
> My guess would be the vm changes you and marcelo did

The strange thing is that the recent vm tweaks don't
have any influence on the code paths which could cause
tasks segfaulting ...

Rik
--
IA64: a worthy successor to the i860.

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Alan Cox @ 2001-08-22 21:33 UTC
To: Rik van Riel; +Cc: Alan Cox, linux-mm, Marcelo Tosatti, Jari Ruusu

> > Um the tlb optimisations go back to about 2.4.1-ac 8)
> > My guess would be the vm changes you and marcelo did
>
> The strange thing is that the recent vm tweaks don't
> have any influence on the code paths which could cause
> tasks segfaulting ...

They change reuse and timing patterns. I can believe we may have bugs left
over from before that are now showing up.

Alan
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Marcelo Tosatti @ 2001-08-22 20:03 UTC
To: Alan Cox; +Cc: Rik van Riel, linux-mm, Jari Ruusu

On Wed, 22 Aug 2001, Alan Cox wrote:

> > > Um the tlb optimisations go back to about 2.4.1-ac 8)
> > > My guess would be the vm changes you and marcelo did
> >
> > The strange thing is that the recent vm tweaks don't
> > have any influence on the code paths which could cause
> > tasks segfaulting ...
>
> They change reuse and timing patterns. I can believe we may have bugs left
> over from before that are now showing up.

I'm looking at possible races now...
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Rik van Riel @ 2001-08-22 21:34 UTC
To: Alan Cox; +Cc: linux-mm, Marcelo Tosatti, Jari Ruusu

On Wed, 22 Aug 2001, Alan Cox wrote:

> > The strange thing is that the recent vm tweaks don't
> > have any influence on the code paths which could cause
> > tasks segfaulting ...
>
> They change reuse and timing patterns. I can believe we may have
> bugs left over from before that are now showing up.

The swap code is my usual suspect in this case, since I'm
still not sure how the locking in that part of the VM is
supposed to work. :|

cheers,

Rik
--
IA64: a worthy successor to the i860.

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Eric W. Biederman @ 2001-08-23 6:19 UTC
To: Alan Cox; +Cc: linux-mm

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > Suspect code would be:
> > - tlb optimisations in recent -ac (tasks dying with segfault)
>
> Um the tlb optimisations go back to about 2.4.1-ac 8)
>
> My guess would be the vm changes you and marcelo did

Can I ask which tlb optimisations these are. I have a couple
of reports of dosemu killing the kernel on 2.4.7-ac6 and 2.4.8-ac7 and
similar kernels, on machines with slow processors. It has been
confirmed in dosemu without X and without any direct hardware access.
The kernel seems to oops in random interrupt handlers.

Just off the cuff that feels like a lazy context switching bug. As
dosemu plays with ldt's and lives in the vm86 syscall I can see it
have problems other code paths don't.

It is so weird I have been having a hard time believing the bug
reports.

Eric
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Alan Cox @ 2001-08-23 12:53 UTC
To: Eric W. Biederman; +Cc: Alan Cox, linux-mm

> Can I ask which tlb optimisations these are? I have a couple
> of reports of dosemu killing the kernel on 2.4.7-ac6 and 2.4.8-ac7 and
> similar kernels, on machines with slow processors. It has been

Unrelated. The tlb shootdown fix is ages old and fixes a real bug in
Linus' tree.

There are interactions between the segment reload patch and vm86()
operation, where segment registers happen to be left holding CS/DS values
that make the kernel think it's optimising a kernel->kernel transition when
it's actually seeing old vm86 mode selectors.

Andi Kleen is working on that one.
* Re: VM problem with 2.4.8-ac9 (fwd)
From: Rik van Riel @ 2001-08-23 13:18 UTC
To: Eric W. Biederman; +Cc: Alan Cox, linux-mm

On 23 Aug 2001, Eric W. Biederman wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>
> > > Suspect code would be:
> > > - tlb optimisations in recent -ac (tasks dying with segfault)
>
> Can I ask which tlb optimisations these are?

> It is so weird I have been having a hard time believing the bug
> reports.

We found a new suspect last night. Turns out Linus' locking
overhaul of memory.c results not only in the kernel dropping
locks in critical sections, but also possibly ends up in the
pageout path ...

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/
http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)
end of thread (newest message: 2001-08-28 15:49 UTC)

Thread overview: 21+ messages
2001-08-22 19:25 VM problem with 2.4.8-ac9 (fwd) Rik van Riel
2001-08-22 18:27 ` Marcelo Tosatti
2001-08-23  8:25 ` Jari Ruusu
2001-08-23 17:35 ` Jari Ruusu
2001-08-23 20:24 ` Hugh Dickins
2001-08-23 20:29 ` Alan Cox
2001-08-23 20:37 ` Rik van Riel
2001-08-24 17:23 ` Jari Ruusu
2001-08-24 17:41 ` Alan Cox
2001-08-24 18:40 ` Marcelo Tosatti
2001-08-24 20:11 ` Rik van Riel
2001-08-25 13:10 ` Jari Ruusu
2001-08-28 15:49 ` Jari Ruusu
2001-08-22 21:14 ` Alan Cox
2001-08-22 21:28 ` Rik van Riel
2001-08-22 21:33 ` Alan Cox
2001-08-22 20:03 ` Marcelo Tosatti
2001-08-22 21:34 ` Rik van Riel
2001-08-23  6:19 ` Eric W. Biederman
2001-08-23 12:53 ` Alan Cox
2001-08-23 13:18 ` Rik van Riel