VM problem with 2.4.8-ac9 (fwd)

linux-mm.kvack.org archive mirror

* VM problem with 2.4.8-ac9 (fwd)
@ 2001-08-22 19:25 Rik van Riel
  2001-08-22 18:27 ` Marcelo Tosatti
  2001-08-22 21:14 ` Alan Cox
  0 siblings, 2 replies; 21+ messages in thread
From: Rik van Riel @ 2001-08-22 19:25 UTC
  To: Alan Cox; +Cc: linux-mm, Marcelo Tosatti, Jari Ruusu

Hi Alan,

Another report of tasks dying on recent 2.4 kernels.
Suspect code would be:
- tlb optimisations in recent -ac    (tasks dying with segfault)
- swapfile.c, especially sys_swapoff (known race condition, marcelo?)

What would cause the swap map badness below I wouldn't know,
maybe marcelo is more familiar with the swapfile.c code...

regards,

Rik
--
IA64: a worthy successor to the i860.
---------- Forwarded message ----------
Date: Wed, 22 Aug 2001 20:37:01 +0300
From: Jari Ruusu <jari.ruusu@pp.inet.fi>
To: Rik van Riel <riel@conectiva.com.br>
Subject: VM problem with 2.4.8-ac9

Unused swap offset entry in swap_dup 00519e00
VM: Bad swap entry 00519e00
Unused swap offset entry in swap_count 00519e00
Unused swap offset entry in swap_count 00519e00
VM: Bad swap entry 00519e00
Unused swap offset entry in swap_dup 006b8a00
VM: Bad swap entry 006b8a00
Unused swap offset entry in swap_dup 006b8a00
VM: killing process nscd
Unused swap offset entry in swap_dup 006b8a00
VM: killing process nscd
VM: Bad swap entry 006b8a00
Unused swap offset entry in swap_dup 005e6900
VM: Bad swap entry 005e6900
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Unused swap offset entry in swap_dup 005e6900
VM: killing process init
Kernel panic: Attempted to kill init!

Linux debian 2.4.8-ac9 #1 Wed Aug 22 16:04:25 EEST 2001 i686 unknown
Gnu C                  2.95.3
Gnu make               3.79.1
binutils               2.9.5.0.37
mount                  2.11g
modutils               2.4.6
e2fsprogs              1.18
PPP                    2.3.11
Linux C Library        2.1.3
ldd: version 1.9.11
Procps                 2.0.6
Net-tools              1.54
Console-tools          0.2.3
Sh-utils               2.0

I get a repeatable VM failure with recent 2.4 kernels, tested with
2.4.8-ac[789] on x86 architecture. My VM torture test consists of the following:
boot the kernel with the "mem=16M" parameter, start X11 and a couple of xterms
running a kernel compile, a glibc compile, bzip2 decompressor + tar, and top.
xosview was also running. The working memory need of such a setup is way over
available RAM; swap use was about 20-35 MB (of 190 MB available swap),
and swapping activity was _continuous_. Kernel 2.2.19aa2 survives the
torture (everything else being the same), and memtest-86 does not find any
errors, so it is unlikely to be a hardware failure.

Anyway, the box dies after about 1-3 hours of torture. Sometimes it just
kills some random process. I captured the above info using a serial console. If
you need more info (.config, System.map, whatever) just ask for it. I am
willing to do more testing, just tell me what you need done.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 19:25 VM problem with 2.4.8-ac9 (fwd) Rik van Riel
@ 2001-08-22 18:27 ` Marcelo Tosatti
  2001-08-23  8:25   ` Jari Ruusu
  2001-08-22 21:14 ` Alan Cox
  1 sibling, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2001-08-22 18:27 UTC
  To: Rik van Riel; +Cc: Alan Cox, linux-mm, Jari Ruusu


On Wed, 22 Aug 2001, Rik van Riel wrote:

> Hi Alan,
> 
> Another report of tasks dying on recent 2.4 kernels.
> Suspect code would be:
> - tlb optimisations in recent -ac    (tasks dying with segfault)
> - swapfile.c, especially sys_swapoff (known race condition, marcelo?)

There are known races on swapoff, but Jari is not running swapoff...

> 
> What would cause the swap map badness below I wouldn't know,
> maybe marcelo is more familiar with the swapfile.c code...

Jari, 

1) Are you using an SMP kernel? 
2) Did you try with older kernels or 2.4.9?




* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 18:27 ` Marcelo Tosatti
@ 2001-08-23  8:25   ` Jari Ruusu
  2001-08-23 17:35     ` Jari Ruusu
  0 siblings, 1 reply; 21+ messages in thread
From: Jari Ruusu @ 2001-08-23  8:25 UTC
  To: Marcelo Tosatti; +Cc: Rik van Riel, Alan Cox, linux-mm

Marcelo Tosatti wrote:
> On Wed, 22 Aug 2001, Rik van Riel wrote:
> 
> > Hi Alan,
> >
> > Another report of tasks dying on recent 2.4 kernels.
> > Suspect code would be:
> > - tlb optimisations in recent -ac    (tasks dying with segfault)
> > - swapfile.c, especially sys_swapoff (known race condition, marcelo?)
> 
> There are known races on swapoff, but Jari is not running swapoff...

Correct, not running swapoff.

> >
> > What would cause the swap map badness below I wouldn't know,
> > maybe marcelo is more familiar with the swapfile.c code...
> 
> Jari,
> 
> 1) Are you using an SMP kernel?

No.

> 2) Did you try with older kernels or 2.4.9?

Linus' 2.4.9 survived about 7 hours of VM torture, and then I got ext2
filesystem corruption (just once, dunno if it is repeatable). No "swap
offset" problems with 2.4.9 so far. Haven't tortured older kernels yet.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23  8:25   ` Jari Ruusu
@ 2001-08-23 17:35     ` Jari Ruusu
  2001-08-23 20:24       ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Jari Ruusu @ 2001-08-23 17:35 UTC
  To: Marcelo Tosatti; +Cc: Rik van Riel, Alan Cox, linux-mm

Jari Ruusu wrote:
> Marcelo Tosatti wrote:
> > 2) Did you try with older kernels or 2.4.9?
> 
> Linus' 2.4.9 survived about 7 hours of VM torture, and then I got ext2
> filesystem corruption (just once, dunno if it is repeatable). No "swap
> offset" problems with 2.4.9 so far. Haven't tortured older kernels yet.

Update:

Stock Linus' 2.4.7
~~~~~~~~~~~~~~~~~~
Box didn't die but I stopped VM torture test after this appeared:

Unused swap offset entry in swap_dup 003d6b00
VM: Bad swap entry 003d6b00
Unused swap offset entry in swap_count 003d6b00
VM: Bad swap entry 003d6b00

Stock Linus' 2.4.8
~~~~~~~~~~~~~~~~~~
bzip2 decompress + tar failed (twice, but at different places):
> tar: Skipping to next header
> 
> bzip2: Caught a SIGSEGV or SIGBUS whilst decompressing,
>         which probably indicates that the compressed data
>         is corrupted.
>         Input file = (stdin), output file = (stdout)
> 
> It is possible that the compressed file(s) have become corrupted.
> You can use the -tvv option to test integrity of such files.
> 
> You can use the `bzip2recover' program to *attempt* to recover
> data from undamaged sections of corrupted files.
> 
> tar: 360 garbage bytes ignored at end of archive
> tar: Child returned status 2
> tar: Error exit delayed from previous errors

glibc compile failed:
> make[2]: *** [math/subdir_install] Segmentation fault

Note: the previously mentioned ext2 filesystem corruption (with kernel 2.4.9)
was in a directory hierarchy restored by bzip2 decompress + tar, so the above
bzip2 decompress + tar failure can explain that too. Maybe something went
similarly wrong and bzip2 output garbage to tar instead of terminating
with SIGSEGV.

6 hours of VM torture, 3 incidents where a process died with SIGSEGV. No
"swap offset" problems with 2.4.8 so far.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23 17:35     ` Jari Ruusu
@ 2001-08-23 20:24       ` Hugh Dickins
  2001-08-23 20:29         ` Alan Cox
  2001-08-24 17:23         ` Jari Ruusu
  0 siblings, 2 replies; 21+ messages in thread
From: Hugh Dickins @ 2001-08-23 20:24 UTC
  To: Jari Ruusu
  Cc: Marcelo Tosatti, Rik van Riel, Alan Cox, Jeremy Linton, linux-mm

On Thu, 23 Aug 2001, Jari Ruusu wrote:
> 
> Box didn't die but I stopped VM torture test after this appeared:
> 
> Unused swap offset entry in swap_dup 003d6b00
> VM: Bad swap entry 003d6b00
> Unused swap offset entry in swap_count 003d6b00
> VM: Bad swap entry 003d6b00

Don't stop your test when such messages appear: under heavy swapping
they can appear even when the system is proceeding correctly, not
only when doing swapoff.  (In a separate swapoff patch I've
suppressed them, but won't bother you with that here.)
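
A minimal sketch for reading the hex values in those messages, assuming
the i386 swap entry layout of the 2.4 SWP_TYPE and SWP_OFFSET macros
(swap area type in bits 1-6, page offset from bit 8 up). The layout is an
assumption to check against the tree under test, and the sketch_* names
are hypothetical, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed i386 2.4 layout: bit 0 is the present bit (0 for a swap entry),
 * bits 1-6 hold the swap area type, bits 8 and up hold the page offset.
 */
static inline unsigned int sketch_swp_type(uint32_t val)
{
	return (val >> 1) & 0x3f;
}

static inline uint32_t sketch_swp_offset(uint32_t val)
{
	return val >> 8;
}

/* Inverse of the two helpers above, for building a test value. */
static inline uint32_t sketch_swp_entry(unsigned int type, uint32_t offset)
{
	assert(type < 64 && offset < (1u << 24));
	return ((uint32_t)type << 1) | (offset << 8);
}
```

Under that assumption, 00519e00 from the original report decodes to swap
area 0, page offset 0x519e.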

Alan has intentionally been avoiding many of the VM "fixes" in Linus'
tree, Rik has been feeding him some of the less controversial ones,
but I believe there are important ones missing (unrelated to aging
and tuning etc.).  Looking no further than mm/memory.c, patch below
to bring 2.4.8-ac9 in line with 2.4.9 there:

1. lock_kiovec page unwind fix (velizarb@pirincom.com)
2. copy_cow_page & clear_user_highpage can block in kmap
   (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)
3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

The first has no relevance to your issues, but should be in -ac.
The second is rarely needed, but I wouldn't want to run a torture
test on a highmem machine (okay, yours is far from that!) without it.
The third is probably the fix to the process killing you saw near
the Unused swap and Bad swap messages.  The fourth is a correction
Linus made to Rik's swap freeing, which might also have some bearing.
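
Item 3, sketched in user space: a fault handler that dropped the page
table lock must compare the pte against the value it started from before
it reports failure. This only illustrates the recheck pattern in the
patch, it is not the kernel code; the sketch_* names and the
while_unlocked hook are hypothetical stand-ins, and the lock itself is
reduced to comments:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long sketch_pte_t;	/* stand-in for pte_t */

#define SKETCH_PRESENT	0x1UL

/* Hypothetical hook: whatever runs while the lock is dropped. */
typedef void (*sketch_race_fn)(sketch_pte_t *page_table);

/* Example hook: another fault on the same address installs a present pte. */
static inline void sketch_other_fault(sketch_pte_t *page_table)
{
	*page_table = 0x2000UL | SKETCH_PRESENT;
}

static inline int sketch_pte_same(sketch_pte_t a, sketch_pte_t b)
{
	return a == b;
}

/*
 * Returns 1 if the fault was handled (by us or by somebody else),
 * -1 if the swap-in failed and the pte is still the entry we started with.
 */
static inline int sketch_swap_fault(sketch_pte_t *page_table,
				    sketch_pte_t orig_pte, int page_found,
				    sketch_pte_t new_pte,
				    sketch_race_fn while_unlocked)
{
	assert(page_table != NULL);

	/* page_table_lock would be dropped here for the swap-in ... */
	if (while_unlocked)
		while_unlocked(page_table);
	/* ... and retaken here. */

	if (!sketch_pte_same(*page_table, orig_pte))
		return 1;	/* somebody else faulted it in: back out */
	if (!page_found)
		return -1;	/* genuine failure: entry unchanged, no page */

	*page_table = new_pte;
	return 1;
}
```

Without the comparison, the failed lookup would return -1 even though
another fault had already made the pte valid, and the task would be
killed for nothing.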

(Alan, some doubts in do_wp_page: the additional PageReserved test
may be redundant in view of your earlier ZERO_PAGE test, but I
felt safer to include it; and I was dubious about the additional
set_pte and ptep_get_and_clear in your version, but don't know the
history and let them stay.)

Hugh

--- 2.4.8-ac9/mm/memory.c	Thu Aug 23 12:31:32 2001
+++ linux/mm/memory.c	Thu Aug 23 19:56:55 2001
@@ -611,9 +611,9 @@
 			
 			if (TryLockPage(page)) {
 				while (j--) {
-					page = *(--ppage);
-					if (page)
-						UnlockPage(page);
+					struct page *tmp = *--ppage;
+					if (tmp)
+						UnlockPage(tmp);
 				}
 				goto retry;
 			}
@@ -856,10 +856,9 @@
 /*
  * We hold the mm semaphore for reading and vma->vm_mm->page_table_lock
  */
-static inline void break_cow(struct vm_area_struct * vma, struct page *	old_page, struct page * new_page, unsigned long address, 
+static inline void break_cow(struct vm_area_struct * vma, struct page * new_page, unsigned long address, 
 		pte_t *page_table)
 {
-	copy_cow_page(old_page,new_page,address);
 	flush_page_to_ram(new_page);
 	flush_cache_page(vma, address);
 	establish_pte(vma, address, page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
@@ -923,6 +922,8 @@
 			break;
 		/* FallThrough */
 	case 1:
+		if (PageReserved(old_page))
+			break;
 		flush_cache_page(vma, address);
 		establish_pte(vma, address, page_table, pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
 		return 1;	/* Minor fault */
@@ -932,16 +933,20 @@
 	 * Ok, we need to copy. Oh, well..
 	 */
 copy:	 
- 	set_pte(page_table, pte);
+	set_pte(page_table, pte);
+	page_cache_get(old_page);
 	spin_unlock(&mm->page_table_lock);
+
 	new_page = alloc_page(GFP_HIGHUSER);
-	spin_lock(&mm->page_table_lock);
 	if (!new_page)
-		return -1;
+		goto no_mem;
+	copy_cow_page(old_page,new_page,address);
+	page_cache_release(old_page);
 
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
+	spin_lock(&mm->page_table_lock);
 	if (pte_same(*page_table, pte)) {
 		/* We are changing the pte, so get rid of the old
 		 * one to avoid races with the hardware, this really
@@ -950,7 +955,7 @@
 		pte = ptep_get_and_clear(page_table);
 		if (PageReserved(old_page))
 			++mm->rss;
-		break_cow(vma, old_page, new_page, address, page_table);
+		break_cow(vma, new_page, address, page_table);
 
 		/* Free the old page.. */
 		new_page = old_page;
@@ -961,6 +966,10 @@
 bad_wp_page:
 	printk("do_wp_page: bogus page at address %08lx (page 0x%lx)\n",address,(unsigned long)old_page);
 	return -1;
+no_mem:
+	page_cache_release(old_page);
+	spin_lock(&mm->page_table_lock);
+	return -1;
 }
 
 static void vmtruncate_list(struct vm_area_struct *mpnt, unsigned long pgoff)
@@ -1099,9 +1108,10 @@
  */
 static int do_swap_page(struct mm_struct * mm,
 	struct vm_area_struct * vma, unsigned long address,
-	pte_t * page_table, swp_entry_t entry, int write_access)
+	pte_t * page_table, pte_t orig_pte, int write_access)
 {
 	struct page *page;
+	swp_entry_t entry = pte_to_swp_entry(orig_pte);
 	pte_t pte;
 	int ret = 1;
 
@@ -1114,7 +1124,11 @@
 		unlock_kernel();
 		if (!page) {
 			spin_lock(&mm->page_table_lock);
-			return -1;
+			/*
+			 * Back out if somebody else faulted in this pte while
+			 * we released the page table lock.
+			 */
+			return pte_same(*page_table, orig_pte) ? -1 : 1;
 		}
 
 		/* Had to read the page from swap area: Major fault */
@@ -1133,7 +1147,7 @@
 	 * released the page table lock.
 	 */
 	spin_lock(&mm->page_table_lock);
-	if (pte_present(*page_table)) {
+	if (!pte_same(*page_table, orig_pte)) {
 		UnlockPage(page);
 		page_cache_release(page);
 		return 1;
@@ -1144,21 +1158,13 @@
 	pte = mk_pte(page, vma->vm_page_prot);
 
 	swap_free(entry);
-	if (write_access && exclusive_swap_page(page))
-		pte = pte_mkwrite(pte_mkdirty(pte));
-
-	/*
-	 * If swap space is getting low and we were the last user
-	 * of this piece of swap space, we free this space so
-	 * somebody else can be swapped out.
-	 *
-	 * We are protected against try_to_swap_out() because the
-	 * page is locked and against do_fork() because we have
-	 * read_lock(&mm->mmap_sem).
-	 */
-	if (vm_swap_full() && exclusive_swap_page(page)) {
-		delete_from_swap_cache_nolock(page);
-		pte = pte_mkwrite(pte_mkdirty(pte));
+	if (exclusive_swap_page(page)) {	
+		if (write_access)
+			pte = pte_mkwrite(pte_mkdirty(pte));
+		if (vm_swap_full()) {
+			delete_from_swap_cache_nolock(page);
+			pte = pte_mkdirty(pte);
+		}
 	}
 	UnlockPage(page);
 
@@ -1189,16 +1195,18 @@
 
 		/* Allocate our own private page. */
 		spin_unlock(&mm->page_table_lock);
+
 		page = alloc_page(GFP_HIGHUSER);
-		spin_lock(&mm->page_table_lock);
 		if (!page)
-			return -1;
+			goto no_mem;
+		clear_user_highpage(page, addr);
+
+		spin_lock(&mm->page_table_lock);
 		if (!pte_none(*page_table)) {
 			page_cache_release(page);
 			return 1;
 		}
 		mm->rss++;
-		clear_user_highpage(page, addr);
 		flush_page_to_ram(page);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 	}
@@ -1208,6 +1216,10 @@
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, addr, entry);
 	return 1;	/* Minor fault */
+
+no_mem:
+	spin_lock(&mm->page_table_lock);
+	return -1;
 }
 
 /*
@@ -1327,7 +1339,7 @@
 		 */
 		if (pte_none(entry))
 			return do_no_page(mm, vma, address, write_access, pte);
-		return do_swap_page(mm, vma, address, pte, pte_to_swp_entry(entry), write_access);
+		return do_swap_page(mm, vma, address, pte, entry, write_access);
 	}
 
 	entry = ptep_get_and_clear(pte);
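
The exclusive_swap_page hunk of do_swap_page above can be summarised as a
decision table. A sketch that models only which pte bits each version
sets, with hypothetical SKETCH_* flags standing in for pte_mkwrite,
pte_mkdirty and delete_from_swap_cache_nolock:

```c
#include <assert.h>

#define SKETCH_WRITE	0x1u	/* pte made writable */
#define SKETCH_DIRTY	0x2u	/* pte marked dirty */
#define SKETCH_UNCACHE	0x4u	/* page deleted from the swap cache */

/* Behaviour of the lines the patch removes. */
static inline unsigned int sketch_old_flags(int write_access, int exclusive,
					    int swap_full)
{
	unsigned int flags = 0;

	if (write_access && exclusive)
		flags |= SKETCH_WRITE | SKETCH_DIRTY;
	if (swap_full && exclusive)
		flags |= SKETCH_UNCACHE | SKETCH_WRITE | SKETCH_DIRTY;
	return flags;
}

/* Behaviour of the lines the patch adds. */
static inline unsigned int sketch_new_flags(int write_access, int exclusive,
					    int swap_full)
{
	unsigned int flags = 0;

	if (exclusive) {
		if (write_access)
			flags |= SKETCH_WRITE | SKETCH_DIRTY;
		if (swap_full)
			flags |= SKETCH_UNCACHE | SKETCH_DIRTY;
	}
	return flags;
}

/* Self-check: a write fault is unchanged, a read fault with swap full is not. */
static inline void sketch_self_check(void)
{
	assert(sketch_old_flags(1, 1, 1) == sketch_new_flags(1, 1, 1));
	assert(sketch_old_flags(0, 1, 1) != sketch_new_flags(0, 1, 1));
}
```

The two versions differ only for a read fault on an exclusive page when
swap is full: the old code made the pte writable, the new code marks it
dirty but leaves it read-only.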


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23 20:24       ` Hugh Dickins
@ 2001-08-23 20:29         ` Alan Cox
  2001-08-23 20:37           ` Rik van Riel
  2001-08-24 17:23         ` Jari Ruusu
  1 sibling, 1 reply; 21+ messages in thread
From: Alan Cox @ 2001-08-23 20:29 UTC
  To: Hugh Dickins
  Cc: Jari Ruusu, Marcelo Tosatti, Rik van Riel, Alan Cox,
	Jeremy Linton, linux-mm

> Alan has intentionally been avoiding many of the VM "fixes" in Linus'
> tree, Rik has been feeding him some of the less controversial ones,
> but I believe there are important ones missing (unrelated to aging
> and tuning etc.).  Looking no further than mm/memory.c, patch below
> to bring 2.4.8-ac9 in line with 2.4.9 there:
> 
> 1. lock_kiovec page unwind fix (velizarb@pirincom.com)
> 2. copy_cow_page & clear_user_highpage can block in kmap
>    (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)
> 3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
> 4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

I've been avoiding the Linus paging ones because they seem to make my
machines all crash repeatedly under any kind of serious test load. Plus the
fact that after about 3 days they needed rebooting to get back from 386 
speed.

Rik traced down some more vm races so hopefully ac11 will have some more 
progress on this one.

Alan


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23 20:29         ` Alan Cox
@ 2001-08-23 20:37           ` Rik van Riel
  0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2001-08-23 20:37 UTC
  To: Alan Cox
  Cc: Hugh Dickins, Jari Ruusu, Marcelo Tosatti, Jeremy Linton, linux-mm

On Thu, 23 Aug 2001, Alan Cox wrote:

> > 1. lock_kiovec page unwind fix (velizarb@pirincom.com)
> > 2. copy_cow_page & clear_user_highpage can block in kmap
> >    (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)

I don't know enough about this code to properly fix it...

> > 3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
> > 4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

I'll look at these. I'll carefully merge some of Linus'
stuff here. Careful because Linus seems to be getting
various other things in eg. memory.c wrong and stripped
off the comments of some pieces of code ;)

cheers,

Rik
--
IA64: a worthy successor to the i860.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23 20:24       ` Hugh Dickins
  2001-08-23 20:29         ` Alan Cox
@ 2001-08-24 17:23         ` Jari Ruusu
  2001-08-24 17:41           ` Alan Cox
  1 sibling, 1 reply; 21+ messages in thread
From: Jari Ruusu @ 2001-08-24 17:23 UTC
  To: Hugh Dickins
  Cc: Marcelo Tosatti, Rik van Riel, Alan Cox, Jeremy Linton, linux-mm

Hugh Dickins wrote:
> 1. lock_kiovec page unwind fix (velizarb@pirincom.com)
> 2. copy_cow_page & clear_user_highpage can block in kmap
>    (Anton Blanchard, Ingo Molnar, Linus Torvalds, Hugh Dickins)
> 3. do_swap_page recheck pte before failing (Jeremy Linton, Linus Torvalds)
> 4. do_swap_page don't mkwrite when deleting from swap cache (Linus Torvalds)

VM torture results of 2.4.8-ac9 + Hugh's patch (version 23 Aug 2001
21:24:50), 8 hours of torture. 1 incident where a process died with SIGSEGV.
No "swap offset" messages.

glibc compile failed:
make[2]: *** [math/subdir_lib] Segmentation fault

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-24 17:23         ` Jari Ruusu
@ 2001-08-24 17:41           ` Alan Cox
  2001-08-24 18:40             ` Marcelo Tosatti
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Cox @ 2001-08-24 17:41 UTC
  To: Jari Ruusu
  Cc: Hugh Dickins, Marcelo Tosatti, Rik van Riel, Alan Cox,
	Jeremy Linton, linux-mm

> VM torture results of 2.4.8-ac9 + Hugh's patch (version 23 Aug 2001
> 21:24:50), 8 hours of torture. 1 incident where a process died with SIGSEGV.
> No "swap offset" messages.

Great - Hugh can you forward me a copy of the patch.

Alan

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-24 17:41           ` Alan Cox
@ 2001-08-24 18:40             ` Marcelo Tosatti
  2001-08-24 20:11               ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2001-08-24 18:40 UTC
  To: Alan Cox; +Cc: Jari Ruusu, Hugh Dickins, Rik van Riel, Jeremy Linton, linux-mm


On Fri, 24 Aug 2001, Alan Cox wrote:

> > VM torture results of 2.4.8-ac9 + Hugh's patch (version 23 Aug 2001
> > 21:24:50), 8 hours of torture. 1 incident where a process died with SIGSEGV.
> > No "swap offset" messages.
> 
> Great - Hugh can you forward me a copy of the patch.

Wait, 

I do not feel comfortable with random SIGSEGV messages.

If we are getting those, I suspect there is still some broken thing in
do_swap_page().


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-24 18:40             ` Marcelo Tosatti
@ 2001-08-24 20:11               ` Rik van Riel
  2001-08-25 13:10                 ` Jari Ruusu
  0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2001-08-24 20:11 UTC
  To: Marcelo Tosatti
  Cc: Alan Cox, Jari Ruusu, Hugh Dickins, Jeremy Linton, linux-mm

On Fri, 24 Aug 2001, Marcelo Tosatti wrote:

> I do not feel comfortable with random SIGSEGV messages.

You wuss ;)

> If we are getting those, I suspect there is still some broken
> thing in do_swap_page().

True, but note that even while Hugh's stuff doesn't fix
everything, it sure seems to do away with some bugs.
The code looks fine to me...

regards,

Rik
--
IA64: a worthy successor to the i860.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-24 20:11               ` Rik van Riel
@ 2001-08-25 13:10                 ` Jari Ruusu
  2001-08-28 15:49                   ` Jari Ruusu
  0 siblings, 1 reply; 21+ messages in thread
From: Jari Ruusu @ 2001-08-25 13:10 UTC
  To: Alan Cox
  Cc: Rik van Riel, Marcelo Tosatti, Hugh Dickins, Jeremy Linton, linux-mm

VM torture results of 2.4.8-ac11, 4 hours of torture. 1 incident where a
process died with SIGSEGV.

Got these on the serial console a lot earlier than the SIGSEGV happened, so they
are probably unrelated to the SIGSEGV:
> Unused swap offset entry in swap_count 003cba00
> VM: Bad swap entry 003cba00

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-25 13:10                 ` Jari Ruusu
@ 2001-08-28 15:49                   ` Jari Ruusu
  0 siblings, 0 replies; 21+ messages in thread
From: Jari Ruusu @ 2001-08-28 15:49 UTC
  To: Alan Cox
  Cc: Rik van Riel, Marcelo Tosatti, Hugh Dickins, Jeremy Linton, linux-mm

2.4.8-ac12
~~~~~~~~~~
5 hours of VM torture. 1 incident where a process died with SIGSEGV.

Got these on serial console:
> Unused swap offset entry in swap_dup 0007e400
> VM: Bad swap entry 0007e400
> Unused swap offset entry in swap_count 0007e400
> Unused swap offset entry in swap_count 0007e400
> Unused swap offset entry in swap_count 0007e400
> VM: Bad swap entry 0007e400

2.4.9-ac1
~~~~~~~~~
13 hours of VM torture. 2 incidents where a process died with SIGSEGV. No
"swap offset" messages. Both SIGSEGV incidents appeared to happen
simultaneously, suggesting that one erratic behavior caused both.

2.4.9-ac3
~~~~~~~~~
Kernel compiled with -fno-strength-reduce. 3 hours of VM torture. 2
incidents where a process died with SIGSEGV. No "swap offset" messages. Both
SIGSEGV incidents appeared to happen simultaneously, suggesting that one
erratic behavior caused both.

2.4.10-pre1
~~~~~~~~~~~
2 hours of VM torture. 1 incident where a process died with SIGSEGV. No
"swap offset" messages.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 19:25 VM problem with 2.4.8-ac9 (fwd) Rik van Riel
  2001-08-22 18:27 ` Marcelo Tosatti
@ 2001-08-22 21:14 ` Alan Cox
  2001-08-22 21:28   ` Rik van Riel
  2001-08-23  6:19   ` Eric W. Biederman
  1 sibling, 2 replies; 21+ messages in thread
From: Alan Cox @ 2001-08-22 21:14 UTC
  To: Rik van Riel; +Cc: Alan Cox, linux-mm, Marcelo Tosatti, Jari Ruusu

> Suspect code would be:
> - tlb optimisations in recent -ac    (tasks dying with segfault)

Um the tlb optimisations go back to about 2.4.1-ac 8)

My guess would be the vm changes you and marcelo did

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 21:14 ` Alan Cox
@ 2001-08-22 21:28   ` Rik van Riel
  2001-08-22 21:33     ` Alan Cox
  2001-08-23  6:19   ` Eric W. Biederman
  1 sibling, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2001-08-22 21:28 UTC
  To: Alan Cox; +Cc: linux-mm, Marcelo Tosatti, Jari Ruusu

On Wed, 22 Aug 2001, Alan Cox wrote:

> > Suspect code would be:
> > - tlb optimisations in recent -ac    (tasks dying with segfault)
>
> Um the tlb optimisations go back to about 2.4.1-ac 8)
> My guess would be the vm changes you and marcelo did

The strange thing is that the recent vm tweaks don't
have any influence on the code paths which could cause
tasks segfaulting ...

Rik
--
IA64: a worthy successor to the i860.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 21:28   ` Rik van Riel
@ 2001-08-22 21:33     ` Alan Cox
  2001-08-22 20:03       ` Marcelo Tosatti
  2001-08-22 21:34       ` Rik van Riel
  0 siblings, 2 replies; 21+ messages in thread
From: Alan Cox @ 2001-08-22 21:33 UTC
  To: Rik van Riel; +Cc: Alan Cox, linux-mm, Marcelo Tosatti, Jari Ruusu

> > Um the tlb optimisations go back to about 2.4.1-ac 8)
> > My guess would be the vm changes you and marcelo did
> 
> The strange thing is that the recent vm tweaks don't
> have any influence on the code paths which could cause
> tasks segfaulting ...

They change reuse and timing patterns. I can believe we may have bugs left
over from before that are now showing up. 

Alan

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 21:33     ` Alan Cox
@ 2001-08-22 20:03       ` Marcelo Tosatti
  2001-08-22 21:34       ` Rik van Riel
  1 sibling, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2001-08-22 20:03 UTC
  To: Alan Cox; +Cc: Rik van Riel, linux-mm, Jari Ruusu


On Wed, 22 Aug 2001, Alan Cox wrote:

> > > Um the tlb optimisations go back to about 2.4.1-ac 8)
> > > My guess would be the vm changes you and marcelo did
> > 
> > The strange thing is that the recent vm tweaks don't
> > have any influence on the code paths which could cause
> > tasks segfaulting ...
> 
> They change reuse and timing patterns. I can believe we may have bugs left
> over from before that are now showing up. 

I'm looking at possible races now...


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 21:33     ` Alan Cox
  2001-08-22 20:03       ` Marcelo Tosatti
@ 2001-08-22 21:34       ` Rik van Riel
  1 sibling, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2001-08-22 21:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-mm, Marcelo Tosatti, Jari Ruusu

On Wed, 22 Aug 2001, Alan Cox wrote:

> > The strange thing is that the recent vm tweaks don't
> > have any influence on the code paths which could cause
> > tasks segfaulting ...
>
> They change reuse and timing patterns. I can believe we may have
> bugs left over from before that are now showing up.

The swap code is my usual suspect in this case, since
I'm still not sure how the locking in that part of the
VM is supposed to work. :|

cheers,

Rik
--
IA64: a worthy successor to the i860.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-22 21:14 ` Alan Cox
  2001-08-22 21:28   ` Rik van Riel
@ 2001-08-23  6:19   ` Eric W. Biederman
  2001-08-23 12:53     ` Alan Cox
  2001-08-23 13:18     ` Rik van Riel
  1 sibling, 2 replies; 21+ messages in thread
From: Eric W. Biederman @ 2001-08-23  6:19 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-mm

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > Suspect code would be:
> > - tlb optimisations in recent -ac    (tasks dying with segfault)
> 
> Um the tlb optimisations go back to about 2.4.1-ac 8)
> 
> My guess would be the vm changes you and marcelo did

Can I ask which tlb optimisations these are?  I have a couple
of reports of dosemu killing the kernel on 2.4.7-ac6 and 2.4.8-ac7 and
similar kernels, on machines with slow processors.  It has been
confirmed in dosemu without X and without any direct hardware
access. The kernel seems to oops in random interrupt handlers.  Just
off the cuff that feels like a lazy context switching bug.  As dosemu
plays with ldt's and lives in the vm86 syscall I can see it having
problems other code paths don't.

It is so weird I have been having a hard time believing the bug
reports.

Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23  6:19   ` Eric W. Biederman
@ 2001-08-23 12:53     ` Alan Cox
  2001-08-23 13:18     ` Rik van Riel
  1 sibling, 0 replies; 21+ messages in thread
From: Alan Cox @ 2001-08-23 12:53 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alan Cox, linux-mm

> Can I ask which tlb optimisations these are?  I have a couple
> of reports of dosemu killing the kernel on 2.4.7-ac6 and 2.4.8-ac7 and
> similar kernels, on machines with slow processors.  It has been

Unrelated. The tlb shootdown fix is ages old and fixes a real bug in Linus'
tree.

There are interactions between the segment reload patch and vm86() operation
where segment registers happen to be left holding CS/DS values that make
the kernel think it's optimising a kernel->kernel transition when it's seeing
old vm86 mode selectors.

Andi Kleen is working on that one
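
One way to picture the selector confusion described above is a small
user-space model. This is only a sketch: the function names are
hypothetical, the selector values are assumed to be the usual i386 ones,
and the real segment reload patch may well test something different.
The point is that a check based on selector values alone cannot tell a
kernel frame from a vm86 frame whose real-mode segment values happen to
match.

```python
# Illustrative model only; none of these names come from the kernel source.
KERNEL_CS = 0x10        # assumed i386 kernel code selector
KERNEL_DS = 0x18        # assumed i386 kernel data selector
EFLAGS_VM = 0x00020000  # EFLAGS bit 17, set while the CPU runs in vm86 mode


def naive_is_kernel_frame(cs, ds, eflags):
    """Decide 'kernel->kernel' from the saved selector values alone."""
    return cs == KERNEL_CS and ds == KERNEL_DS


def checked_is_kernel_frame(cs, ds, eflags):
    """Same test, but refuse the shortcut for frames saved in vm86 mode."""
    if eflags & EFLAGS_VM:
        # In vm86 mode CS/DS hold real-mode segment values, not selectors,
        # so a numeric match with the kernel selectors is a coincidence.
        return False
    return cs == KERNEL_CS and ds == KERNEL_DS


# A vm86 task whose real-mode segments happen to equal the kernel selectors.
vm86_frame = (0x0010, 0x0018, EFLAGS_VM)
# A genuine kernel frame.
kernel_frame = (KERNEL_CS, KERNEL_DS, 0)

print(naive_is_kernel_frame(*vm86_frame))     # fooled by the coincidence
print(checked_is_kernel_frame(*vm86_frame))   # rejects the vm86 frame
print(checked_is_kernel_frame(*kernel_frame)) # still accepts a kernel frame
```

In this model the naive test treats the vm86 frame as a kernel frame and
would skip the reload, while the version that also looks at the VM flag
does not.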

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: VM problem with 2.4.8-ac9 (fwd)
  2001-08-23  6:19   ` Eric W. Biederman
  2001-08-23 12:53     ` Alan Cox
@ 2001-08-23 13:18     ` Rik van Riel
  1 sibling, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2001-08-23 13:18 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alan Cox, linux-mm

On 23 Aug 2001, Eric W. Biederman wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>
> > > Suspect code would be:
> > > - tlb optimisations in recent -ac    (tasks dying with segfault)
>
> Can I ask which tlb optimisations these are?

> It is so weird I have been having a hard time believing the bug
> reports.

We found a new suspect last night.  Turns out Linus'
locking overhaul of memory.c not only results in the
kernel dropping locks in critical sections, but possibly
also ends up in the pageout path ...
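
As a generic illustration of why dropping a lock inside a critical
section is dangerous, here is a minimal single-threaded model. Everything
in it is hypothetical (the Entry class, the function names, the explicit
"interloper" callback standing in for another CPU); it is not a
reconstruction of the memory.c code, only a sketch of the check-then-act
pattern going stale once the lock is released in between.

```python
import threading


class Entry:
    """Hypothetical reference-counted object protected by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.refcount = 1


def release(entry):
    """Drop the last reference; the entry is now unused."""
    with entry.lock:
        entry.refcount = 0


def dup_with_lock_drop(entry, interloper):
    """Broken: the check and the update are split by a lock drop."""
    with entry.lock:
        in_use = entry.refcount > 0
    # Lock is not held here, so another path may run and change the entry.
    interloper(entry)
    with entry.lock:
        if in_use:  # stale: decided before the lock was dropped
            entry.refcount += 1
    return in_use


def dup_holding_lock(entry):
    """Safe: check and update happen under one lock hold."""
    with entry.lock:
        if entry.refcount > 0:
            entry.refcount += 1
            return True
        return False


broken = Entry()
broken_result = dup_with_lock_drop(broken, release)

safe = Entry()
release(safe)
safe_result = dup_holding_lock(safe)

print(broken_result, broken.refcount)
print(safe_result, safe.refcount)
```

In the broken variant the released entry ends up with a non-zero count
again, because the decision to take a reference was made before the
entry was released. In the safe variant the attempt is refused and the
count stays at zero.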

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2001-08-28 15:49 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-08-22 19:25 VM problem with 2.4.8-ac9 (fwd) Rik van Riel
2001-08-22 18:27 ` Marcelo Tosatti
2001-08-23  8:25   ` Jari Ruusu
2001-08-23 17:35     ` Jari Ruusu
2001-08-23 20:24       ` Hugh Dickins
2001-08-23 20:29         ` Alan Cox
2001-08-23 20:37           ` Rik van Riel
2001-08-24 17:23         ` Jari Ruusu
2001-08-24 17:41           ` Alan Cox
2001-08-24 18:40             ` Marcelo Tosatti
2001-08-24 20:11               ` Rik van Riel
2001-08-25 13:10                 ` Jari Ruusu
2001-08-28 15:49                   ` Jari Ruusu
2001-08-22 21:14 ` Alan Cox
2001-08-22 21:28   ` Rik van Riel
2001-08-22 21:33     ` Alan Cox
2001-08-22 20:03       ` Marcelo Tosatti
2001-08-22 21:34       ` Rik van Riel
2001-08-23  6:19   ` Eric W. Biederman
2001-08-23 12:53     ` Alan Cox
2001-08-23 13:18     ` Rik van Riel
