* 2.6.18: Kernel BUG at mm/rmap.c:522
@ 2006-10-04 10:40 Andre Noll
2006-10-04 13:59 ` Nick Piggin
0 siblings, 1 reply; 7+ messages in thread
From: Andre Noll @ 2006-10-04 10:40 UTC (permalink / raw)
To: linux-mm, linux-kernel, andrea, nickpiggin, riel
[-- Attachment #1: Type: text/plain, Size: 11263 bytes --]
Hi
MATLAB triggers the following bug on both of our new 16-way opteron
machines (64G Ram): The same kernel is running with no problems on a
bunch of smaller (8-way, 4-way, max 32G Ram) cluster nodes.
Any hints?
Andre
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
invalid opcode: 0000 [1] SMP
CPU 14
Pid: 12948, comm: MATLAB Not tainted 2.6.18-tt64-6 #1
RIP: 0010:[<ffffffff8015ee54>] [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP: 0018:ffff810207a19d70 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff81101aa1dd90 RCX: 000000000000001c
RDX: 00002aaad63b2000 RSI: 00002aaad63b2000 RDI: ffff81102d129ce8
RBP: 0000000f59e3b067 R08: 0000000000000023 R09: ffff810e30000680
R10: ffff8106079df408 R11: ffff8105970284a8 R12: ffff81102d129ce8
R13: ffff810e3005e160 R14: 00002aaad63b2000 R15: 0000000000000000
FS: 00002b3f6aa704a0(0000) GS:ffff810e301b9440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaad8592000 CR3: 00000005bdd17000 CR4: 00000000000006a0
Process MATLAB (pid: 12948, threadinfo ffff810207a18000, task ffff8101ea581080)
Stack: ffffffff80157a37 ffff810e30000680 00002aaad63b3000 ffff81101aa1dd98
ffff81102fb53668 ffffffffffffffb8 ffff8106079df400 ffff810207a19e98
00002aaad6400000 ffff810feecdc088 00002aaad6400000 ffff810b24d11588
Call Trace:
[<ffffffff80157a37>] zap_pte_range+0x1c4/0x2c0
[<ffffffff80157d0e>] unmap_page_range+0x1db/0x23a
[<ffffffff80157e5b>] unmap_vmas+0xee/0x1e3
[<ffffffff8015c6fe>] unmap_region+0xb4/0x127
[<ffffffff8015caa7>] do_munmap+0x183/0x19a
[<ffffffff8015caf7>] sys_munmap+0x39/0x52
[<ffffffff80109726>] system_call+0x7e/0x83
Code: 0f 0b 68 c0 4e 46 80 c2 0a 02 31 f6 f6 47 18 01 40 0f 94 c6
RIP [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP <ffff810207a19d70>
<0>Bad page state in process 'MATLAB'
page:ffff81102d129ce8 flags:0x1e00000000000014 mapping:0000000000000000 mapcount:-1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
[<ffffffff8014f325>] bad_page+0x51/0x7b
[<ffffffff8014f788>] prep_new_page+0x57/0x15f
[<ffffffff8014feb1>] buffered_rmqueue+0x128/0x14a
[<ffffffff8015002c>] get_page_from_freelist+0xbd/0xe2
[<ffffffff801500a3>] __alloc_pages+0x52/0x29f
[<ffffffff80159a26>] do_anonymous_page+0x46/0x1b8
[<ffffffff8015a023>] __handle_mm_fault+0x18f/0x29d
[<ffffffff8011b359>] do_page_fault+0x1bd/0x4e7
[<ffffffff8015bf42>] do_mmap_pgoff+0x5fd/0x6de
[<ffffffff8022094b>] __up_write+0x14/0x108
[<ffffffff8010a3f9>] error_exit+0x0/0x84
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
invalid opcode: 0000 [2] SMP
CPU 14
Pid: 12079, comm: MATLAB Tainted: G B 2.6.18-tt64-6 #1
RIP: 0010:[<ffffffff8015ee54>] [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP: 0018:ffff810a0916fd70 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff810e7c68d300 RCX: 000000000000001c
RDX: ffff810e30000000 RSI: 00002aab5ea60000 RDI: ffff81102bb25c10
RBP: 0000000ef53ee067 R08: 0000000000000023 R09: ffff810e30000680
R10: ffff810c01b69c88 R11: ffff810bce5484a8 R12: ffff81102bb25c10
R13: ffff810e3005e160 R14: 00002aab5ea60000 R15: 0000000000000000
FS: 00002b1f8100d4a0(0000) GS:ffff810e301b9440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aab31345000 CR3: 0000000c1bebd000 CR4: 00000000000006a0
Process MATLAB (pid: 12079, threadinfo ffff810a0916e000, task ffff810a092d2180)
Stack: ffffffff80157a37 ffff810e30000680 00002aab5ea61000 ffff810e7c68d308
ffff81102a0b6ee8 00000000ffffff9f ffff810c01b69c80 ffff810a0916fe98
00002aab5ec00000 ffff810fd4924298 00002aab5ec00000 ffff8100812777a8
Call Trace:
[<ffffffff80157a37>] zap_pte_range+0x1c4/0x2c0
[<ffffffff80157d0e>] unmap_page_range+0x1db/0x23a
[<ffffffff80157e5b>] unmap_vmas+0xee/0x1e3
[<ffffffff8015c6fe>] unmap_region+0xb4/0x127
[<ffffffff8015caa7>] do_munmap+0x183/0x19a
[<ffffffff8015caf7>] sys_munmap+0x39/0x52
[<ffffffff80109726>] system_call+0x7e/0x83
Code: 0f 0b 68 c0 4e 46 80 c2 0a 02 31 f6 f6 47 18 01 40 0f 94 c6
RIP [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP <ffff810a0916fd70>
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
invalid opcode: 0000 [3] SMP
CPU 15
Pid: 20344, comm: MATLAB Tainted: G B 2.6.18-tt64-6 #1
RIP: 0010:[<ffffffff8015ee54>] [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP: 0018:ffff8101113b7d70 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff810e87e8d660 RCX: 000000000000001c
RDX: ffff810e30000000 RSI: 00002aaaf6acc000 RDI: ffff81102ef01410
RBP: 0000000fe24ee067 R08: 0000000000000023 R09: ffff810e30000680
R10: ffff81010a2d5248 R11: ffff810162b6a608 R12: ffff81102ef01410
R13: ffff810e300639e0 R14: 00002aaaf6acc000 R15: 0000000000000000
FS: 00002b88026364a0(0000) GS:ffff810e301f4440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aab0284c0d0 CR3: 00000001038e3000 CR4: 00000000000006a0
Process MATLAB (pid: 20344, threadinfo ffff8101113b6000, task ffff81018fbd18a0)
Stack: ffffffff80157a37 ffff810e30000680 00002aaaf6acd000 ffff810e87e8d668
ffff81102a33aee8 00000000ffffff3f ffff81010a2d5240 ffff8101113b7e98
00002aaaf6c00000 ffff810ff0aa7818 00002aaaf6c00000 ffff810eaffd3da8
Call Trace:
[<ffffffff80157a37>] zap_pte_range+0x1c4/0x2c0
[<ffffffff80157d0e>] unmap_page_range+0x1db/0x23a
[<ffffffff80157e5b>] unmap_vmas+0xee/0x1e3
[<ffffffff8015c6fe>] unmap_region+0xb4/0x127
[<ffffffff8015caa7>] do_munmap+0x183/0x19a
[<ffffffff8015caf7>] sys_munmap+0x39/0x52
[<ffffffff80109726>] system_call+0x7e/0x83
Code: 0f 0b 68 c0 4e 46 80 c2 0a 02 31 f6 f6 47 18 01 40 0f 94 c6
RIP [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP <ffff8101113b7d70>
<3>swap_free: Unused swap offset entry 00000060
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
invalid opcode: 0000 [4] SMP
CPU 14
Pid: 5985, comm: MATLAB Tainted: G B 2.6.18-tt64-6 #1
RIP: 0010:[<ffffffff8015ee54>] [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP: 0018:ffff810204875d70 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff810e87e852b0 RCX: ffff8101ef34d970
RDX: 00002aaae1856000 RSI: ffff8102280524d0 RDI: ffff81102d127358
RBP: 0000000f59d7d067 R08: 000000000000001e R09: 0000000000000000
R10: 0000000000000206 R11: 00000000fffffffa R12: ffff81102d127358
R13: ffff810e3005e160 R14: 00002aaae1856000 R15: 0000000000000000
FS: 00002ada130564a0(0000) GS:ffff810e301b9440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaae418a000 CR3: 0000000c0bdad000 CR4: 00000000000006a0
Process MATLAB (pid: 5985, threadinfo ffff810204874000, task ffff8101110f3140)
Stack: ffffffff80157a37 ffff810e30000680 00002aaae1857000 ffff810e87e852b8
ffff81102a33ad28 ffffffffffffffaa ffff810a30128a40 ffff810204875e98
00002aaae1a00000 ffff810fffed14a8 00002aaae1a00000 ffff810be0f79860
Call Trace:
[<ffffffff80157a37>] zap_pte_range+0x1c4/0x2c0
[<ffffffff80157d0e>] unmap_page_range+0x1db/0x23a
[<ffffffff80157e5b>] unmap_vmas+0xee/0x1e3
[<ffffffff8015c6fe>] unmap_region+0xb4/0x127
[<ffffffff8015caa7>] do_munmap+0x183/0x19a
[<ffffffff8015caf7>] sys_munmap+0x39/0x52
[<ffffffff80109726>] system_call+0x7e/0x83
Code: 0f 0b 68 c0 4e 46 80 c2 0a 02 31 f6 f6 47 18 01 40 0f 94 c6
RIP [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP <ffff810204875d70>
<3>swap_free: Unused swap offset entry 00000060
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
invalid opcode: 0000 [5] SMP
CPU 14
Pid: 19074, comm: MATLAB Tainted: G B 2.6.18-tt64-6 #1
RIP: 0010:[<ffffffff8015ee54>] [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP: 0018:ffff810021587d70 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff810e8ca8f6a0 RCX: 000000000000001c
RDX: ffff810e30000000 RSI: 00002aaaf7cd4000 RDI: ffff81102a447b08
RBP: 0000000e8cb57067 R08: 00000000fffffffe R09: ffff810e30000680
R10: ffff810230080708 R11: ffff810389b47b88 R12: ffff81102a447b08
R13: ffff810e3005e160 R14: 00002aaaf7cd4000 R15: 0000000000000000
FS: 00002b1e7eb3b4a0(0000) GS:ffff810e301b9440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aab08206870 CR3: 00000002b40e4000 CR4: 00000000000006a0
Process MATLAB (pid: 19074, threadinfo ffff810021586000, task ffff81013b640040)
Stack: ffffffff80157a37 ffff810e30000680 00002aaaf7cd5000 ffff810e8ca8f6a8
ffff81102a444f58 00000000ffffff38 ffff810230080700 ffff810021587e98
00002aaaf7e00000 ffff810fecefdb88 00002aaaf7e00000 ffff810092d8fdf0
Call Trace:
[<ffffffff80157a37>] zap_pte_range+0x1c4/0x2c0
[<ffffffff80157d0e>] unmap_page_range+0x1db/0x23a
[<ffffffff80157e5b>] unmap_vmas+0xee/0x1e3
[<ffffffff8015c6fe>] unmap_region+0xb4/0x127
[<ffffffff8015caa7>] do_munmap+0x183/0x19a
[<ffffffff8015caf7>] sys_munmap+0x39/0x52
[<ffffffff80109726>] system_call+0x7e/0x83
Code: 0f 0b 68 c0 4e 46 80 c2 0a 02 31 f6 f6 47 18 01 40 0f 94 c6
RIP [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP <ffff810021587d70>
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
invalid opcode: 0000 [6] SMP
CPU 14
Pid: 20028, comm: MATLAB Tainted: G B 2.6.18-tt64-6 #1
RIP: 0010:[<ffffffff8015ee54>] [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP: 0018:ffff810dad0e7d70 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff810e7ea8d0e0 RCX: 000000000000001c
RDX: ffff810e30000000 RSI: 00002aaaad21c000 RDI: ffff81102cf5e918
RBP: 0000000f51b05067 R08: 00002aaaad400000 R09: ffff810e30000680
R10: ffff810428d0f888 R11: ffff810fcb122ad8 R12: ffff81102cf5e918
R13: ffff810e3005e160 R14: 00002aaaad21c000 R15: 0000000000000000
FS: 00002adbe435e4a0(0000) GS:ffff810e301b9440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002aaab12b8000 CR3: 0000000394dab000 CR4: 00000000000006a0
Process MATLAB (pid: 20028, threadinfo ffff810dad0e6000, task ffff810dbc128140)
Stack: ffffffff80157a37 ffff810e301b9340 00002aaaad21d000 ffff810e7ea8d0e8
ffff81102a134ee8 00000000fffffff2 ffff810428d0f880 ffff810dad0e7e98
00002aaaad400000 ffff810fe56a0d98 00002aaaad400000 ffff810175c0eb48
Call Trace:
[<ffffffff80157a37>] zap_pte_range+0x1c4/0x2c0
[<ffffffff80157d0e>] unmap_page_range+0x1db/0x23a
[<ffffffff80157e5b>] unmap_vmas+0xee/0x1e3
[<ffffffff8015c6fe>] unmap_region+0xb4/0x127
[<ffffffff8015caa7>] do_munmap+0x183/0x19a
[<ffffffff8015caf7>] sys_munmap+0x39/0x52
[<ffffffff80109726>] system_call+0x7e/0x83
Code: 0f 0b 68 c0 4e 46 80 c2 0a 02 31 f6 f6 47 18 01 40 0f 94 c6
RIP [<ffffffff8015ee54>] page_remove_rmap+0x13/0x2d
RSP <ffff810dad0e7d70>
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.18: Kernel BUG at mm/rmap.c:522
2006-10-04 10:40 2.6.18: Kernel BUG at mm/rmap.c:522 Andre Noll
@ 2006-10-04 13:59 ` Nick Piggin
2006-10-04 15:42 ` Andre Noll
0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2006-10-04 13:59 UTC (permalink / raw)
To: Andre Noll; +Cc: linux-mm, linux-kernel, andrea, riel
[-- Attachment #1: Type: text/plain, Size: 779 bytes --]
Andre Noll wrote:
> Hi
>
> MATLAB triggers the following bug on both of our new 16-way opteron
> machines (64G Ram): The same kernel is running with no problems on a
> bunch of smaller (8-way, 4-way, max 32G Ram) cluster nodes.
>
> Any hints?
> Andre
>
>
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at ...aid0/home/maan/scm/stable/linux-2.6.18.y/mm/rmap.c:522
Ah, this old thing. I hope it is repeatable?
What we really want is the bit before this, the "Eeek! page_mapcount went
negative" part.
It is also nice if we can work out where the page actually came from. The
following attached patch should help out a bit with that, if you could
run with it?
Thanks a lot for reporting, this is very useful.
Nick
--
SUSE Labs, Novell Inc.
[-- Attachment #2: mm-rmap-debug-more.patch --]
[-- Type: text/plain, Size: 3863 bytes --]
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h 2006-10-04 23:51:55.000000000 +1000
+++ linux-2.6/include/linux/rmap.h 2006-10-04 23:52:28.000000000 +1000
@@ -72,7 +72,7 @@ void __anon_vma_link(struct vm_area_stru
void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
void page_add_file_rmap(struct page *);
-void page_remove_rmap(struct page *);
+void page_remove_rmap(struct page *, struct vm_area_struct *);
/**
* page_dup_rmap - duplicate pte mapping to a page
Index: linux-2.6/mm/filemap_xip.c
===================================================================
--- linux-2.6.orig/mm/filemap_xip.c 2006-10-04 23:56:59.000000000 +1000
+++ linux-2.6/mm/filemap_xip.c 2006-10-04 23:57:08.000000000 +1000
@@ -189,7 +189,7 @@ __xip_unmap (struct address_space * mapp
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
- page_remove_rmap(page);
+ page_remove_rmap(page, vma);
dec_mm_counter(mm, file_rss);
BUG_ON(pte_dirty(pteval));
pte_unmap_unlock(pte, ptl);
Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c 2006-10-04 23:57:00.000000000 +1000
+++ linux-2.6/mm/fremap.c 2006-10-04 23:57:13.000000000 +1000
@@ -33,7 +33,7 @@ static int zap_pte(struct mm_struct *mm,
if (page) {
if (pte_dirty(pte))
set_page_dirty(page);
- page_remove_rmap(page);
+ page_remove_rmap(page, vma);
page_cache_release(page);
}
} else {
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2006-10-04 23:57:00.000000000 +1000
+++ linux-2.6/mm/memory.c 2006-10-04 23:57:26.000000000 +1000
@@ -677,7 +677,7 @@ static unsigned long zap_pte_range(struc
mark_page_accessed(page);
file_rss--;
}
- page_remove_rmap(page);
+ page_remove_rmap(page, vma);
tlb_remove_page(tlb, page);
continue;
}
@@ -1540,7 +1540,7 @@ gotten:
page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
if (likely(pte_same(*page_table, orig_pte))) {
if (old_page) {
- page_remove_rmap(old_page);
+ page_remove_rmap(old_page, vma);
if (!PageAnon(old_page)) {
dec_mm_counter(mm, file_rss);
inc_mm_counter(mm, anon_rss);
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c 2006-10-04 23:49:15.000000000 +1000
+++ linux-2.6/mm/rmap.c 2006-10-04 23:58:00.000000000 +1000
@@ -508,7 +508,7 @@ void page_add_file_rmap(struct page *pag
*
* The caller needs to hold the pte lock.
*/
-void page_remove_rmap(struct page *page)
+void page_remove_rmap(struct page *page, struct vm_area_struct *vma)
{
if (atomic_add_negative(-1, &page->_mapcount)) {
#ifdef CONFIG_DEBUG_VM
@@ -517,6 +517,7 @@ void page_remove_rmap(struct page *page)
printk (KERN_EMERG " page->flags = %lx\n", page->flags);
printk (KERN_EMERG " page->count = %x\n", page_count(page));
printk (KERN_EMERG " page->mapping = %p\n", page->mapping);
+ print_symbol (KERN_EMERG " vma->vm_ops = %s\n", vma->vm_ops);
}
#endif
BUG_ON(page_mapcount(page) < 0);
@@ -621,7 +622,7 @@ static int try_to_unmap_one(struct page
dec_mm_counter(mm, file_rss);
- page_remove_rmap(page);
+ page_remove_rmap(page, vma);
page_cache_release(page);
out_unmap:
@@ -711,7 +712,7 @@ static void try_to_unmap_cluster(unsigne
if (pte_dirty(pteval))
set_page_dirty(page);
- page_remove_rmap(page);
+ page_remove_rmap(page, vma);
page_cache_release(page);
dec_mm_counter(mm, file_rss);
(*mapcount)--;
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.18: Kernel BUG at mm/rmap.c:522
2006-10-04 13:59 ` Nick Piggin
@ 2006-10-04 15:42 ` Andre Noll
2006-10-04 15:49 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Andre Noll @ 2006-10-04 15:42 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-mm, linux-kernel, andrea, riel
[-- Attachment #1: Type: text/plain, Size: 985 bytes --]
On 23:59, Nick Piggin wrote:
> Ah, this old thing. I hope it is repeatable?
Well, it happened on both of the new machines we got last week. One
of these is still up BTW and I'm able to ssh into it.
> What we really want is the bit before this, the "Eeek! page_mapcount went
> negative" part.
There's no such message in the log. The preceeding lines are just normal
startup messages:
Adding 16779852k swap on /dev/sda1. Priority:42 extents:1 across:16779852k
Adding 16779852k swap on /dev/sdb1. Priority:42 extents:1 across:16779852k
process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
> It is also nice if we can work out where the page actually came from. The
> following attached patch should help out a bit with that, if you could
> run with it?
Okay. I'll reboot with your patch and let you know if it crashes again.
Thanks for the quick response.
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.18: Kernel BUG at mm/rmap.c:522
2006-10-04 15:42 ` Andre Noll
@ 2006-10-04 15:49 ` Peter Zijlstra
2006-10-04 16:12 ` Andre Noll
2006-10-04 20:39 ` Dave Jones
0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2006-10-04 15:49 UTC (permalink / raw)
To: Andre Noll; +Cc: Nick Piggin, linux-mm, linux-kernel, andrea, riel
On Wed, 2006-10-04 at 17:42 +0200, Andre Noll wrote:
> On 23:59, Nick Piggin wrote:
>
> > Ah, this old thing. I hope it is repeatable?
>
> Well, it happened on both of the new machines we got last week. One
> of these is still up BTW and I'm able to ssh into it.
>
> > What we really want is the bit before this, the "Eeek! page_mapcount went
> > negative" part.
>
> There's no such message in the log. The preceeding lines are just normal
> startup messages:
>
> Adding 16779852k swap on /dev/sda1. Priority:42 extents:1 across:16779852k
> Adding 16779852k swap on /dev/sdb1. Priority:42 extents:1 across:16779852k
> process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
>
> > It is also nice if we can work out where the page actually came from. The
> > following attached patch should help out a bit with that, if you could
> > run with it?
>
> Okay. I'll reboot with your patch and let you know if it crashes again.
enable CONFIG_DEBUG_VM to get that.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.18: Kernel BUG at mm/rmap.c:522
2006-10-04 15:49 ` Peter Zijlstra
@ 2006-10-04 16:12 ` Andre Noll
2006-10-04 20:39 ` Dave Jones
1 sibling, 0 replies; 7+ messages in thread
From: Andre Noll @ 2006-10-04 16:12 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Nick Piggin, linux-mm, linux-kernel, andrea, riel
[-- Attachment #1: Type: text/plain, Size: 219 bytes --]
On 17:49, Peter Zijlstra wrote:
> enable CONFIG_DEBUG_VM to get that.
Yup, that was disabled. It's enabled now.
Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.18: Kernel BUG at mm/rmap.c:522
2006-10-04 15:49 ` Peter Zijlstra
2006-10-04 16:12 ` Andre Noll
@ 2006-10-04 20:39 ` Dave Jones
2006-10-05 6:28 ` Nick Piggin
1 sibling, 1 reply; 7+ messages in thread
From: Dave Jones @ 2006-10-04 20:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andre Noll, Nick Piggin, linux-mm, linux-kernel, andrea, riel
On Wed, Oct 04, 2006 at 05:49:00PM +0200, Peter Zijlstra wrote:
> > > It is also nice if we can work out where the page actually came from. The
> > > following attached patch should help out a bit with that, if you could
> > > run with it?
> > Okay. I'll reboot with your patch and let you know if it crashes again.
> enable CONFIG_DEBUG_VM to get that.
Given this warnings still pops up from time to time, I question whether
putting that check under DEBUG_VM was such a good idea. It's not as
if it's a major performance impact. This has potential for us to lose
valuable debugging info for a few nanoseconds performance increase in
an already costly path.
This patch brings it back unconditionally, and moves the BUG()
into the if arm.
Signed-off-by: Dave Jones <davej@redhat.com>
--- local-git/mm/rmap.c~ 2006-10-04 16:38:06.000000000 -0400
+++ local-git/mm/rmap.c 2006-10-04 16:38:24.000000000 -0400
@@ -576,15 +576,14 @@ void page_add_file_rmap(struct page *pag
void page_remove_rmap(struct page *page)
{
if (atomic_add_negative(-1, &page->_mapcount)) {
-#ifdef CONFIG_DEBUG_VM
if (unlikely(page_mapcount(page) < 0)) {
printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page_mapcount(page));
printk (KERN_EMERG " page->flags = %lx\n", page->flags);
printk (KERN_EMERG " page->count = %x\n", page_count(page));
printk (KERN_EMERG " page->mapping = %p\n", page->mapping);
+ BUG();
}
-#endif
- BUG_ON(page_mapcount(page) < 0);
+
/*
* It would be tidy to reset the PageAnon mapping here,
* but that might overwrite a racing page_add_anon_rmap
--
http://www.codemonkey.org.uk
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.18: Kernel BUG at mm/rmap.c:522
2006-10-04 20:39 ` Dave Jones
@ 2006-10-05 6:28 ` Nick Piggin
0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2006-10-05 6:28 UTC (permalink / raw)
To: Dave Jones
Cc: Peter Zijlstra, Andre Noll, linux-mm, linux-kernel, andrea, riel
Dave Jones wrote:
> On Wed, Oct 04, 2006 at 05:49:00PM +0200, Peter Zijlstra wrote:
>
> > > > It is also nice if we can work out where the page actually came from. The
> > > > following attached patch should help out a bit with that, if you could
> > > > run with it?
> > > Okay. I'll reboot with your patch and let you know if it crashes again.
> > enable CONFIG_DEBUG_VM to get that.
>
> Given this warnings still pops up from time to time, I question whether
> putting that check under DEBUG_VM was such a good idea. It's not as
> if it's a major performance impact. This has potential for us to lose
> valuable debugging info for a few nanoseconds performance increase in
> an already costly path.
Frustratingly, it usually doesn't tell us much (without my previous
patch), because it is normally some driver that has stuffed up their
refcounting and the page-> fields don't tell us where the page has
come from.
But..
> This patch brings it back unconditionally, and moves the BUG()
> into the if arm.
... this shouldn't hurt if gcc moves the unlikely code out of the
linear instruction stream, which I think it usually does. It shouldn't
cost _anything_ because we're already doing the branch for the BUG_ON
which you remove.
> Signed-off-by: Dave Jones <davej@redhat.com>
Thanks,
Acked-by: Nick Piggin <npiggin@suse.de>
>
> --- local-git/mm/rmap.c~ 2006-10-04 16:38:06.000000000 -0400
> +++ local-git/mm/rmap.c 2006-10-04 16:38:24.000000000 -0400
> @@ -576,15 +576,14 @@ void page_add_file_rmap(struct page *pag
> void page_remove_rmap(struct page *page)
> {
> if (atomic_add_negative(-1, &page->_mapcount)) {
> -#ifdef CONFIG_DEBUG_VM
> if (unlikely(page_mapcount(page) < 0)) {
> printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page_mapcount(page));
> printk (KERN_EMERG " page->flags = %lx\n", page->flags);
> printk (KERN_EMERG " page->count = %x\n", page_count(page));
> printk (KERN_EMERG " page->mapping = %p\n", page->mapping);
> + BUG();
> }
> -#endif
> - BUG_ON(page_mapcount(page) < 0);
> +
> /*
> * It would be tidy to reset the PageAnon mapping here,
> * but that might overwrite a racing page_add_anon_rmap
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-10-05 6:28 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-10-04 10:40 2.6.18: Kernel BUG at mm/rmap.c:522 Andre Noll
2006-10-04 13:59 ` Nick Piggin
2006-10-04 15:42 ` Andre Noll
2006-10-04 15:49 ` Peter Zijlstra
2006-10-04 16:12 ` Andre Noll
2006-10-04 20:39 ` Dave Jones
2006-10-05 6:28 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox