* Re: [discuss] Memory performance problems on Tyan VX50
       [not found] ` <43E0B8FE.8040803@t-platforms.ru>
@ 2006-02-01 14:39 ` Andi Kleen
  2006-02-01 17:03   ` Christoph Lameter
  2006-02-02 22:07   ` Ray Bryant
  0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2006-02-01 14:39 UTC
To: discuss; +Cc: Andrey Slepuhin, Ray Bryant, linux-mm, Christoph Lameter, akpm

On Wednesday 01 February 2006 14:34, Andrey Slepuhin wrote:
> Ray Bryant wrote:
> > I don't think this will show anything is wrong, but try running the attached
> > program on your box; it will diagnose situations where the numa setup is
> > incorrect.
>
> Hi Ray,
>
> I was not able to run wheremem on my system - it prints
>
> [pooh@trans-rh4 ~]$ ./wheremem -vvv
> ./wheremem: checking 16 processors and 8 nodes; allocating 1024 pages.
> ./wheremem: starts....
> Killed
>
> The program is killed by OOM killer and then kernel gets oops and kernel
> panic.
>
> On another system with 2 CPUs/4 cores it works just fine.
>
> I attached a console log with oops.

Looks like a bug. There were changes both in the page allocator and in
mempolicy in 2.6.16rc, so it might be related to that.
What does this wheremem program do exactly?
And what does numactl --hardware say on the machine?

Either it's generally broken in page alloc or mempolicy somehow managed to pass in
a NULL zonelist.

-Andi

Out of Memory: Killed process 4945 (wheremem).
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
<ffffffff8015476c>{__rmqueue+60}
PGD 6ff91d067 PUD 6ffd7d067 PMD 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: netconsole i2c_nforce2 tg3 floppy
Pid: 4945, comm: wheremem Not tainted 2.6.16-rc1 #7
RIP: 0010:[<ffffffff8015476c>] <ffffffff8015476c>{__rmqueue+60}
RSP: 0000:ffff810403bbfce0  EFLAGS: 00010017
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810000029700
RBP: 0000000000000001 R08: ffff810000029700 R09: ffff810000029848
R10: 0000000000000000 R11: 0000000000000000 R12: ffff810000029700
R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000001
FS:  00002afd5e9fbde0(0000) GS:ffff8101038921c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000006ffc42000 CR4: 00000000000006e0
Process wheremem (pid: 4945, threadinfo ffff810403bbe000, task ffff8104ffa2b0e0)
Stack: ffff8101038920c0 ffff8101038920d0 ffffffff80154cd0 0000000000000001
       0000000200000001 ffff8101038de140 0000000180154099 ffff8101038de140
       000280d200000000 0000000000000286
Call Trace: <ffffffff80154cd0>{get_page_from_freelist+272}
       <ffffffff801550a7>{__alloc_pages+311}
       <ffffffff8015f2d5>{__handle_mm_fault+517}
       <ffffffff801610e7>{vma_adjust+503} <ffffffff80354078>{do_page_fault+936}
       <ffffffff8016c6b6>{do_mbind+678} <ffffffff8010ba75>{error_exit+0}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 14:39 ` [discuss] Memory performance problems on Tyan VX50 Andi Kleen
@ 2006-02-01 17:03   ` Christoph Lameter
  2006-02-01 17:16     ` Andi Kleen
  2006-02-02 22:07   ` Ray Bryant
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2006-02-01 17:03 UTC
To: Andi Kleen; +Cc: discuss, Andrey Slepuhin, Ray Bryant, linux-mm, akpm

On Wed, 1 Feb 2006, Andi Kleen wrote:

> Looks like a bug. There were changes both in the page allocator and in
> mempolicy in 2.6.16rc, so it might be related to that.
> What does this wheremem program do exactly?
> And what does numactl --hardware say on the machine?
>
> Either it's generally broken in page alloc or mempolicy somehow managed to pass in
> a NULL zonelist.

The failure is in __rmqueue. AFAIK there is no influence of mempolicy on
that one. Could we get an accurate pointer to the statement that is
causing the NULL deref?

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 17:03   ` Christoph Lameter
@ 2006-02-01 17:16     ` Andi Kleen
  2006-02-01 20:26       ` Andrey Slepuhin
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2006-02-01 17:16 UTC
To: Christoph Lameter; +Cc: discuss, Andrey Slepuhin, Ray Bryant, linux-mm, akpm

On Wednesday 01 February 2006 18:03, Christoph Lameter wrote:
> On Wed, 1 Feb 2006, Andi Kleen wrote:
>
> > Looks like a bug. There were changes both in the page allocator and in
> > mempolicy in 2.6.16rc, so it might be related to that.
> > What does this wheremem program do exactly?
> > And what does numactl --hardware say on the machine?
> >
> > Either it's generally broken in page alloc or mempolicy somehow managed to pass in
> > a NULL zonelist.
>
> The failure is in __rmqueue. AFAIK there is no influence of mempolicy on
> that one.

I haven't followed it in all details, but couldn't it be that the zonelist is
empty and rmqueue is the first to notice?

Or MPOL_BIND just makes it easier to trigger OOM (maybe it would be a good idea
to add some hack to prevent the oom killer from running when the OOM comes
from a non-standard numa policy).

> Could we get an accurate pointer to the statement that is
> causing the NULL deref?

Andrey, can you recompile the kernel with CONFIG_DEBUG_INFO and
do an addr2line -e vmlinux <RIP from oops> ?

-Andi

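[Editorial sketch] The addr2line request above could be scripted roughly as follows. This is only an illustration and not something used in the thread: the helper names rip_from_oops and addr2line_argv are hypothetical, the regular expression assumes the "RIP: 0010:[<...>]" layout seen in the quoted oops, and the vmlinux path is a placeholder for an image built with CONFIG_DEBUG_INFO.

```python
import re

# Matches the bracketed address on an x86-64 oops line such as
# "RIP: 0010:[<ffffffff8015476c>] ..." (layout assumed from the quoted oops).
RIP_RE = re.compile(r"RIP:\s+[0-9a-f]{4}:\[<([0-9a-f]{16})>\]")

# Sample line copied from the oops quoted in this thread.
OOPS_LINE = "RIP: 0010:[<ffffffff8015476c>] <ffffffff8015476c>{__rmqueue+60}"


def rip_from_oops(text):
    """Return the faulting instruction address as a hex string, or None."""
    match = RIP_RE.search(text)
    return match.group(1) if match else None


def addr2line_argv(address, vmlinux="vmlinux"):
    """Build the argument vector for 'addr2line -e <vmlinux> <address>'."""
    return ["addr2line", "-e", vmlinux, address]


if __name__ == "__main__":
    rip = rip_from_oops(OOPS_LINE)
    # Print the command instead of running it; running it needs a real vmlinux.
    print(" ".join(addr2line_argv(rip)))
```

The resulting list can be handed to subprocess.run() on a machine that has the matching vmlinux. The address is only meaningful for the exact kernel build that produced the oops.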
* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 17:16     ` Andi Kleen
@ 2006-02-01 20:26       ` Andrey Slepuhin
  2006-02-01 23:59         ` Christoph Lameter
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Slepuhin @ 2006-02-01 20:26 UTC
To: Andi Kleen; +Cc: Christoph Lameter, discuss, Ray Bryant, linux-mm, akpm

[-- Attachment #1: Type: text/plain, Size: 469 bytes --]

Andi Kleen wrote:
>>Could we get an accurate pointer to the statement that is
>>causing the NULL deref?
>
>
> Andrey, can you recompile the kernel with CONFIG_DEBUG_INFO and
> do an addr2line -e vmlinux <RIP from oops> ?

Yes, I have done it:

ffffffff801549ac -> include/linux/list.h:150

and two more addresses up the stack:

ffffffff80154f10 -> mm/page_alloc.c:580
ffffffff801552e7 -> mm/page_alloc.c:965

I also attached the new oops log.

Best regards,
Andrey

[-- Attachment #2: nc2.log.gz --]
[-- Type: application/x-gzip, Size: 6842 bytes --]

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 20:26       ` Andrey Slepuhin
@ 2006-02-01 23:59         ` Christoph Lameter
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Lameter @ 2006-02-01 23:59 UTC
To: Andrey Slepuhin; +Cc: Andi Kleen, discuss, Ray Bryant, linux-mm, akpm

> ffffffff801549ac -> include/linux/list.h:150

Hmm... That may indicate that something overwrites page->lru.

Wild guess: lru is placed after the spinlock used for page table locking
in struct page. Is this system using per page page table locks?

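[Editorial sketch] One way to relate the fault address in the oops (0000000000000008) to the list.h hit is to look at field offsets. The sketch below assumes that struct list_head consists of two pointers, next followed by prev, as declared in include/linux/list.h. The ctypes class ListHead and the helpers field_offsets and fault_address are hypothetical userspace mirrors for illustration, not kernel code.

```python
import ctypes


class ListHead(ctypes.Structure):
    """Userspace mirror of the kernel's struct list_head (layout only)."""
    _fields_ = [
        ("next", ctypes.c_void_p),
        ("prev", ctypes.c_void_p),
    ]


def field_offsets():
    """Return the byte offsets of next and prev inside the structure."""
    return ListHead.next.offset, ListHead.prev.offset


def fault_address(entry_addr, field):
    """Address touched when accessing `field` through a list_head at entry_addr."""
    return entry_addr + getattr(ListHead, field).offset


if __name__ == "__main__":
    print("offsets (next, prev):", field_offsets())
    # Accessing prev through a NULL list_head pointer touches this address.
    print("prev via NULL:", hex(fault_address(0, "prev")))
```

On a 64-bit build the offset of prev equals the pointer size, 8 bytes, which matches the faulting address reported in the oops. That is consistent with a list operation going through a NULL list_head pointer, but it does not identify how the pointer became NULL; the thread leaves that open.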
* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 14:39 ` [discuss] Memory performance problems on Tyan VX50 Andi Kleen
  2006-02-01 17:03   ` Christoph Lameter
@ 2006-02-02 22:07   ` Ray Bryant
  1 sibling, 0 replies; 6+ messages in thread
From: Ray Bryant @ 2006-02-02 22:07 UTC
To: Andi Kleen; +Cc: discuss, Andrey Slepuhin, linux-mm, Christoph Lameter, akpm

On Wednesday 01 February 2006 08:39, Andi Kleen wrote:
> On Wednesday 01 February 2006 14:34, Andrey Slepuhin wrote:
> > Ray Bryant wrote:
> > > I don't think this will show anything is wrong, but try running the
> > > attached program on your box; it will diagnose situations where the
> > > numa setup is incorrect.
> >
> > Hi Ray,
> >
> > I was not able to run wheremem on my system - it prints
> >
> > [pooh@trans-rh4 ~]$ ./wheremem -vvv
> > ./wheremem: checking 16 processors and 8 nodes; allocating 1024 pages.
> > ./wheremem: starts....
> > Killed
> >
> > The program is killed by OOM killer and then kernel gets oops and kernel
> > panic.
> >
> > On another system with 2 CPUs/4 cores it works just fine.
> >
> > I attached a console log with oops.
>
> Looks like a bug. There were changes both in the page allocator and in
> mempolicy in 2.6.16rc, so it might be related to that.
> What does this wheremem program do exactly?

It pins a thread to each cpu in turn, then allocates 1024 pages on that cpu
and checks to make sure the pages are allocated on the correct node. This
is a test program we've used in the past to make sure the numa allocation
code is working correctly. It shouldn't be causing the system to OOM.

(Sorry for late reply -- my Linux box has been down since yesterday
morning....)

Andrey,

Our 8 socket, dual core Opteron box is unavailable for testing at the moment,
but there are some bandwidth limitations on 8-socket Opterons due to:

(1)  There are not enough hypertransport links to build a symmetric system.
     Depending on whether your box is a strict ladder or a twisted ladder,
     sockets 1,2 and 7,8 typically have less bandwidth available to them
     than sockets 3-5 (the interior sockets in the ladder). The edge nodes
     typically have one hypertransport link dedicated to I/O, so they have
     only 2 links to connect to other nodes for memory accesses. The
     interior nodes typically use all 3 hypertransport links to connect to
     other processors, but for them, I/O is always one hop away.

(2)  If your workload is memory intensive (e.g. stream) you can spend a lot
     of time waiting for cache probes to return on an 8-socket system. The
     larger systems seem to be better suited for cache-intensive rather
     than bandwidth-intensive workloads.

If you are seeing the effect of (2), it is a hardware rather than a software
issue.

> And what does numactl --hardware say on the machine?
>
> Either it's generally broken in page alloc or mempolicy somehow managed to
> pass in a NULL zonelist.
>
> -Andi
>
> Out of Memory: Killed process 4945 (wheremem).
> Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
> <ffffffff8015476c>{__rmqueue+60}
> PGD 6ff91d067 PUD 6ffd7d067 PMD 0
> Oops: 0000 [1] SMP
> CPU 2
> Modules linked in: netconsole i2c_nforce2 tg3 floppy
> Pid: 4945, comm: wheremem Not tainted 2.6.16-rc1 #7
> RIP: 0010:[<ffffffff8015476c>] <ffffffff8015476c>{__rmqueue+60}
> RSP: 0000:ffff810403bbfce0  EFLAGS: 00010017
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810000029700
> RBP: 0000000000000001 R08: ffff810000029700 R09: ffff810000029848
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff810000029700
> R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000001
> FS:  00002afd5e9fbde0(0000) GS:ffff8101038921c0(0000)
> knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 00000006ffc42000 CR4: 00000000000006e0
> Process wheremem (pid: 4945, threadinfo ffff810403bbe000, task
> ffff8104ffa2b0e0) Stack: ffff8101038920c0 ffff8101038920d0 ffffffff80154cd0
> 0000000000000001 0000000200000001 ffff8101038de140 0000000180154099
> ffff8101038de140 000280d200000000 0000000000000286
> Call Trace: <ffffffff80154cd0>{get_page_from_freelist+272}
>        <ffffffff801550a7>{__alloc_pages+311}
>        <ffffffff8015f2d5>{__handle_mm_fault+517}
>        <ffffffff801610e7>{vma_adjust+503} <ffffffff80354078>{do_page_fault+936}
>        <ffffffff8016c6b6>{do_mbind+678} <ffffffff8010ba75>{error_exit+0}

--
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                 512-507-7807 (c)

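[Editorial sketch] The pin-then-allocate part of what Ray describes for wheremem can be approximated in userspace as below. This is not the wheremem source, which is not included in the thread. The function names touch_pages_on_cpu and sweep_cpus are hypothetical, the page count of 1024 is taken from the quoted program output, and the per-page check that each page landed on the expected node is deliberately left out because it needs a NUMA query interface that this sketch does not assume. Linux only.

```python
import mmap
import os

NPAGES = 1024  # page count taken from the wheremem output quoted in the thread


def touch_pages_on_cpu(cpu, npages=NPAGES):
    """Pin the calling process to `cpu`, then allocate and first-touch npages."""
    os.sched_setaffinity(0, {cpu})
    page = mmap.PAGESIZE
    # Private anonymous mapping; pages are not backed until first written.
    buf = mmap.mmap(-1, npages * page,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        for i in range(npages):
            buf[i * page] = 1  # writing one byte faults the page in
        return npages
    finally:
        buf.close()


def sweep_cpus(npages=NPAGES):
    """Run the pin-and-touch step on every CPU this process may use."""
    original = os.sched_getaffinity(0)
    touched = {}
    try:
        for cpu in sorted(original):
            touched[cpu] = touch_pages_on_cpu(cpu, npages)
    finally:
        os.sched_setaffinity(0, original)  # restore the starting affinity
    return touched


if __name__ == "__main__":
    for cpu, count in sweep_cpus().items():
        print(f"cpu {cpu}: touched {count} pages")
```

With the default local allocation policy, pages first touched while pinned to a CPU would normally come from that CPU's node, which is the property the real program verifies. A run like this should need only a few megabytes per CPU, which is in line with Ray's remark that the test should not drive a system to OOM.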
Thread overview: 6+ messages
[not found] <43DF7654.6060807@t-platforms.ru>
[not found] ` <200601311223.11492.raybry@mpdtxmail.amd.com>
[not found] ` <43E0B8FE.8040803@t-platforms.ru>
2006-02-01 14:39 ` [discuss] Memory performance problems on Tyan VX50 Andi Kleen
2006-02-01 17:03 ` Christoph Lameter
2006-02-01 17:16 ` Andi Kleen
2006-02-01 20:26 ` Andrey Slepuhin
2006-02-01 23:59 ` Christoph Lameter
2006-02-02 22:07 ` Ray Bryant