Re: [discuss] Memory performance problems on Tyan VX50

linux-mm.kvack.org archive mirror

* Re: [discuss] Memory performance problems on Tyan VX50
       [not found]   ` <43E0B8FE.8040803@t-platforms.ru>
@ 2006-02-01 14:39     ` Andi Kleen
  2006-02-01 17:03       ` Christoph Lameter
  2006-02-02 22:07       ` Ray Bryant
  0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2006-02-01 14:39 UTC
  To: discuss; +Cc: Andrey Slepuhin, Ray Bryant, linux-mm, Christoph Lameter, akpm

On Wednesday 01 February 2006 14:34, Andrey Slepuhin wrote:
> Ray Bryant wrote:
> > I don't think this will show anything is wrong, but try running the attached 
> > program on your box; it will diagnose situations where the numa setup is 
> > incorrect.
> 
> Hi Ray,
> 
> I was not able to run wheremem on my system - it prints
> 
> [pooh@trans-rh4 ~]$ ./wheremem -vvv
> ./wheremem: checking 16 processors and 8 nodes; allocating 1024 pages.
> ./wheremem: starts....
> Killed
> 
> The program is killed by the OOM killer, and then the kernel oopses and
> panics.
> 
> On another system with 2 CPUs/4 cores it works just fine.
> 
> I attached a console log with oops.


Looks like a bug. There were changes both in the page allocator and in
mempolicy in 2.6.16rc, so it might be related to that.
What does this wheremem program do exactly?
And what does numastat --hardware say on the machine?

Either it's generally broken in page alloc or mempolicy somehow managed to pass in
a NULL zonelist. 

-Andi

Out of Memory: Killed process 4945 (wheremem).
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: 
<ffffffff8015476c>{__rmqueue+60}
PGD 6ff91d067 PUD 6ffd7d067 PMD 0 
Oops: 0000 [1] SMP 
CPU 2 
Modules linked in: netconsole i2c_nforce2 tg3 floppy
Pid: 4945, comm: wheremem Not tainted 2.6.16-rc1 #7
RIP: 0010:[<ffffffff8015476c>] <ffffffff8015476c>{__rmqueue+60}
RSP: 0000:ffff810403bbfce0  EFLAGS: 00010017
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810000029700
RBP: 0000000000000001 R08: ffff810000029700 R09: ffff810000029848
R10: 0000000000000000 R11: 0000000000000000 R12: ffff810000029700
R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000001
FS:  00002afd5e9fbde0(0000) GS:ffff8101038921c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000006ffc42000 CR4: 00000000000006e0
Process wheremem (pid: 4945, threadinfo ffff810403bbe000, task ffff8104ffa2b0e0)
Stack: ffff8101038920c0 ffff8101038920d0 ffffffff80154cd0 0000000000000001 
       0000000200000001 ffff8101038de140 0000000180154099 ffff8101038de140 
       000280d200000000 0000000000000286 
Call Trace: <ffffffff80154cd0>{get_page_from_freelist+272}
       <ffffffff801550a7>{__alloc_pages+311} <ffffffff8015f2d5>{__handle_mm_fault+517}
       <ffffffff801610e7>{vma_adjust+503} <ffffffff80354078>{do_page_fault+936}
       <ffffffff8016c6b6>{do_mbind+678} <ffffffff8010ba75>{error_exit+0}


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 14:39     ` [discuss] Memory performance problems on Tyan VX50 Andi Kleen
@ 2006-02-01 17:03       ` Christoph Lameter
  2006-02-01 17:16         ` Andi Kleen
  2006-02-02 22:07       ` Ray Bryant
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2006-02-01 17:03 UTC
  To: Andi Kleen; +Cc: discuss, Andrey Slepuhin, Ray Bryant, linux-mm, akpm

On Wed, 1 Feb 2006, Andi Kleen wrote:

> Looks like a bug. There were changes both in the page allocator and in
> mempolicy in 2.6.16rc, so it might be related to that.
> What does this wheremem program do exactly?
> And what does numastat --hardware say on the machine?
> 
> Either it's generally broken in page alloc or mempolicy somehow managed to pass in
> a NULL zonelist. 

The failure is in __rmqueue. AFAIK there is no influence of mempolicy on
that one. Could we get an accurate pointer to the statement that is
causing the NULL deref?

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 17:03       ` Christoph Lameter
@ 2006-02-01 17:16         ` Andi Kleen
  2006-02-01 20:26           ` Andrey Slepuhin
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2006-02-01 17:16 UTC
  To: Christoph Lameter; +Cc: discuss, Andrey Slepuhin, Ray Bryant, linux-mm, akpm

On Wednesday 01 February 2006 18:03, Christoph Lameter wrote:
> On Wed, 1 Feb 2006, Andi Kleen wrote:
> 
> > Looks like a bug. There were changes both in the page allocator and in
> > mempolicy in 2.6.16rc, so it might be related to that.
> > What does this wheremem program do exactly?
> > And what does numastat --hardware say on the machine?
> > 
> > Either it's generally broken in page alloc or mempolicy somehow managed to pass in
> > a NULL zonelist. 
> 
> The failure is in __rmqueue. AFAIK there is no influence of mempolicy on
> that one.

I haven't followed it in every detail, but it could be, if the zonelist
is empty and rmqueue is the first to notice?

Or MPOL_BIND just makes it easier to trigger OOM
(maybe it would be a good idea to add some hack to keep the OOM
killer from running when the OOM comes from a non-standard NUMA policy)


> Could we get an accurate pointer to the statement that is  
> causing the NULL deref?

Andrey, can you recompile the kernel with CONFIG_DEBUG_INFO and
run addr2line -e vmlinux <RIP from oops>?

-Andi

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 17:16         ` Andi Kleen
@ 2006-02-01 20:26           ` Andrey Slepuhin
  2006-02-01 23:59             ` Christoph Lameter
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Slepuhin @ 2006-02-01 20:26 UTC
  To: Andi Kleen; +Cc: Christoph Lameter, discuss, Ray Bryant, linux-mm, akpm

[-- Attachment #1: Type: text/plain, Size: 469 bytes --]

Andi Kleen wrote:
>>Could we get an accurate pointer to the statement that is  
>>causing the NULL deref?
> 
> 
> Andrey, can you recompile the kernel with CONFIG_DEBUG_INFO and
> run addr2line -e vmlinux <RIP from oops>?

Yes, I have done it:

ffffffff801549ac -> include/linux/list.h:150

and two more addresses up the stack:

ffffffff80154f10 -> mm/page_alloc.c:580
ffffffff801552e7 -> mm/page_alloc.c:965

I also attached the new oops log.

Best regards,
Andrey

[-- Attachment #2: nc2.log.gz --]
[-- Type: application/x-gzip, Size: 6842 bytes --]

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 20:26           ` Andrey Slepuhin
@ 2006-02-01 23:59             ` Christoph Lameter
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Lameter @ 2006-02-01 23:59 UTC
  To: Andrey Slepuhin; +Cc: Andi Kleen, discuss, Ray Bryant, linux-mm, akpm

> ffffffff801549ac -> include/linux/list.h:150

Hmm... That may indicate that something overwrites
page->lru. 

Wild guess: lru is placed after the spinlock used for
page table locking in struct page. Is this system using
per-page page table locks?
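A small user-space sketch of why a fault in include/linux/list.h fits the oops earlier in the thread, which reports a NULL pointer dereference "at 0000000000000008". It assumes x86_64 (8-byte pointers) and copies only the member order of the kernel's struct list_head (next, then prev); it is not kernel code. If a list pointer has been zeroed or overwritten, a free-list removal path that does list_entry() followed by list_del() ends up reading ->prev through a list_head at address 0, and that load touches address 0x8. This is one reading consistent with the oops, not a diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same member order as the kernel's struct list_head
   (include/linux/list.h): next first, then prev. */
struct list_head {
    struct list_head *next, *prev;
};

/* prev sits exactly one pointer past the start of the node. */
static_assert(offsetof(struct list_head, prev) == sizeof(struct list_head *),
              "unexpected padding in struct list_head");

/* Address that a load of entry->prev would touch, computed on integers so
   nothing is dereferenced. For entry == NULL this is the faulting address
   a kernel oops would report. */
uintptr_t prev_load_address(const struct list_head *entry)
{
    return (uintptr_t)entry + offsetof(struct list_head, prev);
}
```

On x86_64, prev_load_address(NULL) evaluates to 8, the same value as CR2 in the oops.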

* Re: [discuss] Memory performance problems on Tyan VX50
  2006-02-01 14:39     ` [discuss] Memory performance problems on Tyan VX50 Andi Kleen
  2006-02-01 17:03       ` Christoph Lameter
@ 2006-02-02 22:07       ` Ray Bryant
  1 sibling, 0 replies; 6+ messages in thread
From: Ray Bryant @ 2006-02-02 22:07 UTC
  To: Andi Kleen; +Cc: discuss, Andrey Slepuhin, linux-mm, Christoph Lameter, akpm

On Wednesday 01 February 2006 08:39, Andi Kleen wrote:
> On Wednesday 01 February 2006 14:34, Andrey Slepuhin wrote:
> > Ray Bryant wrote:
> > > I don't think this will show anything is wrong, but try running the
> > > attached program on your box; it will diagnose situations where the
> > > numa setup is incorrect.
> >
> > Hi Ray,
> >
> > I was not able to run wheremem on my system - it prints
> >
> > [pooh@trans-rh4 ~]$ ./wheremem -vvv
> > ./wheremem: checking 16 processors and 8 nodes; allocating 1024 pages.
> > ./wheremem: starts....
> > Killed
> >
> > The program is killed by the OOM killer, and then the kernel oopses and
> > panics.
> >
> > On another system with 2 CPUs/4 cores it works just fine.
> >
> > I attached a console log with oops.
>
> Looks like a bug. There were changes both in the page allocator and in
> mempolicy in 2.6.16rc, so it might be related to that.
> What does this wheremem program do exactly?

It pins a thread to each CPU in turn, then allocates 1024 pages on that CPU
and checks that the pages are allocated on the correct node. This is a test
program we've used in the past to make sure the NUMA allocation code is
working correctly. It shouldn't be driving the system into OOM.
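The wheremem source itself is not part of this thread, so the following is only a hypothetical sketch of the kind of check described above, written for a current Linux system. It pins the calling thread to one CPU with sched_setaffinity(), asks for that CPU's node with the getcpu system call, touches freshly mapped pages, and uses get_mempolicy() with MPOL_F_NODE | MPOL_F_ADDR to learn which node backs each page. The helper names count_misplaced and check_cpu are made up for illustration, and the raw system calls go through syscall(2) rather than libnuma. Under the default local policy a page may legitimately come from another node when the local one is short of memory, so a nonzero count is a hint, not proof of a bug.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values from <linux/mempolicy.h>, defined here so the sketch does
   not need libnuma's <numaif.h>. */
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)
#define MPOL_F_ADDR (1 << 1)
#endif

/* Hypothetical helper: count pages whose node differs from the expected one. */
int count_misplaced(const int *page_nodes, int npages, int expected_node)
{
    assert(npages >= 0);
    int bad = 0;
    for (int i = 0; i < npages; i++)
        if (page_nodes[i] != expected_node)
            bad++;
    return bad;
}

/* Hypothetical helper: pin the calling thread to `cpu` (the pinning is left
   in place), fault in `npages` fresh anonymous pages and return how many
   landed on a node other than that CPU's node, or -1 on any error. */
int check_cpu(int cpu, int npages)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE || npages <= 0)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    /* getcpu reports the node of the CPU we are now running on. */
    unsigned int cur_cpu, cpu_node;
    if (syscall(SYS_getcpu, &cur_cpu, &cpu_node, NULL) != 0)
        return -1;

    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)npages * psz;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    int ret = -1;
    int *nodes = malloc((size_t)npages * sizeof(*nodes));
    if (nodes == NULL)
        goto out;

    for (int i = 0; i < npages; i++) {
        char *p = buf + (size_t)i * psz;
        memset(p, 1, psz);              /* first touch allocates the page */
        /* MPOL_F_NODE | MPOL_F_ADDR: store the node backing p in nodes[i]. */
        if (syscall(SYS_get_mempolicy, &nodes[i], (unsigned long *)NULL, 0UL,
                    (void *)p, (unsigned long)(MPOL_F_NODE | MPOL_F_ADDR)) != 0)
            goto out;
    }
    ret = count_misplaced(nodes, npages, (int)cpu_node);
out:
    free(nodes);
    munmap(buf, len);
    return ret;
}
```

A driver in the spirit of wheremem would loop over the 16 processors reported on Andrey's box and call check_cpu(cpu, 1024) for each one, flagging any CPU with a nonzero result.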

(Sorry for the late reply -- my Linux box has been down since yesterday
morning.)

Andrey,

Our 8-socket, dual-core Opteron box is unavailable for testing at the moment,
but there are some bandwidth limitations on 8-socket Opteron systems
due to:

(1)  There are not enough HyperTransport links to build a symmetric system.
       Depending on whether your box is a strict ladder or a twisted ladder,
       sockets 1,2 and 7,8 typically have less bandwidth available to them
       than sockets 3-6 (the interior sockets in the ladder). The edge
       nodes typically have one HyperTransport link dedicated to I/O, so
       they have only 2 links available to connect to other nodes for
       memory accesses. The interior nodes typically use all 3
       HyperTransport links to connect to other processors, but for them,
       I/O is always one hop away.

(2)  If your workload is memory-intensive (e.g. STREAM), you can spend a
       lot of time waiting for cache probes to return on an 8-socket system.
       The larger systems seem to be better suited to cache-intensive rather
       than bandwidth-intensive workloads.
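Point (1) can be made concrete with a toy hop-count model. The sketch assumes a strict 2x4 ladder with sockets numbered in pairs per rung (1,2 | 3,4 | 5,6 | 7,8, stored 0-based), so sockets 1,2 and 7,8 sit on the end rungs. It ignores link widths, I/O links and the twisted-ladder variant, so it says nothing about actual bandwidth, only how many inter-socket links each socket has and how far it is from the other sockets' memory. The names build_strict_ladder, socket_links and total_hops are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NSOCK 8

/* Hypothetical strict ladder: each pair (s, s+1) is joined by a rung, and
   each socket is joined to the same side of the next rung. */
void build_strict_ladder(int adj[NSOCK][NSOCK])
{
    memset(adj, 0, NSOCK * sizeof adj[0]);
    for (int s = 0; s < NSOCK; s += 2) {
        adj[s][s + 1] = adj[s + 1][s] = 1;             /* rung */
        if (s + 2 < NSOCK) {
            adj[s][s + 2] = adj[s + 2][s] = 1;         /* rail, side A */
            adj[s + 1][s + 3] = adj[s + 3][s + 1] = 1; /* rail, side B */
        }
    }
}

/* Number of inter-socket links on socket s. */
int socket_links(int adj[NSOCK][NSOCK], int s)
{
    int n = 0;
    for (int v = 0; v < NSOCK; v++)
        n += adj[s][v];
    return n;
}

/* Sum of shortest-path hop counts from src to every other socket (BFS). */
int total_hops(int adj[NSOCK][NSOCK], int src)
{
    assert(src >= 0 && src < NSOCK);
    int dist[NSOCK], queue[NSOCK], head = 0, tail = 0, sum = 0;
    for (int i = 0; i < NSOCK; i++)
        dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        sum += dist[u];
        for (int v = 0; v < NSOCK; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
    }
    return sum;
}
```

In this model an end socket (index 0, socket 1) has 2 inter-socket links and a total of 16 hops to the other seven sockets, while an interior socket (index 2, socket 3) has 3 links and only 12 hops.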

If you are seeing the effect of (2), it is a hardware rather than a software 
issue.
 
> And what does numastat --hardware say on the machine?
>
> Either it's generally broken in page alloc or mempolicy somehow managed to
> pass in a NULL zonelist.
>
> -Andi

-- 
Ray Bryant
AMD Performance Labs                   Austin, Tx
512-602-0038 (o)                 512-507-7807 (c)

Thread overview: 6+ messages
     [not found] <43DF7654.6060807@t-platforms.ru>
     [not found] ` <200601311223.11492.raybry@mpdtxmail.amd.com>
     [not found]   ` <43E0B8FE.8040803@t-platforms.ru>
2006-02-01 14:39     ` [discuss] Memory performance problems on Tyan VX50 Andi Kleen
2006-02-01 17:03       ` Christoph Lameter
2006-02-01 17:16         ` Andi Kleen
2006-02-01 20:26           ` Andrey Slepuhin
2006-02-01 23:59             ` Christoph Lameter
2006-02-02 22:07       ` Ray Bryant