linux-mm.kvack.org archive mirror

* Re: -mm numa perf regression
From: Andrew Morton @ 2006-09-11 16:35 UTC
  To: Andy Whitcroft, linux-mm; +Cc: Christoph Lameter, Martin Bligh, Paul Jackson

On Mon, 11 Sep 2006 15:19:01 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> Christoph Lameter wrote:
> > On Fri, 8 Sep 2006, Andy Whitcroft wrote:
> > 
> >>> I have not heard back from you on this issue. It would be good to have 
> >>> some more data on this one.
> >> Sorry, I submitted the tests and the results filtered out to TKO, and
> >> then I forgot to check them.  Looking at the graph, backing this out has
> >> had no effect, as I think we'd expect from what comes below.
> >>
> >> What next?
> > 
> > Get me the promised data? /proc/zoneinfo before and after the run. 
> > /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
> 
> Sorry for the delay, the relevant files weren't all being preserved.
> Fixed that up and reran things.  The results you asked for are available
> here:
> 
>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
> 
> Just having a quick look at the results, it seems that they are saying
> that all of our CPUs are in node 0, which isn't right at all.  The
> machine has 4 processors per node.
> 
> I am sure that would account for the performance loss.  Now as to why ...
> 
> > Is there a way to remotely access the box?
> 
> Sadly no ... I do have direct access to test on the box but am not able
> to export it.
> 
> I've also started a bisection looking for it, though that will be some
> time yet as I've only just dropped the cleaver for the first time.
> 

I've added linux-mm.  Can we please keep it on-list.  I have a vague suspicion
that your bisection will end up pointing at one Mel Gorman.  Or someone else.
But whoever it is will end up wondering wtf is going on.

I don't understand what you mean by "all of our CPUs are in node 0"?
http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
and
http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
look the same..  It depends what "before" and "after" mean, I guess...

Do we have full dmesg output for both 2.6.18-rc6-mm1 and 2.6.18-rc6?

Thanks.


* Re: -mm numa perf regression
From: Andy Whitcroft @ 2006-09-11 16:41 UTC
  To: Andrew Morton; +Cc: linux-mm, Christoph Lameter, Martin Bligh, Paul Jackson

Andrew Morton wrote:
> On Mon, 11 Sep 2006 15:19:01 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
>> Christoph Lameter wrote:
>>> On Fri, 8 Sep 2006, Andy Whitcroft wrote:
>>>
>>>>> I have not heard back from you on this issue. It would be good to have 
>>>>> some more data on this one.
>>>> Sorry, I submitted the tests and the results filtered out to TKO, and
>>>> then I forgot to check them.  Looking at the graph, backing this out has
>>>> had no effect, as I think we'd expect from what comes below.
>>>>
>>>> What next?
>>> Get me the promised data? /proc/zoneinfo before and after the run. 
>>> /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
>> Sorry for the delay, the relevant files weren't all being preserved.
>> Fixed that up and reran things.  The results you asked for are available
>> here:
>>
>>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
>>
>> Just having a quick look at the results, it seems that they are saying
>> that all of our CPUs are in node 0, which isn't right at all.  The
>> machine has 4 processors per node.
>>
>> I am sure that would account for the performance loss.  Now as to why ...
>>
>>> Is there a way to remotely access the box?
>> Sadly no ... I do have direct access to test on the box but am not able
>> to export it.
>>
>> I've also started a bisection looking for it, though that will be some
>> time yet as I've only just dropped the cleaver for the first time.
>>
> 
> I've added linux-mm.  Can we please keep it on-list.  I have a vague suspicion
> that your bisection will end up pointing at one Mel Gorman.  Or someone else.
> But whoever it is will end up wondering wtf is going on.
> 
> I don't understand what you mean by "all of our CPUs are in node 0"?
> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
> and
> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
> look the same..  It depends what "before" and "after" mean, I guess...

What I have noted in this output is that all of the CPUs in this
machine have been assigned to node 0; this is incorrect because there
are four nodes of four CPUs each.

The before and after refer to either side of the test run showing the
regression.  Of course the CPU assignments are static and thus the
same in both.

-apw


* Re: -mm numa perf regression
From: Andy Whitcroft @ 2006-09-11 17:16 UTC
  To: Keith Mannthey
  Cc: Andy Whitcroft, Andrew Morton, linux-mm, Christoph Lameter,
	Martin Bligh, Paul Jackson

Andy Whitcroft wrote:
> Andrew Morton wrote:
>> On Mon, 11 Sep 2006 15:19:01 +0100
>> Andy Whitcroft <apw@shadowen.org> wrote:
>>
>>> Christoph Lameter wrote:
>>>> On Fri, 8 Sep 2006, Andy Whitcroft wrote:
>>>>
>>>>>> I have not heard back from you on this issue. It would be good to have 
>>>>>> some more data on this one.
>>>>> Sorry, I submitted the tests and the results filtered out to TKO, and
>>>>> then I forgot to check them.  Looking at the graph, backing this out has
>>>>> had no effect, as I think we'd expect from what comes below.
>>>>>
>>>>> What next?
>>>> Get me the promised data? /proc/zoneinfo before and after the run. 
>>>> /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
>>> Sorry for the delay, the relevant files weren't all being preserved.
>>> Fixed that up and reran things.  The results you asked for are available
>>> here:
>>>
>>>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
>>>
>>> Just having a quick look at the results, it seems that they are saying
>>> that all of our CPUs are in node 0, which isn't right at all.  The
>>> machine has 4 processors per node.
>>>
>>> I am sure that would account for the performance loss.  Now as to why ...
>>>
>>>> Is there a way to remotely access the box?
>>> Sadly no ... I do have direct access to test on the box but am not able
>>> to export it.
>>>
>>> I've also started a bisection looking for it, though that will be some
>>> time yet as I've only just dropped the cleaver for the first time.
>>>
>> I've added linux-mm.  Can we please keep it on-list.  I have a vague suspicion
>> that your bisection will end up pointing at one Mel Gorman.  Or someone else.
>> But whoever it is will end up wondering wtf is going on.
>>
>> I don't understand what you mean by "all of our CPUs are in node 0"?
>> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
>> and
>> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
>> look the same..  It depends what "before" and "after" mean, I guess...
> 
> What I have noted in this output is that all of the CPUs in this
> machine have been assigned to node 0; this is incorrect because there
> are four nodes of four CPUs each.
> 
> The before and after refer to either side of the test run showing the
> regression.  Of course the CPU assignments are static and thus the
> same in both.

For those who missed the history: we have been tracking a performance
regression on kernbench on some NUMA systems.  In the process of
analysing that we've noticed that all of the CPUs in the system are
being bound to node 0 rather than their home nodes.  This is caused by
the changes in:

convert-i386-summit-subarch-to-use-srat-info-for-apicid_to_node-calls.patch

@@ -647,7 +649,7 @@ static void map_cpu_to_logical_apicid(vo
        int apicid = logical_smp_processor_id();

        cpu_2_logical_apicid[cpu] = apicid;
-       map_cpu_to_node(cpu, apicid_to_node(apicid));
+       map_cpu_to_node(cpu, apicid_to_node(hard_smp_processor_id()));
 }

This change moves this mapping from logical to physical APIC id, which
the sub-architectures are not expecting.  I've just booted a machine
with this patch (and its -tidy) backed out.  The processors are again
assigned to the right places.
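
To see why that collapses everything onto node 0, here is a minimal
userspace sketch.  The quad-in-the-upper-nibble encoding is the NUMA-Q
convention as I understand it, and the id values are made up for
illustration; this models the convention, it is not the kernel code:

	#include <stdio.h>

	/* NUMA-Q keeps the quad (node) number in the upper nibble of
	 * the *logical* APIC id, so apicid_to_node() is just a shift. */
	static int apicid_to_node(int logical_apicid)
	{
		return logical_apicid >> 4;
	}

	int main(void)
	{
		int logical_apicid = 0x21;	/* cpu 1 on quad 2 (made up) */
		int hard_apicid = 0x01;		/* physical id, no quad nibble */

		/* The logical lookup lands on node 2, as intended. */
		printf("logical:  node %d\n", apicid_to_node(logical_apicid));
		/* The physical id has nothing in bits 4-7, so node 0. */
		printf("physical: node %d\n", apicid_to_node(hard_apicid));
		return 0;
	}

With a physical id every cpu shifts down to 0, which is exactly the
all-cpus-on-node-0 symptom above.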

I am expecting this to help with the performance problem too.  But we'll
have to wait for the test results to propagate out to TKO to be sure.

Keith, even if this doesn't fix the performance regression there is
certainly an unexpected side effect to this change on our system here.

-apw


* Re: -mm numa perf regression
From: keith mannthey @ 2006-09-11 18:51 UTC
  To: Andy Whitcroft
  Cc: Andrew Morton, linux-mm, Christoph Lameter, Martin Bligh, Paul Jackson

On Mon, 2006-09-11 at 18:16 +0100, Andy Whitcroft wrote:
> Andy Whitcroft wrote:
> > Andrew Morton wrote:
> >> On Mon, 11 Sep 2006 15:19:01 +0100
> >> Andy Whitcroft <apw@shadowen.org> wrote:
> >>
> >>> Christoph Lameter wrote:
> >>>> On Fri, 8 Sep 2006, Andy Whitcroft wrote:
> >>>>
> >>>>>> I have not heard back from you on this issue. It would be good to have 
> >>>>>> some more data on this one.
> >>>>> Sorry, I submitted the tests and the results filtered out to TKO, and
> >>>>> then I forgot to check them.  Looking at the graph, backing this out has
> >>>>> had no effect, as I think we'd expect from what comes below.
> >>>>>
> >>>>> What next?
> >>>> Get me the promised data? /proc/zoneinfo before and after the run. 
> >>>> /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
> >>> Sorry for the delay, the relevant files weren't all being preserved.
> >>> Fixed that up and reran things.  The results you asked for are available
> >>> here:
> >>>
> >>>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
> >>>
> >>> Just having a quick look at the results, it seems that they are saying
> >>> that all of our CPUs are in node 0, which isn't right at all.  The
> >>> machine has 4 processors per node.
> >>>
> >>> I am sure that would account for the performance loss.  Now as to why ...
> >>>
> >>>> Is there a way to remotely access the box?
> >>> Sadly no ... I do have direct access to test on the box but am not able
> >>> to export it.
> >>>
> >>> I've also started a bisection looking for it, though that will be some
> >>> time yet as I've only just dropped the cleaver for the first time.
> >>>
> >> I've added linux-mm.  Can we please keep it on-list.  I have a vague suspicion
> >> that your bisection will end up pointing at one Mel Gorman.  Or someone else.
> >> But whoever it is will end up wondering wtf is going on.
> >>
> >> I don't understand what you mean by "all of our CPUs are in node 0"?
> >> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
> >> and
> >> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
> >> look the same..  It depends what "before" and "after" mean, I guess...
> > 
> > What I have noted in this output is that all of the CPUs in this
> > machine have been assigned to node 0; this is incorrect because there
> > are four nodes of four CPUs each.
> > 
> > The before and after refer to either side of the test run showing the
> > regression.  Of course the CPU assignments are static and thus the
> > same in both.

The before and after data both seem to show 4 nodes.  Maybe I am
missing context here.

> For those who missed the history: we have been tracking a performance
> regression on kernbench on some NUMA systems.  In the process of
> analysing that we've noticed that all of the CPUs in the system are
> being bound to node 0 rather than their home nodes.  This is caused by
> the changes in:

That isn't good. 

> convert-i386-summit-subarch-to-use-srat-info-for-apicid_to_node-calls.patch
> 
> @@ -647,7 +649,7 @@ static void map_cpu_to_logical_apicid(vo
>         int apicid = logical_smp_processor_id();
> 
>         cpu_2_logical_apicid[cpu] = apicid;
> -       map_cpu_to_node(cpu, apicid_to_node(apicid));
> +       map_cpu_to_node(cpu, apicid_to_node(hard_smp_processor_id()));
>  }
> 
> This change moves this mapping from logical to physical APIC id, which
> the sub-architectures are not expecting.  I've just booted a machine
> with this patch (and its -tidy) backed out.  The processors are again
> assigned to the right places.


> I am expecting this to help with the performance problem too.  But we'll
> have to wait for the test results to propagate out to TKO to be sure.
> 
> Keith, even if this doesn't fix the performance regression there is
> certainly an unexpected side effect to this change on our system here.

Hmm, I tested this on x440, x445 and x460; summit was fine at the
time...  Is there a dmesg around for the failed boot?  Is this from your
16-way x440 or is this a NUMA-Q (moe is NUMA-Q, right?) breakage?

I can push the map_cpu_to_node setup into the subarch code if this is
NUMA-Q breakage.  Summit apicid_to_node mappings are in physical ids
(as defined by the SRAT, hence the change in lookup).  I am building
current -mm on a multi-node summit system to see if I can work
something out.
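
For comparison, a rough sketch of the two conventions as I read them;
the table name and bounds below are illustrative, not the real symbols:

	/* Summit-style: indexed by *physical* apicid, filled from the
	 * ACPI SRAT at boot (illustrative table, not the real symbol). */
	int apicid_2_node[256];

	static int summit_apicid_to_node(int hard_apicid)
	{
		return apicid_2_node[hard_apicid];
	}

	/* NUMA-Q-style: the node number lives in the *logical* apicid,
	 * so a physical id means nothing here. */
	static int numaq_apicid_to_node(int logical_apicid)
	{
		return logical_apicid >> 4;
	}

So generic code that passes hard_smp_processor_id() suits a
summit-style table but hands NUMA-Q an id with no quad number in it.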

Thanks,
 Keith 



* Re: -mm numa perf regression
From: Andy Whitcroft @ 2006-09-11 21:33 UTC
  To: Andy Whitcroft
  Cc: Keith Mannthey, Andrew Morton, linux-mm, Christoph Lameter,
	Martin Bligh, Paul Jackson

Andy Whitcroft wrote:
> Andy Whitcroft wrote:
>> Andrew Morton wrote:
>>> On Mon, 11 Sep 2006 15:19:01 +0100
>>> Andy Whitcroft <apw@shadowen.org> wrote:
>>>
>>>> Christoph Lameter wrote:
>>>>> On Fri, 8 Sep 2006, Andy Whitcroft wrote:
>>>>>
>>>>>>> I have not heard back from you on this issue. It would be good to have 
>>>>>>> some more data on this one.
>>>>>> Sorry, I submitted the tests and the results filtered out to TKO, and
>>>>>> then I forgot to check them.  Looking at the graph, backing this out has
>>>>>> had no effect, as I think we'd expect from what comes below.
>>>>>>
>>>>>> What next?
>>>>> Get me the promised data? /proc/zoneinfo before and after the run. 
>>>>> /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
>>>> Sorry for the delay, the relevant files weren't all being preserved.
>>>> Fixed that up and reran things.  The results you asked for are available
>>>> here:
>>>>
>>>>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
>>>>
>>>> Just having a quick look at the results, it seems that they are saying
>>>> that all of our CPUs are in node 0, which isn't right at all.  The
>>>> machine has 4 processors per node.
>>>>
>>>> I am sure that would account for the performance loss.  Now as to why ...
>>>>
>>>>> Is there a way to remotely access the box?
>>>> Sadly no ... I do have direct access to test on the box but am not able
>>>> to export it.
>>>>
>>>> I've also started a bisection looking for it, though that will be some
>>>> time yet as I've only just dropped the cleaver for the first time.
>>>>
>>> I've added linux-mm.  Can we please keep it on-list.  I have a vague suspicion
>>> that your bisection will end up pointing at one Mel Gorman.  Or someone else.
>>> But whoever it is will end up wondering wtf is going on.
>>>
>>> I don't understand what you mean by "all of our CPUs are in node 0"?
>>> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
>>> and
>>> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
>>> look the same..  It depends what "before" and "after" mean, I guess...
>> What I have noted in this output is that all of the CPUs in this
>> machine have been assigned to node 0; this is incorrect because there
>> are four nodes of four CPUs each.
>>
>> The before and after refer to either side of the test run showing the
>> regression.  Of course the CPU assignments are static and thus the
>> same in both.
> 
> For those who missed the history: we have been tracking a performance
> regression on kernbench on some NUMA systems.  In the process of
> analysing that we've noticed that all of the CPUs in the system are
> being bound to node 0 rather than their home nodes.  This is caused by
> the changes in:
> 
> convert-i386-summit-subarch-to-use-srat-info-for-apicid_to_node-calls.patch
> 
> @@ -647,7 +649,7 @@ static void map_cpu_to_logical_apicid(vo
>         int apicid = logical_smp_processor_id();
> 
>         cpu_2_logical_apicid[cpu] = apicid;
> -       map_cpu_to_node(cpu, apicid_to_node(apicid));
> +       map_cpu_to_node(cpu, apicid_to_node(hard_smp_processor_id()));
>  }
> 
> This change moves this mapping from logical to physical APIC id, which
> the sub-architectures are not expecting.  I've just booted a machine
> with this patch (and its -tidy) backed out.  The processors are again
> assigned to the right places.
> 
> I am expecting this to help with the performance problem too.  But we'll
> have to wait for the test results to propagate out to TKO to be sure.
> 
> Keith, even if this doesn't fix the performance regression there is
> certainly an unexpected side effect to this change on our system here.

Ok, the results are out there, and this layout imbalance is the cause;
performance is back to that of mainline.

-apw


* Re: -mm numa perf regression
From: Andy Whitcroft @ 2006-09-11 21:36 UTC
  To: kmannth
  Cc: Andrew Morton, linux-mm, Christoph Lameter, Martin Bligh, Paul Jackson

keith mannthey wrote:
> On Mon, 2006-09-11 at 18:16 +0100, Andy Whitcroft wrote:
>> Andy Whitcroft wrote:
>>> Andrew Morton wrote:
>>>> On Mon, 11 Sep 2006 15:19:01 +0100
>>>> Andy Whitcroft <apw@shadowen.org> wrote:
>>>>
>>>>> Christoph Lameter wrote:
>>>>>> On Fri, 8 Sep 2006, Andy Whitcroft wrote:
>>>>>>
>>>>>>>> I have not heard back from you on this issue. It would be good to have 
>>>>>>>> some more data on this one.
>>>>>>> Sorry, I submitted the tests and the results filtered out to TKO, and
>>>>>>> then I forgot to check them.  Looking at the graph, backing this out has
>>>>>>> had no effect, as I think we'd expect from what comes below.
>>>>>>>
>>>>>>> What next?
>>>>>> Get me the promised data? /proc/zoneinfo before and after the run. 
>>>>>> /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
>>>>> Sorry for the delay, the relevant files weren't all being preserved.
>>>>> Fixed that up and reran things.  The results you asked for are available
>>>>> here:
>>>>>
>>>>>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
>>>>>
>>>>> Just having a quick look at the results, it seems that they are saying
>>>>> that all of our CPUs are in node 0, which isn't right at all.  The
>>>>> machine has 4 processors per node.
>>>>>
>>>>> I am sure that would account for the performance loss.  Now as to why ...
>>>>>
>>>>>> Is there a way to remotely access the box?
>>>>> Sadly no ... I do have direct access to test on the box but am not able
>>>>> to export it.
>>>>>
>>>>> I've also started a bisection looking for it, though that will be some
>>>>> time yet as I've only just dropped the cleaver for the first time.
>>>>>
>>>> I've added linux-mm.  Can we please keep it on-list.  I have a vague suspicion
>>>> that your bisection will end up pointing at one Mel Gorman.  Or someone else.
>>>> But whoever it is will end up wondering wtf is going on.
>>>>
>>>> I don't understand what you mean by "all of our CPUs are in node 0"?
>>>> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
>>>> and
>>>> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
>>>> look the same..  It depends what "before" and "after" mean, I guess...
>>> What I have noted in this output is that all of the CPUs in this
>>> machine have been assigned to node 0; this is incorrect because there
>>> are four nodes of four CPUs each.
>>>
>>> The before and after refer to either side of the test run showing the
>>> regression.  Of course the CPU assignments are static and thus the
>>> same in both.
> 
> The before and after data both seem to show 4 nodes.  Maybe I am
> missing context here.
> 
>> For those who missed the history: we have been tracking a performance
>> regression on kernbench on some NUMA systems.  In the process of
>> analysing that we've noticed that all of the CPUs in the system are
>> being bound to node 0 rather than their home nodes.  This is caused by
>> the changes in:
> 
> That isn't good. 
> 
>> convert-i386-summit-subarch-to-use-srat-info-for-apicid_to_node-calls.patch
>>
>> @@ -647,7 +649,7 @@ static void map_cpu_to_logical_apicid(vo
>>         int apicid = logical_smp_processor_id();
>>
>>         cpu_2_logical_apicid[cpu] = apicid;
>> -       map_cpu_to_node(cpu, apicid_to_node(apicid));
>> +       map_cpu_to_node(cpu, apicid_to_node(hard_smp_processor_id()));
>>  }
>>
>> This change moves this mapping from logical to physical APIC id, which
>> the sub-architectures are not expecting.  I've just booted a machine
>> with this patch (and its -tidy) backed out.  The processors are again
>> assigned to the right places.
> 
> 
>> I am expecting this to help with the performance problem too.  But we'll
>> have to wait for the test results to propagate out to TKO to be sure.
>>
>> Keith, even if this doesn't fix the performance regression there is
>> certainly an unexpected side effect to this change on our system here.
> 
> Hmm, I tested this on x440, x445 and x460; summit was fine at the
> time...  Is there a dmesg around for the failed boot?  Is this from your
> 16-way x440 or is this a NUMA-Q (moe is NUMA-Q, right?) breakage?
> 
> I can push the map_cpu_to_node setup into the subarch code if this is
> NUMA-Q breakage.  Summit apicid_to_node mappings are in physical ids
> (as defined by the SRAT, hence the change in lookup).  I am building
> current -mm on a multi-node summit system to see if I can work
> something out.

Yes, this is NUMA-Q.  I think the key point here is that the lookup is
logical on all the other sub-arches, so I'd expect to see a logical to
physical conversion in the subarch.
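
Something along these lines in the summit subarch would keep the
generic call logical everywhere.  This is only a sketch of the shape I
mean: logical_to_hard_apicid() is a made-up helper, and apicid_2_node
stands in for whatever table the SRAT parsing really fills:

	/* Hypothetical: summit takes the logical id the generic code
	 * hands it and converts back to the physical id its
	 * SRAT-derived table is indexed by. */
	static int summit_apicid_to_node(int logical_apicid)
	{
		int hard_apicid = logical_to_hard_apicid(logical_apicid);

		return apicid_2_node[hard_apicid];
	}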

You have direct access to the test data if you need it.  The key bit of
the failure showed CPUs being assigned to node 0, as per the debug in
that routine.

-apw

