From: keith mannthey <kmannth@us.ibm.com>
To: Andy Whitcroft <apw@shadowen.org>
Cc: Andrew Morton <akpm@osdl.org>, linux-mm <linux-mm@kvack.org>,
	Christoph Lameter <clameter@sgi.com>,
	Martin Bligh <mbligh@google.com>, Paul Jackson <pj@sgi.com>
Subject: Re: -mm numa perf regression
Date: Mon, 11 Sep 2006 11:51:27 -0700
Message-ID: <1158000687.5755.50.camel@keithlap>
In-Reply-To: <450599F4.4050707@shadowen.org>

On Mon, 2006-09-11 at 18:16 +0100, Andy Whitcroft wrote:
> Andy Whitcroft wrote:
> > Andrew Morton wrote:
> >> On Mon, 11 Sep 2006 15:19:01 +0100
> >> Andy Whitcroft <apw@shadowen.org> wrote:
> >>
> >>> Christoph Lameter wrote:
> >>>> On Fri, 8 Sep 2006, Andy Whitcroft wrote:
> >>>>
> >>>>>> I have not heard back from you on this issue. It would be good to have 
> >>>>>> some more data on this one.
> >>>>> Sorry, I submitted the tests and the results filtered out to TKO, and
> >>>>> then I forgot to check them.  Looking at the graph, backing this out
> >>>>> has had no effect.  As I think we'd expect from what comes below.
> >>>>>
> >>>>> What next?
> >>>> Get me the promised data? /proc/zoneinfo before and after the run. 
> >>>> /proc/meminfo and /sys/devices/system/node/node*/* would be helpful.
> >>> Sorry for the delay, the relevant files weren't all being preserved.
> >>> Fixed that up and reran things.  The results you asked for are available
> >>> here:
> >>>
> >>>     http://www.shadowen.org/~apw/public/debug-moe-perf/47138/
> >>>
> >>> Just having a quick look at the results, they seem to be saying
> >>> that all of our CPUs are in node 0, which isn't right at all.  The
> >>> machine has 4 processors per node.
> >>>
> >>> I am sure that would account for the performance loss.  Now as to why ...
> >>>
> >>>> Is there a way to remotely access the box?
> >>> Sadly no ... I do have direct access to test on the box but am not able
> >>> to export it.
> >>>
> >>> I've also started a bisection looking for it.  Though that will be some
> >>> time yet as I've only just dropped the cleaver for the first time.
> >>>
> >> I've added linux-mm.  Can we please keep it on-list?  I have a vague suspicion
> >> that your bisection will end up pointing at one Mel Gorman.  Or someone else.
> >> But whoever it is will end up wondering wtf is going on.
> >>
> >> I don't understand what you mean by "all of our CPUs are in node 0".
> >> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.after/node0/
> >> and
> >> http://www.shadowen.org/~apw/public/debug-moe-perf/47138/sys/devices/system/node.before/node0/
> >> look the same.  It depends on what "before" and "after" mean, I guess...
> > 
> > What I have noted in this output is that all of the CPUs in this
> > machine have been assigned to node 0; this is incorrect because there
> > are four nodes of four CPUs each.
> > 
> > The before and after refer to either side of the test showing the
> > regression.  Of course the cpu files are static and thus the same.

The before and after data both seem to have 4 nodes.  Maybe I am
missing context here.

> For those who missed the history: we have been tracking a performance
> regression on kernbench on some NUMA systems.  In the process of
> analysing that, we've noticed that all of the CPUs in the system are
> being bound to node 0 rather than their home nodes.  This is caused by
> the changes in:

That isn't good. 

> convert-i386-summit-subarch-to-use-srat-info-for-apicid_to_node-calls.patch
> 
> @@ -647,7 +649,7 @@ static void map_cpu_to_logical_apicid(vo
>         int apicid = logical_smp_processor_id();
> 
>         cpu_2_logical_apicid[cpu] = apicid;
> -       map_cpu_to_node(cpu, apicid_to_node(apicid));
> +       map_cpu_to_node(cpu, apicid_to_node(hard_smp_processor_id()));
>  }
> This change moves this mapping from logical to physical APIC id, which
> the sub-architectures are not expecting.  I've just booted a machine
> with this patch (and its -tidy) backed out.  The processors are again
> assigned to the right places.
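
Spelling out what that hunk changes (a sketch, not verbatim kernel
code; the only difference is which id gets fed to apicid_to_node()):

	int cpu = smp_processor_id();
	int apicid = logical_smp_processor_id();

	cpu_2_logical_apicid[cpu] = apicid;

	/* before: the node table is indexed by the *logical* APIC id */
	map_cpu_to_node(cpu, apicid_to_node(apicid));

	/* after: it is indexed by the *physical* APIC id instead */
	map_cpu_to_node(cpu, apicid_to_node(hard_smp_processor_id()));

A subarch whose apicid_to_node table is keyed by logical ids would now
look up the wrong slots, which fits every cpu landing in node 0.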


> I am expecting this to help with the performance problem too.  But we'll
> have to wait for the test results to propagate out to TKO to be sure.
> 
> Keith, even if this doesn't fix the performance regression, there is
> certainly an unexpected side effect to this change on our system here.

Hmm, I tested this on x440, x445, and x460; Summit was fine at the
time...  Is there a dmesg around for the failed boot?  Is this from your
16-way x440, or is this NUMA-Q breakage (moe is a NUMA-Q, right?)?

I can push the map_cpu_to_node setup into the subarch code if this is
NUMA-Q breakage.  The Summit apicid_to_node mappings are in physical
APIC ids (as defined by the SRAT, hence the change in lookup).  I am
building current -mm on a multi-node Summit system to see if I can work
something out.
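
One way that could look (a rough sketch only; subarch_cpu_to_node() is
a hypothetical hook, not an actual patch):

	/* generic smpboot code stops choosing the id itself ... */
	static void map_cpu_to_logical_apicid(void)
	{
		int cpu = smp_processor_id();
		int apicid = logical_smp_processor_id();

		cpu_2_logical_apicid[cpu] = apicid;
		map_cpu_to_node(cpu, subarch_cpu_to_node(apicid));
	}

	/* ... and each subarch supplies the lookup it expects: */

	/* mach-summit: SRAT node tables are keyed by physical APIC id */
	static inline int subarch_cpu_to_node(int logical_apicid)
	{
		return apicid_to_node(hard_smp_processor_id());
	}

	/* mach-numaq: node tables are keyed by the logical APIC id */
	static inline int subarch_cpu_to_node(int logical_apicid)
	{
		return apicid_to_node(logical_apicid);
	}

Each subarch_cpu_to_node() would live in its own mach-*/ header, so only
one definition is visible to the generic code at a time.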

Thanks,
 Keith 


