From: "Tim Pepper" <lnxninja@us.ibm.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>,
ak@suse.de, linux-mm@kvack.org, linuxppc-dev@ozlabs.org,
agl@us.ibm.com
Subject: Re: libnuma interleaving oddness
Date: Tue, 29 Aug 2006 22:40:54 -0700
Message-ID: <eada2a070608292240l21794824v4f127c0ae4b4758f@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.64.0608292123230.23009@schroedinger.engr.sgi.com>
On 8/29/06, Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 29 Aug 2006, Nishanth Aravamudan wrote:
>
> > If I use the default hugepage-aligned hugepage-backed malloc
> > replacement, I get the following in /proc/pid/numa_maps (excerpt):
> >
> > 20000000 interleave=0-7 file=/libhugetlbfs/libhugetlbfs.tmp.3JbO7R\040(deleted) huge dirty=1 N0=1
> > 21000000 interleave=0-7 file=/libhugetlbfs/libhugetlbfs.tmp.3JbO7R\040(deleted) huge dirty=1 N0=1
> > ...
> > 37000000 interleave=0-7 file=/libhugetlbfs/libhugetlbfs.tmp.3JbO7R\040(deleted) huge dirty=1 N0=1
> > 38000000 interleave=0-7 file=/libhugetlbfs/libhugetlbfs.tmp.3JbO7R\040(deleted) huge dirty=1 N0=1
>
> Is this with nodemask set to [0]?
The above is with a nodemask of 0-7. Just removing node 0 from the mask causes
interleaving to start as below:
> > If I change the nodemask to 1-7, I get:
> >
> > 20000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N1=1
> > 21000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N2=1
> > 22000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N3=1
> > 23000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N4=1
> > 24000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N5=1
> > 25000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N6=1
> > 26000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N7=1
> > ...
> > 35000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N1=1
> > 36000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N2=1
> > 37000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N3=1
> > 38000000 interleave=1-7 file=/libhugetlbfs/libhugetlbfs.tmp.Eh9Bmp\040(deleted) huge dirty=1 N4=1
>
> So interleave has an effect.
>
> Are you using cpusets? Or are you only using memory policies? What is the
> default policy of the task you are running?
Just memory policies with the default task policy...really simple
code. The current incantation basically does setup in the form of:

    numa_available();
    nodemask_zero(&nodemask);
    for (i = 0; i <= maxnode; i++)
        nodemask_set(&nodemask, i);

and then creates mmaps followed by:

    numa_interleave_memory(p, size, &nodemask);
    mlock(p, size);
    munlock(p, size);

to get the page faulted in.
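
For readers who want to try the fault-in step in isolation, here is a minimal sketch of the mlock()/munlock() trick. It is not the test program discussed in this thread: the helper names fault_in() and resident_pages() are made up for illustration, the touch-each-page fallback is an addition for systems where mlock() is refused, and the numa_interleave_memory() call is left out so that the sketch builds without libnuma.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fault in every page of [p, p + size).
 * mlock() has to make the range resident before it returns, and
 * munlock() drops the lock again while leaving the pages in place.
 * Returns 0 on success, -1 if munlock() fails. */
static int fault_in(void *p, size_t size)
{
    if (mlock(p, size) == 0)
        return munlock(p, size);

    /* Fallback (an assumption, not from the original mail): mlock() was
     * refused, e.g. by RLIMIT_MEMLOCK, so touch one byte per page. */
    long page = sysconf(_SC_PAGESIZE);
    volatile char *c = p;
    for (size_t off = 0; off < size; off += (size_t)page)
        c[off] = c[off];   /* read and write back the same value */
    return 0;
}

/* Hypothetical helper: count resident pages in the range with
 * mincore(). Returns -1 on error. */
static long resident_pages(void *p, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t n = (size + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(n);
    long count = -1;

    if (vec != NULL && mincore(p, size, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++)
            count += vec[i] & 1;   /* bit 0 set means resident */
    }
    free(vec);
    return count;
}
```

On a real NUMA box the per-node placement would then be read from /proc/pid/numa_maps, as in the excerpts quoted above; mincore() only confirms that the pages were faulted in, not where they landed.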
> Hmm... Strange. Interleaving should continue after the last one....
That's what we thought...good to know we're not crazy. We've spent a
lot of time looking at libnuma and the userspace side of things trying
to figure out if we were somehow passing an invalid nodemask into the
kernel, but we've pretty well convinced ourselves that is not the
case. The kernel side of things (e.g. the sys_mbind() codepath) isn't
exactly obvious...code inspection's been a bit gruelling...we need to do
kernel-side probing to see what codepaths we're actually hitting.

An interesting additional point: Nish's code originally wasn't using
libnuma and I wrote a simple little mmapping test program using
libnuma to compare results (thinking userspace issue). My code worked
fine. He rewrote to use libnuma and I rewrote to not use libnuma
thinking we'd find the problem in between. Yet my code still gets
interleaving and his does not. The only real difference between our
code is that mine basically does:

    mmap(...many hugepages...)

and Nish's effectively is doing:

    foreach(1..n) { mmap(...many/n hugepages...) }

if that pseudocode makes sense. As above, when he changes his mmap to
grab more than one hugepage of memory at a time he starts seeing
interleaving.
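
The two call patterns in the pseudocode can be sketched in C roughly as follows. This is only an illustration of the shape of the calls: map_once() and map_each() are hypothetical names, and anonymous mappings stand in for the hugetlbfs-backed ones used in the real tests, so the sketch does not reproduce the interleaving behaviour itself (the kernel may also merge adjacent anonymous mappings, which it would not do with the separate file mappings described here).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Pattern A: a single mmap() covering n chunks.
 * Returns the base address, or NULL on failure. */
static void *map_once(size_t n, size_t chunk)
{
    void *p = mmap(NULL, n * chunk, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Pattern B: n separate mmap() calls of one chunk each.
 * Stores each base address in out[] and returns how many
 * mappings were created (n if every call succeeded). */
static size_t map_each(size_t n, size_t chunk, void **out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        void *p = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        out[i] = p;
    }
    return i;
}
```

In the real test programs each mapping returned by either pattern would then get the numa_interleave_memory() and mlock()/munlock() calls shown earlier in this mail, with chunk set to the hugepage size.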
Tim