Re: NUMA aware slab allocator V3

linux-mm.kvack.org archive mirror

From: Dave Hansen <haveblue@us.ibm.com>
To: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Martin J. Bligh" <mbligh@mbligh.org>,
	Christoph Lameter <clameter@engr.sgi.com>,
	Andy Whitcroft <apw@shadowen.org>, Andrew Morton <akpm@osdl.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	shai@scalex86.org, steiner@sgi.com
Subject: Re: NUMA aware slab allocator V3
Date: Mon, 16 May 2005 14:54:30 -0700
Message-ID: <1116280470.1005.137.camel@localhost>
In-Reply-To: <200505161410.43382.jbarnes@virtuousgeek.org>

On Mon, 2005-05-16 at 14:10 -0700, Jesse Barnes wrote:
> On Monday, May 16, 2005 11:08 am, Martin J. Bligh wrote:
> > > I have never seen such a machine. A SMP machine with multiple
> > > "nodes"? So essentially one NUMA node has multiple discontig
> > > "nodes"?
> >
> > I believe you (SGI) make one ;-) Anywhere where you have large gaps
> > in the physical address range within a node, this is what you really
> > need. Except ia64 has this weird virtual mem_map thing that can go
> > away once we have sparsemem.
> 
> Right, the SGI boxes have discontiguous memory within a node, but it's 
> not represented by pgdats (like you said, one 'virtual memmap' spans 
> the whole address space of a node).  Sparse can help simplify this 
> across platforms, but has the potential to be more expensive for 
> systems with dynamically sized holes, due to the additional calculation 
> and potential cache miss associated with indexing into the correct 
> memmap (Dave can probably correct me here, it's been a while).  With a 
> virtual memmap, you only occasionally take a TLB miss on the struct 
> page access after indexing into the array.

The sparsemem calculation costs are quite low.  One of the main costs is
bringing the actual 'struct page' into the cache so you can use the
hints in page->flags.  In reality, after almost every pfn_to_page(), you
go ahead and touch the 'struct page' anyway.  So, this cost is
effectively zero.  In fact, it's kinda like doing a prefetch, so it may
even speed some things up.

After you have the section index from page->flags (which costs just a
shift and a mask), you access into a static array, and do a single
subtraction.  Here's the i386 disassembly of this function with
SPARSEMEM=y:

        unsigned long page_to_pfn_stub(struct page *page)
        {
                return page_to_pfn(page);
        }

    1c30:       8b 54 24 04             mov    0x4(%esp),%edx
    1c34:       8b 02                   mov    (%edx),%eax
    1c36:       c1 e8 1a                shr    $0x1a,%eax
    1c39:       8b 04 85 00 00 00 00    mov    0x0(,%eax,4),%eax
    1c40:       24 fc                   and    $0xfc,%al
    1c42:       29 c2                   sub    %eax,%edx
    1c44:       c1 fa 05                sar    $0x5,%edx
    1c47:       89 d0                   mov    %edx,%eax
    1c49:       c3                      ret

Other than loading the argument from the stack, I think there are only
two loads in there: the page->flags load, and the mem_section[]
dereference.  So, in the end, the only advantage of the vmem_map[]
approach is saving that _one_ load.  The worst-case-scenario for this
load in the sparsemem case is a full cache miss.  The worst case in the
vmem_map[] case is a TLB miss, which is probably hundreds of times
slower than even a full cache miss.
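
For anyone who wants to map the disassembly back to C, here is a minimal
user-space sketch of the lookup it appears to perform.  It is a model, not
the kernel source: the names (section_init, model_page_to_pfn,
model_pfn_to_page, flat_page_to_pfn, PFNS_PER_SECTION), the 64-section
layout and the 32-byte struct page are assumptions read off the
"shr $0x1a", "and $0xfc" and "sar $0x5" instructions above.  The last
function models the flat vmem_map[] alternative for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the top 6 bits of a 32-bit page->flags hold the section
 * index, which matches the "shr $0x1a" (shift by 26) in the disassembly. */
#define SECTIONS_SHIFT    6
#define NR_SECTIONS       (1u << SECTIONS_SHIFT)
#define SECTION_IDX_SHIFT (32 - SECTIONS_SHIFT)
#define PFNS_PER_SECTION  1024UL           /* toy value, hypothetical */
#define SECTION_FLAG_MASK ((uintptr_t)3)   /* low bits cleared by "and $0xfc" */

/* Stand-in for struct page, padded to 32 bytes to match "sar $0x5". */
struct page {
	uint32_t flags;
	uint32_t pad[7];
};
_Static_assert(sizeof(struct page) == 32, "model assumes a 32-byte struct page");

/* One word per section: the section's memmap address biased by the
 * section's first pfn, with status bits kept in the two low bits. */
static uintptr_t mem_section[NR_SECTIONS];

static void section_init(unsigned int idx, struct page *map)
{
	uintptr_t start_pfn = (uintptr_t)idx * PFNS_PER_SECTION;
	unsigned long i;

	assert(idx < NR_SECTIONS);
	/* Bias the base so that (page - base) yields a global pfn directly. */
	mem_section[idx] = ((uintptr_t)map - start_pfn * sizeof(struct page)) | 1;
	for (i = 0; i < PFNS_PER_SECTION; i++)
		map[i].flags = (uint32_t)idx << SECTION_IDX_SHIFT;
}

/* Two loads (page->flags, mem_section[]), a mask, a subtract, a divide. */
static unsigned long model_page_to_pfn(const struct page *page)
{
	unsigned int idx = page->flags >> SECTION_IDX_SHIFT;
	uintptr_t base = mem_section[idx] & ~SECTION_FLAG_MASK;

	return (unsigned long)(((uintptr_t)page - base) / sizeof(struct page));
}

/* Reverse direction: the section index comes from the pfn itself. */
static struct page *model_pfn_to_page(unsigned long pfn)
{
	uintptr_t base = mem_section[pfn / PFNS_PER_SECTION] & ~SECTION_FLAG_MASK;

	return (struct page *)(base + pfn * sizeof(struct page));
}

/* Flat virtual-memmap model: a single subtraction and no table load. */
static unsigned long flat_page_to_pfn(const struct page *page,
				      const struct page *vmem_map)
{
	return (unsigned long)(page - vmem_map);
}
```

The biased base is kept as an integer (uintptr_t) rather than a pointer so
the model never forms an out-of-bounds pointer; the unsigned arithmetic
wraps and still produces the right pfn.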

BTW, the object footprint of sparsemem is lower than discontigmem, too:

                SPARSEMEM       DISCONTIGMEM
pfn_to_page:    25 bytes        41 bytes
page_to_pfn:    25 bytes        33 bytes

So, that helps out things like icache footprint.

-- Dave

