From: Dave Hansen <haveblue@us.ibm.com>
To: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	shai@scalex86.org, steiner@sgi.com
Subject: Re: NUMA aware slab allocator V3
Date: Mon, 16 May 2005 10:22:15 -0700
Message-ID: <1116264135.1005.73.camel@localhost>
In-Reply-To: <Pine.LNX.4.62.0505160943140.1330@schroedinger.engr.sgi.com>

On Mon, 2005-05-16 at 09:47 -0700, Christoph Lameter wrote:
> On Mon, 16 May 2005, Dave Hansen wrote:
> > There are some broken assumptions in the kernel that
> > CONFIG_DISCONTIG==CONFIG_NUMA.  These usually manifest when code assumes
> > that one pg_data_t means one NUMA node.
> > 
> > However, NUMA node ids are actually distinct from "discontigmem nodes".
> > A "discontigmem node" is just one physically contiguous area of memory,
> > thus one pg_data_t.  Some (non-NUMA) Mac G5's have a gap in their
> > address space, so they get two discontigmem nodes.
> 
> I thought the discontiguous memory in one node was handled through zones?
> I.e. ZONE_HIGHMEM in i386?

You can only have one zone of each type under each pg_data_t.  For
instance, you can't properly represent (DMA, NORMAL, HIGHMEM, <GAP>,
HIGHMEM) in a single pg_data_t without wasting node_mem_map[] space.
The "proper" discontig way of representing that is like this:

        pg_data_t[0] (DMA, NORMAL, HIGHMEM)
        <GAP>
        pg_data_t[1] (---, ------, HIGHMEM)

Where pg_data_t[1] has empty DMA and NORMAL zones.  Also, remember that
both of these could theoretically be on the same NUMA node.  But, I
don't think we ever do that in practice.
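
A minimal sketch of the layout above, written as a user-space C model rather
than kernel code. The struct names, the numa_node field, and every page frame
number are assumptions made up for illustration; they are not the kernel's
definitions. It only shows why one zone slot per type forces a second
pg_data_t, and how many mem_map entries the gap would cost if both ranges
shared one map.

```c
#include <limits.h>

/* Simplified model only: NOT the kernel's zone or pg_data_t definitions. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

struct zone_model {
    unsigned long start_pfn;   /* first page frame number in the zone */
    unsigned long nr_pages;    /* 0 means the zone is empty */
};

struct pgdat_model {
    struct zone_model zones[MAX_NR_ZONES]; /* exactly one slot per zone type */
    int numa_node;                         /* hypothetical: owning NUMA node */
};

/*
 * (DMA, NORMAL, HIGHMEM) <GAP> (---, ------, HIGHMEM)
 * All page frame numbers below are invented for the example.
 */
const struct pgdat_model pgdat_list[2] = {
    {
        .zones = {
            [ZONE_DMA]     = { 0,      4096   },
            [ZONE_NORMAL]  = { 4096,   225280 },
            [ZONE_HIGHMEM] = { 229376, 32768  },
        },
        .numa_node = 0,
    },
    {
        /* DMA and NORMAL stay zero-initialized, i.e. empty. */
        .zones = { [ZONE_HIGHMEM] = { 524288, 262144 } },
        .numa_node = 0,   /* same NUMA node as pgdat_list[0] */
    },
};

/* Lowest pfn covered by any non-empty zone of this pgdat. */
static unsigned long pgdat_start_pfn(const struct pgdat_model *p)
{
    unsigned long start = ULONG_MAX;
    for (int z = 0; z < MAX_NR_ZONES; z++)
        if (p->zones[z].nr_pages && p->zones[z].start_pfn < start)
            start = p->zones[z].start_pfn;
    return start;
}

/* One past the highest pfn covered by any non-empty zone of this pgdat. */
static unsigned long pgdat_end_pfn(const struct pgdat_model *p)
{
    unsigned long end = 0;
    for (int z = 0; z < MAX_NR_ZONES; z++) {
        unsigned long e = p->zones[z].start_pfn + p->zones[z].nr_pages;
        if (p->zones[z].nr_pages && e > end)
            end = e;
    }
    return end;
}

/* Page frames in the hole between two pgdats; a single shared map
 * would need (unused) entries for all of them. */
unsigned long gap_pages(const struct pgdat_model *lo,
                        const struct pgdat_model *hi)
{
    return pgdat_start_pfn(hi) - pgdat_end_pfn(lo);
}
```

With these made-up numbers, gap_pages(&pgdat_list[0], &pgdat_list[1]) is the
number of map entries a single pg_data_t would have to carry for the hole,
while both entries still report the same numa_node.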

> > So, that #error is bogus.  It's perfectly valid to have multiple
> > discontigmem nodes, when the number of NUMA nodes is 1.  MAX_NUMNODES
> > refers to discontigmem nodes, not NUMA nodes.
> 
> Ok. We looked through the code and saw that the check may be removed 
> without causing problems. However, there is still a feeling of uneasiness 
> about this.

I don't blame you :)

> To what node does numa_node_id() refer?

That refers to the NUMA node that you're thinking of: CPUs, memory,
and I/O that are close to each other, etc...

> And it is legit to use 
> numa_node_id() to index cpu maps and stuff?

Yes, those are all NUMA nodes.

> How do the concepts of numa node id relate to discontig node ids?

I believe there are quite a few assumptions on some architectures that,
when NUMA is on, they are equivalent.  It appears to be pretty much
assumed everywhere that CONFIG_NUMA=y means one pg_data_t per NUMA node.

Remember, as you saw, you can't assume that MAX_NUMNODES=1 when NUMA=n
because of the DISCONTIG=y case.

So, in summary, if you want to do it right: use the
CONFIG_NEED_MULTIPLE_NODES that you see in -mm.  As plain DISCONTIG=y
gets replaced by sparsemem, any code using it is likely to keep
working.
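
A hedged sketch of what keying on CONFIG_NEED_MULTIPLE_NODES instead of
CONFIG_NUMA could look like. This is a user-space illustration, not code from
the slab patch: SKETCH_MAX_PGDATS, struct per_pgdat_state, and the sketch_*
helpers are hypothetical names, and the value 8 is arbitrary. In a real
kernel build the CONFIG_ symbol would come from Kconfig; here it can be
passed with -DCONFIG_NEED_MULTIPLE_NODES.

```c
/*
 * Size per-pg_data_t state by whether multiple pg_data_t's can exist,
 * not by whether NUMA is enabled. That way a DISCONTIG=y, NUMA=n
 * configuration still gets more than one slot.
 */
#ifdef CONFIG_NEED_MULTIPLE_NODES
#define SKETCH_MAX_PGDATS 8   /* hypothetical value, illustration only */
#else
#define SKETCH_MAX_PGDATS 1
#endif

/* Hypothetical per-pg_data_t bookkeeping. */
struct per_pgdat_state {
    unsigned long nr_objects;
};

static struct per_pgdat_state sketch_state[SKETCH_MAX_PGDATS];

/* Nonzero if nid is a usable index into the per-pgdat array. */
int sketch_valid_pgdat(int nid)
{
    return nid >= 0 && nid < SKETCH_MAX_PGDATS;
}

/* Count one object against pgdat nid; out-of-range ids are ignored. */
void sketch_account(int nid)
{
    if (sketch_valid_pgdat(nid))
        sketch_state[nid].nr_objects++;
}

/* Objects counted so far for pgdat nid, or 0 for an invalid id. */
unsigned long sketch_count(int nid)
{
    return sketch_valid_pgdat(nid) ? sketch_state[nid].nr_objects : 0;
}
```

The point of the design is only the #ifdef: nothing in the helpers mentions
NUMA, so the same code covers the DISCONTIG=y, NUMA=n case described above.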

-- Dave

