* [patch 0/2] fix bugs while booting on NUMA system where some nodes have no mem
@ 2006-11-15 18:30 Christian Krafft
2006-11-15 18:32 ` [patch 1/2] fix call to alloc_bootmem after bootmem has been freed Christian Krafft
2006-11-15 18:34 ` [patch 2/2] enables booting a NUMA system where some nodes have no memory Christian Krafft
0 siblings, 2 replies; 37+ messages in thread
From: Christian Krafft @ 2006-11-15 18:30 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, krafft
Hi,
The following patches fix two problems that showed up
while booting a NUMA system where memory was limited to the first node.
Please cc me for comments as I am not subscribed.
cheers,
Christian
PS: sorry for resending it, I didn't cc myself, and wasn't able to reply to this note.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [patch 1/2] fix call to alloc_bootmem after bootmem has been freed
2006-11-15 18:30 [patch 0/2] fix bugs while booting on NUMA system where some nodes have no mem Christian Krafft
@ 2006-11-15 18:32 ` Christian Krafft
2006-11-21 16:55 ` Andrew Morton
2006-11-15 18:34 ` [patch 2/2] enables booting a NUMA system where some nodes have no memory Christian Krafft
1 sibling, 1 reply; 37+ messages in thread
From: Christian Krafft @ 2006-11-15 18:32 UTC (permalink / raw)
To: Christian Krafft; +Cc: linux-mm, linux-kernel
In some cases alloc_bootmem is called after the bootmem pages have
been freed. This happens because the condition
system_state == SYSTEM_BOOTING is still true after bootmem has been freed.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -1931,7 +1931,7 @@ int zone_wait_table_init(struct zone *zo
 	alloc_size = zone->wait_table_hash_nr_entries
 					* sizeof(wait_queue_head_t);
 
-	if (system_state == SYSTEM_BOOTING) {
+	if (!slab_is_available()) {
 		zone->wait_table = (wait_queue_head_t *)
 			alloc_bootmem_node(pgdat, alloc_size);
 	} else {
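
The effect of the changed condition can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct boot_phase`, `pick_old()`, `pick_new()` and `is_unsafe()` are hypothetical names, not kernel symbols, and the model assumes a window in which the system state still reads as booting although the bootmem pages have already been released and the slab allocator is up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of early boot, not kernel code. */
enum sys_state { STATE_BOOTING, STATE_RUNNING };
enum allocator { USE_BOOTMEM, USE_SLAB };

struct boot_phase {
	enum sys_state state;	/* models system_state */
	bool bootmem_freed;	/* bootmem pages already released */
	bool slab_ready;	/* models slab_is_available() */
};

/* Old test: trusts the global boot state. */
static enum allocator pick_old(const struct boot_phase *p)
{
	assert(p != NULL);
	return p->state == STATE_BOOTING ? USE_BOOTMEM : USE_SLAB;
}

/* New test: asks whether the slab allocator is usable. */
static enum allocator pick_new(const struct boot_phase *p)
{
	assert(p != NULL);
	return p->slab_ready ? USE_SLAB : USE_BOOTMEM;
}

/* True when the chosen allocator would touch bootmem after it was freed. */
static bool is_unsafe(const struct boot_phase *p, enum allocator a)
{
	return a == USE_BOOTMEM && p->bootmem_freed;
}
```

In the late-boot window (`state == STATE_BOOTING`, `bootmem_freed` and `slab_ready` both true) the old test still selects bootmem, while the new test selects the slab allocator.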
* [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 18:30 [patch 0/2] fix bugs while booting on NUMA system where some nodes have no mem Christian Krafft
2006-11-15 18:32 ` [patch 1/2] fix call to alloc_bootmem after bootmem has been freed Christian Krafft
@ 2006-11-15 18:34 ` Christian Krafft
2006-11-15 21:24 ` Christoph Lameter
1 sibling, 1 reply; 37+ messages in thread
From: Christian Krafft @ 2006-11-15 18:34 UTC (permalink / raw)
To: Christian Krafft; +Cc: linux-mm, linux-kernel
When booting a NUMA system with nodes that have no memory (e.g. by limiting memory),
__alloc_bootmem_core tried to find pages in an uninitialized bootmem_map.
This caused a NULL pointer access.
This fix adds a check, so that NULL is returned.
That enables the caller (__alloc_bootmem_nopanic)
to allocate memory on other nodes without a panic.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux/mm/bootmem.c
===================================================================
--- linux.orig/mm/bootmem.c
+++ linux/mm/bootmem.c
@@ -196,6 +196,10 @@ __alloc_bootmem_core(struct bootmem_data
 	if (limit && bdata->node_boot_start >= limit)
 		return NULL;
 
+	/* on nodes without memory - bootmem_map is NULL */
+	if(!bdata->node_bootmem_map)
+		return NULL;
+
 	end_pfn = bdata->node_low_pfn;
 	limit = PFN_DOWN(limit);
 	if (limit && end_pfn > limit)
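
The interplay between the added NULL check and a caller that tries one node after another can be sketched in user space. The names below (`struct node_boot`, `node_alloc()`, `alloc_any_node()`) are hypothetical and the allocator is a trivial bump allocator, not the bootmem bitmap allocator; the sketch only shows why returning NULL for a node without a map lets the caller move on instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-node bootmem data. */
struct node_boot {
	unsigned char *map;	/* NULL when the node has no memory */
	unsigned char *base;	/* start of the node's memory */
	size_t nbytes;		/* bytes managed on this node */
	size_t used;		/* bump-allocator cursor */
};

/* Per-node allocation: returns NULL instead of touching a missing map. */
static void *node_alloc(struct node_boot *n, size_t size)
{
	assert(n != NULL);
	if (!n->map)			/* models the check the patch adds */
		return NULL;
	if (size > n->nbytes - n->used)
		return NULL;
	void *p = n->base + n->used;
	n->used += size;
	return p;
}

/* Caller in the style of a "nopanic" allocator: try each node in turn. */
static void *alloc_any_node(struct node_boot *nodes, size_t count, size_t size)
{
	for (size_t i = 0; i < count; i++) {
		void *p = node_alloc(&nodes[i], size);

		if (p)
			return p;
	}
	return NULL;
}
```

With a memoryless node followed by a populated one, `alloc_any_node()` skips the first node and satisfies the request from the second.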
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 18:34 ` [patch 2/2] enables booting a NUMA system where some nodes have no memory Christian Krafft
@ 2006-11-15 21:24 ` Christoph Lameter
2006-11-15 21:58 ` Jack Steiner
2006-11-15 22:05 ` Martin Bligh
0 siblings, 2 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-15 21:24 UTC (permalink / raw)
To: Christian Krafft; +Cc: linux-mm, linux-kernel
On Wed, 15 Nov 2006, Christian Krafft wrote:
> When booting a NUMA system with nodes that have no memory (eg by limiting memory),
> bootmem_alloc_core tried to find pages in an uninitialized bootmem_map.
Why should we support nodes with no memory? If a node has no memory then
its processors and other resources need to be attached to the nearest node
with memory.
AFAICT The primary role of a node is to manage memory.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 21:24 ` Christoph Lameter
@ 2006-11-15 21:58 ` Jack Steiner
2006-11-15 22:40 ` Christoph Lameter
2006-11-15 22:05 ` Martin Bligh
1 sibling, 1 reply; 37+ messages in thread
From: Jack Steiner @ 2006-11-15 21:58 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Christian Krafft, linux-mm, linux-kernel
On Wed, Nov 15, 2006 at 01:24:55PM -0800, Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Christian Krafft wrote:
>
> > When booting a NUMA system with nodes that have no memory (eg by limiting memory),
> > bootmem_alloc_core tried to find pages in an uninitialized bootmem_map.
>
> Why should we support nodes with no memory? If a node has no memory then
> its processors and other resources need to be attached to the nearest node
> with memory.
>
> AFAICT The primary role of a node is to manage memory.
>
SGI has nodes that have neither memory nor cpus. These are
IO nodes. Think of them as ordinary nodes that have had the
cpus & DIMMs removed. Only the IO buses remain.
IO nodes have the same NUMA properties as regular nodes.
They are connected via the numalink fabric, they should be described
in the SLIT table, they should be identified in proximity_domains, etc.
A lot of the core infrastructure is currently missing that is required
to describe IO nodes as regular nodes, but in principle, I don't
see anything wrong with nodes w/o memory.
It is also possible to disable the DIMMs on a node that actually has
cpus & memory. I suspect this doesn't work but I see no reason that you
should HAVE to disable the cpus on nodes that have had the DIMMs disabled.
Our BIOS currently provides the capability to disable DIMMs. The BIOS has
a hack to automatically disable cpus if all DIMMs have been disabled.
This hack was required for several reasons, one of which was that Linux does
not support nodes with cpus & no memory.
-- jack
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 21:24 ` Christoph Lameter
2006-11-15 21:58 ` Jack Steiner
@ 2006-11-15 22:05 ` Martin Bligh
2006-11-15 22:41 ` Christoph Lameter
1 sibling, 1 reply; 37+ messages in thread
From: Martin Bligh @ 2006-11-15 22:05 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Christian Krafft, linux-mm, linux-kernel
Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Christian Krafft wrote:
>
>> When booting a NUMA system with nodes that have no memory (eg by limiting memory),
>> bootmem_alloc_core tried to find pages in an uninitialized bootmem_map.
>
> Why should we support nodes with no memory? If a node has no memory then
> its processors and other resources need to be attached to the nearest node
> with memory.
>
> AFAICT The primary role of a node is to manage memory.
A node is an arbitrary container object containing one or more of:
CPUs
Memory
IO bus
It does not have to contain memory.
M.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 21:58 ` Jack Steiner
@ 2006-11-15 22:40 ` Christoph Lameter
2006-11-15 22:43 ` Martin Bligh
2006-11-16 1:35 ` Jack Steiner
0 siblings, 2 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-15 22:40 UTC (permalink / raw)
To: Jack Steiner; +Cc: Christian Krafft, linux-mm, Martin Bligh, linux-kernel
On Wed, 15 Nov 2006, Jack Steiner wrote:
> A lot of the core infrastructure is currently missing that is required
> to describe IO nodes as regular nodes, but in principle, I don't
> see anything wrong with nodes w/o memory.
Every processor has a local node on which it runs. The kernel places
memory used by the processor on the local node. Even if we allow
nodes without memory: We still need to associate a "local" node to the
processor. If that is across some NUMA interlink then it is going to be
slower but it will work.
AFAIK It seems to be better to explicitly associate a memory node with a
processor during bootup in arch code.
Various kernel optimizations rely on local memory. Would we create
a special case here of a pglist_data structure without a zones structure?
It seems that the contents of pglist_data are targeted to a memory node.
If we do not have a pglist_data structure then the node would not exist
for the kernel.
What would the benefit or difference be of having nodes without memory?
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:05 ` Martin Bligh
@ 2006-11-15 22:41 ` Christoph Lameter
2006-11-15 22:46 ` Martin Bligh
` (3 more replies)
0 siblings, 4 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-15 22:41 UTC (permalink / raw)
To: Martin Bligh; +Cc: Christian Krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006, Martin Bligh wrote:
> A node is an arbitrary container object containing one or more of:
>
> CPUs
> Memory
> IO bus
>
> It does not have to contain memory.
I have never seen a node on Linux without memory. I have seen nodes
without processors and without I/O but not without memory. This seems to be
something new?
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:40 ` Christoph Lameter
@ 2006-11-15 22:43 ` Martin Bligh
2006-11-15 22:52 ` Christoph Lameter
2006-11-16 1:35 ` Jack Steiner
1 sibling, 1 reply; 37+ messages in thread
From: Martin Bligh @ 2006-11-15 22:43 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Jack Steiner, Christian Krafft, linux-mm, linux-kernel
Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Jack Steiner wrote:
>
>> A lot of the core infrastructure is currently missing that is required
>> to describe IO nodes as regular nodes, but in principle, I don't
>> see anything wrong with nodes w/o memory.
>
> Every processor has a local node on which it runs. The kernel places
> memory used by the processor on the local node. Even if we allow
> nodes without memory: We still need to associate a "local" node to the
> processor. If that is across some NUMA interlink then it is going to be
> slower but it will work.
>
> AFAIK It seems to be better to explicitly associate a memory node with a
> processor during bootup in arch code.
>
> Various kernel optimizations rely on local memory. Would we create
> a special case here of a pglist_data structure without a zones structure?
>
> It seems that the contents of pglist_data are targeted to a memory node.
> If we do not have a pglist_data structure then the node would not exist
> for the kernel.
>
> What would the benefit or difference be of having nodes without memory?
Some nodes really don't have memory. Either because it's been
deconfigured, or because it was never there in the first place.
We shouldn't need to kludge that.
All we need is an appropriate zonelist for each node, pointing to
the memory it should be accessing.
M.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:41 ` Christoph Lameter
@ 2006-11-15 22:46 ` Martin Bligh
2006-11-15 22:51 ` Christoph Lameter
2006-11-16 0:26 ` Arnd Bergmann
` (2 subsequent siblings)
3 siblings, 1 reply; 37+ messages in thread
From: Martin Bligh @ 2006-11-15 22:46 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Christian Krafft, linux-mm, linux-kernel
Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
>
>> A node is an arbitrary container object containing one or more of:
>>
>> CPUs
>> Memory
>> IO bus
>>
>> It does not have to contain memory.
>
> I have never seen a node on Linux without memory. I have seen nodes
> without processors and without I/O but not without memory.This seems to be
> something new?
A node was always defined that way. Search back a few years in the lkml
archives. We may be finding bugs in the implementation, but the
definition has not changed.
Suppose we hot-unplugged all the memory in a node? Or, as seems to have
happened in this instance, booted with mem=, cutting out the memory on that
node?
M.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:46 ` Martin Bligh
@ 2006-11-15 22:51 ` Christoph Lameter
2006-11-16 0:59 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2006-11-15 22:51 UTC (permalink / raw)
To: Martin Bligh; +Cc: Christian Krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006, Martin Bligh wrote:
> Supposing we hot-unplugged all the memory in a node? Or seems to have
> happened in this instance is boot with mem=, cutting out memory on that
> node.
So a node with no memory has a pgdat_list structure but no zones? Or empty
zones?
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:43 ` Martin Bligh
@ 2006-11-15 22:52 ` Christoph Lameter
2006-11-16 0:54 ` KAMEZAWA Hiroyuki
2006-11-16 2:01 ` Martin Bligh
0 siblings, 2 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-15 22:52 UTC (permalink / raw)
To: Martin Bligh; +Cc: Jack Steiner, Christian Krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006, Martin Bligh wrote:
> All we need is an appropriate zonelist for each node, pointing to
> the memory it should be accessing.
But there is no memory on the node. Does the zonelist contain the zones of
the node without memory or not? We simply fall back each allocation to the
next node as if the node was overflowing?
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:41 ` Christoph Lameter
2006-11-15 22:46 ` Martin Bligh
@ 2006-11-16 0:26 ` Arnd Bergmann
2006-11-16 0:45 ` Christoph Lameter
2006-11-16 0:44 ` Jesper Juhl
2006-11-16 15:21 ` Lee Schermerhorn
3 siblings, 1 reply; 37+ messages in thread
From: Arnd Bergmann @ 2006-11-16 0:26 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Martin Bligh, Christian Krafft, linux-mm, linux-kernel
On Wednesday 15 November 2006 23:41, Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
> > A node is an arbitrary container object containing one or more of:
> >
> > CPUs
> > Memory
> > IO bus
+ SPUs on a Cell processor
> > It does not have to contain memory.
>
> I have never seen a node on Linux without memory. I have seen nodes
> without processors and without I/O but not without memory.This seems to be
> something new?
In this particular case, we have a dual-socket Cell/B.E. blade server,
where each of the two CPU-socket/south-bridge/memory combinations is
treated as a separate node. The two points that make this tricky
are:
- we want to be able to boot with the 'mem=512M' option, which effectively
disables the memory on the second node (each node has 512MiB).
- Each node has 8 SPUs, all of which we want to use. In order to use an
SPU, we call __add_pages to register the local memory on it, so we have
struct page pointers we can hand out to user mappings with ->nopage().
The __add_pages call needs to do node local allocations (there are
probably more allocations that have the same problem, but this is the
first one that crashes), which oops when there is no memory registered
at all for that node, instead of returning an error or falling back
on a non-local allocation.
Arnd <><
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:41 ` Christoph Lameter
2006-11-15 22:46 ` Martin Bligh
2006-11-16 0:26 ` Arnd Bergmann
@ 2006-11-16 0:44 ` Jesper Juhl
2006-11-16 0:46 ` Christoph Lameter
2006-11-16 15:21 ` Lee Schermerhorn
3 siblings, 1 reply; 37+ messages in thread
From: Jesper Juhl @ 2006-11-16 0:44 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Martin Bligh, Christian Krafft, linux-mm, linux-kernel
On 15/11/06, Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
>
> > A node is an arbitrary container object containing one or more of:
> >
> > CPUs
> > Memory
> > IO bus
> >
> > It does not have to contain memory.
>
> I have never seen a node on Linux without memory. I have seen nodes
> without processors and without I/O but not without memory.This seems to be
> something new?
>
What about SMP Opteron boards that have RAM slots for each CPU?
With two (or more) CPUs and memory slots populated for only one of
them, wouldn't that count as multiple NUMA nodes with only one of them
having memory?
That would seem to be a pretty common thing that could happen.
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:26 ` Arnd Bergmann
@ 2006-11-16 0:45 ` Christoph Lameter
2006-11-16 13:08 ` Arnd Bergmann
0 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2006-11-16 0:45 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: Martin Bligh, Christian Krafft, linux-mm, linux-kernel
On Thu, 16 Nov 2006, Arnd Bergmann wrote:
> - we want to be able to boot with the 'mem=512M' option, which effectively
> disables the memory on the second node (each node has 512MiB).
> - Each node has 8 SPUs, all of which we want to use. In order to use an
> SPU, we call __add_pages to register the local memory on it, so we have
> struct page pointers we can hand out to user mappings with ->nopage().
This is more like the bringup of a processor right? You need
to have the memory online before the processor is brought up otherwise
the slab cannot properly allocate its structures on the node when the
per node portion is brought up. The page allocator has similar issues.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:44 ` Jesper Juhl
@ 2006-11-16 0:46 ` Christoph Lameter
0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-16 0:46 UTC (permalink / raw)
To: Jesper Juhl; +Cc: Martin Bligh, Christian Krafft, linux-mm, linux-kernel
On Thu, 16 Nov 2006, Jesper Juhl wrote:
> What about SMP Opteron boards that have RAM slots for each CPU?
> With two (or more) CPU's and only memory slots populated for one of
> them, wouldn't that count as multiple NUMA nodes but only one of them
> with memory?
> That would seem to be a pretty common thing that could happen.
I think so far we have handled these as two processors on one node.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:52 ` Christoph Lameter
@ 2006-11-16 0:54 ` KAMEZAWA Hiroyuki
2006-11-16 0:57 ` Christoph Lameter
2006-11-16 2:01 ` Martin Bligh
1 sibling, 1 reply; 37+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-11-16 0:54 UTC (permalink / raw)
To: Christoph Lameter; +Cc: mbligh, steiner, krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006 14:52:43 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
>
> > All we need is an appropriate zonelist for each node, pointing to
> > the memory it should be accessing.
>
> But there is no memory on the node. Does the zonelist contain the zones of
> the node without memory or not? We simply fall back each allocation to the
> next node as if the node was overflowing?
>
Yes, just fall back.
The zonelist[] doesn't contain empty zones.
-Kame
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:54 ` KAMEZAWA Hiroyuki
@ 2006-11-16 0:57 ` Christoph Lameter
2006-11-16 1:17 ` KAMEZAWA Hiroyuki
2006-11-16 15:40 ` Christian Krafft
0 siblings, 2 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-16 0:57 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: mbligh, steiner, krafft, linux-mm, linux-kernel
On Thu, 16 Nov 2006, KAMEZAWA Hiroyuki wrote:
> > But there is no memory on the node. Does the zonelist contain the zones of
> > the node without memory or not? We simply fall back each allocation to the
> > next node as if the node was overflowing?
> yes. just fallback.
Ok, so we got a useless pglist_data struct and the struct zone contains a
zonelist that does not include the zone.
numa_node_id() points to this and we always get allocations redirected to
other nodes. The slab duplicates its per node structures on the fallback
node.
> The zonelist[] donen't contain empty-zone.
So we will never encounter that zone except when going to the
pglist_data struct through numa_node_id()?
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:51 ` Christoph Lameter
@ 2006-11-16 0:59 ` KAMEZAWA Hiroyuki
2006-11-16 1:22 ` Yasunori Goto
0 siblings, 1 reply; 37+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-11-16 0:59 UTC (permalink / raw)
To: Christoph Lameter; +Cc: mbligh, krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006 14:51:26 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
>
> > Supposing we hot-unplugged all the memory in a node? Or seems to have
> > happened in this instance is boot with mem=, cutting out memory on that
> > node.
>
> So a node with no memory has a pgdat_list structure but no zones? Or empty
> zones?
>
The node has just an empty zone. The pgdat/per-cpu area is allocated on another
(nearest) node.
I hear some vendor's machine has this configuration. (ia64, maybe SGI or HP)
Node0: CPUx0 + XXXGb memory
Node1: CPUx2 + 16MB memory
Node2: CPUx2 + 16MB memory
The memory of Node1 and Node2 is trimmed at boot by GRANULE alignment.
Then the final view is
Node0 : memory-only-node
Node1 : cpu-only-node
Node2 : cpu-only-node.
-Kame
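
How alignment trimming can leave a node with no usable memory at all can be shown with a little arithmetic. The sketch below assumes that trimming rounds the start of a range up and its end down to a granule boundary; `usable_after_trim()` is a hypothetical helper, and the 64MB granule used in the example afterwards is an assumed value, not one taken from this thread.

```c
#include <assert.h>

/*
 * Shrink the range [start, end) inward to granule boundaries and return
 * the number of bytes left, or 0 if the range covers no whole granule.
 */
static unsigned long long usable_after_trim(unsigned long long start,
					    unsigned long long end,
					    unsigned long long granule)
{
	assert(granule != 0);

	/* round the start up and the end down to a granule boundary */
	unsigned long long s = (start + granule - 1) / granule * granule;
	unsigned long long e = end / granule * granule;

	return e > s ? e - s : 0;
}
```

Under these assumptions, a 16MB range starting at 1024MB with a 64MB granule yields 0 bytes, so the node ends up with CPUs but no memory, while a 256MB range starting at 0 survives intact.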
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:57 ` Christoph Lameter
@ 2006-11-16 1:17 ` KAMEZAWA Hiroyuki
2006-11-16 15:40 ` Christian Krafft
1 sibling, 0 replies; 37+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-11-16 1:17 UTC (permalink / raw)
To: Christoph Lameter; +Cc: mbligh, steiner, krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006 16:57:56 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> numa_node_id() points to this and we always get allocations redirected to
> other nodes. The slab duplicates its per node structures on the fallback
> node.
>
> > The zonelist[] donen't contain empty-zone.
>
> So we will never encounter that zone except when going to the
> pglist_data struct through numa_node_id()?
>
Some pgdat/zone scanning code will access it.
See: for_each_zone() and populated_zone().
AFAIK, in the 2.6.9 era (i.e. RHEL4), cpus on a memory-less node were moved to the
nearest node, and there was no useless pgdat.
Now there are memory-less nodes. Cpus on a memory-less node are on a pgdat
with an empty zone. I think this is much simpler than remapping.
And I think cpus on a memory-less node share something (FSB, switch, etc.).
Tying cpus to a memory-less node may have some benefit.
-Kame
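
The point about scanning code can be modelled as follows: a walk over every zone still visits the empty zones of memoryless nodes, and a populated-zone test is what filters them out. `struct zone_model`, `zone_is_populated()` and `scan_zones()` are hypothetical user-space stand-ins that only mirror the idea behind for_each_zone() and populated_zone(); they are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: only the page count matters here. */
struct zone_model {
	unsigned long present_pages;
};

/* A zone counts as populated only if it has pages. */
static bool zone_is_populated(const struct zone_model *z)
{
	return z->present_pages != 0;
}

/*
 * Walk every zone, summing the pages of populated zones and counting how
 * many empty zones the walk had to step over.
 */
static unsigned long scan_zones(const struct zone_model *zones, size_t nr,
				size_t *empty_seen)
{
	unsigned long total = 0;

	assert(empty_seen != NULL);
	*empty_seen = 0;
	for (size_t i = 0; i < nr; i++) {
		if (!zone_is_populated(&zones[i])) {
			(*empty_seen)++;
			continue;
		}
		total += zones[i].present_pages;
	}
	return total;
}
```

For one populated zone and two empty ones, the walk reports the populated zone's pages and two skipped zones.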
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:59 ` KAMEZAWA Hiroyuki
@ 2006-11-16 1:22 ` Yasunori Goto
0 siblings, 0 replies; 37+ messages in thread
From: Yasunori Goto @ 2006-11-16 1:22 UTC (permalink / raw)
To: Christoph Lameter
Cc: mbligh, krafft, linux-mm, linux-kernel, KAMEZAWA Hiroyuki
> I hear some vender's machine has this configuration. (ia64, maybe SGI or HP)
>
> Node0: CPUx0 + XXXGb memory
> Node1: CPUx2 + 16MB memory
> Node2: CPUx2 + 16MB memory
>
> memory of Node1 and Node2 is tirmmed at boot by GRANULE alignment.
> Then, final view is
> Node0 : memory-only-node
> Node1 : cpu-only-node
> Node2 : cpu-only-node.
IIRC, this is an HP box. It uses memory interleaving among nodes.
Bye.
--
Yasunori Goto
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:40 ` Christoph Lameter
2006-11-15 22:43 ` Martin Bligh
@ 2006-11-16 1:35 ` Jack Steiner
2006-11-16 1:57 ` Christoph Lameter
1 sibling, 1 reply; 37+ messages in thread
From: Jack Steiner @ 2006-11-16 1:35 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Christian Krafft, linux-mm, Martin Bligh, linux-kernel
On Wed, Nov 15, 2006 at 02:40:36PM -0800, Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Jack Steiner wrote:
>
> > A lot of the core infrastructure is currently missing that is required
> > to describe IO nodes as regular nodes, but in principle, I don't
> > see anything wrong with nodes w/o memory.
>
> Every processor has a local node on which it runs. The kernel places
> memory used by the processor on the local node. Even if we allow
> nodes without memory: We still need to associate a "local" node to the
> processor. If that is across some NUMA interlink then it is going to be
> slower but it will work.
True.
>
> AFAIK It seems to be better to explicitly associate a memory node with a
> processor during bootup in arch code.
>
> Various kernel optimizations rely on local memory. Would we create
> a special case here of a pglist_data structure without a zones structure?
>
> It seems that the contents of pglist_data are targeted to a memory node.
> If we do not have a pglist_data structure then the node would not exist
> for the kernel.
>
> What would the benefit or difference be of having nodes without memory?
I doubt that there is a demand for systems with memoryless nodes. However, if the
DIMM(s) on a node fails, I think the system may perform better
with the cpus on the node enabled than it will if they have to be
disabled.
-- jack
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 1:35 ` Jack Steiner
@ 2006-11-16 1:57 ` Christoph Lameter
2006-11-16 2:09 ` Martin Bligh
2006-11-16 3:28 ` Jack Steiner
0 siblings, 2 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-16 1:57 UTC (permalink / raw)
To: Jack Steiner; +Cc: Christian Krafft, linux-mm, Martin Bligh, linux-kernel
On Wed, 15 Nov 2006, Jack Steiner wrote:
> I doubt that there is a demand for systems with memoryless nodes. However, if the
> DIMM(s) on a node fails, I think the system may perform better
> with the cpus on the node enabled than it will if they have to be
> disabled.
Right now we do not have the capability to remove memory from a node while
the system is running.
If the DIMMs have failed and we boot up and the system finds out that
there is no memory on that node then the cpus can be remapped to
the next memory node. That is better than having lots of useless
structures allocated.
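
The remapping described here can be sketched as a lookup over a SLIT-style distance table. This is only an illustration under stated assumptions: struct node_info, MAX_NODES and nearest_memory_node() are hypothetical names, not kernel interfaces, and the real arch code may choose the target node differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8   /* assumed upper bound for this sketch */

/* Hypothetical per-node description; not the kernel's pglist_data. */
struct node_info {
    unsigned long present_pages;   /* 0 means the node has no memory */
};

/*
 * Return the node with memory that is closest to `nid` according to a
 * SLIT-style distance table, or -1 if no node has memory at all.
 * A node that has memory of its own maps to itself.
 */
int nearest_memory_node(const struct node_info *nodes, size_t nr_nodes,
                        const int dist[][MAX_NODES], size_t nid)
{
    int best = -1;
    size_t n;

    assert(nr_nodes <= MAX_NODES && nid < nr_nodes);

    if (nodes[nid].present_pages > 0)
        return (int)nid;

    for (n = 0; n < nr_nodes; n++) {
        if (nodes[n].present_pages == 0)
            continue;               /* skip other memoryless nodes */
        if (best < 0 || dist[nid][n] < dist[nid][best])
            best = (int)n;
    }
    return best;
}
```

A caller would run this once per cpu at boot and use the result as the cpu's "local" node; ties go to the lowest node id.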
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:52 ` Christoph Lameter
2006-11-16 0:54 ` KAMEZAWA Hiroyuki
@ 2006-11-16 2:01 ` Martin Bligh
1 sibling, 0 replies; 37+ messages in thread
From: Martin Bligh @ 2006-11-16 2:01 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Jack Steiner, Christian Krafft, linux-mm, linux-kernel
Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
>
>> All we need is an appropriate zonelist for each node, pointing to
>> the memory it should be accessing.
>
> But there is no memory on the node. Does the zonelist contain the zones of
> the node without memory or not? We simply fall back each allocation to the
> next node as if the node was overflowing?
Sure. There's no point in putting an empty zone in the zonelist.
We should just skip anything where present_pages is zero.
M.
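
A minimal sketch of the "skip anything where present_pages is zero" rule, assuming a NULL-terminated zonelist. The structs below are simplified stand-ins, and build_zonelist() and MAX_ZONES are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES 16   /* assumed bound for this sketch */

/* Simplified stand-in for the kernel's struct zone. */
struct zone {
    int nid;                       /* node that owns the zone */
    unsigned long present_pages;   /* 0 for a zone without memory */
};

/* Simplified stand-in for struct zonelist: a NULL-terminated array. */
struct zonelist {
    struct zone *zones[MAX_ZONES + 1];
};

/*
 * Fill `zl` from `candidates`, which the caller has already put in the
 * preferred fallback order, skipping every zone whose present_pages is
 * zero. Returns the number of zones placed in the list.
 */
size_t build_zonelist(struct zonelist *zl, struct zone *candidates, size_t nr)
{
    size_t i, out = 0;

    assert(nr <= MAX_ZONES);

    for (i = 0; i < nr; i++) {
        if (candidates[i].present_pages == 0)
            continue;               /* empty zone: never a valid target */
        zl->zones[out++] = &candidates[i];
    }
    zl->zones[out] = NULL;          /* terminator */
    return out;
}
```

With this rule a memoryless node still gets a zonelist, but its first entry is a zone on some other node, so allocations fall back immediately.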
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 1:57 ` Christoph Lameter
@ 2006-11-16 2:09 ` Martin Bligh
2006-11-16 2:35 ` Christoph Lameter
2006-11-16 3:28 ` Jack Steiner
1 sibling, 1 reply; 37+ messages in thread
From: Martin Bligh @ 2006-11-16 2:09 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Jack Steiner, Christian Krafft, linux-mm, linux-kernel
Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Jack Steiner wrote:
>
>> I doubt that there is a demand for systems with memoryless nodes. However, if the
>> DIMM(s) on a node fails, I think the system may perform better
>> with the cpus on the node enabled than it will if they have to be
>> disabled.
>
> Right now we do not have the capability to remove memory from a node while
> the system is running.
>
> If the DIMMs have failed and we boot up and the system finds out that
> there is no memory on that node then the cpus can be remapped to
> the next memory node. That is better than having lots of useless
> structures allocated.
A node without memory is a node without memory. Simply remapping the
cpus to another node and pretending the world is different does not
make much sense.
Is there some fundamental problem you see with dealing with the nodes
as is? Doesn't seem that hard to me. I'm not asking you to put the
effort in to fixing it, just if you see some fundamental reason why
it can't be fixed?
M.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 2:09 ` Martin Bligh
@ 2006-11-16 2:35 ` Christoph Lameter
0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-16 2:35 UTC (permalink / raw)
To: Martin Bligh; +Cc: Jack Steiner, Christian Krafft, linux-mm, linux-kernel
On Wed, 15 Nov 2006, Martin Bligh wrote:
> A node without memory is a node without memory. Simply remapping the
> cpus to another node and pretending the world is different does not
> make much sense.
It avoids overhead both in terms of memory and processing in the kernel
and it seems that is the way we have traditionally dealt with the issue?
Nodes without memory require the VM to allocate memory from different
nodes in order to build up management structures for the node (these
are useless since the node has no memory, caches will be split etc etc).
The cpus will always fall back to the next node anyway, since
their zonelist begins with a zone in a node that has memory.
> Is there some fundamental problem you see with dealing with the nodes
> as is? Doesn't seem that hard to me. I'm not asking you to put the
> effort in to fixing it, just if you see some fundamental reason why
> it can't be fixed?
I am not sure how memoryless nodes would affect various subsystems. And it
seems that this patch only fixes the first issue that they found (?). If
we go down this route then we may have to add more special casing to the
VM in order to cleanly handle memoryless nodes.
But maybe someone else already has experience with memoryless nodes?
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 1:57 ` Christoph Lameter
2006-11-16 2:09 ` Martin Bligh
@ 2006-11-16 3:28 ` Jack Steiner
1 sibling, 0 replies; 37+ messages in thread
From: Jack Steiner @ 2006-11-16 3:28 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Christian Krafft, linux-mm, Martin Bligh, linux-kernel
On Wed, Nov 15, 2006 at 05:57:27PM -0800, Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Jack Steiner wrote:
>
> > I doubt that there is a demand for systems with memoryless nodes. However, if the
> > DIMM(s) on a node fails, I think the system may perform better
> > with the cpus on the node enabled than it will if they have to be
> > disabled.
>
> Right now we do not have the capability to remove memory from a node while
> the system is running.
I know. I'm referring to a DIMM that fails power-on diags or one
that is explicitly disabled from the system controller.
Clearly a reboot is required in both cases, but the end result is
a node with cpus and no memory. As I said earlier, the PROM (for several
reasons) automatically disables the cpus on nodes w/o memory.
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:45 ` Christoph Lameter
@ 2006-11-16 13:08 ` Arnd Bergmann
0 siblings, 0 replies; 37+ messages in thread
From: Arnd Bergmann @ 2006-11-16 13:08 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Martin Bligh, Christian Krafft, linux-mm, linux-kernel
On Thursday 16 November 2006 01:45, Christoph Lameter wrote:
> On Thu, 16 Nov 2006, Arnd Bergmann wrote:
>
> > - we want to be able to boot with the 'mem=512M' option, which effectively
> > disables the memory on the second node (each node has 512MiB).
> > - Each node has 8 SPUs, all of which we want to use. In order to use an
> > SPU, we call __add_pages to register the local memory on it, so we have
> > struct page pointers we can hand out to user mappings with ->nopage().
>
> This is more like the bringup of a processor right? You need
> to have the memory online before the processor is brought up otherwise
> the slab cannot properly allocate its structures on the node when the
> per node portion is brought up. The page allocator has similar issues.
No, that's not really the issue here. The memory we're trying to add to the
mem_map can not be used for kernel allocations at all and is never entered
into the buddy allocator. It can only be used for applications running on
an SPU itself.
So the problem is not the order in which we do things, but the fact that
node data structure has not been initialized, and never will be, when
we add the SPU to the node.
Arnd <><
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-15 22:41 ` Christoph Lameter
` (2 preceding siblings ...)
2006-11-16 0:44 ` Jesper Juhl
@ 2006-11-16 15:21 ` Lee Schermerhorn
3 siblings, 0 replies; 37+ messages in thread
From: Lee Schermerhorn @ 2006-11-16 15:21 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Martin Bligh, Christian Krafft, linux-mm, linux-kernel
On Wed, 2006-11-15 at 14:41 -0800, Christoph Lameter wrote:
> On Wed, 15 Nov 2006, Martin Bligh wrote:
>
> > A node is an arbitrary container object containing one or more of:
> >
> > CPUs
> > Memory
> > IO bus
> >
> > It does not have to contain memory.
>
> I have never seen a node on Linux without memory. I have seen nodes
> without processors and without I/O but not without memory. This seems to be
> something new?
I sent this out earlier in response to another message from Christoph
regarding nodes w/o memory. Don't know if it made it...
>On Fri, 2006-11-10 at 10:16 -0800, Christoph Lameter wrote:
>> On Wed, 8 Nov 2006, KAMEZAWA Hiroyuki wrote:
>>
>> > I wonder there are no code for creating NODE_DATA() for device-only-node.
>>
>> On IA64 we remap nodes with no memory / cpus to the nearest node with
>> memory. I think that is sufficient.
I don't think this happens anymore. Back in the ~2.6.5 days, when we
would configure our numa platforms with 100% of memory interleaved [in
hardware at cache line granularity], the cpus would move to the
interleaved "pseudo-node" and the memoryless nodes would be removed.
numactl --hardware would show something like this:
# uname -r
2.6.5-7.244-default
# numactl --hardware
available: 1 nodes (0-0)
node 0 size: 65443 MB
node 0 free: 64506 MB
I started seeing different behavior about the time SPARSEMEM went in.
Now, with a 2.6.16 base kernel [same platform, hardware interleaved
memory], I see:
# uname -r
2.6.16.21-0.8-default
# numactl --hardware
available: 5 nodes (0-4)
node 0 size: 0 MB
node 0 free: 0 MB
node 1 size: 0 MB
node 1 free: 0 MB
node 2 size: 0 MB
node 2 free: 0 MB
node 3 size: 0 MB
node 3 free: 0 MB
node 4 size: 65439 MB
node 4 free: 64492 MB
node distances:
node   0   1   2   3   4
  0:  10  17  17  17  14
  1:  17  10  17  17  14
  2:  17  17  10  17  14
  3:  17  17  17  10  14
  4:  14  14  14  14  10
[Aside: The firmware/SLIT says that the interleaved memory is closer to
all nodes than other nodes' memory. This has interesting implications
for the "overflow" zone lists...]
Lee
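
To make the aside concrete, here is a sketch that orders nodes by SLIT distance using the table from the numactl output above. fallback_order() is a hypothetical helper, not the kernel's zonelist builder, and the tie-breaking by node id is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 5

/* Distance table copied from the numactl output above. */
static const int node_distance[NR_NODES][NR_NODES] = {
    { 10, 17, 17, 17, 14 },
    { 17, 10, 17, 17, 14 },
    { 17, 17, 10, 17, 14 },
    { 17, 17, 17, 10, 14 },
    { 14, 14, 14, 14, 10 },
};

/*
 * Write into `order` all node ids sorted by increasing distance from
 * `from`. Insertion sort keeps ties in ascending node-id order.
 */
void fallback_order(int from, int order[NR_NODES])
{
    int i, j;

    assert(from >= 0 && from < NR_NODES);

    for (i = 0; i < NR_NODES; i++) {
        int node = i;

        /* Shift farther nodes one slot right to make room for `node`. */
        for (j = i; j > 0 &&
             node_distance[from][order[j - 1]] > node_distance[from][node];
             j--)
            order[j] = order[j - 1];
        order[j] = node;
    }
}
```

For node 0 this yields 0, 4, 1, 2, 3: the interleaved pseudo-node 4 sorts ahead of every other remote node, which is the effect the aside points at.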
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 0:57 ` Christoph Lameter
2006-11-16 1:17 ` KAMEZAWA Hiroyuki
@ 2006-11-16 15:40 ` Christian Krafft
2006-11-16 15:49 ` Martin J. Bligh
2006-11-16 18:46 ` Christoph Lameter
1 sibling, 2 replies; 37+ messages in thread
From: Christian Krafft @ 2006-11-16 15:40 UTC (permalink / raw)
To: Christoph Lameter
Cc: KAMEZAWA Hiroyuki, mbligh, steiner, linux-mm, linux-kernel
On Wed, 15 Nov 2006 16:57:56 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> On Thu, 16 Nov 2006, KAMEZAWA Hiroyuki wrote:
>
> > > But there is no memory on the node. Does the zonelist contain the zones of
> > > the node without memory or not? We simply fall back each allocation to the
> > > next node as if the node was overflowing?
> > yes. just fallback.
>
> Ok, so we got a useless pglist_data struct and the struct zone contains a
> zonelist that does not include the zone.
Okay, I slowly understand what you are talking about.
I just tried a "numactl --cpunodebind 1 --membind 1 true" which hit an uninitialized zone in slab_node:
return zone_to_nid(policy->v.zonelist->zones[0]);
I also still don't know if it makes sense to have memoryless nodes, but supporting it does.
So what would be reasonable: to have empty zonelists for those nodes, or to check whether zonelists are uninitialized?
--
Mit freundlichen Grussen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 15:40 ` Christian Krafft
@ 2006-11-16 15:49 ` Martin J. Bligh
2006-11-16 18:46 ` Christoph Lameter
1 sibling, 0 replies; 37+ messages in thread
From: Martin J. Bligh @ 2006-11-16 15:49 UTC (permalink / raw)
To: Christian Krafft
Cc: Christoph Lameter, KAMEZAWA Hiroyuki, steiner, linux-mm, linux-kernel
Christian Krafft wrote:
> On Wed, 15 Nov 2006 16:57:56 -0800 (PST)
> Christoph Lameter <clameter@sgi.com> wrote:
>
>> On Thu, 16 Nov 2006, KAMEZAWA Hiroyuki wrote:
>>
>>>> But there is no memory on the node. Does the zonelist contain the zones of
>>>> the node without memory or not? We simply fall back each allocation to the
>>>> next node as if the node was overflowing?
>>> yes. just fallback.
>> Ok, so we got a useless pglist_data struct and the struct zone contains a
>> zonelist that does not include the zone.
>
> Okay, I slowly understand what you are talking about.
> I just tried a "numactl --cpunodebind 1 --membind 1 true" which hit an uninitialized zone in slab_node:
>
> return zone_to_nid(policy->v.zonelist->zones[0]);
>
> I also still don't know if it makes sense to have memoryless nodes, but supporting it does.
> So what would be reasonable: to have empty zonelists for those nodes, or to check whether zonelists are uninitialized?
You don't want empty zonelists on a node containing CPUs, else it won't
know where to allocate from. You just want to make sure that the zones
in that node (if existent) are not contained in *anyone's* zonelist.
M.
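
The two rules stated here (a node with CPUs needs a non-empty zonelist, and no zonelist may contain a zone without memory) can be expressed as a small consistency check. This is a sketch with simplified stand-in structs; zonelist_is_usable() and ZL_SLOTS are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZL_SLOTS 8   /* assumed list capacity for this sketch */

/* Simplified stand-in for the kernel's struct zone. */
struct zone {
    int nid;
    unsigned long present_pages;   /* 0 for a zone without memory */
};

/* NULL-terminated list of zones in fallback order (simplified). */
struct zonelist {
    struct zone *zones[ZL_SLOTS];
};

/*
 * Check the two rules for one node's zonelist:
 *  - it is not empty, so allocations have somewhere to go;
 *  - it contains no zone without memory.
 */
bool zonelist_is_usable(const struct zonelist *zl)
{
    size_t i;

    assert(zl != NULL);

    if (zl->zones[0] == NULL)
        return false;               /* nowhere to allocate from */

    for (i = 0; i < ZL_SLOTS && zl->zones[i] != NULL; i++)
        if (zl->zones[i]->present_pages == 0)
            return false;           /* memoryless zone leaked into the list */
    return true;
}
```

A check like this could run over every node's zonelists after they are built; code that reads zones[0] of a list, as the slab_node line quoted above does, depends on the first rule holding.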
* Re: [patch 2/2] enables booting a NUMA system where some nodes have no memory
2006-11-16 15:40 ` Christian Krafft
2006-11-16 15:49 ` Martin J. Bligh
@ 2006-11-16 18:46 ` Christoph Lameter
1 sibling, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2006-11-16 18:46 UTC (permalink / raw)
To: Christian Krafft
Cc: KAMEZAWA Hiroyuki, mbligh, steiner, linux-mm, linux-kernel
On Thu, 16 Nov 2006, Christian Krafft wrote:
> Okay, I slowly understand what you are talking about.
> I just tried a "numactl --cpunodebind 1 --membind 1 true" which hit an uninitialized zone in slab_node:
>
> return zone_to_nid(policy->v.zonelist->zones[0]);
I think the above should work fine and give the expected OOM since the
node has no memory.
The zone struct should redirect via the zonelist to nodes that have
memory for allocations that are not bound to a single node.
> I also still don't know if it makes sense to have memoryless nodes, but supporting it does.
> So what would be reasonable: to have empty zonelists for those nodes, or to check whether zonelists are uninitialized?
zonelists of those nodes should contain a list of fallback zones with
available memory.
* Re: [patch 1/2] fix call to alloc_bootmem after bootmem has been freed
2006-11-15 18:32 ` [patch 1/2] fix call to alloc_bootmem after bootmem has been freed Christian Krafft
@ 2006-11-21 16:55 ` Andrew Morton
2006-11-21 18:02 ` Christian Krafft
0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2006-11-21 16:55 UTC (permalink / raw)
To: Christian Krafft; +Cc: linux-mm, linux-kernel
On Wed, 15 Nov 2006 19:32:38 +0100
Christian Krafft <krafft@de.ibm.com> wrote:
> In some cases alloc_bootmem may be called
> after bootmem pages have been freed. This is because the condition
> SYSTEM_BOOTING is still true after bootmem has been freed.
>
> Signed-off-by: Christian Krafft <krafft@de.ibm.com>
>
> Index: linux/mm/page_alloc.c
> ===================================================================
> --- linux.orig/mm/page_alloc.c
> +++ linux/mm/page_alloc.c
> @@ -1931,7 +1931,7 @@ int zone_wait_table_init(struct zone *zo
> alloc_size = zone->wait_table_hash_nr_entries
> * sizeof(wait_queue_head_t);
>
> - if (system_state == SYSTEM_BOOTING) {
> + if (!slab_is_available()) {
> zone->wait_table = (wait_queue_head_t *)
> alloc_bootmem_node(pgdat, alloc_size);
> } else {
I don't think that slab_is_available() is an appropriate way of working out
if we can call vmalloc().
Also, a more complete description of the problem is needed, please. Which
caller is incorrectly allocating bootmem?
* Re: [patch 1/2] fix call to alloc_bootmem after bootmem has been freed
2006-11-21 16:55 ` Andrew Morton
@ 2006-11-21 18:02 ` Christian Krafft
2006-11-21 18:26 ` Andrew Morton
0 siblings, 1 reply; 37+ messages in thread
From: Christian Krafft @ 2006-11-21 18:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Tue, 21 Nov 2006 08:55:35 -0800
Andrew Morton <akpm@osdl.org> wrote:
> On Wed, 15 Nov 2006 19:32:38 +0100
> Christian Krafft <krafft@de.ibm.com> wrote:
>
> > In some cases alloc_bootmem may be called
> > after bootmem pages have been freed. This is because the condition
> > SYSTEM_BOOTING is still true after bootmem has been freed.
> >
> > Signed-off-by: Christian Krafft <krafft@de.ibm.com>
> >
> > Index: linux/mm/page_alloc.c
> > ===================================================================
> > --- linux.orig/mm/page_alloc.c
> > +++ linux/mm/page_alloc.c
> > @@ -1931,7 +1931,7 @@ int zone_wait_table_init(struct zone *zo
> > alloc_size = zone->wait_table_hash_nr_entries
> > * sizeof(wait_queue_head_t);
> >
> > - if (system_state == SYSTEM_BOOTING) {
> > + if (!slab_is_available()) {
> > zone->wait_table = (wait_queue_head_t *)
> > alloc_bootmem_node(pgdat, alloc_size);
> > } else {
>
> I don't think that slab_is_available() is an appropriate way of working out
> if we can call vmalloc().
AFAIK slab_is_available() is the generic replacement for mem_init_done, which exists only on powerpc.
If that's not appropriate, I don't know why. However, SYSTEM_BOOTING is definitely wrong.
> Also, a more complete description of the problem is needed, please. Which
> caller is incorrectly allocating bootmem?
>
spu_base causes the call to alloc_bootmem, but only if it is built into the kernel. Other components might have the same problem.
cheers,
ck
--
Mit freundlichen Grussen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
* Re: [patch 1/2] fix call to alloc_bootmem after bootmem has been freed
2006-11-21 18:02 ` Christian Krafft
@ 2006-11-21 18:26 ` Andrew Morton
2006-11-22 9:23 ` Arnd Bergmann
0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2006-11-21 18:26 UTC (permalink / raw)
To: Christian Krafft; +Cc: linux-mm, linux-kernel
On Tue, 21 Nov 2006 19:02:13 +0100
Christian Krafft <krafft@de.ibm.com> wrote:
> > > Index: linux/mm/page_alloc.c
> > > ===================================================================
> > > --- linux.orig/mm/page_alloc.c
> > > +++ linux/mm/page_alloc.c
> > > @@ -1931,7 +1931,7 @@ int zone_wait_table_init(struct zone *zo
> > > alloc_size = zone->wait_table_hash_nr_entries
> > > * sizeof(wait_queue_head_t);
> > >
> > > - if (system_state == SYSTEM_BOOTING) {
> > > + if (!slab_is_available()) {
> > > zone->wait_table = (wait_queue_head_t *)
> > > alloc_bootmem_node(pgdat, alloc_size);
> > > } else {
> >
> > I don't think that slab_is_available() is an appropriate way of working out
> > if we can call vmalloc().
>
> AFAIK slab_is_available() is the generic replacement for mem_init_done, which exists only on powerpc.
> If that's not appropriate, I don't know why. However, SYSTEM_BOOTING is definitely wrong.
slab is a very different thing from vmalloc. One could easily envisage
situations (now or in the future) in which slab is ready, but vmalloc is
not (more likely vice versa).
It'd be better to add a new vmalloc_is_available. (Just an int - no need
for a helper function).
* Re: [patch 1/2] fix call to alloc_bootmem after bootmem has been freed
2006-11-21 18:26 ` Andrew Morton
@ 2006-11-22 9:23 ` Arnd Bergmann
0 siblings, 0 replies; 37+ messages in thread
From: Arnd Bergmann @ 2006-11-22 9:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christian Krafft, linux-mm, linux-kernel
On Tuesday 21 November 2006 19:26, Andrew Morton wrote:
> slab is a very different thing from vmalloc. One could easily envisage
> situations (now or in the future) in which slab is ready, but vmalloc is
> not (more likely vice versa).
>
> It'd be better to add a new vmalloc_is_available. (Just an int - no need
> for a helper function).
In the time line, we currently have
start_kernel()
    ...
    setup_arch()
        init_bootmem()        # alloc_bootmem starts working
        ...
        paging_init()         # needed for vmalloc
    ...                       #
    mem_init()
        free_all_bootmem()    # alloc_bootmem stops working, alloc_pages
                              # starts working
    kmem_cache_init()         # kmalloc and vmalloc start working
    ...
    system_state = SYSTEM_RUNNING
The one interesting point here is where you have to transition between
calling alloc_bootmem and calling the regular allocator functions.
Maybe calling it slab_is_available() was not the best choice for a name,
but I don't see a point in having different names for essentially the
same question, "bootmem or not bootmem". The powerpc platform has an
integer variable called 'mem_init_done', which expresses this well
IMHO, but it's currently not portable.
Checking for SYSTEM_RUNNING is obviously the wrong choice, since it is
set at a very late point in bootup, long after bootmem is gone.
Arnd <><
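
The timeline above can be modelled as a small state machine to show why a system_state test picks the wrong allocator. This is a simplified sketch: the phase names and helper functions are hypothetical, and slab_available() merely stands in for slab_is_available().

```c
#include <assert.h>
#include <stdbool.h>

/* Boot phases in the order of the timeline (names are illustrative). */
enum boot_phase {
    PHASE_EARLY,        /* before init_bootmem() */
    PHASE_BOOTMEM,      /* init_bootmem() done, free_all_bootmem() not yet */
    PHASE_PAGE_ALLOC,   /* free_all_bootmem() done: alloc_pages works */
    PHASE_SLAB,         /* kmem_cache_init() done: kmalloc/vmalloc work */
    PHASE_RUNNING       /* system_state = SYSTEM_RUNNING */
};

enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };

/* system_state only flips at the very end of boot. */
enum system_states system_state_in(enum boot_phase p)
{
    return p == PHASE_RUNNING ? SYSTEM_RUNNING : SYSTEM_BOOTING;
}

/* alloc_bootmem is only valid in one window of the timeline. */
bool bootmem_usable(enum boot_phase p)
{
    return p == PHASE_BOOTMEM;
}

/* Stand-in for slab_is_available(). */
bool slab_available(enum boot_phase p)
{
    return p >= PHASE_SLAB;
}

/* The old test: "still booting, so use bootmem". */
bool old_check_picks_bootmem(enum boot_phase p)
{
    assert(p <= PHASE_RUNNING);
    return system_state_in(p) == SYSTEM_BOOTING;
}
```

In PHASE_SLAB the old check still selects bootmem although bootmem is gone, which is the bug the patch addresses. In this model the short window between free_all_bootmem() and kmem_cache_init() is covered by neither test, so a caller running there would need separate handling.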
* [patch 0/2] fix bugs while booting on NUMA system where some nodes have no mem
@ 2006-11-15 17:54 Christian Krafft
0 siblings, 0 replies; 37+ messages in thread
From: Christian Krafft @ 2006-11-15 17:54 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel
Hi,
The following patches are fixing two problems that showed up
while booting a NUMA system where memory was limited to the first node.
Please cc me for comments as I am not subscribed.
cheers,
Christian