linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/3] Fixes for NUMA allocations on memoryless nodes
@ 2007-06-12 20:48 clameter
  2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: clameter @ 2007-06-12 20:48 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

This patchset fixes the dysfunctional behavior of for GFP_THISNODE and MPOL_INTERLEAVE
on systems with memoryless nodes. We introduce a new "node_memory_map" to be able to
determine efficiently if a node has memory.

Tested on IA64 NUMA and compile tested on i386 SMP (to verify that the new fallbacks work right)

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 20:48 [patch 0/3] Fixes for NUMA allocations on memoryless nodes clameter
@ 2007-06-12 20:48 ` clameter
  2007-06-12 21:03   ` David Rientjes
  2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
  2007-06-12 20:48 ` [patch 3/3] Fix MPOL_INTERLEAVE " clameter
  2 siblings, 1 reply; 40+ messages in thread
From: clameter @ 2007-06-12 20:48 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

[-- Attachment #1: node_memory_map --]
[-- Type: text/plain, Size: 4293 bytes --]

It is necessary to know if nodes have memory since we have recently
begun to add support for memoryless nodes. For that purpose we introduce
a new bitmap called

node_memory_map

A node has its bit in node_memory_map set if it has memory. If a node
has memory then it has at least one zone defined in its pgdat structure
that is located in the pgdat itself.

The node_memory_map can then be used in various places to insure that we
do the right thing when we encounter a memoryless node.

Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.22-rc4-mm2/include/linux/nodemask.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/nodemask.h	2007-06-12 12:32:38.000000000 -0700
+++ linux-2.6.22-rc4-mm2/include/linux/nodemask.h	2007-06-12 13:45:44.000000000 -0700
@@ -64,12 +64,16 @@
  *
  * int node_online(node)		Is some node online?
  * int node_possible(node)		Is some node possible?
+ * int node_memory(node)		Does a node have memory?
  *
  * int any_online_node(mask)		First online node in mask
  *
  * node_set_online(node)		set bit 'node' in node_online_map
  * node_set_offline(node)		clear bit 'node' in node_online_map
  *
+ * node_set_memory(node)		set bit 'node' in node_memory_map
+ * node_clear_memoryd(node)		clear bit 'node' in node_memory_map
+ *
  * for_each_node(node)			for-loop node over node_possible_map
  * for_each_online_node(node)		for-loop node over node_online_map
  *
@@ -344,12 +348,14 @@ static inline void __nodes_remap(nodemas
 
 extern nodemask_t node_online_map;
 extern nodemask_t node_possible_map;
+extern nodemask_t node_memory_map;
 
 #if MAX_NUMNODES > 1
 #define num_online_nodes()	nodes_weight(node_online_map)
 #define num_possible_nodes()	nodes_weight(node_possible_map)
 #define node_online(node)	node_isset((node), node_online_map)
 #define node_possible(node)	node_isset((node), node_possible_map)
+#define node_memory(node)	node_isset((node), node_memory_map)
 #define first_online_node	first_node(node_online_map)
 #define next_online_node(nid)	next_node((nid), node_online_map)
 extern int nr_node_ids;
@@ -358,6 +364,8 @@ extern int nr_node_ids;
 #define num_possible_nodes()	1
 #define node_online(node)	((node) == 0)
 #define node_possible(node)	((node) == 0)
+#define node_memory(node)	((node) == 0)
+#define node_populated(node)	((node) == 0)
 #define first_online_node	0
 #define next_online_node(nid)	(MAX_NUMNODES)
 #define nr_node_ids		1
@@ -375,6 +383,9 @@ extern int nr_node_ids;
 #define node_set_online(node)	   set_bit((node), node_online_map.bits)
 #define node_set_offline(node)	   clear_bit((node), node_online_map.bits)
 
+#define node_set_memory(node)     set_bit((node), node_memory_map.bits)
+#define node_clear_memory(node)   clear_bit((node), node_memory_map.bits)
+
 #define for_each_node(node)	   for_each_node_mask((node), node_possible_map)
 #define for_each_online_node(node) for_each_node_mask((node), node_online_map)
 
Index: linux-2.6.22-rc4-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/mm/page_alloc.c	2007-06-12 12:32:38.000000000 -0700
+++ linux-2.6.22-rc4-mm2/mm/page_alloc.c	2007-06-12 12:32:43.000000000 -0700
@@ -54,6 +54,9 @@ nodemask_t node_online_map __read_mostly
 EXPORT_SYMBOL(node_online_map);
 nodemask_t node_possible_map __read_mostly = NODE_MASK_ALL;
 EXPORT_SYMBOL(node_possible_map);
+nodemask_t node_memory_map __read_mostly = NODE_MASK_NONE;
+EXPORT_SYMBOL(node_memory_map);
+
 unsigned long totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 long nr_swap_pages;
@@ -2299,6 +2302,9 @@ static void build_zonelists(pg_data_t *p
 		/* calculate node order -- i.e., DMA last! */
 		build_zonelists_in_zone_order(pgdat, j);
 	}
+
+	if (pgdat->node_present_pages)
+		node_set_memory(local_node);
 }
 
 /* Construct the zonelist performance cache - see further mmzone.h */

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 20:48 [patch 0/3] Fixes for NUMA allocations on memoryless nodes clameter
  2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
@ 2007-06-12 20:48 ` clameter
  2007-06-12 21:03   ` David Rientjes
  2007-06-13 21:10   ` Lee Schermerhorn
  2007-06-12 20:48 ` [patch 3/3] Fix MPOL_INTERLEAVE " clameter
  2 siblings, 2 replies; 40+ messages in thread
From: clameter @ 2007-06-12 20:48 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

[-- Attachment #1: gfp_thisnode_fix --]
[-- Type: text/plain, Size: 1785 bytes --]

GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
first zone of a nodelist. That only works if the node has memory. A
memoryless node will have its first zone on another pgdat (node).

Thus GFP_THISNODE may be returning memory on other nodes.
GFP_THISNODE should fail if there is no local memory on a node.

So we add a check to verify that the node specified has memory in
alloc_pages_node(). If the node has no memory then return NULL.

The case of alloc_pages(GFP_THISNODE) is not changed. alloc_pages() (with
no memory policies in effect) is understood to prefer the current node.
If a process is running on a node with no memory then its default allocations
come from the next neighboring node. GFP_THISNODE will then force the memory
to come from that node.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
+++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
@@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
 	if (nid < 0)
 		nid = numa_node_id();
 
+	/*
+	 * Check for the special case that GFP_THISNODE is used on a
+	 * memoryless node
+	 */
+	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
+		return NULL;
+
 	return __alloc_pages(gfp_mask, order,
 		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
 }

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [patch 3/3] Fix MPOL_INTERLEAVE behavior for memoryless nodes
  2007-06-12 20:48 [patch 0/3] Fixes for NUMA allocations on memoryless nodes clameter
  2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
  2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
@ 2007-06-12 20:48 ` clameter
  2 siblings, 0 replies; 40+ messages in thread
From: clameter @ 2007-06-12 20:48 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

[-- Attachment #1: fix_interleave --]
[-- Type: text/plain, Size: 1339 bytes --]

MPOL_INTERLEAVE currently simply loops over all nodes. Allocations on
memoryless nodes will be redirected to nodes with memory. This results in
an imbalance because the neighboring nodes to memoryless nodes will get
significantly more interleave hits that the rest of the nodes.

We can avoid this imbalance by clearing the nodes in the interleave node
set that have no memory.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

Index: linux-2.6.22-rc4-mm2/mm/mempolicy.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/mm/mempolicy.c	2007-06-12 13:45:16.000000000 -0700
+++ linux-2.6.22-rc4-mm2/mm/mempolicy.c	2007-06-12 13:45:31.000000000 -0700
@@ -185,7 +185,8 @@ static struct mempolicy *mpol_new(int mo
 	switch (mode) {
 	case MPOL_INTERLEAVE:
 		policy->v.nodes = *nodes;
-		if (nodes_weight(*nodes) == 0) {
+		nodes_and(policy->v.nodes, policy->v.nodes, node_memory_map);
+		if (nodes_weight(policy->v.nodes) == 0) {
 			kmem_cache_free(policy_cache, policy);
 			return ERR_PTR(-EINVAL);
 		}

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
@ 2007-06-12 21:03   ` David Rientjes
  2007-06-12 21:08     ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:03 UTC (permalink / raw)
  To: clameter; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, clameter@sgi.com wrote:

> Index: linux-2.6.22-rc4-mm2/include/linux/nodemask.h
> ===================================================================
> --- linux-2.6.22-rc4-mm2.orig/include/linux/nodemask.h	2007-06-12 12:32:38.000000000 -0700
> +++ linux-2.6.22-rc4-mm2/include/linux/nodemask.h	2007-06-12 13:45:44.000000000 -0700
> @@ -64,12 +64,16 @@
>   *
>   * int node_online(node)		Is some node online?
>   * int node_possible(node)		Is some node possible?
> + * int node_memory(node)		Does a node have memory?
>   *

This name doesn't make sense; wouldn't node_has_memory() be better?

>   * int any_online_node(mask)		First online node in mask
>   *
>   * node_set_online(node)		set bit 'node' in node_online_map
>   * node_set_offline(node)		clear bit 'node' in node_online_map
>   *
> + * node_set_memory(node)		set bit 'node' in node_memory_map
> + * node_clear_memoryd(node)		clear bit 'node' in node_memory_map
> + *

Extra 'd' there in node_clear_memory().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
@ 2007-06-12 21:03   ` David Rientjes
  2007-06-12 21:07     ` Christoph Lameter
  2007-06-13 21:10   ` Lee Schermerhorn
  1 sibling, 1 reply; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:03 UTC (permalink / raw)
  To: clameter; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, clameter@sgi.com wrote:

> Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
>  	if (nid < 0)
>  		nid = numa_node_id();
>  
> +	/*
> +	 * Check for the special case that GFP_THISNODE is used on a
> +	 * memoryless node
> +	 */
> +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> +		return NULL;
> +

unlikely()?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 21:03   ` David Rientjes
@ 2007-06-12 21:07     ` Christoph Lameter
  2007-06-12 21:08       ` David Rientjes
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 21:07 UTC (permalink / raw)
  To: David Rientjes; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, David Rientjes wrote:

> > ===================================================================
> > --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> > +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> > @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
> >  	if (nid < 0)
> >  		nid = numa_node_id();
> >  
> > +	/*
> > +	 * Check for the special case that GFP_THISNODE is used on a
> > +	 * memoryless node
> > +	 */
> > +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> > +		return NULL;
> > +
> 
> unlikely()?

The gfp_mask is typically constant. So the whole expression is folded by the 
compiler into either

if (!node_memory(nid))
	return NULL

or 

nothing.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:03   ` David Rientjes
@ 2007-06-12 21:08     ` Christoph Lameter
  2007-06-12 21:10       ` David Rientjes
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 21:08 UTC (permalink / raw)
  To: David Rientjes; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, David Rientjes wrote:

> >   * int node_online(node)		Is some node online?
> >   * int node_possible(node)		Is some node possible?
> > + * int node_memory(node)		Does a node have memory?
> >   *
> 
> This name doesn't make sense; wouldn't node_has_memory() be better?

node_set_has_memory and node_clear_has_memory sounds a bit strange.

> 
> >   * int any_online_node(mask)		First online node in mask
> >   *
> >   * node_set_online(node)		set bit 'node' in node_online_map
> >   * node_set_offline(node)		clear bit 'node' in node_online_map
> >   *
> > + * node_set_memory(node)		set bit 'node' in node_memory_map
> > + * node_clear_memoryd(node)		clear bit 'node' in node_memory_map
> > + *
> 
> Extra 'd' there in node_clear_memory().

Tss.... It survived my editing.

Fix typo

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.22-rc4-mm2/include/linux/nodemask.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/nodemask.h	2007-06-12 14:07:52.000000000 -0700
+++ linux-2.6.22-rc4-mm2/include/linux/nodemask.h	2007-06-12 14:08:03.000000000 -0700
@@ -72,7 +72,7 @@
  * node_set_offline(node)		clear bit 'node' in node_online_map
  *
  * node_set_memory(node)		set bit 'node' in node_memory_map
- * node_clear_memoryd(node)		clear bit 'node' in node_memory_map
+ * node_clear_memory(node)		clear bit 'node' in node_memory_map
  *
  * for_each_node(node)			for-loop node over node_possible_map
  * for_each_online_node(node)		for-loop node over node_online_map
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 21:07     ` Christoph Lameter
@ 2007-06-12 21:08       ` David Rientjes
  2007-06-12 21:25         ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, Christoph Lameter wrote:

> On Tue, 12 Jun 2007, David Rientjes wrote:
> 
> > > ===================================================================
> > > --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> > > +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> > > @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
> > >  	if (nid < 0)
> > >  		nid = numa_node_id();
> > >  
> > > +	/*
> > > +	 * Check for the special case that GFP_THISNODE is used on a
> > > +	 * memoryless node
> > > +	 */
> > > +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> > > +		return NULL;
> > > +
> > 
> > unlikely()?
> 
> The gfp_mask is typically constant. So the whole expression is folded by the 
> compiler into either
> 
> if (!node_memory(nid))
> 	return NULL
> 
> or 
> 
> nothing.
> 

That's the point.  Isn't !node_memory(nid) unlikely?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:08     ` Christoph Lameter
@ 2007-06-12 21:10       ` David Rientjes
  2007-06-12 21:27         ` Christoph Lameter
  2007-06-12 21:36         ` Nishanth Aravamudan
  0 siblings, 2 replies; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, Christoph Lameter wrote:

> On Tue, 12 Jun 2007, David Rientjes wrote:
> 
> > >   * int node_online(node)		Is some node online?
> > >   * int node_possible(node)		Is some node possible?
> > > + * int node_memory(node)		Does a node have memory?
> > >   *
> > 
> > This name doesn't make sense; wouldn't node_has_memory() be better?
> 
> node_set_has_memory and node_clear_has_memory sounds a bit strange.
> 

This will probably be one of those things that people see in the source 
and have to look up everytime.  node_has_memory() is straight-forward and 
to the point.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 21:08       ` David Rientjes
@ 2007-06-12 21:25         ` Christoph Lameter
  2007-06-12 21:34           ` David Rientjes
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 21:25 UTC (permalink / raw)
  To: David Rientjes; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, David Rientjes wrote:

> That's the point.  Isn't !node_memory(nid) unlikely?

Correct.

Use unlikely

Signed-off-cy: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 14:22:57.000000000 -0700
+++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 14:24:46.000000000 -0700
@@ -179,7 +179,7 @@ static inline struct page *alloc_pages_n
 	 * Check for the special case that GFP_THISNODE is used on a
 	 * memoryless node
 	 */
-	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
+	if (unlikely((gfp_mask & __GFP_THISNODE) && !node_memory(nid)))
 		return NULL;
 
 	return __alloc_pages(gfp_mask, order,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:10       ` David Rientjes
@ 2007-06-12 21:27         ` Christoph Lameter
  2007-06-12 21:34           ` David Rientjes
  2007-06-12 21:36         ` Nishanth Aravamudan
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 21:27 UTC (permalink / raw)
  To: David Rientjes; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, David Rientjes wrote:

> On Tue, 12 Jun 2007, Christoph Lameter wrote:
> 
> > On Tue, 12 Jun 2007, David Rientjes wrote:
> > 
> > > >   * int node_online(node)		Is some node online?
> > > >   * int node_possible(node)		Is some node possible?
> > > > + * int node_memory(node)		Does a node have memory?
> > > >   *
> > > 
> > > This name doesn't make sense; wouldn't node_has_memory() be better?
> > 
> > node_set_has_memory and node_clear_has_memory sounds a bit strange.
> > 
> 
> This will probably be one of those things that people see in the source 
> and have to look up everytime.  node_has_memory() is straight-forward and 
> to the point.

But node_possible is similar to node_memory.

Would you also prefer node_is_possible over node_possible?

node_is_online?

 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:27         ` Christoph Lameter
@ 2007-06-12 21:34           ` David Rientjes
  2007-06-12 21:37             ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, Christoph Lameter wrote:

> > > On Tue, 12 Jun 2007, David Rientjes wrote:
> > > 
> > > > >   * int node_online(node)		Is some node online?
> > > > >   * int node_possible(node)		Is some node possible?
> > > > > + * int node_memory(node)		Does a node have memory?
> > > > >   *
> > > > 
> > > > This name doesn't make sense; wouldn't node_has_memory() be better?
> > > 
> > > node_set_has_memory and node_clear_has_memory sounds a bit strange.
> > > 
> > 
> > This will probably be one of those things that people see in the source 
> > and have to look up everytime.  node_has_memory() is straight-forward and 
> > to the point.
> 
> But node_possible is similar to node_memory.
> 
> Would you also prefer node_is_possible over node_possible?
> 
> node_is_online?
> 

I think the problem is that online and possible are adverbs and 
adjectives, respectively, and memory is a noun.  That's why when it 
appears in source code, it doesn't make a lot of sense for node_memory to 
return a boolean value.  I suspect it would return the memory, whatever 
that is.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 21:25         ` Christoph Lameter
@ 2007-06-12 21:34           ` David Rientjes
  2007-06-12 21:51             ` Nishanth Aravamudan
  0 siblings, 1 reply; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, Christoph Lameter wrote:

> On Tue, 12 Jun 2007, David Rientjes wrote:
> 
> > That's the point.  Isn't !node_memory(nid) unlikely?
> 
> Correct.
> 
> Use unlikely
> 
> Signed-off-cy: Christoph Lameter <clameter@sgi.com>

Acked-by: David Rientjes <rientjes@google.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:10       ` David Rientjes
  2007-06-12 21:27         ` Christoph Lameter
@ 2007-06-12 21:36         ` Nishanth Aravamudan
  2007-06-12 21:39           ` Christoph Lameter
  2007-06-13  9:14           ` Andy Whitcroft
  1 sibling, 2 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-12 21:36 UTC (permalink / raw)
  To: David Rientjes; +Cc: Christoph Lameter, akpm, linux-mm, ak, Lee Schermerhorn

On 12.06.2007 [14:10:44 -0700], David Rientjes wrote:
> On Tue, 12 Jun 2007, Christoph Lameter wrote:
> 
> > On Tue, 12 Jun 2007, David Rientjes wrote:
> > 
> > > >   * int node_online(node)		Is some node online?
> > > >   * int node_possible(node)		Is some node possible?
> > > > + * int node_memory(node)		Does a node have memory?
> > > >   *
> > > 
> > > This name doesn't make sense; wouldn't node_has_memory() be better?
> > 
> > node_set_has_memory and node_clear_has_memory sounds a bit strange.
> > 
> 
> This will probably be one of those things that people see in the
> source and have to look up everytime.  node_has_memory() is
> straight-forward and to the point.

Indeed, I did and (I like to think) I helped write the patches :)

Why not just make the boolean sensible?

We can keep

node_set_memory()
node_clear_memory()

but change node_memory() to node_has_memory() ?

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:34           ` David Rientjes
@ 2007-06-12 21:37             ` Christoph Lameter
  2007-06-12 21:38               ` David Rientjes
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 21:37 UTC (permalink / raw)
  To: David Rientjes; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, David Rientjes wrote:

> I think the problem is that online and possible are adverbs and 
> adjectives, respectively, and memory is a noun.  That's why when it 
> appears in source code, it doesn't make a lot of sense for node_memory to 
> return a boolean value.  I suspect it would return the memory, whatever 
> that is.

A sentence such as "This is a memory node" in opposition to "This is a 
memoryless nide" would put memory to use as an adjective.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:37             ` Christoph Lameter
@ 2007-06-12 21:38               ` David Rientjes
  0 siblings, 0 replies; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:38 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-mm, ak, Nishanth Aravamudan, Lee Schermerhorn

On Tue, 12 Jun 2007, Christoph Lameter wrote:

> A sentence such as "This is a memory node" in opposition to "This is a 
> memoryless nide" would put memory to use as an adjective.
> 

Ok, is_memory_node() is great too ;)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:36         ` Nishanth Aravamudan
@ 2007-06-12 21:39           ` Christoph Lameter
  2007-06-12 21:42             ` Nishanth Aravamudan
  2007-06-13  9:14           ` Andy Whitcroft
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 21:39 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: David Rientjes, akpm, linux-mm, ak, Lee Schermerhorn

> Indeed, I did and (I like to think) I helped write the patches :)

The patches contain your signoff because of your authorship...

> We can keep
> 
> node_set_memory()
> node_clear_memory()
> 
> but change node_memory() to node_has_memory() ?

Hmmm.... That deviates from how the other node_xxx() things are so it 
disturbed my sense of order. We have no three word node_is/has_xxx yet.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:39           ` Christoph Lameter
@ 2007-06-12 21:42             ` Nishanth Aravamudan
  2007-06-12 21:45               ` David Rientjes
  2007-06-12 22:26               ` Christoph Lameter
  0 siblings, 2 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-12 21:42 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Rientjes, akpm, linux-mm, ak, Lee Schermerhorn

On 12.06.2007 [14:39:21 -0700], Christoph Lameter wrote:
> 
> > Indeed, I did and (I like to think) I helped write the patches :)
> 
> The patches contain your signoff because of your authorship...
> 
> > We can keep
> > 
> > node_set_memory()
> > node_clear_memory()
> > 
> > but change node_memory() to node_has_memory() ?
> 
> Hmmm.... That deviates from how the other node_xxx() things are so it
> disturbed my sense of order. We have no three word node_is/has_xxx
> yet.

Yeah, I realize that -- but I also agree with David that

	node_memory()

is not intuitive at all. And we've already admitted that a few of the
macros in there are inconsistent already :)

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:42             ` Nishanth Aravamudan
@ 2007-06-12 21:45               ` David Rientjes
  2007-06-12 22:26               ` Christoph Lameter
  1 sibling, 0 replies; 40+ messages in thread
From: David Rientjes @ 2007-06-12 21:45 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Christoph Lameter, akpm, linux-mm, ak, Lee Schermerhorn

On Tue, 12 Jun 2007, Nishanth Aravamudan wrote:

> Yeah, I realize that -- but I also agree with David that
> 
> 	node_memory()
> 
> is not intuitive at all. And we've already admitted that a few of the
> macros in there are inconsistent already :)
> 

Creating new macros that conform to the others in terms of 
node_<single-word-here>() is great, but useless if it isn't readable in 
soruce code.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 21:34           ` David Rientjes
@ 2007-06-12 21:51             ` Nishanth Aravamudan
  0 siblings, 0 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-12 21:51 UTC (permalink / raw)
  To: David Rientjes; +Cc: Christoph Lameter, akpm, linux-mm, ak, Lee Schermerhorn

On 12.06.2007 [14:34:40 -0700], David Rientjes wrote:
> On Tue, 12 Jun 2007, Christoph Lameter wrote:
> 
> > On Tue, 12 Jun 2007, David Rientjes wrote:
> > 
> > > That's the point.  Isn't !node_memory(nid) unlikely?
> > 
> > Correct.
> > 
> > Use unlikely
> > 
> > Signed-off-cy: Christoph Lameter <clameter@sgi.com>
> 
> Acked-by: David Rientjes <rientjes@google.com>

Yep, definitely makes sense.

Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:42             ` Nishanth Aravamudan
  2007-06-12 21:45               ` David Rientjes
@ 2007-06-12 22:26               ` Christoph Lameter
  2007-06-12 22:32                 ` Nishanth Aravamudan
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-12 22:26 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: David Rientjes, akpm, linux-mm, ak, pj, Lee Schermerhorn

On Tue, 12 Jun 2007, Nishanth Aravamudan wrote:

> is not intuitive at all. And we've already admitted that a few of the
> macros in there are inconsistent already :)

I do not want to add another inconsistency.

if (node_memory(node))

is pretty clear as far as I can tell.

Some of the macros in include/linux/nodemask.h are inconsistent. How can 
we make those consistent. Could you come up with a consistent naming 
scheme? Add the explanation for that scheme.

But that should be a separate patch. And the patch would have to change 
all uses of those macros in the kernel source tree.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 22:26               ` Christoph Lameter
@ 2007-06-12 22:32                 ` Nishanth Aravamudan
  0 siblings, 0 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-12 22:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: David Rientjes, akpm, linux-mm, ak, pj, Lee Schermerhorn

On 12.06.2007 [15:26:25 -0700], Christoph Lameter wrote:
> On Tue, 12 Jun 2007, Nishanth Aravamudan wrote:
> 
> > is not intuitive at all. And we've already admitted that a few of the
> > macros in there are inconsistent already :)
> 
> I do not want to add another inconsistency.
> 
> if (node_memory(node))
> 
> is pretty clear as far as I can tell.

Fair enough.

> Some of the macros in include/linux/nodemask.h are inconsistent. How
> can we make those consistent. Could you come up with a consistent
> naming scheme? Add the explanation for that scheme.

I'll think it over :)

> But that should be a separate patch. And the patch would have to
> change all uses of those macros in the kernel source tree.

Agreed, will be a down-the-road thing. In the meantime, I'm rebasing the
hugetlb pool patch and sysfs interface patch on top of your stack and
will repost.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 1/3] NUMA: introduce node_memory_map
  2007-06-12 21:36         ` Nishanth Aravamudan
  2007-06-12 21:39           ` Christoph Lameter
@ 2007-06-13  9:14           ` Andy Whitcroft
  1 sibling, 0 replies; 40+ messages in thread
From: Andy Whitcroft @ 2007-06-13  9:14 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: David Rientjes, Christoph Lameter, akpm, linux-mm, ak, Lee Schermerhorn

Nishanth Aravamudan wrote:
> On 12.06.2007 [14:10:44 -0700], David Rientjes wrote:
>> On Tue, 12 Jun 2007, Christoph Lameter wrote:
>>
>>> On Tue, 12 Jun 2007, David Rientjes wrote:
>>>
>>>>>   * int node_online(node)		Is some node online?
>>>>>   * int node_possible(node)		Is some node possible?
>>>>> + * int node_memory(node)		Does a node have memory?
>>>>>   *
>>>> This name doesn't make sense; wouldn't node_has_memory() be better?
>>> node_set_has_memory and node_clear_has_memory sounds a bit strange.
>>>
>> This will probably be one of those things that people see in the
>> source and have to look up everytime.  node_has_memory() is
>> straight-forward and to the point.
> 
> Indeed, I did and (I like to think) I helped write the patches :)
> 
> Why not just make the boolean sensible?
> 
> We can keep
> 
> node_set_memory()
> node_clear_memory()

node_clear_memory() sounds like something to memset all of that node's
memory or something.

> but change node_memory() to node_has_memory() ?

-apw

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
  2007-06-12 21:03   ` David Rientjes
@ 2007-06-13 21:10   ` Lee Schermerhorn
  2007-06-13 21:57     ` Nishanth Aravamudan
                       ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Lee Schermerhorn @ 2007-06-13 21:10 UTC (permalink / raw)
  To: clameter; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Tue, 2007-06-12 at 13:48 -0700, clameter@sgi.com wrote:
> GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
> first zone of a nodelist. That only works if the node has memory. A
> memoryless node will have its first zone on another pgdat (node).
> 
> Thus GFP_THISNODE may be returning memory on other nodes.
> GFP_THISNODE should fail if there is no local memory on a node.
> 
> So we add a check to verify that the node specified has memory in
> alloc_pages_node(). If the node has no memory then return NULL.
> 
> The case of alloc_pages(GFP_THISNODE) is not changed. alloc_pages() (with
> no memory policies in effect) is understood to prefer the current node.
> If a process is running on a node with no memory then its default allocations
> come from the next neighboring node. GFP_THISNODE will then force the memory
> to come from that node.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> 
> Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
>  	if (nid < 0)
>  		nid = numa_node_id();
>  
> +	/*
> +	 * Check for the special case that GFP_THISNODE is used on a
> +	 * memoryless node
> +	 */
> +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> +		return NULL;
> +
>  	return __alloc_pages(gfp_mask, order,
>  		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
>  }
> 

Attached patch fixes alloc_pages_node() so that it never returns an
off-node page when GFP_THISNODE is specified by.  This requires a fix to
SLUB early allocation, included in the patch.  Works on HP ia64 platform
with small DMA only node and "zone order" zonelists.  Will test on
x86_64 real soon now...

---

PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists

The map of nodes with memory may include nodes with just
DMA/DMA32 memory.  Using this map/mask together with
GFP_THISNODE will not guarantee on-node allocations at higher
zones.  Modify checks in alloc_pages_node() to ensure that the
first zone in the selected zonelist is "on-node".

This change will result in alloc_pages_node() returning NULL
when GFP_THISNODE is specified and the first zone in the zonelist
selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
apparently can't handle a NULL page from new_slab().  Fix SLUB
to handle NULL page in early allocation.

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/gfp.h |   11 ++++++++---
 mm/slub.c           |   22 ++++++++++++----------
 2 files changed, 20 insertions(+), 13 deletions(-)

Index: Linux/include/linux/gfp.h
===================================================================
--- Linux.orig/include/linux/gfp.h	2007-06-13 16:36:02.000000000 -0400
+++ Linux/include/linux/gfp.h	2007-06-13 16:38:41.000000000 -0400
@@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
+	pg_data_t *pgdat;
+	struct zonelist *zonelist;
+
 	if (unlikely(order >= MAX_ORDER))
 		return NULL;
 
@@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
 	 * Check for the special case that GFP_THISNODE is used on a
 	 * memoryless node
 	 */
-	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
+	pgdat = NODE_DATA(nid);
+	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
+	if ((gfp_mask & __GFP_THISNODE) &&
+		pgdat != zonelist->zones[0]->zone_pgdat)
 		return NULL;
 
-	return __alloc_pages(gfp_mask, order,
-		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+	return __alloc_pages(gfp_mask, order, zonelist);
 }
 
 #ifdef CONFIG_NUMA
Index: Linux/mm/slub.c
===================================================================
--- Linux.orig/mm/slub.c	2007-06-13 16:36:02.000000000 -0400
+++ Linux/mm/slub.c	2007-06-13 16:38:41.000000000 -0400
@@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init e
 	/* new_slab() disables interupts */
 	local_irq_enable();
 
-	BUG_ON(!page);
-	n = page->freelist;
-	BUG_ON(!n);
-	page->freelist = get_freepointer(kmalloc_caches, n);
-	page->inuse++;
-	kmalloc_caches->node[node] = n;
-	setup_object_debug(kmalloc_caches, page, n);
-	init_kmem_cache_node(n);
-	atomic_long_inc(&n->nr_slabs);
-	add_partial(n, page);
+	if (page) {
+		n = page->freelist;
+		BUG_ON(!n);
+		page->freelist = get_freepointer(kmalloc_caches, n);
+		page->inuse++;
+		kmalloc_caches->node[node] = n;
+		setup_object_debug(kmalloc_caches, page, n);
+		init_kmem_cache_node(n);
+		atomic_long_inc(&n->nr_slabs);
+		add_partial(n, page);
+	} else
+		kmalloc_caches->node[node] = NULL;
 	return n;
 }
 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 21:10   ` Lee Schermerhorn
@ 2007-06-13 21:57     ` Nishanth Aravamudan
  2007-06-13 22:46     ` Christoph Lameter
  2007-06-14  7:07     ` Christoph Lameter
  2 siblings, 0 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-13 21:57 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: clameter, akpm, linux-mm, ak

On 13.06.2007 [17:10:32 -0400], Lee Schermerhorn wrote:
> On Tue, 2007-06-12 at 13:48 -0700, clameter@sgi.com wrote:
> > GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
> > first zone of a nodelist. That only works if the node has memory. A
> > memoryless node will have its first zone on another pgdat (node).
> > 
> > Thus GFP_THISNODE may be returning memory on other nodes.
> > GFP_THISNODE should fail if there is no local memory on a node.
> > 
> > So we add a check to verify that the node specified has memory in
> > alloc_pages_node(). If the node has no memory then return NULL.
> > 
> > The case of alloc_pages(GFP_THISNODE) is not changed. alloc_pages() (with
> > no memory policies in effect) is understood to prefer the current node.
> > If a process is running on a node with no memory then its default allocations
> > come from the next neighboring node. GFP_THISNODE will then force the memory
> > to come from that node.
> > 
> > Signed-off-by: Christoph Lameter <clameter@sgi.com>
> > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > 
> > Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
> > ===================================================================
> > --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> > +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> > @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
> >  	if (nid < 0)
> >  		nid = numa_node_id();
> >  
> > +	/*
> > +	 * Check for the special case that GFP_THISNODE is used on a
> > +	 * memoryless node
> > +	 */
> > +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> > +		return NULL;
> > +
> >  	return __alloc_pages(gfp_mask, order,
> >  		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> >  }
> > 
> 
> Attached patch fixes alloc_pages_node() so that it never returns an
> off-node page when GFP_THISNODE is specified by.  This requires a fix to
> SLUB early allocation, included in the patch.  Works on HP ia64 platform
> with small DMA only node and "zone order" zonelists.  Will test on
> x86_64 real soon now...
> 
> ---
> 
> PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists
> 
> The map of nodes with memory may include nodes with just
> DMA/DMA32 memory.  Using this map/mask together with
> GFP_THISNODE will not guarantee on-node allocations at higher
> zones.  Modify checks in alloc_pages_node() to ensure that the
> first zone in the selected zonelist is "on-node".
> 
> This change will result in alloc_pages_node() returning NULL
> when GFP_THISNODE is specified and the first zone in the zonelist
> selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
> in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
> apparently can't handle a NULL page from new_slab().  Fix SLUB
> to handle NULL page in early allocation.
> 
> Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  include/linux/gfp.h |   11 ++++++++---
>  mm/slub.c           |   22 ++++++++++++----------
>  2 files changed, 20 insertions(+), 13 deletions(-)
> 
> Index: Linux/include/linux/gfp.h
> ===================================================================
> --- Linux.orig/include/linux/gfp.h	2007-06-13 16:36:02.000000000 -0400
> +++ Linux/include/linux/gfp.h	2007-06-13 16:38:41.000000000 -0400
> @@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> +	pg_data_t *pgdat;
> +	struct zonelist *zonelist;
> +
>  	if (unlikely(order >= MAX_ORDER))
>  		return NULL;
> 
> @@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
>  	 * Check for the special case that GFP_THISNODE is used on a
>  	 * memoryless node
>  	 */
> -	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> +	pgdat = NODE_DATA(nid);
> +	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
> +	if ((gfp_mask & __GFP_THISNODE) &&
> +		pgdat != zonelist->zones[0]->zone_pgdat)

This hunk won't apply on top of the unlikely() change... Rediffed below,
adding into the set to test now...

From: Lee Schermerhorn <lee.schermerhorn@hp.com>

PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists

The map of nodes with memory may include nodes with just
DMA/DMA32 memory.  Using this map/mask together with
GFP_THISNODE will not guarantee on-node allocations at higher
zones.  Modify checks in alloc_pages_node() to ensure that the
first zone in the selected zonelist is "on-node".

This change will result in alloc_pages_node() returning NULL when
GFP_THISNODE is specified and the first zone in the zonelist selected by
(nid, gfp_zone(gfp_mask) is not on node 'nid'.  This, in turn, BUGs out
in slub.c:early_kmem_cache_node_alloc() which apparently can't handle a
NULL page from new_slab().  Fix SLUB to handle NULL page in early
allocation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ecd8adb..98a76c9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned int, struct zonelist *));
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
+	pg_data_t *pgdat;
+	struct zonelist *zonelist;
+
 	if (unlikely(order >= MAX_ORDER))
 		return NULL;
 
@@ -175,15 +178,17 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	if (nid < 0)
 		nid = numa_node_id();
 
+	pgdat = NODE_DATA(nid);
+	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
 	/*
 	 * Check for the special case that GFP_THISNODE is used on a
 	 * memoryless node
 	 */
-	if (unlikely((gfp_mask & __GFP_THISNODE) && !node_memory(nid)))
+	if (unlikely((gfp_mask & __GFP_THISNODE) &&
+		pgdat != zonelist->zones[0]->zone_pgdat))
 		return NULL;
 
-	return __alloc_pages(gfp_mask, order,
-		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+	return __alloc_pages(gfp_mask, order, zonelist);
 }
 
 #ifdef CONFIG_NUMA
diff --git a/mm/slub.c b/mm/slub.c
index c16b75d..43a9270 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init early_kmem_cache_node_alloc(gfp_t gfpflag
 	/* new_slab() disables interupts */
 	local_irq_enable();
 
-	BUG_ON(!page);
-	n = page->freelist;
-	BUG_ON(!n);
-	page->freelist = get_freepointer(kmalloc_caches, n);
-	page->inuse++;
-	kmalloc_caches->node[node] = n;
-	setup_object_debug(kmalloc_caches, page, n);
-	init_kmem_cache_node(n);
-	atomic_long_inc(&n->nr_slabs);
-	add_partial(n, page);
+	if (page) {
+		n = page->freelist;
+		BUG_ON(!n);
+		page->freelist = get_freepointer(kmalloc_caches, n);
+		page->inuse++;
+		kmalloc_caches->node[node] = n;
+		setup_object_debug(kmalloc_caches, page, n);
+		init_kmem_cache_node(n);
+		atomic_long_inc(&n->nr_slabs);
+		add_partial(n, page);
+	} else
+		kmalloc_caches->node[node] = NULL;
 	return n;
 }
 

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 21:10   ` Lee Schermerhorn
  2007-06-13 21:57     ` Nishanth Aravamudan
@ 2007-06-13 22:46     ` Christoph Lameter
  2007-06-13 23:11       ` Nishanth Aravamudan
  2007-06-14 14:18       ` Lee Schermerhorn
  2007-06-14  7:07     ` Christoph Lameter
  2 siblings, 2 replies; 40+ messages in thread
From: Christoph Lameter @ 2007-06-13 22:46 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Wed, 13 Jun 2007, Lee Schermerhorn wrote:

> SLUB early allocation, included in the patch.  Works on HP ia64 platform
> with small DMA only node and "zone order" zonelists.  Will test on
> x86_64 real soon now...

I do not see the difference?? How does this work? node_memory(x) fails 
there?

> The map of nodes with memory may include nodes with just
> DMA/DMA32 memory.  Using this map/mask together with
> GFP_THISNODE will not guarantee on-node allocations at higher
> zones.  Modify checks in alloc_pages_node() to ensure that the
> first zone in the selected zonelist is "on-node".

That check is already done by __alloc_pages.

> This change will result in alloc_pages_node() returning NULL
> when GFP_THISNODE is specified and the first zone in the zonelist
> selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
> in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
> apparently can't handle a NULL page from new_slab().  Fix SLUB
> to handle NULL page in early allocation.

Ummm... Slub would need to consult node_memory_map instead I guess.

> Index: Linux/mm/slub.c
> ===================================================================
> --- Linux.orig/mm/slub.c	2007-06-13 16:36:02.000000000 -0400
> +++ Linux/mm/slub.c	2007-06-13 16:38:41.000000000 -0400
> @@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init e
>  	/* new_slab() disables interupts */
>  	local_irq_enable();
>  
> -	BUG_ON(!page);
> -	n = page->freelist;
> -	BUG_ON(!n);
> -	page->freelist = get_freepointer(kmalloc_caches, n);
> -	page->inuse++;
> -	kmalloc_caches->node[node] = n;
> -	setup_object_debug(kmalloc_caches, page, n);
> -	init_kmem_cache_node(n);
> -	atomic_long_inc(&n->nr_slabs);
> -	add_partial(n, page);
> +	if (page) {
> +		n = page->freelist;
> +		BUG_ON(!n);
> +		page->freelist = get_freepointer(kmalloc_caches, n);
> +		page->inuse++;
> +		kmalloc_caches->node[node] = n;
> +		setup_object_debug(kmalloc_caches, page, n);
> +		init_kmem_cache_node(n);
> +		atomic_long_inc(&n->nr_slabs);
> +		add_partial(n, page);
> +	} else
> +		kmalloc_caches->node[node] = NULL;
>  	return n;
>  }

It would be easier to modify SLUB to loop over node_memory_map instead of 
node_online_map? Potentially we have to change all loops over online node 
in the slab allocators.

---
 include/linux/nodemask.h |    1 +
 mm/slub.c                |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/nodemask.h
===================================================================
--- linux-2.6.orig/include/linux/nodemask.h	2007-06-13 15:40:27.000000000 -0700
+++ linux-2.6/include/linux/nodemask.h	2007-06-13 15:40:48.000000000 -0700
@@ -377,5 +377,6 @@ extern int nr_node_ids;
 
 #define for_each_node(node)	   for_each_node_mask((node), node_possible_map)
 #define for_each_online_node(node) for_each_node_mask((node), node_online_map)
+#define for_each_memory_node(node) for_each_node_mask((node), node_memory_map)
 
 #endif /* __LINUX_NODEMASK_H */
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-06-13 15:39:16.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-06-13 15:40:23.000000000 -0700
@@ -1836,7 +1836,7 @@ static int init_kmem_cache_nodes(struct 
 	else
 		local_node = 0;
 
-	for_each_online_node(node) {
+	for_each_memory_node(node) {
 		struct kmem_cache_node *n;
 
 		if (local_node == node)




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 22:46     ` Christoph Lameter
@ 2007-06-13 23:11       ` Nishanth Aravamudan
  2007-06-13 23:15         ` Christoph Lameter
  2007-06-14 14:18       ` Lee Schermerhorn
  1 sibling, 1 reply; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-13 23:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On 13.06.2007 [15:46:11 -0700], Christoph Lameter wrote:
> On Wed, 13 Jun 2007, Lee Schermerhorn wrote:
> 
> > SLUB early allocation, included in the patch.  Works on HP ia64 platform
> > with small DMA only node and "zone order" zonelists.  Will test on
> > x86_64 real soon now...
> 
> I do not see the difference?? How does this work? node_memory(x) fails 
> there?
> 
> > The map of nodes with memory may include nodes with just
> > DMA/DMA32 memory.  Using this map/mask together with
> > GFP_THISNODE will not guarantee on-node allocations at higher
> > zones.  Modify checks in alloc_pages_node() to ensure that the
> > first zone in the selected zonelist is "on-node".
> 
> That check is already done by __alloc_pages.
> 
> > This change will result in alloc_pages_node() returning NULL
> > when GFP_THISNODE is specified and the first zone in the zonelist
> > selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
> > in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
> > apparently can't handle a NULL page from new_slab().  Fix SLUB
> > to handle NULL page in early allocation.
> 
> Ummm... Slub would need to consult node_memory_map instead I guess.
> 
> > Index: Linux/mm/slub.c
> > ===================================================================
> > --- Linux.orig/mm/slub.c	2007-06-13 16:36:02.000000000 -0400
> > +++ Linux/mm/slub.c	2007-06-13 16:38:41.000000000 -0400
> > @@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init e
> >  	/* new_slab() disables interupts */
> >  	local_irq_enable();
> >  
> > -	BUG_ON(!page);
> > -	n = page->freelist;
> > -	BUG_ON(!n);
> > -	page->freelist = get_freepointer(kmalloc_caches, n);
> > -	page->inuse++;
> > -	kmalloc_caches->node[node] = n;
> > -	setup_object_debug(kmalloc_caches, page, n);
> > -	init_kmem_cache_node(n);
> > -	atomic_long_inc(&n->nr_slabs);
> > -	add_partial(n, page);
> > +	if (page) {
> > +		n = page->freelist;
> > +		BUG_ON(!n);
> > +		page->freelist = get_freepointer(kmalloc_caches, n);
> > +		page->inuse++;
> > +		kmalloc_caches->node[node] = n;
> > +		setup_object_debug(kmalloc_caches, page, n);
> > +		init_kmem_cache_node(n);
> > +		atomic_long_inc(&n->nr_slabs);
> > +		add_partial(n, page);
> > +	} else
> > +		kmalloc_caches->node[node] = NULL;
> >  	return n;
> >  }
> 
> It would be easier to modify SLUB to loop over node_memory_map instead of 
> node_online_map? Potentially we have to change all loops over online node 
> in the slab allocators.

So, I think we are really close to closing the gaps here. Just need to
figure out how to fix Lee's platform so it does what he wants, I think.
I've tested the current set (which is going to change again, once we
figure out how to deal with SLUB (I'm guessing we'll go with your patch
Christoph, but it didn't exist when I was testing earlier :) and Lee's
platform properly). But everything, including the sysfs allocator, works
with a 4-node x86_64, with all nodes populated and a 4-node ppc64, with
only 2 nodes populated.

I would like to roll up the patches and small fixes into a set of 4 or 5
patches that Andrew can pick up, so once this is all stable, I'll post a
fresh series. Sound good, Andrew?

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 23:11       ` Nishanth Aravamudan
@ 2007-06-13 23:15         ` Christoph Lameter
  2007-06-13 23:20           ` Nishanth Aravamudan
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-13 23:15 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On Wed, 13 Jun 2007, Nishanth Aravamudan wrote:

> I would like to roll up the patches and small fixes into a set of 4 or 5
> patches that Andrew can pick up, so once this is all stable, I'll post a
> fresh series. Sound good, Andrew?

NACK. This patchset is not ready for any inclusion and nothing like that 
should go into 2.6.22. We first need to assess the breakage that results 
if GFP_THISNODE now returns NULL for memoryless nodes. So far GFP_THISNODE 
returns memory on the nearest node and that seems to make lots of things 
keep working.

Plus the implementation of GFP_THISNODE that we have so far is a bit 
complex. It would be good if that could be simpler.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 23:15         ` Christoph Lameter
@ 2007-06-13 23:20           ` Nishanth Aravamudan
  2007-06-13 23:26             ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-13 23:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On 13.06.2007 [16:15:24 -0700], Christoph Lameter wrote:
> On Wed, 13 Jun 2007, Nishanth Aravamudan wrote:
> 
> > I would like to roll up the patches and small fixes into a set of 4 or 5
> > patches that Andrew can pick up, so once this is all stable, I'll post a
> > fresh series. Sound good, Andrew?
> 
> NACK. This patchset is not ready for any inclusion and nothing like
> that should go into 2.6.22. We first need to assess the breakage that
> results if GFP_THISNODE now returns NULL for memoryless nodes. So far
> GFP_THISNODE returns memory on the nearest node and that seems to make
> lots of things keep working.

Who the heck said anything about mainline?

See my other reply for discussing GFP_THISNODE.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 23:20           ` Nishanth Aravamudan
@ 2007-06-13 23:26             ` Christoph Lameter
  2007-06-13 23:32               ` Nishanth Aravamudan
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-13 23:26 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On Wed, 13 Jun 2007, Nishanth Aravamudan wrote:

> Who the heck said anything about mainline?

Well we do have discovered issues that are bugs. Handling of memoryless 
nodes is rather strange.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 23:26             ` Christoph Lameter
@ 2007-06-13 23:32               ` Nishanth Aravamudan
  2007-06-13 23:53                 ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-13 23:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On 13.06.2007 [16:26:50 -0700], Christoph Lameter wrote:
> On Wed, 13 Jun 2007, Nishanth Aravamudan wrote:
> 
> > Who the heck said anything about mainline?
> 
> Well we do have discovered issues that are bugs. Handling of
> memoryless nodes is rather strange.

Without a doubt :)

Sorry, the real reason for wrapping the patches and reposting, for me,
is that we've had a lot of versions flying around, with small fixlets
here and there. I wanted to start a new thread for the 3 or so patches I
see that implement the core of dealing with memoryless nodes, and then
keep the discussion going there, but that was purely for my own sanity.

Sorry if I overreacted.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 23:32               ` Nishanth Aravamudan
@ 2007-06-13 23:53                 ` Christoph Lameter
  2007-06-14  0:04                   ` Nishanth Aravamudan
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-13 23:53 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On Wed, 13 Jun 2007, Nishanth Aravamudan wrote:

> Sorry, the real reason for wrapping the patches and reposting, for me,
> is that we've had a lot of versions flying around, with small fixlets
> here and there. I wanted to start a new thread for the 3 or so patches I
> see that implement the core of dealing with memoryless nodes, and then
> keep the discussion going there, but that was purely for my own sanity.

Ok. I will try to put all the fixes together. Basically that will be the 
three patches plus Lee's new suggestion for alloc_pages_node. Plus fixes 
to slab / slub / the uncached allocator etc where I see that the online 
map is used but what was really intended was the node_memory_map. There is 
also stuff in the oom killer, vmscan.c etc that seems to make that 
assumption. Sigh.

That still leaves the issue of the name.

node_memory
set_node_has_memory
set_node_no_memory

Ok?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 23:53                 ` Christoph Lameter
@ 2007-06-14  0:04                   ` Nishanth Aravamudan
  0 siblings, 0 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-14  0:04 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On 13.06.2007 [16:53:26 -0700], Christoph Lameter wrote:
> On Wed, 13 Jun 2007, Nishanth Aravamudan wrote:
> 
> > Sorry, the real reason for wrapping the patches and reposting, for
> > me, is that we've had a lot of versions flying around, with small
> > fixlets here and there. I wanted to start a new thread for the 3 or
> > so patches I see that implement the core of dealing with memoryless
> > nodes, and then keep the discussion going there, but that was purely
> > for my own sanity.
> 
> Ok. I will try to put all the fixes together. Basically that will be
> the three patches plus Lee's new suggestion for alloc_pages_node. Plus
> fixes to slab / slub / the uncached allocator etc where I see that the
> online map is used but what was really intended was the
> node_memory_map. There is also stuff in the oom killer, vmscan.c etc
> that seems to make that assumption. Sigh.

Yeah, it's a big problem to solve, from an audit perspective.

> That still leaves the issue of the name.
> 
> node_memory

node_has_memory?

Reading code, if I see:

if (node_has_memory(nid))

it's almost reading an English sentence! :)

> set_node_has_memory
> set_node_no_memory

node_set_has_memory
node_set_no_memory?

Just to be close to (but not quite the same as) online/offline.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 21:10   ` Lee Schermerhorn
  2007-06-13 21:57     ` Nishanth Aravamudan
  2007-06-13 22:46     ` Christoph Lameter
@ 2007-06-14  7:07     ` Christoph Lameter
  2007-06-14 14:23       ` Nishanth Aravamudan
  2 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-14  7:07 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Wed, 13 Jun 2007, Lee Schermerhorn wrote:

> --- Linux.orig/include/linux/gfp.h	2007-06-13 16:36:02.000000000 -0400
> +++ Linux/include/linux/gfp.h	2007-06-13 16:38:41.000000000 -0400
> @@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> +	pg_data_t *pgdat;
> +	struct zonelist *zonelist;
> +
>  	if (unlikely(order >= MAX_ORDER))
>  		return NULL;
>  
> @@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
>  	 * Check for the special case that GFP_THISNODE is used on a
>  	 * memoryless node
>  	 */
> -	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> +	pgdat = NODE_DATA(nid);
> +	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
> +	if ((gfp_mask & __GFP_THISNODE) &&
> +		pgdat != zonelist->zones[0]->zone_pgdat)
>  		return NULL;
>  
> -	return __alloc_pages(gfp_mask, order,
> -		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> +	return __alloc_pages(gfp_mask, order, zonelist);
>  }

Good idea but I think this does not address the case where the DMA zone of 
a node was moved to the end of the zonelist. In that case the first zone 
is not on the first pgdat but the node has memory. The memory of the node 
is listed elsewhere in the nodelist. I can probably modify __alloc_pages 
to make GFP_THISNODE to check all zones but we do not have the pgdat 
reference there. Sigh.

How about generating a special THISNODE zonelist in build_zonelist that 
only contains the zones of a single node. Then just use that one if 
GFP_THISNODE is set. Then we get rid of all the GFP_THISNODE crap that I 
added to __alloc_pages?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-13 22:46     ` Christoph Lameter
  2007-06-13 23:11       ` Nishanth Aravamudan
@ 2007-06-14 14:18       ` Lee Schermerhorn
  2007-06-14 14:24         ` Christoph Lameter
  1 sibling, 1 reply; 40+ messages in thread
From: Lee Schermerhorn @ 2007-06-14 14:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Wed, 2007-06-13 at 15:46 -0700, Christoph Lameter wrote:
> On Wed, 13 Jun 2007, Lee Schermerhorn wrote:
> 
> > SLUB early allocation, included in the patch.  Works on HP ia64 platform
> > with small DMA only node and "zone order" zonelists.  Will test on
> > x86_64 real soon now...
> 
> I do not see the difference?? How does this work? node_memory(x) fails 
> there?

On my system, pseudo-node 4 contains ~512MB of DMA zone.  With the new
zoneorder auto-config patch [which this platform also needs], the
zonelist for gfp_zone == ZONE_NORMAL [hugetlb attempts to allocate from
this zone--as it should] contains the following zones:

	zone0-normal, zone1-normal, zone2-normal, zone3-normal,
	zone4-dma

node_memory(4) returns "true" -- node does have memory, in the dma zone,
but it's last in the list, as required.  alloc_pages_node() would call
__alloc_pages() with this zonelist.  If get_page_from_freelist() finds
the requested page in zone0-normal, the check that the zone's pgdat ==
the pgdat of zonelist->zone[0] will succeed and we'll return an off-node
page.

> 
> > The map of nodes with memory may include nodes with just
> > DMA/DMA32 memory.  Using this map/mask together with
> > GFP_THISNODE will not guarantee on-node allocations at higher
> > zones.  Modify checks in alloc_pages_node() to ensure that the
> > first zone in the selected zonelist is "on-node".
> 
> That check is already done by __alloc_pages.

You mean in get_page_from_freelist()?  No, it only checks that the zone
under consideration is on the same node as the zone at the start of the
list.  This can be off-node if the node is populated only at lower
zones; and the zonelists are in zone-order.

> 
> > This change will result in alloc_pages_node() returning NULL
> > when GFP_THISNODE is specified and the first zone in the zonelist
> > selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
> > in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
> > apparently can't handle a NULL page from new_slab().  Fix SLUB
> > to handle NULL page in early allocation.
> 
> Ummm... Slub would need to consult node_memory_map instead I guess.

Probably should check the node_memory_map to avoid attempting
allocations from completely memoryless nodes.  However, it should still
be able to handle nulls from alloc_pages_nodes() because of the
scenarios discussed above.

Lee

> 
> > Index: Linux/mm/slub.c
> > ===================================================================
> > --- Linux.orig/mm/slub.c	2007-06-13 16:36:02.000000000 -0400
> > +++ Linux/mm/slub.c	2007-06-13 16:38:41.000000000 -0400
> > @@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init e
> >  	/* new_slab() disables interupts */
> >  	local_irq_enable();
> >  
> > -	BUG_ON(!page);
> > -	n = page->freelist;
> > -	BUG_ON(!n);
> > -	page->freelist = get_freepointer(kmalloc_caches, n);
> > -	page->inuse++;
> > -	kmalloc_caches->node[node] = n;
> > -	setup_object_debug(kmalloc_caches, page, n);
> > -	init_kmem_cache_node(n);
> > -	atomic_long_inc(&n->nr_slabs);
> > -	add_partial(n, page);
> > +	if (page) {
> > +		n = page->freelist;
> > +		BUG_ON(!n);
> > +		page->freelist = get_freepointer(kmalloc_caches, n);
> > +		page->inuse++;
> > +		kmalloc_caches->node[node] = n;
> > +		setup_object_debug(kmalloc_caches, page, n);
> > +		init_kmem_cache_node(n);
> > +		atomic_long_inc(&n->nr_slabs);
> > +		add_partial(n, page);
> > +	} else
> > +		kmalloc_caches->node[node] = NULL;
> >  	return n;
> >  }
> 
> It would be easier to modify SLUB to loop over node_memory_map instead of 
> node_online_map? Potentially we have to change all loops over online node 
> in the slab allocators.

Again, node_memory_map can't detect the "first zone in zonelist
off-node" situation.  That's the one that alloc_pages_node() must guard
against.   So, it can/should/must return NULL when attempting to
allocate from a higher zone that is off-node.

> 
> ---
>  include/linux/nodemask.h |    1 +
>  mm/slub.c                |    2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/nodemask.h
> ===================================================================
> --- linux-2.6.orig/include/linux/nodemask.h	2007-06-13 15:40:27.000000000 -0700
> +++ linux-2.6/include/linux/nodemask.h	2007-06-13 15:40:48.000000000 -0700
> @@ -377,5 +377,6 @@ extern int nr_node_ids;
>  
>  #define for_each_node(node)	   for_each_node_mask((node), node_possible_map)
>  #define for_each_online_node(node) for_each_node_mask((node), node_online_map)
> +#define for_each_memory_node(node) for_each_node_mask((node), node_memory_map)
>  
>  #endif /* __LINUX_NODEMASK_H */
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2007-06-13 15:39:16.000000000 -0700
> +++ linux-2.6/mm/slub.c	2007-06-13 15:40:23.000000000 -0700
> @@ -1836,7 +1836,7 @@ static int init_kmem_cache_nodes(struct 
>  	else
>  		local_node = 0;
>  
> -	for_each_online_node(node) {
> +	for_each_memory_node(node) {
>  		struct kmem_cache_node *n;
>  
>  		if (local_node == node)
> 
> 
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-14  7:07     ` Christoph Lameter
@ 2007-06-14 14:23       ` Nishanth Aravamudan
  0 siblings, 0 replies; 40+ messages in thread
From: Nishanth Aravamudan @ 2007-06-14 14:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Lee Schermerhorn, akpm, linux-mm, ak

On 14.06.2007 [00:07:28 -0700], Christoph Lameter wrote:
> On Wed, 13 Jun 2007, Lee Schermerhorn wrote:
> 
> > --- Linux.orig/include/linux/gfp.h	2007-06-13 16:36:02.000000000 -0400
> > +++ Linux/include/linux/gfp.h	2007-06-13 16:38:41.000000000 -0400
> > @@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
> >  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> >  						unsigned int order)
> >  {
> > +	pg_data_t *pgdat;
> > +	struct zonelist *zonelist;
> > +
> >  	if (unlikely(order >= MAX_ORDER))
> >  		return NULL;
> >  
> > @@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
> >  	 * Check for the special case that GFP_THISNODE is used on a
> >  	 * memoryless node
> >  	 */
> > -	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> > +	pgdat = NODE_DATA(nid);
> > +	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
> > +	if ((gfp_mask & __GFP_THISNODE) &&
> > +		pgdat != zonelist->zones[0]->zone_pgdat)
> >  		return NULL;
> >  
> > -	return __alloc_pages(gfp_mask, order,
> > -		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> > +	return __alloc_pages(gfp_mask, order, zonelist);
> >  }
> 
> Good idea but I think this does not address the case where the DMA
> zone of a node was moved to the end of the zonelist. In that case the
> first zone is not on the first pgdat but the node has memory. The
> memory of the node is listed elsewhere in the nodelist. I can probably
> modify __alloc_pages to make GFP_THISNODE to check all zones but we do
> not have the pgdat reference there. Sigh.
> 
> How about generating a special THISNODE zonelist in build_zonelist
> that only contains the zones of a single node. Then just use that one
> if GFP_THISNODE is set. Then we get rid of all the GFP_THISNODE crap
> that I added to __alloc_pages?

Makes sense to me.

Thanks,
NIsh

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-14 14:18       ` Lee Schermerhorn
@ 2007-06-14 14:24         ` Christoph Lameter
  2007-06-14 14:55           ` Lee Schermerhorn
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2007-06-14 14:24 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Thu, 14 Jun 2007, Lee Schermerhorn wrote:

> > That check is already done by __alloc_pages.
> 
> You mean in get_page_from_freelist()?  No, it only checks that the zone
> under consideration is on the same node as the zone at the start of the
> list.  This can be off-node if the node is populated only at lower
> zones; and the zonelists are in zone-order.

See the later discussion. I did not see the use the nodes pgdat here that 
you only have in alloc_pages_node().

> > Ummm... Slub would need to consult node_memory_map instead I guess.
> 
> Probably should check the node_memory_map to avoid attempting
> allocations from completely memoryless nodes.  However, it should still
> be able to handle nulls from alloc_pages_nodes() because of the
> scenarios discussed above.

It is able to handle NULLs during usual operations but not during bootstrap.

> Again, node_memory_map can't detect the "first zone in zonelist
> off-node" situation.  That's the one that alloc_pages_node() must guard
> against.   So, it can/should/must return NULL when attempting to
> allocate from a higher zone that is off-node.

I think GFP_THISNODE should not fail in that case.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-14 14:24         ` Christoph Lameter
@ 2007-06-14 14:55           ` Lee Schermerhorn
  2007-06-14 15:51             ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Lee Schermerhorn @ 2007-06-14 14:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Thu, 2007-06-14 at 07:24 -0700, Christoph Lameter wrote:
> On Thu, 14 Jun 2007, Lee Schermerhorn wrote:
> 
> > > That check is already done by __alloc_pages.
> > 
> > You mean in get_page_from_freelist()?  No, it only checks that the zone
> > under consideration is on the same node as the zone at the start of the
> > list.  This can be off-node if the node is populated only at lower
> > zones; and the zonelists are in zone-order.
> 
> See the later discussion. I did not see the use the nodes pgdat here that 
> you only have in alloc_pages_node().
> 
> > > Ummm... Slub would need to consult node_memory_map instead I guess.
> > 
> > Probably should check the node_memory_map to avoid attempting
> > allocations from completely memoryless nodes.  However, it should still
> > be able to handle nulls from alloc_pages_nodes() because of the
> > scenarios discussed above.
> 
> It is able to handle NULLs during usual operations but not during bootstrap.

???  it has to handle memoryless nodes, right?  if so, why can't it
handle alloc_pages_node('THISNODE,...) returning NULL?  Too late in the
process?

> 
> > Again, node_memory_map can't detect the "first zone in zonelist
> > off-node" situation.  That's the one that alloc_pages_node() must guard
> > against.   So, it can/should/must return NULL when attempting to
> > allocate from a higher zone that is off-node.
> 
> I think GFP_THISNODE should not fail in that case.

You think it should return off-node memory?  If it can, we have to check
the returned page's nid--e.g., in alloc_fresh_huge_page()--where we
really don't want an off-node page.  Could do that, I suppose.

Or do you mean that GPF_THISNODE should return memory from the lower
zones, if it satisfies the order requirement?  Your custom 'thisnode'
zonelist will enable this, right?  That would work for my particular
config of my platform, but on a larger platform, I'd have a larger DMA
zone and wouldn't want hugetlb pages coming from there.  

I guess that in the long run, I need to hope that Nish's per node huge
page attribute goes in. Then, we can provide a fancy script to configure
huge pages on a specific set of nodes, rather than relying on the kernel
to distribute them evenly across the set of nodes that "make sense" for
the given platform.

Later,
Lee



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
  2007-06-14 14:55           ` Lee Schermerhorn
@ 2007-06-14 15:51             ` Christoph Lameter
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Lameter @ 2007-06-14 15:51 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: akpm, linux-mm, ak, Nishanth Aravamudan

On Thu, 14 Jun 2007, Lee Schermerhorn wrote:

> > It is able to handle NULLs during usual operations but not during bootstrap.
> 
> ???  it has to handle memoryless nodes, right?  if so, why can't it
> handle alloc_pages_node('THISNODE,...) returning NULL?  Too late in the
> process?

It should not call alloc_pages_node for a node that has no memory during 
bootstrap.

> Or do you mean that GPF_THISNODE should return memory from the lower
> zones, if it satisfies the order requirement?  Your custom 'thisnode'
> zonelist will enable this, right?  That would work for my particular
> config of my platform, but on a larger platform, I'd have a larger DMA
> zone and wouldn't want hugetlb pages coming from there.  

Right. The fixes here are for the general vm and not for hugetlb. They are 
definitely not HP platform specific.

> I guess that in the long run, I need to hope that Nish's per node huge
> page attribute goes in. Then, we can provide a fancy script to configure
> huge pages on a specific set of nodes, rather than relying on the kernel
> to distribute them evenly across the set of nodes that "make sense" for
> the given platform.

Right.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2007-06-14 15:51 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-12 20:48 [patch 0/3] Fixes for NUMA allocations on memoryless nodes clameter
2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
2007-06-12 21:03   ` David Rientjes
2007-06-12 21:08     ` Christoph Lameter
2007-06-12 21:10       ` David Rientjes
2007-06-12 21:27         ` Christoph Lameter
2007-06-12 21:34           ` David Rientjes
2007-06-12 21:37             ` Christoph Lameter
2007-06-12 21:38               ` David Rientjes
2007-06-12 21:36         ` Nishanth Aravamudan
2007-06-12 21:39           ` Christoph Lameter
2007-06-12 21:42             ` Nishanth Aravamudan
2007-06-12 21:45               ` David Rientjes
2007-06-12 22:26               ` Christoph Lameter
2007-06-12 22:32                 ` Nishanth Aravamudan
2007-06-13  9:14           ` Andy Whitcroft
2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
2007-06-12 21:03   ` David Rientjes
2007-06-12 21:07     ` Christoph Lameter
2007-06-12 21:08       ` David Rientjes
2007-06-12 21:25         ` Christoph Lameter
2007-06-12 21:34           ` David Rientjes
2007-06-12 21:51             ` Nishanth Aravamudan
2007-06-13 21:10   ` Lee Schermerhorn
2007-06-13 21:57     ` Nishanth Aravamudan
2007-06-13 22:46     ` Christoph Lameter
2007-06-13 23:11       ` Nishanth Aravamudan
2007-06-13 23:15         ` Christoph Lameter
2007-06-13 23:20           ` Nishanth Aravamudan
2007-06-13 23:26             ` Christoph Lameter
2007-06-13 23:32               ` Nishanth Aravamudan
2007-06-13 23:53                 ` Christoph Lameter
2007-06-14  0:04                   ` Nishanth Aravamudan
2007-06-14 14:18       ` Lee Schermerhorn
2007-06-14 14:24         ` Christoph Lameter
2007-06-14 14:55           ` Lee Schermerhorn
2007-06-14 15:51             ` Christoph Lameter
2007-06-14  7:07     ` Christoph Lameter
2007-06-14 14:23       ` Nishanth Aravamudan
2007-06-12 20:48 ` [patch 3/3] Fix MPOL_INTERLEAVE " clameter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox