linux-mm.kvack.org archive mirror
* [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
@ 2006-08-08 16:33 Christoph Lameter
  2006-08-08 16:34 ` [2/3] sys_move_pages: Do not fall back to other nodes Christoph Lameter
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 16:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, pj, jes, Andy Whitcroft

Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This flag
is essential if a kernel component requires memory to be located on a
certain node. It will be needed for alloc_pages_node() to force allocation
on the indicated node and for alloc_pages() to force allocation on the
current node.
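
To illustrate the intended behaviour, here is a minimal user-space sketch
of the zonelist walk. It is not the kernel code: struct toy_zone,
toy_pick_zone() and TOY_GFP_THISNODE are hypothetical stand-ins, and the
only thing carried over from the patch is the bit value and the rule
"stop once the walk leaves the node of the first zone".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
#define TOY_GFP_THISNODE 0x40000u	/* same bit value as in the patch */

struct toy_zone {
	int node_id;	/* node this zone belongs to */
	int free_pages;	/* pages still available in this zone */
};

/*
 * Walk a NULL-terminated zonelist and return the first zone that has
 * free pages. With TOY_GFP_THISNODE set, stop as soon as the walk
 * leaves the node of the first zone instead of falling back.
 */
struct toy_zone *toy_pick_zone(struct toy_zone **zonelist,
			       unsigned int gfp_mask)
{
	struct toy_zone **z;

	assert(zonelist != NULL);
	for (z = zonelist; *z != NULL; z++) {
		if ((gfp_mask & TOY_GFP_THISNODE) &&
		    (*z)->node_id != zonelist[0]->node_id)
			break;		/* no fallback to other nodes */
		if ((*z)->free_pages > 0)
			return *z;
	}
	return NULL;			/* caller must handle the failure */
}
```

With an empty local zone followed by a remote zone that has free pages,
the call without the flag returns the remote zone, while the call with
the flag returns NULL.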

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc3-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/page_alloc.c	2006-08-07 20:21:28.431331931 -0700
+++ linux-2.6.18-rc3-mm2/mm/page_alloc.c	2006-08-08 09:23:23.323396326 -0700
@@ -916,6 +916,9 @@ get_page_from_freelist(gfp_t gfp_mask, u
 	 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
 	 */
 	do {
+		if (unlikely((gfp_mask & __GFP_THISNODE) &&
+			(*z)->zone_pgdat != zonelist->zones[0]->zone_pgdat))
+				break;
 		if ((alloc_flags & ALLOC_CPUSET) &&
 				!cpuset_zone_allowed(*z, gfp_mask))
 			continue;
Index: linux-2.6.18-rc3-mm2/include/linux/gfp.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/include/linux/gfp.h	2006-08-07 20:21:01.808957041 -0700
+++ linux-2.6.18-rc3-mm2/include/linux/gfp.h	2006-08-08 09:20:41.727897528 -0700
@@ -45,6 +45,7 @@ struct vm_area_struct;
 #define __GFP_ZERO	((__force gfp_t)0x8000u)/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
+#define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
 
 #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
Index: linux-2.6.18-rc3-mm2/mm/mempolicy.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/mempolicy.c	2006-08-07 20:21:01.810910045 -0700
+++ linux-2.6.18-rc3-mm2/mm/mempolicy.c	2006-08-08 09:20:41.729850533 -0700
@@ -1278,7 +1278,7 @@ struct page *alloc_pages_current(gfp_t g
 
 	if ((gfp & __GFP_WAIT) && !in_interrupt())
 		cpuset_update_task_memory_state();
-	if (!pol || in_interrupt())
+	if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
 		pol = &default_policy;
 	if (pol->policy == MPOL_INTERLEAVE)
 		return alloc_page_interleave(gfp, order, interleave_nodes(pol));
Index: linux-2.6.18-rc3-mm2/kernel/cpuset.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/cpuset.c	2006-08-07 20:21:07.429702734 -0700
+++ linux-2.6.18-rc3-mm2/kernel/cpuset.c	2006-08-08 09:20:41.730827035 -0700
@@ -2282,7 +2282,7 @@ int __cpuset_zone_allowed(struct zone *z
 	const struct cpuset *cs;	/* current cpuset ancestors */
 	int allowed;			/* is allocation in zone z allowed? */
 
-	if (in_interrupt())
+	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
 		return 1;
 	node = z->zone_pgdat->node_id;
 	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [2/3] sys_move_pages: Do not fall back to other nodes
  2006-08-08 16:33 [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Christoph Lameter
@ 2006-08-08 16:34 ` Christoph Lameter
  2006-08-08 16:37   ` [3/3] Guarantee that the uncached allocator gets pages on the correct node Christoph Lameter
  2006-08-08 16:56 ` [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Andy Whitcroft
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 16:34 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, pj, jes, Andy Whitcroft

If the user specified the node to which a page should be moved, then
we really do not want any other node.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc3-mm2/mm/migrate.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/migrate.c	2006-08-08 09:15:29.352637207 -0700
+++ linux-2.6.18-rc3-mm2/mm/migrate.c	2006-08-08 09:25:41.388119893 -0700
@@ -745,7 +745,9 @@ static struct page *new_page_node(struct
 
 	*result = &pm->status;
 
-	return alloc_pages_node(pm->node, GFP_HIGHUSER, 0);
+	return alloc_pages_node(pm->node,
+		GFP_HIGHUSER | __GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY,
+		0);
 }
 
 /*


* [3/3] Guarantee that the uncached allocator gets pages on the correct node.
  2006-08-08 16:34 ` [2/3] sys_move_pages: Do not fall back to other nodes Christoph Lameter
@ 2006-08-08 16:37   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 16:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, pj, jes, Andy Whitcroft

The uncached allocator manages per node pools. Specify __GFP_THISNODE
in order to force allocation on the indicated node or fail. The
uncached allocator already has logic to deal with failing allocations.
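
As a rough illustration of "allocate on the indicated node or fail", the
following user-space sketch models a per-node pool that only ever grows
from its own node. struct toy_node and toy_add_chunk() are hypothetical
names, not the uncached allocator's real interface.

```c
#include <assert.h>

#define TOY_NODES 2

/* Hypothetical per-node bookkeeping. */
struct toy_node {
	int free_granules;	/* granules the page allocator could hand out */
	int pool_granules;	/* granules already in this node's pool */
};

/*
 * Try to grow the pool of node nid by one granule. Returns 0 on
 * success and -1 if the node itself has nothing left. It never
 * borrows from another node, which is the effect the flag is meant
 * to have on the real allocation.
 */
int toy_add_chunk(struct toy_node nodes[TOY_NODES], int nid)
{
	assert(nid >= 0 && nid < TOY_NODES);
	if (nodes[nid].free_granules == 0)
		return -1;		/* exact-node allocation failed */
	nodes[nid].free_granules--;
	nodes[nid].pool_granules++;
	return 0;
}
```

The caller sees the failure and can react to it, instead of silently
receiving memory that belongs to a different node's pool.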

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc3-mm2/arch/ia64/kernel/uncached.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/arch/ia64/kernel/uncached.c	2006-08-07 15:22:18.460374398 -0700
+++ linux-2.6.18-rc3-mm2/arch/ia64/kernel/uncached.c	2006-08-08 09:36:00.696583433 -0700
@@ -98,7 +98,8 @@ static int uncached_add_chunk(struct unc
 
 	/* attempt to allocate a granule's worth of cached memory pages */
 
-	page = alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO,
+	page = alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO |
+				 __GFP_THISNODE | __GFP_NORETRY | __GFP_NOWARN,
 				IA64_GRANULE_SHIFT-PAGE_SHIFT);
 	if (!page) {
 		mutex_unlock(&uc_pool->add_chunk_mutex);


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 16:33 [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Christoph Lameter
  2006-08-08 16:34 ` [2/3] sys_move_pages: Do not fall back to other nodes Christoph Lameter
@ 2006-08-08 16:56 ` Andy Whitcroft
  2006-08-08 17:01   ` Christoph Lameter
  2006-08-08 16:59 ` Mel Gorman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Andy Whitcroft @ 2006-08-08 16:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, pj, jes

Christoph Lameter wrote:
> Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This flag
> is essential if a kernel component requires memory to be located on a
> certain node. It will be needed for alloc_pages_node() to force allocation
> on the indicated node and for alloc_pages() to force allocation on the
> current node.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.18-rc3-mm2/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/mm/page_alloc.c	2006-08-07 20:21:28.431331931 -0700
> +++ linux-2.6.18-rc3-mm2/mm/page_alloc.c	2006-08-08 09:23:23.323396326 -0700
> @@ -916,6 +916,9 @@ get_page_from_freelist(gfp_t gfp_mask, u
>  	 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
>  	 */
>  	do {
> +		if (unlikely((gfp_mask & __GFP_THISNODE) &&
> +			(*z)->zone_pgdat != zonelist->zones[0]->zone_pgdat))
> +				break;
>  		if ((alloc_flags & ALLOC_CPUSET) &&
>  				!cpuset_zone_allowed(*z, gfp_mask))
>  			continue;

Would this not be a very good example of overlapping GFP_foo bits?
If this bit were just passed through with GFP_DMA etc., then we could
build per-node lists which include only that node, and then put those in
zonelist[GFP_THISNODE|GFP_DMA] etc.?
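
A minimal user-space sketch of that idea, assuming a hypothetical index
bit TOY_THISNODE that selects a node-only list next to the usual
fallback list (struct toy_node_lists and toy_build_lists() are invented
for illustration and do not exist in the kernel):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ZONES 8
#define TOY_THISNODE  1	/* hypothetical index bit for the node-only list */

struct toy_zone {
	int node_id;
};

/* Two NULL-terminated lists: [0] full fallback, [TOY_THISNODE] node-only. */
struct toy_node_lists {
	struct toy_zone *zones[2][TOY_MAX_ZONES + 1];
};

/*
 * Build both lists for node nid from an array of zones that is already
 * in fallback order. The node-only list keeps just the zones on nid.
 */
void toy_build_lists(struct toy_node_lists *nl, int nid,
		     struct toy_zone *all, size_t n)
{
	size_t i, full = 0, only = 0;

	assert(n <= TOY_MAX_ZONES);
	for (i = 0; i < n; i++) {
		nl->zones[0][full++] = &all[i];
		if (all[i].node_id == nid)
			nl->zones[TOY_THISNODE][only++] = &all[i];
	}
	nl->zones[0][full] = NULL;
	nl->zones[TOY_THISNODE][only] = NULL;
}
```

The allocator loop would then need no extra test: picking the list at
index TOY_THISNODE already rules out fallback, at the cost of building
and storing one more list per node and zone type.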

-apw


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 16:33 [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Christoph Lameter
  2006-08-08 16:34 ` [2/3] sys_move_pages: Do not fall back to other nodes Christoph Lameter
  2006-08-08 16:56 ` [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Andy Whitcroft
@ 2006-08-08 16:59 ` Mel Gorman
  2006-08-08 17:03   ` Christoph Lameter
  2006-08-09  1:34 ` KAMEZAWA Hiroyuki
  2006-08-10 19:41 ` Andrew Morton
  4 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2006-08-08 16:59 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, pj, jes, Andy Whitcroft

On Tue, 8 Aug 2006, Christoph Lameter wrote:

> Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This flag
> is essential if a kernel component requires memory to be located on a
> certain node. It will be needed for alloc_pages_node() to force allocation
> on the indicated node and for alloc_pages() to force allocation on the
> current node.
>

GFP flags are getting a bit tight. Could this also be done by providing

alloc_pages_zonelist(int nid, gfp_t gfp_mask, unsigned int order, struct zonelist *);

alloc_pages_node() would be altered to call alloc_pages_zonelist() with
the correct zonelist. To avoid fallbacks, callers would need a helper
function that provided a zonelist with just zones in a single node.

That would give the ability to avoid fallbacks at least. Avoiding policy
temporarily is a bit harder, but is it really needed?
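
A rough user-space sketch of this alternative, assuming a hypothetical
helper that filters an existing zonelist down to one node. The names
toy_alloc_pages_zonelist() and toy_single_node_zonelist() are invented
for illustration; only the idea of passing an explicit zonelist comes
from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_zone {
	int node_id;
	int free_pages;
};

/* Take one page from the first zone with free pages in an explicit list. */
struct toy_zone *toy_alloc_pages_zonelist(struct toy_zone **zonelist)
{
	assert(zonelist != NULL);
	for (; *zonelist != NULL; zonelist++) {
		if ((*zonelist)->free_pages > 0) {
			(*zonelist)->free_pages--;
			return *zonelist;
		}
	}
	return NULL;
}

/*
 * Hypothetical helper: copy only the zones of node nid from the
 * NULL-terminated list in[] to out[] and NULL-terminate the result.
 * out[] must have room for as many entries as in[], terminator
 * included. Returns the number of zones kept.
 */
size_t toy_single_node_zonelist(struct toy_zone **in, int nid,
				struct toy_zone **out)
{
	size_t kept = 0;

	for (; *in != NULL; in++)
		if ((*in)->node_id == nid)
			out[kept++] = *in;
	out[kept] = NULL;
	return kept;
}
```

This covers the fallback half of the problem without a new gfp bit. It
does not by itself stop a memory policy or cpuset from redirecting the
allocation before the zonelist is consulted.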

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 16:56 ` [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Andy Whitcroft
@ 2006-08-08 17:01   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 17:01 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: akpm, linux-mm, pj, jes

On Tue, 8 Aug 2006, Andy Whitcroft wrote:

> > +		if (unlikely((gfp_mask & __GFP_THISNODE) &&
> > +			(*z)->zone_pgdat != zonelist->zones[0]->zone_pgdat))
> > +				break;
> >  		if ((alloc_flags & ALLOC_CPUSET) &&
> >  				!cpuset_zone_allowed(*z, gfp_mask))
> >  			continue;
> 
> Would this not be a very good example of an overlapping GFP_foo bits?. If this
> bit were just passed through with the GFP_DMA etc then we could build lists
> per-node which only include the node, then put those in the
> zonelist[GFP_THISNODE|GFP_DMA] etc?

__GFP_THISNODE is needed for memory policies and cpuset constraints. In 
that case the zonelists do not help.

The gfp_mask is a local parameter and can be checked with minimal effort 
here.

cpusets already do extensive filtering of zonelists. We are down this road
already.
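
The point about policies and cpusets can be sketched in user space as
follows. Both functions are hypothetical models of the checks touched by
patch 1/3 and are not the kernel routines: the first mirrors "fall back
to the default policy when the flag is set", the second mirrors "skip
the cpuset check when the flag is set".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_GFP_THISNODE 0x40000u	/* same bit value as in the patch */

/*
 * Choose the node to allocate from. policy_node is the node a task
 * policy would redirect to, or -1 if the task has no policy.
 */
int toy_target_node(int local_node, int policy_node, unsigned int gfp_mask)
{
	if (policy_node < 0 || (gfp_mask & TOY_GFP_THISNODE))
		return local_node;	/* default policy: stay local */
	return policy_node;
}

/*
 * Decide whether a node passes the cpuset filter. allowed_nodes is a
 * bitmask of permitted nodes. With the flag set the filter is bypassed.
 */
bool toy_cpuset_allowed(unsigned int allowed_nodes, int node,
			unsigned int gfp_mask)
{
	assert(node >= 0 && node < 32);
	if (gfp_mask & TOY_GFP_THISNODE)
		return true;
	return (allowed_nodes >> node) & 1u;
}
```

Neither decision involves the zonelist, which is why a node-only
zonelist alone would not be enough here.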


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 16:59 ` Mel Gorman
@ 2006-08-08 17:03   ` Christoph Lameter
  2006-08-08 17:16     ` Mel Gorman
  2006-08-08 17:47     ` Paul Jackson
  0 siblings, 2 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 17:03 UTC (permalink / raw)
  To: Mel Gorman; +Cc: akpm, linux-mm, pj, jes, Andy Whitcroft

On Tue, 8 Aug 2006, Mel Gorman wrote:

> On Tue, 8 Aug 2006, Christoph Lameter wrote:
> 
> > Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This
> > flag
> > is essential if a kernel component requires memory to be located on a
> > certain node. It will be needed for alloc_pages_node() to force allocation
> > on the indicated node and for alloc_pages() to force allocation on the
> > current node.
> > 
> 
> GFP flags are getting a bit tight. Could this also be done by providing

Right, they are getting scarce.

> alloc_pages_zonelist(int nid, gfp_t gfp_mask, unsigned int order,  struct
> zonelist *));
> 
> alloc_pages_node() would be altered to call alloc_pages_zonelist() with the
> currect zonelist. To avoid fallbacks, callers would need a helper function
> that provided a zonelist with just zones in a single node.

We would need a whole selection of allocators for this purpose. Some 
candidates:

alloc_pages_current
alloc_pages_node
vmalloc
vmalloc_node
dma_alloc_coherent

etc


> That would give the ability to avoid fallbacks at least. Avoiding policy
> temporarily is a bit harder but it is really needed?

Policy and cpusets can redirect allocations. That is one of the key 
problems.


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 17:03   ` Christoph Lameter
@ 2006-08-08 17:16     ` Mel Gorman
  2006-08-08 17:51       ` Christoph Lameter
  2006-08-08 17:47     ` Paul Jackson
  1 sibling, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2006-08-08 17:16 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, pj, jes, Andy Whitcroft

On Tue, 8 Aug 2006, Christoph Lameter wrote:

> On Tue, 8 Aug 2006, Mel Gorman wrote:
>
>> On Tue, 8 Aug 2006, Christoph Lameter wrote:
>>
>>> Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This
>>> flag
>>> is essential if a kernel component requires memory to be located on a
>>> certain node. It will be needed for alloc_pages_node() to force allocation
>>> on the indicated node and for alloc_pages() to force allocation on the
>>> current node.
>>>
>>
>> GFP flags are getting a bit tight. Could this also be done by providing
>
> Right they are gettin scarce.
>
>> alloc_pages_zonelist(int nid, gfp_t gfp_mask, unsigned int order,  struct
>> zonelist *));
>>
>> alloc_pages_node() would be altered to call alloc_pages_zonelist() with the
>> currect zonelist. To avoid fallbacks, callers would need a helper function
>> that provided a zonelist with just zones in a single node.
>
> We would need a whole selection of allocators for this purpose. Some
> candidates:
>
> alloc_pages_current
> alloc_pages_node
> vmalloc
> vmalloc_node
> dma_alloc_coherent
>

From your set of patches, it's only used for page migration and the IA64 
uncached allocator both of which are using alloc_pages_node() at the 
moment. Do you see a widespread need to avoid fallbacks in other areas?

Also, I just noticed you didn't update GFP_LEVEL_MASK with your new flag. 
That may cause interesting failures in the future, particularly if you 
call into the slab allocator with the new flag.
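
A minimal sketch of why a flag missing from such a mask matters, under
the assumption that a subsystem rejects any gfp bit outside the mask it
accepts. toy_flags_accepted() and toy_slab_check() are hypothetical, and
the assert() merely stands in for whatever the real check does on
failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only the last two are taken from the patch. */
#define TOY_GFP_WAIT     0x10u
#define TOY_GFP_HARDWALL 0x20000u
#define TOY_GFP_THISNODE 0x40000u

/* True when every bit in flags is covered by the accepted mask. */
bool toy_flags_accepted(unsigned int flags, unsigned int level_mask)
{
	return (flags & ~level_mask) == 0;
}

/* Model of a strict caller-side check: abort on any unknown bit. */
void toy_slab_check(unsigned int flags, unsigned int level_mask)
{
	assert(toy_flags_accepted(flags, level_mask));
}
```

With a mask that lacks TOY_GFP_THISNODE, a request carrying that bit is
rejected; once the bit is added to the mask, the same request passes.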

I'm not rabidly against the use of a GFP flag, I just want to be sure it's 
the only option.

> etc
>
>
>> That would give the ability to avoid fallbacks at least. Avoiding policy
>> temporarily is a bit harder but it is really needed?
>
> Policy and cpusets can redirect allocations. That is one of the key
> problems.
>

Could the policies and cpusets be avoided by allowing a zonelist to be 
specified?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 17:03   ` Christoph Lameter
  2006-08-08 17:16     ` Mel Gorman
@ 2006-08-08 17:47     ` Paul Jackson
  2006-08-08 17:59       ` Christoph Lameter
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Jackson @ 2006-08-08 17:47 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: mel, akpm, linux-mm, jes, apw

> > alloc_pages_node() would be altered to call alloc_pages_zonelist() with the
> > currect zonelist. To avoid fallbacks, callers would need a helper function
> > that provided a zonelist with just zones in a single node.
> 
> We would need a whole selection of allocators for this purpose. Some 
> candidates:

Would we really need "a whole selection"?  Seems like we just
need a variant of alloc_pages_node(), for now.

My recollection is that most calls to alloc_pages_node() and
other node specific allocators really mean:
  * get memory on or near to the specified node (best effort)

but a few such calls, such as from the memory migration code
(which is determined to recreate the relative per-node memory layout
across the migration) really mean:
  * get memory on -exactly- this node (Pike's Peak or bust)

If I were God, we'd rename 'alloc_pages_node()' to be instead
'alloc_pages_near_node()', and have another routine named
'alloc_pages_exact_node()' for use in a few places such as the
migration code.  It's mildly unfortunate that many of the callers
of 'alloc_pages_node()' (using my God-like powers of mind reading)
are expecting -exactly- that node, not just a best effort -near-
that node.  Fortunately, they don't know what they want.

Back to reality ... rather than a "whole set of allocators", how
about just provide such exact node allocators on demand, as needed
by the few calls, such as migration, that need it.

For example, add an 'alloc_pages_exact_node()' to be used by the
new_page_node() in the migration code, and the uncached_add_chunk()
in arch/ia64/kernel/uncached.c.  It would pass a zonelist with just zones from
the allowed node to __alloc_pages().  Such a routine sounds very much
like an MPOL_BIND on a single node. Perhaps there is potential synergy
between the implementation of MPOL_BIND and 'alloc_pages_exact_node'.

So far, only alloc_pages_exact_node is needed, not "a whole selection."
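
The two meanings can be contrasted in a small user-space sketch. Both
functions are hypothetical models named after the routines suggested
above, and the fallback order (lowest node index first) is a
simplification of the distance-ordered fallback a real system would use.

```c
#include <assert.h>

#define TOY_NODES 4

/* Take one page from node nid only. Returns nid, or -1 if it is empty. */
int toy_alloc_exact_node(int free_pages[TOY_NODES], int nid)
{
	assert(nid >= 0 && nid < TOY_NODES);
	if (free_pages[nid] == 0)
		return -1;
	free_pages[nid]--;
	return nid;
}

/*
 * Best effort: try nid first, then every node in index order.
 * Returns the node that supplied the page, or -1 if all are empty.
 */
int toy_alloc_near_node(int free_pages[TOY_NODES], int nid)
{
	int i, got = toy_alloc_exact_node(free_pages, nid);

	for (i = 0; got < 0 && i < TOY_NODES; i++)
		got = toy_alloc_exact_node(free_pages, i);
	return got;
}
```

A caller that needs to preserve per-node layout, such as page migration,
has to check for -1 from the exact variant. The near variant hides that
case by handing back memory from some other node.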

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 17:16     ` Mel Gorman
@ 2006-08-08 17:51       ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 17:51 UTC (permalink / raw)
  To: Mel Gorman; +Cc: akpm, linux-mm, pj, jes, Andy Whitcroft

On Tue, 8 Aug 2006, Mel Gorman wrote:

> From your set of patches, it's only used for page migration and the IA64
> uncached allocator both of which are using alloc_pages_node() at the moment.
> Do you see a widespread need to avoid fallbacks in other areas?

These are the patches that are ready for mm. Read the other RFCs on 
linux-mm. There are patches for the slab etc.
 
> Also, I just noticed you didn't update GFP_LEVEL_MASK with your new flag. That
> may cause interesting failures in the future, particularly if you call into
> the slab allocator with the new flag.

Thanks! Fixup:

Index: linux-2.6.18-rc3-mm2/include/linux/gfp.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/include/linux/gfp.h	2006-08-08 09:20:41.727897528 -0700
+++ linux-2.6.18-rc3-mm2/include/linux/gfp.h	2006-08-08 10:50:37.604766523 -0700
@@ -54,7 +54,7 @@ struct vm_area_struct;
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC|__GFP_HARDWALL)
+			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE)
 
 /* This equals 0, but use constants in case they ever change */
 #define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)

* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 17:47     ` Paul Jackson
@ 2006-08-08 17:59       ` Christoph Lameter
  2006-08-08 18:18         ` Paul Jackson
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 17:59 UTC (permalink / raw)
  To: Paul Jackson; +Cc: mel, akpm, linux-mm, jes, apw

On Tue, 8 Aug 2006, Paul Jackson wrote:

> So far, only alloc_pages_exact_node is needed, not "a whole selection."

OK, then we can allocate pages on exactly one node only via this
particular function call and not through other subsystem allocators. This
may fit the urgent needs for node-specific allocations that I have found
so far.

However, doing so means we cannot get vmalloced memory on a
particular node, and we cannot get DMA memory on a particular node. We
cannot indicate to the slab allocator that we want memory on a particular
node. These are all things that we need. If we looked at the users of all
the _node allocators, we would surely find users of kmalloc_node and
vmalloc_node etc. that expect memory on exactly that node.


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 17:59       ` Christoph Lameter
@ 2006-08-08 18:18         ` Paul Jackson
  2006-08-08 18:49           ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Jackson @ 2006-08-08 18:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: mel, akpm, linux-mm, jes, apw

Christoph wrote:
> If we would look at the users at all 
> the _node allocators then we surely will find users of kmalloc_node and 
> vmalloc_node etc that expect memory on exactly that node.

Perhaps.  Do you know of any specific examples needing this?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 18:18         ` Paul Jackson
@ 2006-08-08 18:49           ` Christoph Lameter
  2006-08-08 20:35             ` Paul Jackson
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2006-08-08 18:49 UTC (permalink / raw)
  To: Paul Jackson; +Cc: mel, akpm, linux-mm, jes, apw

On Tue, 8 Aug 2006, Paul Jackson wrote:

> Christoph wrote:
> > If we would look at the users at all 
> > the _node allocators then we surely will find users of kmalloc_node and 
> > vmalloc_node etc that expect memory on exactly that node.
> 
> Perhaps.  Do you know of any specific examples needing this?

Sure. Some examples:

For kmalloc_node(), look at vmalloc.c and slab.c for starters.

For vmalloc_node(), see drivers/oprofile/buffer.c and various places
under net/ipv4/netfilter/...

This is going to increase as NUMA awareness spreads throughout the
kernel.

Interesting constructs in ip_tables.c:

counters = vmalloc_node(countersize, numa_node_id());

It seems what they really want is:

counters = __vmalloc(countersize, __GFP_THISNODE, PAGE_KERNEL);


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 18:49           ` Christoph Lameter
@ 2006-08-08 20:35             ` Paul Jackson
  2006-08-09  9:33               ` Mel Gorman
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Jackson @ 2006-08-08 20:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: mel, akpm, linux-mm, jes, apw

> Sure. Some examples

Hmmm ... more than I realized.

These __GFP_THISNODE patches seem reasonable to me.

Acked-by: Paul Jackson <pj@sgi.com>

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 16:33 [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Christoph Lameter
                   ` (2 preceding siblings ...)
  2006-08-08 16:59 ` Mel Gorman
@ 2006-08-09  1:34 ` KAMEZAWA Hiroyuki
  2006-08-09  2:00   ` Christoph Lameter
  2006-08-10 19:41 ` Andrew Morton
  4 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-09  1:34 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, pj, jes, apw

On Tue, 8 Aug 2006 09:33:46 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This flag
> is essential if a kernel component requires memory to be located on a
> certain node. It will be needed for alloc_pages_node() to force allocation
> on the indicated node and for alloc_pages() to force allocation on the
> current node.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 

Hm, is passing a nodemask as an argument to alloc_page_???() so much more
complicated than GFP_THISNODE? (It will increase the # of args, but...)

-Kame




* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-09  1:34 ` KAMEZAWA Hiroyuki
@ 2006-08-09  2:00   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-09  2:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: akpm, linux-mm, pj, jes, apw

On Wed, 9 Aug 2006, KAMEZAWA Hiroyuki wrote:

> Hm, is passing a nodemask as an argument to alloc_page_???() so much more
> complicated than GFP_THISNODE? (It will increase the # of args, but...)

The node is passed via alloc_pages_node() etc already. If one uses 
__GFP_THISNODE with alloc_pages_node() then you will get the page on the 
indicated node regardless of cpusets. Currently cpuset constraints may 
lead to allocation on a different node.

If you use __GFP_THISNODE with an allocator that does not allow the 
specification of a node then you will get memory from the local node 
without regard to memory policies and cpuset constraints. In that usage 
scenario __GFP_THISNODE then behaves as if it would be 
Andy's GFP_LOCAL_NODE.
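
The no-fallback behaviour described above can be sketched in user space.
This is only an illustration under stated assumptions: toy_zone,
toy_zonelist, toy_alloc_page() and the TOY_GFP_THISNODE value are
hypothetical stand-ins, not the kernel's struct zone, struct zonelist or
get_page_from_freelist(), and cpusets and memory policies are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, used only by this sketch. */
#define TOY_GFP_THISNODE 0x40000u

struct toy_zone {
	int node;        /* node that owns this zone */
	int free_pages;  /* pages still available in this zone */
};

struct toy_zonelist {
	struct toy_zone *zones;  /* zones[0] belongs to the requested node */
	size_t nr_zones;
};

/*
 * Walk the zonelist in fallback order. With TOY_GFP_THISNODE set, stop as
 * soon as the walk would leave the node of zones[0] instead of falling
 * back. Returns the node the page came from, or -1 on failure.
 */
static int toy_alloc_page(struct toy_zonelist *zl, unsigned int gfp_mask)
{
	size_t i;

	assert(zl->nr_zones > 0);
	for (i = 0; i < zl->nr_zones; i++) {
		struct toy_zone *z = &zl->zones[i];

		if ((gfp_mask & TOY_GFP_THISNODE) &&
		    z->node != zl->zones[0].node)
			break;
		if (z->free_pages > 0) {
			z->free_pages--;
			return z->node;
		}
	}
	return -1;
}
```

With the first node exhausted, a plain request falls back to the next
node in the list, while a TOY_GFP_THISNODE request fails rather than
returning memory from the wrong node.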


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 20:35             ` Paul Jackson
@ 2006-08-09  9:33               ` Mel Gorman
  0 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2006-08-09  9:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Jackson, akpm, Linux Memory Management List, jes, Andy Whitcroft

On Tue, 8 Aug 2006, Paul Jackson wrote:

>> Sure. Some examples
>
> Hmmm ... more than I realized.
>
> These __GFP_THISNODE patches seem reasonable to me.
>

I'm happy enough as well. Thanks for taking the time to explain all of the 
flag's potential users.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-08 16:33 [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Christoph Lameter
                   ` (3 preceding siblings ...)
  2006-08-09  1:34 ` KAMEZAWA Hiroyuki
@ 2006-08-10 19:41 ` Andrew Morton
  2006-08-11  3:16   ` Christoph Lameter
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-08-10 19:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Tue, 8 Aug 2006 09:33:46 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This flag
> is essential if a kernel component requires memory to be located on a
> certain node. It will be needed for alloc_pages_node() to force allocation
> on the indicated node and for alloc_pages() to force allocation on the
> current node.

This adds a little bit of overhead to non-numa kernels.  I think that
overhead could be eliminated if we were to do

#ifndef CONFIG_NUMA
#define __GFP_THISNODE 0
#endif


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-10 19:41 ` Andrew Morton
@ 2006-08-11  3:16   ` Christoph Lameter
  2006-08-11 18:08     ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2006-08-11  3:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Thu, 10 Aug 2006, Andrew Morton wrote:

> This adds a little bit of overhead to non-numa kernels.  I think that
> overhead could be eliminated if we were to do

The overhead is really minimal. The parameter we are testing is passed on 
later and the test is unlikely.

I would rather avoid fiddling around with making __GFP_xxx conditional.
We have seen what problems this can lead to. The #ifdef is less harmful
if placed in get_page_from_freelist.

How about this one:

Index: linux-2.6.18-rc3-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/page_alloc.c	2006-08-09 18:37:06.434599531 -0700
+++ linux-2.6.18-rc3-mm2/mm/page_alloc.c	2006-08-10 20:13:53.674465629 -0700
@@ -918,12 +918,14 @@ get_page_from_freelist(gfp_t gfp_mask, u
 	 */
 	do {
 		zone = *z;
+#ifdef CONFIG_NUMA
 		if (unlikely((gfp_mask & __GFP_THISNODE) &&
 			zone->zone_pgdat != zonelist->zones[0]->zone_pgdat))
 				break;
 		if ((alloc_flags & ALLOC_CPUSET) &&
 				!cpuset_zone_allowed(zone, gfp_mask))
 			continue;
+#endif
 
 		if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
 			unsigned long mark;


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11  3:16   ` Christoph Lameter
@ 2006-08-11 18:08     ` Andrew Morton
  2006-08-11 18:15       ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-08-11 18:08 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Thu, 10 Aug 2006 20:16:31 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 10 Aug 2006, Andrew Morton wrote:
> 
> > This adds a little bit of overhead to non-numa kernels.  I think that
> > overhead could be eliminated if we were to do
> 
> The overhead is really minimal. The parameter we are testing is passed on 
> later and the test is unlikely.

Well yes, but it is a fastpath.  And it consumes icache.

> I would rather avoid fiddling around with making __GFP_xxx conditional.
> We have seen what problems this can lead to.

What problems?

> The #ifdef is less harmful
> if placed in get_page_from_freelist.
> 
> How about this one:
> 
> Index: linux-2.6.18-rc3-mm2/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/mm/page_alloc.c	2006-08-09 18:37:06.434599531 -0700
> +++ linux-2.6.18-rc3-mm2/mm/page_alloc.c	2006-08-10 20:13:53.674465629 -0700
> @@ -918,12 +918,14 @@ get_page_from_freelist(gfp_t gfp_mask, u
>  	 */
>  	do {
>  		zone = *z;
> +#ifdef CONFIG_NUMA
>  		if (unlikely((gfp_mask & __GFP_THISNODE) &&
>  			zone->zone_pgdat != zonelist->zones[0]->zone_pgdat))
>  				break;
>  		if ((alloc_flags & ALLOC_CPUSET) &&
>  				!cpuset_zone_allowed(zone, gfp_mask))
>  			continue;
> +#endif
>  
>  		if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
>  			unsigned long mark;

I think it would be better to do the `#define __GFP_THISNODE 0'

- It allows the compiler to optimise things like:

	foo |= (__GFP_THISNODE|__GFP_OTHER)

  into a bit-set instruction.

- It allows us to remove the above ifdef from the middle of the page
  allocator (dammit).

- It means that the previously-ifdefed code always gets compiled.  So we
  don't get into situations where non-numa developers introduce compile
  errors or warnings into numa builds.

- Note that the second statement which the above patch puts inside the
  ifdef does not need to be ifdefed.  non-NUMA cpuset_zone_allowed()
  returns 1.  Putting an ifdef around it will only increase the chances of
  people introducing build errors and warnings.  


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11 18:08     ` Andrew Morton
@ 2006-08-11 18:15       ` Christoph Lameter
  2006-08-11 18:42         ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2006-08-11 18:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Fri, 11 Aug 2006, Andrew Morton wrote:

> > I would rather avoid fiddling around with making __GFP_xxx conditional.
> > We have seen what problems this can lead to.
> 
> What problems?

I just cleaned out the #ifdefs from the __GFP_xx section because you 
told me that some comparisons could go haywire if __GFP_xx were zero. See 
our recent discussion on __GFP_DMA32.

F.e. Tests like (__GFP_DMA | __GFP_THISNODE) == __GFP_THISNODE
would give wrong positives if __GFP_THISNODE would be 0.
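
A small user-space sketch of this hazard follows. It assumes the test is
applied to a caller's flags through a mask, i.e.
(flags & (DMA | THISNODE)) == THISNODE, which is one reading of the
expression above and not something the thread spells out. The flag
values and the helper wants_thisnode_only() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag values, used only by this sketch. */
#define TOY_GFP_DMA      0x01u
#define TOY_GFP_THISNODE 0x40000u

/*
 * "THISNODE requested and DMA not requested", written as a mask-and-compare.
 * thisnode_bit is a parameter so the same test can be run with the real
 * bit and with the bit defined away to 0.
 */
static bool wants_thisnode_only(unsigned int flags, unsigned int thisnode_bit)
{
	return (flags & (TOY_GFP_DMA | thisnode_bit)) == thisnode_bit;
}
```

With the real bit, wants_thisnode_only(0, TOY_GFP_THISNODE) is false, as
expected for a caller that asked for nothing. With the bit defined to 0,
wants_thisnode_only(0, 0) is true: every caller that did not ask for DMA
now appears to have asked for THISNODE.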


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11 18:15       ` Christoph Lameter
@ 2006-08-11 18:42         ` Andrew Morton
  2006-08-11 18:51           ` Christoph Lameter
  2006-08-11 19:41           ` Dave McCracken
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew Morton @ 2006-08-11 18:42 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Fri, 11 Aug 2006 11:15:08 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 11 Aug 2006, Andrew Morton wrote:
> 
> > > I would rather avoid fiddling around with making __GFP_xxx conditional.
> > > We have seen what problems this can lead to.
> > 
> > What problems?
> 
> I just cleaned out the #ifdefs from the __GFP_xx section because you 
> told me that some comparisons could go haywire if __GFP_xx were zero. See 
> our recent discussion on __GFP_DMA32.
> 
> F.e. Tests like (__GFP_DMA | __GFP_THISNODE) == __GFP_THISNODE
> would give wrong positives if __GFP_THISNODE would be 0.

mutter.  No ifdefs, please.  We already have 41 #ifdef NUMA's in ./*/*.c

How about we do

/*
 * We do this to avoid lots of ifdefs and their consequential conditional
 * compilation
 */
#ifdef CONFIG_NUMA
#define NUMA_BUILD 1
#else
#define NUMA_BUILD 0
#endif

Then we can do

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -903,7 +903,7 @@ get_page_from_freelist(gfp_t gfp_mask, u
 	 */
 	do {
 		zone = *z;
-		if (unlikely((gfp_mask & __GFP_THISNODE) &&
+		if (NUMA_BUILD && unlikely((gfp_mask & __GFP_THISNODE) &&
 			zone->zone_pgdat != zonelist->zones[0]->zone_pgdat))
 				break;
 		if ((alloc_flags & ALLOC_CPUSET) &&
_

in lots of places.
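
As a standalone illustration of the pattern proposed above, the sketch
below compiles with or without -DCONFIG_NUMA. The helper
must_stop_walk() and the TOY_GFP_THISNODE value are hypothetical, and
unlikely() is left out because it is not available in user space.

```c
/* Same constant as proposed above: 1 on NUMA builds, 0 otherwise. */
#ifdef CONFIG_NUMA
#define NUMA_BUILD 1
#else
#define NUMA_BUILD 0
#endif

/* Hypothetical flag value, used only by this sketch. */
#define TOY_GFP_THISNODE 0x40000u

/*
 * Returns nonzero when a zonelist walk must stop instead of falling back
 * to another node. The whole condition is always parsed and type-checked,
 * but when NUMA_BUILD is 0 the expression is a compile-time constant 0,
 * so an optimising compiler can drop the test entirely.
 */
static int must_stop_walk(unsigned int gfp_mask, int zone_node, int first_node)
{
	return NUMA_BUILD && (gfp_mask & TOY_GFP_THISNODE) &&
	       zone_node != first_node;
}
```

Because the flag keeps its nonzero value in both configurations, this
form avoids the zero-flag comparison problem while still letting the
non-NUMA build shed the test.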


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11 18:42         ` Andrew Morton
@ 2006-08-11 18:51           ` Christoph Lameter
  2006-08-11 19:15             ` Andrew Morton
  2006-08-11 19:41           ` Dave McCracken
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2006-08-11 18:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Fri, 11 Aug 2006, Andrew Morton wrote:

> How about we do
> 
> /*
>  * We do this to avoid lots of ifdefs and their consequential conditional
>  * compilation
>  */
> #ifdef CONFIG_NUMA
> #define NUMA_BUILD 1
> #else
> #define NUMA_BUILD 0
> #endif

Put this in kernel.h?

Sounds good but this sets a new precedent on how to avoid #ifdefs.


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11 18:51           ` Christoph Lameter
@ 2006-08-11 19:15             ` Andrew Morton
  2006-08-11 19:16               ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-08-11 19:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Fri, 11 Aug 2006 11:51:59 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 11 Aug 2006, Andrew Morton wrote:
> 
> > How about we do
> > 
> > /*
> >  * We do this to avoid lots of ifdefs and their consequential conditional
> >  * compilation
> >  */
> > #ifdef CONFIG_NUMA
> > #define NUMA_BUILD 1
> > #else
> > #define NUMA_BUILD 0
> > #endif
> 
> Put this in kernel.h?

spose so.

> Sounds good but this sets a new precedent on how to avoid #ifdefs.

It does, a bit.  I'm not aware of any downside to it, really.  I got dinged
by Linus maybe five years back for this sort of thing.  He muttered something
about it defeating checkconfig or configcheck or some similar thing which probably
doesn't exist now.

Perhaps there is a downside.  But one could argue that NUMA is a
special-case.   Let's try it in a couple of places, see how it goes?


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11 19:15             ` Andrew Morton
@ 2006-08-11 19:16               ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2006-08-11 19:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, pj, jes, Andy Whitcroft

On Fri, 11 Aug 2006, Andrew Morton wrote:

> Perhaps there is a downside.  But one could argue that NUMA is a
> special-case.   Let's try it in a couple of places, see how it goes?

I'd be glad to use it for future patches. It's certainly useful for dealing
with conditionals that are only relevant in NUMA situations.
 


* Re: [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions.
  2006-08-11 18:42         ` Andrew Morton
  2006-08-11 18:51           ` Christoph Lameter
@ 2006-08-11 19:41           ` Dave McCracken
  1 sibling, 0 replies; 26+ messages in thread
From: Dave McCracken @ 2006-08-11 19:41 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter; +Cc: linux-mm

On Friday 11 August 2006 1:42 pm, Andrew Morton wrote:
> How about we do
>
> /*
>  * We do this to avoid lots of ifdefs and their consequential conditional
>  * compilation
>  */
> #ifdef CONFIG_NUMA
> #define NUMA_BUILD 1
> #else
> #define NUMA_BUILD 0
> #endif
>
> Then we can do
>
> --- a/mm/page_alloc.c~a
> +++ a/mm/page_alloc.c
> @@ -903,7 +903,7 @@ get_page_from_freelist(gfp_t gfp_mask, u
>          */
>         do {
>                 zone = *z;
> -               if (unlikely((gfp_mask & __GFP_THISNODE) &&
> +               if (NUMA_BUILD && unlikely((gfp_mask & __GFP_THISNODE) &&
>                         zone->zone_pgdat != zonelist->zones[0]->zone_pgdat))
>                                 break;
>                 if ((alloc_flags & ALLOC_CPUSET) &&
> _

Wouldn't you get a similar effect by doing

#ifdef CONFIG_NUMA
#define	gfp_thisnode(__mask)		((__mask) & __GFP_THISNODE)
#else
#define	gfp_thisnode(__mask)		(0)
#endif

Or are there too many different ways this is used to make a macro practical?  
What am I missing here?

Dave McCracken


end of thread, other threads:[~2006-08-11 19:41 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
2006-08-08 16:33 [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Christoph Lameter
2006-08-08 16:34 ` [2/3] sys_move_pages: Do not fall back to other nodes Christoph Lameter
2006-08-08 16:37   ` [3/3] Guarantee that the uncached allocator gets pages on the correct node Christoph Lameter
2006-08-08 16:56 ` [1/3] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions Andy Whitcroft
2006-08-08 17:01   ` Christoph Lameter
2006-08-08 16:59 ` Mel Gorman
2006-08-08 17:03   ` Christoph Lameter
2006-08-08 17:16     ` Mel Gorman
2006-08-08 17:51       ` Christoph Lameter
2006-08-08 17:47     ` Paul Jackson
2006-08-08 17:59       ` Christoph Lameter
2006-08-08 18:18         ` Paul Jackson
2006-08-08 18:49           ` Christoph Lameter
2006-08-08 20:35             ` Paul Jackson
2006-08-09  9:33               ` Mel Gorman
2006-08-09  1:34 ` KAMEZAWA Hiroyuki
2006-08-09  2:00   ` Christoph Lameter
2006-08-10 19:41 ` Andrew Morton
2006-08-11  3:16   ` Christoph Lameter
2006-08-11 18:08     ` Andrew Morton
2006-08-11 18:15       ` Christoph Lameter
2006-08-11 18:42         ` Andrew Morton
2006-08-11 18:51           ` Christoph Lameter
2006-08-11 19:15             ` Andrew Morton
2006-08-11 19:16               ` Christoph Lameter
2006-08-11 19:41           ` Dave McCracken
