linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: clameter@sgi.com, akpm@linux-foundation.org, linux-mm@kvack.org,
	ak@suse.de
Subject: Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
Date: Wed, 13 Jun 2007 14:57:50 -0700	[thread overview]
Message-ID: <20070613215750.GU3798@us.ibm.com> (raw)
In-Reply-To: <1181769033.6148.116.camel@localhost>

On 13.06.2007 [17:10:32 -0400], Lee Schermerhorn wrote:
> On Tue, 2007-06-12 at 13:48 -0700, clameter@sgi.com wrote:
> > GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
> > first zone of a nodelist. That only works if the node has memory. A
> > memoryless node will have its first zone on another pgdat (node).
> > 
> > Thus GFP_THISNODE may be returning memory on other nodes.
> > GFP_THISNODE should fail if there is no local memory on a node.
> > 
> > So we add a check to verify that the node specified has memory in
> > alloc_pages_node(). If the node has no memory then return NULL.
> > 
> > The case of alloc_pages(GFP_THISNODE) is not changed. alloc_pages() (with
> > no memory policies in effect) is understood to prefer the current node.
> > If a process is running on a node with no memory then its default allocations
> > come from the next neighboring node. GFP_THISNODE will then force the memory
> > to come from that node.
> > 
> > Signed-off-by: Christoph Lameter <clameter@sgi.com>
> > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > 
> > Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
> > ===================================================================
> > --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> > +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> > @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
> >  	if (nid < 0)
> >  		nid = numa_node_id();
> >  
> > +	/*
> > +	 * Check for the special case that GFP_THISNODE is used on a
> > +	 * memoryless node
> > +	 */
> > +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> > +		return NULL;
> > +
> >  	return __alloc_pages(gfp_mask, order,
> >  		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> >  }
> > 
> 
> Attached patch fixes alloc_pages_node() so that it never returns an
> off-node page when GFP_THISNODE is specified by.  This requires a fix to
> SLUB early allocation, included in the patch.  Works on HP ia64 platform
> with small DMA only node and "zone order" zonelists.  Will test on
> x86_64 real soon now...
> 
> ---
> 
> PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists
> 
> The map of nodes with memory may include nodes with just
> DMA/DMA32 memory.  Using this map/mask together with
> GFP_THISNODE will not guarantee on-node allocations at higher
> zones.  Modify checks in alloc_pages_node() to ensure that the
> first zone in the selected zonelist is "on-node".
> 
> This change will result in alloc_pages_node() returning NULL
> when GFP_THISNODE is specified and the first zone in the zonelist
> selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
> in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
> apparently can't handle a NULL page from new_slab().  Fix SLUB
> to handle NULL page in early allocation.
> 
> Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  include/linux/gfp.h |   11 ++++++++---
>  mm/slub.c           |   22 ++++++++++++----------
>  2 files changed, 20 insertions(+), 13 deletions(-)
> 
> Index: Linux/include/linux/gfp.h
> ===================================================================
> --- Linux.orig/include/linux/gfp.h	2007-06-13 16:36:02.000000000 -0400
> +++ Linux/include/linux/gfp.h	2007-06-13 16:38:41.000000000 -0400
> @@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> +	pg_data_t *pgdat;
> +	struct zonelist *zonelist;
> +
>  	if (unlikely(order >= MAX_ORDER))
>  		return NULL;
> 
> @@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
>  	 * Check for the special case that GFP_THISNODE is used on a
>  	 * memoryless node
>  	 */
> -	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> +	pgdat = NODE_DATA(nid);
> +	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
> +	if ((gfp_mask & __GFP_THISNODE) &&
> +		pgdat != zonelist->zones[0]->zone_pgdat)

This hunk won't apply on top of the unlikely() change... Rediffed below,
adding into the set to test now...

From: Lee Schermerhorn <lee.schermerhorn@hp.com>

PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists

The map of nodes with memory may include nodes with just
DMA/DMA32 memory.  Using this map/mask together with
GFP_THISNODE will not guarantee on-node allocations at higher
zones.  Modify checks in alloc_pages_node() to ensure that the
first zone in the selected zonelist is "on-node".

This change will result in alloc_pages_node() returning NULL when
GFP_THISNODE is specified and the first zone in the zonelist selected by
(nid, gfp_zone(gfp_mask) is not on node 'nid'.  This, in turn, BUGs out
in slub.c:early_kmem_cache_node_alloc() which apparently can't handle a
NULL page from new_slab().  Fix SLUB to handle NULL page in early
allocation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ecd8adb..98a76c9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned int, struct zonelist *));
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
+	pg_data_t *pgdat;
+	struct zonelist *zonelist;
+
 	if (unlikely(order >= MAX_ORDER))
 		return NULL;
 
@@ -175,15 +178,17 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	if (nid < 0)
 		nid = numa_node_id();
 
+	pgdat = NODE_DATA(nid);
+	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
 	/*
 	 * Check for the special case that GFP_THISNODE is used on a
 	 * memoryless node
 	 */
-	if (unlikely((gfp_mask & __GFP_THISNODE) && !node_memory(nid)))
+	if (unlikely((gfp_mask & __GFP_THISNODE) &&
+		pgdat != zonelist->zones[0]->zone_pgdat))
 		return NULL;
 
-	return __alloc_pages(gfp_mask, order,
-		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+	return __alloc_pages(gfp_mask, order, zonelist);
 }
 
 #ifdef CONFIG_NUMA
diff --git a/mm/slub.c b/mm/slub.c
index c16b75d..43a9270 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init early_kmem_cache_node_alloc(gfp_t gfpflag
 	/* new_slab() disables interupts */
 	local_irq_enable();
 
-	BUG_ON(!page);
-	n = page->freelist;
-	BUG_ON(!n);
-	page->freelist = get_freepointer(kmalloc_caches, n);
-	page->inuse++;
-	kmalloc_caches->node[node] = n;
-	setup_object_debug(kmalloc_caches, page, n);
-	init_kmem_cache_node(n);
-	atomic_long_inc(&n->nr_slabs);
-	add_partial(n, page);
+	if (page) {
+		n = page->freelist;
+		BUG_ON(!n);
+		page->freelist = get_freepointer(kmalloc_caches, n);
+		page->inuse++;
+		kmalloc_caches->node[node] = n;
+		setup_object_debug(kmalloc_caches, page, n);
+		init_kmem_cache_node(n);
+		atomic_long_inc(&n->nr_slabs);
+		add_partial(n, page);
+	} else
+		kmalloc_caches->node[node] = NULL;
 	return n;
 }
 

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-06-13 20:55 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-12 20:48 [patch 0/3] Fixes for NUMA allocations on " clameter
2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
2007-06-12 21:03   ` David Rientjes
2007-06-12 21:08     ` Christoph Lameter
2007-06-12 21:10       ` David Rientjes
2007-06-12 21:27         ` Christoph Lameter
2007-06-12 21:34           ` David Rientjes
2007-06-12 21:37             ` Christoph Lameter
2007-06-12 21:38               ` David Rientjes
2007-06-12 21:36         ` Nishanth Aravamudan
2007-06-12 21:39           ` Christoph Lameter
2007-06-12 21:42             ` Nishanth Aravamudan
2007-06-12 21:45               ` David Rientjes
2007-06-12 22:26               ` Christoph Lameter
2007-06-12 22:32                 ` Nishanth Aravamudan
2007-06-13  9:14           ` Andy Whitcroft
2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
2007-06-12 21:03   ` David Rientjes
2007-06-12 21:07     ` Christoph Lameter
2007-06-12 21:08       ` David Rientjes
2007-06-12 21:25         ` Christoph Lameter
2007-06-12 21:34           ` David Rientjes
2007-06-12 21:51             ` Nishanth Aravamudan
2007-06-13 21:10   ` Lee Schermerhorn
2007-06-13 21:57     ` Nishanth Aravamudan [this message]
2007-06-13 22:46     ` Christoph Lameter
2007-06-13 23:11       ` Nishanth Aravamudan
2007-06-13 23:15         ` Christoph Lameter
2007-06-13 23:20           ` Nishanth Aravamudan
2007-06-13 23:26             ` Christoph Lameter
2007-06-13 23:32               ` Nishanth Aravamudan
2007-06-13 23:53                 ` Christoph Lameter
2007-06-14  0:04                   ` Nishanth Aravamudan
2007-06-14 14:18       ` Lee Schermerhorn
2007-06-14 14:24         ` Christoph Lameter
2007-06-14 14:55           ` Lee Schermerhorn
2007-06-14 15:51             ` Christoph Lameter
2007-06-14  7:07     ` Christoph Lameter
2007-06-14 14:23       ` Nishanth Aravamudan
2007-06-12 20:48 ` [patch 3/3] Fix MPOL_INTERLEAVE " clameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070613215750.GU3798@us.ibm.com \
    --to=nacc@us.ibm.com \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox