From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: clameter@sgi.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, ak@suse.de,
Nishanth Aravamudan <nacc@us.ibm.com>
Subject: Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
Date: Wed, 13 Jun 2007 17:10:32 -0400
Message-ID: <1181769033.6148.116.camel@localhost>
In-Reply-To: <20070612205738.548677035@sgi.com>
On Tue, 2007-06-12 at 13:48 -0700, clameter@sgi.com wrote:
> GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
> first zone of a nodelist. That only works if the node has memory. A
> memoryless node will have its first zone on another pgdat (node).
>
> Thus GFP_THISNODE may be returning memory on other nodes.
> GFP_THISNODE should fail if there is no local memory on a node.
>
> So we add a check to verify that the node specified has memory in
> alloc_pages_node(). If the node has no memory then return NULL.
>
> The case of alloc_pages(GFP_THISNODE) is not changed. alloc_pages() (with
> no memory policies in effect) is understood to prefer the current node.
> If a process is running on a node with no memory then its default allocations
> come from the next neighboring node. GFP_THISNODE will then force the memory
> to come from that node.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
> Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h 2007-06-12 12:33:37.000000000 -0700
> +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h 2007-06-12 12:38:37.000000000 -0700
> @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
> if (nid < 0)
> nid = numa_node_id();
>
> + /*
> + * Check for the special case that GFP_THISNODE is used on a
> + * memoryless node
> + */
> + if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> + return NULL;
> +
> return __alloc_pages(gfp_mask, order,
> NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> }
>
Attached patch fixes alloc_pages_node() so that it never returns an
off-node page when GFP_THISNODE is specified. This requires a fix to
SLUB early allocation, included in the patch. Works on an HP ia64
platform with a small DMA-only node and "zone order" zonelists. Will
test on x86_64 real soon now...
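
To make the failure case concrete, here is a rough userspace sketch of the
two checks. It is a model only, not kernel code: every zl_* name is
hypothetical, two nodes and two zone types are assumed, and the zonelist is
assumed to be built in zone order (all zones of the requested type first,
local node first, then lower zone types). Node 0 has only DMA memory, so a
"does this node have memory" test passes while the first zone tried for a
NORMAL request lives on node 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
enum zl_zone_type { ZL_ZONE_DMA, ZL_ZONE_NORMAL, ZL_NR_ZONES };

#define ZL_MAX_NODES 2

struct zl_zone {
	int node;		/* node that owns this zone */
	bool populated;		/* zone has pages */
};

struct zl_node {
	struct zl_zone zones[ZL_NR_ZONES];
};

/* Assumed scenario: node 0 has DMA memory only, node 1 has NORMAL only. */
static void zl_setup_dma_only_node0(struct zl_node nodes[ZL_MAX_NODES])
{
	for (int n = 0; n < ZL_MAX_NODES; n++)
		for (int zt = 0; zt < ZL_NR_ZONES; zt++)
			nodes[n].zones[zt] =
				(struct zl_zone){ .node = n, .populated = false };
	nodes[0].zones[ZL_ZONE_DMA].populated = true;
	nodes[1].zones[ZL_ZONE_NORMAL].populated = true;
}

/* Models a node-memory-mask test: any populated zone on the node counts. */
static bool zl_node_has_memory(const struct zl_node nodes[], int nid)
{
	assert(nid >= 0 && nid < ZL_MAX_NODES);
	for (int zt = 0; zt < ZL_NR_ZONES; zt++)
		if (nodes[nid].zones[zt].populated)
			return true;
	return false;
}

/* First zone of a zone-ordered list: highest zone type first, local node first. */
static const struct zl_zone *zl_first_zone(const struct zl_node nodes[],
					   int nid, enum zl_zone_type highest)
{
	assert(nid >= 0 && nid < ZL_MAX_NODES);
	for (int zt = highest; zt >= ZL_ZONE_DMA; zt--) {
		for (int i = 0; i < ZL_MAX_NODES; i++) {
			int node = (nid + i) % ZL_MAX_NODES;
			const struct zl_zone *z = &nodes[node].zones[zt];

			if (z->populated)
				return z;
		}
	}
	return NULL;
}

/* Models the check in this patch: the first zone tried must be on 'nid'. */
static bool zl_first_zone_on_node(const struct zl_node nodes[], int nid,
				  enum zl_zone_type highest)
{
	const struct zl_zone *z = zl_first_zone(nodes, nid, highest);

	return z != NULL && z->node == nid;
}
```

After zl_setup_dma_only_node0(), zl_node_has_memory(nodes, 0) is true, yet
zl_first_zone_on_node(nodes, 0, ZL_ZONE_NORMAL) is false, which is the case
where a GFP_THISNODE request should fail rather than fall through to node 1.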
---
PATCH fix GFP_THISNODE for DMA-only nodes and zone-order zonelists

The map of nodes with memory may include nodes with just
DMA/DMA32 memory. Using this map/mask together with
GFP_THISNODE will not guarantee on-node allocations at higher
zones. Modify the checks in alloc_pages_node() to ensure that the
first zone in the selected zonelist is "on-node".

This change will result in alloc_pages_node() returning NULL
when GFP_THISNODE is specified and the first zone in the zonelist
selected by (nid, gfp_zone(gfp_mask)) is not on node 'nid'. This,
in turn, BUGs out in slub.c:early_kmem_cache_node_alloc(), which
apparently can't handle a NULL page from new_slab(). Fix SLUB
to handle a NULL page in early allocation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

include/linux/gfp.h | 11 ++++++++---
mm/slub.c | 22 ++++++++++++----------
2 files changed, 20 insertions(+), 13 deletions(-)
Index: Linux/include/linux/gfp.h
===================================================================
--- Linux.orig/include/linux/gfp.h 2007-06-13 16:36:02.000000000 -0400
+++ Linux/include/linux/gfp.h 2007-06-13 16:38:41.000000000 -0400
@@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
+	pg_data_t *pgdat;
+	struct zonelist *zonelist;
+
 	if (unlikely(order >= MAX_ORDER))
 		return NULL;
@@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
 	 * Check for the special case that GFP_THISNODE is used on a
 	 * memoryless node
 	 */
-	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
+	pgdat = NODE_DATA(nid);
+	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
+	if ((gfp_mask & __GFP_THISNODE) &&
+		pgdat != zonelist->zones[0]->zone_pgdat)
 		return NULL;
-	return __alloc_pages(gfp_mask, order,
-		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+	return __alloc_pages(gfp_mask, order, zonelist);
 }
 #ifdef CONFIG_NUMA
Index: Linux/mm/slub.c
===================================================================
--- Linux.orig/mm/slub.c 2007-06-13 16:36:02.000000000 -0400
+++ Linux/mm/slub.c 2007-06-13 16:38:41.000000000 -0400
@@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init e
 	/* new_slab() disables interupts */
 	local_irq_enable();
-	BUG_ON(!page);
-	n = page->freelist;
-	BUG_ON(!n);
-	page->freelist = get_freepointer(kmalloc_caches, n);
-	page->inuse++;
-	kmalloc_caches->node[node] = n;
-	setup_object_debug(kmalloc_caches, page, n);
-	init_kmem_cache_node(n);
-	atomic_long_inc(&n->nr_slabs);
-	add_partial(n, page);
+	if (page) {
+		n = page->freelist;
+		BUG_ON(!n);
+		page->freelist = get_freepointer(kmalloc_caches, n);
+		page->inuse++;
+		kmalloc_caches->node[node] = n;
+		setup_object_debug(kmalloc_caches, page, n);
+		init_kmem_cache_node(n);
+		atomic_long_inc(&n->nr_slabs);
+		add_partial(n, page);
+	} else
+		kmalloc_caches->node[node] = NULL;
 	return n;
 }
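
For anyone following the SLUB side, the control flow of the early-allocation
change can be sketched in userspace roughly as below. This is a model under
stated assumptions, not the kernel function: all es_* names are hypothetical,
the page is passed in by the caller instead of coming from new_slab(), and
assert() stands in for BUG_ON(). One choice is specific to this sketch: n is
initialized to NULL so that the no-page path returns a defined value.

```c
#include <assert.h>
#include <stddef.h>

#define ES_MAX_NODES 4

/* Stand-ins for kmem_cache_node, struct page and kmem_cache (model only). */
struct es_cache_node {
	long nr_slabs;
};

struct es_page {
	struct es_cache_node *freelist;	/* first free object in the slab */
	int inuse;			/* objects handed out from this slab */
};

struct es_cache {
	struct es_cache_node *node[ES_MAX_NODES];
};

/*
 * Model of early per-node setup: 'page' is what the slab allocation for
 * 'node' returned, or NULL if no on-node page was available.
 */
static struct es_cache_node *es_early_node_alloc(struct es_cache *cache,
						 int node,
						 struct es_page *page)
{
	struct es_cache_node *n = NULL;	/* sketch's choice: defined on failure */

	assert(node >= 0 && node < ES_MAX_NODES);
	if (page) {
		n = page->freelist;
		assert(n != NULL);	/* mirrors BUG_ON(!n) */
		page->inuse++;
		n->nr_slabs = 1;
	}
	/* Either the new per-node structure or NULL for a node without memory. */
	cache->node[node] = n;
	return n;
}
```

With a NULL page the per-node slot is simply left NULL and the caller gets
NULL back; with a valid page the first free object becomes the per-node
structure, as in the hunk above.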