From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: lee.schermerhorn@hp.com, anton@samba.org,
akpm@linux-foundation.org, linux-mm@kvack.org
Subject: [PATCH][RFC] Fix INTERLEAVE with memoryless nodes
Date: Mon, 11 Jun 2007 16:08:29 -0700 [thread overview]
Message-ID: <20070611230829.GC14458@us.ibm.com> (raw)
In-Reply-To: <20070611225213.GB14458@us.ibm.com>
Applies to 2.6.22-rc4-mm2 with
add-populated_map-to-account-for-memoryless-nodes
applied.
Based on ideas from Christoph Lameter, add checks in the INTERLEAVE
paths for memoryless nodes. We do not want to try interleaving onto
those nodes.
Christoph said:
"This does not work for the address based interleaving for anonymous
vmas. I am not sure what to do there. We could change the calculation
of the node to be based only on nodes with memory and then skip the
memoryless ones. I have only added a comment to describe its brokennes
for now."
I have copied his draft comment.
Change alloc_pages_node() to fail __GFP_THISNODE allocations if the node
is not populated.
Again, Christoph said:
"This will fix the alloc_pages_node case but not the alloc_pages() case.
In the alloc_pages() case we do not specify a node. Implicitly it is
understood that we (in the case of no memory policy / cpuset options)
allocate from the nearest node. So it may be argued there that the
GFP_THISNODE behavior of taking the first node from the zonelist is
okay."
Christoph was also worried about the performance impact on these paths,
so, as he suggested, uninline alloc_pages_node() and move it to
mempolicy.c.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 49dcc2f..c83e56a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -165,19 +165,7 @@ static inline void arch_alloc_page(struct page *page, int order) { }
extern struct page *
FASTCALL(__alloc_pages(gfp_t, unsigned int, struct zonelist *));
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
- unsigned int order)
-{
- if (unlikely(order >= MAX_ORDER))
- return NULL;
-
- /* Unknown node is current node */
- if (nid < 0)
- nid = numa_node_id();
-
- return __alloc_pages(gfp_mask, order,
- NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
-}
+extern struct page *alloc_pages_node(int, gfp_t, unsigned int);
#ifdef CONFIG_NUMA
extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 144805c..4e7c48c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -174,6 +174,7 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes)
static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
{
struct mempolicy *policy;
+ unsigned nid;
PDprintk("setting mode %d nodes[0] %lx\n", mode, nodes_addr(*nodes)[0]);
if (mode == MPOL_DEFAULT)
@@ -184,6 +185,16 @@ static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
atomic_set(&policy->refcnt, 1);
switch (mode) {
case MPOL_INTERLEAVE:
+ /*
+ * Clear any memoryless nodes here so that v.nodes can be used
+ * without extra checks
+ */
+ nid = first_node(*nodes);
+ while (nid < MAX_NUMNODES) {
+ if (!node_populated(nid))
+ node_clear(nid, *nodes);
+ nid = next_node(nid, *nodes);
+ }
policy->v.nodes = *nodes;
if (nodes_weight(*nodes) == 0) {
kmem_cache_free(policy_cache, policy);
@@ -578,6 +589,22 @@ long do_get_mempolicy(int *policy, nodemask_t *nmask,
return err;
}
+struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+{
+ if (unlikely(order >= MAX_ORDER))
+ return NULL;
+
+ /* Unknown node is current node */
+ if (nid < 0)
+ nid = numa_node_id();
+
+ if ((gfp_mask & __GFP_THISNODE) && !node_populated(nid))
+ return NULL;
+
+ return __alloc_pages(gfp_mask, order,
+ NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+}
+
#ifdef CONFIG_MIGRATION
/*
* page migration
@@ -1126,9 +1153,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
struct task_struct *me = current;
nid = me->il_next;
- next = next_node(nid, policy->v.nodes);
- if (next >= MAX_NUMNODES)
- next = first_node(policy->v.nodes);
+ do {
+ next = next_node(nid, policy->v.nodes);
+ if (next >= MAX_NUMNODES)
+ next = first_node(policy->v.nodes);
+ } while (!node_populated(next));
me->il_next = next;
return nid;
}
@@ -1192,6 +1221,11 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
* for huge pages, since vm_pgoff is in units of small
* pages, we need to shift off the always 0 bits to get
* a useful offset.
+ *
+ * NOTE: For configurations with memoryless nodes this
+ * is broken since the allocation attempts on that node
+ * will fall back to other nodes and thus one
+ * neighboring node will be overallocated from.
*/
BUG_ON(shift < PAGE_SHIFT);
off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-06-11 23:08 UTC|newest]
Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-11 20:27 [PATCH] Add populated_map to account for " Nishanth Aravamudan, Lee Schermerhorn
2007-06-11 21:25 ` Christoph Lameter
2007-06-11 22:10 ` [PATCH v2] " Nishanth Aravamudan
2007-06-11 22:42 ` Christoph Lameter
2007-06-11 22:52 ` [PATCH v3] " Nishanth Aravamudan
2007-06-11 23:00 ` Christoph Lameter
2007-06-11 23:41 ` [PATCH v4] " Nishanth Aravamudan
2007-06-11 23:45 ` Christoph Lameter
2007-06-12 0:07 ` [PATCH] populated_map: fix !NUMA case, remove comment Nishanth Aravamudan
2007-06-12 0:41 ` Christoph Lameter
2007-06-12 1:43 ` Nishanth Aravamudan
2007-06-12 1:45 ` Christoph Lameter
2007-06-12 1:52 ` Nishanth Aravamudan
2007-06-12 2:39 ` Nishanth Aravamudan
2007-06-12 2:02 ` Nishanth Aravamudan
2007-06-12 2:20 ` Christoph Lameter
2007-06-12 2:32 ` Nishanth Aravamudan
2007-06-12 2:54 ` Christoph Lameter
2007-06-12 3:20 ` Nishanth Aravamudan
2007-06-12 3:21 ` Christoph Lameter
2007-06-12 3:31 ` Nishanth Aravamudan
2007-06-12 15:06 ` Lee Schermerhorn
2007-06-12 17:28 ` Nishanth Aravamudan
2007-06-12 18:43 ` Christoph Lameter
2007-06-12 18:48 ` Lee Schermerhorn
2007-06-12 18:51 ` Christoph Lameter
2007-06-12 19:44 ` Lee Schermerhorn
2007-06-12 19:48 ` Christoph Lameter
2007-06-12 19:58 ` Christoph Lameter
2007-06-12 20:01 ` Nishanth Aravamudan
2007-06-13 15:30 ` Lee Schermerhorn
2007-06-13 17:58 ` Nishanth Aravamudan
2007-06-13 18:21 ` Lee Schermerhorn
2007-06-13 19:01 ` Nishanth Aravamudan
2007-06-13 22:51 ` Christoph Lameter
2007-06-14 15:50 ` Lee Schermerhorn
2007-06-14 15:57 ` Christoph Lameter
2007-06-14 16:54 ` Lee Schermerhorn
2007-06-14 16:09 ` Nishanth Aravamudan
2007-06-14 16:15 ` Christoph Lameter
2007-06-14 17:07 ` Lee Schermerhorn
2007-06-14 17:16 ` Christoph Lameter
2007-06-14 18:04 ` Lee Schermerhorn
2007-06-14 22:35 ` Nishanth Aravamudan
2007-06-13 22:50 ` Christoph Lameter
2007-06-13 23:09 ` Nishanth Aravamudan
2007-06-13 23:12 ` Christoph Lameter
2007-06-13 23:18 ` Nishanth Aravamudan
2007-06-13 23:26 ` Christoph Lameter
2007-06-13 23:56 ` Nishanth Aravamudan
2007-06-14 14:23 ` Lee Schermerhorn
2007-06-13 22:49 ` Christoph Lameter
2007-06-12 19:55 ` Nishanth Aravamudan
2007-06-12 18:41 ` Christoph Lameter
2007-06-12 19:07 ` Lee Schermerhorn
2007-06-12 19:13 ` Christoph Lameter
2007-06-11 23:08 ` Nishanth Aravamudan [this message]
2007-06-11 23:10 ` [PATCH v6][RFC] Fix hugetlb pool allocation with empty nodes Nishanth Aravamudan
2007-06-11 23:11 ` [PATCH][RFC] hugetlb: numafy several functions Nishanth Aravamudan
2007-06-11 23:13 ` [PATCH][RFC] hugetlb: add per-node nr_hugepages sysfs attribute Nishanth Aravamudan
2007-06-11 23:40 ` Christoph Lameter
2007-06-11 23:42 ` Christoph Lameter
2007-06-12 0:19 ` Nishanth Aravamudan
2007-06-12 0:43 ` Christoph Lameter
2007-06-12 2:19 ` Nishanth Aravamudan
2007-06-12 2:22 ` Christoph Lameter
2007-06-12 2:34 ` Nishanth Aravamudan
2007-06-11 23:38 ` [PATCH][RFC] hugetlb: numafy several functions Christoph Lameter
2007-06-11 23:17 ` [PATCH v6][RFC] Fix hugetlb pool allocation with empty nodes Christoph Lameter
2007-06-12 0:15 ` Nishanth Aravamudan
2007-06-12 0:47 ` Christoph Lameter
2007-06-12 2:12 ` Nishanth Aravamudan
2007-06-12 2:21 ` Christoph Lameter
2007-06-12 2:25 ` Christoph Lameter
2007-06-12 2:34 ` Nishanth Aravamudan
2007-06-12 2:55 ` Christoph Lameter
2007-06-12 3:17 ` Nishanth Aravamudan
2007-06-12 3:19 ` Christoph Lameter
2007-06-12 3:30 ` Nishanth Aravamudan
2007-06-12 3:48 ` Christoph Lameter
2007-06-12 5:07 ` Nishanth Aravamudan
2007-06-12 18:47 ` Christoph Lameter
2007-06-12 17:43 ` Nishanth Aravamudan
2007-06-12 18:49 ` Christoph Lameter
2007-06-12 2:33 ` Nishanth Aravamudan
2007-06-12 3:44 ` William Lee Irwin III
2007-06-12 3:50 ` Christoph Lameter
2007-06-12 3:53 ` William Lee Irwin III
2007-06-12 3:53 ` Christoph Lameter
2007-06-12 4:14 ` William Lee Irwin III
2007-06-12 5:09 ` Nishanth Aravamudan
2007-06-12 5:15 ` William Lee Irwin III
2007-06-12 17:36 ` Nishanth Aravamudan
2007-06-12 18:50 ` Christoph Lameter
2007-06-12 17:45 ` Nishanth Aravamudan
2007-06-12 19:13 ` William Lee Irwin III
2007-06-13 0:04 ` [PATCH v7][RFC] " Nishanth Aravamudan
2007-06-13 15:26 ` [PATCH v3][RFC] hugetlb: numafy several functions Nishanth Aravamudan
2007-06-13 15:28 ` [PATCH v3][RFC] hugetlb: add per-node nr_hugepages sysfs attribute Nishanth Aravamudan
2007-06-13 18:23 ` Lee Schermerhorn
2007-06-13 19:19 ` [PATCH v4][RFC] " Nishanth Aravamudan
2007-06-13 20:05 ` Lee Schermerhorn
2007-06-13 20:29 ` Nishanth Aravamudan
2007-06-13 21:02 ` Lee Schermerhorn
2007-07-23 19:23 ` Christoph Lameter
2007-07-23 20:14 ` Lee Schermerhorn
2007-06-13 21:04 ` [PATCH v7][RFC] Fix hugetlb pool allocation with empty nodes Lee Schermerhorn
2007-06-13 21:50 ` [PATCH v7][UPDATE][RFC] " Nishanth Aravamudan
2007-06-12 14:28 ` [PATCH v6][RFC] " Lee Schermerhorn
2007-06-11 23:15 ` [PATCH][RFC] Fix INTERLEAVE with memoryless nodes Christoph Lameter
2007-06-12 0:14 ` [PATCH v2][RFC] " Nishanth Aravamudan
2007-06-12 0:42 ` Christoph Lameter
2007-06-12 0:57 ` Andrew Morton
2007-06-12 1:12 ` Christoph Lameter
2007-06-12 1:41 ` Nishanth Aravamudan
2007-06-12 1:52 ` Andrew Morton
2007-06-12 2:03 ` Nishanth Aravamudan
2007-06-12 14:19 ` [PATCH v2] Add populated_map to account for " Lee Schermerhorn
2007-06-12 17:32 ` Nishanth Aravamudan
2007-06-12 18:45 ` Christoph Lameter
2007-06-12 19:17 ` Lee Schermerhorn
2007-06-12 19:22 ` Christoph Lameter
2007-06-12 19:49 ` Nishanth Aravamudan
2007-06-12 19:51 ` Christoph Lameter
2007-06-12 20:00 ` Nishanth Aravamudan
2007-06-12 20:03 ` Christoph Lameter
2007-06-12 20:10 ` Christoph Lameter
2007-06-12 19:52 ` Christoph Lameter
2007-06-12 19:58 ` Christoph Lameter
2007-06-12 20:00 ` Nishanth Aravamudan
2007-06-12 20:06 ` Christoph Lameter
2007-06-12 14:10 ` [PATCH] " Lee Schermerhorn
2007-06-12 17:35 ` Nishanth Aravamudan
2007-06-12 18:39 ` Christoph Lameter
2007-06-12 18:54 ` Lee Schermerhorn
2007-06-12 19:00 ` Christoph Lameter
2007-06-12 2:27 ` KAMEZAWA Hiroyuki
2007-06-12 2:46 ` Nishanth Aravamudan
2007-06-12 2:53 ` Christoph Lameter
2007-06-12 3:04 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070611230829.GC14458@us.ibm.com \
--to=nacc@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=anton@samba.org \
--cc=clameter@sgi.com \
--cc=lee.schermerhorn@hp.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox