From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: lee.schermerhorn@hp.com, anton@samba.org,
akpm@linux-foundation.org, linux-mm@kvack.org
Subject: [PATCH v2][RFC] Fix INTERLEAVE with memoryless nodes
Date: Mon, 11 Jun 2007 17:14:36 -0700 [thread overview]
Message-ID: <20070612001436.GI14458@us.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0706111613100.23857@schroedinger.engr.sgi.com>
On 11.06.2007 [16:15:15 -0700], Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
>
> > Christoph was also worried about the performance impact on these paths,
> > so, as he suggested, uninline alloc_pages_node() and move it to
> > mempolicy.c.
>
> uninlining does not address performance issues.
>
> > @@ -184,6 +185,16 @@ static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
> > atomic_set(&policy->refcnt, 1);
> > switch (mode) {
> > case MPOL_INTERLEAVE:
> > + /*
> > + * Clear any memoryless nodes here so that v.nodes can be used
> > + * without extra checks
> > + */
> > + nid = first_node(*nodes);
> > + while (nid < MAX_NUMNODES) {
> > + if (!node_populated(nid))
> > + node_clear(nid, *nodes);
> > + nid = next_node(nid, *nodes);
> > + }
>
> There is a "nodes_and" function for this.
Right, fixed.
> > @@ -1126,9 +1153,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
> > struct task_struct *me = current;
> >
> > nid = me->il_next;
> > - next = next_node(nid, policy->v.nodes);
> > - if (next >= MAX_NUMNODES)
> > - next = first_node(policy->v.nodes);
> > + do {
> > + next = next_node(nid, policy->v.nodes);
> > + if (next >= MAX_NUMNODES)
> > + next = first_node(policy->v.nodes);
> > + } while (!node_populated(next));
>
> Is there a case where nodes has no node set?
Well, if the only place to get a "new" policy is mpol_new(), no, as just
after the above nodes_and(), we check the weight of the nodemask. Is
that sufficient?
Based on ideas from Christoph Lameter, add checks in the INTERLEAVE
paths for memoryless nodes. We do not want to try interleaving onto
those nodes.
Christoph said:
"This does not work for the address based interleaving for anonymous
vmas. I am not sure what to do there. We could change the calculation
of the node to be based only on nodes with memory and then skip the
memoryless ones. I have only added a comment to describe its brokennes
for now."
I have copied his draft's comment.
Change alloc_pages_node() to fail __GFP_THISNODE allocations if the node
is not populated.
Again, Christoph said:
"This will fix the alloc_pages_node case but not the alloc_pages() case.
In the alloc_pages() case we do not specify a node. Implicitly it is
understood that we (in the case of no memory policy / cpuset options)
allocate from the nearest node. So it may be argued there that the
GFP_THISNODE behavior of taking the first node from the zonelist is
okay."
Christoph was also worried about the performance impact on these paths,
as am I.
Finally, as he suggested, uninline alloc_pages_node() and move it to
mempolicy.c.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 49dcc2f..c83e56a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -165,19 +165,7 @@ static inline void arch_alloc_page(struct page *page, int order) { }
extern struct page *
FASTCALL(__alloc_pages(gfp_t, unsigned int, struct zonelist *));
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
- unsigned int order)
-{
- if (unlikely(order >= MAX_ORDER))
- return NULL;
-
- /* Unknown node is current node */
- if (nid < 0)
- nid = numa_node_id();
-
- return __alloc_pages(gfp_mask, order,
- NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
-}
+extern struct page * alloc_pages_node(int, gfp_t, unsigned int);
#ifdef CONFIG_NUMA
extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 144805c..abadbf4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -174,6 +174,7 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes)
static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
{
struct mempolicy *policy;
+ unsigned nid;
PDprintk("setting mode %d nodes[0] %lx\n", mode, nodes_addr(*nodes)[0]);
if (mode == MPOL_DEFAULT)
@@ -184,8 +185,12 @@ static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
atomic_set(&policy->refcnt, 1);
switch (mode) {
case MPOL_INTERLEAVE:
- policy->v.nodes = *nodes;
- if (nodes_weight(*nodes) == 0) {
+ /*
+ * Clear any memoryless nodes here so that v.nodes can be used
+ * without extra checks
+ */
+ nodes_and(policy->v.nodes, *nodes, node_populated_mask);
+ if (nodes_weight(policy->v.nodes) == 0) {
kmem_cache_free(policy_cache, policy);
return ERR_PTR(-EINVAL);
}
@@ -578,6 +583,22 @@ long do_get_mempolicy(int *policy, nodemask_t *nmask,
return err;
}
+struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+{
+ if (unlikely(order >= MAX_ORDER))
+ return NULL;
+
+ /* Unknown node is current node */
+ if (nid < 0)
+ nid = numa_node_id();
+
+ if ((gfp_mask & __GFP_THISNODE) && !node_populated(nid))
+ return NULL;
+
+ return __alloc_pages(gfp_mask, order,
+ NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+}
+
#ifdef CONFIG_MIGRATION
/*
* page migration
@@ -1126,9 +1147,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
struct task_struct *me = current;
nid = me->il_next;
- next = next_node(nid, policy->v.nodes);
- if (next >= MAX_NUMNODES)
- next = first_node(policy->v.nodes);
+ do {
+ next = next_node(nid, policy->v.nodes);
+ if (next >= MAX_NUMNODES)
+ next = first_node(policy->v.nodes);
+ } while (!node_populated(next));
me->il_next = next;
return nid;
}
@@ -1192,6 +1215,11 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
* for huge pages, since vm_pgoff is in units of small
* pages, we need to shift off the always 0 bits to get
* a useful offset.
+ *
+ * NOTE: For configurations with memoryless nodes this
+ * is broken since the allocation attempts on that node
+ * will fall back to other nodes and thus one
+ * neighboring node will be overallocated from.
*/
BUG_ON(shift < PAGE_SHIFT);
off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-06-12 0:10 UTC|newest]
Thread overview: 140+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-11 20:27 [PATCH] Add populated_map to account for " Nishanth Aravamudan, Lee Schermerhorn
2007-06-11 21:25 ` Christoph Lameter
2007-06-11 22:10 ` [PATCH v2] " Nishanth Aravamudan
2007-06-11 22:42 ` Christoph Lameter
2007-06-11 22:52 ` [PATCH v3] " Nishanth Aravamudan
2007-06-11 23:00 ` Christoph Lameter
2007-06-11 23:41 ` [PATCH v4] " Nishanth Aravamudan
2007-06-11 23:45 ` Christoph Lameter
2007-06-12 0:07 ` [PATCH] populated_map: fix !NUMA case, remove comment Nishanth Aravamudan
2007-06-12 0:41 ` Christoph Lameter
2007-06-12 1:43 ` Nishanth Aravamudan
2007-06-12 1:45 ` Christoph Lameter
2007-06-12 1:52 ` Nishanth Aravamudan
2007-06-12 2:39 ` Nishanth Aravamudan
2007-06-12 2:02 ` Nishanth Aravamudan
2007-06-12 2:20 ` Christoph Lameter
2007-06-12 2:32 ` Nishanth Aravamudan
2007-06-12 2:54 ` Christoph Lameter
2007-06-12 3:20 ` Nishanth Aravamudan
2007-06-12 3:21 ` Christoph Lameter
2007-06-12 3:31 ` Nishanth Aravamudan
2007-06-12 15:06 ` Lee Schermerhorn
2007-06-12 17:28 ` Nishanth Aravamudan
2007-06-12 18:43 ` Christoph Lameter
2007-06-12 18:48 ` Lee Schermerhorn
2007-06-12 18:51 ` Christoph Lameter
2007-06-12 19:44 ` Lee Schermerhorn
2007-06-12 19:48 ` Christoph Lameter
2007-06-12 19:58 ` Christoph Lameter
2007-06-12 20:01 ` Nishanth Aravamudan
2007-06-13 15:30 ` Lee Schermerhorn
2007-06-13 17:58 ` Nishanth Aravamudan
2007-06-13 18:21 ` Lee Schermerhorn
2007-06-13 19:01 ` Nishanth Aravamudan
2007-06-13 22:51 ` Christoph Lameter
2007-06-14 15:50 ` Lee Schermerhorn
2007-06-14 15:57 ` Christoph Lameter
2007-06-14 16:54 ` Lee Schermerhorn
2007-06-14 16:09 ` Nishanth Aravamudan
2007-06-14 16:15 ` Christoph Lameter
2007-06-14 17:07 ` Lee Schermerhorn
2007-06-14 17:16 ` Christoph Lameter
2007-06-14 18:04 ` Lee Schermerhorn
2007-06-14 22:35 ` Nishanth Aravamudan
2007-06-13 22:50 ` Christoph Lameter
2007-06-13 23:09 ` Nishanth Aravamudan
2007-06-13 23:12 ` Christoph Lameter
2007-06-13 23:18 ` Nishanth Aravamudan
2007-06-13 23:26 ` Christoph Lameter
2007-06-13 23:56 ` Nishanth Aravamudan
2007-06-14 14:23 ` Lee Schermerhorn
2007-06-13 22:49 ` Christoph Lameter
2007-06-12 19:55 ` Nishanth Aravamudan
2007-06-12 18:41 ` Christoph Lameter
2007-06-12 19:07 ` Lee Schermerhorn
2007-06-12 19:13 ` Christoph Lameter
2007-06-11 23:08 ` [PATCH][RFC] Fix INTERLEAVE with memoryless nodes Nishanth Aravamudan
2007-06-11 23:10 ` [PATCH v6][RFC] Fix hugetlb pool allocation with empty nodes Nishanth Aravamudan
2007-06-11 23:11 ` [PATCH][RFC] hugetlb: numafy several functions Nishanth Aravamudan
2007-06-11 23:13 ` [PATCH][RFC] hugetlb: add per-node nr_hugepages sysfs attribute Nishanth Aravamudan
2007-06-11 23:40 ` Christoph Lameter
2007-06-11 23:42 ` Christoph Lameter
2007-06-12 0:19 ` Nishanth Aravamudan
2007-06-12 0:43 ` Christoph Lameter
2007-06-12 2:19 ` Nishanth Aravamudan
2007-06-12 2:22 ` Christoph Lameter
2007-06-12 2:34 ` Nishanth Aravamudan
2007-06-11 23:38 ` [PATCH][RFC] hugetlb: numafy several functions Christoph Lameter
2007-06-11 23:17 ` [PATCH v6][RFC] Fix hugetlb pool allocation with empty nodes Christoph Lameter
2007-06-12 0:15 ` Nishanth Aravamudan
2007-06-12 0:47 ` Christoph Lameter
2007-06-12 2:12 ` Nishanth Aravamudan
2007-06-12 2:21 ` Christoph Lameter
2007-06-12 2:25 ` Christoph Lameter
2007-06-12 2:34 ` Nishanth Aravamudan
2007-06-12 2:55 ` Christoph Lameter
2007-06-12 3:17 ` Nishanth Aravamudan
2007-06-12 3:19 ` Christoph Lameter
2007-06-12 3:30 ` Nishanth Aravamudan
2007-06-12 3:48 ` Christoph Lameter
2007-06-12 5:07 ` Nishanth Aravamudan
2007-06-12 18:47 ` Christoph Lameter
2007-06-12 17:43 ` Nishanth Aravamudan
2007-06-12 18:49 ` Christoph Lameter
2007-06-12 2:33 ` Nishanth Aravamudan
2007-06-12 3:44 ` William Lee Irwin III
2007-06-12 3:50 ` Christoph Lameter
2007-06-12 3:53 ` William Lee Irwin III
2007-06-12 3:53 ` Christoph Lameter
2007-06-12 4:14 ` William Lee Irwin III
2007-06-12 5:09 ` Nishanth Aravamudan
2007-06-12 5:15 ` William Lee Irwin III
2007-06-12 17:36 ` Nishanth Aravamudan
2007-06-12 18:50 ` Christoph Lameter
2007-06-12 17:45 ` Nishanth Aravamudan
2007-06-12 19:13 ` William Lee Irwin III
2007-06-13 0:04 ` [PATCH v7][RFC] " Nishanth Aravamudan
2007-06-13 15:26 ` [PATCH v3][RFC] hugetlb: numafy several functions Nishanth Aravamudan
2007-06-13 15:28 ` [PATCH v3][RFC] hugetlb: add per-node nr_hugepages sysfs attribute Nishanth Aravamudan
2007-06-13 18:23 ` Lee Schermerhorn
2007-06-13 19:19 ` [PATCH v4][RFC] " Nishanth Aravamudan
2007-06-13 20:05 ` Lee Schermerhorn
2007-06-13 20:29 ` Nishanth Aravamudan
2007-06-13 21:02 ` Lee Schermerhorn
2007-07-23 19:23 ` Christoph Lameter
2007-07-23 20:14 ` Lee Schermerhorn
2007-06-13 21:04 ` [PATCH v7][RFC] Fix hugetlb pool allocation with empty nodes Lee Schermerhorn
2007-06-13 21:50 ` [PATCH v7][UPDATE][RFC] " Nishanth Aravamudan
2007-06-12 14:28 ` [PATCH v6][RFC] " Lee Schermerhorn
2007-06-11 23:15 ` [PATCH][RFC] Fix INTERLEAVE with memoryless nodes Christoph Lameter
2007-06-12 0:14 ` Nishanth Aravamudan [this message]
2007-06-12 0:42 ` [PATCH v2][RFC] " Christoph Lameter
2007-06-12 0:57 ` Andrew Morton
2007-06-12 1:12 ` Christoph Lameter
2007-06-12 1:41 ` Nishanth Aravamudan
2007-06-12 1:52 ` Andrew Morton
2007-06-12 2:03 ` Nishanth Aravamudan
2007-06-12 14:19 ` [PATCH v2] Add populated_map to account for " Lee Schermerhorn
2007-06-12 17:32 ` Nishanth Aravamudan
2007-06-12 18:45 ` Christoph Lameter
2007-06-12 19:17 ` Lee Schermerhorn
2007-06-12 19:22 ` Christoph Lameter
2007-06-12 19:49 ` Nishanth Aravamudan
2007-06-12 19:51 ` Christoph Lameter
2007-06-12 20:00 ` Nishanth Aravamudan
2007-06-12 20:03 ` Christoph Lameter
2007-06-12 20:10 ` Christoph Lameter
2007-06-12 19:52 ` Christoph Lameter
2007-06-12 19:58 ` Christoph Lameter
2007-06-12 20:00 ` Nishanth Aravamudan
2007-06-12 20:06 ` Christoph Lameter
2007-06-12 14:10 ` [PATCH] " Lee Schermerhorn
2007-06-12 17:35 ` Nishanth Aravamudan
2007-06-12 18:39 ` Christoph Lameter
2007-06-12 18:54 ` Lee Schermerhorn
2007-06-12 19:00 ` Christoph Lameter
2007-06-12 2:27 ` KAMEZAWA Hiroyuki
2007-06-12 2:46 ` Nishanth Aravamudan
2007-06-12 2:53 ` Christoph Lameter
2007-06-12 3:04 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070612001436.GI14458@us.ibm.com \
--to=nacc@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=anton@samba.org \
--cc=clameter@sgi.com \
--cc=lee.schermerhorn@hp.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox