From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: ak@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, clameter@sgi.com
Subject: [PATCH] change global zonelist order on NUMA v3
Date: Thu, 26 Apr 2007 19:53:48 +0900
Message-ID: <20070426195348.6a4e5652.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20070426191043.df96c114.kamezawa.hiroyu@jp.fujitsu.com>
Changelog V2 -> V3
- Removed the zone ordering selection knobs. This version is much
  simpler: it only changes the zonelist ordering.
- Tested on ia64 NUMA; works as expected.
-Kame
change zonelist order on NUMA v3.
[Description]
Assume a 2-node NUMA system where only Node(0) has ZONE_DMA.
(On ia64, ZONE_DMA is the memory below 4GB; on x86_64 the equivalent is ZONE_DMA32.)
In this case, the current default zonelist order for Node(0) is
Node(0)'s NORMAL -> Node(0)'s DMA -> Node(1)'s NORMAL.
This means Node(0)'s DMA is used before Node(1)'s NORMAL,
so ZONE_DMA can easily be exhausted and hit OOM.
This patch changes the *default* zone order to
Node(0)'s NORMAL -> Node(1)'s NORMAL -> Node(0)'s DMA.
Tested on a 2-node ia64 NUMA system; works well.
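To make the two orderings in the description concrete, here is a small userspace sketch. It is not kernel code: `struct zone_ref`, the `populated` table, and the helpers `build_node_first()`, `build_zone_first()` and `format_order()` are hypothetical names invented for this illustration, and the table simply hard-codes the 2-node layout assumed above (only Node(0) has DMA).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>	/* strcmp(), for callers comparing formatted orders */

/* Hypothetical stand-ins for the kernel's zone types and node count. */
enum zone_id { Z_DMA = 0, Z_NORMAL = 1, NR_ZONE_IDS = 2 };
#define NR_NODES 2

struct zone_ref {
	int node;
	enum zone_id zone;
};

/* Assumed layout: populated[node][zone] is nonzero if the zone has pages. */
static const int populated[NR_NODES][NR_ZONE_IDS] = {
	{ 1, 1 },	/* Node(0): DMA and NORMAL */
	{ 0, 1 },	/* Node(1): NORMAL only */
};

/* Old behaviour: walk nodes by distance, adding every zone of each node. */
static size_t build_node_first(const int *node_order, size_t nr_nodes,
			       enum zone_id highest, struct zone_ref *out)
{
	size_t pos = 0;

	for (size_t j = 0; j < nr_nodes; j++)
		for (int zt = highest; zt >= 0; zt--)
			if (populated[node_order[j]][zt])
				out[pos++] = (struct zone_ref){ node_order[j],
								(enum zone_id)zt };
	return pos;
}

/* Behaviour after the patch: walk zone types first, then nodes by distance. */
static size_t build_zone_first(const int *node_order, size_t nr_nodes,
			       enum zone_id highest, struct zone_ref *out)
{
	size_t pos = 0;

	for (int zt = highest; zt >= 0; zt--)
		for (size_t j = 0; j < nr_nodes; j++)
			if (populated[node_order[j]][zt])
				out[pos++] = (struct zone_ref){ node_order[j],
								(enum zone_id)zt };
	return pos;
}

/* Render a list as "node:ZONE -> node:ZONE ..." into buf. */
static void format_order(const struct zone_ref *list, size_t n,
			 char *buf, size_t len)
{
	static const char *const names[NR_ZONE_IDS] = { "DMA", "NORMAL" };
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n && used < len; i++) {
		int w = snprintf(buf + used, len - used, "%s%d:%s",
				 i ? " -> " : "", list[i].node,
				 names[list[i].zone]);
		if (w < 0)
			break;
		used += (size_t)w;
	}
}
```

With a node order of {0, 1} and `Z_NORMAL` as the highest allowed zone, `build_node_first()` puts Node(0)'s DMA ahead of Node(1)'s NORMAL, while `build_zone_first()` moves Node(0)'s DMA to the end of the list.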
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: linux-2.6.21-rc7-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.21-rc7-mm2.orig/mm/page_alloc.c
+++ linux-2.6.21-rc7-mm2/mm/page_alloc.c
@@ -2023,6 +2023,7 @@ void show_free_areas(void)
*
* Add all populated zones of a node to the zonelist.
*/
+#ifndef CONFIG_NUMA
static int __meminit build_zonelists_node(pg_data_t *pgdat,
struct zonelist *zonelist, int nr_zones, enum zone_type zone_type)
{
@@ -2042,6 +2043,7 @@ static int __meminit build_zonelists_nod
} while (zone_type);
return nr_zones;
}
+#endif
#ifdef CONFIG_NUMA
#define MAX_NODE_LOAD (num_online_nodes())
@@ -2106,52 +2108,51 @@ static int __meminit find_next_best_node
return best_node;
}
+/*
+ * Build zonelist based on zone priority.
+ */
+static int __meminitdata node_order[MAX_NUMNODES];
static void __meminit build_zonelists(pg_data_t *pgdat)
{
- int j, node, local_node;
- enum zone_type i;
- int prev_node, load;
- struct zonelist *zonelist;
+ int i, j, pos, zone_type, node, load;
nodemask_t used_mask;
+ int local_node, prev_node;
+ struct zone *z;
+ struct zonelist *zonelist;
- /* initialize zonelists */
for (i = 0; i < MAX_NR_ZONES; i++) {
zonelist = pgdat->node_zonelists + i;
zonelist->zones[0] = NULL;
}
-
- /* NUMA-aware ordering of nodes */
+ memset(node_order, 0, sizeof(node_order));
local_node = pgdat->node_id;
load = num_online_nodes();
prev_node = local_node;
nodes_clear(used_mask);
+ j = 0;
while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
int distance = node_distance(local_node, node);
-
- /*
- * If another node is sufficiently far away then it is better
- * to reclaim pages in a zone before going off node.
- */
if (distance > RECLAIM_DISTANCE)
zone_reclaim_mode = 1;
-
- /*
- * We don't want to pressure a particular node.
- * So adding penalty to the first node in same
- * distance group to make it round-robin.
- */
-
if (distance != node_distance(local_node, prev_node))
- node_load[node] += load;
+ node_load[node] = load;
+ node_order[j++] = node;
prev_node = node;
load--;
- for (i = 0; i < MAX_NR_ZONES; i++) {
- zonelist = pgdat->node_zonelists + i;
- for (j = 0; zonelist->zones[j] != NULL; j++);
-
- j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
- zonelist->zones[j] = NULL;
+ }
+ /* calculate node order */
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ zonelist = pgdat->node_zonelists + i;
+ pos = 0;
+ for (zone_type = i; zone_type >= 0; zone_type--) {
+ for (j = 0; j < num_online_nodes(); j++) {
+ node = node_order[j];
+ z = &NODE_DATA(node)->node_zones[zone_type];
+ if (populated_zone(z))
+ zonelist->zones[pos++] = z;
+ }
}
+ zonelist->zones[pos] = NULL;
}
}
@@ -2239,6 +2240,7 @@ void __meminit build_all_zonelists(void)
__build_all_zonelists(NULL);
cpuset_init_current_mems_allowed();
} else {
+ memset(node_load, 0, sizeof(node_load));
/* we have to stop all cpus to guaranntee there is no user
of zonelist */
stop_machine_run(__build_all_zonelists, NULL, NR_CPUS);