linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Christoph Lameter <clameter@sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: ak@suse.de, linux-mm@kvack.org,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	pj@sgi.com, kxr@sgi.com, Mel Gorman <mel@skynet.ie>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Eric Whitney <eric.whitney@hp.com>
Subject: [PATCH/RFC] memoryless nodes - fixup uses of node_online_map in generic code
Date: Thu, 16 Aug 2007 10:17:33 -0400	[thread overview]
Message-ID: <1187273853.5900.21.camel@localhost> (raw)
In-Reply-To: <Pine.LNX.4.64.0708081638270.17335@schroedinger.engr.sgi.com>

Christoph, Andrew:

Here's a cut at fixing up uses of the online node map in generic code.
I'll look at the archs as I get the time, but I thought it worth getting
the ball rolling on the generic fixes.

Note questions about use of N_HIGH_MEMORY in find_next_best_node() and
population of N_HIGH_MEMORY in early_calculate_totalpages().

Comments?

Lee

-----------------
PATCH/RFC Fix generic usage of node_online_map 

Against 2.6.23-rc2-mm2

mm/shmem.c:shmem_parse_mpol()

	Ensure nodelist is subset of nodes with memory.
	Use node_states[N_HIGH_MEMORY] as default for missing
	nodelist for interleave policy.

mm/shmem.c:shmem_fill_super()

	initialize policy_nodes to node_states[N_HIGH_MEMORY]

mm/page-writeback.c:highmem_dirtyable_memory()

	sum over nodes with memory

mm/swap_prefetch.c:clear_last_prefetch_free()
                   clear_last_current_free()

	use nodes with memory for prefetch nodes.
	just in case ...

mm/page_alloc.c:zlc_setup()

	allowednodes - use nodes with memory.

mm/page_alloc.c:default_zonelist_order()

	average over nodes with memory.

mm/page_alloc.c:find_next_best_node()

	skip nodes w/o memory.
	N_HIGH_MEMORY state mask may not be initialized at this time,
	unless we want to depend on early_calculate_totalpages() [see
	below].  Will ZONE_MOVABLE ever be configurable?

mm/page_alloc.c:find_zone_movable_pfns_for_nodes()

	spread kernelcore over nodes with memory.

	This required calling early_calculate_totalpages()
	unconditionally, and populating N_HIGH_MEMORY node
	state therein from nodes in the early_node_map[].
	If we can depend on this, we can eliminate the
	population of N_HIGH_MEMORY mask from __build_all_zonelists()
	and use the N_HIGH_MEMORY mask in find_next_best_node().

mm/mempolicy.c:mpol_check_policy()

	Ensure nodes specified for policy are subset of
	nodes with memory.


Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mempolicy.c      |    2 +-
 mm/page-writeback.c |    2 +-
 mm/page_alloc.c     |   36 ++++++++++++++++++++++++++++--------
 mm/shmem.c          |   13 ++++++++-----
 mm/swap_prefetch.c  |    4 ++--
 5 files changed, 40 insertions(+), 17 deletions(-)

Index: Linux/mm/shmem.c
===================================================================
--- Linux.orig/mm/shmem.c	2007-08-15 10:01:22.000000000 -0400
+++ Linux/mm/shmem.c	2007-08-16 09:48:45.000000000 -0400
@@ -971,7 +971,7 @@ static inline int shmem_parse_mpol(char 
 		*nodelist++ = '\0';
 		if (nodelist_parse(nodelist, *policy_nodes))
 			goto out;
-		if (!nodes_subset(*policy_nodes, node_online_map))
+		if (!nodes_subset(*policy_nodes, node_states[N_HIGH_MEMORY]))
 			goto out;
 	}
 	if (!strcmp(value, "default")) {
@@ -996,9 +996,11 @@ static inline int shmem_parse_mpol(char 
 			err = 0;
 	} else if (!strcmp(value, "interleave")) {
 		*policy = MPOL_INTERLEAVE;
-		/* Default to nodes online if no nodelist */
+		/*
+		 * Default to online nodes with memory if no nodelist
+		 */
 		if (!nodelist)
-			*policy_nodes = node_online_map;
+			*policy_nodes = node_states[N_HIGH_MEMORY];
 		err = 0;
 	}
 out:
@@ -1060,7 +1062,8 @@ shmem_alloc_page(gfp_t gfp, struct shmem
 	return page;
 }
 #else
-static inline int shmem_parse_mpol(char *value, int *policy, nodemask_t *policy_nodes)
+static inline int shmem_parse_mpol(char *value, int *policy,
+						nodemask_t *policy_nodes)
 {
 	return 1;
 }
@@ -2239,7 +2242,7 @@ static int shmem_fill_super(struct super
 	unsigned long blocks = 0;
 	unsigned long inodes = 0;
 	int policy = MPOL_DEFAULT;
-	nodemask_t policy_nodes = node_online_map;
+	nodemask_t policy_nodes = node_states[N_HIGH_MEMORY];
 
 #ifdef CONFIG_TMPFS
 	/*
Index: Linux/mm/page-writeback.c
===================================================================
--- Linux.orig/mm/page-writeback.c	2007-08-15 10:01:22.000000000 -0400
+++ Linux/mm/page-writeback.c	2007-08-15 10:13:49.000000000 -0400
@@ -126,7 +126,7 @@ static unsigned long highmem_dirtyable_m
 	int node;
 	unsigned long x = 0;
 
-	for_each_online_node(node) {
+	for_each_node_state(node, N_HIGH_MEMORY) {
 		struct zone *z =
 			&NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
 
Index: Linux/mm/swap_prefetch.c
===================================================================
--- Linux.orig/mm/swap_prefetch.c	2007-08-15 10:01:22.000000000 -0400
+++ Linux/mm/swap_prefetch.c	2007-08-15 10:13:49.000000000 -0400
@@ -249,7 +249,7 @@ static void clear_last_prefetch_free(voi
 	 * Reset the nodes suitable for prefetching to all nodes. We could
 	 * update the data to take into account memory hotplug if desired..
 	 */
-	sp_stat.prefetch_nodes = node_online_map;
+	sp_stat.prefetch_nodes = node_states[N_HIGH_MEMORY];
 	for_each_node_mask(node, sp_stat.prefetch_nodes) {
 		struct node_stats *ns = &sp_stat.node[node];
 
@@ -261,7 +261,7 @@ static void clear_current_prefetch_free(
 {
 	int node;
 
-	sp_stat.prefetch_nodes = node_online_map;
+	sp_stat.prefetch_nodes = node_states[N_HIGH_MEMORY];
 	for_each_node_mask(node, sp_stat.prefetch_nodes) {
 		struct node_stats *ns = &sp_stat.node[node];
 
Index: Linux/mm/page_alloc.c
===================================================================
--- Linux.orig/mm/page_alloc.c	2007-08-15 10:05:41.000000000 -0400
+++ Linux/mm/page_alloc.c	2007-08-16 09:45:47.000000000 -0400
@@ -1302,7 +1302,7 @@ int zone_watermark_ok(struct zone *z, in
  *
  * If the zonelist cache is present in the passed in zonelist, then
  * returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_online_map.)
+ * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
  *
  * If the zonelist cache is not available for this zonelist, does
  * nothing and returns NULL.
@@ -1331,7 +1331,7 @@ static nodemask_t *zlc_setup(struct zone
 
 	allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
 					&cpuset_current_mems_allowed :
-					&node_online_map;
+					&node_states[N_HIGH_MEMORY];
 	return allowednodes;
 }
 
@@ -2128,8 +2128,17 @@ static int find_next_best_node(int node,
 	}
 
 	for_each_online_node(n) {
+		pg_data_t *pgdat = NODE_DATA(n);
 		cpumask_t tmp;
 
+		/*
+		 * skip nodes w/o memory.
+		 * Note:  N_HIGH_MEMORY state not guaranteed to be
+		 *        populated yet.
+		 */
+		if (pgdat->node_present_pages)
+			continue;
+
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
 			continue;
@@ -2264,7 +2273,8 @@ static int default_zonelist_order(void)
   	 * If there is a node whose DMA/DMA32 memory is very big area on
  	 * local memory, NODE_ORDER may be suitable.
          */
-	average_size = total_size / (num_online_nodes() + 1);
+	average_size = total_size /
+				(nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
 	for_each_online_node(nid) {
 		low_kmem_size = 0;
 		total_size = 0;
@@ -3750,14 +3760,24 @@ unsigned long __init find_max_pfn_with_a
 	return max_pfn;
 }
 
+/*
+ * early_calculate_totalpages()
+ * Sum pages in active regions for movable zone.
+ * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ */
 static unsigned long __init early_calculate_totalpages(void)
 {
 	int i;
 	unsigned long totalpages = 0;
 
-	for (i = 0; i < nr_nodemap_entries; i++)
-		totalpages += early_node_map[i].end_pfn -
+	for (i = 0; i < nr_nodemap_entries; i++) {
+		unsigned long pages = early_node_map[i].end_pfn -
 						early_node_map[i].start_pfn;
+		totalpages += pages;
+		if (pages)
+			node_set_state(early_node_map[i].nid,
+						N_HIGH_MEMORY);
+	}
 
 	return totalpages;
 }
@@ -3773,7 +3793,8 @@ void __init find_zone_movable_pfns_for_n
 	int i, nid;
 	unsigned long usable_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
-	int usable_nodes = num_online_nodes();
+	unsigned long totalpages = early_calculate_totalpages();
+	int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
 
 	/*
 	 * If movablecore was specified, calculate what size of
@@ -3784,7 +3805,6 @@ void __init find_zone_movable_pfns_for_n
 	 * what movablecore would have allowed.
 	 */
 	if (required_movablecore) {
-		unsigned long totalpages = early_calculate_totalpages();
 		unsigned long corepages;
 
 		/*
@@ -3809,7 +3829,7 @@ void __init find_zone_movable_pfns_for_n
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
 	kernelcore_node = required_kernelcore / usable_nodes;
-	for_each_online_node(nid) {
+	for_each_node_state(nid, N_HIGH_MEMORY) {
 		/*
 		 * Recalculate kernelcore_node if the division per node
 		 * now exceeds what is necessary to satisfy the requested
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c	2007-08-15 10:01:22.000000000 -0400
+++ Linux/mm/mempolicy.c	2007-08-16 09:45:47.000000000 -0400
@@ -130,7 +130,7 @@ static int mpol_check_policy(int mode, n
 			return -EINVAL;
 		break;
 	}
-	return nodes_subset(*nodes, node_online_map) ? 0 : -EINVAL;
+ 	return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
 }
 
 /* Generate a custom zonelist for the BIND policy. */


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-08-16 14:17 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-27 19:43 [PATCH 00/14] NUMA: Memoryless node support V4 Lee Schermerhorn
2007-07-27 19:43 ` [PATCH 01/14] NUMA: Generic management of nodemasks for various purposes Lee Schermerhorn
2007-07-30 21:38   ` [PATCH/RFC] 2.6.23-rc1-mm1: MPOL_PREFERRED fixups for preferred_node < 0 Lee Schermerhorn
2007-07-30 22:00     ` Lee Schermerhorn
2007-07-31 15:32       ` Mel Gorman
2007-07-31 15:58         ` Lee Schermerhorn
2007-07-31 21:05     ` [PATCH/RFC] 2.6.23-rc1-mm1: MPOL_PREFERRED fixups for preferred_node < 0 - v2 Lee Schermerhorn
2007-08-01  2:22   ` [PATCH 01/14] NUMA: Generic management of nodemasks for various purposes Andrew Morton
2007-08-01  2:52     ` Christoph Lameter
2007-08-01  3:05       ` Andrew Morton
2007-08-01  3:14         ` Christoph Lameter
2007-08-01  3:32           ` Andrew Morton
2007-08-01  3:37             ` Christoph Lameter
     [not found]             ` <Pine.LNX.4.64.0707312151400.2894@schroedinger.engr.sgi.com>
2007-08-01  5:07               ` Andrew Morton
2007-08-01  5:11                 ` Andrew Morton
2007-08-01  5:22                 ` Christoph Lameter
2007-08-01 10:24                   ` Mel Gorman
2007-08-02 16:23                   ` Mel Gorman
2007-08-02 20:00                     ` Christoph Lameter
2007-08-01  5:36             ` Paul Mundt
2007-08-01  9:19             ` Andi Kleen
2007-08-01 14:03             ` Lee Schermerhorn
2007-08-01 17:41               ` Christoph Lameter
2007-08-01 17:54                 ` Lee Schermerhorn
2007-08-02 20:05                 ` [PATCH/RFC/WIP] cpuset-independent interleave policy Lee Schermerhorn
2007-08-02 20:34                   ` Christoph Lameter
2007-08-02 21:04                     ` Lee Schermerhorn
2007-08-03  0:31                       ` Christoph Lameter
2007-08-02 20:19                 ` Audit of "all uses of node_online()" Lee Schermerhorn
2007-08-02 20:26                   ` Christoph Lameter
2007-08-08 22:19                     ` Lee Schermerhorn
2007-08-08 23:40                       ` Christoph Lameter
2007-08-16 14:17                         ` Lee Schermerhorn [this message]
2007-08-16 18:33                           ` [PATCH/RFC] memoryless nodes - fixup uses of node_online_map in generic code Christoph Lameter
2007-08-16 19:15                             ` Lee Schermerhorn
2007-08-16 21:10                         ` Lee Schermerhorn
2007-08-16 21:13                           ` Christoph Lameter
2007-08-24 16:09                         ` [PATCH] 2.6.23-rc3-mm1 - Move setup of N_CPU node state mask Lee Schermerhorn
2007-09-06 13:56                           ` Mel Gorman
2007-08-02 20:33                   ` Audit of "all uses of node_online()" Andrew Morton
2007-08-02 20:45                     ` Lee Schermerhorn
2007-08-01 15:58           ` [PATCH 01/14] NUMA: Generic management of nodemasks for various purposes Nishanth Aravamudan
2007-08-01 16:09             ` Nishanth Aravamudan
2007-08-01 17:47             ` Christoph Lameter
2007-08-01 15:25         ` Nishanth Aravamudan
2007-07-27 19:43 ` [PATCH 02/14] Memoryless nodes: introduce mask of nodes with memory Lee Schermerhorn
2007-07-27 19:43 ` [PATCH 03/14] Memoryless Nodes: Fix interleave behavior Lee Schermerhorn
2007-07-27 19:43 ` [PATCH 04/14] OOM: use the N_MEMORY map instead of constructing one on the fly Lee Schermerhorn
2007-07-27 19:43 ` [PATCH 05/14] Memoryless Nodes: No need for kswapd Lee Schermerhorn
2007-07-27 19:43 ` [PATCH 06/14] Memoryless Node: Slab support Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 07/14] Memoryless nodes: SLUB support Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 08/14] Uncached allocator: Handle memoryless nodes Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 09/14] Memoryless node: Allow profiling data to fall back to other nodes Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 10/14] Memoryless nodes: Update memory policy and page migration Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 11/14] Add N_CPU node state Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 12/14] Memoryless nodes: Fix GFP_THISNODE behavior Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 13/14] Memoryless Nodes: use "node_memory_map" for cpusets Lee Schermerhorn
2007-07-27 19:44 ` [PATCH 14/14] Memoryless nodes: drop one memoryless node boot warning Lee Schermerhorn
2007-07-27 20:59 ` [PATCH 00/14] NUMA: Memoryless node support V4 Nishanth Aravamudan
2007-07-30 13:48   ` Lee Schermerhorn
2007-07-29 12:35 ` Paul Jackson
2007-07-30 16:07   ` Lee Schermerhorn
2007-07-30 18:56     ` Paul Jackson
2007-07-30 21:19 ` Nishanth Aravamudan
2007-07-30 22:06   ` Christoph Lameter
2007-07-30 22:35     ` Andi Kleen
2007-07-30 22:36       ` Christoph Lameter
2007-07-31 23:18         ` Nishanth Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1187273853.5900.21.camel@localhost \
    --to=lee.schermerhorn@hp.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kxr@sgi.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@skynet.ie \
    --cc=nacc@us.ibm.com \
    --cc=pj@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox