linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Andi Kleen <ak@suse.de>
Cc: akpm@osdl.org, Christoph Lameter <clameter@engr.sgi.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH for 2.6.16] Handle holes in node mask in node fallback list initialization
Date: Fri, 17 Feb 2006 08:52:30 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602170841190.916@g5.osdl.org> (raw)
In-Reply-To: <200602170223.34031.ak@suse.de>


On Fri, 17 Feb 2006, Andi Kleen wrote:
> 
> The new function to set up the node fallback lists didn't handle
> holes in the node map. This happens e.g. on Opterons when 
> the a CPU is missing memory, which is not that uncommon. 

That whole function is crap. Your changes don't seem to make it any less 
crap, and depends on some insane and unreliable node ordering 
characteristic, as far as I can tell. The thing is horrid.

There is no way you can know that your

	n = (node + i) % random_value;

hits all nodes, much less a valid node. 

Think about it: because we do "for_each_online_node(i)", the "i" is _not_ 
guaranteed to be contiguous, which means that "node + i" is not guaranteed 
to be contiguous, which in turn means that you may be hopping over all the 
valid nodes, and every time (because you do that stupid and undefined 
"node + i" crap) you may hit something invalid or empty.

THE WHOLE ALGORITHM IS BROKEN.

Your patch doesn't make it any less broken, it just makes it _differently_ 
broken, and so you think it fixed something. It fixed absolutely NOTHING.

Here is a totally untested diff that may not even compile, but at least it 
makes SENSE from a conceptual standpoint. The magis is

 - don't do the "node + i" crap, because it is by definition broken.

   It has no semantic meaning, and I _guarantee_ that you can't get it to 
   work in general.

 - prefer nodes that follow the current node, by artificially inflating 
   the distance to previous nodes. This should automatically get you 
   exactly the round-robin behaviour you wanted.

NOTE! I've not tested (and thus not debugged) it. I don't even have NUMA 
enabled, so I've not even compiled it. Somebody else please test it, and 
send it back to me with a sign-off and a proper explanation, and I'll sign 
off on it again and apply it.

		Linus

----
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62c1225..208812b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1541,29 +1541,29 @@ static int __initdata node_load[MAX_NUMN
  */
 static int __init find_next_best_node(int node, nodemask_t *used_node_mask)
 {
-	int i, n, val;
+	int n, val;
 	int min_val = INT_MAX;
 	int best_node = -1;
 
-	for_each_online_node(i) {
-		cpumask_t tmp;
+	/* Use the local node if we haven't already */
+	if (!node_isset(node, *used_node_mask)) {
+		node_set(node, *used_node_mask);
+		return node;
+	}
 
-		/* Start from local node */
-		n = (node+i) % num_online_nodes();
+	for_each_online_node(n) {
+		cpumask_t tmp;
 
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
 			continue;
 
-		/* Use the local node if we haven't already */
-		if (!node_isset(node, *used_node_mask)) {
-			best_node = node;
-			break;
-		}
-
 		/* Use the distance array to find the distance */
 		val = node_distance(node, n);
 
+		/* Penalize nodes under us ("prefer the next node") */
+		val += (n < node);
+
 		/* Give preference to headless and unused nodes */
 		tmp = node_to_cpumask(n);
 		if (!cpus_empty(tmp))

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2006-02-17 16:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-17  1:23 Andi Kleen
2006-02-17  1:40 ` Christoph Lameter
2006-02-17  1:46   ` Andi Kleen
2006-02-17  2:12     ` KAMEZAWA Hiroyuki
2006-02-17  1:51 ` Christoph Lameter
2006-02-17  2:10   ` Andi Kleen
2006-02-17  2:46     ` Christoph Lameter
2006-02-17  6:10   ` Yasunori Goto
2006-02-17  9:58     ` Andi Kleen
2006-02-17 11:23       ` Bob Picco
2006-02-17 12:15         ` Andi Kleen
2006-02-17 14:34           ` Lee Schermerhorn
2006-02-17 16:05     ` Christoph Lameter
2006-02-17  3:33 ` Yasunori Goto
2006-02-17 16:52 ` Linus Torvalds [this message]
2006-02-17 18:07   ` Andi Kleen
2006-02-17 18:38     ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602170841190.916@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=clameter@engr.sgi.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox