From: Paul Jackson <pj@sgi.com>
To: David Rientjes <rientjes@cs.washington.edu>
Cc: linux-mm@kvack.org, akpm@osdl.org, nickpiggin@yahoo.com.au,
ak@suse.de, mbligh@google.com, rohitseth@google.com,
menage@google.com, clameter@sgi.com
Subject: Re: [RFC] another way to speed up fake numa node page_alloc
Date: Wed, 4 Oct 2006 19:27:14 -0700
Message-ID: <20061004192714.20412e08.pj@sgi.com>
In-Reply-To: <Pine.LNX.4.64N.0610041456480.19080@attu2.cs.washington.edu>
> Isn't this the exact behavior that ordered zonelists are supposed to solve
> for real NUMA systems? Has there been an _observed_ case where the cost
> to scan the zonelists was considered excessive on real NUMA systems?
Well ... the good news is I understood your comments this time.
I guess I should be happy it only took about 3 iterations.
Historically the ordered zonelists addressed the situation where one
almost always found free memory near the front of the ordered zonelist.
Yes, you are correct that I originally didn't think we had a problem
with real numa zonelist scans.
Three days ago, when I introduced this alternative patch that started
this current thread, I changed my position, stating at that time:
>
> There are two reasons I pursued this alternative:
>
> 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)
> have seen real customer loads where the cost to scan the zonelist
> was a problem, due to many nodes being full of memory before
> we got to a node we could use. Or at least, I think we have.
> This was related to me by another engineer, based on experiences
> from some time past. So this is not guaranteed. Most likely, though.
>
> The following approach should help such real numa systems just as
> much as it helps fake numa systems, or any combination thereof.
>
> 2) The effort to distinguish fake from real numa, using node_distance,
> so that we could cache a fake numa node and optimize choosing
> it over equivalent distance fake nodes, while continuing to
> properly scan all real nodes in distance order, was going to
> require a nasty blob of zonelist and node distance munging.
>
> The following approach has no new dependency on node distances or
> zone sorting.
David wrote:
> I was under the impression that there was nothing wrong with the way
> current real NUMA systems allocate pages. If not, please point me to the
> thread that _specifically_ discusses this with _data_ that shows it's
> inefficient.
See above. I don't have data, so cannot justify going far out of our
way.
If someone has a better way to skin this fake numa cat, that does not
benefit (or harm) real numa, that would still be worth careful
consideration.
> In fact, when this thread started you recommended as little
> changes as possible to the code to not interfere with what already works.
Yes, I did start with that recommendation. See above.
And see above for my current reasons for pursuing this patch.
Some more things I like about this patch:
* Conceptually, it is very localized, making no changes to the
larger code or data structure, just adding a cache of some
hot data.
* Further, it makes few assumptions about the larger scheme of
things.
* It has no dependencies on zonelist sorting, node distances,
fake vs real numa nodes or any of that.
* It makes no discernible difference in the memory placement
behaviour of a system.
Downside - it's still a linear zonelist scan, and it's a cache bolted on
the side of things, rather than an inherently fast algorithm and data
structure.
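To make the shape of the idea concrete, here is a minimal userspace sketch in C of a linear zonelist scan with a small cache of "recently seen full" hints bolted on the side. It is an illustration only, not the patch itself: the struct layouts, the names pick_zone(), zone_has_room(), full_mask, last_zap and slow_checks, and the constants MAX_ZONES and ZAP_INTERVAL are all hypothetical, and the expiry and rescan policy shown is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ZONES    64   /* hypothetical cap, so one 64-bit word holds the hints */
#define ZAP_INTERVAL 100  /* hypothetical: ticks after which the hints are dropped */

struct zone {
    int  node_id;
    long free_pages;
};

struct zonelist {
    struct zone  *zones[MAX_ZONES]; /* preference order, unchanged by the cache */
    size_t        nr_zones;
    /* The bolted-on cache of hot data (all assumed for this sketch): */
    uint64_t      full_mask;        /* bit i set: zones[i] was recently found full */
    unsigned long last_zap;         /* tick at which full_mask was last cleared */
    unsigned long slow_checks;      /* demo counter: how many full checks were made */
};

/* Stand-in for the expensive per-zone check that the cache tries to avoid. */
static bool zone_has_room(struct zonelist *zl, const struct zone *z, long want)
{
    zl->slow_checks++;
    return z->free_pages >= want;
}

/* Still a linear scan in zonelist order; zones hinted as full are skipped cheaply. */
struct zone *pick_zone(struct zonelist *zl, long want, unsigned long now)
{
    bool skipped;

    /* Hints go stale as memory is freed, so drop them periodically. */
    if (now - zl->last_zap >= ZAP_INTERVAL) {
        zl->full_mask = 0;
        zl->last_zap = now;
    }

    do {
        skipped = false;
        for (size_t i = 0; i < zl->nr_zones; i++) {
            uint64_t bit = UINT64_C(1) << i;

            if (zl->full_mask & bit) {
                skipped = true;         /* trusted the hint, no slow check */
                continue;
            }
            if (zone_has_room(zl, zl->zones[i], want))
                return zl->zones[i];
            zl->full_mask |= bit;       /* remember that this zone was full */
        }
        /* Nothing found while trusting hints: forget them and rescan once. */
        if (skipped)
            zl->full_mask = 0;
    } while (skipped);

    return NULL;
}
```

In this sketch the order in which zones are tried never changes, which is the sense in which the cache is independent of zone sorting and node distances. A stale hint can briefly hide a zone that has since gained free memory, so the expiry interval trades scan cost against placement accuracy.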
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401