Date: Tue, 26 Sep 2006 00:06:12 -0700
From: Paul Jackson
Subject: Re: [RFC] another way to speed up fake numa node page_alloc
Message-Id: <20060926000612.9db145a9.pj@sgi.com>
References: <20060925091452.14277.9236.sendpatchset@v0>
To: David Rientjes
Cc: linux-mm@kvack.org, akpm@osdl.org, nickpiggin@yahoo.com.au, ak@suse.de, mbligh@google.com, rohitseth@google.com, menage@google.com, clameter@sgi.com

Thanks for reviewing this, David.

David wrote:
> If there's mangling on 'last_full_zap' in the scenario with multiple CPU's
> on one node, that means that we might be clearing 'fullnodes' more often
> than every 1*HZ, and that clear is always done by one CPU. Since the only
> purpose of the delay is to allow a certain period of time go by where
> these hints will actually serve a purpose, this entire speed-up will
> then be degraded. I agree that adding locking for 'zonelist_faster' is
> probably going too far in terms of performance hint data, but it seems
> necessary with 'last_full_zap' if the goal is to preserve this 1*HZ
> delay.

I doubt it. An occasional extra clearing of fullnodes seems quite
harmless to me. I doubt it matters whether we zap fullnodes once per
second, once every two seconds, or twice a second.

We're just dealing with a single 64-bit word (a jiffies value), and
only the few CPUs local to a single node contend over it. On real
64-bit systems, it may not even be possible to mangle it.

The goal is not to preserve a 1*HZ delay. I just pulled that delay
out of some unspeakable place.
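A minimal userspace sketch of the throttled clearing discussed above, assuming a simplified hint cache. The names `struct zonelist_hint`, `maybe_zap_fullnodes()`, `mark_node_full()` and `node_marked_full()`, and the `HZ` value, are hypothetical stand-ins rather than the code in the actual patch; the caller passes a `now` tick count in place of jiffies. It illustrates why an unlocked race on `last_full_zap` is benign: the worst a race can do is clear `fullnodes` one extra time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's HZ (ticks per second). */
#define HZ 1000UL

/* Hypothetical, simplified per-zonelist hint cache. */
struct zonelist_hint {
	uint64_t fullnodes;          /* bit n set: node n was recently found full */
	unsigned long last_full_zap; /* tick count of the last clear */
};

/*
 * Forget all "node is full" hints once more than 1*HZ ticks have passed.
 * No lock is taken. If two CPUs race here, both may clear fullnodes and
 * store almost the same timestamp; the only cost is an extra clear.
 * Unsigned subtraction keeps the comparison correct across counter
 * wraparound, in the spirit of the kernel's time_after().
 */
static void maybe_zap_fullnodes(struct zonelist_hint *h, unsigned long now)
{
	if (now - h->last_full_zap > 1 * HZ) {
		h->fullnodes = 0;
		h->last_full_zap = now;
	}
}

/* Record that an allocation found every zone on this node full. */
static void mark_node_full(struct zonelist_hint *h, int node)
{
	assert(node >= 0 && node < 64);
	h->fullnodes |= (uint64_t)1 << node;
}

/* Allocation fast path: skip nodes whose hint bit is still set. */
static bool node_marked_full(const struct zonelist_hint *h, int node)
{
	assert(node >= 0 && node < 64);
	return (h->fullnodes >> node) & 1;
}
```

For example, marking node 3 full and then calling `maybe_zap_fullnodes(&h, 500)` leaves the hint set, while a later call at tick 1001 clears it. Changing the threshold to 2*HZ or HZ/2 only changes how often already-full nodes get rescanned.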
Roughly, I wanted to throttle the rate of wasteful scans of already
full zones to a rate infrequent enough to solve our performance
problem, while still frequent enough that no one would ever seriously
notice the subtle, transient changes in memory placement behaviour.

> It seems like an immutable time interval embedded in the page alloc code
> may not be the best way to measure when a full zap should occur.

Eh ... why not? Sure, it's dirt simple. But in this case, fancier
control of this interval seems to risk spending more effort than it
would save, with almost no discernible advantage to the user.

If we already had exactly the metric we needed on hand, so that no
more code had to be added to a hot path to maintain it (likely
including real locks, since most metrics don't like to be mangled by
code that takes a cavalier attitude to locking), then I might
reconsider. But I doubt that this use would justify adding a metric.

> This is a creative solution, thanks ..

> This definitely seems to be headed in the right direction because it works
> in both the real NUMA case and the fake NUMA case.

I hope so.

> I would really like to
> run benchmarks on this implementation as I have done for the others but I
> no longer have access to a 64-bit machine.

Odd ... Do you expect that situation to be remedied anytime soon?
I'd like to see the results of rerunning your benchmark.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson 1.925.600.0401