From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Tue, 26 Sep 2006 11:17:18 -0700 (PDT) From: David Rientjes Subject: Re: [RFC] another way to speed up fake numa node page_alloc In-Reply-To: <20060926000612.9db145a9.pj@sgi.com> Message-ID: References: <20060925091452.14277.9236.sendpatchset@v0> <20060926000612.9db145a9.pj@sgi.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Paul Jackson Cc: linux-mm@kvack.org, akpm@osdl.org, nickpiggin@yahoo.com.au, ak@suse.de, mbligh@google.com, rohitseth@google.com, menage@google.com, clameter@sgi.com List-ID: On Tue, 26 Sep 2006, Paul Jackson wrote: > The goal is not to preserve a 1*HZ delay. I just pulled that delay out > of some unspeakable place. > > Roughly I wanted to throttle the rate of wasteful scans of already full > zones to some rate that was infrequent enough to solve our performance > problem, while still fast enough that no one would ever seriously > notice the subtle transient changes in memory placement behaviour. > Absolutely, I'm sure we'll see a performance enhancement with the get_page_from_freelist speedup even though I cannot run benchmarks myself. Since one second was chosen as the time interval between zaps, however, that will not always be the case if there's mangling and one CPU on the node will be zapping it prematurely when the system is being stressed for page allocation. This happens to be the case where the smaller time interval would be the most unfortunate. Obviously a second is a long time to constantly be allocating more and more pages, so I guess what bothers me is that we're zapping information that we have no reason to not believe is still accurate. > Eh ... why not? Sure, it's dirt simple. But in this case, fancier > control of this interval seems like it risks spending more effort than > it would save, with almost no discernable advantage to the user. > Because when we're stressing the system for more and more memory for a particular task regardless of whether it's starting or not, we're constantly allocating pages and zapping the nodemask about every second even though the status of each node could not have changed. Those hints should not be zapped and rather preserved because we have not freed any pages over the same time interval and not because an arbitrary clock tick came around. When we free memory from a specific zone, why is it not better to use zone_to_nid and then zap that _node_ in the nodemask only because we are guaranteed that the status has changed? > > I would really like to > > run benchmarks on this implementation as I have done for the others but I > > no longer have access to a 64-bit machine. > > Odd ... Do you expect that situation to be remedied anytime soon? > > I'd like to see the results of your rerunning your benchmark. > I no longer have access to a 64-bit machine or my benchmarking script so unless they have relaxed the kernel hacking policies for undergrads back at my school, I doubt I can contribute in performing benchmarks. Four people on the Cc list to this email, however, still have access to my script. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org