From mboxrd@z Thu Jan 1 00:00:00 1970 Subject: Re: [PATCH 2.6.17-rc1-mm1 0/6] Migrate-on-fault - Overview From: Lee Schermerhorn In-Reply-To: <200604112052.50133.ak@suse.de> References: <1144441108.5198.36.camel@localhost.localdomain> <200604112052.50133.ak@suse.de> Content-Type: text/plain Date: Tue, 11 Apr 2006 16:40:19 -0400 Message-Id: <1144788020.5160.136.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andi Kleen Cc: Christoph Lameter , linux-mm , ak@suse.com List-ID: On Tue, 2006-04-11 at 20:52 +0200, Andi Kleen wrote: > On Tuesday 11 April 2006 20:46, Christoph Lameter wrote: > > However, if the page is not frequently references then the > > effort required to migrate the page was not justified. > > I have my doubts the whole thing is really worthwhile. It probably > would at least need some statistics to only do this for frequent > accesses, but I don't know where to put this data. > > At least it would be a serious research project to figure out > a good way to do automatic migration. From what I was told by > people who tried this (e.g. in Irix) it is really hard and > didn't turn out to be a win for them. My understanding is that it IS really hard to optimize this or try to use statistics to do this. Especially if your goal is to eke out the last ounce of performance, as HPC apps are wont to do. I'm not interested in this. > > The better way is to just provide the infrastructure > and let batch managers or program itselves take care of migration. I agree, for that last ounce of performance. But, I still think we can do better [have on other systems] than just letting the scheduler move tasks around a numa system with no attention to their memory locality. Not everybody want to be locking tasks down to prevent this, either. > > That was the whole idea behind NUMA API - some problems > are too hard to figure out automatically by the kernel, so > allow the user or application to give it a hand. I really don't want the kernel to have to figure too much out. You KNOW that when you move a task to a different node that either you're moving away from some memory footprint or, if you're lucky, back close to some earlier footprint. How lucky you need to be to achieve the latter depends on how many nodes you have and how badly the tasks memory footprint is spread around the nodes due to involuntary migration. Migrate on fault provides the first piece of infrastructure to address this. The first time a task touches a page that is not in memory, that task's policies get to choose where the page goes. Presumably, we go through some amount of effort to get the page somewhere close to the task or where it wants it. We've got a lot of vm infrastructure in support of this endeavor. What migrate on fault does is allow that same decision to be made when a task finds a cached page for which no other tasks currently have translations [ptes]. Seems like a good time to reevaluate this. Now, arranging for a significant number of the task's pages to be in that state is the subject of another patch series. > > And frankly the defaults we have currently are not that bad, > perhaps with some small tweaks (e.g. i'm still liking the idea > of interleaving file cache by default) No, the defaults aren't bad for initial allocation. But, they don't prevent scheduling/load balancing from undoing all the good work done up front. Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org