From mboxrd@z Thu Jan 1 00:00:00 1970 Subject: Re: Query re: mempolicy for page cache pages From: Lee Schermerhorn In-Reply-To: References: <1147974599.5195.96.camel@localhost.localdomain> Content-Type: text/plain Date: Thu, 18 May 2006 15:27:54 -0400 Message-Id: <1147980474.5195.173.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Christoph Lameter Cc: linux-mm , Andi Kleen , Steve Longerbeam , Andrew Morton List-ID: On Thu, 2006-05-18 at 11:15 -0700, Christoph Lameter wrote: > On Thu, 18 May 2006, Lee Schermerhorn wrote: > > > Below I've included an overview of a patch set that I've been working > > on. I submitted a previous version [then called Page Cache Policy] back > > ~20Apr. I started working on this because Christoph seemed to consider > > this a prerequisite for considering migrate-on-fault/lazy-migration/... > > Since the previous post, I have addressed comments [from Christoph] and > > kept the series up to date with the -mm tree. > > The prequisite for automatic page migration schemes in the kernel is proof > that these automatic migrations consistently improve performance. We are > still waiting on data showing that this is the case. So far, all I have is evidence that good locality obtained at process start up using default policy is broken by internode task migrations. I have seen this penalty fixed by automatic migration for artificial benchmarks [McAlpin STREAM] and know that this approached worked well for TPC-like loads on previous NUMA systems I've worked on. Currently, I don't have access to TPC loads in my lab, but we're working on it. And, if I could get the patches into the mm tree, once basic migration settles down so as not to complicate your on-going work, then folks could enable them to test the effects on their NUMA systems, if they are at all concerned about load balancing upsetting previously established locality. You know: the open source, community collaboration thing. I guess I thought that's what the mm tree is for. Not everything that gets in there makes it to Linus' tree. > > The particular automatic migration scheme that you proposed relies on > allocating pages according to the memory allocation policy. Makes emminent sense to me... > > The basic problem is first of all that the memory policies do not > necessarily describe how the user wants memory to be allocated. The user > may temporarily switch task policies to get specific allocation patterns. > So moving memory may misplace memory. We got around that by > saying that we need to separately enable migration if a user > wants it. I'm aware of this. I guess I always considered the temporary switching of policies to achieve desired locality as a stopgap measure because of missing capabilities in the kernel. But, I agree that since this is the existing behavior and we don't want to break user space, that it should be off by default and enabled when desired. My latest patch series, which I haven't posted for obvious reasons, does support per cpuset enabling of migrate-on-fault and auto-migration. > > But even then we have the issue that the memory policies cannot > describe proper allocation at all since allocation policies are > ignored for file backed vmas. And this is the issue you are trying to > address. Right! For consistency's sake, I guess. I always looked at it as the migrate-on-fault was "correcting" page misplacement at original fault time. Said with tongue only slightly in cheek ;-). > > I think this is all far to complicated to do in kernel space and still > conceptually unclean. I would like to have all automatic migration schemes > confined to user space. We will add an API that allows some process > to migrate pages at will. We want pages to migrate when the load balancer decides to move the process to a new node, away from it's memory. I suppose internode migration could also be accidently reuniting a task with its memory footprint, but the higher the node count, the lower the probability of this. And, if we did do some form of page migration on task migration, I think we'd need to consider the cost of page migration in the decision to migrate. I see that previous attempts to consider memory footprint in internode migration seem to have gone nowhere, tho'. Probably not worth it for some nominally numa platforms. Even for platforms where it might make sense, tuning the algorithms will, I think, require data that can best be obtained from testing on multiple platforms. Just dreaming, I know... As far as doing it in user space: I supposed one could deliver a notification to the process and have it migrate pages at that point. Sounds a lot more inefficient than just unmapping pages that have default policy as the process returns to user space on the new node [easy to hook in w/o adding new mechanism] and letting the pages fault over as they are referenced. After all, we don't know which pages will be touched before the next internode migration. I don't think that the application itself would have a good idea of this at the time of the migration. And, I think, complication and cleanliness is in the eye of the beholder. 'Nuf said on that point... ;-) Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org