From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <428E6427.7060401@engr.sgi.com>
Date: Fri, 20 May 2005 17:26:47 -0500
From: Ray Bryant
MIME-Version: 1.0
Subject: Re: [Lhms-devel] Re: [PATCH 2.6.12-rc3 1/8] mm: manual page migration-rc2 -- xfs-extended-attributes-rc2.patch
References: <20050511043756.10876.72079.60115@jackhammer.engr.sgi.com>
 <20050511043802.10876.60521.51027@jackhammer.engr.sgi.com>
 <20050511071538.GA23090@infradead.org>
 <4281F650.2020807@engr.sgi.com>
 <20050511125932.GW25612@wotan.suse.de>
 <42825236.1030503@engr.sgi.com>
 <20050511193207.GE11200@wotan.suse.de>
 <20050512104543.GA14799@infradead.org>
In-Reply-To: <20050512104543.GA14799@infradead.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Christoph Hellwig
Cc: Andi Kleen, Ray Bryant, Hirokazu Takahashi, Marcelo Tosatti,
 Dave Hansen, linux-mm, Nathan Scott, Ray Bryant,
 lhms-devel@lists.sourceforge.net, Jes Sorensen, Steve Longerbeam
List-ID:

Christoph Hellwig wrote:
> On Wed, May 11, 2005 at 09:32:07PM +0200, Andi Kleen wrote:
>
>>A minor change for that is probably ok, as long as the actual logic
>>who uses this is generic.
>>
>>hch: if you still are against this please reread the original thread
>>with me and Ray and see why we decided that ld.so changes are not
>>a good idea.
>
>
> So reading through the thread I think using mempolicies to mark shared
> libraries is better than the mmap flag I proposed. I still don't think
> xattrs interpreted by the kernel is a good way to store them. Setting
> up libraries is the job of the dynamic linker, and reading pre-defined
> memory policies from an ELF header fits the approach we do for related
> things.
>
>

Christoph and Andi,

OK, here are the alternatives I have figured out, I'd appreciate
feedback on which of these would be acceptable.

(In each case, the migration attributes being set are either
MIGRATE_NONE, to indicate that nothing in this mapped file should be
migrated, or MIGRATE_NS, to indicate that the non-shared pages should
be migrated; the latter is the normal setting for shared library
files. And, since madvise() is mostly about I/O related things, I'm
assuming here that I extend mbind() to set the migration attributes.)

(1) Use mbind() to set "shallow" vm attributes. (I use shallow versus
deep here to indicate whether or not other processes that map the
same object can see the attributes -- this basically also maps to
whether we put the attributes in the vma [shallow] or in the
memory_object [deep].)

In the shallow case, mbind() has to be called in each address space
in order to set the migration flags the same way in every address
space that maps a shared object. So, we basically have to call
mbind() from ld.so. As far as I am concerned this is a fundamental
show stopper, since without broad glibc support we will never get
the changes into ld.so for just Altix and page migration. It also
doesn't handle the case of shared, mapped r/o data files.

We can leverage Steve Longerbeam's work here, but he also doesn't
have a time frame as to when his ld.so changes might be accepted by
the glibc developers.

It does allow one to mark anonymous memory with migration policy.
However, any use of that I've been able to think of (e.g. marking
some anonymous pages as MIGRATE_NONE and then calling
migrate_pages()) could equally well be handled by
mbind(.., MPOL_MF_STRICT | MPOL_MF_MOVE). (MPOL_MF_MOVE is in Steve
Longerbeam's patch and says to move the pages that don't match the
memory policy -- we plan to hook this up to the migration code at
some point in the future.)

(2) Use mbind() to set "deep" vm attributes. There appear to be two
places where the deep attributes could be set: in the address space
object vma->vm_file->f_mapping or in the inode
vma->vm_file->f_mapping->host. Some upper order bits of
address_space.flags could be used, but there appear to be
concurrency issues there. Bits in inode.i_flags also appear to be
available.

The advantage of setting "deep" vm attributes is that this interface
could be used by ld.so, but in advance of getting the changes
accepted there, we could also set the deep attributes in a migration
library before calling migrate_pages(). (Deep attributes are seen
from any address space that maps the object.) Then when the ld.so
changes are in, we can reduce the work done by the migration library.

(3) The problem with (2) is that to set a deep attribute, one has to
do 4 system calls: open, mmap, mbind, munmap. If we add the
migration attributes to fcntl() [such as Paul Jackson has suggested],
then we could set them directly in the inode with one system call.
Perhaps not a big deal, but something to think about. It's also
simpler, easier-to-maintain code.

(4) Then there is the original, extended attribute approach. I'm
including this one last time just to observe that:

(i) This correctly handles regular data (non-ELF) files.

(ii) If one wants to migrate just a portion of anonymous memory, one
could still use mbind(...MPOL_MF_STRICT | MPOL_MF_MOVE).

(iii) How to set the migration policy is based on how a shared file
is mapped in multiple address spaces. It is not so much a
characteristic of an individual address space's usage of the file.
So, it seems natural to associate these attributes with the file and
not with the particular instance in one address space (that is,
alternative (1)).

If using a system attribute is too much change to fs code, then let's
use a user attribute here. It's not perfect, but it is doable, and
doesn't require any fs changes. (We'll just not support migration
policy in file systems that don't have extended attributes.)

In short, as near as I can tell, alternative (1) really doesn't do
what we want, and is the hardest to implement and get into a
production kernel.
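
To make the shallow versus deep distinction in the alternatives above
concrete, here is a minimal user-space sketch in C. It is a toy model
and not kernel code: struct toy_object, struct toy_vma, the helper
names, the MIGRATE_UNSET placeholder and the numeric enum values are
all made up for illustration, and the behaviour when no attribute has
been set is an assumption, since nothing above defines a default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Attribute names come from the message; the values are assumed. */
enum migrate_attr {
    MIGRATE_UNSET = 0, /* hypothetical placeholder: nothing recorded */
    MIGRATE_NONE,      /* migrate nothing in this mapped file */
    MIGRATE_NS         /* migrate only the non-shared pages */
};

/* Toy stand-in for the shared object (address_space or inode). */
struct toy_object {
    enum migrate_attr deep_attr;
};

/* Toy stand-in for one address space's mapping of that object. */
struct toy_vma {
    struct toy_object *object;
    enum migrate_attr shallow_attr;
};

/* Shallow: stored in the vma, so only this address space sees it. */
static void set_shallow(struct toy_vma *vma, enum migrate_attr attr)
{
    assert(vma != NULL);
    vma->shallow_attr = attr;
}

/* Deep: stored in the shared object, so every mapper sees it. */
static void set_deep(struct toy_vma *vma, enum migrate_attr attr)
{
    assert(vma != NULL && vma->object != NULL);
    vma->object->deep_attr = attr;
}

/* Decide whether one page moves under a given attribute. */
static bool should_migrate(enum migrate_attr attr, bool page_is_shared)
{
    switch (attr) {
    case MIGRATE_NONE:
        return false;
    case MIGRATE_NS:
        return !page_is_shared;
    default:
        return true; /* assumption: no attribute means no restriction */
    }
}
```

With two toy_vma values pointing at the same toy_object, a
set_shallow() call on the first leaves the second untouched, which is
why alternative (1) needs a call in every address space. A single
set_deep() call through either one is visible through both, which is
the property alternatives (2) and (3) rely on.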

I still like (4) best, but I can live with (2) or (3). Both (2) and
(3) have interim approaches that can be made to work until Steve
Longerbeam's stuff makes it into ld.so, at which point I can easily
merge my required changes in with his.

--
Best Regards,
Ray
-----------------------------------------------
                  Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
           so I installed Linux.
-----------------------------------------------