Andi Kleen wrote:

>On Mon, Sep 20, 2004 at 12:00:33PM -0700, Ray Bryant wrote:
>
>
>>Background
>>----------
>>
>>Last month, Jesse Barnes proposed a patch to do round robin
>>allocation of page cache pages on NUMA machines.  This got shot down
>>for a number of reasons (see
>>  http://marc.theaimsgroup.com/?l=linux-kernel&m=109235420329360&w=2
>>and the related thread), but it seemed to me that one of the most
>>significant issues was that this was a workload dependent optimization.
>>That is, for an Altix running an HPC workload, it was a good thing,
>>but for web servers or file servers it was not such a good idea.
>>
>>So the idea of this patch is the following:  it creates a new memory
>>policy structure (default_pagecache_policy) that is used to control
>>how storage for page cache pages is allocated.  So, for a large Altix
>>running HPC workloads, we can specify a policy that does round robin
>>allocations, and for other workloads you can specify the default policy
>>(which results in page cache pages being allocated locally).
>>
>>The default_pagecache_policy is overrideable on a per process basis, so
>>that if your application prefers to allocate page cache pages locally,
>>it can.
>>
>>
>
>I'm not sure this really makes sense. Do you have some clear use
>case where having so much flexibility is needed?
>
>I would prefer to have a global setting somewhere for the page
>cache (sysctl or sysfs or what you prefer) and some special handling for
>text pages.
>
>This would keep the per thread bloat low.
>
>Also I must say I got a patch submitted to do policy per
>file from Steve Longerbeam.
>
>It so far only supports this for ELF executables, but
>it has most of the infrastructure to do individual policy
>per file. Maybe it would be better to go into this direction,
>only thing missing is a nice way to declare policy for
>arbitrary files. Even in this case a global default would be useful.
>
>I haven't done anything with this patch yet due to missing time
>and there were a few small issues to resolve, but I hope it
>can be eventually integrated.
>
>[Steve, perhaps you can repost the patch to lse-tech for
>wider review?]
>
>

Sure, patch is attached. Also, here is a reposting of my original email to
you (Andi) describing the patch. Btw, I received your comments on the patch,
and I will reply to your points separately. Sorry I haven't replied sooner,
I'm in the middle of switching jobs :-)

-------- original email follows ----------

Hi Andi,

I'm working on adding the features to NUMA mempolicy
necessary to support MontaVista's MTA.

Attached is the first of those features, support for
global page allocation policy for mapped files. Here's
what the patch is doing:

1. add a shared_policy tree to the address_space object in fs.h.
2. modify page_cache_alloc() in pagemap.h to take an address_space
   object and page offset, and use those to allocate a page for the
   page cache using the policy in the address_space object.
3. modify filemap.c to pass the additional {mapping, page offset}
   pair to page_cache_alloc().
4. Also in filemap.c, implement generic file {set|get}_policy()
   methods and add those to generic_file_vm_ops.
5. In filemap_nopage(), verify that any existing page located in
   the cache is located in a node that satisfies the file's policy.
   If it's not in a node that satisfies the policy, it must be
   because the page was allocated before the file had any policies.
   If it's unused, free it and goto retry_find (will allocate a
   new page using the file's policy). Note that a similar operation
   is done in exec.c:setup_arg_pages() for stack pages.
6. Init the file's shared policy in alloc_inode(), and free the
   shared policy in destroy_inode().

I'm working on the remaining features needed for MTA. They are:

- support for policies contained in ELF images, for text and
  data regions.
- support for do_mmap_mempolicy() and do_brk_mempolicy().
  do_mmap() can allocate pages to the region before the function
  exits, such as when pages are locked for the region. So it's
  necessary in that case to set the VMA's policy within do_mmap()
  before those pages are allocated.
- system calls for mmap_mempolicy and brk_mempolicy.

Let me know your thoughts on the filemap policy patch.

Thanks,
Steve
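
For readers without the attachment, the check described in step 5 of the
list above can be pictured with a small user-space model in C. This is only
a sketch and is not code from the patch: struct file_policy, struct mapping,
cache_get() and the fixed-size cache are hypothetical names invented here,
a node bitmask with a round-robin cursor stands in for the shared_policy
tree, and a plain reference count stands in for the "unused" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES   4   /* assumed number of NUMA nodes in this model */
#define CACHE_SLOTS 8   /* assumed number of page offsets we track */

/* Hypothetical stand-in for a per-file policy. */
struct file_policy {
    unsigned allowed_mask;  /* bit n set => node n satisfies the policy */
    int next;               /* round-robin cursor over the nodes */
};

/* Hypothetical stand-in for a page cache page. */
struct page {
    bool present;
    int node;       /* node the page was allocated on */
    int refcount;   /* 0 means nobody is using the page */
};

/* Hypothetical stand-in for address_space: a policy plus cached pages. */
struct mapping {
    struct file_policy policy;
    struct page cache[CACHE_SLOTS];
};

static bool node_allowed(const struct file_policy *p, int node)
{
    return (p->allowed_mask >> node) & 1u;
}

/* Pick the next allowed node round robin; -1 if no node is allowed. */
static int policy_next_node(struct file_policy *p)
{
    for (int i = 0; i < MAX_NODES; i++) {
        int n = (p->next + i) % MAX_NODES;
        if (node_allowed(p, n)) {
            p->next = (n + 1) % MAX_NODES;
            return n;
        }
    }
    return -1;
}

/* Model of an allocation that takes {mapping, offset}, as in step 2. */
static struct page *cache_alloc(struct mapping *m, size_t off)
{
    assert(off < CACHE_SLOTS);
    int n = policy_next_node(&m->policy);
    if (n < 0)
        return NULL;
    struct page *pg = &m->cache[off];
    pg->present = true;
    pg->node = n;
    pg->refcount = 0;
    return pg;
}

/* Model of the lookup in step 5: a cached page on a node that does not
 * satisfy the policy is dropped if unused, then allocated again under
 * the current policy. A page that is still in use is left alone. */
struct page *cache_get(struct mapping *m, size_t off)
{
    if (off >= CACHE_SLOTS)
        return NULL;
    struct page *pg = &m->cache[off];
    if (pg->present && pg->refcount == 0 &&
        !node_allowed(&m->policy, pg->node))
        pg->present = false;            /* free it, then "retry" below */
    if (!pg->present && !cache_alloc(m, off))
        return NULL;
    pg->refcount++;
    return pg;
}
```

In this model, if a file's pages are first cached while only node 0 is
allowed and the policy is later changed to nodes 1 and 2, the next
cache_get() on an unused page moves it to an allowed node, while a page
that is still referenced stays on node 0.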