From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 3 Nov 2006 21:11:46 +0000 (GMT)
From: Mel Gorman
Subject: Re: Page allocator: Single Zone optimizations
In-Reply-To:
Message-ID:
References: <20061028180402.7c3e6ad8.akpm@osdl.org> <4544914F.3000502@yahoo.com.au>
 <20061101182605.GC27386@skynet.ie> <20061101123451.3fd6cfa4.akpm@osdl.org>
 <454A2CE5.6080003@shadowen.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Christoph Lameter
Cc: Andy Whitcroft, Andrew Morton, Nick Piggin, KAMEZAWA Hiroyuki,
 Linux Memory Management List, Peter Zijlstra
List-ID:

On Fri, 3 Nov 2006, Christoph Lameter wrote:

> On Fri, 3 Nov 2006, Mel Gorman wrote:
>
>>> For now this would include reclaimable slabs?
>>
>> It could, but I don't. Currently, only network buffers, inode caches,
>> buffer heads and dentries are marked like this.
>
> inode cache and dentries basically contain most of the reclaimable
> slab caches.
>

Yes, and they account for the largest amount of allocated memory by a
significant margin. When they are clustered together, cache shrinking
tends to free up contiguous blocks of pages.

>>> They are reclaimable with a huge effort and there may be pinned
>>> objects that we cannot move. Isn't this more another case of
>>> unmovable?
>>
>> Probably, they would currently be treated as unmovable.
>
> So you really do not currently need that section? If you drop the
> section then we have the same distinction that we would need for
> memory hotplug.
>

You mean, drop the section dealing with clustering the inode caches and
dentries? That section is needed. Without it, the success rate for
high-order allocations is lower and the mechanism breaks down after a
few hours of uptime.

>>> Note that memory for a loaded module is allocated via vmalloc,
>>> mapped via a page table (init_mm) and thus memory is remappable. We
>>> will likely be able to move those.
>>>
>>
>> It's not just a case of updating init_mm. You would also need to tear
>> down the vmalloc area for every currently running process in the
>> system in case they had faulted within that module. That would be
>> pretty entertaining.
>
> vmalloc areas are not process specific and this works just fine within
> the kernel. Eeek... remap_vmalloc_range() maps into user space. It
> seems we need a list to be able to also update those ptes.
>
>> Once again, I am not averse to writing such a defragmentation
>> mechanism, but I see anti-frag as it currently stands as a
>> prerequisite for a defragmentation mechanism having a decent success
>> rate.
>
> What you call anti-frag is really a mechanism to separate two
> different kinds of allocations that may be useful for multiple
> purposes, not only anti-frag.
>

Well, currently three types of allocations. It's worth separating out
really unmovable pages from kernel allocations that can be
reclaimed/moved in some fashion. Is it the name anti-frag you have a
problem with? If so, what would you suggest calling it?

>> Defragmentation on its own would be insufficient for hugepage
>> allocations because of unmovable pages dotted around the system. We
>> know this because if you reclaim everything possible in the system,
>> you are still unlikely to be able to grow the hugepage pool. If
>> reclaiming everything doesn't give you huge pages, shuffling the same
>> pages around the system won't improve the situation.
>
> It all depends on the movability of pages. If unmovable pages are
> sufficiently rare then this will work.
>

They are common enough that they get spread throughout memory unless
they are clustered. If that was not the case, the hugepage pool would
be a lot easier to grow after a decent amount of uptime.

> I think we need something like what is done here via anti-frag but I
> wish it would be more generic and not solely rely on reclaim to get
> pages freed up.
>

How could it have been made more generic? Fundamentally, all we are
doing at the moment is using the freelists to cluster types of pages
together. We only depend on reclaim now. If we get the clustering part
done, I can start working on the page migration part.
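To make that clustering concrete, here is a rough sketch of the
data-structure side of it. The identifiers below (RCLM_NORCLM,
RCLM_KERNRCLM, RCLM_EASYRCLM and the gfp bits) are illustrative
stand-ins rather than the exact names or values in the current
patchset; the point is only that each free_area carries one free list
per allocation type so that the pages sharing a large block stay
homogeneous.

/*
 * Illustrative sketch only -- names and bit values are stand-ins, not
 * necessarily what the patches define.  The core change is one free
 * list per allocation type inside each free_area, so pages of
 * different mobility never share the same large block.
 */
#include <linux/list.h>
#include <linux/types.h>

#define RCLM_NORCLM	0	/* unmovable, unreclaimable kernel pages */
#define RCLM_KERNRCLM	1	/* reclaimable kernel pages */
#define RCLM_EASYRCLM	2	/* user pages, movable or easily reclaimed */
#define RCLM_TYPES	3

/* Stand-ins for the gfp bits the patchset would add to gfp.h */
#define __GFP_KERNRCLM	((__force gfp_t)0x80000u)
#define __GFP_EASYRCLM	((__force gfp_t)0x100000u)

/* Mirrors struct free_area from mmzone.h, with per-type free lists */
struct free_area {
	struct list_head	free_list[RCLM_TYPES];
	unsigned long		nr_free;
};

/* Derive the allocation type from the gfp flags at allocation time */
static inline int gfpflags_to_rclm_type(gfp_t gfp_flags)
{
	if (gfp_flags & __GFP_EASYRCLM)
		return RCLM_EASYRCLM;
	if (gfp_flags & __GFP_KERNRCLM)
		return RCLM_KERNRCLM;
	return RCLM_NORCLM;
}

The allocator searches the list matching the derived type first and
only falls back to another type's blocks when that list is empty; the
fallback path is where most of the actual anti-frag policy lives.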
> Also the duplication of the page struct caches worries me because it
> reduces the hit rate.

Do you mean the per-cpu caches? If so, without clustering in the
per-cpu caches, unmovable allocations would "leak" into blocks used for
movable allocations.

> Removing the intermediate type would reduce the page caches to 2.

And significantly reduce the effectiveness of the clustering in the
process.

> And maybe we do not need caches for unreclaimable/unmovable pages?
> slab already does its own buffering there.
>

That is true. If it is a problem, what could be done is to have a
per-cpu cache for movable and unmovable allocations, and then have the
__GFP_KERNRCLM allocations bypass the per-cpu allocator altogether and
go straight to the buddy allocator.

--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org