From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Nov 2006 21:08:46 +0000 (GMT)
From: Mel Gorman
Subject: Re: Page allocator: Single Zone optimizations
In-Reply-To:
Message-ID:
References: <4544914F.3000502@yahoo.com.au> <20061101182605.GC27386@skynet.ie> <20061101123451.3fd6cfa4.akpm@osdl.org> <454A2CE5.6080003@shadowen.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Christoph Lameter
Cc: Andy Whitcroft, Andrew Morton, Nick Piggin, KAMEZAWA Hiroyuki, Linux Memory Management List, Peter Zijlstra
List-ID:

On Tue, 7 Nov 2006, Mel Gorman wrote:

>>
>> Right. Maybe we can get away with leaving the pageset cpu caches
>> untouched? On our largest systems with 1k nodes 4k cpus we currently have
>> 4 zones * 4096 cpus * 1024 nodes = 16 million pagesets. Each of those has
>> hot and cold yielding 32 million lists. Now we going triplicate that to
>> 192 mio lists and we also increase the size of the structure.
>>
>
> I can see the problem with expanding the per-cpu structures. I'll check out
> what happens when per-cpu caches are only used for movable allocations. This
> is the way things were in an earlier version of anti-fragmentation but I do
> not have figures any more.
>

This was harder to get right than expected. Using the per-cpu allocator for
only movable allocations led to considerable regressions - 2.5% on PPC64 for
kbuild and the x86_64 figures were showing close to 2% regression. AIM9
results showed significant regressions in places, even on machines that
normally give reliable AIM9 results. Non-movable allocations are frequent
enough that forcing them to not use the per-cpu allocator has a noticeable
impact.

However, I think I have a reasonable compromise. Pages on the per-cpu lists
are not using page->private so the type of page can be stored in that field
(i.e. Movable, Reclaimable, Unmovable).
On allocation, the list is searched and the hottest page of the appropriate
type is used, otherwise rmqueue_bulk() is called. This stops the per-cpu
allocator from "leaking" pages into undesirable areas without requiring
larger per-cpu structures. As care is taken to preserve the hotness of the
pages and the page structures tend to be cache hot anyway, regressions
should be very minor, if detectable at all.

What I've found in initial tests is that slight increases in time spent in
the system are offset by reduced time spent in userspace so results tend to
be within 0.2% of each other.

I'll rebase the patches to the latest -mm tree, run a set of tests to make
sure it's working as expected and post a new set of patches.

>> With the code currently in 2.6.19 we go from 4 to 2 zones. So we have only
>> 16 million pagesets. With the optional DMA in mm we got from 16 to 8
>> million pagesets. This effectively undoes the optimizations done in .19
>> .20.
>>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
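
For illustration only, here is a rough user-space sketch of the compromise described in the mail: each page on a per-cpu list carries its type in the otherwise unused private field, and allocation walks the list from the hot end looking for the first page of the wanted type. This is not the posted patch. The names enum page_type, struct pcp_list, pcp_free() and pcp_alloc() are hypothetical, and the list is a simplified singly linked list rather than the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: a user-space model of the idea,
 * not the kernel's data structures. */
enum page_type { PT_UNMOVABLE, PT_RECLAIMABLE, PT_MOVABLE, PT_NR_TYPES };

struct page {
    unsigned long private;  /* unused while on the per-cpu list, so it holds the type */
    struct page *next;      /* hotter pages sit nearer the head */
};

struct pcp_list {
    struct page *head;      /* hot end of the list */
    int count;              /* number of pages currently on the list */
};

/* Free path: tag the page with its type and push it at the hot end. */
static void pcp_free(struct pcp_list *pcp, struct page *page,
                     enum page_type type)
{
    assert(type < PT_NR_TYPES);
    page->private = (unsigned long)type;
    page->next = pcp->head;
    pcp->head = page;
    pcp->count++;
}

/* Allocation path: walk from hot to cold and unlink the first page whose
 * tag matches. Returns NULL when no page of that type is cached; in the
 * scheme from the mail this is where rmqueue_bulk() would be called. */
static struct page *pcp_alloc(struct pcp_list *pcp, enum page_type type)
{
    struct page **link;

    assert(type < PT_NR_TYPES);
    for (link = &pcp->head; *link != NULL; link = &(*link)->next) {
        struct page *page = *link;

        if (page->private == (unsigned long)type) {
            *link = page->next;  /* unlink without disturbing the order */
            page->next = NULL;
            pcp->count--;
            return page;
        }
    }
    return NULL;
}
```

Unlinking in place keeps the remaining pages in their hot-to-cold order, which is the property the mail relies on to keep regressions small. The cost is a linear walk of the list on each allocation instead of taking the head directly.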