From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <466F520D.9080206@yahoo.com.au>
Date: Wed, 13 Jun 2007 12:10:21 +1000
From: Nick Piggin
MIME-Version: 1.0
Subject: Re: [PATCH] numa: mempolicy: dynamic interleave map for system init.
References: <20070607011701.GA14211@linux-sh.org> <20070607180108.0eeca877.akpm@linux-foundation.org> <20070608032505.GA13227@linux-sh.org> <20070608145011.GE11115@waste.org> <20070612094359.GA5803@linux-sh.org> <20070612153234.GI11115@waste.org>
In-Reply-To: <20070612153234.GI11115@waste.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Matt Mackall
Cc: Paul Mundt, Christoph Lameter, Andrew Morton, linux-mm@kvack.org,
 ak@suse.de, hugh@veritas.com, lee.schermerhorn@hp.com

Matt Mackall wrote:
> On Tue, Jun 12, 2007 at 06:43:59PM +0900, Paul Mundt wrote:
>
>>On Fri, Jun 08, 2007 at 09:50:11AM -0500, Matt Mackall wrote:
>>
>>>SLOB's big scalability problem at this point is number of CPUs.
>>>Throwing some fine-grained locking at it or the like may be able to
>>>help with that too.
>>>
>>>Why would you even want to bother making it scale that large? For
>>>starters, it's less affected by things like dcache fragmentation. The
>>>majority of pages pinned by long-lived dcache entries will still be
>>>available to other allocations.
>>>
>>>Haven't given any thought to NUMA yet though..
>>>
>>
>>This is what I've hacked together and tested with my small nodes. It's
>>not terribly intelligent, and it pushes off most of the logic to the
>>page allocator. Obviously it's not terribly scalable, and I haven't
>>tested it with page migration, either. Still, it works for me with my
>>simple tmpfs + mpol policy tests.
>>
>>Tested on a UP + SPARSEMEM (static, not extreme) + NUMA (2 nodes) +
>>SLOB configuration.
>>
>>Flame away!
>
> For starters, it's not against the current SLOB, which no longer has
> the bigblock list.
>
>>-void *__kmalloc(size_t size, gfp_t gfp)
>>+static void *__kmalloc_alloc(size_t size, gfp_t gfp, int node)
>
> That's a ridiculous name. So, uh.. more underbars!
>
> Though really, I think you can just name it __kmalloc_node?
>
>>+	if (node == -1)
>>+		pages = alloc_pages(flags, get_order(c->size));
>>+	else
>>+		pages = alloc_pages_node(node, flags,
>>+				get_order(c->size));
>
> This fragment appears a few times. Looks like it ought to get its own
> function. And that function can reduce to a trivial inline in the
> !NUMA case.

BTW, what I would like to see tried initially -- which may give
reasonable scalability and NUMAness -- is per-CPU or per-node free
page lists. However, these lists would not be exclusively per-CPU,
because that would result in worse memory consumption (with SLOB we
should always try to put memory consumption above all else). So each
list would have its own lock and could be accessed by any CPU, but
each CPU would default to its own list first (or, in the case of
kmalloc_node, to the requested node's list).

Then we'd probably like to introduce a *little* bit of slack, so that
we will allocate a new page on our local list even if there is a small
amount of memory free on another list. I think this might be enough to
get a reasonable number of list-local allocations without blowing out
memory usage much. The slack ratio could be configurable, so that at
one extreme we could always allocate from our local lists for best
NUMA placement, I guess.
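Something like this completely untested sketch, say -- where
slob_page_alloc() and slob_grow_local() are made-up placeholders for
"scan a list's pages for a fit" and "put a fresh page on a list", and
the slack value is pulled out of thin air (lock/list init omitted):

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/numa.h>
#include <linux/nodemask.h>
#include <linux/topology.h>
#include <linux/gfp.h>

/*
 * One free-page list per node, each with its own lock.  Any CPU may
 * take any lock; CPUs just *prefer* their local node's list.
 */
struct slob_list {
	spinlock_t lock;
	struct list_head pages;		/* SLOB pages with free units */
	unsigned long free_units;	/* rough total of free space */
};

static struct slob_list slob_lists[MAX_NUMNODES];

/*
 * Slack: how much free space a remote list must have before we use
 * it rather than grow our preferred list.  Could be a tunable; near
 * zero would mean "always allocate locally" for best NUMA placement.
 */
static unsigned long slob_slack = 32;	/* units; arbitrary */

/* Placeholders, not real functions. */
static void *slob_page_alloc(struct slob_list *sl, size_t size);
static void *slob_grow_local(struct slob_list *sl, gfp_t gfp, int node);

static void *slob_alloc(size_t size, gfp_t gfp, int node)
{
	int pref = (node == -1) ? numa_node_id() : node;
	struct slob_list *sl = &slob_lists[pref];
	void *ret;
	int n;

	/* First try the preferred (local or requested) node's list. */
	spin_lock(&sl->lock);
	ret = slob_page_alloc(sl, size);
	spin_unlock(&sl->lock);
	if (ret)
		return ret;

	/* Fall back to a remote list only if it has real slack ... */
	for_each_online_node(n) {
		if (n == pref)
			continue;
		sl = &slob_lists[n];
		if (sl->free_units < size + slob_slack)
			continue;
		spin_lock(&sl->lock);
		ret = slob_page_alloc(sl, size);
		spin_unlock(&sl->lock);
		if (ret)
			return ret;
	}

	/* ... otherwise grow the preferred list with a fresh page. */
	return slob_grow_local(&slob_lists[pref], gfp, node);
}

Checking free_units outside the lock is racy, of course, but for a
placement heuristic that should be harmless.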
I haven't given it a great deal of thought, so this strategy might go
horribly wrong in some cases... but I have a feeling something
reasonably simple like that might go a long way toward improving
locking scalability and NUMAness.

--
SUSE Labs, Novell Inc.