From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Wed, 13 Jun 2007 12:33:06 +0900 From: Paul Mundt Subject: Re: [PATCH] slob: poor man's NUMA, take 2. Message-ID: <20070613033306.GA15169@linux-sh.org> References: <20070613031203.GB15009@linux-sh.org> <466F6351.9040503@yahoo.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <466F6351.9040503@yahoo.com.au> Sender: owner-linux-mm@kvack.org Return-Path: To: Nick Piggin Cc: Matt Mackall , Christoph Lameter , Andrew Morton , linux-mm@kvack.org List-ID: On Wed, Jun 13, 2007 at 01:24:01PM +1000, Nick Piggin wrote: > Paul Mundt wrote: > >Here's an updated copy of the patch adding simple NUMA support to SLOB, > >against the current -mm version of SLOB this time. > > > >I've tried to address all of the comments on the initial version so far, > >but there's obviously still room for improvement. > > > >This approach is not terribly scalable in that we still end up using a > >global freelist (and a global spinlock!) across all nodes, making the > >partial free page lookup rather expensive. The next step after this will > >be moving towards split freelists with finer grained locking. > > I just think that this is not really a good intermediate step because > you only get NUMA awareness from the first allocation out of a page. I > guess that's an easy no-brainer for bigblock allocations, but for SLUB > proper, it seems not so good. > > For a lot of workloads you will have a steady state where allocation and > freeing rates match pretty well and there won't be much movement of pages > in and out of the allocator. In this case it will be back to random > allocations, won't it? > That's why I tossed in the node id matching in slob_alloc() for the partial free page lookup. At the moment the logic obviously won't scale, since we end up scanning the entire freelist looking for a page that matches the node specifier. If we don't find one, we could rescan and just grab a block from another node, but at the moment it just continues on and tries to fetch a new page for the specified node. If the freelists are split per node, that makes it a bit more manageable, but that's more of a scalability issue than a correctness one. Random alloc/free workloads will stick to their node with the current patch, so I'm not sure where you see the random node placement as an issue? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org