Re: Fw: [PATCH] NUMA Slab Allocator

From: Manfred Spraul <manfred@colorfullife.com>
To: Christoph Lameter <christoph@lameter.com>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] NUMA Slab Allocator
Date: Wed, 16 Mar 2005 19:34:22 +0100
Message-ID: <42387C2E.4040106@colorfullife.com>
In-Reply-To: <20050315204110.6664771d.akpm@osdl.org>

Hi Christoph,

Do you have profile data from your modification? What percentage of the 
allocations is node-local, and what percentage comes from foreign nodes? 
Preferably per-cache. It shouldn't be difficult to add statistics 
counters to your patch.
And: Can you estimate what percentage is really accessed node-locally 
and what percentage are long-living structures that are accessed from 
all cpus in the system?
I had discussions with guys from IBM and SGI regarding a NUMA allocator, 
and we decided that we need profile data before we can decide if we need 
one:
- A node-local allocator reduces the inter-node traffic, because the 
callers get node-local memory
- A node-local allocator increases the inter-node traffic, because 
objects that are kfree'd on the wrong node must be returned to their 
home node.
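
The per-cache statistics asked for above could be collected with 
counters roughly like the following. This is a userspace model, not 
kernel code: struct cache_numa_stats, stats_alloc(), stats_free() and 
stats_local_alloc_pct() are hypothetical names that do not appear in the 
patch, and MAX_NODES is an arbitrary bound chosen for the sketch.

```c
#include <assert.h>

#define MAX_NODES 8 /* assumption: small fixed node count for this model */

/* Hypothetical per-cache counters; not part of the posted patch. */
struct cache_numa_stats {
	unsigned long alloc_local;   /* object came from the caller's node */
	unsigned long alloc_foreign; /* object came from another node */
	unsigned long free_local;    /* object freed on its home node */
	unsigned long free_foreign;  /* object freed on a different node */
};

/* Record one allocation: obj_node is the object's home node,
 * cpu_node is the node of the cpu doing the allocation. */
static void stats_alloc(struct cache_numa_stats *s, int obj_node, int cpu_node)
{
	assert(obj_node >= 0 && obj_node < MAX_NODES);
	assert(cpu_node >= 0 && cpu_node < MAX_NODES);
	if (obj_node == cpu_node)
		s->alloc_local++;
	else
		s->alloc_foreign++;
}

/* Record one free, with the same node arguments as stats_alloc(). */
static void stats_free(struct cache_numa_stats *s, int obj_node, int cpu_node)
{
	assert(obj_node >= 0 && obj_node < MAX_NODES);
	assert(cpu_node >= 0 && cpu_node < MAX_NODES);
	if (obj_node == cpu_node)
		s->free_local++;
	else
		s->free_foreign++;
}

/* Percentage of node-local allocations; 0 if nothing was allocated. */
static unsigned long stats_local_alloc_pct(const struct cache_numa_stats *s)
{
	unsigned long total = s->alloc_local + s->alloc_foreign;

	return total ? (s->alloc_local * 100) / total : 0;
}
```

Counters like these only answer the first question (where the memory 
came from). Whether the objects are later accessed node-locally cannot 
be seen from the allocator and would need separate profiling.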

> static inline void __cache_free (kmem_cache_t *cachep, void* objp)
> {
>  struct array_cache *ac = ac_data(cachep);
>+ struct slab *slabp;
>
>  check_irq_off();
>  objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
>
>- if (likely(ac->avail < ac->limit)) {
>+ /* Make sure we are not freeing a object from another
>+  * node to the array cache on this cpu.
>+  */
>+ slabp = GET_PAGE_SLAB(virt_to_page(objp));
>  
>
This line is quite slow, and should be performed only for NUMA builds, 
not for non-NUMA builds. Some kind of wrapper is required.
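
One possible shape for such a wrapper, as a userspace sketch. 
obj_is_off_node() and struct obj are hypothetical names, and the home 
pointer is only a stand-in for GET_PAGE_SLAB(virt_to_page(objp)). 
CONFIG_NUMA is assumed to be set by the build configuration.

```c
#include <assert.h>
#include <stddef.h>

struct slab {
	int nodeid;
};

/* Model object: 'home' stands in for the page-to-slab lookup. */
struct obj {
	struct slab *home;
};

#ifdef CONFIG_NUMA
/* NUMA build: do the lookup and compare against the current node. */
static inline int obj_is_off_node(const struct obj *o, int this_node)
{
	assert(o != NULL && o->home != NULL);
	return o->home->nodeid != this_node;
}
#else
/* Non-NUMA build: constant 0, so the lookup and the slow path
 * behind it can be dropped by the compiler. */
static inline int obj_is_off_node(const struct obj *o, int this_node)
{
	(void)o;
	(void)this_node;
	return 0;
}
#endif
```

With this shape the caller contains no #ifdef at all; it just tests 
obj_is_off_node() before taking the off-node path.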

>+ if(unlikely(slabp->nodeid != numa_node_id())) {
>+  STATS_INC_FREEMISS(cachep);
>+  int nodeid = slabp->nodeid;
>+  spin_lock(&(cachep->nodelists[nodeid])->list_lock);
>  
>
This line is very dangerous: Every free of a wrong-node object causes a 
spin_lock operation. I fear that the cache line traffic for the spinlock 
might kill the performance for some workloads. I personally think that 
batching is required, i.e. each cpu stores wrong-node objects in a 
separate per-cpu array, and then the objects are returned as a block to 
their home node.
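
A rough userspace model of the batching idea, to show the effect on the 
number of lock operations. All names (struct foreign_batch, 
batch_free(), batch_flush(), BATCH_LIMIT) are hypothetical, and the 
lock_acquisitions counter merely stands in for a spin_lock/spin_unlock 
pair on the node's list_lock.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES   4 /* assumption for the model */
#define BATCH_LIMIT 8 /* assumption: capacity of the per-cpu array */

struct node_list {
	unsigned long lock_acquisitions; /* stands in for the list_lock */
	unsigned long objects_returned;
};

/* Per-cpu array of wrong-node objects; only the home node of each
 * queued object is tracked here. */
struct foreign_batch {
	size_t avail;
	int home_node[BATCH_LIMIT];
};

/* Return all queued objects, taking each node's lock at most once. */
static void batch_flush(struct foreign_batch *b,
			struct node_list nodes[MAX_NODES])
{
	for (int n = 0; n < MAX_NODES; n++) {
		unsigned long count = 0;

		for (size_t i = 0; i < b->avail; i++)
			if (b->home_node[i] == n)
				count++;
		if (count == 0)
			continue;
		/* lock, hand back 'count' objects, unlock */
		nodes[n].lock_acquisitions++;
		nodes[n].objects_returned += count;
	}
	b->avail = 0;
}

/* Queue one wrong-node object; flush first if the array is full. */
static void batch_free(struct foreign_batch *b,
		       struct node_list nodes[MAX_NODES], int home)
{
	assert(home >= 0 && home < MAX_NODES);
	if (b->avail == BATCH_LIMIT)
		batch_flush(b, nodes);
	b->home_node[b->avail++] = home;
}
```

In this model, eight wrong-node frees spread over two home nodes cost 
two lock acquisitions at flush time, where the unbatched path would take 
eight.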

>-/*
>- * NUMA: different approach needed if the spinlock is moved into
>- * the l3 structure
>  
>
You have moved the cache spinlock into the l3 structure. Have you 
compared both approaches?
A global spinlock has the advantage that batching is possible in 
free_block: Acquire global spinlock, return objects to all nodes in the 
system, release spinlock. A node-local spinlock would mean less 
contention [multiple spinlocks instead of one global lock], but far more 
spin_lock/unlock calls.

IIRC the conclusion from our discussion was that there are at least 
four possible implementations:
- Your version.
- Add a second per-cpu array for off-node allocations. __cache_free 
batches, free_block then returns. Global spinlock or per-node spinlock. 
A patch with a global spinlock is in
http://www.colorfullife.com/~manfred/Linux-kernel/slab/patch-slab-numa-2.5.66
Per-node spinlocks would require a restructuring of free_block.
- Add a per-node array for each cpu for wrong-node allocations. Allows 
very fast batch return: each array contains memory from just one node, 
useful if per-node spinlocks are used.
- Do nothing. Least overhead within slab.
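
To make the third option concrete, here is a small userspace model of 
per-node arrays on each cpu. The names (struct cpu_foreign_arrays, 
free_to_node_array(), PER_NODE_LIMIT) are hypothetical, and the 
lock_acquisitions counter stands in for taking the per-node spinlock. 
Because each array holds objects from a single node, a full array can be 
handed back with exactly one lock and without sorting.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES      4 /* assumption for the model */
#define PER_NODE_LIMIT 4 /* assumption: capacity of each per-node array */

struct node_list {
	unsigned long lock_acquisitions; /* stands in for the per-node lock */
	unsigned long objects_returned;
};

/* One counter per home node, per cpu; a real array would also hold
 * the object pointers. */
struct cpu_foreign_arrays {
	size_t avail[MAX_NODES];
};

/* Queue a wrong-node object in the array of its home node and return
 * the whole array under a single lock once it is full. */
static void free_to_node_array(struct cpu_foreign_arrays *c,
			       struct node_list nodes[MAX_NODES], int home)
{
	assert(home >= 0 && home < MAX_NODES);
	if (++c->avail[home] == PER_NODE_LIMIT) {
		nodes[home].lock_acquisitions++;
		nodes[home].objects_returned += c->avail[home];
		c->avail[home] = 0;
	}
}
```

The price is memory: every cpu carries one array per node, so the 
footprint grows with cpus times nodes.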

I'm fairly certain that "do nothing" is the right answer for some 
caches. For example the dentry cache: The object lifetime is seconds to 
minutes, and the objects are stored in a global hashtable. They will be 
touched from all cpus in the system, so guaranteeing that 
kmem_cache_alloc returns node-local memory won't help. But the added 
overhead within slab.c will hurt.

--
    Manfred