On Fri, 2007-05-04 at 11:27 -0700, Christoph Lameter wrote:

> 
> Not sure where to go here. Increasing the per cpu slab size may hold off 
> the issue up to a certain cpu cache size. For that we would need to 
> identify which slabs create the performance issue.
> 
> One easy way to check that this is indeed the case: Enable fake NUMA. You 
> will then have separate queues for each processor since they are on 
> different "nodes". Create two fake nodes. Run one thread in each node and 
> see if this fixes it.

I tried with fake NUMA (booted with numa=fake=2) and used

numactl --physcpubind=1 --membind=0 ./netserver
numactl --physcpubind=2 --membind=1 ./netperf -t TCP_STREAM -l 60 \
    -H 127.0.0.1 -i 5,5 -I 99,5 -- -s 57344 -S 57344 -m 4096

to run the tests.  The results are about the same as the non-NUMA case,
with slab about 5% better than slub.  
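
For anyone who wants to repeat this, here is a rough sketch of a wrapper
around the two invocations above. It is not the script I ran. The DRY_RUN
variable and the function names are made up for illustration, and whether
CPUs 1 and 2 really sit on fake nodes 0 and 1 depends on how the kernel
splits the machine, which is why it prints the layout with
"numactl --hardware" first.

```shell
#!/bin/sh
# Sketch only: wraps the two numactl invocations from this mail.
# DRY_RUN is a hypothetical variable; the default (1) prints each
# command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Server: pinned to CPU 1, memory allocated from fake node 0.
server_cmd() {
    run numactl --physcpubind=1 --membind=0 ./netserver
}

# Client: pinned to CPU 2, memory allocated from fake node 1.
client_cmd() {
    run numactl --physcpubind=2 --membind=1 ./netperf -t TCP_STREAM -l 60 \
        -H 127.0.0.1 -i 5,5 -I 99,5 -- -s 57344 -S 57344 -m 4096
}

# Show the node layout first, to confirm that two nodes exist and to
# see which CPUs belong to each of them.
run numactl --hardware
server_cmd
client_cmd
```

With DRY_RUN=0 the commands are executed rather than printed. That
assumes netserver and netperf are in the current directory, as in the
commands above.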

So the difference is probably due to something other than partial
slabs.  The kernel config file is attached.

Tim