[S+Q2 00/19] SLUB with queueing (V2) beats SLAB netperf TCP_RR

From: Christoph Lameter <cl@linux-foundation.org>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Nick Piggin <npiggin@suse.de>,
	David Rientjes <rientjes@google.com>
Subject: [S+Q2 00/19] SLUB with queueing (V2) beats SLAB netperf TCP_RR
Date: Fri, 09 Jul 2010 14:07:06 -0500
Message-ID: <20100709190706.938177313@quilx.com>

The following patchset cleans up some pieces and then equips SLUB with
per cpu queues that work similarly to SLAB's queues. With that approach
SLUB wins significantly in hackbench and also improves on netperf TCP_RR.

Hackbench test script: 

#!/bin/bash 
uname -a
echo "./hackbench 100 process 200000"
./hackbench 100 process 200000
echo "./hackbench 100 process 20000"
./hackbench 100 process 20000
echo "./hackbench 100 process 20000"
./hackbench 100 process 20000
echo "./hackbench 100 process 20000"
./hackbench 100 process 20000
echo "./hackbench 10 process 20000"
./hackbench 10 process 20000
echo "./hackbench 10 process 20000"
./hackbench 10 process 20000
echo "./hackbench 10 process 20000"
./hackbench 10 process 20000
echo "./hackbench 1 process 20000"
./hackbench 1 process 20000
echo "./hackbench 1 process 20000"
./hackbench 1 process 20000
echo "./hackbench 1 process 20000"
./hackbench 1 process 20000

Dell Dual Quad Penryn on Linux 2.6.35-rc3
Time measurements (smaller is better):

Procs	NR		SLAB	SLUB	SLUB+Queuing     %
-------------------------------------------------------------
100	200000		2741.3	2764.7	2231.9		-18
100	20000		279.3	270.3	219.0		-27
100	20000		278.0	273.1	219.2		-26
100	20000		279.0	271.7	218.8		-27
10 	20000		34.0	35.6	28.8		-18
10	20000		30.3	35.2	28.4		-6
10	20000		32.9	34.6	28.4		-15
1	20000		6.4	6.7	6.5		+1
1	20000		6.3	6.8	6.5		+3
1	20000		6.4	6.9	6.4		0

SLUB+Q also wins against SLAB in netperf:

Script:

#!/bin/bash

TIME=60                  # seconds per netperf run
HOSTNAME=localhost       # host running netserver

NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
echo NR_CPUS=$NR_CPUS

# Start $1 concurrent TCP_RR clients in the background.
run_netperf() {
	for i in $(seq 1 $1); do
		netperf -H $HOSTNAME -t TCP_RR -l $TIME &
	done
}

ITERATIONS=0
while [ $ITERATIONS -lt 12 ]; do
	RATE=0
	ITERATIONS=$((ITERATIONS + 1))
	THREADS=$((NR_CPUS * ITERATIONS))
	# Keep only the numeric result lines; field 6 is the transaction rate.
	RESULTS=$(run_netperf $THREADS | grep -v '[a-zA-Z]' | awk '{ print $6 }')

	# Sum the integer part of each client's rate.
	for j in $RESULTS; do
		RATE=$((RATE + ${j/.*}))
	done
	echo threads=$THREADS rate=$RATE
done

Dell Dual Quad Penryn on Linux 2.6.35-rc4

Loop counts: Larger is better.

Threads		SLAB		SLUB+Q		%
 8		690869		714788		+ 3.4
16		680295		711771		+ 4.6
24		672677		703014		+ 4.5
32		676780		703914		+ 4.0
40		668458		699806		+ 4.6
48		667017		698908		+ 4.7
56		671227		696034		+ 3.6
64		667956		696913		+ 4.3
72		668332		694931		+ 3.9
80		667073		695658		+ 4.2
88		682866		697077		+ 2.0
96		668089		694719		+ 3.9
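The "%" column above appears to be the relative gain of SLUB+Q over SLAB.
The posting does not spell out the formula, so the following is only a
sketch under that assumption; pct_change() and print_row() are
hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative change of `with_queue` versus `baseline`, in percent.
 * Assumed formula: the posting does not state how "%" was derived.
 */
static double pct_change(double baseline, double with_queue)
{
	assert(baseline > 0.0);
	return (with_queue - baseline) / baseline * 100.0;
}

/* Print the relative change for one row of the netperf table. */
static void print_row(int threads, double slab, double slub_q)
{
	printf("%2d threads: %+.1f%%\n", threads, pct_change(slab, slub_q));
}
```

For the 8 thread row, pct_change(690869, 714788) comes out at about 3.46,
which is close to the +3.4 listed in the table.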

SLUB+Q merges SLUB with some queuing concepts from SLAB and a new way of
managing objects in the slabs using bitmaps. It uses a per cpu queue so
that free operations can be properly buffered, and a bitmap to track the
free/allocated state of the objects in each slab. It is slightly less
space efficient than SLUB (a larger bitmap, a few words in size, must be
placed in the slab page when a slab holds more than BITS_PER_LONG
objects) but in general does not increase space use much.
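To make the bitmap idea concrete, here is a minimal userspace sketch. It
is not the code from the patches: struct toy_slab, the toy_* functions
and the TOY_MAX_OBJS cap are hypothetical names invented for
illustration, and the bitmap lives in a separate struct rather than in
the slab page.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MAX_OBJS   512	/* hypothetical cap, only for this sketch */
#define TOY_MAP_WORDS  ((TOY_MAX_OBJS + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct toy_slab {
	unsigned char *base;	/* start of the object area */
	size_t obj_size;
	unsigned int nr_objs;
	unsigned long free_map[TOY_MAP_WORDS];	/* bit i set: object i is free */
};

/* Number of bitmap words needed to track nr_objs objects. */
static size_t toy_map_words(unsigned int nr_objs)
{
	return (nr_objs + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/* Mark all nr_objs objects in mem as free. */
static void toy_slab_init(struct toy_slab *s, void *mem, size_t obj_size,
			  unsigned int nr_objs)
{
	unsigned int i;

	assert(nr_objs <= TOY_MAX_OBJS);
	s->base = mem;
	s->obj_size = obj_size;
	s->nr_objs = nr_objs;
	memset(s->free_map, 0, sizeof(s->free_map));
	for (i = 0; i < nr_objs; i++)
		s->free_map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}

/* Hand out the lowest numbered free object, or NULL if the slab is full. */
static void *toy_slab_alloc(struct toy_slab *s)
{
	unsigned int i;

	for (i = 0; i < s->nr_objs; i++) {
		unsigned long bit = 1UL << (i % BITS_PER_LONG);

		if (s->free_map[i / BITS_PER_LONG] & bit) {
			s->free_map[i / BITS_PER_LONG] &= ~bit;
			return s->base + (size_t)i * s->obj_size;
		}
	}
	return NULL;
}

/* Return an object to the slab by setting its bit again. */
static void toy_slab_free(struct toy_slab *s, void *obj)
{
	size_t i = (size_t)((unsigned char *)obj - s->base) / s->obj_size;

	assert(i < s->nr_objs);
	s->free_map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}
```

With this layout a slab of up to BITS_PER_LONG objects needs a single
word of bitmap; toy_map_words() shows how the requirement grows to a few
words for larger slabs, which is the space overhead mentioned above.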

The SLAB scheme of not touching the object during management is adopted.
SLUB+Q can efficiently free and allocate cache cold objects without
causing cache misses.
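The point about not touching objects can be sketched as follows. A queue
of pointers lets free and allocate work on the pointer value alone,
whereas SLUB without queueing links free objects through a pointer
stored inside the object itself. This is only an illustration: struct
toy_cpu_queue, TOY_QUEUE_SIZE and the LIFO order are assumptions, not
taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_SIZE 4	/* hypothetical queue depth for the sketch */

struct toy_cpu_queue {
	unsigned int count;
	void *objects[TOY_QUEUE_SIZE];
};

/*
 * Buffer a freed object. Only the pointer value is stored; the object's
 * memory is never read or written, so a cache cold object stays cold.
 * Returns 1 if queued, 0 if the queue is full and the caller would have
 * to flush objects back to their slabs.
 */
static int toy_queue_free(struct toy_cpu_queue *q, void *obj)
{
	assert(obj != NULL);
	if (q->count == TOY_QUEUE_SIZE)
		return 0;
	q->objects[q->count++] = obj;
	return 1;
}

/*
 * Take the most recently freed object (LIFO, assumed here), or NULL if
 * the queue is empty and the caller would have to refill it from a slab.
 */
static void *toy_queue_alloc(struct toy_cpu_queue *q)
{
	if (q->count == 0)
		return NULL;
	return q->objects[--q->count];
}
```

In a real allocator there would be one such queue per cpu, and the full
and empty cases would fall back to the slab pages.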

The queueing patches are likely still a bit rough around corner cases
and special features and need more widespread testing.


Thread overview: 44+ messages
2010-07-09 19:07 Christoph Lameter [this message]
2010-07-09 19:07 ` [S+Q2 01/19] Bugfix for semop() not reporting successful operation Christoph Lameter
2010-07-09 19:07 ` [S+Q2 02/19] percpu: make @dyn_size always mean min dyn_size in first chunk init functions Christoph Lameter
2010-07-09 19:07 ` [S+Q2 03/19] percpu: allow limited allocation before slab is online Christoph Lameter
2010-07-09 19:07 ` [S+Q2 04/19] slub: Use a constant for a unspecified node Christoph Lameter
2010-07-09 19:07 ` [S+Q2 05/19] SLUB: Constants need UL Christoph Lameter
2010-07-09 19:07 ` [S+Q2 06/19] slub: Check kasprintf results in kmem_cache_init() Christoph Lameter
2010-07-14 22:16   ` David Rientjes
2010-07-09 19:07 ` [S+Q2 07/19] slub: Allow removal of slab caches during boot Christoph Lameter
2010-07-14 23:48   ` David Rientjes
2010-07-19  0:07     ` Benjamin Herrenschmidt
2010-07-19 16:39       ` Christoph Lameter
2010-07-31  9:41         ` Pekka Enberg
2010-08-02 15:36           ` Christoph Lameter
2010-08-03  4:32             ` Pekka Enberg
2010-07-09 19:07 ` [S+Q2 08/19] slub: Use kmem_cache flags to detect if slab is in debugging mode Christoph Lameter
2010-07-09 19:07 ` [S+Q2 09/19] slub: discard_slab_unlock Christoph Lameter
2010-07-09 19:07 ` [S+Q2 10/19] slub: remove dynamic dma slab allocation Christoph Lameter
2010-07-09 19:07 ` [S+Q2 11/19] slub: Remove static kmem_cache_cpu array for boot Christoph Lameter
2010-07-09 19:07 ` [S+Q2 12/19] slub: Dynamically size kmalloc cache allocations Christoph Lameter
2010-07-09 19:07 ` [S+Q2 13/19] slub: Extract hooks for memory checkers from hotpaths Christoph Lameter
2010-07-09 19:07 ` [S+Q2 14/19] slub: Move gfpflag masking out of the hotpath Christoph Lameter
2010-07-09 19:07 ` [S+Q2 15/19] SLUB: Add SLAB style per cpu queueing Christoph Lameter
2010-07-09 19:07 ` [S+Q2 16/19] slub: Resize the new cpu queues Christoph Lameter
2010-07-09 19:07 ` [S+Q2 17/19] SLUB: Get rid of useless function count_free() Christoph Lameter
2010-07-09 19:07 ` [S+Q2 18/19] SLUB: Remove MAX_OBJS limitation Christoph Lameter
2010-07-09 19:07 ` [S+Q2 19/19] slub: Drop allocator announcement Christoph Lameter
2010-07-10 19:56 ` [S+Q2 00/19] SLUB with queueing (V2) beats SLAB netperf TCP_RR Heinz Diehl
2010-07-12 15:11   ` Christoph Lameter
2010-07-12 16:39     ` Heinz Diehl
2010-07-12 17:00       ` Christoph Lameter
2010-07-13 13:56         ` Heinz Diehl
2010-07-14  2:01           ` Christoph Lameter
2010-07-14 11:51             ` Tejun Heo
2010-07-14 14:25             ` Heinz Diehl
2010-07-14 20:22             ` David Rientjes
2010-07-14 11:46     ` Tejun Heo
2010-07-14 22:26 ` David Rientjes
2010-07-15 20:17   ` Christoph Lameter
2010-07-15 20:30     ` David Rientjes
2010-07-14 23:52 ` David Rientjes
2010-07-16  8:23   ` Pekka Enberg
2010-07-16  9:02     ` David Rientjes
2010-07-19  0:16       ` Benjamin Herrenschmidt
