From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx159.postini.com [74.125.245.159])
	by kanga.kvack.org (Postfix) with SMTP id 061996B0071
	for ; Wed, 4 Jul 2012 10:30:15 -0400 (EDT)
Received: by ghrr18 with SMTP id r18so8045676ghr.14
	for ; Wed, 04 Jul 2012 07:30:15 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <1340389359-2407-1-git-send-email-js1304@gmail.com> <1340390729-2821-1-git-send-email-js1304@gmail.com>
Date: Wed, 4 Jul 2012 23:30:14 +0900
Message-ID:
Subject: Re: [PATCH 1/3 v2] slub: prefetch next freelist pointer in __slab_alloc()
From: JoonSoo Kim
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Pekka Enberg
Cc: Christoph Lameter , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Eric Dumazet

2012/7/4 Pekka Enberg :
> Well, can you show improvement in any benchmark or workload?
> Prefetching is not always an obvious win and the reason we merged
> Eric's patch was that he was able to show an improvement in hackbench.

I think this patch has exactly the same effect as Eric's patch,
so I didn't include a benchmark result.
Eric's patch, which adds a prefetch instruction in the fastpath,
works for the second through last objects of the cpu slab.
This patch, which adds a prefetch instruction in the slowpath,
works for the first object of the cpu slab.

Still, I ran "./perf stat -r 20 ./hackbench 50 process 4000 > /dev/null"
and got the following outputs.

***** vanilla *****

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     114189.571311 task-clock                #    7.924 CPUs utilized            ( +-  0.29% )
         2,978,515 context-switches          #    0.026 M/sec                    ( +-  3.45% )
           102,635 CPU-migrations            #    0.899 K/sec                    ( +-  5.63% )
           123,948 page-faults               #    0.001 M/sec                    ( +-  0.16% )
   422,477,120,134 cycles                    #    3.700 GHz                      ( +-  0.29% )
                   stalled-cycles-frontend
                   stalled-cycles-backend
   251,943,851,074 instructions              #    0.60  insns per cycle          ( +-  0.14% )
    46,214,207,979 branches                  #  404.715 M/sec                    ( +-  0.15% )
       215,342,095 branch-misses             #    0.47% of all branches          ( +-  0.53% )

      14.409990448 seconds time elapsed                                          ( +-  0.30% )

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     114576.053284 task-clock                #    7.921 CPUs utilized            ( +-  0.35% )
         2,810,138 context-switches          #    0.025 M/sec                    ( +-  3.21% )
            85,641 CPU-migrations            #    0.747 K/sec                    ( +-  5.05% )
           124,299 page-faults               #    0.001 M/sec                    ( +-  0.18% )
   423,906,539,517 cycles                    #    3.700 GHz                      ( +-  0.35% )
                   stalled-cycles-frontend
                   stalled-cycles-backend
   251,354,351,283 instructions              #    0.59  insns per cycle          ( +-  0.13% )
    46,098,601,012 branches                  #  402.341 M/sec                    ( +-  0.13% )
       213,448,657 branch-misses             #    0.46% of all branches          ( +-  0.50% )

      14.464325969 seconds time elapsed                                          ( +-  0.34% )

***** patch applied *****

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     112935.199731 task-clock                #    7.926 CPUs utilized            ( +-  0.29% )
         2,810,157 context-switches          #    0.025 M/sec                    ( +-  2.95% )
           104,278 CPU-migrations            #    0.923 K/sec                    ( +-  6.83% )
           123,999 page-faults               #    0.001 M/sec                    ( +-  0.17% )
   417,834,406,420 cycles                    #    3.700 GHz                      ( +-  0.29% )
                   stalled-cycles-frontend
                   stalled-cycles-backend
   251,291,523,926 instructions              #    0.60  insns per cycle          ( +-  0.11% )
    46,083,091,476 branches                  #  408.049 M/sec                    ( +-  0.12% )
       213,714,228 branch-misses             #    0.46% of all branches          ( +-  0.43% )

      14.248980376 seconds time elapsed                                          ( +-  0.29% )

 Performance counter stats for './hackbench 50 process 4000' (20 runs):

     113640.944855 task-clock                #    7.926 CPUs utilized            ( +-  0.28% )
         2,776,983 context-switches          #    0.024 M/sec                    ( +-  5.66% )
            95,962 CPU-migrations            #    0.844 K/sec                    ( +- 10.69% )
           123,849 page-faults               #    0.001 M/sec                    ( +-  0.15% )
   420,446,572,595 cycles                    #    3.700 GHz                      ( +-  0.28% )
                   stalled-cycles-frontend
                   stalled-cycles-backend
   251,174,259,429 instructions              #    0.60  insns per cycle          ( +-  0.21% )
    46,060,683,039 branches                  #  405.318 M/sec                    ( +-  0.23% )
       213,480,999 branch-misses             #    0.46% of all branches          ( +-  0.75% )

      14.336843534 seconds time elapsed                                          ( +-  0.28% )

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
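
To make the four perf stat reports in this message easier to compare, here is a rough sketch that averages the "seconds time elapsed" and "cycles" figures for each kernel and prints the relative difference. The numbers are copied from the reports; the dictionary layout and the helper name `relative_gain` are only illustrative choices, not part of perf or hackbench.

```python
from statistics import mean

# Values copied from the perf stat reports in this message
# (two reports per kernel, each report already averaging 20 runs).
elapsed = {
    "vanilla": [14.409990448, 14.464325969],
    "patched": [14.248980376, 14.336843534],
}
cycles = {
    "vanilla": [422_477_120_134, 423_906_539_517],
    "patched": [417_834_406_420, 420_446_572_595],
}


def relative_gain(samples):
    """Percent reduction of the patched mean relative to the vanilla mean.

    Hypothetical helper for illustration: a positive result means the
    patched kernel used less of the measured quantity.
    """
    base = mean(samples["vanilla"])
    new = mean(samples["patched"])
    return (base - new) / base * 100.0


elapsed_gain = relative_gain(elapsed)
cycles_gain = relative_gain(cycles)

print(f"elapsed time reduction: {elapsed_gain:.2f}%")
print(f"cycles reduction:       {cycles_gain:.2f}%")
```

Both reductions come out at roughly 1%, against a reported variation of about +- 0.3% per report. With only two reports per kernel, this should be read as a hint of an improvement rather than a firm result.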