Date: Mon, 11 Oct 2021 10:33:02 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka
Cc: David Rientjes, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton
Subject: Perf and Hackbench results on my machine
Message-ID: <20211011103302.GA65713@kvm.asia-northeast3-a.c.our-ratio-313919.internal>
References: <20211008133602.4963-1-42.hyeyoo@gmail.com>
    <30a76d87-e0af-3eec-d095-d87e898b31cf@google.com>
    <904b6e72-cc2e-2e4d-5601-dacab734bf15@suse.cz>
In-Reply-To: <904b6e72-cc2e-2e4d-5601-dacab734bf15@suse.cz>

Hello Vlastimil.

On Mon, Oct 11, 2021 at 09:21:01AM +0200, Vlastimil Babka wrote:
> On 10/11/21 00:49, David Rientjes wrote:
> > On Fri, 8 Oct 2021, Hyeonggon Yoo wrote:
> >
> >> It's certain that an object will be not only read, but also
> >> written after allocation.
> >>
> >
> > Why is it certain? I think perhaps what you meant to say is that if we
> > are doing any prefetching here, then access will benefit from prefetchw
> > instead of prefetch. But it's not "certain" that allocated memory will be
> > accessed at all.
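
To make the prefetch vs. prefetchw distinction mentioned above concrete,
here is a minimal userspace sketch. It is not the actual SLUB code. It
assumes GCC or Clang, where __builtin_prefetch(addr, 0) hints a read and
__builtin_prefetch(addr, 1) hints a write. The object layout, the pool
size and the names pool_init, pool_alloc and pool_free are hypothetical
and exist only for illustration; the free pointer sits at offset 0 of
the object to keep things simple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout: while an object is free, its first word
 * stores the pointer to the next free object (an in-object freelist). */
struct object {
	struct object *next_free;	/* meaningful only while free */
	char payload[56];
};

#define POOL_SIZE 8

static struct object pool[POOL_SIZE];
static struct object *freelist;

/* Thread every object of the pool onto the freelist, in index order. */
void pool_init(void)
{
	for (size_t i = 0; i < POOL_SIZE; i++)
		pool[i].next_free = (i + 1 < POOL_SIZE) ? &pool[i + 1] : NULL;
	freelist = &pool[0];
}

/*
 * Pop one object. After the pop, hint the cache line holding the free
 * pointer of the new list head, because the next allocation will read it.
 * write_hint == 0 models prefetch(), write_hint != 0 models prefetchw().
 * The second argument of __builtin_prefetch must be a compile-time
 * constant, hence the two separate calls.
 */
struct object *pool_alloc(int write_hint)
{
	struct object *obj = freelist;

	if (!obj)
		return NULL;

	freelist = obj->next_free;
	if (freelist) {
		if (write_hint)
			__builtin_prefetch(freelist, 1);
		else
			__builtin_prefetch(freelist, 0);
	}
	return obj;
}

/* Push an object back; this writes the free pointer inside the object. */
void pool_free(struct object *obj)
{
	assert(obj != NULL);
	obj->next_free = freelist;
	freelist = obj;
}
```

Calling pool_alloc(1) instead of pool_alloc(0) changes only the hint,
not the result. Whether the write hint pays off depends on whether the
caller actually writes to that cache line soon after allocation.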
>
> I think the primary reason there's a prefetch is freelist traversal. The
> cacheline we prefetch will be read during the next allocation, so if we
> expect there to be one soon, prefetch might help.

I agree with that.

> That the freepointer is
> part of object itself and thus the cache line will be probably accessed also
> after the allocation, is secondary.

Right. It depends on the cache line size and on whether the first cache
line of an object is frequently accessed.

> Yeah this might help some workloads, but
> perhaps hurt others - these things might look obvious in theory but be
> rather unpredictable in practice. At least some hackbench results would help...
>

Below are my measurements. It seems prefetch(w) is not making things
worse, at least on hackbench.

Measured on 16 CPUs (ARM64) / 16G RAM

Without prefetch:

Time: 91.989

 Performance counter stats for 'hackbench -g 100 -l 10000':

        1467926.03 msec cpu-clock                # 15.907 CPUs utilized
          17782076      context-switches         # 12.114 K/sec
            957523      cpu-migrations           # 652.296 /sec
            104561      page-faults              # 71.230 /sec
     1622117569931      cycles                   # 1.105 GHz                         (54.54%)
     2002981132267      instructions             # 1.23 insn per cycle               (54.32%)
        5600876429      branch-misses                                                (54.28%)
      642657442307      cache-references         # 437.800 M/sec                     (54.27%)
       19404890844      cache-misses             # 3.019 % of all cache refs         (54.28%)
      640413686039      L1-dcache-loads          # 436.271 M/sec                     (46.85%)
       19110650580      L1-dcache-load-misses    # 2.98% of all L1-dcache accesses   (46.83%)
      651556334841      dTLB-loads               # 443.862 M/sec                     (46.63%)
        3193647402      dTLB-load-misses         # 0.49% of all dTLB cache accesses  (46.84%)
      538927659684      iTLB-loads               # 367.135 M/sec                     (54.31%)
         118503839      iTLB-load-misses         # 0.02% of all iTLB cache accesses  (54.35%)
      625750168840      L1-icache-loads          # 426.282 M/sec                     (46.80%)
       24348083282      L1-icache-load-misses    # 3.89% of all L1-icache accesses   (46.78%)

      92.284351157 seconds time elapsed

      44.524693000 seconds user
    1426.214006000 seconds sys

With prefetch:

Time: 91.677

 Performance counter stats for 'hackbench -g 100 -l 10000':

        1462938.07 msec cpu-clock                # 15.908 CPUs utilized
          18072550      context-switches         # 12.354 K/sec
           1018814      cpu-migrations           # 696.416 /sec
            104558      page-faults              # 71.471 /sec
     2003670016013      instructions             # 1.27 insn per cycle               (54.31%)
        5702204863      branch-misses                                                (54.28%)
      643368500985      cache-references         # 439.778 M/sec                     (54.26%)
       18475582235      cache-misses             # 2.872 % of all cache refs         (54.28%)
      642206796636      L1-dcache-loads          # 438.984 M/sec                     (46.87%)
       18215813147      L1-dcache-load-misses    # 2.84% of all L1-dcache accesses   (46.83%)
      653842996501      dTLB-loads               # 446.938 M/sec                     (46.63%)
        3227179675      dTLB-load-misses         # 0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads               # 367.433 M/sec                     (54.33%)
         114750630      iTLB-load-misses         # 0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads          # 430.733 M/sec                     (46.80%)
       22923237620      L1-icache-load-misses    # 3.64% of all L1-icache accesses   (46.76%)

      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys

With prefetchw:

Time: 90.220

 Performance counter stats for 'hackbench -g 100 -l 10000':

        1437418.48 msec cpu-clock                # 15.880 CPUs utilized
          17694068      context-switches         # 12.310 K/sec
            958257      cpu-migrations           # 666.651 /sec
            100604      page-faults              # 69.989 /sec
     1583259429428      cycles                   # 1.101 GHz                         (54.57%)
     2004002484935      instructions             # 1.27 insn per cycle               (54.37%)
        5594202389      branch-misses                                                (54.36%)
      643113574524      cache-references         # 447.409 M/sec                     (54.39%)
       18233791870      cache-misses             # 2.835 % of all cache refs         (54.37%)
      640205852062      L1-dcache-loads          # 445.386 M/sec                     (46.75%)
       17968160377      L1-dcache-load-misses    # 2.81% of all L1-dcache accesses   (46.79%)
      651747432274      dTLB-loads               # 453.415 M/sec                     (46.59%)
        3127124271      dTLB-load-misses         # 0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads               # 372.470 M/sec                     (54.38%)
         113500056      iTLB-load-misses         # 0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads          # 437.501 M/sec                     (46.80%)
       22585641203      L1-icache-load-misses    # 3.59% of all L1-icache accesses   (46.79%)

      90.514819303 seconds time elapsed

      43.877656000 seconds user
    1397.176001000 seconds sys

Thanks,
Hyeonggon
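
For comparing the three runs, the relative differences can be computed
with a small helper like the one below. This is only a sketch: the
struct and function names are made up for illustration, and the
constants are the "seconds time elapsed" and "cache-misses" values
copied from the output above. Each configuration was measured once and
the counters were multiplexed (the percentages in parentheses), so the
results should be read as indicative only.

```c
#include <assert.h>
#include <stdio.h>

/* One entry per configuration; values copied from the perf output. */
struct run {
	const char *name;
	double elapsed_sec;	/* "seconds time elapsed" */
	double cache_misses;	/* "cache-misses" counter */
};

static const struct run runs[] = {
	{ "no prefetch", 92.284351157, 19404890844.0 },
	{ "prefetch",    91.964452802, 18475582235.0 },
	{ "prefetchw",   90.514819303, 18233791870.0 },
};

/* Relative change of value against base in percent; negative is lower. */
double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/* Print every run's change relative to runs[0], the baseline. */
void report(void)
{
	size_t n = sizeof(runs) / sizeof(runs[0]);

	for (size_t i = 1; i < n; i++)
		printf("%-12s elapsed %+.2f%%  cache-misses %+.2f%%\n",
		       runs[i].name,
		       pct_change(runs[0].elapsed_sec, runs[i].elapsed_sec),
		       pct_change(runs[0].cache_misses, runs[i].cache_misses));
}
```

With these inputs, elapsed time comes out roughly 0.3% lower with
prefetch and roughly 1.9% lower with prefetchw than without prefetch.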