From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Date: Tue, 7 May 2024 10:29:11 -0700
Subject: Re: [PATCH 5/5] selftests/bpf: a simple benchmark tool for /proc/<pid>/maps APIs
To: Ian Rogers
Cc: Greg KH, Andrii Nakryiko, linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org, Arnaldo Carvalho de Melo, "linux-perf-use."

On Mon, May 6, 2024 at 10:06 PM Andrii Nakryiko wrote:
>
> On Mon, May 6, 2024 at 11:43 AM Ian Rogers wrote:
> >
> > On Mon, May 6, 2024 at 11:32 AM Andrii Nakryiko wrote:
> > >
> > > On Sat, May 4, 2024 at 10:09 PM Ian Rogers wrote:
> > > >
> > > > On Sat, May 4, 2024 at 2:57 PM Andrii Nakryiko wrote:
> > > > >
> > > > > On Sat, May 4, 2024 at 8:29 AM Greg KH wrote:
> > > > > >
> > > > > > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > > > > > Implement a simple tool/benchmark for comparing address "resolution"
> > > > > > > logic based on the textual /proc/<pid>/maps interface and the new binary
> > > > > > > ioctl-based PROCFS_PROCMAP_QUERY command.
> > > > > > >
> > > > > > Of course an artificial benchmark of "read a whole file" vs. "a tiny
> > > > > > ioctl" is going to be different, but step back and show how this is
> > > > > > going to be used in the real world overall. Pounding on this file is
> > > > > > not a normal operation, right?
> > > > > >
> > > > >
> > > > > It's not artificial at all. It's *exactly* what, say, the blazesym library
> > > > > is doing (see [0]; it's Rust and part of the overall library API, but I
> > > > > think the C code in this patch is way easier to follow for someone not
> > > > > familiar with the implementation of blazesym, and both implementations are
> > > > > doing exactly the same sequence of steps). You can do it even less
> > > > > efficiently by parsing the whole file, building an in-memory lookup
> > > > > table, then looking up addresses one by one. But that's even slower
> > > > > and more memory-hungry, so I didn't even bother implementing that; it
> > > > > would put /proc/<pid>/maps at an even bigger disadvantage.
> > > > >
> > > > > Other applications that deal with stack traces (including perf) would
> > > > > be doing one of those two approaches, depending on circumstances and
> > > > > the level of sophistication of the code (and sensitivity to performance).
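
As a rough illustration of the text-based approach described above (not code
from this patch or from blazesym, and simplified to resolving a single
address; real implementations batch sorted addresses and parse more fields),
the per-address resolution loop looks roughly like this:

/* Sketch only: resolve one address against the textual
 * /proc/<pid>/maps format (start-end perms offset dev inode [path]).
 */
#include <stdio.h>
#include <sys/types.h>

static int find_vma_textual(pid_t pid, unsigned long addr,
                            unsigned long *start, unsigned long *end,
                            unsigned long *file_off,
                            char *path, size_t path_sz)
{
        char fname[64], line[4096], buf[4096], perms[5];
        unsigned long s, e, off;
        FILE *f;
        int err = -1;

        snprintf(fname, sizeof(fname), "/proc/%d/maps", (int)pid);
        f = fopen(fname, "r");
        if (!f)
                return -1;

        while (fgets(line, sizeof(line), f)) {
                buf[0] = '\0';
                /* each line: start-end perms offset dev inode [path] */
                if (sscanf(line, "%lx-%lx %4s %lx %*x:%*x %*u %4095[^\n]",
                           &s, &e, perms, &off, buf) < 4)
                        continue;
                if (addr < s || addr >= e)
                        continue;
                *start = s;
                *end = e;
                *file_off = off;
                snprintf(path, path_sz, "%s", buf);
                err = 0;
                break;
        }
        fclose(f);
        return err;
}

perf's reader avoids stdio for this kind of parsing for the reasons Ian
mentions below; the point of the sketch is only how much text has to be read
and scanned to answer each lookup.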
> > > >
> > > > The code in perf doing this is here:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/synthetic-events.c#n440
> > > > The code is using the api/io.h code:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/api/io.h
> > > > Using perf to profile perf, it was observed that time was spent allocating
> > > > buffers and in locale-related activities when using stdio, so io is a
> > > > lighter-weight alternative, albeit with more verbose code than fscanf.
> > > > You could add this as an alternate /proc/<pid>/maps reader; we have a
> > > > similar benchmark in `perf bench internals synthesize`.
> > > >
> > >
> > > If I add a new implementation using this ioctl() into
> > > perf_event__synthesize_mmap_events(), will it be tested from this
> > > `perf bench internals synthesize`? I'm not too familiar with perf code
> > > organization, sorry if it's a stupid question. If not, where exactly
> > > is the code that would be triggered from the benchmark?
> >
> > Yes it would be triggered :-)
>
> Ok, I don't exactly know how to interpret the results (and what the
> benchmark is doing), but the numbers don't seem to be worse. They actually
> seem to be a bit better.
>
> I pushed my code that adds the perf integration to [0]. That commit has
> the results, but I'll post them here (with invocation parameters).
> perf-ioctl is the version with the ioctl()-based implementation,
> perf-parse is, logically, the text-parsing version. Here are the results
> (and see my notes below the results as well):
>
> TEXT-BASED
> ==========
>
> # ./perf-parse bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 80.311 usec (+- 0.077 usec)
> Average num. events: 32.000 (+- 0.000)
> Average time per event 2.510 usec
> Average data synthesis took: 84.429 usec (+- 0.066 usec)
> Average num. events: 179.000 (+- 0.000)
> Average time per event 0.472 usec
>
> # ./perf-parse bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 79.900 usec (+- 0.077 usec)
> Average num. events: 32.000 (+- 0.000)
> Average time per event 2.497 usec
> Average data synthesis took: 84.832 usec (+- 0.074 usec)
> Average num. events: 180.000 (+- 0.000)
> Average time per event 0.471 usec
>
> # ./perf-parse bench internals synthesize --mt -M 8
> # Running 'internals/synthesize' benchmark:
> Computing performance of multi threaded perf event synthesis by
> synthesizing events on CPU 0:
> Number of synthesis threads: 1
> Average synthesis took: 36338.100 usec (+- 406.091 usec)
> Average num. events: 14091.300 (+- 7.433)
> Average time per event 2.579 usec
> Number of synthesis threads: 2
> Average synthesis took: 37071.200 usec (+- 746.498 usec)
> Average num. events: 14085.900 (+- 1.900)
> Average time per event 2.632 usec
> Number of synthesis threads: 3
> Average synthesis took: 33932.300 usec (+- 626.861 usec)
> Average num. events: 14085.900 (+- 1.900)
> Average time per event 2.409 usec
> Number of synthesis threads: 4
> Average synthesis took: 33822.700 usec (+- 506.290 usec)
> Average num. events: 14099.200 (+- 8.761)
> Average time per event 2.399 usec
> Number of synthesis threads: 5
> Average synthesis took: 33348.200 usec (+- 389.771 usec)
> Average num. events: 14085.900 (+- 1.900)
> Average time per event 2.367 usec
> Number of synthesis threads: 6
> Average synthesis took: 33269.600 usec (+- 350.341 usec)
> Average num. events: 14084.000 (+- 0.000)
> Average time per event 2.362 usec
> Number of synthesis threads: 7
> Average synthesis took: 32663.900 usec (+- 338.870 usec)
> Average num. events: 14085.900 (+- 1.900)
> Average time per event 2.319 usec
> Number of synthesis threads: 8
> Average synthesis took: 32748.400 usec (+- 285.450 usec)
> Average num. events: 14085.900 (+- 1.900)
> Average time per event 2.325 usec
>
> IOCTL-BASED
> ===========
>
> # ./perf-ioctl bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 72.996 usec (+- 0.076 usec)
> Average num. events: 31.000 (+- 0.000)
> Average time per event 2.355 usec
> Average data synthesis took: 79.067 usec (+- 0.074 usec)
> Average num. events: 178.000 (+- 0.000)
> Average time per event 0.444 usec
>
> # ./perf-ioctl bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 73.921 usec (+- 0.073 usec)
> Average num. events: 31.000 (+- 0.000)
> Average time per event 2.385 usec
> Average data synthesis took: 80.545 usec (+- 0.070 usec)
> Average num. events: 178.000 (+- 0.000)
> Average time per event 0.453 usec
>
> # ./perf-ioctl bench internals synthesize --mt -M 8
> # Running 'internals/synthesize' benchmark:
> Computing performance of multi threaded perf event synthesis by
> synthesizing events on CPU 0:
> Number of synthesis threads: 1
> Average synthesis took: 35609.500 usec (+- 428.576 usec)
> Average num. events: 14040.700 (+- 1.700)
> Average time per event 2.536 usec
> Number of synthesis threads: 2
> Average synthesis took: 34293.800 usec (+- 453.811 usec)
> Average num. events: 14040.700 (+- 1.700)
> Average time per event 2.442 usec
> Number of synthesis threads: 3
> Average synthesis took: 32385.200 usec (+- 363.106 usec)
> Average num. events: 14040.700 (+- 1.700)
> Average time per event 2.307 usec
> Number of synthesis threads: 4
> Average synthesis took: 33113.100 usec (+- 553.931 usec)
> Average num. events: 14054.500 (+- 11.469)
> Average time per event 2.356 usec
> Number of synthesis threads: 5
> Average synthesis took: 31600.600 usec (+- 297.349 usec)
> Average num. events: 14012.500 (+- 4.590)
> Average time per event 2.255 usec
> Number of synthesis threads: 6
> Average synthesis took: 32309.900 usec (+- 472.225 usec)
> Average num. events: 14004.000 (+- 0.000)
> Average time per event 2.307 usec
> Number of synthesis threads: 7
> Average synthesis took: 31400.100 usec (+- 206.261 usec)
> Average num. events: 14004.800 (+- 0.800)
> Average time per event 2.242 usec
> Number of synthesis threads: 8
> Average synthesis took: 31601.400 usec (+- 303.350 usec)
> Average num. events: 14005.700 (+- 1.700)
> Average time per event 2.256 usec
>
> I also double-checked (using strace) that it does what it is supposed
> to do, and it seems like everything checks out. Here's the text-based
> strace log:
>
> openat(AT_FDCWD, "/proc/35876/task/35876/maps", O_RDONLY) = 3
> read(3, "00400000-0040c000 r--p 00000000 "..., 8192) = 3997
> read(3, "7f519d4d3000-7f519d516000 r--p 0"..., 8192) = 4025
> read(3, "7f519dc3d000-7f519dc44000 r-xp 0"..., 8192) = 4048
> read(3, "7f519dd2d000-7f519dd2f000 r--p 0"..., 8192) = 4017
> read(3, "7f519dff6000-7f519dff8000 r--p 0"..., 8192) = 2744
> read(3, "", 8192) = 0
> close(3) = 0
>
> BTW, note how the kernel doesn't serve more than 4KB of data, even
> though perf provides an 8KB buffer (that's to Greg's question about
> optimizing using bigger buffers; I suspect that without seq_file
> changes it won't work).
>
> And here's an abbreviated log for the ioctl version; it has lots more
> (but much faster) ioctl() syscalls, given it dumps everything:
>
> openat(AT_FDCWD, "/proc/36380/task/36380/maps", O_RDONLY) = 3
> ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x9f, 0x1, 0x60), 0x7fff6b603d50) = 0
> ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x9f, 0x1, 0x60), 0x7fff6b603d50) = 0
>
> ... 195 ioctl() calls in total ...
>
> ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x9f, 0x1, 0x60), 0x7fff6b603d50) = 0
> ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x9f, 0x1, 0x60), 0x7fff6b603d50) = 0
> ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x9f, 0x1, 0x60), 0x7fff6b603d50) = 0
> ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x9f, 0x1, 0x60), 0x7fff6b603d50) = -1 ENOENT (No such file or directory)
> close(3) = 0
>
> So, it's not the optimal usage of this API, and yet it's still better
> (or at least not worse) than the text-based API.
>
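
For reference, the loop behind that ioctl trace is roughly the sketch below.
The struct layout, field names, and command macro are placeholders, not the
exact uAPI from this series (the trace above shows a 0x60-byte argument, so
the real query struct is larger than this one):

/* Sketch of dumping all VMAs by repeatedly asking for "the VMA covering
 * this address, or the next one after it". All names below are
 * placeholders, not the uAPI proposed in this patch series.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <unistd.h>

struct procmap_query_sketch {
        uint64_t query_addr;   /* in: address to query */
        uint64_t query_flags;  /* in: e.g. covering-or-next; a hypothetical
                                * file-backed-only filter (discussed below)
                                * would also go here */
        uint64_t vma_start;    /* out */
        uint64_t vma_end;      /* out */
        uint64_t vma_flags;    /* out */
};

/* placeholder command number for the sketch */
#define PROCMAP_QUERY_SKETCH _IOWR(0x9f, 1, struct procmap_query_sketch)

static void dump_vmas(int pid)
{
        struct procmap_query_sketch q = {};
        char path[64];
        uint64_t addr = 0;
        int fd;

        snprintf(path, sizeof(path), "/proc/%d/maps", pid);
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return;

        for (;;) {
                q.query_addr = addr;
                if (ioctl(fd, PROCMAP_QUERY_SKETCH, &q) < 0)
                        break; /* ENOENT past the last VMA, as in the log */
                printf("%llx-%llx\n", (unsigned long long)q.vma_start,
                       (unsigned long long)q.vma_end);
                addr = q.vma_end; /* continue from the end of this VMA */
        }
        close(fd);
}

One query per VMA is what produces the ~195 ioctl() calls in the log above;
a consumer that only queries the addresses it actually needs to resolve would
issue far fewer.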
In another reply to Arnaldo on patch #2 I mentioned the idea of allowing
iteration over only file-backed VMAs (as that seems to be all that
symbolizers care about, but I might be wrong here). So I tried that
quickly, given it's a trivial addition to my code. See the results below
(it is slightly faster, but not by much, because most VMAs in this
benchmark happen to be file-backed anyway); I'm including them just for
completeness. I'm not sure whether that would be useful to perf, so
please let me know.

As I mentioned above, it's not radically faster in this perf benchmark,
because we still request about 170 VMAs (vs ~195 if we iterate *all* of
them), so it's not a big change. The ratio will vary depending on what
the process is doing, of course.

Anyways, just for completeness, I'm not sure whether I should add this
"filter" to the actual implementation.

# ./perf-filebacked bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 65.759 usec (+- 0.063 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 2.192 usec
Average data synthesis took: 73.840 usec (+- 0.080 usec)
Average num. events: 153.000 (+- 0.000)
Average time per event 0.483 usec

# ./perf-filebacked bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 66.245 usec (+- 0.059 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 2.208 usec
Average data synthesis took: 70.627 usec (+- 0.074 usec)
Average num. events: 153.000 (+- 0.000)
Average time per event 0.462 usec

# ./perf-filebacked bench internals synthesize --mt -M 8
# Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 1
Average synthesis took: 33477.500 usec (+- 556.102 usec)
Average num. events: 10125.700 (+- 1.620)
Average time per event 3.306 usec
Number of synthesis threads: 2
Average synthesis took: 30473.700 usec (+- 221.933 usec)
Average num. events: 10127.000 (+- 0.000)
Average time per event 3.009 usec
Number of synthesis threads: 3
Average synthesis took: 29775.200 usec (+- 315.212 usec)
Average num. events: 10128.700 (+- 0.667)
Average time per event 2.940 usec
Number of synthesis threads: 4
Average synthesis took: 29477.100 usec (+- 621.258 usec)
Average num. events: 10129.000 (+- 0.000)
Average time per event 2.910 usec
Number of synthesis threads: 5
Average synthesis took: 29777.900 usec (+- 294.710 usec)
Average num. events: 10144.700 (+- 11.597)
Average time per event 2.935 usec
Number of synthesis threads: 6
Average synthesis took: 27774.700 usec (+- 357.569 usec)
Average num. events: 10158.500 (+- 14.710)
Average time per event 2.734 usec
Number of synthesis threads: 7
Average synthesis took: 27437.200 usec (+- 233.626 usec)
Average num. events: 10135.700 (+- 2.700)
Average time per event 2.707 usec
Number of synthesis threads: 8
Average synthesis took: 28784.600 usec (+- 477.630 usec)
Average num. events: 10133.000 (+- 0.000)
Average time per event 2.841 usec

> [0] https://github.com/anakryiko/linux/commit/0841fe675ed30f5605c5b228e18f5612ea253b35
>
>
> >
> > Thanks,
> > Ian
> >
> > > > Thanks,
> > > > Ian
> > > >
> > > > > [0] https://github.com/libbpf/blazesym/blob/ee9b48a80c0b4499118a1e8e5d901cddb2b33ab1/src/normalize/user.rs#L193
> > > > >
> > > > > > thanks,
> > > > > >
> > > > > > greg k-h
> > > > >