From: Suren Baghdasaryan
Date: Tue, 28 Jan 2025 18:54:42 -0800
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key in slab allocator
To: Steven Rostedt
Cc: Vlastimil Babka, akpm@linux-foundation.org, Peter Zijlstra,
    kent.overstreet@linux.dev, yuzhao@google.com, minchan@google.com,
    shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
    00107082@163.com, quic_zhenhuah@quicinc.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
References: <20250126070206.381302-1-surenb@google.com>
    <20250126070206.381302-2-surenb@google.com>
    <20250128143549.62f458a7@gandalf.local.home>

On Tue, Jan 28, 2025 at 3:43 PM Suren Baghdasaryan wrote:
>
> On Tue, Jan 28, 2025 at 11:35 AM Steven Rostedt wrote:
> >
> > On Mon, 27 Jan 2025 11:38:32 -0800
> > Suren Baghdasaryan wrote:
> >
> > > On Sun, Jan 26, 2025 at 8:47 AM Vlastimil Babka wrote:
> > > >
> > > > On 1/26/25 08:02, Suren Baghdasaryan wrote:
> > > > > When a sizable code section is protected by a disabled static key, that
> > > > > code gets into the instruction cache even though it's not executed and
> > > > > consumes the cache, increasing cache misses. This can be remedied by
> > > > > moving such code into a separate uninlined function. The improvement
> > >
> > > Sorry, I missed adding Steven Rostedt into the CC list since his
> > > advice was instrumental in finding the way to optimize the static key
> > > performance in this patch. Added now.
> > >
> > > >
> > > > Weird, I thought the static_branch_likely/unlikely/maybe was already
> > > > handling this by the unlikely case being a jump to a block away from the
> > > > fast-path stream of instructions, thus making it less likely to get cached.
> > > > AFAIU even plain likely()/unlikely() should do this, along with branch
> > > > prediction hints.
> > >
> > > This was indeed an unexpected overhead when I measured it on Android.
> > > Cache pollution was my understanding of the cause for this high
> > > overhead after Steven told me to try uninlining the protected code. He
> > > has done something similar in the tracing subsystem. But maybe I
> > > misunderstood the real reason. Steven, could you please verify if my
> > > understanding of the high overhead cause is correct here? Maybe there
> > > is something else at play that I missed?
> >
> > From what I understand, is that the compiler will only move code to the end
> > of a function with the unlikely(). But, the code after the function could
> > also be in the control flow path. If you have several functions that are
> > called together, by adding code to the unlikely() cases may not help the
> > speed.
> >
> > I made an effort to make the tracepoint code call functions instead of
> > having everything inlined. It actually brought down the size of the text of
> > the kernel, but looking in the change logs I never posted benchmarks. But
> > I'm sure making the size of the scheduler text section smaller probably did
> > help.
> >
> > > > That would be in line with my understanding above. Does the arm64 compiler
> > > > not do it as well as x86 (could be maybe found out by disassembling) or the
> > > > Pixel6 cpu somehow caches these out of line blocks more aggressively and only
> > > > a function call stops it?
> > >
> > > I'll disassemble the code and will see what it looks like.
> >
> > I think I asked you to do that too ;-)
>
> Yes you did! And I disassembled almost each of these functions during
> my investigation but in my infinite wisdom I did not save any of them.
> So, now I need to do that again to answer Vlastimil's question. I'll
> try to do that today.

Yeah, quite a difference. This is alloc_tagging_slab_alloc_hook() with
the outlined version of __alloc_tagging_slab_alloc_hook():

ffffffc0803a2dd8 :
ffffffc0803a2dd8: d503201f    nop
ffffffc0803a2ddc: d65f03c0    ret
ffffffc0803a2de0: d503233f    paciasp
ffffffc0803a2de4: a9bf7bfd    stp x29, x30, [sp, #-0x10]!
ffffffc0803a2de8: 910003fd    mov x29, sp
ffffffc0803a2dec: 94000004    bl 0xffffffc0803a2dfc <__alloc_tagging_slab_alloc_hook>
ffffffc0803a2df0: a8c17bfd    ldp x29, x30, [sp], #0x10
ffffffc0803a2df4: d50323bf    autiasp
ffffffc0803a2df8: d65f03c0    ret

This is the same function with the inlined version of
__alloc_tagging_slab_alloc_hook():

ffffffc0803a2dd8 :
ffffffc0803a2dd8: d503233f    paciasp
ffffffc0803a2ddc: d10103ff    sub sp, sp, #0x40
ffffffc0803a2de0: a9017bfd    stp x29, x30, [sp, #0x10]
ffffffc0803a2de4: f90013f5    str x21, [sp, #0x20]
ffffffc0803a2de8: a9034ff4    stp x20, x19, [sp, #0x30]
ffffffc0803a2dec: 910043fd    add x29, sp, #0x10
ffffffc0803a2df0: d503201f    nop
ffffffc0803a2df4: a9434ff4    ldp x20, x19, [sp, #0x30]
ffffffc0803a2df8: f94013f5    ldr x21, [sp, #0x20]
ffffffc0803a2dfc: a9417bfd    ldp x29, x30, [sp, #0x10]
ffffffc0803a2e00: 910103ff    add sp, sp, #0x40
ffffffc0803a2e04: d50323bf    autiasp
ffffffc0803a2e08: d65f03c0    ret
ffffffc0803a2e0c: b4ffff41    cbz x1, 0xffffffc0803a2df4
ffffffc0803a2e10: b9400808    ldr w8, [x0, #0x8]
ffffffc0803a2e14: 12060049    and w9, w2, #0x4000000
ffffffc0803a2e18: 12152108    and w8, w8, #0xff800
ffffffc0803a2e1c: 120d6108    and w8, w8, #0xfff80fff
ffffffc0803a2e20: 2a090108    orr w8, w8, w9
ffffffc0803a2e24: 35fffe88    cbnz w8, 0xffffffc0803a2df4
ffffffc0803a2e28: d378dc28    lsl x8, x1, #8
ffffffc0803a2e2c: d2c01009    mov x9, #0x8000000000 // =549755813888
ffffffc0803a2e30: f9000fa0    str x0, [x29, #0x18]
ffffffc0803a2e34: f90007e1    str x1, [sp, #0x8]
ffffffc0803a2e38: 8b882128    add x8, x9, x8, asr #8
ffffffc0803a2e3c: b25f7be9    mov x9, #-0x200000000 // =-8589934592
ffffffc0803a2e40: f2b80009    movk x9, #0xc000, lsl #16
ffffffc0803a2e44: d34cfd08    lsr x8, x8, #12
ffffffc0803a2e48: 8b081928    add x8, x9, x8, lsl #6
ffffffc0803a2e4c: f9400509    ldr x9, [x8, #0x8]
ffffffc0803a2e50: d100052a    sub x10, x9, #0x1
ffffffc0803a2e54: 7200013f    tst w9, #0x1
ffffffc0803a2e58: 9a8a0108    csel x8, x8, x10, eq
ffffffc0803a2e5c: 3940cd09    ldrb w9, [x8, #0x33]
ffffffc0803a2e60: 7103d53f    cmp w9, #0xf5
ffffffc0803a2e64: 9a9f0113    csel x19, x8, xzr, eq
ffffffc0803a2e68: f9401e68    ldr x8, [x19, #0x38]
ffffffc0803a2e6c: f1001d1f    cmp x8, #0x7
ffffffc0803a2e70: 540000a8    b.hi 0xffffffc0803a2e84
ffffffc0803a2e74: aa1303e0    mov x0, x19
ffffffc0803a2e78: 2a1f03e3    mov w3, wzr
ffffffc0803a2e7c: 97ffd6a5    bl 0xffffffc080398910
ffffffc0803a2e80: 350009c0    cbnz w0, 0xffffffc0803a2fb8
ffffffc0803a2e84: b000f2c8    adrp x8, 0xffffffc0821fb000
ffffffc0803a2e88: f9401e6a    ldr x10, [x19, #0x38]
ffffffc0803a2e8c: f9453909    ldr x9, [x8, #0xa70]
ffffffc0803a2e90: 927df148    and x8, x10, #0xfffffffffffffff8
ffffffc0803a2e94: b40000e9    cbz x9, 0xffffffc0803a2eb0
ffffffc0803a2e98: f94007ea    ldr x10, [sp, #0x8]
ffffffc0803a2e9c: cb090149    sub x9, x10, x9
ffffffc0803a2ea0: f142013f    cmp x9, #0x80, lsl #12 // =0x80000
ffffffc0803a2ea4: 54000062    b.hs 0xffffffc0803a2eb0
ffffffc0803a2ea8: aa1f03e9    mov x9, xzr
ffffffc0803a2eac: 14000015    b 0xffffffc0803a2f00
ffffffc0803a2eb0: d2ffe009    mov x9, #-0x100000000000000 // =-72057594037927936
ffffffc0803a2eb4: 14000002    b 0xffffffc0803a2ebc
ffffffc0803a2eb8: aa1f03e9    mov x9, xzr
ffffffc0803a2ebc: d2dffa0a    mov x10, #0xffd000000000 // =281268818280448
ffffffc0803a2ec0: f2e01fea    movk x10, #0xff, lsl #48
ffffffc0803a2ec4: 8b13194a    add x10, x10, x19, lsl #6
ffffffc0803a2ec8: 9274ad4a    and x10, x10, #0xfffffffffff000
ffffffc0803a2ecc: aa0a012a    orr x10, x9, x10
ffffffc0803a2ed0: f9400fa9    ldr x9, [x29, #0x18]
ffffffc0803a2ed4: f940112b    ldr x11, [x9, #0x20]
ffffffc0803a2ed8: f94007e9    ldr x9, [sp, #0x8]
ffffffc0803a2edc: cb0a0129    sub x9, x9, x10
ffffffc0803a2ee0: d360fd6c    lsr x12, x11, #32
ffffffc0803a2ee4: 9bab7d2a    umull x10, w9, w11
ffffffc0803a2ee8: d368fd6b    lsr x11, x11, #40
ffffffc0803a2eec: d360fd4a    lsr x10, x10, #32
ffffffc0803a2ef0: 4b0a0129    sub w9, w9, w10
ffffffc0803a2ef4: 1acc2529    lsr w9, w9, w12
ffffffc0803a2ef8: 0b0a0129    add w9, w9, w10
ffffffc0803a2efc: 1acb2529    lsr w9, w9, w11
ffffffc0803a2f00: ab091109    adds x9, x8, x9, lsl #4
ffffffc0803a2f04: f9400fa8    ldr x8, [x29, #0x18]
ffffffc0803a2f08: 54fff760    b.eq 0xffffffc0803a2df4
ffffffc0803a2f0c: b1002129    adds x9, x9, #0x8
ffffffc0803a2f10: 54fff720    b.eq 0xffffffc0803a2df4
ffffffc0803a2f14: d5384113    mrs x19, SP_EL0
ffffffc0803a2f18: f9402a74    ldr x20, [x19, #0x50]
ffffffc0803a2f1c: b4fff6d4    cbz x20, 0xffffffc0803a2df4
ffffffc0803a2f20: b9401915    ldr w21, [x8, #0x18]
ffffffc0803a2f24: f9000134    str x20, [x9]
ffffffc0803a2f28: b9401268    ldr w8, [x19, #0x10]
ffffffc0803a2f2c: 11000508    add w8, w8, #0x1
ffffffc0803a2f30: b9001268    str w8, [x19, #0x10]
ffffffc0803a2f34: f9401288    ldr x8, [x20, #0x20]
ffffffc0803a2f38: d538d089    mrs x9, TPIDR_EL1
ffffffc0803a2f3c: 8b090108    add x8, x8, x9
ffffffc0803a2f40: 52800029    mov w9, #0x1 // =1
ffffffc0803a2f44: 91002108    add x8, x8, #0x8
ffffffc0803a2f48: c85f7d0b    ldxr x11, [x8]
ffffffc0803a2f4c: 8b09016b    add x11, x11, x9
ffffffc0803a2f50: c80a7d0b    stxr w10, x11, [x8]
ffffffc0803a2f54: 35ffffaa    cbnz w10, 0xffffffc0803a2f48
ffffffc0803a2f58: f9400a68    ldr x8, [x19, #0x10]
ffffffc0803a2f5c: f1000508    subs x8, x8, #0x1
ffffffc0803a2f60: b9001268    str w8, [x19, #0x10]
ffffffc0803a2f64: 540003c0    b.eq 0xffffffc0803a2fdc
ffffffc0803a2f68: f9400a68    ldr x8, [x19, #0x10]
ffffffc0803a2f6c: b4000388    cbz x8, 0xffffffc0803a2fdc
ffffffc0803a2f70: b9401268    ldr w8, [x19, #0x10]
ffffffc0803a2f74: 11000508    add w8, w8, #0x1
ffffffc0803a2f78: b9001268    str w8, [x19, #0x10]
ffffffc0803a2f7c: f9401288    ldr x8, [x20, #0x20]
ffffffc0803a2f80: d538d089    mrs x9, TPIDR_EL1
ffffffc0803a2f84: 8b080128    add x8, x9, x8
ffffffc0803a2f88: c85f7d0a    ldxr x10, [x8]
ffffffc0803a2f8c: 8b15014a    add x10, x10, x21
ffffffc0803a2f90: c8097d0a    stxr w9, x10, [x8]
ffffffc0803a2f94: 35ffffa9    cbnz w9, 0xffffffc0803a2f88
ffffffc0803a2f98: f9400a68    ldr x8, [x19, #0x10]
ffffffc0803a2f9c: f1000508    subs x8, x8, #0x1
ffffffc0803a2fa0: b9001268    str w8, [x19, #0x10]
ffffffc0803a2fa4: 54000060    b.eq 0xffffffc0803a2fb0
ffffffc0803a2fa8: f9400a68    ldr x8, [x19, #0x10]
ffffffc0803a2fac: b5fff248    cbnz x8, 0xffffffc0803a2df4
ffffffc0803a2fb0: 94344478    bl 0xffffffc0810b4190
ffffffc0803a2fb4: 17ffff90    b 0xffffffc0803a2df4
ffffffc0803a2fb8: f9400fa8    ldr x8, [x29, #0x18]
ffffffc0803a2fbc: f00092c0    adrp x0, 0xffffffc0815fd000
ffffffc0803a2fc0: 910e5400    add x0, x0, #0x395
ffffffc0803a2fc4: d00099c1    adrp x1, 0xffffffc0816dc000
ffffffc0803a2fc8: 911d1421    add x1, x1, #0x745
ffffffc0803a2fcc: f9403102    ldr x2, [x8, #0x60]
ffffffc0803a2fd0: 97f46d47    bl 0xffffffc0800be4ec <__warn_printk>
ffffffc0803a2fd4: d4210000    brk #0x800
ffffffc0803a2fd8: 17ffff87    b 0xffffffc0803a2df4
ffffffc0803a2fdc: 9434446d    bl 0xffffffc0810b4190
ffffffc0803a2fe0: 17ffffe4    b 0xffffffc0803a2f70

>
> >
> > >
> > > >
> > > > > Signed-off-by: Suren Baghdasaryan
> > > >
> > > > Kinda sad that despite the static key we have to control a lot by the
> > > > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT in addition.
> > >
> > > I agree. If there is a better way to fix this regression I'm open to
> > > changes.
> > > Let's wait for Steven to confirm my understanding before
> > > proceeding.
> >
> > How slow is it to always do the call instead of inlining?
>
> Let's see... The additional overhead if we always call is:
>
> Little core: 2.42%
> Middle core: 1.23%
> Big core: 0.66%
>
> Not a huge deal because the overhead of memory profiling when enabled
> is much higher. So, maybe for simplicity I should indeed always call?
> >
> > -- Steve
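
For readers following the thread, the pattern under discussion (a cheap
inline check that calls an out-of-line body) can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not the
kernel code: profiling_enabled is a plain bool standing in for the static
key mem_alloc_profiling_key (a real static key patches a nop into a branch
at runtime), and alloc_hook(), slow_alloc_hook(), tagged_objects and demo()
are hypothetical names. It assumes GCC or Clang for the noinline attribute
and __builtin_expect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the static key; the kernel patches code instead. */
static bool profiling_enabled = false;
/* Hypothetical counter so the slow path has an observable effect. */
static unsigned long tagged_objects;

/* Out-of-line slow path: the bulky body is emitted once, away from callers. */
__attribute__((noinline))
static void slow_alloc_hook(void *object, size_t size)
{
    (void)size;             /* a real hook would account for the size */
    if (!object)
        return;             /* nothing to tag for a failed allocation */
    tagged_objects++;
}

/* Inline fast path: each call site only gets a test and a call. */
static inline void alloc_hook(void *object, size_t size)
{
    if (__builtin_expect(profiling_enabled, 0))
        slow_alloc_hook(object, size);
}

/* Small usage check; returns the number of objects tagged. */
int demo(void)
{
    int x = 0;

    profiling_enabled = false;
    tagged_objects = 0;

    alloc_hook(&x, sizeof x);       /* flag off: slow path is skipped */
    assert(tagged_objects == 0);

    profiling_enabled = true;
    alloc_hook(&x, sizeof x);       /* flag on: slow path runs once */
    alloc_hook(NULL, sizeof x);     /* NULL object: slow path returns early */

    return (int)tagged_objects;
}
```

Building a sketch like this with optimization enabled and disassembling a
caller of alloc_hook() is one way to compare the inlined and outlined
layouts, in the same spirit as the two listings above.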