From mboxrd@z Thu Jan  1 00:00:00 1970
From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 28 Jan 2025 15:43:13 -0800
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key in slab allocator
To: Steven Rostedt
Cc: Vlastimil Babka, akpm@linux-foundation.org, Peter Zijlstra,
	kent.overstreet@linux.dev, yuzhao@google.com, minchan@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	00107082@163.com, quic_zhenhuah@quicinc.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20250128143549.62f458a7@gandalf.local.home>
References: <20250126070206.381302-1-surenb@google.com> <20250126070206.381302-2-surenb@google.com> <20250128143549.62f458a7@gandalf.local.home>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jan 28, 2025 at 11:35 AM Steven Rostedt wrote:
>
> On Mon, 27 Jan 2025 11:38:32 -0800
> Suren Baghdasaryan wrote:
>
> > On Sun, Jan 26, 2025 at 8:47 AM Vlastimil Babka wrote:
> > >
> > > On 1/26/25 08:02, Suren Baghdasaryan wrote:
> > > > When a sizable code section is protected by a disabled static key, that
> > > > code gets into the instruction cache even though it's not executed and
> > > > consumes the cache, increasing cache misses. This can be remedied by
> > > > moving such code into a separate uninlined function. The improvement
> >
> > Sorry, I missed adding Steven Rostedt to the CC list, since his
> > advice was instrumental in finding the way to optimize the static key
> > performance in this patch. Added now.
> >
> > >
> > > Weird, I thought the static_branch_likely/unlikely/maybe was already
> > > handling this by the unlikely case being a jump to a block away from the
> > > fast-path stream of instructions, thus making it less likely to get cached.
> > > AFAIU even plain likely()/unlikely() should do this, along with branch
> > > prediction hints.
> >
> > This was indeed an unexpected overhead when I measured it on Android.
> > Cache pollution was my understanding of the cause of this high
> > overhead after Steven told me to try uninlining the protected code. He
> > has done something similar in the tracing subsystem. But maybe I
> > misunderstood the real reason. Steven, could you please verify whether my
> > understanding of the cause of the high overhead is correct here? Maybe
> > there is something else at play that I missed?
>
> From what I understand, the compiler will only move code to the end
> of a function with the unlikely(). But the code after the function could
> also be in the control flow path. If you have several functions that are
> called together, adding code to the unlikely() cases may not help the
> speed.
>
> I made an effort to make the tracepoint code call functions instead of
> having everything inlined. It actually brought down the size of the text
> of the kernel, but looking in the change logs I never posted benchmarks.
> But I'm sure making the scheduler text section smaller probably did help.
>
> > > That would be in line with my understanding above. Does the arm64
> > > compiler not do it as well as x86 (could maybe be found out by
> > > disassembling), or does the Pixel 6 CPU somehow cache these out-of-line
> > > blocks more aggressively, so that only a function call stops it?
> >
> > I'll disassemble the code and see what it looks like.
>
> I think I asked you to do that too ;-)

Yes, you did! And I disassembled almost every one of these functions during
my investigation, but in my infinite wisdom I did not save any of them. So
now I need to do that again to answer Vlastimil's question. I'll try to do
that today.
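To make it concrete while I redo the disassembly, the layout difference I
have in mind is roughly the following. This is only a sketch with made-up
names (alloc_hook_*, a stand-in per-CPU counter), not the actual slab hooks
from the patch:

#include <linux/jump_label.h>
#include <linux/percpu.h>
#include <linux/types.h>

DEFINE_STATIC_KEY_FALSE(mem_profiling_key);
static DEFINE_PER_CPU(unsigned long, profiled_bytes);

/*
 * Before: the gated body is inlined into every call site, so its
 * instructions occupy I-cache lines even while the key is disabled
 * and the branch is patched out.
 */
static inline void alloc_hook_inlined(size_t size)
{
	if (static_branch_unlikely(&mem_profiling_key)) {
		/* stand-in for the sizable accounting code */
		this_cpu_add(profiled_bytes, size);
	}
}

/*
 * After: the body lives in a single out-of-line copy; each call site
 * shrinks to the patched-out branch plus a call instruction.
 */
static noinline void __alloc_hook_slowpath(size_t size)
{
	this_cpu_add(profiled_bytes, size);
}

static inline void alloc_hook_uninlined(size_t size)
{
	if (static_branch_unlikely(&mem_profiling_key))
		__alloc_hook_slowpath(size);
}

The open question is why the first form costs so much more on this hardware
with the key disabled, given that the gated block should already be placed
out of the fast path.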
>
> > > >
> > > > Signed-off-by: Suren Baghdasaryan
> > >
> > > Kinda sad that despite the static key we have to control a lot by the
> > > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT in addition.
> >
> > I agree. If there is a better way to fix this regression I'm open to
> > changes. Let's wait for Steven to confirm my understanding before
> > proceeding.
>
> How slow is it to always do the call instead of inlining?

Let's see... The additional overhead if we always call is:

Little core: 2.42%
Middle core: 1.23%
Big core:    0.66%

Not a huge deal because the overhead of memory profiling when enabled is
much higher. So, maybe for simplicity I should indeed always call? (See the
sketch at the end of this mail for what I mean by that.)

>
> -- Steve
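For reference, the "always call" variant would look roughly like the shape
below, reusing the hypothetical names from the sketch earlier in this mail
(again just a sketch, not the actual patch): the call site makes an
unconditional call and the static key is only tested inside the out-of-line
function, so the disabled case pays the call/return but keeps the accounting
body out of the callers.

/*
 * "Always call" variant (sketch, hypothetical names): no gating at the
 * call site at all; every allocation does the call, and the key is
 * checked out of line.
 */
static noinline void alloc_hook_always_call(size_t size)
{
	if (!static_branch_unlikely(&mem_profiling_key))
		return;
	this_cpu_add(profiled_bytes, size);	/* stand-in for the accounting */
}

The 2.42%/1.23%/0.66% numbers above would then be the price of that
unconditional call/return on each core type.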