From: Uros Bizjak
Date: Fri, 14 Feb 2025 08:25:55 +0100
Subject: Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations
To: Dave Hansen
Cc: x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Dennis Zhou, Tejun Heo, Christoph Lameter, "Peter Zijlstra (Intel)"
References: <20250213191457.12377-1-ubizjak@gmail.com> <20250213191457.12377-2-ubizjak@gmail.com> <62965669-bf1d-461f-9401-20e303c6d619@intel.com> <87194c62-7e97-41d3-98bd-14288e8bde8f@intel.com>
In-Reply-To: <87194c62-7e97-41d3-98bd-14288e8bde8f@intel.com>

On Thu, Feb 13, 2025 at 11:52 PM Dave Hansen wrote:
>
> On 2/13/25 14:13, Uros Bizjak wrote:
> > On Thu, Feb 13, 2025 at 9:48 PM Dave Hansen wrote:
> >> On 2/13/25 11:14, Uros Bizjak wrote:
> >>> According to [1], the usage of asm pseudo directives in the asm template
> >>> can confuse the compiler to wrongly estimate the size of the generated
> >>> code. ALTERNATIVE macro expands to several asm pseudo directives, so
> >>> its usage in {,try_}cmpxchg{64,128} causes instruction length estimate
> >>> to fail by an order of magnitude (the compiler estimates the length of
> >>> an asm to be more than 20 instructions).
> >>
> >> Just curious, but how did you come up with the "20 instructions" number?
> >
> > Currently, a patched GCC compiler is needed (please see
> > asm_insn_count() and asm_str_count() functions in gcc/final.cc on how
> > the asm length is calculated) to report the length. For historic
> > reasons, the length of asm is not printed in asm dumps, but recently a
> > GCC PR was filed with a request to change this.
>
> So, that's also good info to add. You can even do it in the changelog
> with little more space than the existing changelog:
>
>     ... fail by an order of magnitude (a hacked-up gcc shows that it
>     estimates the length of an asm to be more than 20 instructions).
>
> ...
> >> Is any of this measurable? Is there any objective data to support that
> >> this change is a good one?
> >
> > Actually, "asm inline" was added to the GCC compiler just for this
> > purpose by request from the Linux community [1].
>
> Wow, that's really important information. Shouldn't the fact
> that this is leveraging a new feature that we asked for specifically get
> called out somewhere?
>
> Who asked for it? Are they on cc? Do they agree that this feature fills
> the gap they wanted filled?

asm_inline is already used in some 40-50 places throughout the tree,
but there still remain some places that could benefit from it.

> > My patch follows the
> > example of other similar macros (e.g. arch/x86/include/alternative.h)
> > and adds the same cure to asms that will undoubtedly result in a
> > single instruction [*]. The benefit is much more precise length
> > estimation, so compiler heuristic is able to correctly estimate the
> > benefit of inlining, not being skewed by excessive use of
> > __always_inline directive. OTOH, it is hard to back up compiler
> > decisions by objective data, as inlining decisions depend on several
> > factors besides function size (e.g. how hot/cold is function), so a
> > simple comparison of kernel sizes does not present the full picture.
>
> Yes, the world is complicated. But, honestly, one data point is a
> billion times better than zero. Right now, we're at zero.
>
> >> It's quite possible that someone did the "asm" on purpose because
> >> over-estimating the size was a good thing.
> >
> > I doubt this would be the case, and I would consider the code that
> > depends on this detail defective. The code that results in one asm
> > instruction should be accounted as such, no matter what internal
> > details are exposed in the instruction asm template.
>
> Yeah, but defective or not, if this causes a regression, it's either not
> getting applied or gets reverted.
>
> All that I'm asking here is that someone look at the kernel after the
> patch gets applied and sanity check it. Absolutely basic scientific
> method stuff. Make a hypothesis about what it will do:
>
>     1. Inline these locking functions
>     2. Make the kernel go faster for _something_
>
> and if it doesn't match the hypothesis, then try and figure out why. You
> don't have to do every config or every compiler. Just do one config and
> one modern compiler.
>
> Right now, this patch is saying:
>
>     1. gcc appears to have done something that might be suboptimal
>     2. gcc has a new feature that might make it less suboptimal
>     3. here's a patch that should optimize things
>     ...
>
> but then it leaves us hanging. There's a lot of "mights" and "shoulds"
> in there, but nothing that shows that this actually does anything
> positive in practice.

Let me harvest some data and report the findings in a V2 ChangeLog.
However, these particular macros are rarely used, so I don't expect
any big changes in the generated asm code.

> Maybe I'm just a dummy and this is just an obvious improvement that I
> can't grasp. If so, sorry for being so dense, but I'm going to need a
> little more education before this gets applied.

Thanks,
Uros.
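
For readers who want a feel for the size estimate discussed in this thread, here is a minimal sketch of the counting rule as the GCC manual describes it: every newline or statement separator (typically ';') in the asm template is assumed to end one instruction. This is not GCC's code. The function names and the sample template are hypothetical, the template only imitates the general shape of an alternatives-style expansion rather than the kernel's real ALTERNATIVE output, and modelling "asm inline" as a cost of one instruction is an assumed simplification of GCC's "minimum size" rule.

```python
def estimate_insn_count(template: str, separators: str = ";") -> int:
    """Estimate how many instructions an asm template contains.

    Follows the rule stated in the GCC manual: each newline or
    statement separator is assumed to end one instruction.
    """
    if not template:
        return 0
    return 1 + sum(1 for ch in template if ch == "\n" or ch in separators)


def inline_size_estimate(template: str, asm_inline: bool = False) -> int:
    """Instruction count used for inlining decisions (simplified model).

    Assumption: "asm inline" is costed as at most one instruction,
    standing in for GCC's "minimum size" treatment.
    """
    count = estimate_insn_count(template)
    return min(1, count) if asm_inline else count


# A template that really is a single instruction.
plain = "lock cmpxchg8b %[ptr]"

# Hypothetical template in the style of an alternatives expansion:
# one call instruction plus labels and pseudo directives that emit
# no code at the call site but still add lines to the template.
alt_like = "\n".join([
    "661:",
    "call cmpxchg8b_emu",
    "662:",
    '.pushsection .altinstructions,"a"',
    ".long 661b - .",
    ".long 663f - .",
    ".byte 662b - 661b",
    ".byte 664f - 663f",
    ".popsection",
    '.pushsection .altinstr_replacement,"ax"',
    "663:",
    "lock cmpxchg8b %[ptr]",
    "664:",
    ".popsection",
])

print(inline_size_estimate(plain))
print(inline_size_estimate(alt_like))
print(inline_size_estimate(alt_like, asm_inline=True))
```

In this model the pseudo directives inflate the estimate for the second template even though only one instruction ends up at the call site, and the asm_inline flag brings the estimate back down to a single instruction.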