From: Uros Bizjak
Date: Thu, 13 Feb 2025 23:13:43 +0100
Subject: Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations
To: Dave Hansen
Cc: x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Dennis Zhou, Tejun Heo, Christoph Lameter, "Peter Zijlstra (Intel)"

On Thu, Feb 13, 2025 at
9:48 PM Dave Hansen wrote:
>
> On 2/13/25 11:14, Uros Bizjak wrote:
> > According to [1], the usage of asm pseudo directives in the asm template
> > can confuse the compiler to wrongly estimate the size of the generated
> > code. ALTERNATIVE macro expands to several asm pseudo directives, so
> > its usage in {,try_}cmpxchg{64,128} causes instruction length estimate
> > to fail by an order of magnitude (the compiler estimates the length of
> > an asm to be more than 20 instructions).
>
> Just curious, but how did you come up with the "20 instructions" number?

Currently, a patched GCC compiler is needed to report the length
(please see the asm_insn_count() and asm_str_count() functions in
gcc/final.cc for how the asm length is calculated). For historic
reasons, the length of an asm is not printed in asm dumps, but a GCC PR
was recently filed with a request to change this.

> > This wrong estimate further causes unoptimal inlining decisions for
> > functions that use these locking primitives.
> >
> > Use asm_inline instead of just asm. For inlining purposes, the size of
> > the asm is then taken as the minimum size, ignoring how many instructions
> > compiler thinks it is.
>
> So, the compiler is trying to decide whether to inline a function or
> not. The bigger it is, the less likely, it is to be inlined. Since it is
> over-estimating the size of {,try_}cmpxchg{64,128}, it will avoid
> inlining it when it _should_ be inlining it.
>
> Is that it?

Yes, because the calculated length of what is effectively one
instruction gets unreasonably high. The compiler counts 20
*instructions*, each estimated to be 16 bytes long.

> Is any of this measurable? Is there any objective data to support that
> this change is a good one?

Actually, "asm inline" was added to the GCC compiler just for this
purpose, by request from the Linux community [1]. My patch follows the
example of other similar macros (e.g. arch/x86/include/alternative.h)
and adds the same cure to asms that will undoubtedly result in a single
instruction [*]. The benefit is a much more precise length estimate, so
the compiler heuristic can correctly estimate the benefit of inlining,
without being skewed by excessive use of the __always_inline directive.
OTOH, it is hard to back up compiler decisions with objective data, as
inlining decisions depend on several factors besides function size
(e.g. how hot or cold the function is), so a simple comparison of
kernel sizes does not present the full picture.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html

[*] Please note that if the asm template is using CC_SET, the compiler
may emit an additional SETx asm insn. However, all GCC versions that
support "asm inline" also support flag outputs, so they are guaranteed
to emit only one asm insn.

> It's quite possible that someone did the "asm" on purpose because
> over-estimating the size was a good thing.

I doubt this would be the case, and I would consider code that depends
on this detail defective. Code that results in one asm instruction
should be accounted as such, no matter what internal details are
exposed in the instruction's asm template.

Uros.
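
For readers who want to see why pseudo directives inflate the estimate, here is a minimal sketch of the counting idea referred to above. It is a simplified model, not GCC's code: it assumes the count starts at one and grows by one for every newline or ';' in the template, and that ';' is the target's only logical line separator. The function name asm_template_count() and both template strings are hypothetical; in particular, with_directives only imitates the general shape of an ALTERNATIVE expansion and is not the kernel's actual output.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the statement count derived from an asm template:
 * one statement, plus one for every newline or ';' separator.
 * Assumption: ';' is the only logical line separator on the target.
 */
int asm_template_count(const char *templ)
{
	int count = 1;

	assert(templ != NULL);
	if (*templ == '\0')
		return 0;	/* an empty template counts as nothing */

	for (; *templ != '\0'; templ++)
		if (*templ == '\n' || *templ == ';')
			count++;

	return count;
}

/* Hypothetical template: a single real instruction. */
const char plain_insn[] = "lock cmpxchg8b (%esi)";

/*
 * Hypothetical template imitating the shape of an ALTERNATIVE
 * expansion: still one instruction at run time, but many lines of
 * labels and pseudo directives in the template text.
 */
const char with_directives[] =
	"661:\n"
	"\tcall emu_stub\n"
	"662:\n"
	".pushsection .altinstructions,\"a\"\n"
	" .long 661b - .\n"
	" .long 663f - .\n"
	".popsection\n"
	".pushsection .altinstr_replacement,\"ax\"\n"
	"663:\n"
	"\tlock cmpxchg8b (%esi)\n"
	".popsection";
```

Under this model, asm_template_count(plain_insn) is 1 while asm_template_count(with_directives) is 11, even though only one instruction ends up executing. That gap is what "asm inline" sidesteps by telling the inliner to treat the asm as having minimum size.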