Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations

From: Uros Bizjak <ubizjak@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	 Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	 "Peter Zijlstra (Intel)" <peterz@infradead.org>
Subject: Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations
Date: Fri, 14 Feb 2025 08:25:55 +0100
Message-ID: <CAFULd4Y+xWW3Xq+SeRo7n7BS__epyYZC9ygnNU4EmWb+BPzC4w@mail.gmail.com>
In-Reply-To: <87194c62-7e97-41d3-98bd-14288e8bde8f@intel.com>

On Thu, Feb 13, 2025 at 11:52 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/13/25 14:13, Uros Bizjak wrote:
> > On Thu, Feb 13, 2025 at 9:48 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >> On 2/13/25 11:14, Uros Bizjak wrote:
> >>> According to [1], the usage of asm pseudo directives in the asm template
> >>> can confuse the compiler into wrongly estimating the size of the
> >>> generated code. The ALTERNATIVE macro expands to several asm pseudo
> >>> directives, so its usage in {,try_}cmpxchg{64,128} causes the
> >>> instruction length estimate to be off by an order of magnitude (the
> >>> compiler estimates the length of an asm to be more than 20 instructions).
> >>
> >> Just curious, but how did you come up with the "20 instructions" number?
> >
> > > Currently, a patched GCC compiler is needed to report the length
> > > (please see the asm_insn_count() and asm_str_count() functions in
> > > gcc/final.cc for how the asm length is calculated). For historic
> > > reasons, the length of an asm is not printed in asm dumps, but a GCC
> > > PR was recently filed with a request to change this.
>
> So, that's also good info to add. You can even do it in the changelog
> with little more space than the existing changelog:
>
>         ... fail by an order of magnitude (a hacked-up gcc shows that it
>         estimates the length of an asm to be more than 20 instructions).
>
> ...
> >> Is any of this measurable? Is there any objective data to support that
> >> this change is a good one?
> >
> > Actually, "asm inline" was added to the GCC compiler just for this
> > purpose, at the request of the Linux community [1].
>
> Wow, that's really important information. Shouldn't the fact
> that this is leveraging a new feature that we asked for specifically get
> called out somewhere?
>
> Who asked for it? Are they on cc? Do they agree that this feature fills
> the gap they wanted filled?

asm_inline is already used in some 40-50 places throughout the tree,
but there still remain some places that could benefit from it.
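
To illustrate why the estimate goes wrong: as far as I understand
asm_str_count() in gcc/final.cc, the compiler starts at one instruction
and adds one for every newline or logical-line separator in the asm
template, so each pseudo directive is counted as if it were an
instruction. Below is a minimal sketch of that counting rule. It is not
GCC's code; estimate_asm_insns() and both template strings are made up
for illustration (the section name and labels are hypothetical), and ';'
is assumed to be the only separator besides newline.

```c
#include <stddef.h>

/*
 * Rough model of the counting rule: an empty template counts as zero,
 * otherwise start at one "instruction" and add one for every newline
 * or ';' found in the template. Simplified for illustration only.
 */
static unsigned int estimate_asm_insns(const char *templ)
{
	unsigned int count = 1;

	if (!templ || !*templ)
		return 0;

	for (; *templ; templ++)
		if (*templ == '\n' || *templ == ';')
			count++;

	return count;
}

/* Hypothetical template: a single real instruction ... */
static const char plain_templ[] = "lock cmpxchg8b %[ptr]";

/* ... and the same instruction wrapped in made-up bookkeeping directives. */
static const char wrapped_templ[] =
	"771:\n\t"
	"lock cmpxchg8b %[ptr]\n"
	"772:\n"
	".pushsection .example_table,\"a\"\n"
	" .long 771b - .\n"
	" .byte 772b - 771b\n"
	".popsection";
```

With this rule, estimate_asm_insns(plain_templ) gives 1, while
estimate_asm_insns(wrapped_templ) gives 7, although both templates emit
exactly one instruction into the text section. Marking the asm as
"asm inline" asks the compiler to treat it as minimum size for inlining
purposes instead of relying on such a count.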

> > My patch follows the
> > example of other similar macros (e.g. arch/x86/include/asm/alternative.h)
> > and adds the same cure to asms that will undoubtedly result in a
> > single instruction [*].  The benefit is a much more precise length
> > estimate, so the compiler heuristic is able to correctly estimate the
> > benefit of inlining, without being skewed by excessive use of the
> > __always_inline directive. OTOH, it is hard to back up compiler
> > decisions with objective data, as inlining decisions depend on several
> > factors besides function size (e.g. how hot/cold the function is), so a
> > simple comparison of kernel sizes does not present the full picture.
>
> Yes, the world is complicated. But, honestly, one data point is a
> billion times better than zero. Right now, we're at zero.
>
> >> It's quite possible that someone did the "asm" on purpose because
> >> over-estimating the size was a good thing.
> >
> > I doubt this would be the case, and I would consider the code that
> > depends on this detail defective. The code that results in one asm
> > instruction should be accounted as such, no matter what internal
> > details are exposed in the instruction asm template.
>
> Yeah, but defective or not, if this causes a regression, it's either not
> getting applied or gets reverted.
>
> All that I'm asking here is that someone look at the kernel after the
> patch gets applied and sanity check it. Absolutely basic scientific
> method stuff. Make a hypothesis about what it will do:
>
>         1. Inline these locking functions
>         2. Make the kernel go faster for _something_
>
> and if it doesn't match the hypothesis, then try and figure out why. You
> don't have to do every config or every compiler. Just do one config and
> one modern compiler.
>
> Right now, this patch is saying:
>
>         1. gcc appears to have done something that might be suboptimal
>         2. gcc has a new feature that might make it less suboptimal
>         3. here's a patch that should optimize things
>         ...
>
> but then it leaves us hanging.  There's a lot of "mights" and "shoulds"
> in there, but nothing that shows that this actually does anything
> positive in practice.

Let me harvest some data and report the findings in a V2 ChangeLog.
However, these particular macros are rarely used, so I don't expect
any big changes in the generated asm code.

> Maybe I'm just a dummy and this is just an obvious improvement that I
> can't grasp. If so, sorry for being so dense, but I'm going to need a
> little more education before this gets applied.

Thanks,
Uros.
