linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 Andy Lutomirski <luto@kernel.org>,
	x86@kernel.org, Kostya Serebryany <kcc@google.com>,
	 Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	 Alexander Potapenko <glider@google.com>,
	Taras Madan <tarasmadan@google.com>,
	 Dmitry Vyukov <dvyukov@google.com>,
	"H . J . Lu" <hjl.tools@gmail.com>,
	 Andi Kleen <ak@linux.intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	 Bharata B Rao <bharata@amd.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	 Ashok Raj <ashok.raj@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 Sami Tolvanen <samitolvanen@google.com>,
	joao@overdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
Date: Tue, 17 Jan 2023 12:43:57 -0800	[thread overview]
Message-ID: <CAHk-=wieCWumOHyPkhjJPUMpFr=OLSv2-dYvQV63GAT5zGnN5g@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wggKAKq1k0Z+EdWR0W9ULM7x9MhEbd_LmU2XPLM9WcjkQ@mail.gmail.com>

On Tue, Jan 17, 2023 at 12:10 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, clang still generates more register pressure than gcc,
> causing the function prologue and epilogue to be rather bigger
> (pushing and popping six registers, as opposed to gcc that only needs
> three)

.. and at least part of that is the same thing with the bad byte mask
generation (see that "clang *still* messes up" link for details).

Basically, the byte mask is computed by

        mask = bytemask_from_count(tcount);

where we have

   #define bytemask_from_count(cnt)       (~(~0ul << (cnt)*8))

and clang tries very very hard to avoid that "multiply by 8", so
instead it keeps a shadow copy of that "(cnt)*8" value in the loop.

That is wrong for a couple of reasons:

 (a) it adds register pressure for no good reason

 (b) when you shift left by that value, only the low 6 bits of that
value matters

And guess how that "tcount" is updated? It's this:

                tcount -= sizeof(unsigned long);

in the loop, and thus the update of that shadow value of "(cnt)*8" is done as

        addl    $-64, %ecx

inside that loop.

This is truly stupid and wasted work, because the low 6 bits of the
value - remember, the only part that matters - DOES NOT CHANGE when
you do that.

So clang has decided that it needs to

 (a) avoid the "expensive" multiply-by-8 at the end by turning it into
a repeated "add $-64" inside the loop

 (b) added register pressure and one extra instruction inside the loop

 (c) not realized that that extra instruction doesn't actually *do*
anything, because it only affects the bits that don't actually matter
in the end.

which is all kind of silly, wouldn't you agree. Every single step
there was pointless.

But with my other simplifications, the fact that clang does these
extra things is no longer all that noticeable. It *used* to be a
horrible disaster because the extra register pressure ended up meaning
that you had spills and all kinds of nastiness. Now the function is
simple enough that even with the extra register pressure, there's no
need for spills.

.. until you look at the 32-bit version, which still needs spills. Gcc
does too, but clang just makes it worse by having the extra pointless
shadow variable.

If I cared about 32-bit, I might write up a bugzilla entry. As it is,
it's just "clang tries to be clever, and in the process is actually
being stupid".

              Linus


  reply	other threads:[~2023-01-17 20:51 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
2023-01-18 15:49   ` Peter Zijlstra
2023-01-18 15:59     ` Linus Torvalds
2023-01-18 16:48       ` Peter Zijlstra
2023-01-18 17:01         ` Linus Torvalds
2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2023-01-11 13:49   ` Linus Torvalds
2023-01-11 14:14     ` Kirill A. Shutemov
2023-01-11 14:37       ` [PATCHv14.1 " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
2023-01-17 13:05   ` Peter Zijlstra
2023-01-17 13:57     ` Kirill A. Shutemov
2023-01-17 15:02       ` Peter Zijlstra
2023-01-17 17:18         ` Linus Torvalds
2023-01-17 17:28           ` Linus Torvalds
2023-01-17 18:26             ` Nick Desaulniers
2023-01-17 18:33               ` Linus Torvalds
2023-01-17 19:17                 ` Nick Desaulniers
2023-01-17 20:10                   ` Linus Torvalds
2023-01-17 20:43                     ` Linus Torvalds [this message]
2023-01-17 18:14           ` Peter Zijlstra
2023-01-17 18:21           ` Peter Zijlstra
2023-01-19 23:06         ` Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wieCWumOHyPkhjJPUMpFr=OLSv2-dYvQV63GAT5zGnN5g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=ak@linux.intel.com \
    --cc=andreyknvl@gmail.com \
    --cc=ashok.raj@intel.com \
    --cc=bharata@amd.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=hjl.tools@gmail.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=joao@overdrivepizza.com \
    --cc=kcc@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=ryabinin.a.a@gmail.com \
    --cc=samitolvanen@google.com \
    --cc=tarasmadan@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox