linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	x86@kernel.org, Kostya Serebryany <kcc@google.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	Taras Madan <tarasmadan@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	"H . J . Lu" <hjl.tools@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Bharata B Rao <bharata@amd.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sami Tolvanen <samitolvanen@google.com>,
	ndesaulniers@google.com, joao@overdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
Date: Tue, 17 Jan 2023 16:02:06 +0100	[thread overview]
Message-ID: <Y8a4bmCU9wsenvvF@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20230117135703.voaumisreld7crfb@box>

On Tue, Jan 17, 2023 at 04:57:03PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> > 
> > >  #define __untagged_addr(untag_mask, addr)
> > >  	u64 __addr = (__force u64)(addr);				\
> > > -	s64 sign = (s64)__addr >> 63;					\
> > > -	__addr &= untag_mask | sign;					\
> > > +	if (static_branch_likely(&tagged_addr_key)) {			\
> > > +		s64 sign = (s64)__addr >> 63;				\
> > > +		__addr &= untag_mask | sign;				\
> > > +	}								\
> > >  	(__force __typeof__(addr))__addr;				\
> > >  })
> > >  
> > > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> > 
> > Is the compiler clever enough to put the memop inside the branch?
> 
> Hm. You mean current_untag_mask() inside static_branch_likely()?
> 
> But it is preprocessor who does this, not compiler. So, yes, the memop is
> inside the branch.
> 
> Or I didn't understand your question.

Nah, call it a pre-lunch dip, I overlooked the whole CPP angle -- d'0h.

That said, I did just put it through a compiler to see wth it did and it
is pretty gross:

GCC-12.2:

0000 00000000000023b0 <write_ok_or_segv>:
0000     23b0:  48 89 fa                mov    %rdi,%rdx
0003     23b3:  eb 76                   jmp    242b <write_ok_or_segv+0x7b>
0005     23b5:  65 48 8b 0d 00 00 00 00         mov    %gs:0x0(%rip),%rcx        # 23bd <write_ok_or_segv+0xd>  23b9: R_X86_64_PC32     tlbstate_untag_mask-0x4
000d     23bd:  48 89 f8                mov    %rdi,%rax
0010     23c0:  48 c1 f8 3f             sar    $0x3f,%rax
0014     23c4:  48 09 c8                or     %rcx,%rax
0017     23c7:  48 21 f8                and    %rdi,%rax
001a     23ca:  48 b9 00 f0 ff ff ff 7f 00 00   movabs $0x7ffffffff000,%rcx
0024     23d4:  48 39 f1                cmp    %rsi,%rcx
0027     23d7:  72 14                   jb     23ed <write_ok_or_segv+0x3d>
0029     23d9:  48 29 f1                sub    %rsi,%rcx
002c     23dc:  be 01 00 00 00          mov    $0x1,%esi
0031     23e1:  48 39 c1                cmp    %rax,%rcx
0034     23e4:  72 07                   jb     23ed <write_ok_or_segv+0x3d>
0036     23e6:  89 f0                   mov    %esi,%eax
0038     23e8:  e9 00 00 00 00          jmp    23ed <write_ok_or_segv+0x3d>     23e9: R_X86_64_PLT32    __x86_return_thunk-0x4
003d     23ed:  65 48 8b 04 25 00 00 00 00      mov    %gs:0x0,%rax     23f2: R_X86_64_32S      pcpu_hot
0046     23f6:  48 89 90 68 0b 00 00    mov    %rdx,0xb68(%rax)
004d     23fd:  be 01 00 00 00          mov    $0x1,%esi
0052     2402:  bf 0b 00 00 00          mov    $0xb,%edi
0057     2407:  48 c7 80 78 0b 00 00 06 00 00 00        movq   $0x6,0xb78(%rax)
0062     2412:  48 c7 80 70 0b 00 00 0e 00 00 00        movq   $0xe,0xb70(%rax)
006d     241d:  e8 00 00 00 00          call   2422 <write_ok_or_segv+0x72>     241e: R_X86_64_PLT32    force_sig_fault-0x4
0072     2422:  31 f6                   xor    %esi,%esi
0074     2424:  89 f0                   mov    %esi,%eax
0076     2426:  e9 00 00 00 00          jmp    242b <write_ok_or_segv+0x7b>     2427: R_X86_64_PLT32    __x86_return_thunk-0x4
007b     242b:  48 89 f8                mov    %rdi,%rax
007e     242e:  eb 9a                   jmp    23ca <write_ok_or_segv+0x1a>

Note the stupid jump to the end and back. Not all sites do this mind
you, but a fair number seem to do it.

Let me try llvm to see if it is any smarter.

CLANG-16:

0000 0000000000002d50 <write_ok_or_segv>:
0000     2d50:	41 57                	push   %r15
0002     2d52:	41 56                	push   %r14
0004     2d54:	41 54                	push   %r12
0006     2d56:	53                   	push   %rbx
0007     2d57:	48 89 f3             	mov    %rsi,%rbx
000a     2d5a:	48 89 fa             	mov    %rdi,%rdx
000d     2d5d:	49 89 fe             	mov    %rdi,%r14
0010     2d60:	eb 15                	jmp    2d77 <write_ok_or_segv+0x27>
0012     2d62:	48 89 d0             	mov    %rdx,%rax
0015     2d65:	48 c1 f8 3f          	sar    $0x3f,%rax
0019     2d69:	65 4c 8b 35 00 00 00 00 	mov    %gs:0x0(%rip),%r14        # 2d71 <write_ok_or_segv+0x21>	2d6d: R_X86_64_PC32	tlbstate_untag_mask-0x4
0021     2d71:	49 09 c6             	or     %rax,%r14
0024     2d74:	49 21 d6             	and    %rdx,%r14
0027     2d77:	f3 0f 1e fa          	endbr64
002b     2d7b:	49 bf 00 f0 ff ff ff 7f 00 00 	movabs $0x7ffffffff000,%r15
0035     2d85:	4d 89 fc             	mov    %r15,%r12
0038     2d88:	49 29 dc             	sub    %rbx,%r12
003b     2d8b:	72 05                	jb     2d92 <write_ok_or_segv+0x42>
003d     2d8d:	4d 39 f4             	cmp    %r14,%r12
0040     2d90:	73 34                	jae    2dc6 <write_ok_or_segv+0x76>
0042     2d92:	65 48 8b 05 00 00 00 00 	mov    %gs:0x0(%rip),%rax        # 2d9a <write_ok_or_segv+0x4a>	2d96: R_X86_64_PC32	pcpu_hot-0x4
004a     2d9a:	48 c7 80 78 0b 00 00 06 00 00 00 	movq   $0x6,0xb78(%rax)
0055     2da5:	48 89 90 68 0b 00 00 	mov    %rdx,0xb68(%rax)
005c     2dac:	48 c7 80 70 0b 00 00 0e 00 00 00 	movq   $0xe,0xb70(%rax)
0067     2db7:	bf 0b 00 00 00       	mov    $0xb,%edi
006c     2dbc:	be 01 00 00 00       	mov    $0x1,%esi
0071     2dc1:	e8 00 00 00 00       	call   2dc6 <write_ok_or_segv+0x76>	2dc2: R_X86_64_PLT32	force_sig_fault-0x4
0076     2dc6:	4d 39 f4             	cmp    %r14,%r12
0079     2dc9:	0f 93 c1             	setae  %cl
007c     2dcc:	49 39 df             	cmp    %rbx,%r15
007f     2dcf:	0f 93 c0             	setae  %al
0082     2dd2:	20 c8                	and    %cl,%al
0084     2dd4:	5b                   	pop    %rbx
0085     2dd5:	41 5c                	pop    %r12
0087     2dd7:	41 5e                	pop    %r14
0089     2dd9:	41 5f                	pop    %r15
008b     2ddb:	e9 00 00 00 00       	jmp    2de0 <__pfx_get_gate_vma>	2ddc: R_X86_64_PLT32	__x86_return_thunk-0x4

Well, it got the untag right, but OMG.. :-( Joao, Sami, any idea why it
put an ENDBR in there?


  reply	other threads:[~2023-01-17 15:02 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
2023-01-18 15:49   ` Peter Zijlstra
2023-01-18 15:59     ` Linus Torvalds
2023-01-18 16:48       ` Peter Zijlstra
2023-01-18 17:01         ` Linus Torvalds
2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2023-01-11 13:49   ` Linus Torvalds
2023-01-11 14:14     ` Kirill A. Shutemov
2023-01-11 14:37       ` [PATCHv14.1 " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
2023-01-17 13:05   ` Peter Zijlstra
2023-01-17 13:57     ` Kirill A. Shutemov
2023-01-17 15:02       ` Peter Zijlstra [this message]
2023-01-17 17:18         ` Linus Torvalds
2023-01-17 17:28           ` Linus Torvalds
2023-01-17 18:26             ` Nick Desaulniers
2023-01-17 18:33               ` Linus Torvalds
2023-01-17 19:17                 ` Nick Desaulniers
2023-01-17 20:10                   ` Linus Torvalds
2023-01-17 20:43                     ` Linus Torvalds
2023-01-17 18:14           ` Peter Zijlstra
2023-01-17 18:21           ` Peter Zijlstra
2023-01-19 23:06         ` Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y8a4bmCU9wsenvvF@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=ak@linux.intel.com \
    --cc=andreyknvl@gmail.com \
    --cc=ashok.raj@intel.com \
    --cc=bharata@amd.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=hjl.tools@gmail.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=joao@overdrivepizza.com \
    --cc=kcc@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=ryabinin.a.a@gmail.com \
    --cc=samitolvanen@google.com \
    --cc=tarasmadan@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox