Date: Fri, 27 Sep 2024 03:20:40 +0200
From: Mathieu Desnoyers
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers
To: Linus Torvalds, Jonas Oberhauser
Cc: Boqun Feng, linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 linux-mm@kvack.org, lkmm@lists.linux.dev, "Paul E. McKenney",
 Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
 Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang, Peter Zijlstra,
 Ingo Molnar, Will Deacon, Waiman Long, Mark Rutland, Thomas Gleixner,
 Kent Overstreet, Vlastimil Babka, maged.michael@gmail.com, Neeraj Upadhyay
References: <20240917143402.930114-1-boqun.feng@gmail.com>
 <20240917143402.930114-2-boqun.feng@gmail.com>
 <55975a55-302f-4c45-bfcc-192a8a1242e9@huaweicloud.com>
 <4167e6f5-4ff9-4aaa-915e-c1e692ac785a@efficios.com>
 <48992c9f-6c61-4716-977c-66e946adb399@efficios.com>
 <2b2aea37-06fe-40cb-8458-9408406ebda6@efficios.com>
 <55633835-242c-4d7f-875b-24b16f17939c@huaweicloud.com>

On 2024-09-26 18:12, Linus Torvalds wrote:
> On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser wrote:
>>
>> No, the issue introduced by the compiler optimization (or by your
>> original patch) is that the CPU can speculatively load from the first
>> pointer as soon as it has completed the load of that pointer:
>
> You mean the compiler can do it. The inline asm has no impact on what
> the CPU does. The conditional isn't a barrier for the actual hardware.
> But once the compiler doesn't try to do it, the data dependency on the
> address does end up being an ordering constraint on the hardware too
> (I'm happy to say that I haven't heard from the crazies that want
> value prediction in a long time).
>
> Just use a barrier. Or make sure to use the proper ordered memory
> accesses when possible. Don't use an inline asm for the compare - we
> don't even have anything insane like that as a portable helper, and we
> shouldn't have it.

How does the compiler barrier help in any way here?

I am concerned about the compiler SSA GVN (Global Value Numbering)
optimizations, and I don't think a compiler barrier solves anything.
(or I'm missing something obvious)

I was concerned about the suggestion from Jonas to use "node2" rather
than "node" after the equality check as a way to ensure the intended
register is used to return the pointer, because after the SSA GVN
optimization pass, AFAIU this won't help in any way. I have a set of
examples below that show gcc using the result of the first load, and
clang using the result of the second load (on both x86-64 and aarch64).
Likewise when a load-acquire is used as the second load, which I find
odd. Hopefully mixing this optimization from gcc with speculation still
abides by the memory model.

Only the asm goto approach ensures that gcc uses the result from the
second load.

#include <stdbool.h>

#define READ_ONCE(x) \
	(*(__volatile__ __typeof__(x) *)&(x))

static inline
bool same_ptr(void *a, void *b)
{
	asm goto (
#ifdef __x86_64__
		"cmpq %[a], %[b]\n\t"
		"jne %l[ne]\n\t"
#elif defined(__aarch64__)
		"cmp %[a], %[b]\n\t"
		"bne %l[ne]\n\t"
#else
# error "unimplemented"
#endif
		: : [a] "r" (a), [b] "r" (b)
		: : ne);
	return true;
ne:
	return false;
}

int *p;

int fct_2_volatile(void)
{
	int *a, *b;

	do {
		a = READ_ONCE(p);
		asm volatile ("" : : : "memory");
		b = READ_ONCE(p);
	} while (a != b);
	return *b;
}

int fct_volatile_acquire(void)
{
	int *a, *b;

	do {
		a = READ_ONCE(p);
		asm volatile ("" : : : "memory");
		b = __atomic_load_n(&p, __ATOMIC_ACQUIRE);
	} while (a != b);
	return *b;
}

int fct_asm_compare(void)
{
	int *a, *b;

	do {
		a = READ_ONCE(p);
		asm volatile ("" : : : "memory");
		b = READ_ONCE(p);
	} while (!same_ptr(a, b));
	return *b;
}

x86-64 gcc 14.2:

fct_2_volatile:
	mov    rax,QWORD PTR [rip+0x0]        # 7
	mov    rdx,QWORD PTR [rip+0x0]        # e
	cmp    rax,rdx
	jne    0
	mov    eax,DWORD PTR [rax]
	ret
	cs nop WORD PTR [rax+rax*1+0x0]
fct_volatile_acquire:
	mov    rax,QWORD PTR [rip+0x0]        # 27
	mov    rdx,QWORD PTR [rip+0x0]        # 2e
	cmp    rax,rdx
	jne    20
	mov    eax,DWORD PTR [rax]
	ret
	cs nop WORD PTR [rax+rax*1+0x0]
fct_asm_compare:
	mov    rdx,QWORD PTR [rip+0x0]        # 47
	mov    rax,QWORD PTR [rip+0x0]        # 4e
	cmp    rax,rdx
	jne    40
	mov    eax,DWORD PTR [rax]
	ret
main:
	xor    eax,eax
	ret

x86-64 clang 19.1.0:

fct_2_volatile:
	mov    rcx,QWORD PTR [rip+0x0]        # 7
	mov    rax,QWORD PTR [rip+0x0]        # e
	cmp    rcx,rax
	jne    0
	mov    eax,DWORD PTR [rax]
	ret
	cs nop WORD PTR [rax+rax*1+0x0]
fct_volatile_acquire:
	mov    rcx,QWORD PTR [rip+0x0]        # 27
	mov    rax,QWORD PTR [rip+0x0]        # 2e
	cmp    rcx,rax
	jne    20
	mov    eax,DWORD PTR [rax]
	ret
	cs nop WORD PTR [rax+rax*1+0x0]
fct_asm_compare:
	mov    rcx,QWORD PTR [rip+0x0]        # 47
	mov    rax,QWORD PTR [rip+0x0]        # 4e
	cmp    rax,rcx
	jne    40
	mov    eax,DWORD PTR [rax]
	ret
	cs nop WORD PTR [rax+rax*1+0x0]
main:
	xor    eax,eax
	ret

ARM64 gcc 14.2.0:

fct_2_volatile:
	adrp    x0, .LANCHOR0
	add     x0, x0, :lo12:.LANCHOR0
.L2:
	ldr     x1, [x0]
	ldr     x2, [x0]
	cmp     x1, x2
	bne     .L2
	ldr     w0, [x1]
	ret
fct_volatile_acquire:
	adrp    x0, .LANCHOR0
	add     x0, x0, :lo12:.LANCHOR0
.L6:
	ldr     x1, [x0]
	ldar    x2, [x0]
	cmp     x1, x2
	bne     .L6
	ldr     w0, [x1]
	ret
fct_asm_compare:
	adrp    x1, .LANCHOR0
	add     x1, x1, :lo12:.LANCHOR0
.L9:
	ldr     x2, [x1]
	ldr     x0, [x1]
	cmp     x2, x0
	bne     .L9
	ldr     w0, [x0]
	ret
p:
	.zero   8

armv8-a clang (trunk):

fct_2_volatile:
	adrp    x8, 0
	ldr     x10, [x8]
	ldr     x9, [x8]
	cmp     x10, x9
	b.ne    4  // b.any
	ldr     w0, [x9]
	ret
fct_volatile_acquire:
	adrp    x8, 0
	add     x8, x8, #0x0
	ldr     x10, [x8]
	ldar    x9, [x8]
	cmp     x10, x9
	b.ne    24  // b.any
	ldr     w0, [x9]
	ret
fct_asm_compare:
	adrp    x8, 0
	ldr     x9, [x8]
	ldr     x8, [x8]
	cmp     x9, x8
	b.ne    3c  // b.any
	ldr     w0, [x8]
	ret
main:
	mov     w0, wzr

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
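
For comparison with the asm goto helper above, here is a minimal sketch of a compare that needs no architecture-specific instructions. It is an illustration only, not something proposed in this message: the names ptr_eq_hidden() and fct_hidden_compare() are hypothetical, and it assumes that an empty asm with a "+r" constraint is opaque enough that the optimizer cannot relate the compared copies back to the caller's "a" and "b". Whether a given compiler then dereferences the second load has to be checked in the generated code, as was done for the functions above.

```c
#include <stdbool.h>

#define READ_ONCE(x) \
	(*(__volatile__ __typeof__(x) *)&(x))

/*
 * Hypothetical helper (assumption, not from the message above):
 * compare two pointers through local copies whose origin is hidden
 * from the optimizer. Each empty asm claims to rewrite its register,
 * so the compiler cannot prove the copies equal the caller's values,
 * and an "equal" result tells it nothing about the caller's a and b.
 */
static inline bool ptr_eq_hidden(const volatile void *a,
				 const volatile void *b)
{
	__asm__ ("" : "+r" (a));
	__asm__ ("" : "+r" (b));
	return a == b;
}

int *p;

/* Same retry loop as fct_asm_compare(), using the helper above. */
int fct_hidden_compare(void)
{
	int *a, *b;

	do {
		a = READ_ONCE(p);
		__asm__ __volatile__ ("" : : : "memory");
		b = READ_ONCE(p);
	} while (!ptr_eq_hidden(a, b));
	return *b;
}
```

The comparison itself stays in C, so the compiler still picks the compare and branch instructions; only the link between the compared values and the pointer that is later dereferenced is hidden.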