From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE77CCF34A9 for ; Thu, 3 Oct 2024 14:21:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EF6EB6B044E; Thu, 3 Oct 2024 10:21:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E7F8D6B0454; Thu, 3 Oct 2024 10:21:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C84E56B044E; Thu, 3 Oct 2024 10:21:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 9AFCE6B0446 for ; Thu, 3 Oct 2024 10:21:11 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 363FD416DE for ; Thu, 3 Oct 2024 14:21:11 +0000 (UTC) X-FDA: 82632503142.27.D8173B1 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) by imf02.hostedemail.com (Postfix) with ESMTP id 0FC5B80013 for ; Thu, 3 Oct 2024 14:21:07 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=efficios.com header.s=smtpout1 header.b=xJ9KALNM; spf=pass (imf02.hostedemail.com: domain of mathieu.desnoyers@efficios.com designates 167.114.26.122 as permitted sender) smtp.mailfrom=mathieu.desnoyers@efficios.com; dmarc=pass (policy=none) header.from=efficios.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1727965097; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UMyPIX7aH6ujBcQ2FFRTC/Q0pfB1CPhQSU2FBGf0Gvk=; b=6fnsRJQ1Fy/phTsMRmoB6OivMH/GBg4ftrugqKLiGsrsC4U4dF8MbBPjYZzV27n+uRU1y+ C5N2eUsbcLG9QtmOnp4IN5aOeIcgLSSJzz63KpyKYwEcKf2uH/zgGNtQyU9NijFghbJZFP FcjnzANMGGtK+VHt25aH1UY9yao2lF4= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=efficios.com header.s=smtpout1 header.b=xJ9KALNM; spf=pass (imf02.hostedemail.com: domain of mathieu.desnoyers@efficios.com designates 167.114.26.122 as permitted sender) smtp.mailfrom=mathieu.desnoyers@efficios.com; dmarc=pass (policy=none) header.from=efficios.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1727965097; a=rsa-sha256; cv=none; b=NdBKl9iahBVxcNHD8pJbeulpxnsGL8Ty+8+iIY0SuFzX9epQs3X5jflbAbC2KSs0UyD+8u fRUxN1206gslER/AvN+VzUl/R2mC28GVrXp4SOtySMj1wfuLmPChivU+QswC/+aEJVnzE4 0GH+RJvcaFlu9svFYJ7Ehh6Q/EEB3tk= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1727965267; bh=CGnfdcDnXLM248wKeLz3YsGWDhDT7TOOh166t7wY+uo=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=xJ9KALNMcGMevAx9+lNF070s7mlbSEA8awVfo6Hp8UCXB5siihboA9Gs8PO2FWHHZ ZlTSfIhSFTF5LJQmt6rFYnxWRN0bFcQ0tzRaZDyTDJB/IPd6UtGTAeRVxTqABsTK0m z4NAE/29P3ULxX+aVZ2VszPnlmSw8/Rxdd9ffCiYYaNUOrsbwee3OHPJ6jQlFSzwVu 64ohJZdZlaxw0ZjfAumQ+tHk6IEV1C3qp1WAKhCCM2esv81+yLDQ7fU8FU0lCIq9Ec W4FIQ2l2Qx99GsrfvJgBpRY+FmhJyt/PE2C5zjW+KQBfUP0wCD1UjFm+hMYjXLjdCB kVHlHVUz6q5Vg== Received: from [172.16.0.134] (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XKDPG3wvwz59p; Thu, 3 Oct 2024 10:21:06 -0400 (EDT) Message-ID: <7cc83ffc-c9cc-4e87-a3ee-bb62588a594c@efficios.com> Date: Thu, 3 Oct 2024 10:19:05 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH 1/4] compiler.h: Introduce ptr_eq() to preserve address dependency To: Joel Fernandes Cc: Linus Torvalds , Andrew Morton , Peter Zijlstra , linux-kernel@vger.kernel.org, Nicholas Piggin , Michael Ellerman , Greg Kroah-Hartman , Sebastian Andrzej Siewior , "Paul E. McKenney" , Will Deacon , Boqun Feng , Alan Stern , John Stultz , Neeraj Upadhyay , Frederic Weisbecker , Josh Triplett , Uladzislau Rezki , Steven Rostedt , Lai Jiangshan , Zqiang , Ingo Molnar , Waiman Long , Mark Rutland , Thomas Gleixner , Vlastimil Babka , maged.michael@gmail.com, Mateusz Guzik , Jonas Oberhauser , rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev, Gary Guo , Nikita Popov , llvm@lists.linux.dev References: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com> <20241002010205.1341915-2-mathieu.desnoyers@efficios.com> <20241003000843.GA192403@google.com> From: Mathieu Desnoyers Content-Language: en-US In-Reply-To: <20241003000843.GA192403@google.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 0FC5B80013 X-Stat-Signature: zransx7jcra7csj98jqjxtfqezyeu5ow X-Rspam-User: X-HE-Tag: 1727965267-490040 X-HE-Meta: U2FsdGVkX1/GfykBMwQOp3BR2HZp3cGv0+GSLeCEqRqxqoXAQvVIU9WHP60cHqvTTTovSJD7qwRpRvEouX5wWvf37tlFO6+zuxfp4yG8sI3RcwQLBJ8PXd/BYsKnF0B0akc6HIp6etv/1yB9bf2SEX9ERS2VNoxxjF7GrRTb2Wp+r31OykM3Zr7WyMqitjHa4zL8QdV3vQ73LA+j1ZsXrm/kNI9jqzRk7yh/Z7skTr4PjuKKHq7ydcsK0WTHhR/qRcRDnAnkGxJt51Pof77FaoRkTsGnn0O6d9SnVKKsr5YKY/0ZQTHaLDW9uRqGgT/KgokGjAZAHn7VWRXpG1VleYrW9y7fwWKkNZNQH2HauPEHJORRpkfSCsKT+/SiRCypL05BN+7oTQeynwI+mMZEEd19w5TIhz3o9R1MyDxqpQW4lyu06wDrNE8CfyAkSGH+EV0zH56oMiuPPHHWg9+56hFQ1+f7BNJ6ZJNU/aW7JEEsIQztIywtjhFVQxRdQ+y8u6BS3stxE1UZaKf2SybU00epJOR0yyK7VSmlrSOlw1FyIps39ASfcvHgVd1r3QPhXGXi0RQllsA/ha2dZ/xmiqBScummqnO8d49teVpMNnsf68CYrt1Ia/q5NVn2ygNItIsJqQC11ngwjWT+DkZy/Z2RAGOHz8hn1vD0VsWE+RIUntU/Qov4fFyjUG4IffIoyb7NkCpqifdcy3SuJgNEhxh+2q2uWNjvW8tqXD5a/s4hVpkM3u+V8ExwAmxOGwuIj+8r72XaZW5XhTFDafhXbZLorKybpqdPFkoTvICJ9H5sC5wpSvp1EkPbpqHg0DCfnJVh66NfKj/VfYjBsr2RKLiEZL6ftPFqQzeDsk8p25sxKfn/JSbk0y2x+dCHmBIgSF0BtN1D7Vqq9y/MriLaB2c3liMhQpCLh0DR/J0u4aUBhUdWADHXViUAxWhGUc7BJY8IOfZrVar+C1ul/S8 JHMYy6mO 46OWF2Nt/TRJFwYaey2LRB5SrWES1yV9PaAS24hsb4B+CUk40qi5d6JEmgAkhtGvTc0aQAk9viD5qXU2p/OYyzQpFiV5lpWtd408L/zcb19cVwO2B4HB60xJ2BxpswPSizdZEySOmsMkxOh5++FsCsH138jbh0qcXxNCb0Q3VKLKtYlNVUGF/23p4dK/J/8cqdCiSh8N4ZLfkWOhgl7N0V0CFqJmm2c/RC/4AYM0t4iLfqRKCt8tIwQtRbF9pFVhO0GacubVPof0CJ/4mIZDeOUpExLd/effsH0pa4IH9wDqVoa75oRg5SIM9GhJOZC9CfaNZEwGt7xBib+BQCDUGdNFGj1wNXuBeMnStzHWyK0Cm/Rrr/xDf2isWmOa4cVHwMUkGEK5UGUxpXw5KlTjaBjFhQniiXQHVTHJcr4jnP6J8Jg0tUXwIEBr7+xoN62bATEuPMD9+q+jM8A5CKoFGEJ1rzNHNBWPFeVvcGouEc5Y6iSiYmMijtCob0Rq7UGrti2cYz3u1Qa4w2OR6jr4khpnYtN5v/IHT0cYBgllTC61sjbY+dbRel9gtQg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024-10-03 02:08, Joel Fernandes wrote: > On Tue, Oct 01, 2024 at 09:02:02PM -0400, Mathieu Desnoyers wrote: >> Compiler CSE and SSA GVN optimizations can cause the address dependency >> of addresses returned by rcu_dereference to be lost when comparing those >> pointers with either constants or previously loaded pointers. >> >> Introduce ptr_eq() to compare two addresses while preserving the address >> dependencies for later use of the address. It should be used when >> comparing an address returned by rcu_dereference(). >> >> This is needed to prevent the compiler CSE and SSA GVN optimizations >> from using @a (or @b) in places where the source refers to @b (or @a) >> based on the fact that after the comparison, the two are known to be >> equal, which does not preserve address dependencies and allows the >> following misordering speculations: >> >> - If @b is a constant, the compiler can issue the loads which depend >> on @a before loading @a. >> - If @b is a register populated by a prior load, weakly-ordered >> CPUs can speculate loads which depend on @a before loading @a. >> >> The same logic applies with @a and @b swapped. >> > [...] >> +/* >> + * Compare two addresses while preserving the address dependencies for >> + * later use of the address. It should be used when comparing an address >> + * returned by rcu_dereference(). >> + * >> + * This is needed to prevent the compiler CSE and SSA GVN optimizations >> + * from using @a (or @b) in places where the source refers to @b (or @a) >> + * based on the fact that after the comparison, the two are known to be >> + * equal, which does not preserve address dependencies and allows the >> + * following misordering speculations: >> + * >> + * - If @b is a constant, the compiler can issue the loads which depend >> + * on @a before loading @a. >> + * - If @b is a register populated by a prior load, weakly-ordered >> + * CPUs can speculate loads which depend on @a before loading @a. >> + * >> + * The same logic applies with @a and @b swapped. >> + * >> + * Return value: true if pointers are equal, false otherwise. >> + * >> + * The compiler barrier() is ineffective at fixing this issue. It does >> + * not prevent the compiler CSE from losing the address dependency: >> + * >> + * int fct_2_volatile_barriers(void) >> + * { >> + * int *a, *b; >> + * >> + * do { >> + * a = READ_ONCE(p); >> + * asm volatile ("" : : : "memory"); >> + * b = READ_ONCE(p); >> + * } while (a != b); >> + * asm volatile ("" : : : "memory"); <-- barrier() >> + * return *b; >> + * } >> + * >> + * With gcc 14.2 (arm64): >> + * >> + * fct_2_volatile_barriers: >> + * adrp x0, .LANCHOR0 >> + * add x0, x0, :lo12:.LANCHOR0 >> + * .L2: >> + * ldr x1, [x0] <-- x1 populated by first load. >> + * ldr x2, [x0] >> + * cmp x1, x2 >> + * bne .L2 >> + * ldr w0, [x1] <-- x1 is used for access which should depend on b. >> + * ret >> + * > > I could reproduce this in compiler explorer, but I'm curious what flags are > you using? For me it does a bunch of usage of the stack for temporary storage > (still incorrectly returns *a though as you pointed). You are probably missing "-O2". > > Interestingly, if I just move the comparison into an an __always_inline__ > function like below, but without the optimizer hide stuff, gcc 14.2 on arm64 > does generate the correct code: Make sure you compile in -O2. Based on a quick check here the hide var is needed to make sure the compiler does the intended behavior in O2. > > static inline __attribute__((__always_inline__)) int ptr_eq(const volatile void *a, const volatile void *b) > { >     /* No OPTIMIZER_HIDE_VAR */ >     return a == b; > } > > volatile int *p = 0; > > int fct_2_volatile_barriers() > { >     int *a, *b; > >     do { >         a = READ_ONCE(p); >         asm volatile ("" : : : "memory"); >         b = READ_ONCE(p); >     } while (!ptr_eq(a, b)); >     asm volatile ("" : : : "memory");  // barrier() >     return *b; > } > > But not sure if it fixes the speculation issue you referred to. Not in -O2. > > Putting back the OPTIMIZER_HIDE_VAR() then just seems to pass the a and b > stored on the stack through a washing machine: > >         ldr     x0, [sp, 8] >         str     x0, [sp, 8] >         ldr     x0, [sp] >         str     x0, [sp] That washing machine looks like the result of -O0. > > And here I thought the "" in OPTIMIZER_HIDE_VAR was not supposed to generate > any code but I guess it is still a NOOP. The hide var will only emit an extra register movement to copy the register to a temporary. That's one extra instruction but not as bad as what you observe in -O0. > > Anyway, as such this LGTM since whether OPTIMIZER_HIDE_VAR() used or not, it > does fix the problem. hide var is needed in O2. > > Reviewed-by: Joel Fernandes (Google) Please double-check with -O2, and let me know if you still agree with the patch :) Thanks, Mathieu > > thanks, > > - Joel > -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com