Date: Sun, 22 Jun 2025 18:13:51 +0100
From: David Laight <david.laight.linux@gmail.com>
To: Christophe Leroy
Cc: Michael Ellerman, Nicholas Piggin, Naveen N Rao,
 Madhavan Srinivasan, Alexander Viro, Christian Brauner, Jan Kara,
 Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
 Davidlohr Bueso, Andre Almeida, Andrew Morton, Dave Hansen,
 Linus Torvalds, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH 5/5] powerpc: Implement masked user access
Message-ID: <20250622181351.08141b50@pumpkin>
In-Reply-To: <9dfb66c94941e8f778c4cabbf046af2a301dd963.1750585239.git.christophe.leroy@csgroup.eu>
References: <9dfb66c94941e8f778c4cabbf046af2a301dd963.1750585239.git.christophe.leroy@csgroup.eu>
On Sun, 22 Jun 2025 11:52:43 +0200
Christophe Leroy wrote:

> Masked user access avoids the address/size verification by access_ok().
> Although its main purpose is to skip the speculation in the
> verification of the user address and size, and hence avoid the need
> for speculation mitigation, it also has the advantage of reducing the
> number of instructions needed, so it also benefits platforms that
> don't need speculation mitigation, especially when the size of the
> copy is not known at build time.

Not checking the size is slightly orthogonal.
It really just depends on the accesses being 'reasonably sequential'.
That is probably always true since access_ok() covers a single copy.

> So implement masked user access on powerpc. The only requirement is
> to have a memory gap that faults between the top of user space and
> the real start of the kernel area. On 64-bit platforms it is easy:
> bit 0 is always 0 for user addresses and always 1 for kernel
> addresses, and user addresses stop long before the end of the area.
> On 32-bit it is more tricky. In theory user space can go up to
> 0xbfffffff while the kernel will usually start at 0xc0000000, so a
> gap needs to be added in between. Although in theory a single 4k page
> would suffice, it is easier and more efficient to enforce a 128k gap
> below the kernel, as it simplifies the masking.

The gap isn't strictly necessary - provided the first access is
guaranteed to be at the specified address and the transfers are
guaranteed sequential. But that is hard to guarantee.
Where does the vdso end up? My guess is 'near the top of userspace'
- but maybe not.

> Unlike x86_64, which masks the address to 'all bits set' when the
> user address is invalid, here the address is set to an address in
> the gap. It avoids relying on the zero page to catch offset
> accesses.

Not true. Using 'cmov' also removed an instruction.
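For illustration, the two styles look roughly like this in C - a
sketch only, assuming a 64-bit split where valid user pointers have
the top bit clear; the function names are made up and this is not the
actual code of either architecture:

        #include <stdint.h>

        /* Sign-smear style (what x86_64 used to do): an invalid
         * pointer becomes all ones, relying on the very top of the
         * address space faulting on access. */
        static inline uint64_t mask_by_smear(uint64_t addr)
        {
                return addr | (uint64_t)((int64_t)addr >> 63);
        }

        /* Select/clamp style (cmov on x86_64, isel on e500): an
         * invalid pointer is replaced by a fixed guard address, so
         * accesses at an offset from the returned pointer still land
         * in the guard gap. */
        static inline uint64_t mask_by_select(uint64_t addr, uint64_t guard)
        {
                return addr > guard ? guard : addr;
        }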
> e500 has the isel instruction, which allows selecting one value or
> the other without a branch, and that instruction is not speculative,
> so use it. Although GCC usually generates code using that
> instruction, it is safer to use inline assembly to be sure.
> The result is:
>
>   14:	3d 20 bf fe 	lis     r9,-16386
>   18:	7c 03 48 40 	cmplw   r3,r9
>   1c:	7c 69 18 5e 	iselgt  r3,r9,r3
>
> On other platforms, when kernel space is above 0x80000000 and user
> space is below, the logic in mask_user_address_simple() leads to a
> 3-instruction sequence:
>
>   14:	7c 69 fe 70 	srawi   r9,r3,31
>   18:	7c 63 48 78 	andc    r3,r3,r9
>   1c:	51 23 00 00 	rlwimi  r3,r9,0,0,0
>
> This is the default on powerpc 8xx.
>
> When the limit between user space and kernel space is not 0x80000000,
> mask_user_address_32() is used and a 6-instruction sequence is
> generated:
>
>   24:	54 69 7c 7e 	srwi    r9,r3,17
>   28:	21 29 57 ff 	subfic  r9,r9,22527
>   2c:	7d 29 fe 70 	srawi   r9,r9,31
>   30:	75 2a b0 00 	andis.  r10,r9,45056
>   34:	7c 63 48 78 	andc    r3,r3,r9
>   38:	7c 63 53 78 	or      r3,r3,r10
>
> The constraint is that TASK_SIZE be aligned to 128K in order to get
> the optimal number of instructions.
>
> When CONFIG_PPC_BARRIER_NOSPEC is not defined, fall back on the
> test-based masking as it is quicker than the 6-instruction sequence
> but not necessarily quicker than the 3-instruction sequences above.

Doesn't that depend on whether the branch is predicted correctly?
I can't read ppc asm well enough to check the above.
And the C is also a bit tortuous.

> On 64-bit, the kernel is always above 0x8000000000000000 and user
> always below, which leads to a 4-instruction sequence:
>
>   80:	7c 69 1b 78 	mr      r9,r3
>   84:	7c 63 fe 76 	sradi   r3,r3,63
>   88:	7d 29 18 78 	andc    r9,r9,r3
>   8c:	79 23 00 4c 	rldimi  r3,r9,0,1
>
> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/Kconfig               |   2 +-
>  arch/powerpc/include/asm/uaccess.h | 100 +++++++++++++++++++++++++++++
>  2 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c3e0cc83f120..c26a39b4504a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -1303,7 +1303,7 @@ config TASK_SIZE
>  	hex "Size of user task space" if TASK_SIZE_BOOL
>  	default "0x80000000" if PPC_8xx
>  	default "0xb0000000" if PPC_BOOK3S_32 && EXECMEM
> -	default "0xc0000000"
> +	default "0xbffe0000"
>
>  config MODULES_SIZE_BOOL
>  	bool "Set custom size for modules/execmem area"
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 89d53d4c2236..19743ee80523 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -2,6 +2,8 @@
>  #ifndef _ARCH_POWERPC_UACCESS_H
>  #define _ARCH_POWERPC_UACCESS_H
>
> +#include <linux/sizes.h>
> +
>  #include
>  #include
>  #include
> @@ -455,6 +457,104 @@ user_write_access_begin(const void __user *ptr, size_t len)
>  #define user_write_access_begin	user_write_access_begin
>  #define user_write_access_end		prevent_current_write_to_user
>
> +/*
> + * Masking the user address is an alternative to a conditional
> + * user_access_begin that can avoid the fencing. This only works
> + * for dense accesses starting at the address.

I think you need to say that kernel addresses get converted to
an invalid address between user and kernel addresses.
It works provided accesses are 'reasonably dense'.
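Concretely, with the 32-bit numbers from this patch: an out-of-range
pointer is clamped to TASK_SIZE = 0xbffe0000 while the kernel proper
starts at 0xc0000000, so 'clamped + offset' still faults for any
offset below 0x20000 (128k). 'Reasonably dense' just means no single
access steps further than the gap size past the checked base address.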
	David

> + */
> +static inline void __user *mask_user_address_simple(const void __user *ptr)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +	unsigned long mask = (unsigned long)((long)addr >> (BITS_PER_LONG - 1));
> +
> +	addr = ((addr & ~mask) & (~0UL >> 1)) | (mask & (1UL << (BITS_PER_LONG - 1)));
> +
> +	return (void __user *)addr;
> +}
> +
> +static inline void __user *mask_user_address_e500(const void __user *ptr)
> +{
> +	unsigned long addr;
> +
> +	asm("cmplw %1, %2; iselgt %0, %2, %1" : "=r"(addr) : "r"(ptr), "r"(TASK_SIZE): "cr0");
> +
> +	return (void __user *)addr;
> +}
> +
> +/* Make sure TASK_SIZE is a multiple of 128K for shifting by 17 to the right */
> +static inline void __user *mask_user_address_32(const void __user *ptr)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +	unsigned long mask = (unsigned long)((long)((TASK_SIZE >> 17) - 1 - (addr >> 17)) >> 31);
> +
> +	addr = (addr & ~mask) | (TASK_SIZE & mask);
> +
> +	return (void __user *)addr;
> +}
> +
> +static inline void __user *mask_user_address_fallback(const void __user *ptr)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +
> +	return (void __user *)(addr < TASK_SIZE ? addr : TASK_SIZE);
> +}
> +
> +static inline void __user *mask_user_address(const void __user *ptr)
> +{
> +#ifdef MODULES_VADDR
> +	const unsigned long border = MODULES_VADDR;
> +#else
> +	const unsigned long border = PAGE_OFFSET;
> +#endif
> +	BUILD_BUG_ON(TASK_SIZE_MAX & (SZ_128K - 1));
> +	BUILD_BUG_ON(TASK_SIZE_MAX + SZ_128K > border);
> +	BUILD_BUG_ON(TASK_SIZE_MAX & 0x8000000000000000ULL);
> +	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && !(PAGE_OFFSET & 0x8000000000000000ULL));
> +
> +	if (IS_ENABLED(CONFIG_PPC64))
> +		return mask_user_address_simple(ptr);
> +	if (IS_ENABLED(CONFIG_E500))
> +		return mask_user_address_e500(ptr);
> +	if (TASK_SIZE <= SZ_2G && border >= SZ_2G)
> +		return mask_user_address_simple(ptr);
> +	if (IS_ENABLED(CONFIG_PPC_BARRIER_NOSPEC))
> +		return mask_user_address_32(ptr);
> +	return mask_user_address_fallback(ptr);
> +}
> +
> +static inline void __user *masked_user_access_begin(const void __user *p)
> +{
> +	void __user *ptr = mask_user_address(p);
> +
> +	might_fault();
> +	allow_read_write_user(ptr, ptr);
> +
> +	return ptr;
> +}
> +#define masked_user_access_begin masked_user_access_begin
> +
> +static inline void __user *masked_user_read_access_begin(const void __user *p)
> +{
> +	void __user *ptr = mask_user_address(p);
> +
> +	might_fault();
> +	allow_read_from_user(ptr);
> +
> +	return ptr;
> +}
> +#define masked_user_read_access_begin masked_user_read_access_begin
> +
> +static inline void __user *masked_user_write_access_begin(const void __user *p)
> +{
> +	void __user *ptr = mask_user_address(p);
> +
> +	might_fault();
> +	allow_write_to_user(ptr);
> +
> +	return ptr;
> +}
> +#define masked_user_write_access_begin masked_user_write_access_begin
> +
>  #define unsafe_get_user(x, p, e) do {			\
>  	__long_type(*(p)) __gu_val;			\
>  	__typeof__(*(p)) __user *__gu_addr = (p);	\
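For reference, a sketch of how these helpers are meant to be used at
a call site, following the usual masked-access pattern - the label,
types and surrounding function are illustrative, not taken from this
patch:

        u32 __user *uaddr = ...;	/* pointer supplied by userspace */
        u32 val;

        uaddr = masked_user_read_access_begin(uaddr);
        unsafe_get_user(val, uaddr, Efault);
        user_read_access_end();
        /* use val */
        return 0;
Efault:
        user_read_access_end();
        return -EFAULT;

mask_user_address() has already pulled any out-of-range pointer into
the guard gap, so unsafe_get_user() either succeeds or faults straight
into the exception-table path and jumps to Efault - no conditional
branch on the address value is needed.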