From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A097D24470 for ; Fri, 11 Oct 2024 17:05:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AC7F96B007B; Fri, 11 Oct 2024 13:05:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A77736B008A; Fri, 11 Oct 2024 13:05:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9190B6B00B4; Fri, 11 Oct 2024 13:05:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 6C78E6B008A for ; Fri, 11 Oct 2024 13:05:56 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 89DF4A1552 for ; Fri, 11 Oct 2024 17:05:47 +0000 (UTC) X-FDA: 82661948670.29.F71B2A7 Received: from mail-pg1-f177.google.com (mail-pg1-f177.google.com [209.85.215.177]) by imf15.hostedemail.com (Postfix) with ESMTP id 84A6FA001A for ; Fri, 11 Oct 2024 17:05:51 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=y1HXxOgH; spf=pass (imf15.hostedemail.com: domain of debug@rivosinc.com designates 209.85.215.177 as permitted sender) smtp.mailfrom=debug@rivosinc.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728666216; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=GX97w9CZORz9pwsb2g6R6nhW/kmsArAb0VPRjnUPWTE=; b=xIgB7iYsHO/F6JYYUc9YTDjvI9du47vb547kh/PFuCISzzqtizsdGu9y4EV0SnxexLn+fM bY83SOPsdy3krTn8iRq2ZfIOZuvCVt3TPvgTi43YJZhg3NvZjpuLj2IFPmKT7klPN8ursI sLaJz3ydLxuM13pjG98ckfGjbbrxuJw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728666216; a=rsa-sha256; cv=none; b=kasjMpdQ1OZCOJ7uaOmRKV0C42V3zGheTybjt8MM+ArMrhYIMjoxNzk+VBMSl1UmdKB5O4 OmbeWDNgG9NGrSv1MDw8upUbzHO0bnJY889yRn6EAZ37uUQ7jNONgS9ANPnk+MCHcdCPfQ BrYd1EynMQ+1xaketMx/M9F1PN3IfsY= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=y1HXxOgH; spf=pass (imf15.hostedemail.com: domain of debug@rivosinc.com designates 209.85.215.177 as permitted sender) smtp.mailfrom=debug@rivosinc.com; dmarc=none Received: by mail-pg1-f177.google.com with SMTP id 41be03b00d2f7-7e9f955cb97so1264238a12.3 for ; Fri, 11 Oct 2024 10:05:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1728666353; x=1729271153; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=GX97w9CZORz9pwsb2g6R6nhW/kmsArAb0VPRjnUPWTE=; b=y1HXxOgHwyRsN+j5iipp/NMk0aGHNIFKHGzlrfi3c4O7wTO+4o+snqSDCk9ZzRfhjj E4f+I3hwF3Oh2mHMQrP4xT2N/XnNAayCSg0EJsPIUI3wWy+oRYmmC/QWWhmF74LZzphy f/CRx85VGih+0dVEGGsaBxV7p6gn59ptqHRkvXIvkgF4JbdiRR1Nirg9GjIz8x06Snwa jK5J5peYchF0Aq5s1D/UKbnyPXxmyyurXHyY0ulMrn5nv2VLyhQDfbgzulL04M/co8Gg WmD4IyUtgq4LWtK+8SAMfojhtKAr4mOI6JxWXA6i7t6FH9VNmmMBAAmUWQ5l3Ymh9++b dpTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728666353; x=1729271153; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=GX97w9CZORz9pwsb2g6R6nhW/kmsArAb0VPRjnUPWTE=; b=IPlmFkR3+NkFH41ugiNihLu4WTpJC7qcCOiqU3+0XS2XqJxPo48ZIxlNLtawfL/mfp J5pWmENqwF+CtQqn2piJp88ctCuMIZRAsQP1malbZ86sBnCgmra+mHVfb54XbmBsI/hs Sfhm2rOoOo9oLRW/62dBaqR4HJ5IruUZtpvfTVdImn6NaeBed+V4GCetstOwzulgFR13 a7G+zSQomRbH/p8lm7kXqsOyu5xgWcdvkH5iIzsat5OmusqS71Te16ZefG3y+fNexZak ee6LK4ZOT6oIKXcq0zobjIst8Bqh1XD0p9tFpTFPBlklps+z3X9Sy/ncxxnwZJomJhKG m/xg== X-Forwarded-Encrypted: i=1; AJvYcCV6Yd2cesdC2qy0pt6fF6MIc101w6F3788e0UFaVG7khzaFSwe47cW8/M95L2JKKoqyRKYKZJgxxw==@kvack.org X-Gm-Message-State: AOJu0YxwBWOKUzTqKKN10Q0dted9cDFF/7Jitn7FpP6z/uAZnA5CD5yk YHbaqhkneUw38HsVaKDxWsyLvDPqBcbZkiE1P+jv+qeS+RvpKd8v8TeumRnt7rc= X-Google-Smtp-Source: AGHT+IHeKLAKX34MOiFH+9aZBGmNB0XIEjZvnsN4hA9hIXNt7UoKkhvVG/Qf7QZjmPERRc/yqeydag== X-Received: by 2002:a05:6a21:e8a:b0:1d8:b11e:19b9 with SMTP id adf61e73a8af0-1d8bcfb3d9cmr5518442637.47.1728666352605; Fri, 11 Oct 2024 10:05:52 -0700 (PDT) Received: from debug.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-71e2a9f522esm2810481b3a.69.2024.10.11.10.05.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Oct 2024 10:05:52 -0700 (PDT) Date: Fri, 11 Oct 2024 10:05:50 -0700 From: Deepak Gupta To: Mark Brown Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Andrew Morton , "Liam R. Howlett" , Vlastimil Babka , Lorenzo Stoakes , Arnd Bergmann , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, Rick Edgecombe Subject: Re: [PATCH RFC/RFT 3/3] kernel: converge common shadow stack flow agnostic to arch Message-ID: References: <20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com> <20241010-shstk_converge-v1-3-631beca676e7@rivosinc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 84A6FA001A X-Stat-Signature: 9kskxd9s9hrr6is1k8amf4c979fmpifm X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1728666351-512223 X-HE-Meta: U2FsdGVkX1/rYLfJdYZx19PjeeAaLA+kTVI5vm+hJNp15FY7Y5hcC8N6VXSIAfKv0ia9mYIdYc8yHsuOrtYVhGEN4T2VAUe6VS1cPmm1jQGanSt5BoEC3t2HiRbB3DlSpWKjYByKq3gQBjoQe3tEEp6SZmljnSc8FPC1jRQjPHZ8BQBz47TRz98lnXVG+sZihO3zSnt4Ui3VQfl0jTCpJ8xXy3+IA6Y9TIowkHvuzppX3+gMNi8s/Oc/crtBv67Eeg5Jm8sS7F8cNp+nHhgRPxifE6heUrj80DbOs1FES8awpOxVuaDMl+HwjXnZ5FKUtBivKiooJ9jlTzA4JA2tLOklOuCDbqS8UWLzvWixBIF4lOP5fZ7K+B82prbU3dsdHODdka5hrsQjkZ8VMZc7JflOnVUvnScCEaoHzclQj0Ny3em+lkB21bjsP0u+U2UUDXfcmkQlIE+fztQiRC0ihYS/dURTLUJEVcoIxWHtXwu0xN6A4Sn52fq5zRUQUs+IL4Qfzp8m3urCcD4VTtSQvhKbKvnIXQdupRncjVjntuI3OyxNHqCRNjmEh6UhvwqFXitA6COKHkoZ4MmMYY+9x9OMvdAn9EFd5KO1AR74G2yM52/icl7+IkLNWsnJIJ5xV1e9Y8yQ5v3n/euNO9JF7JORtStiFkCZNX4sYGcR1WSIhiva+BPuN1KQp/6vLpIKR6xK5gFGRcB+CYb/WdeAyhjVTJewyyaQvdCcTsuC1KbOxVJVsSLTGfoQ+9VJH6HS3WJJXyfFXG/7EKCaz/c/+BEuFTPK2eOlr/YEUqfSR3rW9ZeA5ceSBrrEiR20MXJjysLADU7QL0dVLxHsix2oHSvPSzefqe0UAVXA/2j9j32bx4MmJAXBvhHCSwzPop0DLEo/RUnvlKARpeKnmgAGDe7GYH3ZyEXBD1BcV7b1kht2/1tK2+EgBIz4GS2TiBqEULtEhjRgi2OsB4ArqBM N+SrYYbO oL7SnxtpVt2yDjBvoWf05I7wpQ7UH3qcPKlp2PnJJ2RWT7LIcm+aR7e/Y2Y4pmTnb2ykIe39Fqw7B25QXId/NVUO2yXWXsfbbtyl4zoFv44MypkMD32Pzv+q+c1EfiTynuonRihqRhPcKg1lDjYKOHAsb2pYUfJA1J33iKpjyqhGibDUcjcDOd+/ZzshcC1Y/ILUyLpKtW7jpjAE3XD5lveQihHMp7iLjXN+VJZq5ZxWAioZx/2YAnuGHZRdyfBcsbVFdCuLJnqMKdHQYWWgZKkx0IHzsgrNKe8jYpYtmcivH1lpi04/YmBdufHALpfWRLoltzAegTjNWUHjrrjFVzBGQ3Kn5MHhSOkhjmk0m7XmNni4kF6hiYu2HKA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Oct 11, 2024 at 01:33:05PM +0100, Mark Brown wrote: >On Thu, Oct 10, 2024 at 05:32:05PM -0700, Deepak Gupta wrote: > >> +unsigned long alloc_shstk(unsigned long addr, unsigned long size, >> + unsigned long token_offset, bool set_res_tok); >> +int shstk_setup(void); >> +int create_rstor_token(unsigned long ssp, unsigned long *token_addr); >> +bool cpu_supports_shadow_stack(void); > >The cpu_ naming is confusing in an arm64 context, we use cpu_ for >functions that report if a feature is supported on the current CPU and >system_ for functions that report if a feature is enabled on the system. > hmm... Curious. What's the difference between cpu and system? We can ditch both cpu and system and call it `user_shstk_supported()`. Again not a great name but all we are looking for is whether user shadow stack is supported or not. >> +void set_thread_shstk_status(bool enable); > >It might be better if this took the flags that the prctl() takes? It >feels like hmm we can do that. But I imagine it might get invoked from other flow as well. Although instead of `bool`, we can take `unsigned long` here. It would work for now for `prctl` and future users get options to chisel around it. I'll do that. > >> +/* Flags for map_shadow_stack(2) */ >> +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ >> + > >We've also got SHADOW_STACK_SET_MARKER now. > Sorry, I missed it. >> +bool cpu_supports_shadow_stack(void) >> +{ >> + return arch_cpu_supports_shadow_stack(); >> +} >> + >> +bool is_shstk_enabled(struct task_struct *task) >> +{ >> + return arch_is_shstk_enabled(task); >> +} > >Do we need these wrappers (or could they just be static inlines in the >header)? As I started doing this exercise, I ran into some headers and multiple definitions issue due to both modules (arch specific shstk.c and generic usershstk.c) need to call into each other independently and need to include each other headers, so took path of least resistence. But now that it settling a bit and I've better picture, I'll give it a try again. > >> +void set_thread_shstk_status(bool enable) >> +{ >> + arch_set_thread_shstk_status(enable); >> +} > >arm64 can return an error here, we reject a bunch of conditions like 32 >bit threads and locked enable status. Ok. You would like this error to be `bool` or an `int`? > >> +unsigned long adjust_shstk_size(unsigned long size) >> +{ >> + if (size) >> + return PAGE_ALIGN(size); >> + >> + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); >> +} > >static? Yes this can be static. > >> +/* >> + * VM_SHADOW_STACK will have a guard page. This helps userspace protect >> + * itself from attacks. The reasoning is as follows: >> + * >> + * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The >> + * INCSSP instruction can increment the shadow stack pointer. It is the >> + * shadow stack analog of an instruction like: >> + * >> + * addq $0x80, %rsp >> + * >> + * However, there is one important difference between an ADD on %rsp >> + * and INCSSP. In addition to modifying SSP, INCSSP also reads from the >> + * memory of the first and last elements that were "popped". It can be >> + * thought of as acting like this: >> + * >> + * READ_ONCE(ssp); // read+discard top element on stack >> + * ssp += nr_to_pop * 8; // move the shadow stack >> + * READ_ONCE(ssp-8); // read+discard last popped stack element >> + * >> + * The maximum distance INCSSP can move the SSP is 2040 bytes, before >> + * it would read the memory. Therefore a single page gap will be enough >> + * to prevent any operation from shifting the SSP to an adjacent stack, >> + * since it would have to land in the gap at least once, causing a >> + * fault. > >This is all very x86 centric... Yes I am aware and likely will be removed, assuming no objections from x86 on removing it. > >> + if (create_rstor_token(mapped_addr + token_offset, NULL)) { >> + vm_munmap(mapped_addr, size); >> + return -EINVAL; >> + } > >Bikeshedding but can we call the function create_shstk_token() instead? >The rstor means absolutely nothing in an arm64 context. `create_shstk_token` it is. Much better name. Thanks. > >> +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) >> +{ >> + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; >> + unsigned long aligned_size; >> + >> + if (!cpu_supports_shadow_stack()) >> + return -EOPNOTSUPP; >> + >> + if (flags & ~SHADOW_STACK_SET_TOKEN) >> + return -EINVAL; > >This needs SHADOW_STACK_SET_MARKER for arm64. Ack. > >> + if (addr && (addr & (PAGE_SIZE - 1))) >> + return -EINVAL; > > if (!PAGE_ALIGNED(addr)) > >> +int shstk_setup(void) >> +{ > >This is half of the implementation of the prctl() for enabling shadow >stacks. Looking at the arm64 implementation this rafactoring feels a >bit awkward, we don't have the one flag at a time requiremet that x86 >has and we structure things rather differently. I'm not sure that the >arch_prctl() and prctl() are going to line up comfortably... > Yes I was in two minds as well. In x86 case, its anyways arch specific so, you will land up in arch specific code and then later land in generic code. In case of arm64 and riscv as well, each will have arch specific implementation. I'll give a try on making prctl handling arch agnostic. If it becomes too kludgy and ugly, may be its best that prctl handling and first time shadow stack setup is all arch specific. >> + struct thread_shstk *shstk = ¤t->thread.shstk; Note for myself, this ^ is not needed. It's x86 specific and I missed this clean up. >> + unsigned long addr, size; >> + >> + /* Already enabled */ >> + if (is_shstk_enabled(current)) >> + return 0; >> + >> + /* Also not supported for 32 bit */ >> + if (!cpu_supports_shadow_stack() || >> + (IS_ENABLED(CONFIG_X86_64) && in_ia32_syscall())) >> + return -EOPNOTSUPP; > >We probably need a thread_supports_shstk(), `is_shstk_enabled(current)` doesn't work? > arm64 has a similar check >for not 32 bit threads and I noted an issue with needing this check >elsewhere. hmm May be that's why we need `is_shskt_enabled(current)` and another `thread_supports_shstk` (probably some better name here from someone on list) And in case we end up having no commonalities in prctl handling (as mentioned in comment above), may be then its not needed to have a `thread_supports_shstk` because its needed during prctl handling. > >> + /* >> + * For CLONE_VFORK the child will share the parents shadow stack. >> + * Make sure to clear the internal tracking of the thread shadow >> + * stack so the freeing logic run for child knows to leave it alone. >> + */ >> + if (clone_flags & CLONE_VFORK) { >> + set_shstk_base_size(tsk, 0, 0); >> + return 0; >> + } > >On arm64 we set the new thread's shadow stack pointer here, the logic >around that can probably also be usefully factored out. Ok I'll take a look. > >> + /* >> + * For !CLONE_VM the child will use a copy of the parents shadow >> + * stack. >> + */ >> + if (!(clone_flags & CLONE_VM)) >> + return 0; > >Here also. Same comment as above.