Message-ID: <5f66b425-679a-4f1f-9ca1-0c0bf3950a0a@ghiti.fr>
Date: Sun, 12 May 2024 19:05:27 +0200
Subject: Re: [PATCH v3 15/29] riscv/shstk: If needed allocate a new shadow stack on clone
From: Alexandre Ghiti
To: Deepak Gupta, paul.walmsley@sifive.com, rick.p.edgecombe@intel.com,
 broonie@kernel.org, Szabolcs.Nagy@arm.com, kito.cheng@sifive.com,
 keescook@chromium.org, ajones@ventanamicro.com, conor.dooley@microchip.com,
 cleger@rivosinc.com, atishp@atishpatra.org, bjorn@rivosinc.com,
 alexghiti@rivosinc.com, samuel.holland@sifive.com, conor@kernel.org
Cc: linux-doc@vger.kernel.org, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, devicetree@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, corbet@lwn.net,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, robh+dt@kernel.org,
 krzysztof.kozlowski+dt@linaro.org, oleg@redhat.com, akpm@linux-foundation.org,
 arnd@arndb.de, ebiederm@xmission.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
 lstoakes@gmail.com, shuah@kernel.org, brauner@kernel.org, andy.chiu@sifive.com,
 jerry.shih@sifive.com, hankuan.chen@sifive.com, greentime.hu@sifive.com,
 evan@rivosinc.com, xiao.w.wang@intel.com, charlie@rivosinc.com,
 apatel@ventanamicro.com, mchitale@ventanamicro.com, dbarboza@ventanamicro.com,
 sameo@rivosinc.com, shikemeng@huaweicloud.com, willy@infradead.org,
 vincent.chen@sifive.com, guoren@kernel.org, samitolvanen@google.com,
 songshuaishuai@tinylab.org, gerg@kernel.org, heiko@sntech.de, bhe@redhat.com,
 jeeheng.sia@starfivetech.com, cyy@cyyself.name, maskray@google.com,
 ancientmodern4@gmail.com, mathis.salmen@matsal.de, cuiyunhui@bytedance.com,
 bgray@linux.ibm.com, mpe@ellerman.id.au, baruch@tkos.co.il, alx@kernel.org,
 david@redhat.com, catalin.marinas@arm.com, revest@chromium.org,
 josh@joshtriplett.org, shr@devkernel.io, deller@gmx.de, omosnace@redhat.com,
 ojeda@kernel.org, jhubbard@nvidia.com
References: <20240403234054.2020347-1-debug@rivosinc.com>
 <20240403234054.2020347-16-debug@rivosinc.com>
In-Reply-To: <20240403234054.2020347-16-debug@rivosinc.com>

On 04/04/2024 01:35, Deepak Gupta wrote:
> Userspace specifies VM_CLONE to share address space and spawn new thread.

CLONE_VM?

> `clone` allow userspace to specify a new stack for new thread. However
> there is no way to specify new shadow stack base address without changing
> API. This patch allocates a new shadow stack whenever VM_CLONE is given.
>
> In case of VM_FORK, parent is suspended until child finishes and thus can

You mean CLONE_VFORK here right?

> child use parent shadow stack. In case of !VM_CLONE, COW kicks in because
> entire address space is copied from parent to child.
>
> `clone3` is extensible and can provide mechanisms using which shadow stack
> as an input parameter can be provided. This is not settled yet and being
> extensively discussed on mailing list. Once that's settled, this commit
> will adapt to that.
>
> Signed-off-by: Deepak Gupta
> ---
>  arch/riscv/include/asm/usercfi.h |  39 ++++++++++
>  arch/riscv/kernel/process.c      |  12 ++-
>  arch/riscv/kernel/usercfi.c      | 121 +++++++++++++++++++++++++++++++
>  3 files changed, 171 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
> index 4fa201b4fc4e..b47574a7a8c9 100644
> --- a/arch/riscv/include/asm/usercfi.h
> +++ b/arch/riscv/include/asm/usercfi.h
> @@ -8,6 +8,9 @@
>  #ifndef __ASSEMBLY__
>  #include
>
> +struct task_struct;
> +struct kernel_clone_args;
> +
>  #ifdef CONFIG_RISCV_USER_CFI
>  struct cfi_status {
>  	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
> @@ -17,6 +20,42 @@ struct cfi_status {
>  	unsigned long shdw_stk_size; /* size of shadow stack */
>  };
>
> +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
> +					const struct kernel_clone_args *args);
> +void shstk_release(struct task_struct *tsk);
> +void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
> +void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
> +bool is_shstk_enabled(struct task_struct *task);
> +
> +#else
> +
> +static inline unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
> +					const struct kernel_clone_args *args)
> +{
> +	return 0;
> +}
> +
> +static inline void shstk_release(struct task_struct *tsk)
> +{
> +
> +}
> +
> +static inline void set_shstk_base(struct task_struct *task, unsigned long shstk_addr,
> +					unsigned long size)
> +{
> +
> +}
> +
> +static inline void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
> +{
> +
> +}
> +
> +static inline bool is_shstk_enabled(struct task_struct *task)
> +{
> +	return false;
> +}
> +
>  #endif /* CONFIG_RISCV_USER_CFI */
>
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index ce577cdc2af3..ef48a25b0eff 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -26,6 +26,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  register unsigned long gp_in_global __asm__("gp");
>
> @@ -202,7 +203,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>
>  void exit_thread(struct task_struct *tsk)
>  {
> -
> +	if (IS_ENABLED(CONFIG_RISCV_USER_CFI))
> +		shstk_release(tsk);
>  }
>
>  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> @@ -210,6 +212,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  	unsigned long clone_flags = args->flags;
>  	unsigned long usp = args->stack;
>  	unsigned long tls = args->tls;
> +	unsigned long ssp = 0;
>  	struct pt_regs *childregs = task_pt_regs(p);
>
>  	memset(&p->thread.s, 0, sizeof(p->thread.s));
> @@ -225,11 +228,18 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  		p->thread.s[0] = (unsigned long)args->fn;
>  		p->thread.s[1] = (unsigned long)args->fn_arg;
>  	} else {
> +		/* allocate new shadow stack if needed. In case of CLONE_VM we have to */
> +		ssp = shstk_alloc_thread_stack(p, args);
> +		if (IS_ERR_VALUE(ssp))
> +			return PTR_ERR((void *)ssp);
> +
>  		*childregs = *(current_pt_regs());
>  		/* Turn off status.VS */
>  		riscv_v_vstate_off(childregs);
>  		if (usp) /* User fork */
>  			childregs->sp = usp;
> +		if (ssp) /* if needed, set new ssp */
> +			set_active_shstk(p, ssp);
>  		if (clone_flags & CLONE_SETTLS)
>  			childregs->tp = tls;
>  		childregs->a0 = 0; /* Return value of fork() */
> diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
> index c4ed0d4e33d6..11ef7ab925c9 100644
> --- a/arch/riscv/kernel/usercfi.c
> +++ b/arch/riscv/kernel/usercfi.c
> @@ -19,6 +19,41 @@
>
>  #define SHSTK_ENTRY_SIZE sizeof(void *)
>
> +bool is_shstk_enabled(struct task_struct *task)
> +{
> +	return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
> +}
> +
> +void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
> +{
> +	task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
> +	task->thread_info.user_cfi_state.shdw_stk_size = size;
> +}
> +
> +unsigned long get_shstk_base(struct task_struct *task, unsigned long *size)
> +{
> +	if (size)
> +		*size = task->thread_info.user_cfi_state.shdw_stk_size;
> +	return task->thread_info.user_cfi_state.shdw_stk_base;
> +}
> +
> +void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
> +{
> +	task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
> +}
> +
> +/*
> + * If size is 0, then to be compatible with regular stack we want it to be as big as
> + * regular stack. Else PAGE_ALIGN it and return back
> + */
> +static unsigned long calc_shstk_size(unsigned long size)
> +{
> +	if (size)
> +		return PAGE_ALIGN(size);
> +
> +	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
> +}
> +
>  /*
>   * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
>   * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
> @@ -147,3 +182,89 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
>
>  	return allocate_shadow_stack(addr, aligned_size, size, set_tok);
>  }
> +
> +/*
> + * This gets called during clone/clone3/fork. And is needed to allocate a shadow stack for
> + * cases where CLONE_VM is specified and thus a different stack is specified by user. We
> + * thus need a separate shadow stack too. How does separate shadow stack is specified by
> + * user is still being debated. Once that's settled, remove this part of the comment.
> + * This function simply returns 0 if shadow stack are not supported or if separate shadow
> + * stack allocation is not needed (like in case of !CLONE_VM)
> + */
> +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
> +					const struct kernel_clone_args *args)
> +{
> +	unsigned long addr, size;
> +
> +	/* If shadow stack is not supported, return 0 */
> +	if (!cpu_supports_shadow_stack())
> +		return 0;
> +
> +	/*
> +	 * If shadow stack is not enabled on the new thread, skip any
> +	 * switch to a new shadow stack.
> +	 */
> +	if (is_shstk_enabled(tsk))
> +		return 0;
> +
> +	/*
> +	 * For CLONE_VFORK the child will share the parents shadow stack.
> +	 * Set base = 0 and size = 0, this is special means to track this state
> +	 * so the freeing logic run for child knows to leave it alone.
> +	 */
> +	if (args->flags & CLONE_VFORK) {
> +		set_shstk_base(tsk, 0, 0);
> +		return 0;
> +	}
> +
> +	/*
> +	 * For !CLONE_VM the child will use a copy of the parents shadow
> +	 * stack.
> +	 */
> +	if (!(args->flags & CLONE_VM))
> +		return 0;
> +
> +	/*
> +	 * reaching here means, CLONE_VM was specified and thus a separate shadow
> +	 * stack is needed for new cloned thread. Note: below allocation is happening
> +	 * using current mm.
> +	 */
> +	size = calc_shstk_size(args->stack_size);
> +	addr = allocate_shadow_stack(0, size, 0, false);
> +	if (IS_ERR_VALUE(addr))
> +		return addr;
> +
> +	set_shstk_base(tsk, addr, size);
> +
> +	return addr + size;
> +}
> +
> +void shstk_release(struct task_struct *tsk)
> +{
> +	unsigned long base = 0, size = 0;
> +	/* If shadow stack is not supported or not enabled, nothing to release */
> +	if (!cpu_supports_shadow_stack() ||
> +	    !is_shstk_enabled(tsk))
> +		return;
> +
> +	/*
> +	 * When fork() with CLONE_VM fails, the child (tsk) already has a
> +	 * shadow stack allocated, and exit_thread() calls this function to
> +	 * free it. In this case the parent (current) and the child share
> +	 * the same mm struct. Move forward only when they're same.
> +	 */
> +	if (!tsk->mm || tsk->mm != current->mm)
> +		return;
> +
> +	/*
> +	 * We know shadow stack is enabled but if base is NULL, then
> +	 * this task is not managing its own shadow stack (CLONE_VFORK). So
> +	 * skip freeing it.
> +	 */
> +	base = get_shstk_base(tsk, &size);
> +	if (!base)
> +		return;
> +
> +	vm_munmap(base, size);
> +	set_shstk_base(tsk, 0, 0);
> +}
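
To make the flag handling described in the commit message and in shstk_alloc_thread_stack() easier to follow, here is a minimal userspace sketch of that decision and of the sizing rule in calc_shstk_size(). It is an illustration only, not kernel code: the enum, the names shstk_clone_action(), model_shstk_size() and shstk_model_selfcheck(), and the 4 KiB page size are assumptions made for the example. The sketch follows the code comment ("If shadow stack is not enabled on the new thread, skip"), whereas the quoted hunk tests is_shstk_enabled(tsk) without negation, which looks inverted relative to that comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the Linux UAPI header <linux/sched.h>. */
#ifndef CLONE_VM
#define CLONE_VM	0x00000100UL
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK	0x00004000UL
#endif

/* Assumptions for this model only: 4 KiB pages and a 4 GiB cap (SZ_4G). */
#define MODEL_PAGE_SIZE	4096ULL
#define MODEL_SZ_4G	(1ULL << 32)

/* Hypothetical labels for the possible outcomes; not part of the patch. */
enum shstk_action {
	SHSTK_NONE,		/* unsupported or not enabled: nothing to do */
	SHSTK_SHARE_PARENT,	/* CLONE_VFORK: child uses the parent's shadow stack */
	SHSTK_COW_COPY,		/* !CLONE_VM: address space is copied, COW applies */
	SHSTK_ALLOC_NEW,	/* CLONE_VM: a separate shadow stack is allocated */
};

/* Same order of checks as in the quoted shstk_alloc_thread_stack(). */
static enum shstk_action shstk_clone_action(bool supported, bool enabled,
					    unsigned long flags)
{
	if (!supported || !enabled)
		return SHSTK_NONE;
	if (flags & CLONE_VFORK)
		return SHSTK_SHARE_PARENT;
	if (!(flags & CLONE_VM))
		return SHSTK_COW_COPY;
	return SHSTK_ALLOC_NEW;
}

/*
 * Sizing rule modelled on calc_shstk_size(): a non-zero request is page
 * aligned; a zero request falls back to min(stack rlimit, 4 GiB), also
 * page aligned.
 */
static unsigned long long model_shstk_size(unsigned long long requested,
					   unsigned long long stack_rlimit)
{
	unsigned long long size = requested;

	if (!size)
		size = stack_rlimit < MODEL_SZ_4G ? stack_rlimit : MODEL_SZ_4G;

	return (size + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* Small usage example covering each branch. */
static void shstk_model_selfcheck(void)
{
	assert(shstk_clone_action(false, true, CLONE_VM) == SHSTK_NONE);
	assert(shstk_clone_action(true, true, CLONE_VM | CLONE_VFORK) == SHSTK_SHARE_PARENT);
	assert(shstk_clone_action(true, true, 0) == SHSTK_COW_COPY);
	assert(shstk_clone_action(true, true, CLONE_VM) == SHSTK_ALLOC_NEW);
	assert(model_shstk_size(1, 0) == 4096ULL);
	assert(model_shstk_size(0, ~0ULL) == MODEL_SZ_4G);
}
```

Note that CLONE_VFORK is tested before CLONE_VM, so a vfork-style clone that also shares the address space never gets its own shadow stack; that matches the base = 0, size = 0 marker that shstk_release() uses to skip the unmap.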