From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
    Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
    Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn,
    Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
    Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar",
    Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu,
    Haitao Huang
Cc: Yu-cheng Yu
Subject: [PATCH v22 25/28] x86/cet/shstk: Handle thread shadow stack
Date: Wed, 10 Mar 2021 14:00:43 -0800
Message-Id: <20210310220046.15866-26-yu-cheng.yu@intel.com>
In-Reply-To: <20210310220046.15866-1-yu-cheng.yu@intel.com>
References: <20210310220046.15866-1-yu-cheng.yu@intel.com>

For a pthread child, the kernel allocates a new shadow stack and frees
it on thread exit.

Alternatively, the kernel could complete the clone syscall with the
child's shadow stack pointer set to NULL and let the child thread
allocate a shadow stack for itself. That approach has two problems: it
is not compatible with existing code that makes inline syscalls, and it
cannot handle signals that arrive before the child has successfully
allocated a shadow stack.

Use the stack_size passed from the clone3() syscall as the thread shadow
stack size, but cap it at min(RLIMIT_STACK, 4 GB). A compat-mode thread's
shadow stack size is further reduced to 1/4 of that value, which allows
more threads to run in a 32-bit address space.

Signed-off-by: Yu-cheng Yu
---
 arch/x86/include/asm/cet.h         |  5 +++
 arch/x86/include/asm/mmu_context.h |  3 ++
 arch/x86/kernel/cet.c              | 49 ++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c          | 15 +++++++--
 4 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 73435856ce54..5d66340c7a13 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -18,12 +18,17 @@ struct cet_status {
 
 #ifdef CONFIG_X86_CET
 int cet_setup_shstk(void);
+int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags,
+			   unsigned long stack_size);
 void cet_disable_shstk(void);
 void cet_free_shstk(struct task_struct *p);
 int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp);
 void cet_restore_signal(struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
+static inline int cet_setup_thread_shstk(struct task_struct *p,
+					 unsigned long clone_flags,
+					 unsigned long stack_size) { return 0; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_free_shstk(struct task_struct *p) {}
 static inline void cet_restore_signal(struct sc_ext *sc) { return; }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 27516046117a..e90bd2ee8498 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -11,6 +11,7 @@
 
 #include
 #include
+#include
 #include
 
 extern atomic64_t last_mm_ctx_id;
@@ -146,6 +147,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		cet_free_shstk(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 08e43d9b5176..12738cdfb5f2 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -172,6 +172,55 @@ int cet_setup_shstk(void)
 	return 0;
 }
 
+int cet_setup_thread_shstk(struct task_struct *tsk, unsigned long clone_flags,
+			   unsigned long stack_size)
+{
+	unsigned long addr, size;
+	struct cet_user_state *state;
+	struct cet_status *cet = &tsk->thread.cet;
+
+	if (!cet->shstk_size)
+		return 0;
+
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
+			       XFEATURE_CET_USER);
+
+	if (!state)
+		return -EINVAL;
+
+	if (stack_size == 0)
+		return -EINVAL;
+
+	/* Cap shadow stack size to 4 GB */
+	size = min(rlimit(RLIMIT_STACK), 1UL << 32);
+	size = min(size, stack_size);
+
+	/*
+	 * Compat-mode pthreads share a limited address space.
+	 * If each function call takes an average of four slots
+	 * stack space, allocate 1/4 of stack size for shadow stack.
+	 */
+	if (in_compat_syscall())
+		size /= 4;
+	size = round_up(size, PAGE_SIZE);
+	addr = alloc_shstk(size, 0);
+
+	if (IS_ERR_VALUE(addr)) {
+		cet->shstk_base = 0;
+		cet->shstk_size = 0;
+		return PTR_ERR((void *)addr);
+	}
+
+	fpu__prepare_write(&tsk->thread.fpu);
+	state->user_ssp = (u64)(addr + size);
+	cet->shstk_base = addr;
+	cet->shstk_size = size;
+	return 0;
+}
+
 void cet_disable_shstk(void)
 {
 	struct cet_status *cet = &current->thread.cet;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9c214d7085a4..b7c8fe2d93ec 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 
 #include "process.h"
 
@@ -109,6 +110,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	cet_free_shstk(tsk);
 	fpu__drop(fpu);
 }
 
@@ -122,8 +124,9 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
-int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
-		struct task_struct *p, unsigned long tls)
+int copy_thread(unsigned long clone_flags, unsigned long sp,
+		unsigned long stack_size, struct task_struct *p,
+		unsigned long tls)
 {
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
@@ -163,7 +166,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	/* Kernel thread ? */
 	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
-		kthread_frame_init(frame, sp, arg);
+		kthread_frame_init(frame, sp, stack_size);
 		return 0;
 	}
 
@@ -181,6 +184,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
 
+#ifdef CONFIG_X86_64
+	/* Allocate a new shadow stack for pthread */
+	if (!ret)
+		ret = cet_setup_thread_shstk(p, clone_flags, stack_size);
+#endif
+
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
-- 
2.21.0
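
Note on the sizing rule: the arithmetic in cet_setup_thread_shstk() can
be checked in isolation with a small userspace sketch. This is not kernel
code. thread_shstk_size(), min_u64(), round_up_u64() and SKETCH_PAGE_SIZE
are hypothetical names introduced only for illustration, and a 4 KiB page
size is assumed. The caller passes in the RLIMIT_STACK value and the
compat flag instead of reading them from the running task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SIZE here. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical helper: smaller of two 64-bit values. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Hypothetical helper: round x up to the next multiple of align. */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return (x + align - 1) / align * align;
}

/*
 * Hypothetical userspace mirror of the size computation in the patch:
 * cap at min(RLIMIT_STACK, 4 GB), then at the requested stack_size,
 * divide by four for compat-mode threads, and round up to a page.
 */
static uint64_t thread_shstk_size(uint64_t rlimit_stack,
				  uint64_t stack_size, bool compat)
{
	uint64_t size = min_u64(rlimit_stack, 1ULL << 32);

	size = min_u64(size, stack_size);
	if (compat)
		size /= 4;
	return round_up_u64(size, SKETCH_PAGE_SIZE);
}
```

With an 8 MiB RLIMIT_STACK and a 2 MiB stack_size, the sketch yields a
2 MiB shadow stack for a 64-bit thread and 512 KiB for a compat-mode
thread. With an unlimited RLIMIT_STACK and an 8 GiB stack_size, the
result is capped at 4 GiB.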