From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFBDEC35E15 for ; Tue, 25 Feb 2020 21:29:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 65CC821927 for ; Tue, 25 Feb 2020 21:29:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="nUVRiccN" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 65CC821927 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=chromium.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E28E16B0003; Tue, 25 Feb 2020 16:29:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DD9616B0005; Tue, 25 Feb 2020 16:29:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CAB106B0006; Tue, 25 Feb 2020 16:29:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70]) by kanga.kvack.org (Postfix) with ESMTP id B02C56B0003 for ; Tue, 25 Feb 2020 16:29:16 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 5D8A640C0 for ; Tue, 25 Feb 2020 21:29:16 +0000 (UTC) X-FDA: 76529940312.15.shoe16_16aae306a1d3a X-HE-Tag: shoe16_16aae306a1d3a X-Filterd-Recvd-Size: 9387 Received: from mail-pg1-f195.google.com (mail-pg1-f195.google.com [209.85.215.195]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Feb 2020 21:29:15 +0000 (UTC) Received: by mail-pg1-f195.google.com with SMTP id y30so163524pga.13 for ; Tue, 25 Feb 2020 13:29:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=S/FIbI+B+1AXtIG+6X6UmoCI89bJQls8Ya5JRg19+CE=; b=nUVRiccN9kZCxZ83HtHJa51NNtoOmhsH0gDd2R02Zuy/bNWgSZfP9eP19YE814le55 7kwf1lMZp4dvDhPY2ZgWfbUgLisO2TgpGB8/edRR/7sZJAFQZi9CSQY6bG3Wg9oz9KtI S7lj68pzu/b9+gus/FA3mGwlI/D4djZRtiV1s= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=S/FIbI+B+1AXtIG+6X6UmoCI89bJQls8Ya5JRg19+CE=; b=COuS526igGPniuGuywfNnYvylDHvZxonAg3AQjHnI/LN/iOYa9Hn4HhrX7R5AHv5d1 goY7IOf0UELa5TbGmXPehGYHoF41dZHNyAO+WDMBingYAGaLk1ytK4/22z+Ieu5idRY5 0Rcf7buzCnMzzxvoQ3snZivJyh2UKxGp/pJB7d6nq5BL6RKth65YzKgO02aHtY1p2fCB kojc9/k0FwnoogQdmVhywIrCYUT4Vf5ttZbOeVe3QbjX0L73bpUzrdex0ooQdGGXowQV EUML4SmohrEAoV4l6xPK9PsGfZELr2AOpgFBEDT9sjmoVvK5cWQCPcJurxxKAz39R+ve 4zgQ== X-Gm-Message-State: APjAAAX+F6NKO0lW5C+qFyQbYuQ+bWSt4sgsJXbdXfoMEgAZ8ySVpnZv Bn7NkgPZLoA33x5uGZb+HdwUew== X-Google-Smtp-Source: APXvYqxz9bH6oSUuy2LfLyRx0MtHyLriFGvoytHJvrFkPoAEiRc69XTS4QKMITcSoTrsvxe7ZtzgPQ== X-Received: by 2002:a63:4763:: with SMTP id w35mr488829pgk.113.1582666154657; Tue, 25 Feb 2020 13:29:14 -0800 (PST) Received: from www.outflux.net (smtp.outflux.net. [198.145.64.163]) by smtp.gmail.com with ESMTPSA id az9sm52982pjb.3.2020.02.25.13.29.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Feb 2020 13:29:13 -0800 (PST) Date: Tue, 25 Feb 2020 13:29:12 -0800 From: Kees Cook To: Yu-cheng Yu Cc: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , x86-patch-review@intel.com Subject: Re: [RFC PATCH v9 25/27] x86/cet/shstk: Handle thread Shadow Stack Message-ID: <202002251324.5D515260@keescook> References: <20200205181935.3712-1-yu-cheng.yu@intel.com> <20200205181935.3712-26-yu-cheng.yu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200205181935.3712-26-yu-cheng.yu@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Feb 05, 2020 at 10:19:33AM -0800, Yu-cheng Yu wrote: > The Shadow Stack (SHSTK) for clone/fork is handled as the following: > > (1) If ((clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM), > the kernel allocates (and frees on thread exit) a new SHSTK for the > child. > > It is possible for the kernel to complete the clone syscall and set the > child's SHSTK pointer to NULL and let the child thread allocate a SHSTK > for itself. There are two issues in this approach: It is not > compatible with existing code that does inline syscall and it cannot > handle signals before the child can successfully allocate a SHSTK. > > (2) For (clone_flags & CLONE_VFORK), the child uses the existing SHSTK. > > (3) For all other cases, the SHSTK is copied/reused whenever the parent or > the child does a call/ret. > > This patch handles cases (1) & (2). Case (3) is handled in the SHSTK page > fault patches. > > A 64-bit SHSTK has a fixed size of RLIMIT_STACK. A compat-mode thread SHSTK > has a fixed size of 1/4 RLIMIT_STACK. This allows more threads to share a > 32-bit address space. I am not understanding this part. :) Entries are sizeof(unsigned long), yes? A 1/2 RLIMIT_STACK would cover 32-bit, but 1/4 is less, so why does that provide for more threads? > > Signed-off-by: Yu-cheng Yu > --- > arch/x86/include/asm/cet.h | 2 ++ > arch/x86/include/asm/mmu_context.h | 3 +++ > arch/x86/kernel/cet.c | 41 ++++++++++++++++++++++++++++++ > arch/x86/kernel/process.c | 7 +++++ > 4 files changed, 53 insertions(+) > > diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h > index 409d4f91a0dc..9a3e2da9c1c4 100644 > --- a/arch/x86/include/asm/cet.h > +++ b/arch/x86/include/asm/cet.h > @@ -19,10 +19,12 @@ struct cet_status { > > #ifdef CONFIG_X86_INTEL_CET > int cet_setup_shstk(void); > +int cet_setup_thread_shstk(struct task_struct *p); > void cet_disable_free_shstk(struct task_struct *p); > int cet_restore_signal(bool ia32, struct sc_ext *sc); > int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); > #else > +static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; } > static inline void cet_disable_free_shstk(struct task_struct *p) {} > static inline int cet_restore_signal(bool ia32, struct sc_ext *sc) { return -EINVAL; } > static inline int cet_setup_signal(bool ia32, unsigned long rstor, > diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h > index 5f33924e200f..6a8189308823 100644 > --- a/arch/x86/include/asm/mmu_context.h > +++ b/arch/x86/include/asm/mmu_context.h > @@ -13,6 +13,7 @@ > #include > #include > #include > +#include > #include > > extern atomic64_t last_mm_ctx_id; > @@ -230,6 +231,8 @@ do { \ > #else > #define deactivate_mm(tsk, mm) \ > do { \ > + if (!tsk->vfork_done) \ > + cet_disable_free_shstk(tsk); \ > load_gs_index(0); \ > loadsegment(fs, 0); \ > } while (0) > diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c > index cba5c7656aab..5b45abda80a1 100644 > --- a/arch/x86/kernel/cet.c > +++ b/arch/x86/kernel/cet.c > @@ -170,6 +170,47 @@ int cet_setup_shstk(void) > return 0; > } > > +int cet_setup_thread_shstk(struct task_struct *tsk) > +{ > + unsigned long addr, size; > + struct cet_user_state *state; > + struct cet_status *cet = &tsk->thread.cet; > + > + if (!cet->shstk_enabled) > + return 0; > + > + state = get_xsave_addr(&tsk->thread.fpu.state.xsave, > + XFEATURE_CET_USER); > + > + if (!state) > + return -EINVAL; > + > + size = rlimit(RLIMIT_STACK); Is SHSTK incompatible with RLIM_INFINITY stack rlimits? > + > + /* > + * Compat-mode pthreads share a limited address space. > + * If each function call takes an average of four slots > + * stack space, we need 1/4 of stack size for shadow stack. > + */ > + if (in_compat_syscall()) > + size /= 4; > + > + addr = alloc_shstk(size); I assume it'd fail here, but I worry about Stack Clash style attacks. I'd like to see test cases that make sure the SHSTK gap is working correctly. -Kees > + > + if (IS_ERR((void *)addr)) { > + cet->shstk_base = 0; > + cet->shstk_size = 0; > + cet->shstk_enabled = 0; > + return PTR_ERR((void *)addr); > + } > + > + fpu__prepare_write(&tsk->thread.fpu); > + state->user_ssp = (u64)(addr + size); > + cet->shstk_base = addr; > + cet->shstk_size = size; > + return 0; > +} > + > void cet_disable_free_shstk(struct task_struct *tsk) > { > struct cet_status *cet = &tsk->thread.cet; > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index e102e63de641..7098618142f2 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -110,6 +110,7 @@ void exit_thread(struct task_struct *tsk) > > free_vm86(t); > > + cet_disable_free_shstk(tsk); > fpu__drop(fpu); > } > > @@ -180,6 +181,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, > if (clone_flags & CLONE_SETTLS) > ret = set_new_tls(p, tls); > > +#ifdef CONFIG_X86_64 > + /* Allocate a new shadow stack for pthread */ > + if (!ret && (clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM) > + ret = cet_setup_thread_shstk(p); > +#endif > + > if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) > io_bitmap_share(p); > > -- > 2.21.0 > -- Kees Cook