From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	"Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v12 26/26] mm: Introduce PROT_SHSTK for shadow stack
Date: Fri, 18 Sep 2020 12:21:24 -0700
Message-Id: <20200918192125.25473-27-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200918192125.25473-1-yu-cheng.yu@intel.com>
References: <20200918192125.25473-1-yu-cheng.yu@intel.com>

There are three possible options to create a shadow stack allocation API:
an arch_prctl(), a new syscall, or adding PROT_SHSTK to mmap()/mprotect().
Each has its advantages and compromises.

An arch_prctl() is the least intrusive.  However, the existing x86
arch_prctl() takes only two parameters.  Multiple parameters must be
passed in a memory buffer.  There is a proposal to pass more parameters in
registers [1], but no active discussion on that.

A new syscall minimizes compatibility issues and offers an extensible
framework to other architectures, but this will likely result in some
overlap with mmap()/mprotect().

The introduction of PROT_SHSTK to mmap()/mprotect() takes advantage of
existing APIs.  The x86-specific PROT_SHSTK is translated to VM_SHSTK and
a shadow stack mapping is created without reinventing the wheel.  There
are potential pitfalls though.  The most obvious one would be using this
as a bypass to shadow stack protection.  However, the attacker would have
to get to the syscall first.

Since arch_calc_vm_prot_bits() is modified, I have moved
arch_vm_get_page_prot() and arch_calc_vm_prot_bits() to
x86/include/asm/mman.h.  This will be more consistent with other
architectures.

[1] https://lore.kernel.org/lkml/20200828121624.108243-1-hjl.tools@gmail.com/

Signed-off-by: Yu-cheng Yu
---
 arch/x86/include/asm/mman.h      | 81 ++++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/mman.h | 28 ++---------
 include/linux/mman.h             |  8 ++++
 mm/mmap.c                        |  8 +++-
 mm/mprotect.c                    |  4 ++
 5 files changed, 104 insertions(+), 25 deletions(-)
 create mode 100644 arch/x86/include/asm/mman.h

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
new file mode 100644
index 000000000000..4e73dcc0fea2
--- /dev/null
+++ b/arch/x86/include/asm/mman.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#include
+#include
+#include
+
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+/*
+ * Take the 4 protection key bits out of the vma->vm_flags
+ * value and turn them in to the bits that we can put in
+ * to a pte.
+ *
+ * Only override these if Protection Keys are available
+ * (which is only on 64-bit).
+ */
+#define arch_vm_get_page_prot(vm_flags)	__pgprot(	\
+		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
+		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
+
+#define pkey_vm_prot_bits(prot, key) (			\
+		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |	\
+		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |	\
+		((key) & 0x4 ? VM_PKEY_BIT2 : 0) |	\
+		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
+#else
+#define pkey_vm_prot_bits(prot, key) (0)
+#endif
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+						   unsigned long pkey)
+{
+	unsigned long vm_prot_bits = pkey_vm_prot_bits(prot, pkey);
+
+	if (!(prot & PROT_WRITE) && (prot & PROT_SHSTK))
+		vm_prot_bits |= VM_SHSTK;
+
+	return vm_prot_bits;
+}
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+{
+	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
+
+	if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER) &&
+	    static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) {
+
+		supported |= PROT_SHSTK;
+
+		/*
+		 * A shadow stack mapping is indirectly writable by only
+		 * the CALL and WRUSS instructions (but not other write
+		 * instructions).  PROT_SHSTK and PROT_WRITE are mutually
+		 * exclusive.
+		 */
+		supported &= ~PROT_WRITE;
+	}
+
+	return (prot & ~supported) == 0;
+}
+#define arch_validate_prot arch_validate_prot
+
+static inline bool arch_vma_can_mprot(struct vm_area_struct *vma,
+				      unsigned long prot)
+{
+	bool can_mprot;
+
+	/*
+	 * Function call stack should not be backed by a file or shared.
+	 */
+	can_mprot = !(prot & PROT_SHSTK) ||
+		    !(vma->vm_file || (vma->vm_flags & VM_SHARED));
+	return can_mprot;
+}
+#define arch_vma_can_mprot arch_vma_can_mprot
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index d4a8d0424bfb..39bb7db344a6 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -1,31 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _ASM_X86_MMAN_H
-#define _ASM_X86_MMAN_H
+#ifndef _UAPI_ASM_X86_MMAN_H
+#define _UAPI_ASM_X86_MMAN_H
 
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
-/*
- * Take the 4 protection key bits out of the vma->vm_flags
- * value and turn them in to the bits that we can put in
- * to a pte.
- *
- * Only override these if Protection Keys are available
- * (which is only on 64-bit).
- */
-#define arch_vm_get_page_prot(vm_flags)	__pgprot(	\
-		((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) |	\
-		((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
-
-#define arch_calc_vm_prot_bits(prot, key) (		\
-		((key) & 0x1 ? VM_PKEY_BIT0 : 0) |	\
-		((key) & 0x2 ? VM_PKEY_BIT1 : 0) |	\
-		((key) & 0x4 ? VM_PKEY_BIT2 : 0) |	\
-		((key) & 0x8 ? VM_PKEY_BIT3 : 0))
-#endif
+#define PROT_SHSTK	0x10		/* shadow stack pages */
 
 #include
 
-#endif /* _ASM_X86_MMAN_H */
+#endif /* _UAPI_ASM_X86_MMAN_H */
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 6f34c33075f9..4d776adb0fdf 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -103,6 +103,14 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 #define arch_validate_prot arch_validate_prot
 #endif
 
+#ifndef arch_vma_can_mprot
+/*
+ * Allow architectures to check if the vma can support the new
+ * protection.
+ */
+#define arch_vma_can_mprot(vma, prot) true
+#endif
+
 /*
  * Optimisation macro.  It is equivalent to:
  *      (x & bit1) ? bit2 : 0
diff --git a/mm/mmap.c b/mm/mmap.c
index 81d4a00092da..4c403dfccff0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1445,6 +1445,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		struct inode *inode = file_inode(file);
 		unsigned long flags_mask;
 
+		/*
+		 * Call stack cannot be backed by a file.
+		 */
+		if (vm_flags & VM_SHSTK)
+			return -EINVAL;
+
 		if (!file_mmap_ok(file, inode, pgoff, len))
 			return -EOVERFLOW;
 
@@ -1509,7 +1515,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	} else {
 		switch (flags & MAP_TYPE) {
 		case MAP_SHARED:
-			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
+			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP|VM_SHSTK))
 				return -EINVAL;
 			/*
 			 * Ignore pgoff.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a8edbcb3af99..cf73b59a36da 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -553,6 +553,10 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	error = -ENOMEM;
 	if (!vma)
 		goto out;
+	if (!arch_vma_can_mprot(vma, prot)) {
+		error = -EINVAL;
+		goto out;
+	}
 	prev = vma->vm_prev;
 	if (unlikely(grows & PROT_GROWSDOWN)) {
 		if (vma->vm_start >= end)
-- 
2.21.0
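
For reviewers who want to check the protection rules without building a kernel, the following is a minimal userspace sketch of the three checks this patch adds. It is a model under stated assumptions, not the kernel code: the `model_*` functions and `M_*` constants are hypothetical names, `M_VM_SHSTK` is a placeholder bit (the real VM_SHSTK is defined elsewhere in the series), the pkey bits are left out, and the `shstk_available` flag stands in for the CONFIG_X86_INTEL_SHADOW_STACK_USER and X86_FEATURE_SHSTK tests.

```c
#include <stdbool.h>

/* PROT_* values for x86; M_PROT_SHSTK is the value proposed by the patch. */
#define M_PROT_READ	0x1UL
#define M_PROT_WRITE	0x2UL
#define M_PROT_EXEC	0x4UL
#define M_PROT_SEM	0x8UL
#define M_PROT_SHSTK	0x10UL

/* Placeholder only: the real VM_SHSTK bit is not defined in this patch. */
#define M_VM_SHSTK	(1UL << 0)

/* Models arch_validate_prot(): PROT_SHSTK excludes PROT_WRITE. */
static bool model_validate_prot(unsigned long prot, bool shstk_available)
{
	unsigned long supported = M_PROT_READ | M_PROT_WRITE |
				  M_PROT_EXEC | M_PROT_SEM;

	if (shstk_available && (prot & M_PROT_SHSTK)) {
		supported |= M_PROT_SHSTK;
		supported &= ~M_PROT_WRITE;
	}

	return (prot & ~supported) == 0;
}

/* Models arch_calc_vm_prot_bits() with the pkey bits omitted. */
static unsigned long model_calc_vm_prot_bits(unsigned long prot)
{
	unsigned long vm_prot_bits = 0;

	if (!(prot & M_PROT_WRITE) && (prot & M_PROT_SHSTK))
		vm_prot_bits |= M_VM_SHSTK;

	return vm_prot_bits;
}

/* Models arch_vma_can_mprot(): no file-backed or shared shadow stacks. */
static bool model_vma_can_mprot(unsigned long prot, bool file_backed,
				bool shared)
{
	return !(prot & M_PROT_SHSTK) || !(file_backed || shared);
}
```

In this model, a request for M_PROT_READ | M_PROT_SHSTK passes validation only when shadow stack is available, and adding M_PROT_WRITE to it is rejected, which matches the "mutually exclusive" comment in arch_validate_prot().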