From: Dave Hansen <dave.hansen@intel.com>
Date: Wed, 9 Feb 2022 14:23:51 -0800
Subject: Re: [PATCH 18/35] mm: Add guard pages around a shadow stack.
To: Rick Edgecombe <rick.p.edgecombe@intel.com>, x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20220130211838.8382-19-rick.p.edgecombe@intel.com>
On 1/30/22 13:18, Rick Edgecombe wrote:
> INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the
> first and the last elements in the range, effectively touches those memory
> areas.

This is a pretty close copy of the instruction reference text for
INCSSP.  I'm feeling rather dense today, but that's just not making any
sense.  The pseudocode is more sensible in the SDM.  I think this needs
a better explanation:

	The INCSSP instruction increments the shadow stack pointer.
	It is the shadow stack analog of an instruction like:

		addq	$0x80, %rsp

	However, there is one important difference between an ADD on
	%rsp and INCSSP.  In addition to modifying SSP, INCSSP also
	reads from the memory of the first and last elements that were
	"popped".  You can think of it as acting like this:

	READ_ONCE(ssp);       // read+discard top element on stack
	ssp += nr_to_pop * 8; // move the shadow stack
	READ_ONCE(ssp-8);     // read+discard last popped stack element

> The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and
> 255 * 4 = 1020 bytes by INCSSPD.  Both ranges are far from PAGE_SIZE.

... That maximum distance, combined with a guard page at the end of a
shadow stack, ensures that INCSSP will fault before it is able to move
across an entire guard page.

> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.
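To make that concrete, here is a little userspace model of the worst
case.  This is only a sketch: the address is made up, and
incsspq_last_read() is a helper I invented for illustration, not a real
kernel function:

#include <assert.h>

#define PAGE_SIZE          4096UL
#define INCSSPQ_MAX_BYTES  (255UL * 8)	/* 2040 */

/*
 * Highest address INCSSPQ must read when starting from 'ssp':
 * the last "popped" element, at (new SSP - 8).
 */
static unsigned long incsspq_last_read(unsigned long ssp)
{
	return ssp + INCSSPQ_MAX_BYTES - 8;
}

int main(void)
{
	unsigned long ss_end    = 0x7f0000000000UL;	/* made-up end of the shadow stack VMA */
	unsigned long guard_end = ss_end + PAGE_SIZE;	/* guard page is [ss_end, guard_end) */

	/*
	 * Even starting from the topmost element of the shadow stack,
	 * the mandatory read lands *inside* the guard page because
	 * 2040 < 4096, so INCSSPQ faults instead of stepping over the
	 * gap.
	 */
	assert(incsspq_last_read(ss_end - 8) < guard_end);
	return 0;
}

The same argument covers INCSSPD (1020 bytes) and, on the low end of
the stack, CALL's single 8-byte push into the other guard page.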
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index a506a411474d..e1533fdc08b4 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -73,6 +73,13 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
>  
>  extern void initmem_init(void);
>  
> +#define vm_start_gap vm_start_gap
> +struct vm_area_struct;
> +extern unsigned long vm_start_gap(struct vm_area_struct *vma);
> +
> +#define vm_end_gap vm_end_gap
> +extern unsigned long vm_end_gap(struct vm_area_struct *vma);
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_PAGE_DEFS_H */
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index f3f52c5e2fd6..81f9325084d3 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -250,3 +250,49 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
>  		return false;
>  	return true;
>  }
> +
> +/*
> + * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D). INCSSPQ
> + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and
> + * touches the first and the last element in the range, which triggers a
> + * page fault if the range is not in a shadow stack. Because of this,
> + * creating 4-KB guard pages around a shadow stack prevents these
> + * instructions from going beyond.
> + */
> +#define SHADOW_STACK_GUARD_GAP	PAGE_SIZE
> +
> +unsigned long vm_start_gap(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_start = vma->vm_start;
> +	unsigned long gap = 0;
> +
> +	if (vma->vm_flags & VM_GROWSDOWN)
> +		gap = stack_guard_gap;
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		gap = SHADOW_STACK_GUARD_GAP;
> +
> +	if (gap != 0) {
> +		vm_start -= gap;
> +		if (vm_start > vma->vm_start)
> +			vm_start = 0;
> +	}
> +	return vm_start;
> +}
> +
> +unsigned long vm_end_gap(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_end = vma->vm_end;
> +	unsigned long gap = 0;
> +
> +	if (vma->vm_flags & VM_GROWSUP)
> +		gap = stack_guard_gap;
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		gap = SHADOW_STACK_GUARD_GAP;
> +
> +	if (gap != 0) {
> +		vm_end += gap;
> +		if (vm_end < vma->vm_end)
> +			vm_end = -PAGE_SIZE;
> +	}
> +	return vm_end;
> +}

First of all, __weak would be a lot better than these #ifdefs.

Second, I have the same basic objection to this as the maybe_mkwrite()
mess.  This is a forked copy of the code.  Instead of refactoring, it's
just copied-pasted-and-#ifdef'd.  Not so nice.

Isn't this just a matter of overriding 'stack_guard_gap' for
VM_SHADOW_STACK?  Why don't we just do this:

unsigned long stack_guard_gap(struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_SHADOW_STACK)
		return SHADOW_STACK_GUARD_GAP;
	return __stack_guard_gap;
}

Or, worst-case, if people don't want 2 easily compiled-out lines added
to generic code, define:

unsigned long __weak stack_guard_gap(struct vm_area_struct *vma)
{
	return __stack_guard_gap;
}

in generic code, and put the top definition in arch/x86.
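To be concrete about what I mean, the generic helpers could then stay
unified, something like this.  Untested sketch: it assumes the gap
checks also learn about VM_SHADOW_STACK so the arch copies go away, and
'__stack_guard_gap' is today's 'stack_guard_gap' variable, renamed:

/* generic code, e.g. include/linux/mm.h -- untested sketch */
static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
{
	unsigned long vm_start = vma->vm_start;

	if (vma->vm_flags & (VM_GROWSDOWN | VM_SHADOW_STACK)) {
		vm_start -= stack_guard_gap(vma);
		if (vm_start > vma->vm_start)	/* underflow check */
			vm_start = 0;
	}
	return vm_start;
}

static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
{
	unsigned long vm_end = vma->vm_end;

	if (vma->vm_flags & (VM_GROWSUP | VM_SHADOW_STACK)) {
		vm_end += stack_guard_gap(vma);
		if (vm_end < vma->vm_end)	/* overflow check */
			vm_end = -PAGE_SIZE;
	}
	return vm_end;
}

One function, no #ifdefs, and the shadow stack special case lives in
exactly one place.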