From: "Yu, Yu-cheng"
Date: Fri, 5 Feb 2021 10:58:33 -0800
Subject: Re: [PATCH v19 08/25] x86/mm: Introduce _PAGE_COW
To: Kees Cook
Cc: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H.J. Lu", Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit,
 Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang,
 Pengfei Xu
Message-ID: <21b1e325-a17d-c859-973d-de66c1401f19@intel.com>
In-Reply-To: <202102041215.B54FCA552F@keescook>
References: <20210203225547.32221-1-yu-cheng.yu@intel.com>
 <20210203225547.32221-9-yu-cheng.yu@intel.com>
 <202102041215.B54FCA552F@keescook>

On 2/4/2021 12:19 PM, Kees Cook wrote:
> On Wed, Feb 03, 2021 at 02:55:30PM -0800, Yu-cheng Yu wrote:
>> There is essentially no room left in the x86 hardware PTEs on some OSes
>> (not Linux). That left the hardware architects looking for a way to
>> represent a new memory type (shadow stack) within the existing bits.
>> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
>>
>> The reason it's lightly used is that Dirty=1 is normally set by hardware
>> and cannot normally be set by hardware on a Write=0 PTE. Software must
>> normally be involved to create one of these PTEs, so software can simply
>> opt to not create them.
>>
>> In places where Linux normally creates Write=0, Dirty=1, it can use the
>> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
>> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
>> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1.
>> This clearly separates shadow stack from other data, and results in the
>> following:
>>
>> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
>> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>>     The user page is in a R/O VMA, and get_user_pages() needs a writable
>>     copy. The page fault handler creates a copy of the page and sets
>>     the new copy's PTE as Write=0 and Cow=1.
>> (c) A shadow stack PTE: (Write=0, Dirty=1)
>> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>>     When a shadow stack page is being shared among processes (this happens
>>     at fork()), its PTE is made Dirty=0, so the next shadow stack access
>>     causes a fault, and the page is duplicated and Dirty=1 is set again.
>>     This is the COW equivalent for shadow stack pages, even though it's
>>     copy-on-access rather than copy-on-write.
>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>>     Dirty=1, but then observed a Write=0 PTE. That's possible today, but
>>     will not happen on processors that support shadow stack.
>
> What happens for "e" with/without CET? It sounds like direct writes to
> such pages will be (correctly) rejected by the MMU?
>
>>
>> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
>> pmd and pud.
>>
>> After this, there are six free bits left in the 64-bit PTE, and no more
>> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
>> implemented for the 32-bit kernel.
>
> Are there selftests to validate this change?
>

I have some tests to verify, for example,

- After clone(), shadow stack pages are indeed copy-on-write,
- Shadow stack pages (i.e. Write=0, Dirty=1) cannot be directly written to,
- Shadow stack guard pages exist.

These tests are now on GitHub, but they are kind of messy. I can gradually
clean them up and submit them as selftests separately.

If you are asking about detecting the potential hardware issue (that
Dave Hansen talked about), then maybe we need to detect it from the kernel.
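
For reference, the PTE states (a) through (d) quoted above reduce to a small decision on three bits. The following is only a user-space sketch of that decision, not the kernel code from the patch: every `demo_`/`DEMO_` name is hypothetical, Write and Dirty use the architectural x86 bit positions (1 and 6), and the position chosen for the software Cow bit is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Write (bit 1) and Dirty (bit 6) are the architectural x86 positions.
 * The position of the software Cow bit is an assumption for this sketch.
 */
#define DEMO_PAGE_RW    (UINT64_C(1) << 1)
#define DEMO_PAGE_DIRTY (UINT64_C(1) << 6)
#define DEMO_PAGE_COW   (UINT64_C(1) << 58)   /* hypothetical position */

enum demo_pte_kind {
    DEMO_WRITABLE,      /* Write=1 (Dirty may or may not be set) */
    DEMO_SHADOW_STACK,  /* Write=0, Dirty=1: case (c) */
    DEMO_RO_COWED,      /* Write=0, Cow=1: cases (a), (b) and (d) */
    DEMO_RO_CLEAN       /* Write=0, Dirty=0, Cow=0 */
};

/* Classify a PTE value by its Write, Dirty and Cow bits only. */
enum demo_pte_kind demo_classify(uint64_t pte)
{
    if (pte & DEMO_PAGE_RW)
        return DEMO_WRITABLE;
    if (pte & DEMO_PAGE_DIRTY)
        return DEMO_SHADOW_STACK;   /* only shadow stack keeps Write=0, Dirty=1 */
    if (pte & DEMO_PAGE_COW)
        return DEMO_RO_COWED;
    return DEMO_RO_CLEAN;
}

/* Usage example: walk the states listed in the commit message. */
void demo_check_states(void)
{
    assert(demo_classify(DEMO_PAGE_COW) == DEMO_RO_COWED);        /* (a), (b), (d) */
    assert(demo_classify(DEMO_PAGE_DIRTY) == DEMO_SHADOW_STACK);  /* (c) */
    assert(demo_classify(DEMO_PAGE_RW | DEMO_PAGE_DIRTY) == DEMO_WRITABLE);
    assert(demo_classify(0) == DEMO_RO_CLEAN);
}
```

The sketch treats Write=0, Dirty=1 as shadow stack unconditionally, which matches the intent described in the commit message; case (e) is not modelled because it depends on hardware behaviour.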

> I think it might be useful to more clearly describe what is considered
> "dirty" and "writeable" in comments above the pte_helpers.
>

Yes, I will update it.

Thanks!

[...]
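
One possible reading of "dirty" and "writeable" under this scheme, written as commented user-space helpers. This is a sketch under stated assumptions, not the helpers from the patch: the `demo_` names are hypothetical, the Cow bit position is chosen for illustration, and the transitions are inferred from the rule in the commit message that Linux creates Write=0, Cow=1 wherever it would otherwise create Write=0, Dirty=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_pte_t;

#define DEMO_PAGE_RW    (UINT64_C(1) << 1)    /* architectural Write bit */
#define DEMO_PAGE_DIRTY (UINT64_C(1) << 6)    /* architectural Dirty bit */
#define DEMO_PAGE_COW   (UINT64_C(1) << 58)   /* hypothetical software bit */

/* "Writeable": only the hardware Write bit counts. */
bool demo_pte_write(demo_pte_t pte)
{
    return (pte & DEMO_PAGE_RW) != 0;
}

/* "Dirty": the page was modified, recorded in either Dirty or Cow. */
bool demo_pte_dirty(demo_pte_t pte)
{
    return (pte & (DEMO_PAGE_DIRTY | DEMO_PAGE_COW)) != 0;
}

/* Write-protect: move Dirty into Cow so the result is not Write=0, Dirty=1. */
demo_pte_t demo_pte_wrprotect(demo_pte_t pte)
{
    pte &= ~DEMO_PAGE_RW;
    if (pte & DEMO_PAGE_DIRTY) {
        pte &= ~DEMO_PAGE_DIRTY;
        pte |= DEMO_PAGE_COW;
    }
    return pte;
}

/* Make writable: move Cow back into the hardware Dirty bit. */
demo_pte_t demo_pte_mkwrite(demo_pte_t pte)
{
    pte |= DEMO_PAGE_RW;
    if (pte & DEMO_PAGE_COW) {
        pte &= ~DEMO_PAGE_COW;
        pte |= DEMO_PAGE_DIRTY;
    }
    return pte;
}

/* Mark dirty (non-shadow-stack PTEs only): Dirty if writable, else Cow. */
demo_pte_t demo_pte_mkdirty(demo_pte_t pte)
{
    return pte | ((pte & DEMO_PAGE_RW) ? DEMO_PAGE_DIRTY : DEMO_PAGE_COW);
}

/* Usage example: a dirty writable page stays dirty across write-protection. */
void demo_check_transitions(void)
{
    demo_pte_t pte = demo_pte_wrprotect(DEMO_PAGE_RW | DEMO_PAGE_DIRTY);

    assert(pte == DEMO_PAGE_COW);
    assert(demo_pte_dirty(pte) && !demo_pte_write(pte));
    assert(demo_pte_mkwrite(pte) == (DEMO_PAGE_RW | DEMO_PAGE_DIRTY));
}
```

In this model, write-protecting a shadow stack PTE (Write=0, Dirty=1) also yields Write=0, Cow=1, which corresponds to the shared shadow stack state (d) described for fork().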