Date: Thu, 19 Jan 2023 16:55:17 -0800
From: Kees Cook
To: Rick Edgecombe
Cc: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
	Jonathan Corbet, Mike Kravetz, Nadav Amit, Oleg Nesterov,
	Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
	"Kirill A . Shutemov", John Allen, kcc@google.com,
	eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
	dethoma@microsoft.com, akpm@linux-foundation.org,
	Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, Yu-cheng Yu
Subject: Re: [PATCH v5 10/39] x86/mm: Introduce _PAGE_COW
Message-ID: <202301191655.97E3023EC@keescook>
References: <20230119212317.8324-1-rick.p.edgecombe@intel.com>
	<20230119212317.8324-11-rick.p.edgecombe@intel.com>
In-Reply-To: <20230119212317.8324-11-rick.p.edgecombe@intel.com>

On Thu, Jan 19, 2023 at 01:22:48PM -0800, Rick Edgecombe wrote:
> Some OSes have a greater dependence on software available bits in PTEs than
> Linux. That left the hardware architects looking for a way to represent a
> new memory type (shadow stack) within the existing bits. They chose to
> repurpose a lightly-used state: Write=0,Dirty=1. So in order to support
> shadow stack memory, Linux should avoid creating memory with this PTE bit
> combination unless it intends for it to be shadow stack.
>
> The reason it's lightly used is that Dirty=1 is normally set by HW
> _before_ a write. A write with a Write=0 PTE would typically only generate
> a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and*
> generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which
> supports shadow stacks will no longer exhibit this oddity.
>
> So that leaves Write=0,Dirty=1 PTEs created in software. To achieve this,
> in places where Linux normally creates Write=0,Dirty=1, it can use the
> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
> words, whenever Linux needs to create Write=0,Dirty=1, it instead creates
> Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1.
> Further differentiated by VMA flags, these PTE bit combinations would be
> set as follows for various types of memory:
>
> (Write=0,Cow=1,Dirty=0):
>  - A modified, copy-on-write (COW) page. Previously when a typical
>    anonymous writable mapping was made COW via fork(), the kernel would
>    mark it Write=0,Dirty=1. Now it will instead use the Cow bit. This
>    happens in copy_present_pte().
>  - A R/O page that has been COW'ed. The user page is in a R/O VMA,
>    and get_user_pages(FOLL_FORCE) needs a writable copy. The page fault
>    handler creates a copy of the page and sets the new copy's PTE as
>    Write=0 and Cow=1.
>  - A shared shadow stack PTE. When a shadow stack page is being shared
>    among processes (this happens at fork()), its PTE is made Dirty=0, so
>    the next shadow stack access causes a fault, and the page is
>    duplicated and Dirty=1 is set again. This is the COW equivalent for
>    shadow stack pages, even though it's copy-on-access rather than
>    copy-on-write.
>
> (Write=0,Cow=0,Dirty=1):
>  - A shadow stack PTE.
>  - A Cow PTE created when a processor without shadow stack support set
>    Dirty=1.
>
> There are six bits left available to software in the 64-bit PTE after
> consuming a bit for _PAGE_COW. No space is consumed in 32-bit kernels
> because shadow stacks are not enabled there.
>
> Implement only the infrastructure for _PAGE_COW. Changes to start
> creating _PAGE_COW PTEs will follow once other pieces are in place.
>
> Tested-by: Pengfei Xu
> Tested-by: John Allen
> Co-developed-by: Yu-cheng Yu
> Signed-off-by: Yu-cheng Yu

Reviewed-by: Kees Cook

-- 
Kees Cook
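
For readers following the bit combinations in the quoted commit message, here
is a minimal user-space sketch of the encoding, not the kernel code from the
patch. Everything prefixed sketch_/SKETCH_ is a hypothetical name made up for
illustration. Write (bit 1) and Dirty (bit 6) are the architectural x86 PTE
bits; the position chosen for Cow (bit 58) is only an assumption standing in
for "some software-available bit". The sketch also assumes shadow stack
support is always enabled, so it has no CPU feature check.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_pte_t;

/* Architectural x86 PTE bits. */
#define SKETCH_PAGE_RW    (UINT64_C(1) << 1)   /* Write */
#define SKETCH_PAGE_DIRTY (UINT64_C(1) << 6)   /* Dirty */
/* Assumption: Cow lives in a software-available bit; 58 is illustrative. */
#define SKETCH_PAGE_COW   (UINT64_C(1) << 58)

/* Write=0,Dirty=1 is reserved for shadow stack memory. */
static inline bool sketch_pte_shstk(sketch_pte_t pte)
{
	return !(pte & SKETCH_PAGE_RW) && (pte & SKETCH_PAGE_DIRTY);
}

/* "Dirty" in the software sense: either the HW Dirty bit or the Cow bit. */
static inline bool sketch_pte_dirty(sketch_pte_t pte)
{
	return (pte & (SKETCH_PAGE_DIRTY | SKETCH_PAGE_COW)) != 0;
}

/*
 * Mark a PTE dirty without ever producing Write=0,Dirty=1:
 * a non-writable PTE gets Cow=1 in place of Dirty=1.
 */
static inline sketch_pte_t sketch_pte_mkdirty(sketch_pte_t pte)
{
	if (pte & SKETCH_PAGE_RW)
		return pte | SKETCH_PAGE_DIRTY;
	return pte | SKETCH_PAGE_COW;
}

/*
 * Write-protect a PTE. If it was Dirty=1, move that state to Cow=1 so
 * the result is Write=0,Cow=1 rather than Write=0,Dirty=1.
 */
static inline sketch_pte_t sketch_pte_wrprotect(sketch_pte_t pte)
{
	pte &= ~SKETCH_PAGE_RW;
	if (pte & SKETCH_PAGE_DIRTY) {
		pte &= ~SKETCH_PAGE_DIRTY;
		pte |= SKETCH_PAGE_COW;
	}
	return pte;
}
```

With these helpers, write-protecting a writable dirty PTE (as at fork())
yields Write=0,Cow=1, which still reports as dirty but is not mistaken for
shadow stack. Only a PTE deliberately built as Write=0,Dirty=1 satisfies
sketch_pte_shstk().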