Date: Fri, 2 Dec 2022 18:31:12 -0800
From: Kees Cook
To: Rick Edgecombe
Cc: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
    Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
    Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
    Jonathan Corbet, Mike Kravetz, Nadav Amit, Oleg Nesterov,
    Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
    "Kirill A . Shutemov", John Allen, kcc@google.com,
    eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
    dethoma@microsoft.com, akpm@linux-foundation.org,
    Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, Yu-cheng Yu
Subject: Re: [PATCH v4 10/39] x86/mm: Introduce _PAGE_COW
Message-ID: <202212021831.61BD0D9A5@keescook>
References: <20221203003606.6838-1-rick.p.edgecombe@intel.com>
    <20221203003606.6838-11-rick.p.edgecombe@intel.com>
In-Reply-To: <20221203003606.6838-11-rick.p.edgecombe@intel.com>

On Fri, Dec 02, 2022 at 04:35:37PM -0800, Rick Edgecombe wrote:
> Some OSes have a greater dependence on software available bits in PTEs than
> Linux. That left the hardware architects looking for a way to represent a
> new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0,Dirty=1. So in order
> to support shadow stack memory, Linux should avoid creating memory with this
> PTE bit combination unless it intends for it to be shadow stack.
>
> The reason it's lightly used is that Dirty=1 is normally set by HW
> _before_ a write. A write with a Write=0 PTE would typically only generate
> a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and*
> generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which
> supports shadow stacks will no longer exhibit this oddity.
>
> So that leaves Write=0,Dirty=1 PTEs created in software. To achieve this,
> in places where Linux normally creates Write=0,Dirty=1, it can use the
> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
> words, whenever Linux needs to create Write=0,Dirty=1, it instead creates
> Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1.
> Further differentiated by VMA flags, these PTE bit combinations would be
> set as follows for various types of memory:
>
> (Write=0,Cow=1,Dirty=0):
>  - A modified, copy-on-write (COW) page. Previously when a typical
>    anonymous writable mapping was made COW via fork(), the kernel would
>    mark it Write=0,Dirty=1. Now it will instead use the Cow bit. This
>    happens in copy_present_pte().
>  - A R/O page that has been COW'ed. The user page is in a R/O VMA,
>    and get_user_pages(FOLL_FORCE) needs a writable copy. The page fault
>    handler creates a copy of the page and sets the new copy's PTE as
>    Write=0 and Cow=1.
>  - A shared shadow stack PTE. When a shadow stack page is being shared
>    among processes (this happens at fork()), its PTE is made Dirty=0, so
>    the next shadow stack access causes a fault, and the page is
>    duplicated and Dirty=1 is set again. This is the COW equivalent for
>    shadow stack pages, even though it's copy-on-access rather than
>    copy-on-write.
>
> (Write=0,Cow=0,Dirty=1):
>  - A shadow stack PTE.
>  - A Cow PTE created when a processor without shadow stack support set
>    Dirty=1.
>
> There are six bits left available to software in the 64-bit PTE after
> consuming a bit for _PAGE_COW. No space is consumed in 32-bit kernels
> because shadow stacks are not enabled there.
>
> This is a preparatory patch. Changes to actually start marking _PAGE_COW
> will follow once other pieces are in place.
>
> Tested-by: Pengfei Xu
> Tested-by: John Allen
> Co-developed-by: Yu-cheng Yu
> Signed-off-by: Yu-cheng Yu

Reviewed-by: Kees Cook

-- 
Kees Cook
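
For readers following the quoted encoding, the bit combinations can be
sketched in plain C. This is only an illustration under stated assumptions,
not code from the patch: Write (bit 1) and Dirty (bit 6) are the
architectural x86 PTE bits, but the position chosen for the Cow bit (58) and
the names PTE_WRITE, PTE_DIRTY, PTE_COW, classify() and wrprotect_sketch()
are hypothetical. The sketch shows the one rule the description relies on:
write-protecting a dirty PTE moves Dirty into Cow, so Write=0,Dirty=1 is
never produced by accident.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 PTE bits. */
#define PTE_WRITE (UINT64_C(1) << 1)
#define PTE_DIRTY (UINT64_C(1) << 6)
/* ASSUMPTION: a software-available bit standing in for _PAGE_COW.
 * Bit 58 is picked for illustration only. */
#define PTE_COW   (UINT64_C(1) << 58)

enum pte_kind {
    KIND_WRITABLE,     /* Write=1 */
    KIND_HW_DIRTY_RO,  /* Write=0,Dirty=1: shadow stack, or a legacy
                        * dirty PTE on hardware without shadow stack */
    KIND_SW_COW,       /* Write=0,Cow=1,Dirty=0: COW page or shared
                        * shadow stack page (the VMA flags decide) */
    KIND_CLEAN_RO      /* Write=0,Cow=0,Dirty=0 */
};

/* Classify a PTE by the three bits discussed in the commit message. */
static enum pte_kind classify(uint64_t pte)
{
    if (pte & PTE_WRITE)
        return KIND_WRITABLE;
    if (pte & PTE_DIRTY)
        return KIND_HW_DIRTY_RO;
    if (pte & PTE_COW)
        return KIND_SW_COW;
    return KIND_CLEAN_RO;
}

/* Hypothetical write-protect helper: clear Write and, if the PTE was
 * dirty, record that in the software Cow bit instead of Dirty. */
static uint64_t wrprotect_sketch(uint64_t pte)
{
    pte &= ~PTE_WRITE;
    if (pte & PTE_DIRTY) {
        pte &= ~PTE_DIRTY;
        pte |= PTE_COW;
    }
    /* Invariant: the result is never Write=0,Dirty=1. */
    assert((pte & (PTE_WRITE | PTE_DIRTY)) != PTE_DIRTY);
    return pte;
}
```

In this sketch a writable, dirty PTE passed through wrprotect_sketch() comes
back as Write=0,Cow=1,Dirty=0, while a PTE built directly as Write=0,Dirty=1
is the only way to reach the shadow stack encoding.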