From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C501C433E7 for ; Tue, 14 Jul 2020 07:04:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CC57022200 for ; Tue, 14 Jul 2020 07:04:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CC57022200 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D5C9A8D0006; Tue, 14 Jul 2020 03:04:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C6C0A8D000A; Tue, 14 Jul 2020 03:04:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B116E8D0006; Tue, 14 Jul 2020 03:04:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0145.hostedemail.com [216.40.44.145]) by kanga.kvack.org (Postfix) with ESMTP id 848188D000A for ; Tue, 14 Jul 2020 03:04:17 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 40A81180AD802 for ; Tue, 14 Jul 2020 07:04:17 +0000 (UTC) X-FDA: 77035792554.04.talk01_1c04a0c26eef Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin04.hostedemail.com (Postfix) with ESMTP id 1C8DD800294C for ; Tue, 14 Jul 2020 07:04:17 +0000 (UTC) X-HE-Tag: talk01_1c04a0c26eef X-Filterd-Recvd-Size: 7234 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf26.hostedemail.com (Postfix) with ESMTP for ; Tue, 14 Jul 2020 07:04:15 +0000 (UTC) IronPort-SDR: EQbHOIZLgMbTfoOBOHxhTZk2GPlEh/v4D8rAUkZwPoxLG2E0JXz8qRYYDyfqQNQV/QGNHZ7geB xzsJe83sHcCA== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="148828669" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="148828669" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:14 -0700 IronPort-SDR: Q2mRak2UgfiLrg3sZxYOuh5J0y3GGbFbI2P9jkFn7i2q77yP2LA96pV7XI2Q2m/00uJqp6RVgw jDz7HsCXwj3w== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="325755398" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:13 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 07/15] Documentation/pkeys: Update documentation for kernel pkeys Date: Tue, 14 Jul 2020 00:02:12 -0700 Message-Id: <20200714070220.3500839-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 1C8DD800294C X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ira Weiny Future Intel CPUS will support Protection Key Supervisor (PKS). Update the protection key documentation to cover pkeys on supervisor pages. Signed-off-by: Ira Weiny --- Documentation/core-api/protection-keys.rst | 81 +++++++++++++++++----- 1 file changed, 63 insertions(+), 18 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/c= ore-api/protection-keys.rst index ec575e72d0b2..5ac400a5a306 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -4,25 +4,33 @@ Memory Protection Keys =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 -Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature -which is found on Intel's Skylake (and later) "Scalable Processor" -Server CPUs. It will be available in future non-server Intel parts -and future AMD processors. - -For anyone wishing to test or use this feature, it is available in -Amazon's EC2 C5 instances and is known to work there using an Ubuntu -17.04 image. - Memory Protection Keys provides a mechanism for enforcing page-based protections, but without requiring modification of the page tables -when an application changes protection domains. It works by -dedicating 4 previously ignored bits in each page table entry to a -"protection key", giving 16 possible keys. +when an application changes protection domains. + +PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Sc= alable +Processor" Server CPUs and later. And It will be available in future +non-server Intel parts and future AMD processors. + +Future Intel processors will support Protection Keys for Supervisor page= s +(PKS). + +For anyone wishing to test or use user space pkeys, it is available in A= mazon's +EC2 C5 instances and is known to work there using an Ubuntu 17.04 image. + +pkes work by dedicating 4 previously Reserved bits in each page table en= try to +a "protection key", giving 16 possible keys. User and Supervisor pages = are +treated separately. =20 -There is also a new user-accessible register (PKRU) with two separate -bits (Access Disable and Write Disable) for each key. Being a CPU -register, PKRU is inherently thread-local, potentially giving each -thread a different set of protections from every other thread. +Protections for each page are controlled with per CPU registers for each= type +of page User and Supervisor. Each of these 32 bit register stores two s= eparate +bits (Access Disable and Write Disable) for each key. + +For Userspace the register is user-accessible (rdpkru/wrpkru). For +Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kerne= l. + +Being a CPU register, pkes are inherently thread-local, potentially givi= ng +each thread an independent set of protections from every other thread. =20 There are two new instructions (RDPKRU/WRPKRU) for reading and writing to the new register. The feature is only available in 64-bit mode, @@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PT= Es. These permissions are enforced on data access only and have no effect on instruction fetches. =20 -Syscalls -=3D=3D=3D=3D=3D=3D=3D=3D +For kernel space rdmsr/wrmsr are used to access the kernel MSRs. + + +Syscalls for user space keys +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D =20 There are 3 system calls which directly interact with pkeys:: =20 @@ -98,3 +109,37 @@ with a read():: The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. + + +Kernel API for PKS support +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D + +PKS is intended to harden against unwanted access to kernel pages. But = it does +not completely restrict access under all conditions. For example the MS= R +setting is not saved/restored during irqs. Thus the use of PKS is a mit= igation +strategy rather than a form of strict security. + +The following calls are used to allocate, use, and deallocate a pkey whi= ch +defines a 'protection domain' within the kernel. Setting a pkey value i= n a +supervisor mapping adds that mapping to the protection domain. Then cal= ls can be +used to enable/disable read and/or write access to all of the pages mapp= ed with +that key: + + int pks_key_alloc(const char * const pkey_user); + #define PAGE_KERNEL_PKEY(pkey) + #define _PAGE_KEY(pkey) + int pks_update_protection(int pkey, unsigned long protection); + void pks_key_free(int pkey); + +In-kernel users must be prepared to set PAGE_KERNEL_PKEY() permission in= the +page table entries for the mappings they want to ptorect. + +WARNING: It is imperative that callers check for errors from pks_key_all= oc() +because pkeys are a limited resource and so callers should be prepared t= o work +without PKS support. + +For admins a debugfs interface provides a list of the current keys in us= e at: + + /sys/kernel/debug/x86/pks_keys_allocated + +Some example code can be found in lib/pks/pks_test.c --=20 2.25.1