From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org,
	linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
	linux-hardening@vger.kernel.org, kernel-hardening@lists.openwall.com
Cc: ira.weiny@intel.com, rppt@kernel.org, dan.j.williams@intel.com,
	linux-kernel@vger.kernel.org, Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH RFC 0/9] PKS write protected page tables
Date: Tue, 4 May 2021 17:30:23 -0700
Message-Id: <20210505003032.489164-1-rick.p.edgecombe@intel.com>
This is a POC for write protecting page tables with PKS (Protection Keys for
Supervisor) [1]. The basic idea is to make the page tables read-only, except
temporarily on a per-cpu basis when they need to be modified. I'm looking for
opinions on whether people like the general direction of this in terms of
value and implementation.

Why would people want this?
===========================
Page tables are the basis for many types of protections and as such, are a
juicy target for attackers. Mapping them read-only will make them harder to
use in attacks.

This protects against an attacker who has acquired the ability to write to
the page tables. It's not foolproof because an attacker who can execute
arbitrary code can either disable PKS directly, or simply call the same
functions that the kernel uses for legitimate page table writes.

Why use PKS for this?
=====================
PKS is an upcoming CPU feature that allows supervisor virtual memory
permissions to be changed without flushing the TLB, like PKU does for user
memory. Protecting page tables would normally be really expensive because it
would have to be done with paging itself. PKS helps by providing a way to
toggle the writability of the page tables with just a per-cpu MSR. A rough
sketch of what such a per-cpu write window could look like follows the cover
text below.

Performance impacts
===================
Setting direct map permissions on whatever random page gets allocated for a
page table would result in a lot of kernel range shootdowns and direct map
large page shattering. So the way the PKS page table memory is created is
similar to this module page clustering series [2], where a cache of pages is
replenished from 2MB pages, so that the direct map permission changes and the
associated breakage are localized on the direct map. In the PKS page tables
case, a PKS key is pre-applied to the direct map for pages in the cache (see
the second sketch below).

There would be some memory overhead cost in order to protect the direct map
of the page tables. There would also be some extra kernel range shootdowns to
replenish the cache on occasion, from setting the PKS key on the direct map
of the new pages. I don't have any actual performance data yet.

This is based on V6 [1] of the core PKS infrastructure patches. PKS
infrastructure follow-ons are planned to enable keys to be set to the same
permissions globally. Since this usage needs a key to be set globally
read-only by default, a small temporary solution is hacked up in patch 8.
Long term, PKS protected page tables would use a better and more generic
solution to achieve this.
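To make the per-cpu write window concrete, here is a minimal sketch of what
toggling the page table key could look like. It assumes a PKRS layout like
PKRU (2 bits per key: access-disable and write-disable), a key reserved for
page tables, and a per-cpu cached PKRS value. The names PKS_KEY_PGTABLE,
pkrs_pgtable_cache and pks_pgtable_write_begin/end are made up for this
illustration and are not the series' actual API.

/*
 * Illustrative sketch only, not the actual interface from this series.
 * Assumes a PKS key reserved for page tables and a PKRU-like PKRS layout
 * (2 bits per key). All names here are hypothetical.
 */
#define PKS_KEY_PGTABLE		1
#define PKR_WD_BIT(key)		(1u << ((key) * 2 + 1))

static DEFINE_PER_CPU(u32, pkrs_pgtable_cache);

static void pks_pgtable_write_begin(void)
{
	u32 pkrs;

	/* The write window is per-cpu, so don't migrate while it is open. */
	preempt_disable();
	pkrs = this_cpu_read(pkrs_pgtable_cache) & ~PKR_WD_BIT(PKS_KEY_PGTABLE);
	/* Updating PKRS needs no TLB flush, unlike changing the PTEs would. */
	wrmsrl(MSR_IA32_PKRS, pkrs);
}

static void pks_pgtable_write_end(void)
{
	/* Restore write protection before allowing migration again. */
	wrmsrl(MSR_IA32_PKRS, this_cpu_read(pkrs_pgtable_cache));
	preempt_enable();
}

/* A legitimate page table write then becomes: */
static void example_set_pte(pte_t *ptep, pte_t pte)
{
	pks_pgtable_write_begin();
	set_pte(ptep, pte);
	pks_pgtable_write_end();
}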
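And here is a rough sketch of the grouped page table page cache described
under "Performance impacts": refill the cache a 2MB chunk at a time so the
direct map split and the PKS key application happen once per 2MB rather than
once per 4K page. set_memory_pks() is the helper added in patch 6, but its
signature here is assumed; the cache names and the lack of locking are
simplifications for illustration, and PKS_KEY_PGTABLE is the hypothetical key
from the previous sketch.

/*
 * Rough sketch of the page table page cache (hypothetical names, no
 * locking shown). The point is that the direct map is split and tagged
 * with the page table PKS key once per 2MB, not once per 4K page.
 */
#define PKS_PT_ORDER	(PMD_SHIFT - PAGE_SHIFT)	/* 2MB of 4K pages */

static LIST_HEAD(pks_pt_cache);

static int pks_pt_cache_refill(gfp_t gfp)
{
	struct page *page = alloc_pages(gfp, PKS_PT_ORDER);
	unsigned long addr;
	int i;

	if (!page)
		return -ENOMEM;

	addr = (unsigned long)page_address(page);

	/* One shootdown for the whole 2MB range, not one per page table. */
	set_memory_pks(addr, 1 << PKS_PT_ORDER, PKS_KEY_PGTABLE);

	/* Hand the chunk out as individual 4K page table pages. */
	split_page(page, PKS_PT_ORDER);
	for (i = 0; i < (1 << PKS_PT_ORDER); i++)
		list_add(&page[i].lru, &pks_pt_cache);

	return 0;
}

static struct page *pks_pt_alloc(gfp_t gfp)
{
	struct page *page;

	if (list_empty(&pks_pt_cache) && pks_pt_cache_refill(gfp))
		return NULL;

	page = list_first_entry(&pks_pt_cache, struct page, lru);
	list_del(&page->lru);
	return page;
}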
[1] https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/

Thanks,

Rick

Rick Edgecombe (9):
  list: Support getting most recent element in list_lru
  list: Support list head not in object for list_lru
  x86/mm/cpa: Add grouped page allocations
  mm: Explicitly zero page table lock ptr
  x86, mm: Use cache of page tables
  x86/mm/cpa: Add set_memory_pks()
  x86/mm/cpa: Add perm callbacks to grouped pages
  x86, mm: Protect page tables with PKS
  x86, cpa: PKS protect direct map page tables

 arch/x86/boot/compressed/ident_map_64.c |   5 +
 arch/x86/include/asm/pgalloc.h          |   6 +
 arch/x86/include/asm/pgtable.h          |  26 +-
 arch/x86/include/asm/pgtable_64.h       |  33 ++-
 arch/x86/include/asm/pkeys_common.h     |   8 +-
 arch/x86/include/asm/set_memory.h       |  23 ++
 arch/x86/mm/init.c                      |  40 +++
 arch/x86/mm/pat/set_memory.c            | 312 +++++++++++++++++++++++-
 arch/x86/mm/pgtable.c                   | 144 ++++++++++-
 include/asm-generic/pgalloc.h           |  42 +++-
 include/linux/list_lru.h                |  26 ++
 include/linux/mm.h                      |   7 +
 mm/Kconfig                              |   6 +-
 mm/list_lru.c                           |  38 ++-
 mm/memory.c                             |   1 +
 mm/swap.c                               |   7 +
 mm/swap_state.c                         |   6 +
 17 files changed, 705 insertions(+), 25 deletions(-)

-- 
2.30.2