Subject: Re: [RFC PATCH v3 00/15] pkeys-based page table hardening
Date: Tue, 25 Mar 2025 18:11:24 +0100
To: Maxwell Bland
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Mark Brown,
    Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
    Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
    Peter Zijlstra, Pierre Langlois, Quentin Perret,
    "Mike Rapoport (IBM)", Ryan Roberts, Thomas Gleixner, Will Deacon,
    Matthew Wilcox, Qi Zheng, linux-arm-kernel@lists.infradead.org,
    linux-mm@kvack.org, x86@kernel.org
From: Kevin Brodsky <kevin.brodsky@arm.com>
References: <20250203101839.1223008-1-kevin.brodsky@arm.com>
 <802963a0-32bd-49e8-82b1-34443acdd5ae@arm.com>

On 19/03/2025 22:54, Maxwell Bland wrote:
> [...]
>
> Onto the solution. In a pure-immutability approach, I ended up creating
> some definitions regarding what an "expected" virtual-to-physical
> mapping of kernel memory or change in page permissions is (my solution
> was to treat init_mm as a ground-truth structure which is only RW under
> a more refined set of constraints on mapping updates and permission set
> updates), for example,
>
> if (get_tag(cur_pte & EL1_PADDR_MASK) == SECCOMP_PAGE) {
>         // run BPF re-verifier to avoid the Qualys 2021 CVE PoC magic
>
> or forcing PXN for everything always unless some other check says we
> absolutely have to make this PX:
>
> if (!cur_pte && new_pte) {
>         if (!(new_pte & pxn_bit)) {
>                 new_pte |= pxn_bit;
>
> And then a bunch of additional infrastructure to define what a "safe
> mapping of physical memory" looks like, which introduces its own set of
> problems for defining how/where vmalloc allocates from, physical address
> layout randomization, ...
>
> Such an explication of the expected possible modifications to PTEs may
> be needed? I don't understand whether just adding guard() around set_pte
> helps this case.

If I understood the case you're referring to correctly, the simple answer
is that it doesn't. If some driver explicitly updates page tables, and
some exploit then hijacks those pgtable updates, kpkeys won't help at all.

I see two separate problems here: ensuring that the appropriate interface
is used to write to pgtables (or other objects), and then constraining
which pgtables exactly a given context can manipulate via that interface.
kpkeys addresses the first problem, which you could say is low-hanging
fruit.
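
Concretely, "the appropriate interface" means that the only sanctioned
way to write a PTE becomes something like the sketch below (the guard
class name is illustrative, not necessarily the exact helper from the
series):

        /*
         * The guard raises the kpkeys level for the duration of the
         * function, making page table pages writeable around this one
         * sanctioned write only. A write to a PTP from any other
         * context faults, because the PTP's pkey forbids writes at the
         * default level.
         */
        static inline void set_pte(pte_t *ptep, pte_t pte)
        {
                guard(kpkeys_hardened_pgtables)();
                WRITE_ONCE(*ptep, pte);
        }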
The second problem clearly needs addressing for strong protection of page
tables, but I don't think kpkeys can directly help with that.

> [...]
>
>>> Where overwriting f_op is a "classic" bypass of protection systems like
>>> this one.
>> Indeed, this has been pointed out to me on a few occasions. On that note
>> I should reference the follow-up series that aims at protecting struct
>> cred [1].
>>
>> Using kpkeys to ensure that the task->cred pointer itself is mostly
>> read-only is not straightforward, because there is just too much
>> existing code that directly writes to other fields in task_struct -
>> meaning we can't simply make the whole task_struct mostly read-only.
>> What could be done is to split out the cred fields into a separate
>> kpkeys-protected page, and link that page from task_struct. There is a
>> clear overhead associated with this approach though, and maybe more
>> importantly we have just displaced the issue, as the pointer to that
>> protected page remains writeable... I'm not sure how much a page-based
>> technique like pkeys can help here.
> __randomize_layout accomplished some of what is necessary here, while
> accommodating contemporary hardware limitations, albeit to a finer
> granularity and lesser degree. I am also not sure how well it works on
> production systems (yet! It is on my TODO list in the short term), or
> how much protection it really implies for an adversary with a "read
> gadget" that can determine where to write (either from associated asm or
> from the struct itself).
>
> But...
>
> (1) After thinking about this for a few hours, __randomize_layout implies
> something valuable: under circumstances where this is supported, because
> we assume the struct's layout itself is modifiable, it becomes possible
> to introduce a canary value _not_ for the modifiable portions, but just
> for the non-modifiable portions, store this canary value in a strictly
> read-only region of memory that also tracks pointers to the structure,
> and ensure uses of the non-modifiable fields must always leverage the
> canary value stored in this secondary region.

I don't think this can be enforced with pkeys, since the overall struct
would have to remain RW. But I'm not sure I completely understand how
that scheme would work overall, or how __randomize_layout is a
prerequisite for it.

> [...]
>
> It may be possible to write a plugin modification to
>
> https://github.com/gcc-mirror/gcc/blob/12b2c414b6d0e0d1b3d328b58d654c19c30bee8c/gcc/config/arm/aarch-bti-insert.cc
>
> to also inject the POE/register set instructions, conditioned upon the
> expected privilege (kpkeys) level of the executing function.
>
> Instead of adding code to set the privilege state via a specialized
> callgate function, this function's call would be explicitly tied into
> the GCC CFI instrumentation pass. The specific "key level", i.e.
> KPKEYS_LVL_PGTABLES, could likely be determined by the file name/path
> via the compiler, or many other clever mechanisms, potentially with
> support for exceptions via function attribute tags.

Right, I'm with you now. This is actually something we considered, but
I'm not sure it really helps all that much. Adding a guard object to a
function is basically the same amount of churn as marking that function
with an attribute, except it doesn't require compiler support.
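
To illustrate the point, the choice is roughly between the two forms
below (struct, guard class and attribute/level names are made up here,
purely for illustration):

        /* explicit guard object in the function body - no compiler support */
        static void set_foo_owner(struct foo *f, kuid_t owner)
        {
                guard(kpkeys_hardened_foo)();   /* hypothetical guard class */
                f->owner = owner;
        }

        /* vs. a hypothetical attribute handled by a compiler pass */
        __kpkeys_level(KPKEYS_LVL_FOO)
        static void set_foo_owner(struct foo *f, kuid_t owner)
        {
                f->owner = owner;
        }

Either way, every function that writes to the protected data needs to be
touched once.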
It would get more useful if you wanted all functions in a file to switch
kpkeys level (using a compiler flag), but for the purpose of making
certain data mostly read-only, I doubt you actually want that - the
granularity is just too coarse.

A different angle would be to use an attribute to mark struct members,
rather than functions. That would instruct the compiler to switch kpkeys
level whenever that member is written (and only then). You would
probably want to forbid taking the address of the member too, as the
compiler can't easily track that. That said, this doesn't solve the
bigger issue of existing code expecting to be able to write to other
members in the struct. It most likely works only if the kpkeys switch is
done when writing to any member of the struct (which may get very
expensive).

> [...]
> Noticed as well, just now, the reliance on the static_key for branch
> unlikely ... the following data structure (at the end of this email) may
> be of interest as an FYI ... it can be used to track whether the kernel
> self-patching API is only being used to patch expected locations.
>
> I use this right now with additional checks that the instruction being
> written for the indirect branch matches the expected offset.

Are there exploits out there corrupting the address of a static branch?
This seems difficult to me in practice, because the __jump_table section,
where the addresses of instructions to be patched are stored, is marked
__ro_after_init.

- Kevin