From mboxrd@z Thu Jan 1 00:00:00 1970
From: mhkelley58@gmail.com
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	kirill.shutemov@linux.intel.com, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	luto@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
	urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
	thomas.lendacky@amd.com, ardb@kernel.org, jroedel@suse.de,
	seanjc@google.com, rick.p.edgecombe@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-hyperv@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 0/8] x86/coco: Mark CoCo VM pages not present when changing encrypted state
Date: Tue, 21 Nov 2023 13:20:08 -0800
Message-Id: <20231121212016.1154303-1-mhklinux@outlook.com>
X-Mailer: git-send-email 2.25.1
Reply-To: mhklinux@outlook.com

From: Michael Kelley

In a CoCo VM, when a page transitions from encrypted to
decrypted, or vice versa, attributes in the PTE must be updated *and*
the hypervisor must be notified of the change. Because there are two
separate steps, there's a window where the settings are inconsistent.
Normally the code that initiates the transition (via
set_memory_decrypted() or set_memory_encrypted()) ensures that the
memory is not being accessed during a transition, so the window of
inconsistency is not a problem. However, the load_unaligned_zeropad()
function can read arbitrary memory pages at arbitrary times, and so
could read a transitioning page during the window. In such a case,
CoCo VM specific exceptions are taken (depending on the CoCo
architecture in use). Current code in those exception handlers
recovers and does "fixup" on the result returned by
load_unaligned_zeropad(). Unfortunately, this exception handling can't
work in paravisor scenarios (TDX Partitioning and SEV-SNP in vTOM mode)
if the exceptions are routed to the paravisor. The paravisor can't
do load_unaligned_zeropad() fixup, so the exceptions would need to
be forwarded from the paravisor to the Linux guest, but there are
no architectural specs for how to do that.

Fortunately, there's a simpler way to solve the problem by changing
the core transition code in __set_memory_enc_pgtable() to do the
following:

1.  Remove aliasing mappings
2.  Flush the data cache if needed
3.  Remove the PRESENT bit from the PTEs of all transitioning pages
4.  Set/clear the encryption attribute as appropriate
5.  Flush the TLB so the changed encryption attribute isn't visible
6.  Notify the hypervisor of the encryption status change
7.  Add back the PRESENT bit, making the changed attribute visible

With this approach, load_unaligned_zeropad() just takes its normal
page-fault-based fixup path if it touches a page that is transitioning.
As a result, load_unaligned_zeropad() and CoCo VM page transitioning
are completely decoupled.
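
To illustrate why the ordering in Steps 3 through 7 closes the window,
here is a minimal user-space sketch in C. It is a toy model, not the
code in this series: struct toy_page, stray_read_sees_mismatch(), and
toy_transition() are hypothetical names, and the three booleans only
stand in for the PRESENT bit, the PTE encryption attribute, and the
hypervisor's view of the page. Steps 1, 2, and 5 (alias removal and
the cache/TLB flushes) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one page's state; not the kernel's pte_t. */
struct toy_page {
	bool present;        /* models the PRESENT bit in the PTE */
	bool pte_encrypted;  /* models the encryption attribute in the PTE */
	bool hv_encrypted;   /* models the hypervisor's view of the page */
};

/*
 * Models a stray access such as load_unaligned_zeropad(). If the page
 * is not present, the access takes an ordinary page fault and never
 * observes the guest/hypervisor mismatch.
 */
static bool stray_read_sees_mismatch(const struct toy_page *pg)
{
	if (!pg->present)
		return false;
	return pg->pte_encrypted != pg->hv_encrypted;
}

/*
 * Walks Steps 3, 4, 6, and 7 and counts the points at which a stray
 * read would have seen inconsistent state. With clear_present false,
 * this models the ordering without Step 3.
 */
static int toy_transition(struct toy_page *pg, bool enc, bool clear_present)
{
	int mismatches = 0;

	assert(pg);

	if (clear_present)
		pg->present = false;                 /* Step 3 */
	mismatches += stray_read_sees_mismatch(pg);

	pg->pte_encrypted = enc;                     /* Step 4 */
	mismatches += stray_read_sees_mismatch(pg);

	pg->hv_encrypted = enc;                      /* Step 6 */
	mismatches += stray_read_sees_mismatch(pg);

	pg->present = true;                          /* Step 7 */
	mismatches += stray_read_sees_mismatch(pg);

	return mismatches;
}
```

In this model, calling toy_transition() with clear_present set never
reports a mismatch, while the same call with clear_present unset
reports one between Step 4 and Step 6. That is the window described
above.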
CoCo VM page transitions can proceed without needing to handle
architecture-specific exceptions and fix things up. This decoupling
reduces the complexity due to separate TDX and SEV-SNP fixup paths,
and gives more freedom to revise and introduce new capabilities in
future versions of the TDX and SEV-SNP architectures. Paravisor
scenarios work properly without needing to forward exceptions.

Patch 1 handles implications of the hypervisor callbacks in Step 6
needing to do virt-to-phys translations on pages that are temporarily
marked not present.

Patch 2 ensures that Step 7 doesn't generate a TLB flush. It is a
performance optimization only and is not necessary for correctness.

Patches 3 and 4 handle the case where SEV-SNP does PVALIDATE in
Step 6, which requires a valid virtual address. But since the
PRESENT bit has been removed from the direct map virtual address,
the PVALIDATE fails. These patches construct a temporary virtual
address to be used by PVALIDATE. This code is SEV-SNP only because
the TDX and Hyper-V paravisor flavors operate on physical addresses.

Patches 5 and 6 are the core change that marks the transitioning
pages as not present. Patch 6 is optional since retaining both
the "prepare" and "finish" callbacks doesn't hurt anything and
there might be an argument for retaining both for future
flexibility. However, Patch 6 *does* eliminate about 75 lines of
code and comments.

Patch 7 is a somewhat tangential cleanup that removes an unnecessary
wrapper function in the path for doing a transition.

Patch 8 adds comments describing the implications of errors when
doing a transition. These implications are discussed in the email
thread for the RFC patch[1] and a patch proposed by Rick Edgecombe.
[2][3]

With this change, the #VE and #VC exception handlers should no longer
be triggered for load_unaligned_zeropad() accesses, and the existing
code in those handlers to do the "fixup" shouldn't be needed. But I
have not removed that code in this patch set.
Kirill Shutemov wants to keep the code for TDX #VE, so the code for
#VC on the SEV-SNP side has also been kept.

This patch set is based on the linux-next20231117 code tree.

Changes in v2:
* Added Patches 3 and 4 to deal with the failure on SEV-SNP
  [Tom Lendacky]
* Split the main change into two separate patches (Patch 5 and
  Patch 6) to improve reviewability and to offer the option of
  retaining both hypervisor callbacks.
* Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64
  so that the code builds correctly for 32-bit, even though it
  is never executed for 32-bit [reported by kernel test robot]

[1] https://lore.kernel.org/lkml/1688661719-60329-1-git-send-email-mikelley@microsoft.com/
[2] https://lore.kernel.org/lkml/20231017202505.340906-1-rick.p.edgecombe@intel.com/
[3] https://lore.kernel.org/lkml/20231024234829.1443125-1-rick.p.edgecombe@intel.com/

Michael Kelley (8):
  x86/coco: Use slow_virt_to_phys() in page transition hypervisor
    callbacks
  x86/mm: Don't do a TLB flush if changing a PTE that isn't marked
    present
  x86/mm: Remove "static" from vmap_pages_range()
  x86/sev: Enable PVALIDATE for PFNs without a valid virtual address
  x86/mm: Mark CoCo VM pages not present while changing encrypted state
  x86/mm: Merge CoCo prepare and finish hypervisor callbacks
  x86/mm: Remove unnecessary call layer for __set_memory_enc_pgtable()
  x86/mm: Add comments about errors in
    set_memory_decrypted()/encrypted()

 arch/x86/boot/compressed/sev.c  |   2 +-
 arch/x86/coco/tdx/tdx.c         |  66 +----------------
 arch/x86/hyperv/ivm.c           |  15 ++--
 arch/x86/include/asm/sev.h      |   6 +-
 arch/x86/include/asm/x86_init.h |   4 --
 arch/x86/kernel/sev-shared.c    |  57 ++++++++++++---
 arch/x86/kernel/sev.c           |  43 ++++++-----
 arch/x86/kernel/x86_init.c      |   4 --
 arch/x86/mm/mem_encrypt_amd.c   |  23 +-----
 arch/x86/mm/pat/set_memory.c    | 122 ++++++++++++++++++++++----------
 include/linux/vmalloc.h         |   2 +
 mm/vmalloc.c                    |   2 +-
 12 files changed, 171 insertions(+), 175 deletions(-)

-- 
2.25.1