From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4784C4828E for ; Fri, 2 Feb 2024 22:55:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 524846B0087; Fri, 2 Feb 2024 17:55:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4AC8F6B0088; Fri, 2 Feb 2024 17:55:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2FF866B0089; Fri, 2 Feb 2024 17:55:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 1A7D36B0087 for ; Fri, 2 Feb 2024 17:55:01 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id B0A4B1C0668 for ; Fri, 2 Feb 2024 22:54:59 +0000 (UTC) X-FDA: 81748370718.19.CCA2ACA Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf21.hostedemail.com (Postfix) with ESMTP id EB2411C0015 for ; Fri, 2 Feb 2024 22:54:57 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="V0C/7zFl"; spf=pass (imf21.hostedemail.com: domain of 3wXK9ZQYKCNAE0w95y2AA270.yA8749GJ-886Hwy6.AD2@flex--seanjc.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3wXK9ZQYKCNAE0w95y2AA270.yA8749GJ-886Hwy6.AD2@flex--seanjc.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706914498; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=yST2IhC4x7xwue5FLDSVPscgeSgsqzoau2hWHl8C2YI=; b=tqQAg4PEu4frBY+dILemRLasi0Zn8crhDMuTjpyjWol8D8QcGcTgp15cEdsXnOaV4FGlmq 4/Xy27oddmUufQjljCPFhVtv+9iw4EKuJrCV0MmXDu7CGS8rKWyXEE42QH/IaSm+AB3zJ9 XRneQkEpM+8iLeHyLSNbe9gKjbn5aUo= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706914498; a=rsa-sha256; cv=none; b=bshgEi7GEcyQZlN56XlHP8K2DxUSD3fhNJCRm8Yl7SE0fSaXR5Xod5AEDE6kwx3t9YPMoP 6FZW2uoZlkwYOIvt4Dnz5O3hs4NP6ME2km6MLoE8klXxQlfJ8Qz0XJ0UczHg1HIkx/dSyu LWDVYu/7qrtdA037nr61/PJ61P4Hk7M= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="V0C/7zFl"; spf=pass (imf21.hostedemail.com: domain of 3wXK9ZQYKCNAE0w95y2AA270.yA8749GJ-886Hwy6.AD2@flex--seanjc.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3wXK9ZQYKCNAE0w95y2AA270.yA8749GJ-886Hwy6.AD2@flex--seanjc.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6041c45ce1cso38045757b3.1 for ; Fri, 02 Feb 2024 14:54:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706914497; x=1707519297; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=yST2IhC4x7xwue5FLDSVPscgeSgsqzoau2hWHl8C2YI=; b=V0C/7zFlgQ6Cgtu0iNoH4Em4XtFrOHTUampKFJjkn3EMgv2G3SdtsDA+dLi3/YoTVV xE1dU8vWKywJAcusBexfKAD0Du6/VBpWtl5uytKk0/D31fI845YxsFp6+EpU1JWCHYWV p4Pmq2nnPdBKxqaAPeFtsGb21XGTwifC1HpyWWofeu8D+NzytZEpx30csVeuTLjneFkJ klcc107BG8Z3A7lxzomGCTtjHXj7dJvolLTpzzHnAQgaVV2mWVWTWLXBBasoxoXQ09tO TIdqOPIUVVD8bV8e3r4HnkbXPFXAj9BbJGl756VFHpdrZ7mUQuGhlXnyAVjOEDc0KuoK fqYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706914497; x=1707519297; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yST2IhC4x7xwue5FLDSVPscgeSgsqzoau2hWHl8C2YI=; b=IA+MyfHPRTEJP89qvIlraQ4NDz2lDuOKrCjm1lifwDKTMqsCq06mv+JJeKNyg2OWBr 6Sag+zBN6tP9+dskLyLN6DoT1KDCloC8J63rV82fLeSDdUdlP03vQ8XalOzCmiOKSQl1 NWd/I00jO0gGZOtHHciJEgQ2WYozHGNL3fi93TPG8ryZrm4lRFixU3zEki+IEDpvCIyt sy85BwGz233cmZbAP/CTkei1G59UsJ8v/DifbOuHNbMb8yGG7fXgqBuIHWs/s197AxwK z/EreYdDLIXPi2fF0k+URJ8SeNnck6rmx5BUTHHGtDjaGY3wrZUNS5K2WoqDYqTFu7+H frJQ== X-Gm-Message-State: AOJu0Yy5VAvf9xh8FORoDTPzFQ3cKIxmh6O6u48XIEIVg/kryrmJ8QDh 8RsblM1ls6lnRIykk+t3eyhsRV7uz0OavWn3f2Dp2qK7ebj3+vtYbAFhlPBZGKx8CYhbisTrWdi XsQ== X-Google-Smtp-Source: AGHT+IHpo//SwtMaQHEuaxSujjkaLfTq03FTpTzav293FCFg3laEeYePeMZVtQBRLJICB1llqkhzIf5zGgs= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:993:b0:dc2:25fd:eff1 with SMTP id bv19-20020a056902099300b00dc225fdeff1mr234713ybb.4.1706914497040; Fri, 02 Feb 2024 14:54:57 -0800 (PST) Date: Fri, 2 Feb 2024 14:54:55 -0800 In-Reply-To: <20240116041457.wver7acnwthjaflr@amd.com> Mime-Version: 1.0 References: <20231230172351.574091-1-michael.roth@amd.com> <20231230172351.574091-19-michael.roth@amd.com> <20240116041457.wver7acnwthjaflr@amd.com> Message-ID: Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command From: Sean Christopherson To: Michael Roth Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-mm@kvack.org, linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, jroedel@suse.de, thomas.lendacky@amd.com, hpa@zytor.com, ardb@kernel.org, pbonzini@redhat.com, vkuznets@redhat.com, jmattson@google.com, luto@kernel.org, dave.hansen@linux.intel.com, slp@redhat.com, pgonda@google.com, peterz@infradead.org, srinivas.pandruvada@linux.intel.com, rientjes@google.com, dovmurik@linux.ibm.com, tobin@ibm.com, bp@alien8.de, vbabka@suse.cz, kirill@shutemov.name, ak@linux.intel.com, tony.luck@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, alpergun@google.com, jarkko@kernel.org, ashish.kalra@amd.com, nikunj.dadhania@amd.com, pankaj.gupta@amd.com, liam.merwick@oracle.com, zhi.a.wang@intel.com, Brijesh Singh Content-Type: text/plain; charset="us-ascii" X-Rspamd-Queue-Id: EB2411C0015 X-Rspam-User: X-Stat-Signature: qahqyht58szy1iww5x3c1xjsq4458rbd X-Rspamd-Server: rspam03 X-HE-Tag: 1706914497-683331 X-HE-Meta: U2FsdGVkX18l9KII+2F9hiOxqqzh2Nh6WSAwyeKA6Psgse9M2O6ZSqpRWgsmNfbMS4AVId6r9JIq/4+7a5hMXk0I+VOpVGlCt8Wfz/Pp0WJDC6rB1et4o+dlFC87QS9hsAc9YMUc7bcy3O1mg/+ZcyZSfz0dt2qin3RLCYSAL7zXr6r0GiGrfc2n/qtlr6r2h7KTRUtw2TO3hyY5RwGHpy1PNyiSC4h7FaRttNhEJbNHWraKpICRA7V33QIJ3Afy4xDhBndJVxnvEFMYzWq0EA8Pw1TY3CYSDYxA2icjibkbacR5Q6VAsJ/VsnV9swoe4wVMPZzn9bpL07G0UipTvHDFhr6q/5h3GV3HTMDTQfG3hnSRXu3likofk9PHqxbBfBGcxoEUAm4REIZwWOdRz9BK5aJxTfFZZDpkaLuA2IviDx1Q2t+axbl4yAtJPBWpmhJvdKh4HMZBBWfE3J6h5EZJ3CVS2NIqa/mdIBwZfhVNn68gIRmys8hpx1Dv608gIOVqLTzyMnYHsl/HTMVg9cJVukN1+17Ncg5JXyBEqz+JzmYVI/uY7AW2N5gCtM49azZu2BgvgDn66BBXiM6I94BLoLbROTq8S6QjCVZ/1qZY/Q/aaMtsR+pZLN/9lfaEHgsa0Q90aJHcSGcNKFhHbiRflAdoc7uwApsrbg7bSdFf9/KH1fcj5iWgPMrUZ3PGULVfgAFYw+5YCsjj3yJRIrZpYIZP8hy1gsOWi0wT32R5d0PPuDJNNzGgoXe08hPZtXoNDuvZ9UDTNQyiuoDAw4qm0RFjJfJ1ENr3QY8yBT9RWNA3XZf5sAwje99QhbAphhzaEiDf0Sm2JYHVaqgLozq813s9XJylL54KVXrHaNXZ1Vamcvnz7LrDYscity2cVeVlL6Fh646qqoEZ3cycFUIuMRhUtFpexq5G3hF0oBiNdP+eRIHuB8VFDAkTv0mn4CPsoJKSMUlKBCt/csH dGxIgLOV u6BmYqj1gxophyMFl14i+vV4+GyOKKSjZ5PjbSeyeG3g0CB3rnrcTsyNJlXoCtO9nfSfoWiK1gUlASgAsab1Gnln94y85yiTkXou5qdxZYwAqPFV/dCCIdZvL+ginmPhLHS5YRyK95b5eEXhQukBXyw4I26ZmME/XA2rqny7yl6DFnYyiazgKSaWAqHLgbkq6JCYSGSEEK5qQwagapVenoQoW5pXQ2n8oDsy1WGS3Jr6Y1i0HLHjgz2diZe5nDpDXbwSuXPTmaOj+u15WSvGU/6p/DmizoAogDbGD1PKys7Qxo1rvDDWd+04vohLRp9b6WwhAm7XrvEPnqMRteh5o3L86hgMkZ+ghTlMd0v13TVBjRI23J6+lFkBv91bMBy9XqLDWcC1mhnOfAiz8b/AOqZHAJKCh+bX7OAkC8BX5GmjXA1F0+acnYrz8evmZ3F59ADuY7TAtNUMMwmIdSDNZMwlB2d0OPmSSTAJhkmmOh7lZv6HCDlOb71GDrmN5/ubWBTPcUxKmiiII1Qat9Fs/GEj69cgxTfJhD+v2Tras4fco2wqmOwpDtw5/YQxgTrjRWv0Y5LNp6QwwQmBiHL0H7gejV+XAwLX3gXMdR5NmtI8Q2O7+J5oykYBjpCXNyZ6UJcFoC2jSjFu1810BUeuoR3vfztPAjxqgjt8UnvizSZeLQolTukHsHDHc6i6lcWd/rrXxwmuDCbnXjIUSNjX+OS8H0w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jan 15, 2024, Michael Roth wrote: > On Wed, Jan 10, 2024 at 07:45:36AM -0800, Sean Christopherson wrote: > > On Sat, Dec 30, 2023, Michael Roth wrote: > > > diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > index b1beb2fe8766..d4325b26724c 100644 > > > --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error > > > > > > See the SEV-SNP specification for further detail on the launch input. > > > > > > +20. KVM_SNP_LAUNCH_UPDATE > > > +------------------------- > > > + > > > +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also > > > +calculates a measurement of the memory contents. The measurement is a signature > > > +of the memory contents that can be sent to the guest owner as an attestation > > > +that the memory was encrypted correctly by the firmware. > > > + > > > +Parameters (in): struct kvm_snp_launch_update > > > + > > > +Returns: 0 on success, -negative on error > > > + > > > +:: > > > + > > > + struct kvm_sev_snp_launch_update { > > > + __u64 start_gfn; /* Guest page number to start from. */ > > > + __u64 uaddr; /* userspace address need to be encrypted */ > > > > Huh? Why is KVM taking a userspace address? IIUC, the address unconditionally > > gets translated into a gfn, so why not pass a gfn? > > > > And speaking of gfns, AFAICT start_gfn is never used. > > I think having both the uaddr and start_gfn parameters makes sense. I > think it's only awkward because how I'm using the memslot to translate > the uaddr to a GFN in the current implementation, Yes. > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert > > of guest memory. > > Yes, I'm having some trouble locating the various threads, but initially > there were some discussions about having a userspace API that allows for > UPM/gmem pages to be pre-populated before they are in-place encrypted, but > we'd all eventually decided that having KVM handle this internally was > the simplest approach. > > So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into > gmem, then passes those pages on to firmware for encryption. Then the > VMM is expected to mark the GFN range as private via > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes > initial private/encrypted payload. I should document that better in > KVM_SNP_LAUNCH_UPDATE docs however. That's fine. As above, my complaints are that the unencrypted source *must* be covered by a memslot. That's beyond ugly. > > What's "the IMI"? Initial Measurement Image? I assume this is essentially just > > a flag that communicates whether or not the page should be measured? > > This is actually for loading a measured migration agent into the target > system so that it can handle receiving the encrypted guest data from the > source. There's still a good deal of planning around how live migration > will be handled however so if you think it's premature to expose this > via KVM I can remove the related fields. Yes, please. Though FWIW, I honestly hope KVM_SEV_SNP_LAUNCH_UPDATE goes away and we end up with a common uAPI for populating guest memory: https://lore.kernel.org/all/Zbrj5WKVgMsUFDtb@google.com > > > + __u8 page_type; /* page type */ > > > + __u8 vmpl3_perms; /* VMPL3 permission mask */ > > > + __u8 vmpl2_perms; /* VMPL2 permission mask */ > > > + __u8 vmpl1_perms; /* VMPL1 permission mask */ > > > > Why? KVM doesn't support VMPLs. > > It does actually get used by the SVSM. > I can remove these but then we'd need some capability bit or something to > know when they are available if they get re-introduced. _If_. We don't merge dead code, and we _definitely_ don't merge dead code that creates ABI. > > > + kvaddr = pfn_to_kaddr(pfns[i]); > > > + if (!virt_addr_valid(kvaddr)) { > > > > I really, really don't like that this assume guest_memfd is backed by struct page. > > There are similar enforcements in the SEV/SEV-ES code. There was some > initial discussion about relaxing this for SNP so we could support > things like /dev/mem-mapped guest memory, but then guest_memfd came > along and made that to be an unlikely use-case in the near-term given > that it relies on alloc_pages() currently and explicitly guards against > mmap()'ing pages in userspace. > > I think it makes to keep the current tightened restrictions in-place > until such a use-case comes along, since otherwise we are relaxing a > bunch of currently-useful sanity checks that span all throughout the code > to support some nebulous potential use-case that might never come along. > I think it makes more sense to cross that bridge when we get there. I disagree. You say "sanity checks", while I see a bunch of arbitrary assumptions that don't need to exist. Yes, today guest_memfd always uses struct page memory, but that should have _zero_ impact on SNP. Arbitrary assumptions often cause a lot of confusion for future readers, e.g. a few years from now, if the above code still exists, someone might wonder what is so special about struct page memory, and then waste a bunch of time trying to figure out that there's no technical reason SNP "requires" struct page memory. This is partly why I was pushing for guest_memfd to handle some of this; the gmem code _knows_ what backing type it's using, it _knows_ if the direct map is valid, etc. > > > + pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn); > > > + ret = -EINVAL; > > > + goto e_release; > > > + } > > > + > > > + ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE); > > > > Good gravy. If I'm reading this correctly, KVM: > > > > 1. Translates an HVA into a GFN. > > 2. Gets the PFN for that GFN from guest_memfd > > 3. Verifies the PFN is not assigned to the guest > > 4. Copies memory from the shared memslot page to the guest_memfd page > > 5. Converts the page to private and asks the PSP to encrypt it > > > > (a) As above, why is #1 a thing? > > Yah, it's probably best to avoid this, as proposed above. > > > (b) Why are KVM's memory attributes never consulted? > > It doesn't really matter if the attributes are set before or after > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches > they pages get set to private so they get faulted in from gmem. We could > document our expectations and enforce them here if that's preferable > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance > would make it easier to enforce that userspace does the right thing. > I'll see how that looks if there are no objections. Userspace owns whether a page is PRIVATE or SHARED, full stop. If KVM can't honor that, then we need to come up with better uAPI. > > (c) What prevents TOCTOU issues with respect to the RMP? > > Time-of-use will be when the guest faults the gmem page in with C-bit > set. If it is not in the expected Guest-owned/pre-validated state that > SEV_CMD_SNP_LAUNCH_UPDATE expected/set, then the guest will get an RMP > fault or #VC exception for unvalidated page access. It will also fail > attestation. Not sure if that covers the scenarios you had in mind. I don't think this covers what I was asking, but I suspect my concern will go away once the new APIs come along, so let's table this for now. > > > (d) Why is *KVM* copying memory into guest_memfd? > > As mentioned above, there were various discussions of ways of allowing > userspace to pwrite() to the guest_memfd in advance of > "sealing"/"binding" it and then encrypting it in place. I think this was > one of the related threads: > > https://lore.kernel.org/linux-mm/YkyKywkQYbr9U0CA@google.com/ > > My read at the time was that the requirements between pKVM/TDX/SNP were all > so unique that we'd spin forever trying to come up with a userspace ABI > that worked for everyone. At the time you'd suggested for pKVM to handle > their specific requirements internally in pKVM to avoid unecessary churn on > TDX/SNP side, and I took the same approach with SNP in implementing it > internally in SNP's KVM interfaces since it seemed unlikely there would > be much common ground with how TDX handles it via KVM_TDX_INIT_MEM_REGION. Yeah, the whole "intra-memslot copy" thing threw me. > > (e) What guarantees the direct map is valid for guest_memfd? > > Are you suggesting this may change in the near-term? I asking because, when I asked, I was unaware that the plan was to shatter the direct to address the 2MiB vs. 4KiB erratum (as opposed to unmapping guest_memfd pfns). > > > + if (ret) { > > > + pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n", > > > + ret, *error); > > > + snp_page_reclaim(pfns[i]); > > > + > > > + /* > > > + * When invalid CPUID function entries are detected, the firmware > > > + * corrects these entries for debugging purpose and leaves the > > > + * page unencrypted so it can be provided users for debugging > > > + * and error-reporting. > > > + * > > > + * Copy the corrected CPUID page back to shared memory so > > > + * userpsace can retrieve this information. > > > > Why? IIUC, this is basically backdooring reads/writes into guest_memfd to avoid > > having to add proper mmap() support. > > The CPUID page is private/encrypted, so it needs to be a gmem page. > SNP firmware is doing the backdooring when it writes the unencrypted, > expected contents into the page during failure. What's wrong with copying > the contents back into the source page so userspace can be use of it? > If we implement the above-mentioned changes then the source page is just > a userspace buffer that isn't necessarily associated with a memslot, so > it becomes even more straightforward. > > Would that be acceptable? Yes, I am specifically complaining about writing guest memory on failure, which is all kinds of weird. > > > + kvfree(pfns); > > > > Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily > > complex. Add a guest_memfd API (or three) to do this safely, e.g. to lock the > > pages, do (and track) the RMP conversion, etc... > > Is adding 3 gmem APIs and tracking RMP states inside gmem really less > complex than what's going on here? I think we covered this in PUCK? Holler if you still have questions here.