* [PATCH v12 01/29] [TEMP] x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 02/29] [TEMP] x86/cc: Add cc_platform_set/_clear() helpers Michael Roth
` (28 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
From: "Borislav Petkov (AMD)" <bp@alien8.de>
The host-side functionality for loading SEV-SNP guests will soon rely on
the cc_platform* helpers, because the cpu_feature* API, with its early
alternatives patching, is insufficient when SNP support needs to be
disabled late. Therefore, pull that functionality in.
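To illustrate the distinction (a hypothetical sketch; the
CC_ATTR_HOST_SEV_SNP attribute is introduced in the following patch, and
enable_snp_paths() is a placeholder for whatever a caller gates):

	/* Resolved via early patching; goes stale if SNP is disabled late: */
	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		enable_snp_paths();

	/* Plain runtime query; reflects a later cc_platform_clear(): */
	if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
		enable_snp_paths();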
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8c3032a96caf..6a76ba7b6bac 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -122,6 +122,7 @@ config KVM_AMD_SEV
bool "AMD Secure Encrypted Virtualization (SEV) support"
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+ select ARCH_HAS_CC_PLATFORM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 02/29] [TEMP] x86/cc: Add cc_platform_set/_clear() helpers
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2024-03-29 22:58 ` [PATCH v12 01/29] [TEMP] x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 03/29] [TEMP] x86/CPU/AMD: Track SNP host status with cc_platform_*() Michael Roth
` (27 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Add functionality to set and/or clear different attributes of the
machine as a confidential computing platform. Add the first one too:
whether the machine is running as a host for SEV-SNP guests.
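To illustrate the intended usage (a minimal sketch; the real call sites
for CC_ATTR_HOST_SEV_SNP arrive in the next patch):

	#include <linux/cc_platform.h>

	/* Host SNP support confirmed, e.g. after probing the RMP table: */
	cc_platform_set(CC_ATTR_HOST_SEV_SNP);

	/* SNP later found unusable, e.g. due to the IOMMU configuration: */
	cc_platform_clear(CC_ATTR_HOST_SEV_SNP);

	/* Consumers query the attribute at runtime: */
	if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
		pr_info("SNP host support enabled\n");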
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/coco/core.c | 52 +++++++++++++++++++++++++++++++++++++
include/linux/cc_platform.h | 12 +++++++++
2 files changed, 64 insertions(+)
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index d07be9d05cd0..8c3fae23d3c6 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -16,6 +16,11 @@
enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
u64 cc_mask __ro_after_init;
+static struct cc_attr_flags {
+ __u64 host_sev_snp : 1,
+ __resv : 63;
+} cc_flags;
+
static bool noinstr intel_cc_platform_has(enum cc_attr attr)
{
switch (attr) {
@@ -89,6 +94,9 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
case CC_ATTR_GUEST_SEV_SNP:
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+ case CC_ATTR_HOST_SEV_SNP:
+ return cc_flags.host_sev_snp;
+
default:
return false;
}
@@ -148,3 +156,47 @@ u64 cc_mkdec(u64 val)
}
}
EXPORT_SYMBOL_GPL(cc_mkdec);
+
+static void amd_cc_platform_clear(enum cc_attr attr)
+{
+ switch (attr) {
+ case CC_ATTR_HOST_SEV_SNP:
+ cc_flags.host_sev_snp = 0;
+ break;
+ default:
+ break;
+ }
+}
+
+void cc_platform_clear(enum cc_attr attr)
+{
+ switch (cc_vendor) {
+ case CC_VENDOR_AMD:
+ amd_cc_platform_clear(attr);
+ break;
+ default:
+ break;
+ }
+}
+
+static void amd_cc_platform_set(enum cc_attr attr)
+{
+ switch (attr) {
+ case CC_ATTR_HOST_SEV_SNP:
+ cc_flags.host_sev_snp = 1;
+ break;
+ default:
+ break;
+ }
+}
+
+void cc_platform_set(enum cc_attr attr)
+{
+ switch (cc_vendor) {
+ case CC_VENDOR_AMD:
+ amd_cc_platform_set(attr);
+ break;
+ default:
+ break;
+ }
+}
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..60693a145894 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -90,6 +90,14 @@ enum cc_attr {
* Examples include TDX Guest.
*/
CC_ATTR_HOTPLUG_DISABLED,
+
+ /**
+ * @CC_ATTR_HOST_SEV_SNP: AMD SNP enabled on the host.
+ *
+ * The host kernel is running with the necessary features
+ * enabled to run SEV-SNP guests.
+ */
+ CC_ATTR_HOST_SEV_SNP,
};
#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
@@ -107,10 +115,14 @@ enum cc_attr {
* * FALSE - Specified Confidential Computing attribute is not active
*/
bool cc_platform_has(enum cc_attr attr);
+void cc_platform_set(enum cc_attr attr);
+void cc_platform_clear(enum cc_attr attr);
#else /* !CONFIG_ARCH_HAS_CC_PLATFORM */
static inline bool cc_platform_has(enum cc_attr attr) { return false; }
+static inline void cc_platform_set(enum cc_attr attr) { }
+static inline void cc_platform_clear(enum cc_attr attr) { }
#endif /* CONFIG_ARCH_HAS_CC_PLATFORM */
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 03/29] [TEMP] x86/CPU/AMD: Track SNP host status with cc_platform_*()
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2024-03-29 22:58 ` [PATCH v12 01/29] [TEMP] x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM Michael Roth
2024-03-29 22:58 ` [PATCH v12 02/29] [TEMP] x86/cc: Add cc_platform_set/_clear() helpers Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 04/29] [TEMP] fixup! KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time Michael Roth
` (26 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
From: "Borislav Petkov (AMD)" <bp@alien8.de>
The host's SNP suitability can be determined only later, after
alternatives have been patched, in snp_rmptable_init(), depending on
cmdline options like iommu=pt, which is incompatible with SNP.
This means that X86_FEATURE_SEV_SNP cannot be used for this check and a
special flag is needed for that control instead.
Use the newly added CC_ATTR_HOST_SEV_SNP in the appropriate places.
Move kdump_sev_callback() to its rightful place, while at it.
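Condensed, the resulting flow is (an illustrative summary of the hunks
below, not literal kernel code):

	/* BSP bringup: provisionally mark the host as SNP-capable */
	if (cpu_has(c, X86_FEATURE_SEV_SNP) &&
	    !cpu_has(c, X86_FEATURE_HYPERVISOR) &&
	    c->x86 >= 0x19 && snp_probe_rmptable_info())
		cc_platform_set(CC_ATTR_HOST_SEV_SNP);

	/*
	 * Later, after alternatives have been patched: revoke the
	 * attribute if the configuration is incompatible, e.g. iommu=pt.
	 */
	if (no_iommu || iommu_default_passthrough())
		cc_platform_clear(CC_ATTR_HOST_SEV_SNP);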
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev.h | 4 ++--
arch/x86/kernel/cpu/amd.c | 38 ++++++++++++++++++------------
arch/x86/kernel/cpu/mtrr/generic.c | 2 +-
arch/x86/kernel/sev.c | 10 --------
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/virt/svm/sev.c | 26 +++++++++++++-------
drivers/crypto/ccp/sev-dev.c | 2 +-
drivers/iommu/amd/init.c | 4 +++-
8 files changed, 49 insertions(+), 39 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 9477b4053bce..780182cda3ab 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -228,7 +228,6 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct sn
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
u64 snp_get_unsupported_features(u64 status);
u64 sev_get_status(void);
-void kdump_sev_callback(void);
void sev_show_status(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
@@ -258,7 +257,6 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
-static inline void kdump_sev_callback(void) { }
static inline void sev_show_status(void) { }
#endif
@@ -270,6 +268,7 @@ int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
void snp_leak_pages(u64 pfn, unsigned int npages);
+void kdump_sev_callback(void);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -282,6 +281,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 as
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
+static inline void kdump_sev_callback(void) { }
#endif
#endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6d8677e80ddb..9bf17c9c29da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -345,6 +345,28 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
#endif
}
+static void bsp_determine_snp(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+ cc_vendor = CC_VENDOR_AMD;
+
+ if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
+ /*
+ * RMP table entry format is not architectural and is defined by the
+ * per-processor PPR. Restrict SNP support to the known CPU models
+ * for which the RMP table entry format is currently defined.
+ */
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
+ c->x86 >= 0x19 && snp_probe_rmptable_info()) {
+ cc_platform_set(CC_ATTR_HOST_SEV_SNP);
+ } else {
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
+ }
+ }
+#endif
+}
+
static void bsp_init_amd(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
@@ -452,21 +474,7 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
break;
}
- if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
- /*
- * RMP table entry format is not architectural and it can vary by processor
- * and is defined by the per-processor PPR. Restrict SNP support on the
- * known CPU model and family for which the RMP table entry format is
- * currently defined for.
- */
- if (!boot_cpu_has(X86_FEATURE_ZEN3) &&
- !boot_cpu_has(X86_FEATURE_ZEN4) &&
- !boot_cpu_has(X86_FEATURE_ZEN5))
- setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
- else if (!snp_probe_rmptable_info())
- setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
- }
-
+ bsp_determine_snp(c);
return;
warn:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 422a4ddc2ab7..7b29ebda024f 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -108,7 +108,7 @@ static inline void k8_check_syscfg_dram_mod_en(void)
(boot_cpu_data.x86 >= 0x0f)))
return;
- if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return;
rdmsr(MSR_AMD64_SYSCFG, lo, hi);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b59b09c2f284..1e1a3c3bd1e8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2287,16 +2287,6 @@ static int __init snp_init_platform_device(void)
}
device_initcall(snp_init_platform_device);
-void kdump_sev_callback(void)
-{
- /*
- * Do wbinvd() on remote CPUs when SNP is enabled in order to
- * safely do SNP_SHUTDOWN on the local CPU.
- */
- if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
- wbinvd();
-}
-
void sev_show_status(void)
{
int i;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d30bd30d4f7a..7b872f97a452 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3279,7 +3279,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
unsigned long pfn;
struct page *p;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
/*
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index cffe1157a90a..ab0e8448bb6e 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -77,7 +77,7 @@ static int __mfd_enable(unsigned int cpu)
{
u64 val;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return 0;
rdmsrl(MSR_AMD64_SYSCFG, val);
@@ -98,7 +98,7 @@ static int __snp_enable(unsigned int cpu)
{
u64 val;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return 0;
rdmsrl(MSR_AMD64_SYSCFG, val);
@@ -174,11 +174,11 @@ static int __init snp_rmptable_init(void)
u64 rmptable_size;
u64 val;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return 0;
if (!amd_iommu_snp_en)
- return 0;
+ goto nosnp;
if (!probed_rmp_size)
goto nosnp;
@@ -225,7 +225,7 @@ static int __init snp_rmptable_init(void)
return 0;
nosnp:
- setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
return -ENOSYS;
}
@@ -246,7 +246,7 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
{
struct rmpentry *large_entry, *entry;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return ERR_PTR(-ENODEV);
entry = get_rmpentry(pfn);
@@ -363,7 +363,7 @@ int psmash(u64 pfn)
unsigned long paddr = pfn << PAGE_SHIFT;
int ret;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return -ENODEV;
if (!pfn_valid(pfn))
@@ -472,7 +472,7 @@ static int rmpupdate(u64 pfn, struct rmp_state *state)
unsigned long paddr = pfn << PAGE_SHIFT;
int ret, level;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return -ENODEV;
level = RMP_TO_PG_LEVEL(state->pagesize);
@@ -558,3 +558,13 @@ void snp_leak_pages(u64 pfn, unsigned int npages)
spin_unlock(&snp_leaked_pages_list_lock);
}
EXPORT_SYMBOL_GPL(snp_leak_pages);
+
+void kdump_sev_callback(void)
+{
+ /*
+ * Do wbinvd() on remote CPUs when SNP is enabled in order to
+ * safely do SNP_SHUTDOWN on the local CPU.
+ */
+ if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+ wbinvd();
+}
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f44efbb89c34..2102377f727b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1090,7 +1090,7 @@ static int __sev_snp_init_locked(int *error)
void *arg = &data;
int cmd, rc = 0;
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return -ENODEV;
sev = psp->sev_data;
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index e7a44929f0da..33228c1c8980 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3228,7 +3228,7 @@ static bool __init detect_ivrs(void)
static void iommu_snp_enable(void)
{
#ifdef CONFIG_KVM_AMD_SEV
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
return;
/*
* The SNP support requires that IOMMU must be enabled, and is
@@ -3236,12 +3236,14 @@ static void iommu_snp_enable(void)
*/
if (no_iommu || iommu_default_passthrough()) {
pr_err("SNP: IOMMU disabled or configured in passthrough mode, SNP cannot be supported.\n");
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
return;
}
amd_iommu_snp_en = check_feature(FEATURE_SNP);
if (!amd_iommu_snp_en) {
pr_err("SNP: IOMMU SNP feature not enabled, SNP cannot be supported.\n");
+ cc_platform_clear(CC_ATTR_HOST_SEV_SNP);
return;
}
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 04/29] [TEMP] fixup! KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (2 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 03/29] [TEMP] x86/CPU/AMD: Track SNP host status with cc_platform_*() Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 05/29] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
` (25 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
From: Paolo Bonzini <pbonzini@redhat.com>
A small change to add EXPORT_SYMBOL_GPL, and especially to actually match
the format in which the processor expects x87 registers in the VMSA.
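Concretely, the repacking below implies the following layout (derived
from the code in the hunk; the hardware format itself is not publicly
documented):

	XSAVE st_space: x87 reg i occupies bytes [i*16 .. i*16+9]
	                (10 of the 16 bytes per slot are used)
	VMSA fpreg_x87: bytes [i*8 .. i*8+7]       = bytes 0-7 of reg i
	                bytes [64+i*2 .. 64+i*2+1] = bytes 8-9 of reg i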
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kernel/fpu/xstate.c | 1 +
arch/x86/kvm/svm/sev.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 117e74c44e75..eeaf4ec9243d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -990,6 +990,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
return __raw_xsave_addr(xsave, xfeature_nr);
}
+EXPORT_SYMBOL_GPL(get_xsave_addr);
#ifdef CONFIG_ARCH_HAS_PKEYS
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7b872f97a452..58019f1aefed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -679,9 +679,17 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->x87_rip = xsave->i387.rip;
for (i = 0; i < 8; i++) {
- d = save->fpreg_x87 + i * 10;
+ /*
+ * The format of the x87 save area is totally undocumented,
+ * and definitely not what you would expect. It consists
+ * of an 8*8 bytes area with bytes 0-7 and an 8*2 bytes area
+ * with bytes 8-9 of each register.
+ */
+ d = save->fpreg_x87 + i * 8;
s = ((u8 *)xsave->i387.st_space) + i * 16;
- memcpy(d, s, 10);
+ memcpy(d, s, 8);
+ save->fpreg_x87[64 + i * 2] = s[8];
+ save->fpreg_x87[64 + i * 2 + 1] = s[9];
}
memcpy(save->fpreg_xmm, xsave->i387.xmm_space, 256);
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 05/29] KVM: x86: Define RMP page fault error bits for #NPF
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (3 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 04/29] [TEMP] fixup! KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 19:28 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 06/29] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y Michael Roth
` (24 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
When SEV-SNP is enabled globally, the hardware places restrictions on
all memory accesses based on the RMP entry, whether it is the hypervisor
or a VM that performs the access. When hardware encounters an RMP access
violation during a guest access, it will cause a #VMEXIT(NPF) with a
number of additional bits set to indicate the reasons for the #NPF.
Define those here.
See APM2 section 16.36.10 for more details.
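As a usage illustration (a hypothetical helper, not part of this
patch), a handler could decode such an exit as follows:

	static void npf_describe_rmp_fault(u64 error_code)
	{
		if (!(error_code & PFERR_GUEST_RMP_MASK))
			return;	/* not an RMP violation */

		if (error_code & PFERR_GUEST_SIZEM_MASK)
			pr_debug("RMP #NPF: page-size mismatch\n");
		if (error_code & PFERR_GUEST_VMPL_MASK)
			pr_debug("RMP #NPF: VMPL check failed\n");
	}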
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add some additional details to commit message]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 90dc0ae9311a..a3f8eba8d8b6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -262,9 +262,12 @@ enum x86_intercept_stage;
#define PFERR_FETCH_BIT 4
#define PFERR_PK_BIT 5
#define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
#define PFERR_IMPLICIT_ACCESS_BIT 48
#define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
@@ -277,7 +280,10 @@ enum x86_intercept_stage;
#define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_RMP_MASK BIT_ULL(PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_SIZEM_MASK BIT_ULL(PFERR_GUEST_SIZEM_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_VMPL_MASK BIT_ULL(PFERR_GUEST_VMPL_BIT)
#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH v12 05/29] KVM: x86: Define RMP page fault error bits for #NPF
2024-03-29 22:58 ` [PATCH v12 05/29] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
@ 2024-03-30 19:28 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 19:28 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
On 3/29/24 23:58, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> When SEV-SNP is enabled globally, the hardware places restrictions on
> all memory accesses based on the RMP entry, whether the hypervisor or a
> VM, performs the accesses. When hardware encounters an RMP access
> violation during a guest access, it will cause a #VMEXIT(NPF) with a
> number of additional bits set to indicate the reasons for the #NPF.
> Define those here.
>
> See APM2 section 16.36.10 for more details.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: add some additional details to commit message]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
One nit below.
> ---
> arch/x86/include/asm/kvm_host.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 90dc0ae9311a..a3f8eba8d8b6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -262,9 +262,12 @@ enum x86_intercept_stage;
> #define PFERR_FETCH_BIT 4
> #define PFERR_PK_BIT 5
> #define PFERR_SGX_BIT 15
> +#define PFERR_GUEST_RMP_BIT 31
> #define PFERR_GUEST_FINAL_BIT 32
> #define PFERR_GUEST_PAGE_BIT 33
> #define PFERR_GUEST_ENC_BIT 34
> +#define PFERR_GUEST_SIZEM_BIT 35
> +#define PFERR_GUEST_VMPL_BIT 36
> #define PFERR_IMPLICIT_ACCESS_BIT 48
>
> #define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
> @@ -277,7 +280,10 @@ enum x86_intercept_stage;
> #define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
> #define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
> #define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
> +#define PFERR_GUEST_RMP_MASK BIT_ULL(PFERR_GUEST_RMP_BIT)
> +#define PFERR_GUEST_SIZEM_MASK BIT_ULL(PFERR_GUEST_SIZEM_BIT)
> #define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
> +#define PFERR_GUEST_VMPL_MASK BIT_ULL(PFERR_GUEST_VMPL_BIT)
Should be kept in either bit order or perhaps alphabetical order
(probably bit is better).
Paolo
> #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> PFERR_WRITE_MASK | \
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 06/29] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (4 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 05/29] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 07/29] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
` (23 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
SEV-SNP relies on private memory support to run guests, so make sure to
enable that support via the CONFIG_KVM_GENERIC_PRIVATE_MEM config
option.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 6a76ba7b6bac..d0bb0e7a4e80 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -123,6 +123,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
+ select KVM_GENERIC_PRIVATE_MEM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 07/29] KVM: SEV: Add support to handle AP reset MSR protocol
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (5 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 06/29] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 08/29] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
` (22 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Tom Lendacky <thomas.lendacky@amd.com>
Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
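For context, the guest half of this exchange looks roughly like the
following (a sketch against the GHCB spec; the helper names mirror the
guest-side code in arch/x86/kernel/sev.c and are illustrative, as is
snp_ap_begin_startup()):

	u64 val;

	sev_es_wr_ghcb_msr(GHCB_MSR_AP_RESET_HOLD_REQ);
	VMGEXIT();
	val = sev_es_rd_ghcb_msr();

	/* A non-zero result field means a SIPI has been delivered. */
	if ((val & GHCB_MSR_INFO_MASK) == GHCB_MSR_AP_RESET_HOLD_RESP &&
	    (val >> GHCB_MSR_AP_RESET_HOLD_RESULT_POS) &
	    GHCB_MSR_AP_RESET_HOLD_RESULT_MASK)
		snp_ap_begin_startup();	/* placeholder: set CS:RIP etc. */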
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-common.h | 6 ++--
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b463fcbd4b90..01261f7054ad 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -54,8 +54,10 @@
(((unsigned long)fn) << 32))
/* AP Reset Hold */
-#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
-#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
+#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 58019f1aefed..7f5faa0d4d4f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -49,6 +49,10 @@ static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
static u64 sev_supported_vmsa_features;
+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2718,6 +2722,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;
@@ -2929,6 +2936,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -3028,6 +3051,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3271,15 +3295,31 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}
- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 717cc97f8f50..157eb3f65269 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,6 +199,7 @@ struct vcpu_sev_es_state {
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;
/* SEV-ES scratch area support */
u64 sw_scratch;
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 08/29] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (6 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 07/29] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 09/29] KVM: SEV: Add initial SEV-SNP support Michael Roth
` (21 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.
Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
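Concretely, the MSR-protocol response constructed below packs the
supported feature bits into GHCBData[63:12] (illustrative arithmetic
using the new defines):

	/* GHCB_MSR_HV_FT_RESP (0x081) in bits 11:0, features above it */
	u64 resp = GHCB_MSR_HV_FT_RESP |
		   ((GHCB_HV_FT_SUPPORTED & GHCB_MSR_HV_FT_MASK) <<
		    GHCB_MSR_HV_FT_POS);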
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 16 +++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 01261f7054ad..5a8246dd532f 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7f5faa0d4d4f..1e65f5634ad3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -33,9 +33,11 @@
#include "cpuid.h"
#include "trace.h"
-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL
+#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+
/* enable/disable SEV support */
static bool sev_enabled = true;
module_param_named(sev, sev_enabled, bool, 0444);
@@ -2692,6 +2694,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2952,6 +2955,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ:
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -3076,6 +3085,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* [PATCH v12 09/29] KVM: SEV: Add initial SEV-SNP support
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (7 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 08/29] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 19:58 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 10/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
` (20 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protection. SEV-SNP adds strong memory
encryption and integrity protection to help prevent malicious
hypervisor-based attacks such as data replay and memory re-mapping,
creating an isolated execution environment.
Define a new KVM_X86_SNP_VM type which makes use of these capabilities
and extend the KVM_SEV_INIT2 ioctl to support it. Also add a basic
helper to check whether SNP is enabled.
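From userspace, creating such a VM looks roughly like this (a sketch
with error handling omitted; struct layouts as per the uapi headers at
this point in the series):

	int kvm_fd = open("/dev/kvm", O_RDWR);
	int vm_fd  = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SNP_VM);
	struct kvm_sev_init init = { 0 };	/* default vmsa_features */
	struct kvm_sev_cmd cmd = {
		.id     = KVM_SEV_INIT2,
		.data   = (__u64)(unsigned long)&init,
		.sev_fd = open("/dev/sev", O_RDWR),
	};

	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);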
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: commit fixups, use similar ASID reporting as with SEV/SEV-ES]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/svm.h | 3 ++-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 21 ++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 3 ++-
arch/x86/kvm/svm/svm.h | 12 ++++++++++++
arch/x86/kvm/x86.c | 2 +-
6 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 728c98175b9c..544a43c1cf11 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -285,7 +285,8 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
-#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 51b13080ed4b..725b75cfe9ff 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -868,5 +868,6 @@ struct kvm_hyperv_eventfd {
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3
+#define KVM_X86_SNP_VM 4
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1e65f5634ad3..3d9771163562 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -46,6 +46,9 @@ module_param_named(sev, sev_enabled, bool, 0444);
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
@@ -275,6 +278,9 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
sev->es_active = es_active;
sev->vmsa_features = data->vmsa_features;
+ if (vm_type == KVM_X86_SNP_VM)
+ sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
ret = sev_asid_new(sev);
if (ret)
goto e_no_asid;
@@ -326,7 +332,8 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
return -EINVAL;
if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
- kvm->arch.vm_type != KVM_X86_SEV_ES_VM)
+ kvm->arch.vm_type != KVM_X86_SEV_ES_VM &&
+ kvm->arch.vm_type != KVM_X86_SNP_VM)
return -EINVAL;
if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
@@ -2297,11 +2304,16 @@ void __init sev_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
}
+ if (sev_snp_enabled) {
+ kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
+ kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
+ }
}
void __init sev_hardware_setup(void)
{
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;
@@ -2382,6 +2394,7 @@ void __init sev_hardware_setup(void)
sev_es_asid_count = min_sev_asid - 1;
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
out:
if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2394,9 +2407,15 @@ void __init sev_hardware_setup(void)
pr_info("SEV-ES %s (ASIDs %u - %u)\n",
sev_es_supported ? "enabled" : "disabled",
min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+ if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+ pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
+ sev_snp_supported ? "enabled" : "disabled",
+ min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ sev_snp_enabled = sev_snp_supported;
+
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
sev_es_debug_swap_enabled = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0f3b59da0d4a..2c162f6a1d78 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4890,7 +4890,8 @@ static int svm_vm_init(struct kvm *kvm)
if (type != KVM_X86_DEFAULT_VM &&
type != KVM_X86_SW_PROTECTED_VM) {
- kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM);
+ kvm->arch.has_protected_state =
+ (type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
to_kvm_sev_info(kvm)->need_init = true;
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 157eb3f65269..4a01a81dd9b9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -348,6 +348,18 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}
+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return (sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE) &&
+ !WARN_ON_ONCE(!sev_es_guest(kvm));
+#else
+ return false;
+#endif
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64eda7949f09..f85735b6235d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12603,7 +12603,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.vm_type = type;
kvm->arch.has_private_mem =
- (type == KVM_X86_SW_PROTECTED_VM);
+ (type == KVM_X86_SW_PROTECTED_VM || type == KVM_X86_SNP_VM);
ret = kvm_page_track_init(kvm);
if (ret)
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH v12 09/29] KVM: SEV: Add initial SEV-SNP support
2024-03-29 22:58 ` [PATCH v12 09/29] KVM: SEV: Add initial SEV-SNP support Michael Roth
@ 2024-03-30 19:58 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 19:58 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
On 3/29/24 23:58, Michael Roth wrote:
> SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
> new hardware-based security protection. SEV-SNP adds strong memory
> encryption and integrity protection to help prevent malicious
> hypervisor-based attacks such as data replay and memory re-mapping,
> creating an isolated execution environment.
>
> Define a new KVM_X86_SNP_VM type which makes use of these capabilities
> and extend the KVM_SEV_INIT2 ioctl to support it. Also add a basic
> helper to check whether SNP is enabled.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: commit fixups, use similar ASID reporting as with SEV/SEV-ES]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/include/asm/svm.h | 3 ++-
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/svm/sev.c | 21 ++++++++++++++++++++-
> arch/x86/kvm/svm/svm.c | 3 ++-
> arch/x86/kvm/svm/svm.h | 12 ++++++++++++
> arch/x86/kvm/x86.c | 2 +-
> 6 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 728c98175b9c..544a43c1cf11 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -285,7 +285,8 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
>
> #define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
>
> -#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
> +#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
> +#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
>
> struct vmcb_seg {
> u16 selector;
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 51b13080ed4b..725b75cfe9ff 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -868,5 +868,6 @@ struct kvm_hyperv_eventfd {
> #define KVM_X86_SW_PROTECTED_VM 1
> #define KVM_X86_SEV_VM 2
> #define KVM_X86_SEV_ES_VM 3
> +#define KVM_X86_SNP_VM 4
>
> #endif /* _ASM_X86_KVM_H */
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1e65f5634ad3..3d9771163562 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -46,6 +46,9 @@ module_param_named(sev, sev_enabled, bool, 0444);
> static bool sev_es_enabled = true;
> module_param_named(sev_es, sev_es_enabled, bool, 0444);
>
> +/* enable/disable SEV-SNP support */
> +static bool sev_snp_enabled;
> +
> /* enable/disable SEV-ES DebugSwap support */
> static bool sev_es_debug_swap_enabled = true;
> module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
> @@ -275,6 +278,9 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
> sev->es_active = es_active;
> sev->vmsa_features = data->vmsa_features;
>
> + if (vm_type == KVM_X86_SNP_VM)
> + sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
> +
> ret = sev_asid_new(sev);
> if (ret)
> goto e_no_asid;
> @@ -326,7 +332,8 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return -EINVAL;
>
> if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
> - kvm->arch.vm_type != KVM_X86_SEV_ES_VM)
> + kvm->arch.vm_type != KVM_X86_SEV_ES_VM &&
> + kvm->arch.vm_type != KVM_X86_SNP_VM)
> return -EINVAL;
>
> if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
> @@ -2297,11 +2304,16 @@ void __init sev_set_cpu_caps(void)
> kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
> kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
> }
> + if (sev_snp_enabled) {
> + kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
> + kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
> + }
> }
>
> void __init sev_hardware_setup(void)
> {
> unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
> + bool sev_snp_supported = false;
> bool sev_es_supported = false;
> bool sev_supported = false;
>
> @@ -2382,6 +2394,7 @@ void __init sev_hardware_setup(void)
> sev_es_asid_count = min_sev_asid - 1;
> WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
> sev_es_supported = true;
> + sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
>
> out:
> if (boot_cpu_has(X86_FEATURE_SEV))
> @@ -2394,9 +2407,15 @@ void __init sev_hardware_setup(void)
> pr_info("SEV-ES %s (ASIDs %u - %u)\n",
> sev_es_supported ? "enabled" : "disabled",
> min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
> + if (boot_cpu_has(X86_FEATURE_SEV_SNP))
> + pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
> + sev_snp_supported ? "enabled" : "disabled",
> + min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
>
> sev_enabled = sev_supported;
> sev_es_enabled = sev_es_supported;
> + sev_snp_enabled = sev_snp_supported;
> +
> if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
> !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
> sev_es_debug_swap_enabled = false;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 0f3b59da0d4a..2c162f6a1d78 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4890,7 +4890,8 @@ static int svm_vm_init(struct kvm *kvm)
>
> if (type != KVM_X86_DEFAULT_VM &&
> type != KVM_X86_SW_PROTECTED_VM) {
> - kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM);
> + kvm->arch.has_protected_state =
> + (type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
> to_kvm_sev_info(kvm)->need_init = true;
> }
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 157eb3f65269..4a01a81dd9b9 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -348,6 +348,18 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
> #endif
> }
>
> +static __always_inline bool sev_snp_guest(struct kvm *kvm)
> +{
> +#ifdef CONFIG_KVM_AMD_SEV
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> + return (sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE) &&
> + !WARN_ON_ONCE(!sev_es_guest(kvm));
> +#else
> + return false;
> +#endif
> +}
> +
> static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
> {
> vmcb->control.clean = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 64eda7949f09..f85735b6235d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12603,7 +12603,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>
> kvm->arch.vm_type = type;
> kvm->arch.has_private_mem =
> - (type == KVM_X86_SW_PROTECTED_VM);
> + (type == KVM_X86_SW_PROTECTED_VM || type == KVM_X86_SNP_VM);
>
> ret = kvm_page_track_init(kvm);
> if (ret)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 10/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (8 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 09/29] KVM: SEV: Add initial SEV-SNP support Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 11/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
` (19 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. Other commands can then be used to
load/encrypt data into the guest's initial launch image.
For more information see the SEV-SNP specification.
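From userspace, the command is issued like the other SEV commands, via
KVM_MEMORY_ENCRYPT_OP (a sketch; error handling omitted):

	struct kvm_sev_snp_launch_start start = {
		/*
		 * Policy bit 16 (SMT allowed) must be set per the checks
		 * below; real policies carry additional firmware-mandated
		 * bits, see the SNP firmware ABI [snp-fw-abi].
		 */
		.policy = 1ULL << 16,
	};
	struct kvm_sev_cmd cmd = {
		.id     = KVM_SEV_SNP_LAUNCH_START,
		.data   = (__u64)(unsigned long)&start,
		.sev_fd = sev_fd,
	};

	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);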
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 23 ++-
arch/x86/include/uapi/asm/kvm.h | 8 +
arch/x86/kvm/svm/sev.c | 152 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 180 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index f7c007d34114..a10b817c162d 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -459,6 +459,25 @@ issued by the hypervisor to make the guest ready for execution.
Returns: 0 on success, -negative on error
+18. KVM_SEV_SNP_LAUNCH_START
+----------------------------
+
+The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u8 gosvw[16]; /* Guest OS visible workarounds. */
+ };
+
+See the SEV-SNP spec [snp-fw-abi]_ for further detail on the launch input.
+
Device attribute API
====================
@@ -490,9 +509,11 @@ References
==========
-See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info.
+See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_
+for more info.
.. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
.. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
.. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34)
.. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
+.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 725b75cfe9ff..350ddd5264ea 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -693,6 +693,9 @@ enum sev_cmd_id {
/* Second time is the charm; improved versions of the above ioctls. */
KVM_SEV_INIT2,
+ /* SNP-specific commands */
+ KVM_SEV_SNP_LAUNCH_START,
+
KVM_SEV_NR_MAX,
};
@@ -818,6 +821,11 @@ struct kvm_sev_receive_update_data {
__u32 pad2;
};
+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u8 gosvw[16];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3d9771163562..6c7c77e33e62 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -25,6 +25,7 @@
#include <asm/fpu/xcr.h>
#include <asm/fpu/xstate.h>
#include <asm/debugreg.h>
+#include <asm/sev.h>
#include "mmu.h"
#include "x86.h"
@@ -58,6 +59,10 @@ static u64 sev_supported_vmsa_features;
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_SMT BIT_ULL(16)
+#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -68,6 +73,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;
+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -94,12 +101,17 @@ static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
down_write(&sev_deactivate_lock);
wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);
up_write(&sev_deactivate_lock);
if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);
return ret;
}
@@ -1967,6 +1979,102 @@ int sev_dev_get_attr(u64 attr, u64 *val)
}
}
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.address = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ pr_warn("Failed to create SEV-SNP context, rc %d fw_error %d",
+ rc, argp->error);
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) {
+ pr_debug("SEV-SNP hypervisor does not support limiting guests to a single socket.\n");
+ return -EINVAL;
+ }
+
+ if (!(params.policy & SNP_POLICY_MASK_SMT)) {
+ pr_debug("SEV-SNP hypervisor does not support limiting guests to a single SMT thread.\n");
+ return -EINVAL;
+ }
+
+ /* Don't allow userspace to allocate memory for more than 1 SNP context. */
+ if (sev->snp_context) {
+ pr_debug("SEV-SNP context already exists. Refusing to allocate an additional one.\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Validate the policy before creating the context; otherwise a
+ * policy failure would leave sev->snp_context set, causing every
+ * retry to fail the "context already exists" check above.
+ */
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc) {
+ pr_debug("SEV_CMD_SNP_LAUNCH_START command failed, rc %d\n", rc);
+ goto e_free_context;
+ }
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc) {
+ pr_debug("Failed to bind ASID to SEV-SNP context, rc %d\n", rc);
+ goto e_free_context;
+ }
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2054,6 +2162,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2249,6 +2360,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}
+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.address = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context")) {
+ up_write(&sev_deactivate_lock);
+ return ret;
+ }
+
+ up_write(&sev_deactivate_lock);
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2290,7 +2428,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}
- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4a01a81dd9b9..a3c190642c57 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -92,6 +92,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ void *snp_context; /* SNP guest context page */
};
struct kvm_svm {
--
2.25.1
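For reference, a minimal userspace sketch (not part of the patch) of issuing
this command through KVM_MEMORY_ENCRYPT_OP. It assumes kernel headers with
this series applied; the helper name and the policy value are illustrative:

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Illustrative only: vm_fd is the KVM VM fd, sev_fd an open /dev/sev fd. */
  static int snp_launch_start(int vm_fd, int sev_fd)
  {
          struct kvm_sev_snp_launch_start start = {
                  /*
                   * Bit 16 (SMT allowed) must be set per the check above;
                   * bit 17 is reserved-must-be-one in the firmware ABI.
                   */
                  .policy = (1ULL << 16) | (1ULL << 17),
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_START,
                  .data = (__u64)(unsigned long)&start,
                  .sev_fd = sev_fd,
          };
          int ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);

          if (ret)
                  fprintf(stderr, "SNP_LAUNCH_START: ret %d fw_error %u\n",
                          ret, cmd.error);
          return ret;
  }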
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 11/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (9 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 10/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
[not found] ` <8c3685a6-833c-4b3c-83f4-c0bd78bba36e@redhat.com>
2024-03-29 22:58 ` [PATCH v12 12/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
` (18 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
A key aspect of launching an SNP guest is initializing it with a
known/measured payload which is then encrypted into guest memory as
pre-validated private pages and then measured into the cryptographic
launch context created with KVM_SEV_SNP_LAUNCH_START so that the guest
can attest itself after booting.
Since all private pages are provided by guest_memfd, make use of the
kvm_gmem_populate() interface to handle this. The general flow is that
guest_memfd will handle allocating the pages associated with the GPA
ranges being initialized by each particular call of
KVM_SEV_SNP_LAUNCH_UPDATE, copying data from userspace into those pages,
and then the post_populate callback will do the work of setting the
RMP entries for these pages to private and issuing the SNP firmware
calls to encrypt/measure them.
For more information see the SEV-SNP specification.
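As an illustration (not part of this patch), userspace might drive a single
call roughly as follows, assuming the GPA range was already set private via
KVM_SET_MEMORY_ATTRIBUTES and 'payload' is a page-aligned buffer whose size
is a multiple of 4K; the helper name is hypothetical:

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Illustrative only: load/encrypt/measure one payload starting at gpa. */
  static int snp_launch_update(int vm_fd, int sev_fd, __u64 gpa,
                               void *payload, __u32 len)
  {
          struct kvm_sev_snp_launch_update update = {
                  .gfn_start = gpa >> 12,
                  .uaddr = (__u64)(unsigned long)payload,
                  .len = len,
                  .type = KVM_SEV_SNP_PAGE_TYPE_NORMAL,
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_UPDATE,
                  .data = (__u64)(unsigned long)&update,
                  .sev_fd = sev_fd,
          };
          int ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);

          if (ret)
                  fprintf(stderr, "SNP_LAUNCH_UPDATE: ret %d fw_error %u\n",
                          ret, cmd.error);
          return ret;
  }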
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 39 ++++
arch/x86/include/uapi/asm/kvm.h | 15 ++
arch/x86/kvm/svm/sev.c | 211 ++++++++++++++++++
3 files changed, 265 insertions(+)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index a10b817c162d..4268aa5c380e 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -478,6 +478,45 @@ Returns: 0 on success, -negative on error
See the SEV-SNP spec [snp-fw-abi]_ for further detail on the launch input.
+19. KVM_SEV_SNP_LAUNCH_UPDATE
+-----------------------------
+
+The KVM_SEV_SNP_LAUNCH_UPDATE command is used for loading userspace-provided
+data into a guest GPA range, measuring the contents into the SNP guest context
+created by KVM_SEV_SNP_LAUNCH_START, and then encrypting/validating that GPA
+range so that it will be immediately readable using the encryption key
+associated with the guest context once it is booted, after which point it can
+attest the measurement associated with its context before unlocking any
+secrets.
+
+It is required that the GPA ranges initialized by this command have had the
+KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
+for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 gfn_start; /* Guest page number to load/encrypt data into. */
+ __u64 uaddr; /* Userspace address of data to be loaded/encrypted. */
+ __u32 len; /* 4k-aligned length in bytes to copy into guest memory. */
+ __u8 type; /* The type of the guest pages being initialized. */
+ };
+
+where the allowed values for 'type' are #define'd as::
+
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL
+ KVM_SEV_SNP_PAGE_TYPE_ZERO
+ KVM_SEV_SNP_PAGE_TYPE_UNMEASURED
+ KVM_SEV_SNP_PAGE_TYPE_SECRETS
+ KVM_SEV_SNP_PAGE_TYPE_CPUID
+
+See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
+used/measured.
+
Device attribute API
====================
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 350ddd5264ea..956eb548c08e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -695,6 +695,7 @@ enum sev_cmd_id {
/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START,
+ KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_NR_MAX,
};
@@ -826,6 +827,20 @@ struct kvm_sev_snp_launch_start {
__u8 gosvw[16];
};
+/* Kept in sync with firmware values for simplicity. */
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 gfn_start;
+ __u64 uaddr;
+ __u32 len;
+ __u8 type;
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c7c77e33e62..a8a8a285b4a4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -247,6 +247,35 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}
+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (WARN_ON_ONCE(rc)) {
+ /*
+ * This shouldn't happen under normal circumstances, but if the
+ * reclaim failed, then the page is no longer safe to use.
+ */
+ snp_leak_pages(pfn, 1);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc && leak)
+ snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -2075,6 +2104,185 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}
+struct sev_gmem_populate_args {
+ __u8 type;
+ int sev_fd;
+ int fw_error;
+};
+
+static int sev_gmem_post_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn_start, kvm_pfn_t pfn, void __user *src,
+ int order, void *opaque)
+{
+ struct sev_gmem_populate_args *sev_populate_args = opaque;
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ int npages = (1 << order);
+ int n_private = 0;
+ int ret, i;
+ gfn_t gfn;
+
+ pr_debug("%s: gfn_start %llx pfn_start %llx npages %d\n",
+ __func__, gfn_start, pfn, npages);
+
+ for (gfn = gfn_start, i = 0; gfn < gfn_start + npages; gfn++, i++) {
+ struct sev_data_snp_launch_update fw_args = {0};
+ bool assigned;
+ int level;
+
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_debug("%s: Failed to ensure GFN 0x%llx has private memory attribute set\n",
+ __func__, gfn);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
+ if (ret || assigned) {
+ pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
+ __func__, gfn, ret, assigned);
+ break;
+ }
+
+ ret = rmp_make_private(pfn + i, gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret) {
+ pr_debug("%s: Failed to convert GFN 0x%llx to private, ret: %d\n",
+ __func__, gfn, ret);
+ break;
+ }
+
+ n_private++;
+
+ fw_args.gctx_paddr = __psp_pa(sev->snp_context);
+ fw_args.address = __sme_set(pfn_to_hpa(pfn + i));
+ fw_args.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
+ fw_args.page_type = sev_populate_args->type;
+ ret = __sev_issue_cmd(sev_populate_args->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &fw_args, &sev_populate_args->fw_error);
+ if (ret) {
+ pr_debug("%s: SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+ __func__, ret, sev_populate_args->fw_error);
+
+ if (snp_page_reclaim(pfn + i))
+ break;
+
+ /*
+ * When invalid CPUID function entries are detected,
+ * firmware writes the expected values into the page and
+ * leaves it unencrypted so it can be used for debugging
+ * and error-reporting.
+ *
+ * Copy this page back into the source buffer so
+ * userspace can use this information to provide
+ * information on which CPUID leaves/fields failed CPUID
+ * validation.
+ */
+ if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
+ sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
+ void *vaddr;
+
+ host_rmp_make_shared(pfn + i, PG_LEVEL_4K, true);
+ vaddr = kmap_local_pfn(pfn + i);
+
+ if (copy_to_user(src + i * PAGE_SIZE,
+ vaddr, PAGE_SIZE))
+ pr_debug("Failed to write CPUID page back to userspace\n");
+
+ kunmap_local(vaddr);
+ }
+
+ break;
+ }
+ }
+
+ if (ret) {
+ pr_debug("%s: exiting with error ret %d, undoing %d populated gmem pages.\n",
+ __func__, ret, n_private);
+ for (i = 0; i < n_private; i++)
+ host_rmp_make_shared(pfn + i, PG_LEVEL_4K, true);
+ }
+
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_gmem_populate_args sev_populate_args = {0};
+ struct kvm_gmem_populate_args populate_args = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_memory_slot *memslot;
+ unsigned int npages;
+ int ret = 0;
+
+ if (!sev_snp_guest(kvm) || !sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ if (!IS_ALIGNED(params.len, PAGE_SIZE) ||
+ (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
+ return -EINVAL;
+
+ npages = params.len / PAGE_SIZE;
+
+ pr_debug("%s: GFN range 0x%llx-0x%llx type %d\n", __func__,
+ params.gfn_start, params.gfn_start + npages, params.type);
+
+ /*
+ * For each GFN that's being prepared as part of the initial guest
+ * state, the following pre-conditions are verified:
+ *
+ * 1) The backing memslot is a valid private memslot.
+ * 2) The GFN has been set to private via KVM_SET_MEMORY_ATTRIBUTES
+ * beforehand.
+ * 3) The PFN of the guest_memfd has not already been set to private
+ * in the RMP table.
+ *
+ * The KVM MMU relies on kvm->mmu_invalidate_seq to retry nested page
+ * faults if there's a race between a fault and an attribute update via
+ * KVM_SET_MEMORY_ATTRIBUTES, and a similar approach could be utilized
+ * here. However, kvm->slots_lock guards against both this as well as
+ * concurrent memslot updates occurring while these checks are being
+ * performed, so use that here to make it easier to reason about the
+ * initial expected state and better guard against unexpected
+ * situations.
+ */
+ mutex_lock(&kvm->slots_lock);
+
+ memslot = gfn_to_memslot(kvm, params.gfn_start);
+ if (!kvm_slot_can_be_private(memslot)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ sev_populate_args.sev_fd = argp->sev_fd;
+ sev_populate_args.type = params.type;
+
+ populate_args.opaque = &sev_populate_args;
+ populate_args.gfn = params.gfn_start;
+ populate_args.src = u64_to_user_ptr(params.uaddr);
+ populate_args.npages = npages;
+ populate_args.do_memcpy = params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO;
+ populate_args.post_populate = sev_gmem_post_populate;
+
+ ret = kvm_gmem_populate(kvm, memslot, &populate_args);
+ if (ret) {
+ argp->error = sev_populate_args.fw_error;
+ pr_debug("%s: kvm_gmem_populate failed, ret %d\n", __func__, ret);
+ }
+
+out:
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2165,6 +2373,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 12/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (10 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 11/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
[not found] ` <40382494-7253-442b-91a8-e80c38fb4f2c@redhat.com>
2024-03-29 22:58 ` [PATCH v12 13/29] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
` (17 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh, Harald Hoyer
Add a KVM_SEV_SNP_LAUNCH_FINISH command to finalize the cryptographic
launch digest and store it as the measurement of the guest at launch
time. Also extend the existing SNP firmware data structures to support
enforcing the use of Versioned Loaded Endorsement Keys (VLEK) by guests
as part of this command.
While finalizing the launch flow, it also issues the LAUNCH_UPDATE SNP
firmware commands to encrypt/measure the initial VMSA pages for each
configured vCPU. This involves setting the RMP entries for those pages
to private, so also add handling to clean up the RMP entries for these
pages when freeing vCPUs.
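For illustration (not part of this patch), a minimal userspace invocation
without an ID block might look like the following sketch; the helper name
is hypothetical:

  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Illustrative only: finalize the launch with host_data, no ID block. */
  static int snp_launch_finish(int vm_fd, int sev_fd,
                               const __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE])
  {
          struct kvm_sev_snp_launch_finish finish = {};
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_FINISH,
                  .data = (__u64)(unsigned long)&finish,
                  .sev_fd = sev_fd,
          };
          int ret;

          memcpy(finish.host_data, host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
          ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
          if (ret)
                  fprintf(stderr, "SNP_LAUNCH_FINISH: ret %d fw_error %u\n",
                          ret, cmd.error);
          return ret;
  }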
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: always measure BSP first to get consistent launch measurements]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 26 ++++
arch/x86/include/uapi/asm/kvm.h | 15 ++
arch/x86/kvm/svm/sev.c | 137 ++++++++++++++++++
include/linux/psp-sev.h | 4 +-
4 files changed, 181 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 4268aa5c380e..a49e8cff9133 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -517,6 +517,32 @@ where the allowed values for page_type are #define'd as::
See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
used/measured.
+20. KVM_SEV_SNP_LAUNCH_FINISH
+-----------------------------
+
+After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH
+command can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vlek_required;
+ __u8 host_data[32];
+ __u8 pad[6];
+ };
+
+
+See the SEV-SNP specification [snp-fw-abi]_ for further details on the
+SNP_LAUNCH_FINISH input parameters.
+
Device attribute API
====================
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 956eb548c08e..2b08fcbe039a 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -696,6 +696,7 @@ enum sev_cmd_id {
/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,
KVM_SEV_NR_MAX,
};
@@ -841,6 +842,20 @@ struct kvm_sev_snp_launch_update {
__u8 type;
};
+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vlek_required;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad[6];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a8a8a285b4a4..3d6c030091c2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -63,6 +63,8 @@ static u64 sev_supported_vmsa_features;
#define SNP_POLICY_MASK_SMT BIT_ULL(16)
#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2283,6 +2285,125 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ bool boot_vcpu_handled = false;
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+handle_remaining_vcpus:
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Handle boot vCPU first to ensure consistent measurement of initial state. */
+ if (!boot_vcpu_handled && vcpu->vcpu_id != 0)
+ continue;
+
+ if (boot_vcpu_handled && vcpu->vcpu_id == 0)
+ continue;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+
+ if (!boot_vcpu_handled) {
+ boot_vcpu_handled = true;
+ goto handle_remaining_vcpus;
+ }
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ data->vcek_disabled = params.vlek_required;
+
+ memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2376,6 +2497,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2866,11 +2990,24 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);
+ /*
+ * If it's an SNP guest, then the VMSA was marked in the RMP table as
+ * a guest-owned page. Transition the page to hypervisor state before
+ * releasing it back to the system.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
__free_page(virt_to_page(svm->sev_es.vmsa));
+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3705c2044fc0..903ddfea8585 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -658,6 +658,7 @@ struct sev_data_snp_launch_update {
* @id_auth_paddr: system physical address of ID block authentication structure
* @id_block_en: indicates whether ID block is present
* @auth_key_en: indicates whether author key is present in authentication structure
+ * @vcek_disabled: indicates whether use of VCEK is disallowed for attestation reports
* @rsvd: reserved
* @host_data: host-supplied data for guest, not interpreted by firmware
*/
@@ -667,7 +668,8 @@ struct sev_data_snp_launch_finish {
u64 id_auth_paddr;
u8 id_block_en:1;
u8 auth_key_en:1;
- u64 rsvd:62;
+ u8 vcek_disabled:1;
+ u64 rsvd:61;
u8 host_data[32];
} __packed;
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 13/29] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (11 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 12/29] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 14/29] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
` (16 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU's GHCB GPA. If the hypervisor can work with the guest-requested
GPA, then it must respond back with the same GPA; otherwise it returns
-1.
On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
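To illustrate the guest side of the protocol, here is a sketch along the
lines of the registration helpers in the Linux guest's sev-shared.c; the
function name is hypothetical, and the macros are those defined in
arch/x86/include/asm/sev-common.h:

  /* Sketch: register the per-vCPU GHCB GPA using the MSR protocol. */
  static void snp_register_ghcb(unsigned long ghcb_pa)
  {
          u64 val = GHCB_MSR_REG_GPA_REQ_VAL(ghcb_pa >> PAGE_SHIFT);

          sev_es_wr_ghcb_msr(val);        /* MSR_AMD64_SEV_ES_GHCB */
          VMGEXIT();
          val = sev_es_rd_ghcb_msr();

          /* The hypervisor must echo the same GFN back, else terminate. */
          if (GHCB_MSR_INFO(val) != GHCB_MSR_REG_GPA_RESP ||
              GHCB_MSR_REG_GPA_RESP_VAL(val) != ghcb_pa >> PAGE_SHIFT)
                  sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
  }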
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 42 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5a8246dd532f..1006bfffe07a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3d6c030091c2..b882f72a940a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3474,6 +3474,26 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_PREF_GPA_REQ:
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -3537,6 +3557,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);
sev_es_sync_from_ghcb(svm);
+
+ /* SEV-SNP guest requires that the GHCB GPA be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a3c190642c57..bb04d63012b4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -208,6 +208,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};
struct vcpu_svm {
@@ -361,6 +363,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
#endif
}
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 14/29] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (12 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 13/29] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 15/29] KVM: SEV: Add support to handle " Michael Roth
` (15 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.
When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
backed by private memory or not.
Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.
Define a new KVM_EXIT_VMGEXIT for exits of this type, and structure it
so that it can be extended for other cases where VMGEXITs need some
level of handling in userspace.
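For illustration (not part of this patch), the userspace side might look
roughly like this sketch, where set_memory_attributes() is a hypothetical
wrapper around the KVM_SET_MEMORY_ATTRIBUTES ioctl:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int set_memory_attributes(int vm_fd, __u64 gpa, __u64 size, __u64 attrs)
  {
          struct kvm_memory_attributes attr = {
                  .address = gpa,
                  .size = size,
                  .attributes = attrs,
          };

          return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr);
  }

  /* Called when kvm_run->exit_reason == KVM_EXIT_VMGEXIT, type PSC_MSR. */
  static void handle_psc_msr(int vm_fd, struct kvm_run *run)
  {
          __u64 attrs = 0;

          if (run->vmgexit.psc_msr.op == KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE)
                  attrs = KVM_MEMORY_ATTRIBUTE_PRIVATE;

          /* A non-zero 'ret' reports failure back to the guest. */
          run->vmgexit.psc_msr.ret =
                  !!set_memory_attributes(vm_fd, run->vmgexit.psc_msr.gpa,
                                          4096, attrs);
  }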
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
Documentation/virt/kvm/api.rst | 33 +++++++++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 6 ++++++
arch/x86/kvm/svm/sev.c | 33 +++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 17 ++++++++++++++++
4 files changed, 89 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..4a7a2945bc78 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7060,6 +7060,39 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
+::
+
+ /* KVM_EXIT_VMGEXIT */
+ struct kvm_user_vmgexit {
+ #define KVM_USER_VMGEXIT_PSC_MSR 1
+ __u32 type; /* KVM_USER_VMGEXIT_* type */
+ union {
+ struct {
+ __u64 gpa;
+ #define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE 1
+ #define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED 2
+ __u8 op;
+ __u32 ret;
+ } psc_msr;
+ };
+ };
+
+If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest
+has issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. These are generally handled by the host kernel, but in some
+cases some aspects of handling a VMGEXIT are handled by userspace.
+
+A kvm_user_vmgexit structure is defined to encapsulate the data to be
+sent to or returned by userspace. The type field defines the specific type
+of exit that needs to be serviced, and that type is used as a discriminator
+to determine which union type should be used for input/output.
+
+For the KVM_USER_VMGEXIT_PSC_MSR type, the psc_msr union type is used. The
+kernel will supply the 'gpa' and 'op' fields, and userspace is expected to
+update the private/shared state of the GPA using the corresponding
+KVM_SET_MEMORY_ATTRIBUTES ioctl. The 'ret' field is to be set to 0 by
+userspace on success, or some non-zero value on failure.
6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1006bfffe07a..6d68db812de1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,11 +101,17 @@ enum psc_op {
/* GHCBData[11:0] */ \
GHCB_MSR_PSC_REQ)
+#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
+#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
+
#define GHCB_MSR_PSC_RESP 0x015
#define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
+/* Set highest bit as a generic error response */
+#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b882f72a940a..1464edac2304 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3396,6 +3396,36 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}
+static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 vmm_ret = vcpu->run->vmgexit.psc_msr.ret;
+
+ set_ghcb_msr(svm, (vmm_ret << 32) | GHCB_MSR_PSC_RESP);
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_psc_msr(struct kvm_vcpu *vcpu, u64 ghcb_msr)
+{
+ u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
+ u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ return 1; /* resume guest */
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_PSC_MSR;
+ vcpu->run->vmgexit.psc_msr.gpa = gpa;
+ vcpu->run->vmgexit.psc_msr.op = op;
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
+
+ return 0; /* forward request to userspace */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3494,6 +3524,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ ret = snp_begin_psc_msr(vcpu, control->ghcb_gpa);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..54b81e46a9fa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,20 @@ struct kvm_xen_exit {
} u;
};
+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_PSC_MSR 1
+ __u32 type; /* KVM_USER_VMGEXIT_* type */
+ union {
+ struct {
+ __u64 gpa;
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE 1
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED 2
+ __u8 op;
+ __u32 ret;
+ } psc_msr;
+ };
+};
+
#define KVM_S390_GET_SKEYS_NONE 1
#define KVM_S390_SKEYS_MAX 1048576
@@ -178,6 +192,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_VMGEXIT 40
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -433,6 +448,8 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory_fault;
+ /* KVM_EXIT_VMGEXIT */
+ struct kvm_user_vmgexit vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 15/29] KVM: SEV: Add support to handle Page State Change VMGEXIT
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (13 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 14/29] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 16/29] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
` (14 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.
Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
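For illustration (not part of this patch), userspace would walk the guest's
page-state-change buffer roughly as in the sketch below. The structure
layouts follow the GHCB spec's "Page State Change" definitions,
set_memory_attributes() is the same hypothetical wrapper shown for the
previous patch, and shared_buf is assumed to be the host mapping of
vmgexit.psc.shared_gpa:

  /* Layouts per the GHCB spec "Page State Change" structure. */
  struct psc_hdr {
          __u16 cur_entry;
          __u16 end_entry;
          __u32 reserved;
  };

  struct psc_entry {
          __u64 cur_page   : 12;
          __u64 gfn        : 40;
          __u64 operation  : 4;   /* 1 == private, 2 == shared */
          __u64 pagesize   : 1;   /* 0 == 4K, 1 == 2M */
          __u64 reserved   : 7;
  };

  static __u64 handle_psc(int vm_fd, void *shared_buf)
  {
          struct psc_hdr *hdr = shared_buf;
          struct psc_entry *entries = shared_buf + sizeof(*hdr);
          __u16 i;

          for (i = hdr->cur_entry; i <= hdr->end_entry; i++, hdr->cur_entry++) {
                  struct psc_entry *e = &entries[i];
                  __u64 size = e->pagesize ? 0x200000 : 0x1000;
                  __u64 attrs = (e->operation == 1) ?
                                KVM_MEMORY_ATTRIBUTE_PRIVATE : 0;

                  if (set_memory_attributes(vm_fd, e->gfn << 12, size, attrs))
                          return 1;       /* simplified: any non-zero,
                                             GHCB-defined error value */
          }

          return 0;       /* success, reported back via SW_EXITINFO2 */
  }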
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
Documentation/virt/kvm/api.rst | 14 ++++++++++++++
arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
include/uapi/linux/kvm.h | 5 +++++
3 files changed, 35 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4a7a2945bc78..85099198a10f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7065,6 +7065,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
/* KVM_EXIT_VMGEXIT */
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
+ #define KVM_USER_VMGEXIT_PSC 2
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -7074,9 +7075,14 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
__u8 op;
__u32 ret;
} psc_msr;
+ struct {
+ __u64 shared_gpa;
+ __u64 ret;
+ } psc;
};
};
+
If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest
has issued a VMGEXIT instruction (as documented by the AMD Architecture
Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
@@ -7094,6 +7100,14 @@ update the private/shared state of the GPA using the corresponding
KVM_SET_MEMORY_ATTRIBUTES ioctl. The 'ret' field is to be set to 0 by
userspace on success, or some non-zero value on failure.
+For the KVM_USER_VMGEXIT_PSC type, the psc union type is used. The kernel
+will supply the GPA of the Page State Structure defined in the GHCB spec.
+Userspace will process this structure as defined by the GHCB, and issue
+KVM_SET_MEMORY_ATTRIBUTES ioctls to set the GPAs therein to the expected
+private/shared state. Userspace will return a value in 'ret' that is in
+agreement with the GHCB-defined return values that the guest will expect
+in the SW_EXITINFO2 field of the GHCB in response to these requests.
+
6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1464edac2304..c35ed9d91c89 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3208,6 +3208,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
+ case SVM_VMGEXIT_PSC:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3426,6 +3427,15 @@ static int snp_begin_psc_msr(struct kvm_vcpu *vcpu, u64 ghcb_msr)
return 0; /* forward request to userspace */
}
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, vcpu->run->vmgexit.psc.ret);
+
+ return 1; /* resume guest */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3663,6 +3673,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
+ case SVM_VMGEXIT_PSC:
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_PSC;
+ vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
+ vcpu->arch.complete_userspace_io = snp_complete_psc;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 54b81e46a9fa..e33c48bfbd67 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -137,6 +137,7 @@ struct kvm_xen_exit {
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
+#define KVM_USER_VMGEXIT_PSC 2
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -146,6 +147,10 @@ struct kvm_user_vmgexit {
__u8 op;
__u32 ret;
} psc_msr;
+ struct {
+ __u64 shared_gpa;
+ __u64 ret;
+ } psc;
};
};
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 16/29] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (14 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 15/29] KVM: SEV: Add support to handle " Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 17/29] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
` (13 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and TDP does not match and the 2M RMP entry
must be split into 4K RMP entries. Or a 2M TDP page needs to be broken
into multiple 4K pages.
To keep the RMP and TDP page level in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap should force the TDP to
get rebuilt with the new page level.
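The intended call sequence is roughly the following, a simplified sketch of
the RMP fault handler added later in this series, assuming gfn corresponds
to the start of the 2M range:

  /* Simplified: split the 2M RMP entry backing the 2M-aligned pfn, then
   * zap the matching 2M range of GFNs so that the TDP mapping gets
   * rebuilt at 4K granularity on the next fault.
   */
  pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
  if (!psmash(pfn))
          kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);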
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3f8eba8d8b6..49b294a8d917 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1950,6 +1950,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2c54ba5b0a28..89da37be241a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -253,8 +253,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
}
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0049d49aa913..c5af52e3f0c5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6772,6 +6772,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* [PATCH v12 17/29] KVM: SEV: Add support to handle RMP nested page faults
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (15 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 16/29] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 20:55 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
` (12 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional information on
the access type. See the APM volume 2 for additional information.
When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
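For the userspace-forwarded cases, the VMM's handling might look like this
sketch, with set_memory_attributes() being the hypothetical wrapper from
the earlier PSC example; KVM_MEMORY_EXIT_FLAG_PRIVATE indicates the access
was to memory the guest expects to be private:

  /* Sketch: flip the GPA range's attribute to match the guest's
   * expectation, then re-enter the guest so the access is retried. */
  static int handle_memory_fault(int vm_fd, struct kvm_run *run)
  {
          __u64 attrs = 0;

          if (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE)
                  attrs = KVM_MEMORY_ATTRIBUTE_PRIVATE;

          return set_memory_attributes(vm_fd, run->memory_fault.gpa,
                                       run->memory_fault.size, attrs);
  }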
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/sev.h | 3 ++
arch/x86/kvm/svm/sev.c | 103 +++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 ++++++--
arch/x86/kvm/svm/svm.h | 3 ++
4 files changed, 126 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 780182cda3ab..234a998e2d2d 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -91,6 +91,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMPUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 4
+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE 3
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c35ed9d91c89..a0a88471f9ab 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3397,6 +3397,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}
+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3956,3 +3963,99 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}
+
+void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ int order, rmp_level, ret;
+ bool assigned;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * The only time RMP faults occur for shared pages is when the guest is
+ * triggering an RMP fault for an implicit page-state change from
+ * shared->private. Implicit page-state changes are forwarded to
+ * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+ * for shared pages should not end up here.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault for non-private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret || !assigned) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+ gpa, pfn, ret);
+ goto out;
+ }
+
+ /*
+ * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+ * with PFERR_GUEST_RMP_BIT set:
+ *
+ * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+ * bit set if the guest issues them with a smaller granularity than
+ * what is indicated by the page-size bit in the 2MB RMP entry for
+ * the PFN that backs the GPA.
+ *
+ * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+ * smaller than what is indicated by the 2MB RMP entry for the PFN
+ * that backs the GPA.
+ *
+ * In both these cases, the corresponding 2M RMP entry needs to
+ * be PSMASH'd to 512 4K RMP entries. If the RMP entry is already
+ * split into 4K RMP entries, then this is likely a spurious case which
+ * can occur when there are concurrent accesses by the guest to a 2MB
+ * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+ * the process of being PSMASH'd into 4K entries. These cases should
+ * resolve automatically on subsequent accesses, so just ignore them
+ * here.
+ */
+ if (rmp_level == PG_LEVEL_4K) {
+ pr_debug_ratelimited("%s: Spurious RMP fault for GPA 0x%llx, error_code 0x%llx",
+ __func__, gpa, error_code);
+ goto out;
+ }
+
+ pr_debug_ratelimited("%s: Splitting 2M RMP entry for GPA 0x%llx, error_code 0x%llx",
+ __func__, gpa, error_code);
+ ret = snp_rmptable_psmash(pfn);
+ if (ret && ret != PSMASH_FAIL_INUSE) {
+ /*
+ * Look it up again. If it's 4K now then the PSMASH may have raced with
+ * another process and the issue has already resolved itself.
+ */
+ if (!snp_lookup_rmpentry(pfn, &assigned, &rmp_level) && assigned &&
+ rmp_level == PG_LEVEL_4K) {
+ pr_debug_ratelimited("%s: PSMASH for GPA 0x%llx failed with ret %d due to potential race",
+ __func__, gpa, ret);
+ goto out;
+ }
+ pr_err_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+ }
+
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+ put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2c162f6a1d78..648a05ca53fc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2043,15 +2043,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;
u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ sev_handle_rmp_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}
static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bb04d63012b4..c0675ff2d8a2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -722,6 +722,7 @@ void sev_hardware_unsetup(void);
int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u64 attr, u64 *val);
extern unsigned int max_sev_asid;
+void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -735,6 +736,8 @@ static inline void sev_hardware_unsetup(void) {}
static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0
+static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+
#endif
/* vmenter.S */
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 17/29] KVM: SEV: Add support to handle RMP nested page faults
2024-03-29 22:58 ` [PATCH v12 17/29] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
@ 2024-03-30 20:55 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 20:55 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
On 3/29/24 23:58, Michael Roth wrote:
> + if (rmp_level == PG_LEVEL_4K) {
> + pr_debug_ratelimited("%s: Spurious RMP fault for GPA 0x%llx, error_code 0x%llx",
> + __func__, gpa, error_code);
> + goto out;
> + }
> +
> + pr_debug_ratelimited("%s: Splitting 2M RMP entry for GPA 0x%llx, error_code 0x%llx",
> + __func__, gpa, error_code);
> + ret = snp_rmptable_psmash(pfn);
> + if (ret && ret != PSMASH_FAIL_INUSE) {
> + /*
> + * Look it up again. If it's 4K now then the PSMASH may have raced with
> + * another process and the issue has already resolved itself.
> + */
> + if (!snp_lookup_rmpentry(pfn, &assigned, &rmp_level) && assigned &&
> + rmp_level == PG_LEVEL_4K) {
> + pr_debug_ratelimited("%s: PSMASH for GPA 0x%llx failed with ret %d due to potential race",
> + __func__, gpa, ret);
> + goto out;
> + }
Please change these pr_debug_ratelimited() to just a single trace point
after the call to snp_rmptable_psmash().
Paolo
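For reference, such a tracepoint might look like the following sketch,
assuming a new event in arch/x86/kvm/trace.h; all names are illustrative:

  /* Sketch: emit one event per RMP fault, after snp_rmptable_psmash(). */
  TRACE_EVENT(kvm_rmp_fault,
          TP_PROTO(struct kvm_vcpu *vcpu, u64 gpa, u64 pfn, u64 error_code,
                   int psmash_ret),
          TP_ARGS(vcpu, gpa, pfn, error_code, psmash_ret),
          TP_STRUCT__entry(
                  __field(unsigned int, vcpu_id)
                  __field(u64, gpa)
                  __field(u64, pfn)
                  __field(u64, error_code)
                  __field(int, psmash_ret)
          ),
          TP_fast_assign(
                  __entry->vcpu_id     = vcpu->vcpu_id;
                  __entry->gpa         = gpa;
                  __entry->pfn         = pfn;
                  __entry->error_code  = error_code;
                  __entry->psmash_ret  = psmash_ret;
          ),
          TP_printk("vcpu %u gpa %llx pfn %llx error_code %llx psmash_ret %d",
                    __entry->vcpu_id, __entry->gpa, __entry->pfn,
                    __entry->error_code, __entry->psmash_ret)
  );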
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (16 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 17/29] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 21:01 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 19/29] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
` (11 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
From: Tom Lendacky <thomas.lendacky@amd.com>
In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 3 +--
arch/x86/kvm/svm/svm.c | 9 ++++++++-
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a0a88471f9ab..ce1c727bad23 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3780,8 +3780,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
* the VMSA will be NULL if this vCPU is the destination for intrahost
* migration, and will be copied later.
*/
- if (svm->sev_es.vmsa)
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
/* Can't intercept CR register access, HV can't modify CR registers */
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 648a05ca53fc..e036a8927717 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1451,9 +1451,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
svm_switch_vmcb(svm, &svm->vmcb01);
- if (vmsa_page)
+ if (vmsa_page) {
svm->sev_es.vmsa = page_address(vmsa_page);
+ /*
+ * Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+ }
+
svm->guest_state_loaded = false;
return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c0675ff2d8a2..8cce3315b46c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,6 +199,7 @@ struct vcpu_sev_es_state {
struct ghcb *ghcb;
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
+ hpa_t vmsa_pa;
bool received_first_sipi;
unsigned int ap_reset_hold_type;
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2024-03-29 22:58 ` [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
@ 2024-03-30 21:01 ` Paolo Bonzini
2024-04-16 11:53 ` Paolo Bonzini
0 siblings, 1 reply; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:01 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On 3/29/24 23:58, Michael Roth wrote:
> From: Tom Lendacky<thomas.lendacky@amd.com>
>
> In preparation to support SEV-SNP AP Creation, use a variable that holds
> the VMSA physical address rather than converting the virtual address.
> This will allow SEV-SNP AP Creation to set the new physical address that
> will be used should the vCPU reset path be taken.
>
> Signed-off-by: Tom Lendacky<thomas.lendacky@amd.com>
> Signed-off-by: Ashish Kalra<ashish.kalra@amd.com>
> Signed-off-by: Michael Roth<michael.roth@amd.com>
> ---
I'll get back to this one after Easter, but it looks like Sean had some
objections at https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@google.com/.
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2024-03-30 21:01 ` Paolo Bonzini
@ 2024-04-16 11:53 ` Paolo Bonzini
2024-04-16 14:25 ` Tom Lendacky
2024-04-17 20:57 ` Michael Roth
0 siblings, 2 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-04-16 11:53 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On Sat, Mar 30, 2024 at 10:01 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 3/29/24 23:58, Michael Roth wrote:
> > From: Tom Lendacky<thomas.lendacky@amd.com>
> >
> > In preparation to support SEV-SNP AP Creation, use a variable that holds
> > the VMSA physical address rather than converting the virtual address.
> > This will allow SEV-SNP AP Creation to set the new physical address that
> > will be used should the vCPU reset path be taken.
> >
> > Signed-off-by: Tom Lendacky<thomas.lendacky@amd.com>
> > Signed-off-by: Ashish Kalra<ashish.kalra@amd.com>
> > Signed-off-by: Michael Roth<michael.roth@amd.com>
> > ---
>
> I'll get back to this one after Easter, but it looks like Sean had some
> objections at https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@google.com/.
So IIUC the gist of the solution here would be to replace
/* Use the new VMSA */
svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
with something like
/* Use the new VMSA */
__free_page(virt_to_page(svm->sev_es.vmsa));
svm->sev_es.vmsa = pfn_to_kaddr(pfn);
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
and wrap the __free_page() in sev_free_vcpu() with "if
(!svm->sev_es.snp_ap_create)".
This should remove the need for svm->sev_es.vmsa_pa. It is always
equal to svm->vmcb->control.vmsa_pa anyway.
Also, it's possible to remove
/*
* gmem pages aren't currently migratable, but if this ever
* changes then care should be taken to ensure
* svm->sev_es.vmsa_pa is pinned through some other means.
*/
kvm_release_pfn_clean(pfn);
if sev_free_vcpu() does
if (svm->sev_es.snp_ap_create) {
__free_page(virt_to_page(svm->sev_es.vmsa));
} else {
put_page(virt_to_page(svm->sev_es.vmsa));
}
and while at it, please reverse the polarity of snp_ap_create and
rename it to snp_ap_created.
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2024-04-16 11:53 ` Paolo Bonzini
@ 2024-04-16 14:25 ` Tom Lendacky
2024-04-16 17:00 ` Paolo Bonzini
2024-04-17 20:57 ` Michael Roth
1 sibling, 1 reply; 55+ messages in thread
From: Tom Lendacky @ 2024-04-16 14:25 UTC (permalink / raw)
To: Paolo Bonzini, Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, seanjc, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick
On 4/16/24 06:53, Paolo Bonzini wrote:
> On Sat, Mar 30, 2024 at 10:01 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 3/29/24 23:58, Michael Roth wrote:
>>> From: Tom Lendacky<thomas.lendacky@amd.com>
>>>
>>> In preparation to support SEV-SNP AP Creation, use a variable that holds
>>> the VMSA physical address rather than converting the virtual address.
>>> This will allow SEV-SNP AP Creation to set the new physical address that
>>> will be used should the vCPU reset path be taken.
>>>
>>> Signed-off-by: Tom Lendacky<thomas.lendacky@amd.com>
>>> Signed-off-by: Ashish Kalra<ashish.kalra@amd.com>
>>> Signed-off-by: Michael Roth<michael.roth@amd.com>
>>> ---
>>
>> I'll get back to this one after Easter, but it looks like Sean had some
>> objections at https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@google.com/.
>
Note that AP create is called multiple times per vCPU under OVMF, with
an added call by the kernel when booting the APs.
> So IIUC the gist of the solution here would be to replace
>
> /* Use the new VMSA */
> svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
> svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
>
> with something like
>
> /* Use the new VMSA */
> __free_page(virt_to_page(svm->sev_es.vmsa));
This should only be called for the page that KVM allocated during vCPU
creation. After that, the VMSA page from an AP create is a guest page
and shouldn't be freed by KVM.
> svm->sev_es.vmsa = pfn_to_kaddr(pfn);
> svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
>
> and wrap the __free_page() in sev_free_vcpu() with "if
> (!svm->sev_es.snp_ap_create)".
>
> This should remove the need for svm->sev_es.vmsa_pa. It is always
> equal to svm->vmcb->control.vmsa_pa anyway.
Yeah, a little bit of multiple-VMPL support worked its way in there,
where a VMSA is maintained per VMPL level.
But I believe that Sean wants a separate KVM object per VMPL level, so
that would disappear anyway (Joerg and I want to get on the PUCK
schedule to talk about multi-VMPL level support soon.)
>
> Also, it's possible to remove
>
> /*
> * gmem pages aren't currently migratable, but if this ever
> * changes then care should be taken to ensure
> * svm->sev_es.vmsa_pa is pinned through some other means.
> */
> kvm_release_pfn_clean(pfn);
Removing this here will cause any previous guest VMSA page(s) to remain
pinned; that's the reason for unpinning here. OVMF re-uses the VMSA, but
that isn't a requirement for a firmware, and the kernel will create a
new VMSA page.
>
> if sev_free_vcpu() does
>
> if (svm->sev_es.snp_ap_create) {
> __free_page(virt_to_page(svm->sev_es.vmsa));
> } else {
> put_page(virt_to_page(svm->sev_es.vmsa));
> }
>
> and while at it, please reverse the polarity of snp_ap_create and
> rename it to snp_ap_created.
The snp_ap_create flag gets cleared once the new VMSA is put in place;
it doesn't remain. So the flag usage will have to be altered in order
for this function to work properly.
Thanks,
Tom
>
> Paolo
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2024-04-16 14:25 ` Tom Lendacky
@ 2024-04-16 17:00 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-04-16 17:00 UTC (permalink / raw)
To: Tom Lendacky
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On Tue, Apr 16, 2024 at 4:25 PM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/16/24 06:53, Paolo Bonzini wrote:
> > On Sat, Mar 30, 2024 at 10:01 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 3/29/24 23:58, Michael Roth wrote:
> >>> From: Tom Lendacky<thomas.lendacky@amd.com>
> >>>
> >>> In preparation to support SEV-SNP AP Creation, use a variable that holds
> >>> the VMSA physical address rather than converting the virtual address.
> >>> This will allow SEV-SNP AP Creation to set the new physical address that
> >>> will be used should the vCPU reset path be taken.
> >>>
> >>> Signed-off-by: Tom Lendacky<thomas.lendacky@amd.com>
> >>> Signed-off-by: Ashish Kalra<ashish.kalra@amd.com>
> >>> Signed-off-by: Michael Roth<michael.roth@amd.com>
> >>> ---
> >>
> >> I'll get back to this one after Easter, but it looks like Sean had some
> >> objections at https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@google.com/.
> >
>
> Note that AP create is called multiple times per vCPU under OVMF, with
> an added call by the kernel when booting the APs.
Oooh, I somehow thought that
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_create = true;
was in svm_create_vcpu().
So there should be separate "snp_ap_waiting_for_reset" and
"snp_has_guest_vmsa" flags. The latter is set once in
__sev_snp_update_protected_guest_state and is what governs whether the
VMSA page was allocated or just refcounted.
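Roughly (a sketch only, using the flag names above; the helper name is
made up):

  struct vcpu_sev_es_state {
          /* ... */
          gpa_t snp_vmsa_gpa;
          bool snp_ap_waiting_for_reset;  /* set on AP CREATE/DESTROY, cleared on reset */
          bool snp_has_guest_vmsa;        /* set once the VMSA is a gmem page */
  };

  static void sev_free_vmsa_page(struct vcpu_svm *svm)
  {
          if (svm->sev_es.snp_has_guest_vmsa)
                  /* VMSA is a refcounted gmem page, not a KVM allocation. */
                  put_page(virt_to_page(svm->sev_es.vmsa));
          else
                  __free_page(virt_to_page(svm->sev_es.vmsa));
  }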
> But I believe that Sean wants a separate KVM object per VMPL level, so
> that would disappear anyway (Joerg and I want to get on the PUCK
> schedule to talk about multi-VMPL level support soon.)
Yes, agreed on both counts.
> > /*
> > * gmem pages aren't currently migratable, but if this ever
> > * changes then care should be taken to ensure
> > * svm->sev_es.vmsa_pa is pinned through some other means.
> > */
> > kvm_release_pfn_clean(pfn);
>
> Removing this here will cause any previous guest VMSA page(s) to remain
> pinned; that's the reason for unpinning here. OVMF re-uses the VMSA, but
> that isn't a requirement for a firmware, and the kernel will create a
> new VMSA page.
Yes, and once you understand that I was thinking of a set-once flag
"snp_has_guest_vmsa" it should all make a lot more sense.
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2024-04-16 11:53 ` Paolo Bonzini
2024-04-16 14:25 ` Tom Lendacky
@ 2024-04-17 20:57 ` Michael Roth
1 sibling, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-04-17 20:57 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On Tue, Apr 16, 2024 at 01:53:24PM +0200, Paolo Bonzini wrote:
> On Sat, Mar 30, 2024 at 10:01 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 3/29/24 23:58, Michael Roth wrote:
> > > From: Tom Lendacky<thomas.lendacky@amd.com>
> > >
> > > In preparation to support SEV-SNP AP Creation, use a variable that holds
> > > the VMSA physical address rather than converting the virtual address.
> > > This will allow SEV-SNP AP Creation to set the new physical address that
> > > will be used should the vCPU reset path be taken.
> > >
> > > Signed-off-by: Tom Lendacky<thomas.lendacky@amd.com>
> > > Signed-off-by: Ashish Kalra<ashish.kalra@amd.com>
> > > Signed-off-by: Michael Roth<michael.roth@amd.com>
> > > ---
> >
> > I'll get back to this one after Easter, but it looks like Sean had some
> > objections at https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@google.com/.
>
> So IIUC the gist of the solution here would be to replace
>
> /* Use the new VMSA */
> svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
> svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
>
> with something like
>
> /* Use the new VMSA */
> __free_page(virt_to_page(svm->sev_es.vmsa));
One downside to freeing the VMSA at this point is that there are a number
of additional cleanup routines, like wbinvd_on_all_cpus() and the handling
in sev_free_vcpu(), which will need to be run before we are able to safely
free the page back to the system.
It would be simple to wrap all that up in an sev_free_vmsa() helper and
also call it here rather than defer it, but from a performance
perspective it would be nice to defer it to the shutdown path.
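Something along these lines (sketch only; the exact flushing requirements
are glossed over, and wbinvd_on_all_cpus() here just mirrors what the
existing SEV-ES teardown does before freeing a VMSA):

  static void sev_free_vmsa(struct vcpu_svm *svm)
  {
          if (!svm->sev_es.vmsa)
                  return;

          /* Ensure no stale encrypted cache lines before freeing the page. */
          wbinvd_on_all_cpus();
          __free_page(virt_to_page(svm->sev_es.vmsa));
          svm->sev_es.vmsa = NULL;
  }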
> svm->sev_es.vmsa = pfn_to_kaddr(pfn);
> svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
It turns out sev_es_init_vmcb() always ends up setting control.vmsa_pa
again using the new vmsa stored in sev_es.vmsa before the AP re-enters the
guest:
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
If we modify that code to instead do:
if (!svm->sev_es.snp_has_guest_vmsa)
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
Then it will instead continue to use the control.vmsa_pa set here in
__sev_snp_update_protected_guest_state(), in which case svm->sev_es.vmsa
will only ever be used to store the initial VMSA that was allocated by KVM.
Given that...
>
> and wrap the __free_page() in sev_free_vcpu() with "if
> (!svm->sev_es.snp_ap_create)".
If we take the deferred approach above, then no checks are needed here
and the KVM-allocated VMSA is cleaned up the same way it is handled for
SEV-ES. SNP never needs to piggy-back off of sev_es.vmsa to pass around
VMSAs that reside in guest memory.
I can still rework things to free the KVM-allocated VMSA immediately here
if you prefer, but for now I have things implemented as above to keep
SEV-ES/SNP handling similar and avoid a performance penalty during guest
boot. I've pushed the revised AP creation patch here for reference:
https://github.com/mdroth/linux/commit/5a7e76231a7629ba62f8b0bba8039d93d3595ecb
Thanks for the suggestions, this all looks a good bit cleaner either way.
-Mike
>
> This should remove the need for svm->sev_es.vmsa_pa. It is always
> equal to svm->vmcb->control.vmsa_pa anyway.
>
> Also, it's possible to remove
>
> /*
> * gmem pages aren't currently migratable, but if this ever
> * changes then care should be taken to ensure
> * svm->sev_es.vmsa_pa is pinned through some other means.
> */
> kvm_release_pfn_clean(pfn);
>
> if sev_free_vcpu() does
>
> if (svm->sev_es.snp_ap_create) {
> __free_page(virt_to_page(svm->sev_es.vmsa));
> } else {
> put_page(virt_to_page(svm->sev_es.vmsa));
> }
>
> and while at it, please reverse the polarity of snp_ap_create and
> rename it to snp_ap_created.
>
> Paolo
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 19/29] KVM: SEV: Support SEV-SNP AP Creation NAE event
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (17 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 18/29] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 20/29] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
` (10 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Tom Lendacky <thomas.lendacky@amd.com>
Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own. This allows
the guest a way of simulating INIT-SIPI.
A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.
For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.
For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.
For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to vCPU and then the vCPU is kicked.
The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. If a new VMSA is to
be installed, the VMSA guest page is set as the VMSA in the vCPU VMCB
and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not
to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state
is set to KVM_MP_STATE_HALTED to prevent it from being run.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add handling for gmem, move MP_STATE_UNINITIALIZED -> RUNNABLE
transition to target vCPU side rather than setting vcpu->arch.mp_state
remotely]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/kvm/svm/sev.c | 217 +++++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 11 +-
arch/x86/kvm/svm/svm.h | 8 ++
arch/x86/kvm/x86.c | 11 ++
6 files changed, 252 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49b294a8d917..0fdacacd6e8e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 544a43c1cf11..f0dea3750ca9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,8 +286,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)
+
struct vmcb_seg {
u16 selector;
u16 attrib;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ce1c727bad23..7dfbf12b454b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -37,7 +37,7 @@
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL
-#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
/* enable/disable SEV support */
static bool sev_enabled = true;
@@ -3203,6 +3203,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+ if (!kvm_ghcb_rax_is_valid(svm))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3443,6 +3448,195 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
return 1; /* resume guest */
}
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+ /* Clear use of the VMSA */
+ svm->sev_es.vmsa_pa = INVALID_PAGE;
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ return -EINVAL;
+
+ /*
+ * The new VMSA will be private guest memory, so
+ * retrieve the PFN from the gmem backend.
+ */
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ return -EINVAL;
+
+ /* Use the new VMSA */
+ svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+
+ /*
+ * gmem pages aren't currently migratable, but if this ever
+ * changes then care should be taken to ensure
+ * svm->sev_es.vmsa_pa is pinned through some other means.
+ */
+ kvm_release_pfn_clean(pfn);
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_create)
+ goto unlock;
+
+ svm->sev_es.snp_ap_create = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in a non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_create = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->vmsa_features;
+
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * will hit the SNP erratum where the CPU will incorrectly signal an
+ * RMP violation #PF if a hugepage collides with the RMP entry of the
+ * VMSA page. Reject the AP CREATE request if the VMSA address from
+ * the guest is 2M-aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ kvm_make_request(KVM_REQ_UNBLOCK, target_vcpu);
+
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3686,6 +3880,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
vcpu->arch.complete_userspace_io = snp_complete_psc;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3852,6 +4055,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}
void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
@@ -3963,6 +4168,16 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu) &&
+ vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+}
+
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
{
struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e036a8927717..a895d3f07cb8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;
+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);
if (!init_event)
@@ -4937,6 +4940,12 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
return page_address(page);
}
+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ sev_vcpu_unblocking(vcpu);
+ avic_vcpu_unblocking(vcpu);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,
@@ -4959,7 +4968,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
.vcpu_blocking = avic_vcpu_blocking,
- .vcpu_unblocking = avic_vcpu_unblocking,
+ .vcpu_unblocking = svm_vcpu_unblocking,
.update_exception_bitmap = svm_update_exception_bitmap,
.get_msr_feature = svm_get_msr_feature,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8cce3315b46c..0cdcd0759fe0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -211,6 +211,10 @@ struct vcpu_sev_es_state {
bool ghcb_sa_free;
u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_create;
};
struct vcpu_svm {
@@ -724,6 +728,8 @@ int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u64 attr, u64 *val);
extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -738,6 +744,8 @@ static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f85735b6235d..617c38656757 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10943,6 +10943,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+ r = 1;
+ goto out;
+ }
+ }
}
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -13150,6 +13158,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_PMI, vcpu))
return true;
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 20/29] KVM: SEV: Add support for GHCB-based termination requests
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (18 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 19/29] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-29 22:58 ` [PATCH v12 21/29] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
` (9 subsequent siblings)
29 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
GHCB version 2 adds support for a GHCB-based termination request that
a guest can issue when it reaches an error state and wishes to inform
the hypervisor that it should be terminated. Implement support for that
similarly to GHCB MSR-based termination requests that are already
available to SEV-ES guests via earlier versions of the GHCB protocol.
See 'Termination Request' in the 'Invoking VMGEXIT' section of the GHCB
specification for more details.
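For reference, a VMM consuming this exit might handle it along these
lines (a sketch only, not part of this patch):

  #include <stdio.h>
  #include <stdlib.h>
  #include <linux/kvm.h>

  static void handle_exit(struct kvm_run *run)
  {
          if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
              run->system_event.type == KVM_SYSTEM_EVENT_SEV_TERM &&
              run->system_event.ndata >= 1) {
                  /* data[0] carries the GHCB GPA, per the KVM side below. */
                  fprintf(stderr, "guest termination request, GHCB GPA 0x%llx\n",
                          (unsigned long long)run->system_event.data[0]);
                  exit(EXIT_FAILURE);
          }
  }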
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7dfbf12b454b..9ea13c2de668 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3214,6 +3214,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_TERM_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3889,6 +3890,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
+ case SVM_VMGEXIT_TERM_REQUEST:
+ pr_info("SEV-ES guest requested termination: reason %#llx info %#llx\n",
+ control->exit_info_1, control->exit_info_2);
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 21/29] KVM: SEV: Implement gmem hook for initializing private pages
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (19 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 20/29] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 21:05 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 22/29] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
` (8 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
This will handle the RMP table updates needed to put a page into a
private state before mapping it into an SEV-SNP guest.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 5 +++
arch/x86/kvm/x86.c | 5 +++
virt/kvm/guest_memfd.c | 4 +-
6 files changed, 113 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d0bb0e7a4e80..286b40d0b07c 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -124,6 +124,7 @@ config KVM_AMD_SEV
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
+ select HAVE_KVM_GMEM_PREPARE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9ea13c2de668..e1f8be1df219 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4282,3 +4282,101 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
put_page(pfn_to_page(pfn));
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn = start;
+
+ while (pfn < end) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+ pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+
+ pfn++;
+ }
+
+ return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+ kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+ /*
+ * If this is a large folio, and the entire 2M range containing the
+ * PFN is currently shared, then the entire 2M-aligned range can be
+ * set to private via a single 2M RMP entry.
+ */
+ if (max_level_for_order(order) > PG_LEVEL_4K &&
+ is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+ return true;
+
+ return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_aligned;
+ gfn_t gfn_aligned;
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
+ gfn, pfn, rc);
+ return -ENOENT;
+ }
+
+ if (assigned) {
+ pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+ __func__, gfn, pfn, max_order, level);
+ return 0;
+ }
+
+ if (is_large_rmp_possible(kvm, pfn, max_order)) {
+ level = PG_LEVEL_2M;
+ pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+ gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+ } else {
+ level = PG_LEVEL_4K;
+ pfn_aligned = pfn;
+ gfn_aligned = gfn;
+ }
+
+ rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+ __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a895d3f07cb8..c099154e326a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5078,6 +5078,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0cdcd0759fe0..53618cfc2b89 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -730,6 +730,7 @@ extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -746,6 +747,10 @@ static inline int sev_dev_get_attr(u64 attr, u64 *val) { return -ENXIO; }
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
+static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ return 0;
+}
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 617c38656757..d05922684005 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13615,6 +13615,11 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_SNP_VM;
+}
+
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
{
return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 3e3c4b7fff3b..11952254ae48 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -46,8 +46,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
gfn = slot->base_gfn + index - slot->gmem.pgoff;
rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
if (rc) {
- pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
- index, rc);
+ pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+ index, gfn, pfn, rc);
return rc;
}
}
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 21/29] KVM: SEV: Implement gmem hook for initializing private pages
2024-03-29 22:58 ` [PATCH v12 21/29] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2024-03-30 21:05 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:05 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On 3/29/24 23:58, Michael Roth wrote:
> This will handle the RMP table updates needed to put a page into a
> private state before mapping it into an SEV-SNP guest.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/kvm/Kconfig | 1 +
> arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 2 +
> arch/x86/kvm/svm/svm.h | 5 +++
> arch/x86/kvm/x86.c | 5 +++
> virt/kvm/guest_memfd.c | 4 +-
> 6 files changed, 113 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index d0bb0e7a4e80..286b40d0b07c 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -124,6 +124,7 @@ config KVM_AMD_SEV
> depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
> select ARCH_HAS_CC_PLATFORM
> select KVM_GENERIC_PRIVATE_MEM
> + select HAVE_KVM_GMEM_PREPARE
> help
> Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
> with Encrypted State (SEV-ES) on AMD processors.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 9ea13c2de668..e1f8be1df219 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4282,3 +4282,101 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> out:
> put_page(pfn_to_page(pfn));
> }
> +
> +static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
> +{
> + kvm_pfn_t pfn = start;
> +
> + while (pfn < end) {
> + int ret, rmp_level;
> + bool assigned;
> +
> + ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> + if (ret) {
> + pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
> + pfn, start, end, rmp_level, ret);
> + return false;
> + }
> +
> + if (assigned) {
> + pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
> + __func__, pfn, start, end, rmp_level);
> + return false;
> + }
> +
> + pfn++;
> + }
> +
> + return true;
> +}
> +
> +static u8 max_level_for_order(int order)
> +{
> + if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> + return PG_LEVEL_2M;
> +
> + return PG_LEVEL_4K;
> +}
> +
> +static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
> +{
> + kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
> +
> + /*
> + * If this is a large folio, and the entire 2M range containing the
> + * PFN is currently shared, then the entire 2M-aligned range can be
> + * set to private via a single 2M RMP entry.
> + */
> + if (max_level_for_order(order) > PG_LEVEL_4K &&
> + is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
> + return true;
> +
> + return false;
> +}
> +
> +int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + kvm_pfn_t pfn_aligned;
> + gfn_t gfn_aligned;
> + int level, rc;
> + bool assigned;
> +
> + if (!sev_snp_guest(kvm))
> + return 0;
> +
> + rc = snp_lookup_rmpentry(pfn, &assigned, &level);
> + if (rc) {
> + pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
> + gfn, pfn, rc);
> + return -ENOENT;
> + }
> +
> + if (assigned) {
> + pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
> + __func__, gfn, pfn, max_order, level);
> + return 0;
> + }
> +
> + if (is_large_rmp_possible(kvm, pfn, max_order)) {
> + level = PG_LEVEL_2M;
> + pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
> + gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
> + } else {
> + level = PG_LEVEL_4K;
> + pfn_aligned = pfn;
> + gfn_aligned = gfn;
> + }
> +
> + rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
> + if (rc) {
> + pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
> + gfn, pfn, level, rc);
> + return -EINVAL;
> + }
> +
> + pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
> + __func__, gfn, pfn, pfn_aligned, max_order, level);
> +
> + return 0;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a895d3f07cb8..c099154e326a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5078,6 +5078,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
> .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> +
> + .gmem_prepare = sev_gmem_prepare,
> };
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0cdcd0759fe0..53618cfc2b89 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -730,6 +730,7 @@ extern unsigned int max_sev_asid;
> void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
> void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
> void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
> +int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> #else
> static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
> return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> @@ -746,6 +747,10 @@ static inline int sev_dev_get_attr(u64 attr, u64 *val) { return -ENXIO; }
> static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
> static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
> +static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
> +{
> + return 0;
> +}
>
> #endif
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 617c38656757..d05922684005 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13615,6 +13615,11 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
> EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
>
> #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
> +{
> + return kvm->arch.vm_type == KVM_X86_SNP_VM;
> +}
> +
> int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
> {
> return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 3e3c4b7fff3b..11952254ae48 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -46,8 +46,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
> gfn = slot->base_gfn + index - slot->gmem.pgoff;
> rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
> if (rc) {
> - pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
> - index, rc);
> + pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
> + index, gfn, pfn, rc);
> return rc;
> }
> }
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 22/29] KVM: SEV: Implement gmem hook for invalidating private pages
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (20 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 21/29] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 21:31 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 23/29] KVM: x86: Implement gmem hook for determining max NPT mapping level Michael Roth
` (7 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 63 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 ++
4 files changed, 67 insertions(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 286b40d0b07c..32a5c37cbf88 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -125,6 +125,7 @@ config KVM_AMD_SEV
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
select HAVE_KVM_GMEM_PREPARE
+ select HAVE_KVM_GMEM_INVALIDATE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e1f8be1df219..87d621d013a4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4380,3 +4380,66 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
return 0;
}
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+ for (pfn = start; pfn < end;) {
+ bool use_2m_update = false;
+ int rc, rmp_level;
+ bool assigned;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (rc) {
+ pr_debug_ratelimited("SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ goto next_pfn;
+ }
+
+ if (!assigned)
+ goto next_pfn;
+
+ use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+ end >= (pfn + PTRS_PER_PMD) &&
+ rmp_level > PG_LEVEL_4K;
+
+ /*
+ * If an unaligned PFN corresponds to a 2M region assigned as a
+ * large page in the RMP table, PSMASH the region into individual
+ * 4K RMP entries before attempting to convert a 4K sub-page.
+ */
+ if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+ rc = snp_rmptable_psmash(pfn);
+ if (rc)
+ pr_err_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+
+ rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+ if (WARN_ON_ONCE(rc)) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ goto next_pfn;
+ }
+
+ /*
+ * SEV-ES avoids host/guest cache coherency issues through
+ * WBINVD hooks issued via MMU notifiers during run-time, and
+ * KVM's VM destroy path at shutdown. Those MMU notifier events
+ * don't cover gmem since there is no requirement to map pages
+ * to a HVA in order to use them for a running guest. While the
+ * shutdown path would still likely cover things for SNP guests,
+ * userspace may also free gmem pages during run-time via
+ * hole-punching operations on the guest_memfd, so flush the
+ * cache entries for these pages before free'ing them back to
+ * the host.
+ */
+ clflush_cache_range(__va(pfn_to_hpa(pfn)),
+ use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+ pfn += use_2m_update ? PTRS_PER_PMD : 1;
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c099154e326a..b456906f2670 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5080,6 +5080,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
.gmem_prepare = sev_gmem_prepare,
+ .gmem_invalidate = sev_gmem_invalidate,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 53618cfc2b89..3f1f6d3d3ade 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -731,6 +731,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -751,6 +752,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
{
return 0;
}
+static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
#endif
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 22/29] KVM: SEV: Implement gmem hook for invalidating private pages
2024-03-29 22:58 ` [PATCH v12 22/29] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
@ 2024-03-30 21:31 ` Paolo Bonzini
2024-04-18 19:57 ` Michael Roth
0 siblings, 1 reply; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:31 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On 3/29/24 23:58, Michael Roth wrote:
> + /*
> + * If an unaligned PFN corresponds to a 2M region assigned as a
> + * large page in the RMP table, PSMASH the region into individual
> + * 4K RMP entries before attempting to convert a 4K sub-page.
> + */
> + if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
> + rc = snp_rmptable_psmash(pfn);
> + if (rc)
> + pr_err_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
> + pfn, rc);
> + }
Ignoring the PSMASH failure is pretty scary... At this point
.free_folio cannot fail; should the psmash part of this patch be done in
kvm_gmem_invalidate_begin() before kvm_mmu_unmap_gfn_range()?
Also, can you get PSMASH_FAIL_INUSE and if so what's the best way to
address it? Should fallocate() return -EBUSY?
Thanks,
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 22/29] KVM: SEV: Implement gmem hook for invalidating private pages
2024-03-30 21:31 ` Paolo Bonzini
@ 2024-04-18 19:57 ` Michael Roth
0 siblings, 0 replies; 55+ messages in thread
From: Michael Roth @ 2024-04-18 19:57 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On Sat, Mar 30, 2024 at 10:31:47PM +0100, Paolo Bonzini wrote:
> On 3/29/24 23:58, Michael Roth wrote:
> > + /*
> > + * If an unaligned PFN corresponds to a 2M region assigned as a
> > + * large page in he RMP table, PSMASH the region into individual
> > + * large page in the RMP table, PSMASH the region into individual
> > + */
> > + if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
> > + rc = snp_rmptable_psmash(pfn);
> > + if (rc)
> > + pr_err_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
> > + pfn, rc);
> > + }
>
> Ignoring the PSMASH failure is pretty scary... At this point .free_folio
> cannot fail, should the psmash part of this patch be done in
> kvm_gmem_invalidate_begin() before kvm_mmu_unmap_gfn_range()?
>
> Also, can you get PSMASH_FAIL_INUSE and if so what's the best way to address
> it? Should fallocate() return -EBUSY?
FAIL_INUSE shouldn't occur here, since the pages have already been unmapped
from NPT and only the task doing the cleanup should be attempting to
access/PSMASH this particular 2M HPA range.
However, since FAIL_INUSE is transient, there's no good reason not to
retry until it clears up rather than risk hosing the system if some
unexpected case ever did pop up, so I've updated snp_rmptable_psmash()
to handle that case automatically, which also simplifies the handling in
sev_handle_rmp_fault(). (In the case of #NPF RMP faults there is actual
potential for PSMASH errors other than FAIL_INUSE, due to races with
other vCPU threads that can interleave and put the RMP entry in an
unexpected state, so there's additional handling/reporting to deal with
those cases; here they are not expected and will trigger WARN_*ONCE()'s
now.)
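For reference, the retry handling now looks roughly like the following
(just a sketch of what's queued for the next spin; psmash() and
PSMASH_FAIL_INUSE come from the SNP host-setup patches, and the final
form may still change):

static int snp_rmptable_psmash(kvm_pfn_t pfn)
{
	int ret;

	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);

	/*
	 * PSMASH_FAIL_INUSE indicates another processor is modifying the
	 * entry, so simply retry until that's no longer the case.
	 */
	do {
		ret = psmash(pfn);
	} while (ret == PSMASH_FAIL_INUSE);

	return ret;
}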
I used this hacked up version of Sean's original patch to re-enable 2MB
hugepage support in gmem for the purposes of re-testing this:
https://github.com/mdroth/linux/commit/15aa4f81811485997953130fc184e829ba4399d2
-Mike
>
> Thanks,
>
> Paolo
>
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 23/29] KVM: x86: Implement gmem hook for determining max NPT mapping level
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (21 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 22/29] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 21:35 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 24/29] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
` (6 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:
- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest later converts that shared page back to private
At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again, but this is actually not allowed
because the RMP table mappings are 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must still be limited to 4K to
match this.
Implement a kvm_x86_ops.gmem_validate_fault() hook for SEV that checks
for this condition and adjusts the mapping level accordingly.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 40 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 87d621d013a4..31f6f4786503 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4443,3 +4443,35 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
pfn += use_2m_update ? PTRS_PER_PMD : 1;
}
}
+
+/*
+ * Re-check whether an #NPF for a private/gmem page can still be serviced, and
+ * adjust maximum mapping level if needed.
+ */
+int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+ u8 *max_level)
+{
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -ENOENT;
+ }
+
+ if (!assigned) {
+ pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
+ gfn, pfn, level);
+ return -EINVAL;
+ }
+
+ if (level < *max_level)
+ *max_level = level;
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b456906f2670..298b4ce77a5f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5081,6 +5081,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.gmem_prepare = sev_gmem_prepare,
.gmem_invalidate = sev_gmem_invalidate,
+ .gmem_validate_fault = sev_gmem_validate_fault,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3f1f6d3d3ade..746f819a6de4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -732,6 +732,8 @@ void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+ u8 *max_level);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -753,6 +755,11 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
return 0;
}
static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
+static inline int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn,
+ bool is_private, u8 *max_level)
+{
+ return 0;
+}
#endif
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 23/29] KVM: x86: Implement gmem hook for determining max NPT mapping level
2024-03-29 22:58 ` [PATCH v12 23/29] KVM: x86: Implement gmem hook for determining max NPT mapping level Michael Roth
@ 2024-03-30 21:35 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:35 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On 3/29/24 23:58, Michael Roth wrote:
> In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
> 2MB mapping in the guest's nested page table depends on whether or not
> any subpages within the range have already been initialized as private
> in the RMP table. The existing mixed-attribute tracking in KVM is
> insufficient here, for instance:
>
> - gmem allocates 2MB page
> - guest issues PVALIDATE on 2MB page
> - guest later converts a subpage to shared
> - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
> - KVM MMU splits NPT mapping to 4K
> - guest later converts that shared page back to private
>
> At this point there are no mixed attributes, and KVM would normally
> allow for 2MB NPT mappings again, but this is actually not allowed
> because the RMP table mappings are 4K and cannot be promoted on the
> hypervisor side, so the NPT mappings must still be limited to 4K to
> match this.
>
> Implement a kvm_x86_ops.gmem_validate_fault() hook for SEV that checks
> for this condition and adjusts the mapping level accordingly.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/svm/svm.h | 7 +++++++
> 3 files changed, 40 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 87d621d013a4..31f6f4786503 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4443,3 +4443,35 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> pfn += use_2m_update ? PTRS_PER_PMD : 1;
> }
> }
> +
> +/*
> + * Re-check whether an #NPF for a private/gmem page can still be serviced, and
> + * adjust maximum mapping level if needed.
> + */
> +int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
> + u8 *max_level)
> +{
> + int level, rc;
> + bool assigned;
> +
> + if (!sev_snp_guest(kvm))
> + return 0;
> +
> + rc = snp_lookup_rmpentry(pfn, &assigned, &level);
> + if (rc) {
> + pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
> + gfn, pfn, level, rc);
> + return -ENOENT;
> + }
> +
> + if (!assigned) {
> + pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
> + gfn, pfn, level);
> + return -EINVAL;
> + }
> +
> + if (level < *max_level)
> + *max_level = level;
> +
> + return 0;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b456906f2670..298b4ce77a5f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5081,6 +5081,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .gmem_prepare = sev_gmem_prepare,
> .gmem_invalidate = sev_gmem_invalidate,
> + .gmem_validate_fault = sev_gmem_validate_fault,
> };
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 3f1f6d3d3ade..746f819a6de4 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -732,6 +732,8 @@ void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
> void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
> int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
> +int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
> + u8 *max_level);
> #else
> static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
> return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> @@ -753,6 +755,11 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
> return 0;
> }
> static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
> +static inline int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn,
> + bool is_private, u8 *max_level)
> +{
> + return 0;
> +}
>
> #endif
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 24/29] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (22 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 23/29] KVM: x86: Implement gmem hook for determining max NPT mapping level Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 21:35 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 25/29] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
` (5 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
From: Ashish Kalra <ashish.kalra@amd.com>
With SNP/guest_memfd, private/encrypted memory should not be mappable,
and MMU notifications for HVA-mapped memory will only be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply to
SNP guests, so the flush can be skipped.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 31f6f4786503..3e8de7cb3c89 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2975,7 +2975,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
void sev_guest_memory_reclaimed(struct kvm *kvm)
{
- if (!sev_guest(kvm))
+ /*
+ * With SNP+gmem, private/encrypted memory should be
+ * unreachable via the hva-based mmu notifiers. Additionally,
+ * for shared->private translations, H/W coherency will ensure
+ * first guest access to the page would clear out any existing
+ * dirty copies of that cacheline.
+ */
+ if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;
wbinvd_on_all_cpus();
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 24/29] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
2024-03-29 22:58 ` [PATCH v12 24/29] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
@ 2024-03-30 21:35 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:35 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On 3/29/24 23:58, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> With SNP/guest_memfd, private/encrypted memory should not be mappable,
> and MMU notifications for HVA-mapped memory will only be relevant to
> unencrypted guest memory. Therefore, the rationale behind issuing a
> wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply to
> SNP guests, so the flush can be skipped.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: Add some clarifications in commit]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/svm/sev.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 31f6f4786503..3e8de7cb3c89 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2975,7 +2975,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
>
> void sev_guest_memory_reclaimed(struct kvm *kvm)
> {
> - if (!sev_guest(kvm))
> + /*
> + * With SNP+gmem, private/encrypted memory should be
> + * unreachable via the hva-based mmu notifiers. Additionally,
> + * for shared->private translations, H/W coherency will ensure
> + * first guest access to the page would clear out any existing
> + * dirty copies of that cacheline.
> + */
> + if (!sev_guest(kvm) || sev_snp_guest(kvm))
> return;
>
> wbinvd_on_all_cpus();
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 25/29] KVM: SVM: Add module parameter to enable the SEV-SNP
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (23 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 24/29] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-03-30 21:35 ` Paolo Bonzini
2024-03-29 22:58 ` [PATCH v12 26/29] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
` (4 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
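For reference, with the parameter defaulting to enabled, SNP support can
still be disabled late without touching SEV/SEV-ES, e.g. (illustrative
usage, assuming the kvm_amd module):

  # disable SEV-SNP at module load time
  modprobe kvm_amd sev_snp=0

  # or equivalently on the kernel command line
  kvm_amd.sev_snp=0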
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/kvm/svm/sev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3e8de7cb3c89..658116537f3f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -48,7 +48,8 @@ static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 25/29] KVM: SVM: Add module parameter to enable the SEV-SNP
2024-03-29 22:58 ` [PATCH v12 25/29] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2024-03-30 21:35 ` Paolo Bonzini
0 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:35 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh
On 3/29/24 23:58, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Add a module parameter that can be used to enable or disable the SEV-SNP
> feature. Now that KVM contains support for SNP, set the GHCB hypervisor
> feature flag to indicate that SNP is supported.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/svm/sev.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 3e8de7cb3c89..658116537f3f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -48,7 +48,8 @@ static bool sev_es_enabled = true;
> module_param_named(sev_es, sev_es_enabled, bool, 0444);
>
> /* enable/disable SEV-SNP support */
> -static bool sev_snp_enabled;
> +static bool sev_snp_enabled = true;
> +module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
>
> /* enable/disable SEV-ES DebugSwap support */
> static bool sev_es_debug_swap_enabled = true;
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 26/29] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (24 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 25/29] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-04-10 22:14 ` Tom Lendacky
2024-03-29 22:58 ` [PATCH v12 27/29] crypto: ccp: Add the SNP_VLEK_LOAD command Michael Roth
` (3 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, Brijesh Singh, Alexey Kardashevskiy
From: Brijesh Singh <brijesh.singh@amd.com>
Version 2 of the GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows an SEV-SNP guest to make requests to
the SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST
API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. Other request types are available as well, but the specifics
of what guest requests are being made are opaque to the hypervisor,
which only serves as a proxy for the guest requests and firmware
responses.
Implement handling for these events.
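For reference, the status written back to the guest in SW_EXITINFO2
packs the VMM error code in the upper 32 bits and the firmware error
code in the lower 32 bits, per the macros added to sev-guest.h below
(illustrative fragment only; vmm_ret/fw_err stand in for the values
computed by the handler):

	u64 status  = SNP_GUEST_ERR(vmm_ret, fw_err);
	u32 vmm_err = status >> SNP_GUEST_VMM_ERR_SHIFT;	/* bits 63:32 */
	u32 fw_code = status & SNP_GUEST_FW_ERR_MASK;		/* bits 31:0  */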
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 83 ++++++++++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 9 ++++
2 files changed, 92 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 658116537f3f..f56f04553e81 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>
#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -3223,6 +3224,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
+ case SVM_VMGEXIT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3646,6 +3648,83 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}
+static bool snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t req_pfn, resp_pfn;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return false;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return false;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return false;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return false;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return true;
+}
+
+static bool snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+ if (snp_page_reclaim(pfn))
+ return false;
+
+ if (rmp_make_shared(pfn, PG_LEVEL_4K))
+ return false;
+
+ return true;
+}
+
+static bool __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
+ sev_ret_code *fw_err)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_sev_info *sev;
+ bool ret = true;
+
+ if (!sev_snp_guest(kvm))
+ return false;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa))
+ return false;
+
+ if (sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err))
+ ret = false;
+
+ if (!snp_cleanup_guest_buf(&data))
+ ret = false;
+
+ return ret;
+}
+
+static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret = 0;
+
+ if (!__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3906,6 +3985,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->system_event.ndata = 1;
vcpu->run->system_event.data[0] = control->ghcb_gpa;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 154a87a1eca9..7bd78e258569 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -89,8 +89,17 @@ struct snp_ext_report_req {
#define SNP_GUEST_FW_ERR_MASK GENMASK_ULL(31, 0)
#define SNP_GUEST_VMM_ERR_SHIFT 32
#define SNP_GUEST_VMM_ERR(x) (((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
+#define SNP_GUEST_FW_ERR(x) ((x) & SNP_GUEST_FW_ERR_MASK)
+#define SNP_GUEST_ERR(vmm_err, fw_err) (SNP_GUEST_VMM_ERR(vmm_err) | \
+ SNP_GUEST_FW_ERR(fw_err))
+/*
+ * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, so also
+ * define a GENERIC error code such that it won't ever conflict with GHCB-defined
+ * errors if any get added in the future.
+ */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1
#define SNP_GUEST_VMM_ERR_BUSY 2
+#define SNP_GUEST_VMM_ERR_GENERIC BIT(31)
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 26/29] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2024-03-29 22:58 ` [PATCH v12 26/29] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2024-04-10 22:14 ` Tom Lendacky
0 siblings, 0 replies; 55+ messages in thread
From: Tom Lendacky @ 2024-04-10 22:14 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, Brijesh Singh,
Alexey Kardashevskiy
On 3/29/24 17:58, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Version 2 of the GHCB specification added support for the SNP Guest Request
> Message NAE event. The event allows an SEV-SNP guest to make requests to
> the SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST
> API defined in the SEV-SNP firmware specification.
>
> This is used by guests primarily to request attestation reports from
> firmware. Other request types are available as well, but the specifics
> of what guest requests are being made are opaque to the hypervisor,
> which only serves as a proxy for the guest requests and firmware
> responses.
>
> Implement handling for these events.
>
> Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
You need to add a Co-developed-by: for Ashish here.
> [mdr: ensure FW command failures are indicated to guest, drop extended
> request handling to be re-written as separate patch, massage commit]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
One minor comment below should another version be required, otherwise:
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 83 ++++++++++++++++++++++++++++++++++
> include/uapi/linux/sev-guest.h | 9 ++++
> 2 files changed, 92 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 658116537f3f..f56f04553e81 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
>
> +static bool snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
> + gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + kvm_pfn_t req_pfn, resp_pfn;
> +
> + if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
Minor, but you can use PAGE_ALIGNED() here.
Thanks,
Tom
> + return false;
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 27/29] crypto: ccp: Add the SNP_VLEK_LOAD command
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (25 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 26/29] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-04-10 22:20 ` Tom Lendacky
2024-03-29 22:58 ` [PATCH v12 28/29] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands Michael Roth
` (2 subsequent siblings)
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
When requesting an attestation report, a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).
For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface to allow the loading of VLEK keys
into the system.
See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
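For reference, a minimal sketch of how userspace might drive the new
ioctl (the blob file name is hypothetical and error handling is
abbreviated; the wrapped hashstick is the 432-byte blob obtained from
AMD KDS):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/psp-sev.h>

int main(void)
{
	struct sev_user_data_snp_wrapped_vlek_hashstick blob;
	struct sev_user_data_snp_vlek_load input;
	struct sev_issue_cmd cmd;
	int kds_fd, sev_fd;

	/* wrapped VLEK hashstick previously fetched from AMD KDS */
	kds_fd = open("vlek.blob", O_RDONLY);
	if (kds_fd < 0 || read(kds_fd, &blob, sizeof(blob)) != sizeof(blob))
		return 1;

	memset(&input, 0, sizeof(input));
	input.len = sizeof(input);
	input.vlek_wrapped_version = 0;
	input.vlek_wrapped_address = (__u64)(uintptr_t)&blob;

	memset(&cmd, 0, sizeof(cmd));
	cmd.cmd = SNP_VLEK_LOAD;
	cmd.data = (__u64)(uintptr_t)&input;

	sev_fd = open("/dev/sev", O_RDWR);
	if (sev_fd < 0 || ioctl(sev_fd, SEV_ISSUE_CMD, &cmd) < 0) {
		fprintf(stderr, "SNP_VLEK_LOAD failed, fw error %#x\n", cmd.error);
		return 1;
	}
	return 0;
}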
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 36 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 27 +++++++++++++++++++++++++++
2 files changed, 63 insertions(+)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 2102377f727b..97a7959406ee 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2027,6 +2027,39 @@ static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable
return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
}
+static int sev_ioctl_do_snp_vlek_load(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_snp_vlek_load input;
+ void *blob;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ if (copy_from_user(&input, u64_to_user_ptr(argp->data), sizeof(input)))
+ return -EFAULT;
+
+ if (input.len != sizeof(input) || input.vlek_wrapped_version != 0)
+ return -EINVAL;
+
+ blob = psp_copy_user_blob(input.vlek_wrapped_address,
+ sizeof(struct sev_user_data_snp_wrapped_vlek_hashstick));
+ if (IS_ERR(blob))
+ return PTR_ERR(blob);
+
+ input.vlek_wrapped_address = __psp_pa(blob);
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_VLEK_LOAD, &input, &argp->error);
+
+ kfree(blob);
+
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -2087,6 +2120,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_SET_CONFIG:
ret = sev_ioctl_do_snp_set_config(&input, writable);
break;
+ case SNP_VLEK_LOAD:
+ ret = sev_ioctl_do_snp_vlek_load(&input, writable);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b7a2c2ee35b7..2289b7c76c59 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -31,6 +31,7 @@ enum {
SNP_PLATFORM_STATUS,
SNP_COMMIT,
SNP_SET_CONFIG,
+ SNP_VLEK_LOAD,
SEV_MAX,
};
@@ -214,6 +215,32 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;
+/**
+ * struct sev_user_data_snp_vlek_load - SNP_VLEK_LOAD structure
+ *
+ * @len: length of the command buffer read by the PSP
+ * @vlek_wrapped_version: version of wrapped VLEK hashstick (Must be 0h)
+ * @rsvd: reserved
+ * @vlek_wrapped_address: address of a wrapped VLEK hashstick
+ * (struct sev_user_data_snp_wrapped_vlek_hashstick)
+ */
+struct sev_user_data_snp_vlek_load {
+ __u32 len; /* In */
+ __u8 vlek_wrapped_version; /* In */
+ __u8 rsvd[3]; /* In */
+ __u64 vlek_wrapped_address; /* In */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_wrapped_vlek_hashstick - Wrapped VLEK data
+ *
+ * @data: Opaque data provided by AMD KDS (as described in SEV-SNP Firmware ABI
+ * 1.54, SNP_VLEK_LOAD)
+ */
+struct sev_user_data_snp_wrapped_vlek_hashstick {
+ __u8 data[432]; /* In */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 27/29] crypto: ccp: Add the SNP_VLEK_LOAD command
2024-03-29 22:58 ` [PATCH v12 27/29] crypto: ccp: Add the SNP_VLEK_LOAD command Michael Roth
@ 2024-04-10 22:20 ` Tom Lendacky
0 siblings, 0 replies; 55+ messages in thread
From: Tom Lendacky @ 2024-04-10 22:20 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick
On 3/29/24 17:58, Michael Roth wrote:
> When requesting an attestation report, a guest is able to specify whether
> it wants SNP firmware to sign the report using either a Versioned Chip
> Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
> Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
> Key Derivation Service (KDS) and derived from seeds allocated to
> enrolled cloud service providers (CSPs).
>
> For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
> them into the system after obtaining them from the KDS. Add a
> corresponding userspace interface to allow the loading of VLEK keys
> into the system.
>
> See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 28/29] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (26 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 27/29] crypto: ccp: Add the SNP_VLEK_LOAD command Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-04-10 22:27 ` Tom Lendacky
2024-03-29 22:58 ` [PATCH v12 29/29] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event Michael Roth
2024-03-30 21:44 ` [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Paolo Bonzini
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
These commands can be used to pause servicing of guest attestation
requests. This is useful when updating the reported TCB or signing key with
commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
in turn require updates to userspace-supplied certificates, and if an
attestation request happens to be in-flight at the time those updates
are occurring there is potential for a guest to receive a certificate
blob that is out of sync with the effective signing key for the
attestation report.
These interfaces also provide some versatility with how similar
firmware/certificate update activities can be handled in the future.
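For reference, a minimal sketch of the intended usage from userspace
(error handling abbreviated; the update step in the middle stands in for
whatever combination of SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD and
certificate updates the host needs to perform):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/psp-sev.h>

static int sev_cmd(int fd, __u32 op, void *data)
{
	struct sev_issue_cmd cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.cmd = op;
	cmd.data = (__u64)(uintptr_t)data;
	return ioctl(fd, SEV_ISSUE_CMD, &cmd);
}

int main(void)
{
	struct sev_user_data_snp_pause_attestation start = {0}, end = {0};
	int fd = open("/dev/sev", O_RDWR);

	if (fd < 0 || sev_cmd(fd, SNP_PAUSE_ATTESTATION, &start))
		return 1;

	/* ... update reported TCB, signing key, and/or certificates ... */

	sev_cmd(fd, SNP_RESUME_ATTESTATION, &end);
	if (start.id != end.id)
		fprintf(stderr, "transaction interrupted: %llu != %llu\n",
			(unsigned long long)start.id,
			(unsigned long long)end.id);
	return 0;
}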
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
Documentation/virt/coco/sev-guest.rst | 50 +++++++++++++++++++++++++--
arch/x86/include/asm/sev.h | 4 +++
arch/x86/virt/svm/sev.c | 43 +++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 47 +++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 12 +++++++
5 files changed, 154 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index e1eaf6a830ce..dd5cf2098afd 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -128,8 +128,6 @@ the SEV-SNP specification for further details.
The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
related to the additional certificate data that is returned with the report.
-The certificate data returned is being provided by the hypervisor through the
-SNP_SET_EXT_CONFIG.
The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
firmware to get the attestation report.
@@ -176,6 +174,54 @@ to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
the firmware parameters affected by this command can be queried via
SNP_PLATFORM_STATUS.
+2.7 SNP_PAUSE_ATTESTATION / SNP_RESUME_ATTESTATION
+--------------------------------------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (out): struct sev_user_data_snp_pause_attestation
+:Returns (out): 0 on success, -negative on error
+
+When requesting attestation reports, SNP guests have the option of issuing
+an extended guest request which allows host userspace to supply additional
+certificate data that can be used to validate the signature used to sign
+the attestation report. This signature is generated using a key that is
+derived from the reported TCB that can be set via the SNP_SET_CONFIG and
+SNP_COMMIT ioctls, so the accompanying certificate data needs to be kept in
+sync with the changes made to the reported TCB via these ioctls.
+
+Similarly, interfaces like SNP_VLEK_LOAD can modify the key used to sign
+the attestation reports, which may in turn require updating the certificate
+data provided to guests via extended guest requests.
+
+To allow for updating the reported TCB, endorsement key, and any certificate
+data in a manner that is atomic to guests, the SNP_PAUSE_ATTESTATION and
+SNP_RESUME_ATTESTATION commands are provided.
+
+After SNP_PAUSE_ATTESTATION is issued, any attestation report requests via
+extended guest requests that are in-progress, or received after
+SNP_PAUSE_ATTESTATION is issued, will result in the guest receiving a
+GHCB-defined error message instructing it to retry the request. Once all
+the desired reported TCB, endorsement keys, or certificate data updates
+are completed on the host, the SNP_RESUME_ATTESTATION command must be
+issued to allow guest attestation requests to proceed.
+
+In general, hosts should serialize updates of this sort and never have more
+than 1 outstanding transaction in flight that could result in the
+interleaving of multiple SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION pairs.
+To guard against this, SNP_PAUSE_ATTESTATION will fail if another process
+has already paused attestation requests.
+
+However, there may be occasions where a transaction needs to be aborted due
+to unexpected activity in userspace such as timeouts, crashes, etc., so
+SNP_RESUME_ATTESTATION will always succeed. Nonetheless, this could
+potentially lead to SNP_RESUME_ATTESTATION being called out of sequence, so
+to allow for callers of SNP_{PAUSE,RESUME}_ATTESTATION to detect such
+occurrences, each ioctl will return a transaction ID in the response so the
+caller can monitor whether the start/end ID both match. If they don't, the
+caller should assume that attestation has been paused/resumed unexpectedly,
+and take whatever measures it deems necessary such as logging, reporting,
+or auditing the sequence of events.
+
3. SEV-SNP CPUID Enforcement
============================
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 234a998e2d2d..975e92005438 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -272,6 +272,8 @@ int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 asid, bool immut
int rmp_make_shared(u64 pfn, enum pg_level level);
void snp_leak_pages(u64 pfn, unsigned int npages);
void kdump_sev_callback(void);
+int snp_pause_attestation(u64 *transaction_id);
+void snp_resume_attestation(u64 *transaction_id);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -285,6 +287,8 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 as
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
static inline void kdump_sev_callback(void) { }
+static inline int snp_pause_attestation(u64 *transaction_id) { return 0; }
+static inline void snp_resume_attestation(u64 *transaction_id) {}
#endif
#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ab0e8448bb6e..09d62870306b 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -70,6 +70,11 @@ static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
static unsigned long snp_nr_leaked_pages;
+/* For synchronizing TCB/certificate updates with extended guest requests */
+static DEFINE_MUTEX(snp_pause_attestation_lock);
+static u64 snp_transaction_id;
+static bool snp_attestation_paused;
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt
@@ -568,3 +573,41 @@ void kdump_sev_callback(void)
if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
wbinvd();
}
+
+int snp_pause_attestation(u64 *transaction_id)
+{
+ mutex_lock(&snp_pause_attestation_lock);
+
+ if (snp_attestation_paused) {
+ mutex_unlock(&snp_pause_attestation_lock);
+ return -EBUSY;
+ }
+
+ /*
+ * The actual transaction ID update will happen when
+ * snp_resume_attestation() is called, so return
+ * the *anticipated* transaction ID that will be
+ * returned by snp_resume_attestation(). This is
+ * to ensure that unbalanced/aborted transactions will
+ * be noticeable when the caller that started the
+ * transaction calls snp_resume_attestation().
+ */
+ *transaction_id = snp_transaction_id + 1;
+ snp_attestation_paused = true;
+
+ mutex_unlock(&snp_pause_attestation_lock);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(snp_pause_attestation);
+
+void snp_resume_attestation(u64 *transaction_id)
+{
+ mutex_lock(&snp_pause_attestation_lock);
+
+ snp_attestation_paused = false;
+ *transaction_id = ++snp_transaction_id;
+
+ mutex_unlock(&snp_pause_attestation_lock);
+}
+EXPORT_SYMBOL_GPL(snp_resume_attestation);
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 97a7959406ee..7eb18a273731 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2060,6 +2060,47 @@ static int sev_ioctl_do_snp_vlek_load(struct sev_issue_cmd *argp, bool writable)
return ret;
}
+static int sev_ioctl_do_snp_pause_attestation(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_user_data_snp_pause_attestation transaction = {0};
+ struct sev_device *sev = psp_master->sev_data;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ ret = snp_pause_attestation(&transaction.id);
+ if (ret)
+ return ret;
+
+ if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int sev_ioctl_do_snp_resume_attestation(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_user_data_snp_pause_attestation transaction = {0};
+ struct sev_device *sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ snp_resume_attestation(&transaction.id);
+
+ if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+ return -EFAULT;
+
+ return 0;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -2123,6 +2164,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_VLEK_LOAD:
ret = sev_ioctl_do_snp_vlek_load(&input, writable);
break;
+ case SNP_PAUSE_ATTESTATION:
+ ret = sev_ioctl_do_snp_pause_attestation(&input, writable);
+ break;
+ case SNP_RESUME_ATTESTATION:
+ ret = sev_ioctl_do_snp_resume_attestation(&input, writable);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 2289b7c76c59..7b35b2814a99 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -32,6 +32,8 @@ enum {
SNP_COMMIT,
SNP_SET_CONFIG,
SNP_VLEK_LOAD,
+ SNP_PAUSE_ATTESTATION,
+ SNP_RESUME_ATTESTATION,
SEV_MAX,
};
@@ -241,6 +243,16 @@ struct sev_user_data_snp_wrapped_vlek_hashstick {
__u8 data[432]; /* In */
} __packed;
+/**
+ * struct sev_user_data_snp_pause_attestation - metadata for pausing attestation
+ *
+ * @id: the ID of the transaction started/ended by a call to SNP_PAUSE_ATTESTATION
+ * or SNP_RESUME_ATTESTATION, respectively.
+ */
+struct sev_user_data_snp_pause_attestation {
+ __u64 id; /* Out */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v12 28/29] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands
2024-03-29 22:58 ` [PATCH v12 28/29] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands Michael Roth
@ 2024-04-10 22:27 ` Tom Lendacky
0 siblings, 0 replies; 55+ messages in thread
From: Tom Lendacky @ 2024-04-10 22:27 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick
On 3/29/24 17:58, Michael Roth wrote:
> These commands can be used to pause servicing of guest attestation
> requests. This is useful when updating the reported TCB or signing key with
> commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
> in turn require updates to userspace-supplied certificates, and if an
> attestation request happens to be in-flight at the time those updates
> are occurring there is potential for a guest to receive a certificate
> blob that is out of sync with the effective signing key for the
> attestation report.
>
> These interfaces also provide some versatility with how similar
> firmware/certificate update activities can be handled in the future.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v12 29/29] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (27 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 28/29] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands Michael Roth
@ 2024-03-29 22:58 ` Michael Roth
2024-04-11 13:33 ` Tom Lendacky
2024-03-30 21:44 ` [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Paolo Bonzini
29 siblings, 1 reply; 55+ messages in thread
From: Michael Roth @ 2024-03-29 22:58 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
Version 2 of the GHCB specification added support for the SNP Extended Guest
Request Message NAE event. This event serves a nearly identical purpose
to the previously-added SNP_GUEST_REQUEST event, but additionally allows
certificate data to be supplied via a guest-provided buffer, used mainly
for verifying the signature of an attestation report as returned by
firmware.
This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit type, and then the
firmware request is made only afterward.
Implement handling for these events.
Since there is a potential for race conditions where the
userspace-supplied certificate data may be out-of-sync relative to the
reported TCB or VLEK that firmware will use when signing attestation
reports, make use of the synchronization mechanisms wired up to the
SNP_{PAUSE,RESUME}_ATTESTATION SEV device ioctls such that the guest
will be told to retry the request while attestation has been paused due
to an update being underway on the system.
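For reference, a rough sketch of the VMM side of the new exit, based on
the protocol documented in the api.rst update below (fetch_cert_table()
and write_guest() are hypothetical VMM helpers, not part of this
series):

static void handle_ext_guest_req(int vcpu_fd, struct kvm_run *run)
{
	size_t len = 0;
	void *certs = fetch_cert_table(&len);	/* current GHCB cert table blob */
	__u64 npages = (len + 4095) / 4096;

	if (!certs) {
		/* certificates being updated, tell the guest to retry later */
		run->vmgexit.ext_guest_req.ret = 2;
	} else if (npages > run->vmgexit.ext_guest_req.data_npages) {
		/* guest buffer too small, report the required number of pages */
		run->vmgexit.ext_guest_req.data_npages = npages;
		run->vmgexit.ext_guest_req.ret = 1;
	} else {
		/* copy the cert blob into the guest's shared buffer at data_gpa */
		write_guest(vcpu_fd, run->vmgexit.ext_guest_req.data_gpa, certs, len);
		run->vmgexit.ext_guest_req.ret = 0;
	}

	/*
	 * Re-entering KVM_RUN then completes the VMGEXIT via
	 * snp_complete_ext_guest_req().
	 */
}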
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
Documentation/virt/kvm/api.rst | 26 ++++++++++++
arch/x86/include/asm/sev.h | 4 ++
arch/x86/kvm/svm/sev.c | 75 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++
arch/x86/virt/svm/sev.c | 21 ++++++++++
include/uapi/linux/kvm.h | 6 +++
6 files changed, 135 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 85099198a10f..6cf186ed8f66 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7066,6 +7066,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
#define KVM_USER_VMGEXIT_PSC 2
+ #define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -7079,6 +7080,11 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
__u64 shared_gpa;
__u64 ret;
} psc;
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ __u32 ret;
+ } ext_guest_req;
};
};
@@ -7108,6 +7114,26 @@ private/shared state. Userspace will return a value in 'ret' that is in
agreement with the GHCB-defined return values that the guest will expect
in the SW_EXITINFO2 field of the GHCB in response to these requests.
+For the KVM_USER_VMGEXIT_EXT_GUEST_REQ type, the ext_guest_req union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplies
+via the RAX field of the GHCB when issued extended guest requests.
+'data_npages' will similarly contain the value the guest supplies in RBX
+denoting the number of shared pages available to write the certificate
+data into.
+
+ - If the supplied number of pages is sufficient, userspace should write
+ the certificate data blob (in the format defined by the GHCB spec) in
+ the address indicated by 'data_gpa' and set 'ret' to 0.
+
+ - If the number of pages supplied is not sufficient, userspace must write
+ the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+ - If userspace is temporarily unable to handle the request, 'ret' should
+ be set to 2 to inform the guest to retry later.
+
+ - If some other error occurred, userspace should set 'ret' to a non-zero
+ value that is distinct from the specific return values mentioned above.
+
6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 975e92005438..0e092c8c5614 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -274,6 +274,8 @@ void snp_leak_pages(u64 pfn, unsigned int npages);
void kdump_sev_callback(void);
int snp_pause_attestation(u64 *transaction_id);
void snp_resume_attestation(u64 *transaction_id);
+u64 snp_transaction_get_id(void);
+bool snp_transaction_is_stale(u64 transaction_id);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -289,6 +291,8 @@ static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
static inline void kdump_sev_callback(void) { }
static inline int snp_pause_attestation(u64 *transaction_id) { return 0; }
static inline void snp_resume_attestation(u64 *transaction_id) {}
+static inline u64 snp_transaction_get_id(void) { return 0; }
+static inline bool snp_transaction_is_stale(u64 transaction_id) { return false; }
#endif
#endif
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f56f04553e81..1da45e23ee14 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3225,6 +3225,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3725,6 +3726,77 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
}
+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_control_area *control;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret;
+
+ vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
+ if (vmm_ret) {
+ if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+ vcpu->arch.regs[VCPU_REGS_RBX] =
+ vcpu->run->vmgexit.ext_guest_req.data_npages;
+ goto abort_request;
+ }
+
+ control = &svm->vmcb->control;
+
+ if (!__snp_handle_guest_req(kvm, control->exit_info_1, control->exit_info_2,
+ &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ /*
+ * Give errors related to stale transactions precedence to provide more
+ * potential options for servicing firmware while guests are running.
+ */
+ if (snp_transaction_is_stale(svm->snp_transaction_id))
+ vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+ struct vcpu_svm *svm = to_svm(vcpu);
+ unsigned long data_npages;
+ sev_ret_code fw_err = 0;
+ gpa_t data_gpa;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ goto abort_request;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+ goto abort_request;
+
+ svm->snp_transaction_id = snp_transaction_get_id();
+ if (snp_transaction_is_stale(svm->snp_transaction_id)) {
+ vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+ goto abort_request;
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_EXT_GUEST_REQ;
+ vcpu->run->vmgexit.ext_guest_req.data_gpa = data_gpa;
+ vcpu->run->vmgexit.ext_guest_req.data_npages = data_npages;
+ vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+ return 0; /* forward request to userspace */
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+ return 1; /* resume guest */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3989,6 +4061,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
ret = 1;
break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ ret = snp_begin_ext_guest_req(vcpu);
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 746f819a6de4..7af6d0e9de17 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -303,6 +303,9 @@ struct vcpu_svm {
/* Guest GIF value, used when vGIF is not enabled */
bool guest_gif;
+
+ /* Transaction ID associated with SNP config updates */
+ u64 snp_transaction_id;
};
struct svm_cpu_data {
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 09d62870306b..30638d10a1b9 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -611,3 +611,24 @@ void snp_resume_attestation(u64 *transaction_id)
mutex_unlock(&snp_pause_attestation_lock);
}
EXPORT_SYMBOL_GPL(snp_resume_attestation);
+
+u64 snp_transaction_get_id(void)
+{
+ return snp_transaction_id;
+}
+EXPORT_SYMBOL_GPL(snp_transaction_get_id);
+
+bool snp_transaction_is_stale(u64 transaction_id)
+{
+ bool stale;
+
+ mutex_lock(&snp_pause_attestation_lock);
+
+ stale = (snp_attestation_paused ||
+ transaction_id != snp_transaction_id);
+
+ mutex_unlock(&snp_pause_attestation_lock);
+
+ return stale;
+}
+EXPORT_SYMBOL_GPL(snp_transaction_is_stale);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e33c48bfbd67..585de3a2591e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -138,6 +138,7 @@ struct kvm_xen_exit {
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
#define KVM_USER_VMGEXIT_PSC 2
+#define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -151,6 +152,11 @@ struct kvm_user_vmgexit {
__u64 shared_gpa;
__u64 ret;
} psc;
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ __u32 ret;
+ } ext_guest_req;
};
};
--
2.25.1
^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v12 29/29] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
2024-03-29 22:58 ` [PATCH v12 29/29] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event Michael Roth
@ 2024-04-11 13:33 ` Tom Lendacky
0 siblings, 0 replies; 55+ messages in thread
From: Tom Lendacky @ 2024-04-11 13:33 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick
On 3/29/24 17:58, Michael Roth wrote:
> Version 2 of the GHCB specification added support for the SNP Extended
> Guest Request Message NAE event. This event serves a nearly identical
> purpose to the previously-added SNP_GUEST_REQUEST event, but additionally
> allows certificate data to be supplied via a guest-provided buffer, used
> mainly for verifying the signature of an attestation report as returned
> by firmware.
>
> This certificate data is supplied by userspace, so unlike with
> SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
> forwarded to userspace via a KVM_EXIT_VMGEXIT exit, with the firmware
> request made only afterward.
>
> Implement handling for these events.
>
> Since there is a potential for race conditions where the
> userspace-supplied certificate data may be out-of-sync relative to the
> reported TCB or VLEK that firmware will use when signing attestation
> reports, make use of the synchronization mechanisms wired up to the
> SNP_{PAUSE,RESUME}_ATTESTATION SEV device ioctls such that the guest
> will be told to retry the request while attestation has been paused due
> to an update being underway on the system.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> Documentation/virt/kvm/api.rst | 26 ++++++++++++
> arch/x86/include/asm/sev.h | 4 ++
> arch/x86/kvm/svm/sev.c | 75 ++++++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.h | 3 ++
> arch/x86/virt/svm/sev.c | 21 ++++++++++
> include/uapi/linux/kvm.h | 6 +++
> 6 files changed, 135 insertions(+)
>
> +static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + struct vmcb_control_area *control;
> + struct kvm *kvm = vcpu->kvm;
> + sev_ret_code fw_err = 0;
> + int vmm_ret;
> +
> + vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
> + if (vmm_ret) {
> + if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
> + vcpu->arch.regs[VCPU_REGS_RBX] =
> + vcpu->run->vmgexit.ext_guest_req.data_npages;
> + goto abort_request;
> + }
> +
> + control = &svm->vmcb->control;
> +
> + if (!__snp_handle_guest_req(kvm, control->exit_info_1, control->exit_info_2,
> + &fw_err))
> + vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> +
> + /*
> + * Give errors related to stale transactions precedence to provide more
> + * potential options for servicing firmware while guests are running.
> + */
> + if (snp_transaction_is_stale(svm->snp_transaction_id))
> + vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
I think having this after the call to the SEV firmware will cause an
issue. If the firmware has performed the attestation request
successfully in the __snp_handle_guest_req() call, then it will have
incremented the sequence number. If you return busy, the sev-guest
driver will attempt to re-issue the request with the original sequence
number, which will now fail. That failure will then be propagated back
to the sev-guest driver, which will disable the VMPCK key.
So I think you need to put this before the call to firmware.
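Something along these lines (untested sketch):

        /*
         * Check for a stale transaction before calling firmware so that a
         * successful attestation request can't bump the sequence number
         * only to have the guest's retry fail and disable the VMPCK.
         */
        if (snp_transaction_is_stale(svm->snp_transaction_id)) {
                vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
                goto abort_request;
        }

        if (!__snp_handle_guest_req(kvm, control->exit_info_1,
                                    control->exit_info_2, &fw_err))
                vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;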
Thanks,
Tom
> +
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
2024-03-29 22:58 [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (28 preceding siblings ...)
2024-03-29 22:58 ` [PATCH v12 29/29] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event Michael Roth
@ 2024-03-30 21:44 ` Paolo Bonzini
29 siblings, 0 replies; 55+ messages in thread
From: Paolo Bonzini @ 2024-03-30 21:44 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick
On 3/29/24 23:58, Michael Roth wrote:
> This patchset is also available at:
>
> https://github.com/amdese/linux/commits/snp-host-v12
>
> and is based on top of the following series:
>
> [PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages
> https://lore.kernel.org/kvm/20240329212444.395559-1-michael.roth@amd.com/
>
> which in turn is based on:
>
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue
>
>
> Patch Layout
> ------------
>
> 01-04: These patches are minor dependencies for this series and will
> eventually make their way upstream through other trees. They are
> included here only temporarily.
>
> 05-09: These patches add some basic infrastructure and introduces a new
> KVM_X86_SNP_VM vm_type to handle differences verses the existing
> KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.
>
> 10-12: These implement the KVM API to handle the creation of a
> cryptographic launch context, encrypt/measure the initial image
> into guest memory, and finalize it before launching it.
>
> 13-20: These implement handling for various guest-generated events such
> as page state changes, onlining of additional vCPUs, etc.
>
> 21-24: These implement the gmem hooks needed to prepare gmem-allocated
> pages before mapping them into guest private memory ranges as
> well as cleaning them up prior to returning them to the host for
> use as normal memory. Because this supplants certain activities
> like issuing WBINVDs during KVM MMU invalidations, there's also
> a patch to avoid duplicating that work and the unnecessary
> overhead it would incur.
>
> 25: With all the core support in place, this patch adds a kvm_amd module
> parameter to enable SNP support.
>
> 26-29: These patches all deal with the servicing of guest requests to handle
> things like attestation, as well as some related host-management
> interfaces.
>
>
> Testing
> -------
>
> For testing this via QEMU, use the following tree:
>
> https://github.com/amdese/qemu/commits/snp-v4-wip2
>
> A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
> ranges that are mapped as private. It is recommended you build the AmdSevX64
> variant as it provides the kernel-hashing support present in this series:
>
> https://github.com/amdese/ovmf/commits/apic-mmio-fix1c
>
> A basic command-line invocation for SNP would be:
>
> qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
>  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
>  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
>  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth= \
>  -bios /home/mroth/ovmf/OVMF_CODE-upstream-20240228-apicfix-1c-AmdSevX64.fd
>
> With kernel-hashing and certificate data supplied:
>
> qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
>  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
>  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
>  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on \
>  -bios /home/mroth/ovmf/OVMF_CODE-upstream-20240228-apicfix-1c-AmdSevX64.fd \
>  -kernel /boot/vmlinuz-6.8.0-snp-host-v12-wip40+ \
>  -initrd /boot/initrd.img-6.8.0-snp-host-v12-wip40+ \
>  -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"
>
>
> Known issues / TODOs
> --------------------
>
> * Base tree in some cases reports "Unpatched return thunk in use. This should
> not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
> regression upstream and is unrelated to this series:
>
> https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/
>
> * 2MB hugepage support has been dropped pending discussion on how we plan
> to re-enable it in gmem.
>
> * Host kexec should work, but there is a known issue with handling host
> kdump while SNP guests are running which will be addressed as a follow-up.
>
> * SNP kselftests are currently a WIP and will be included as part of SNP
> upstreaming efforts in the near-term.
>
>
> SEV-SNP Overview
> ----------------
>
> This part of the Secure Nested Paging (SEV-SNP) series focuses on the
> changes required to add KVM support for SEV-SNP. This series builds upon
> SEV-SNP guest support, which is now in mainline, and SEV-SNP host
> initialization support, which is now in linux-next.
>
> While this series provides the basic building blocks to support booting
> SEV-SNP VMs, it does not cover all the security enhancements introduced
> by SEV-SNP, such as interrupt protection, which will be added in the
> future.
>
> With SNP, when pages are marked as guest-owned in the RMP table, they are
> assigned to a specific guest/ASID, as well as a specific GFN within the
> guest. Any attempt to map such a page in the RMP table to a different
> guest/ASID, or to a different GFN within a guest/ASID, will result in an
> RMP nested page fault.
>
> Prior to accessing a guest-owned page, the guest must validate it with
> the PVALIDATE instruction, which sets the Validated bit in the page's
> RMP entry. This is the only way to set the validated bit outside of the
> initial pre-encrypted guest payload/image; any attempt outside the guest
> to modify the RMP entry from that point forward will result in the
> validated bit being cleared, at which point the guest will trigger an
> exception if it attempts to access that page, so it can be made aware of
> possible tampering.
>
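As a rough illustration, guest-side validation boils down to the kernel's
pvalidate() helper; a simplified sketch for a single 4K page (real code,
e.g. pvalidate_pages() in the guest SEV support, also handles 2M pages and
retries):

        /* validate one 4K page after it has been converted to private */
        if (pvalidate(vaddr, RMP_PG_SIZE_4K, true))
                sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
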
> One exception to this is the initial guest payload, which is pre-validated
> by the firmware prior to launching. The guest can use Guest Message requests
> to fetch an attestation report which will include the measurement of the
> initial image so that the guest can verify it was booted with the expected
> image/environment.
>
> After boot, guests can use Page State Change requests to switch pages
> between shared/hypervisor-owned and private/guest-owned to share data for
> things like DMA, virtio buffers, and other GHCB requests.
>
> In this implementation of SEV-SNP, private guest memory is managed by a new
> kernel framework called guest_memfd (gmem). With gmem, a new
> KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
> MMU whether a particular GFN should be backed by shared (normal) memory or
> private (gmem-allocated) memory. To tie into this, Page State Change
> requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits; userspace
> then issues the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
> private/shared state in the KVM MMU.
>
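For reference, the userspace side of that flow is roughly the following
(error handling elided; vm_fd is the VM file descriptor):

        struct kvm_memory_attributes attrs = {
                .address    = gpa,
                .size       = size,
                /* clear the attribute to convert the range back to shared */
                .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
        };

        ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
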
> The gmem / KVM MMU hooks implemented in this series will then update the RMP
> table entries for the backing PFNs to set them to guest-owned/private when
> mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
> handling in the case of shared pages where the corresponding RMP table
> entries are left in the default shared/hypervisor-owned state.
>
> Feedback/review is very much appreciated!
>
> -Mike
>
> Changes since v11:
>
> * Rebase series on kvm-coco-queue and re-work to leverage more
> infrastructure between SNP/TDX series.
> * Drop KVM_SNP_INIT in favor of the new KVM_SEV_INIT2 interface introduced
> here (Paolo):
> https://lore.kernel.org/lkml/20240318233352.2728327-1-pbonzini@redhat.com/
> * Drop exposure of API fields related to things like VMPL levels, migration
> agents, etc., until they are actually supported/used (Sean)
> * Rework KVM_SEV_SNP_LAUNCH_UPDATE handling to use a new
> kvm_gmem_populate() interface instead of copying data directly into
> gmem-allocated pages (Sean)
> * Add support for SNP_LOAD_VLEK, rework the SNP_SET_CONFIG_{START,END}
> interfaces to have simpler semantics that are applicable to management of
> SNP_LOAD_VLEK updates as well, and rename them to the now more appropriate
> SNP_{PAUSE,RESUME}_ATTESTATION
> * Fix up documentation wording and do print warnings for
> userspace-triggerable failures (Peter, Sean)
> * Fix a race with AP_CREATION wake-up events (Jacob, Sean)
> * Fix a memory leak with VMSA pages (Sean)
> * Tighten up handling of RMP page faults to better distinguish between real
> and spurious cases (Tom)
> * Various patch/documentation rewording, cleanups, etc.
I skipped a few patches that deal mostly with AMD ABIs. Here are the
ones that have nontrivial remarks, which are probably worth a reply
before sending v13:
- patch 10: some extra checks on input parameters, and possibly
forbidding SEV/SEV-ES ioctls for SEV-SNP guests?
- patch 12: a (hopefully) simple question on boot_vcpu_handled
- patch 18: see Sean's objections at
https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@google.com/
- patch 22: question on ignoring PSMASH failures and possibly adding a
kvm_arch_gmem_invalidate_begin() API.
With respect to the six preparatory patches, I'll merge them in
kvm-coco-queue early next week. However, I'll explode the arguments to
kvm_gmem_populate(), while also removing "memslot" and merging "src"
with "do_memcpy". I'll post my version very early.
Paolo
^ permalink raw reply [flat|nested] 55+ messages in thread