From: Michael Roth <michael.roth@amd.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>,
Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
<kvm@vger.kernel.org>, <linux-coco@lists.linux.dev>,
<linux-mm@kvack.org>, <linux-crypto@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <tglx@linutronix.de>,
<mingo@redhat.com>, <jroedel@suse.de>, <hpa@zytor.com>,
<ardb@kernel.org>, <pbonzini@redhat.com>, <seanjc@google.com>,
<vkuznets@redhat.com>, <jmattson@google.com>, <luto@kernel.org>,
<dave.hansen@linux.intel.com>, <slp@redhat.com>,
<pgonda@google.com>, <peterz@infradead.org>,
<srinivas.pandruvada@linux.intel.com>, <rientjes@google.com>,
<tobin@ibm.com>, <vbabka@suse.cz>, <kirill@shutemov.name>,
<ak@linux.intel.com>, <tony.luck@intel.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<alpergun@google.com>, <jarkko@kernel.org>,
<ashish.kalra@amd.com>, <nikunj.dadhania@amd.com>,
<pankaj.gupta@amd.com>, <liam.merwick@oracle.com>,
<zhi.a.wang@intel.com>, Brijesh Singh <brijesh.singh@amd.com>,
<rppt@kernel.org>
Subject: Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
Date: Thu, 25 Jan 2024 19:35:24 -0600
Message-ID: <20240126013524.s7lzsnaeyhs7g5ey@amd.com>
In-Reply-To: <011fed12-7fcd-4df8-b264-b55db2f3e95f@intel.com>
On Tue, Jan 16, 2024 at 12:22:39PM -0800, Dave Hansen wrote:
> On 1/16/24 08:19, Michael Roth wrote:
> >
> > So at the very least, if we went down this path, it would be worth
> > investigating the following areas in addition to general perf testing:
> >
> > 1) Only splitting directmap regions corresponding to kernel-allocatable
> > *data* (hopefully that's even feasible...)
>
> Take a look at the 64-bit memory map in here:
>
> https://www.kernel.org/doc/Documentation/x86/x86_64/mm.rst
>
> We already have separate mappings for kernel data and (normal) kernel text.
>
> > 2) Potentially deferring the split until an SNP guest is actually
> > run, so there isn't any impact just from having SNP enabled (though
> > you still take a hit from RMP checks in that case so maybe it's not
> > worthwhile, but that itself has been noted as a concern for users
> > so it would be nice to not make things even worse).
>
> Yes, this would be nice too.
>
> >> Actually, where _is_ the TLB flushing here?
> > Boris pointed that out in v6, and we implemented it in v7, but it
> > completely cratered performance:
>
> That *desperately* needs to be documented.
>
> How can it be safe to skip the TLB flush? Is this akin to a page
> permission promotion where you go from RO->RW but can skip the TLB
> flush? In that case, the CPU will see the RO TLB entry violation, drop
> it, and re-walk the page tables, discovering the RW entry.
>
> Does something similar happen here where the CPU sees the 2M/4k conflict
> in the TLB, drops the 2M entry, does a re-walk then picks up the
> newly-split 2M->4k entries?
>
> I can see how something like that would work, but it's _awfully_ subtle
> to go unmentioned.
I think the current logic relies on the fact that we don't actually
need to remove entries from the directmap to avoid unexpected RMP #PFs,
because the owners of private PFNs are aware of what access restrictions
would be active, and non-owners of private PFNs shouldn't be trying to
write to them.
It's only writes from a non-owner via a 2MB+ mapping that happens to
overlap a private PFN that need to be guarded against, and when a
private PFN is removed from the directmap, that removal ends up
splitting the 2MB mapping, and in that case there *is* a TLB flush that
wipes out the 2MB TLB entry.
After that point, there may still be stale 4KB TLB entries that accrue,
but by the time there is any sensible reason to use them to perform a
write, the PFN would have been transitioned back to shared state and
re-added to the directmap.
But yes, it's ugly, and we've ended up re-working this to simply split
the directmap via set_memory_4k(), which also does the expected TLB
flushing of 2MB+ mappings, and we only call it when absolutely
necessary, benefitting from the following optimizations (see the sketch
after this list):

  - if lookup_address() tells us the mapping is already split, no
    splitting work is needed

  - when the rmpupdate() is for a 2MB private page/RMP entry, there's no
    need to split the directmap, because there's no risk of a non-owner's
    writes overlapping with the private PFNs (RMP checks are only done at
    4KB/2MB granularity, so even a write via a 1GB directmap mapping
    would not trigger an RMP fault, since the non-owner's writes would be
    to a different 2MB PFN range)
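
To make that concrete, here's a rough sketch (not the actual patch code)
of what the check before an rmpupdate() amounts to. lookup_address() and
set_memory_4k() are the real kernel APIs mentioned above; the helper
name and exact plumbing are just for illustration:

  static int split_directmap_if_needed(u64 pfn, bool huge_rmp_entry)
  {
          unsigned long vaddr = (unsigned long)pfn_to_kaddr(pfn);
          unsigned int level;
          pte_t *pte;

          /*
           * A 2MB private RMP entry covers the whole 2MB PFN range, so a
           * non-owner write through a huge directmap mapping can't overlap
           * a private sub-page: no split is needed.
           */
          if (huge_rmp_entry)
                  return 0;

          /* Already split down to 4K (or not mapped): nothing to do. */
          pte = lookup_address(vaddr, &level);
          if (!pte || !pte_present(*pte) || level == PG_LEVEL_4K)
                  return 0;

          /*
           * Split the 2MB+ mapping down to 4K; set_memory_4k() also takes
           * care of flushing the TLB for the huge mapping being replaced.
           */
          return set_memory_4k(vaddr, 1);
  }
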
Since KVM/gmem generally allocate 2MB backing pages for private guest
memory, the above optimizations result in the directmap only needing to
be split, and only needing to acquire the cpa_lock, a handful of times
for each guest.
This also sort of gives us what we were looking for earlier: a way to
defer the directmap split until it's actually needed. But rather than
happening in bulk up front, the splitting is amortized over time, and
the more it's done, the more likely it is that the host is being used
primarily to run SNP guests, and so the less likely it is that splitting
the directmap is going to cause some sort of noticeable regression for
non-SNP/non-guest workloads.
I did some basic testing to compare this against pre-splitting the
directmap, and I think it's pretty close performance-wise, so it seems
like a good, safe default. It's also fairly trivial to add a kernel
cmdline param or something later if someone wants to optimize
specifically for SNP guests.
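
If we did add one, it would just be a trivial opt-in knob along the
lines of the sketch below; the parameter name and placement are purely
hypothetical:

  /*
   * Hypothetical opt-in to pre-split the entire directmap at SNP init
   * time, for hosts known to be dedicated to running SNP guests.
   */
  static bool snp_presplit_directmap __ro_after_init;

  static int __init snp_presplit_setup(char *str)
  {
          snp_presplit_directmap = true;
          return 0;
  }
  early_param("snp_presplit_directmap", snp_presplit_setup);
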
System: EPYC, 512 cores, 1.5TB memory
== Page-in/acceptance rate for 1 guest with 128 vCPUs/512M ==
directmap pre-split to 4K:
13.77 GB/s
11.88 GB/s
15.37 GB/s
directmap split on-demand:
14.77 GB/s
16.57 GB/s
15.53 GB/s
== Page-in/acceptance rate for 4 guests, each with 128 vCPUs/256GB ==
directmap pre-split to 4K:
37.35 GB/s
33.37 GB/s
29.55 GB/s
directmap split on-demand:
28.88 GB/s
25.13 GB/s
29.28 GB/s
== Page-in/acceptance rate for 8 guests, each with 128 vCPUs/160GB ==
directmap pre-split to 4K:
24.39 GB/s
27.16 GB/s
24.70 GB/s
directmap split on-demand:
26.12 GB/s
24.44 GB/s
22.66 GB/s
So for v2 we're using this on-demand approach, and I've tried to
capture some of this rationale in the commit log / comments for the
relevant patch.
Thanks,
Mike