Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Steven Price <steven.price@arm.com>
Cc: Michael Roth <michael.roth@amd.com>,
	kvm@vger.kernel.org,  Suzuki K Poulose <suzuki.poulose@arm.com>,
	"tabba@google.com" <tabba@google.com>,
	linux-coco@lists.linux.dev,  linux-mm@kvack.org,
	linux-crypto@vger.kernel.org, x86@kernel.org,
	 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	 pbonzini@redhat.com, isaku.yamahata@intel.com,
	ackerleytng@google.com,  vbabka@suse.cz, ashish.kalra@amd.com,
	nikunj.dadhania@amd.com,  jroedel@suse.de, pankaj.gupta@amd.com
Subject: Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory
Date: Fri, 9 Feb 2024 06:28:04 -0800	[thread overview]
Message-ID: <ZcY2VRsRd03UQdF7@google.com> (raw)
In-Reply-To: <e7125fcb-52b1-4942-9ae7-c85049e92e5c@arm.com>

On Fri, Feb 09, 2024, Steven Price wrote:
> On 16/10/2023 12:50, Michael Roth wrote:
> > In some cases, like with SEV-SNP, guest memory needs to be updated in a
> > platform-specific manner before it can be safely freed back to the host.
> > Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
> > allow for special handling of this sort when freeing memory in response
> > to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
> > ahead and define an arch-specific hook for x86 since it will be needed
> > for handling memory used for SEV-SNP guests.
> 
> Hi all,
> 
> Arm CCA has a similar need to prepare/unprepare memory (granule
> delegate/undelegate using our terminology) before it is used for
> protected memory.
> 
> However I see a problem with the current gmem implementation that the
> "invalidations" are not precise enough for our RMI API. When punching a
> hole in the memfd the code currently hits the same path (ending in
> kvm_unmap_gfn_range()) as if a VMA is modified in the same range (for
> the shared version).
>
> The Arm CCA architecture doesn't allow the protected memory to be removed and
> refaulted without the permission of the guest (the memory contents would be
> wiped in this case).

TDX behaves almost exactly like CCA.  Well, that's not technically true, strictly
speaking, as there are TDX APIs that do allow for *temporarily* marking mappings
!PRESENT, but those aren't in play for invalidation events like this.

SNP does allow zapping page table mappings, but fully removing a page, as PUNCH_HOLE
would do, is destructive, so SNP also behaves the same way for all intents and
purposes.

> One option that I've considered is to implement a seperate CCA ioctl to
> notify KVM whether the memory should be mapped protected.

That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE is for, no?

> The invalidations would then be ignored on ranges that are currently
> protected for this guest.

That's backwards.  Invalidations on a guest_memfd should affect only *protected*
mappings.  And for that, the plan/proposal is to plumb only_{shared,private} flags
into "struct kvm_gfn_range"[1] so that guest_memfd invalidations don't zap shared
mappings, and mmu_notifier invalidation don't zap private mappings.  Sample usage
in the TDX context[2] (disclaimer, I'm pretty sure I didn't write most of that
patch despite, I only provided a rough sketch).

[1] https://lore.kernel.org/all/20231027182217.3615211-13-seanjc@google.com
[2] https://lore.kernel.org/all/0b308fb6dd52bafe7153086c7f54bfad03da74b1.1705965635.git.isaku.yamahata@intel.com

> This 'solves' the problem nicely except for the case where the VMM
> deliberately punches holes in memory which the guest is using.

I don't see what problem there is to solve in this case.  PUNCH_HOLE is destructive,
so don't do that.

> The issue in this case is that there's no way of failing the punch hole
> operation - we can detect that the memory is in use and shouldn't be
> freed, but this callback doesn't give the opportunity to actually block
> the freeing of the memory.

Why is this KVM's problem?  E.g. the same exact thing happens without guest_memfd
if userspace munmap()s memory the guest is using.

> Sadly there's no easy way to map from a physical page in a gmem back to
> which VM (and where in the VM) the page is mapped. So actually ripping
> the page out of the appropriate VM isn't really possible in this case.

I don't follow.  guest_memfd has a 1:1 binding with a VM *and* a gfn, how can you
not know what exactly needs to be invalidated?

> How is this situation handled on x86? Is it possible to invalidate and
> then refault a protected page without affecting the memory contents? My
> guess is yes and that is a CCA specific problem - is my understanding
> correct?
> 
> My current thoughts for CCA are one of three options:
> 
> 1. Represent shared and protected memory as two separate memslots. This
> matches the underlying architecture more closely (the top address bit is
> repurposed as a 'shared' flag), but I don't like it because it's a
> deviation from other CoCo architectures (notably pKVM).
> 
> 2. Allow punch-hole to fail on CCA if the memory is mapped into the
> guest's protected space. Again, this is CCA being different and also
> creates nasty corner cases where the gmem descriptor could have to
> outlive the VMM - so looks like a potential source of memory leaks.
> 
> 3. 'Fix' the invalidation to provide more precise semantics. I haven't
> yet prototyped it but it might be possible to simply provide a flag from
> kvm_gmem_invalidate_begin specifying that the invalidation is for the
> protected memory. KVM would then only unmap the protected memory when
> this flag is set (avoiding issues with VMA updates causing spurious unmaps).
> 
> Fairly obviously (3) is my preferred option, but it relies on the
> guarantees that the "invalidation" is actually a precise set of
> addresses where the memory is actually being freed.

#3 is what we are planning for x86, and except for the only_{shared,private} flags,
the requisite functionality should already be in Linus' tree, though it does need
to be wired up for ARM.

next prev parent reply	other threads:[~2024-02-09 14:28 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 11:50 [PATCH RFC gmem v1 0/8] KVM: gmem hooks/changes needed for x86 (other archs?) Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 1/8] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 2/8] KVM: Use AS_INACCESSIBLE when creating guest_memfd inode Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 3/8] KVM: x86: Add gmem hook for initializing memory Michael Roth
2024-02-08 10:57   ` Suzuki K Poulose
2024-02-08 17:29     ` Sean Christopherson
2023-10-16 11:50 ` [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory Michael Roth
2024-02-09 10:11   ` Steven Price
2024-02-09 14:28     ` Sean Christopherson [this message]
2024-02-09 15:02       ` Steven Price
2024-02-09 15:13         ` Sean Christopherson
2024-03-11 17:24           ` Michael Roth
2024-03-12 20:26             ` Sean Christopherson
2024-03-13 17:11               ` Steven Price
2023-10-16 11:50 ` [PATCH RFC gmem v1 5/8] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 6/8] KVM: x86: Add KVM_X86_SNP_VM vm_type Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 7/8] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 8/8] KVM: x86: Determine shared/private faults based on vm_type Michael Roth
2024-01-31  1:13   ` Sean Christopherson
2024-02-08  0:24     ` Michael Roth
2024-02-08 17:27       ` Sean Christopherson
2024-02-08 17:30         ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZcY2VRsRd03UQdF7@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=ashish.kalra@amd.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jroedel@suse.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=michael.roth@amd.com \
    --cc=nikunj.dadhania@amd.com \
    --cc=pankaj.gupta@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=vbabka@suse.cz \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox