From: Sean Christopherson <seanjc@google.com>
To: Steven Price <steven.price@arm.com>
Cc: Michael Roth <michael.roth@amd.com>,
kvm@vger.kernel.org, Suzuki K Poulose <suzuki.poulose@arm.com>,
"tabba@google.com" <tabba@google.com>,
linux-coco@lists.linux.dev, linux-mm@kvack.org,
linux-crypto@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
pbonzini@redhat.com, isaku.yamahata@intel.com,
ackerleytng@google.com, vbabka@suse.cz, ashish.kalra@amd.com,
nikunj.dadhania@amd.com, jroedel@suse.de, pankaj.gupta@amd.com
Subject: Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory
Date: Fri, 9 Feb 2024 06:28:04 -0800 [thread overview]
Message-ID: <ZcY2VRsRd03UQdF7@google.com> (raw)
In-Reply-To: <e7125fcb-52b1-4942-9ae7-c85049e92e5c@arm.com>
On Fri, Feb 09, 2024, Steven Price wrote:
> On 16/10/2023 12:50, Michael Roth wrote:
> > In some cases, like with SEV-SNP, guest memory needs to be updated in a
> > platform-specific manner before it can be safely freed back to the host.
> > Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
> > allow for special handling of this sort when freeing memory in response
> > to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
> > ahead and define an arch-specific hook for x86 since it will be needed
> > for handling memory used for SEV-SNP guests.
>
> Hi all,
>
> Arm CCA has a similar need to prepare/unprepare memory (granule
> delegate/undelegate using our terminology) before it is used for
> protected memory.
>
> However I see a problem with the current gmem implementation that the
> "invalidations" are not precise enough for our RMI API. When punching a
> hole in the memfd the code currently hits the same path (ending in
> kvm_unmap_gfn_range()) as if a VMA is modified in the same range (for
> the shared version).
>
> The Arm CCA architecture doesn't allow the protected memory to be removed and
> refaulted without the permission of the guest (the memory contents would be
> wiped in this case).
TDX behaves almost exactly like CCA. Well, that's not technically true, strictly
speaking, as there are TDX APIs that do allow for *temporarily* marking mappings
!PRESENT, but those aren't in play for invalidation events like this.
SNP does allow zapping page table mappings, but fully removing a page, as PUNCH_HOLE
would do, is destructive, so SNP also behaves the same way for all intents and
purposes.
> One option that I've considered is to implement a seperate CCA ioctl to
> notify KVM whether the memory should be mapped protected.
That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE is for, no?
> The invalidations would then be ignored on ranges that are currently
> protected for this guest.
That's backwards. Invalidations on a guest_memfd should affect only *protected*
mappings. And for that, the plan/proposal is to plumb only_{shared,private} flags
into "struct kvm_gfn_range"[1] so that guest_memfd invalidations don't zap shared
mappings, and mmu_notifier invalidation don't zap private mappings. Sample usage
in the TDX context[2] (disclaimer, I'm pretty sure I didn't write most of that
patch despite, I only provided a rough sketch).
[1] https://lore.kernel.org/all/20231027182217.3615211-13-seanjc@google.com
[2] https://lore.kernel.org/all/0b308fb6dd52bafe7153086c7f54bfad03da74b1.1705965635.git.isaku.yamahata@intel.com
> This 'solves' the problem nicely except for the case where the VMM
> deliberately punches holes in memory which the guest is using.
I don't see what problem there is to solve in this case. PUNCH_HOLE is destructive,
so don't do that.
> The issue in this case is that there's no way of failing the punch hole
> operation - we can detect that the memory is in use and shouldn't be
> freed, but this callback doesn't give the opportunity to actually block
> the freeing of the memory.
Why is this KVM's problem? E.g. the same exact thing happens without guest_memfd
if userspace munmap()s memory the guest is using.
> Sadly there's no easy way to map from a physical page in a gmem back to
> which VM (and where in the VM) the page is mapped. So actually ripping
> the page out of the appropriate VM isn't really possible in this case.
I don't follow. guest_memfd has a 1:1 binding with a VM *and* a gfn, how can you
not know what exactly needs to be invalidated?
> How is this situation handled on x86? Is it possible to invalidate and
> then refault a protected page without affecting the memory contents? My
> guess is yes and that is a CCA specific problem - is my understanding
> correct?
>
> My current thoughts for CCA are one of three options:
>
> 1. Represent shared and protected memory as two separate memslots. This
> matches the underlying architecture more closely (the top address bit is
> repurposed as a 'shared' flag), but I don't like it because it's a
> deviation from other CoCo architectures (notably pKVM).
>
> 2. Allow punch-hole to fail on CCA if the memory is mapped into the
> guest's protected space. Again, this is CCA being different and also
> creates nasty corner cases where the gmem descriptor could have to
> outlive the VMM - so looks like a potential source of memory leaks.
>
> 3. 'Fix' the invalidation to provide more precise semantics. I haven't
> yet prototyped it but it might be possible to simply provide a flag from
> kvm_gmem_invalidate_begin specifying that the invalidation is for the
> protected memory. KVM would then only unmap the protected memory when
> this flag is set (avoiding issues with VMA updates causing spurious unmaps).
>
> Fairly obviously (3) is my preferred option, but it relies on the
> guarantees that the "invalidation" is actually a precise set of
> addresses where the memory is actually being freed.
#3 is what we are planning for x86, and except for the only_{shared,private} flags,
the requisite functionality should already be in Linus' tree, though it does need
to be wired up for ARM.
next prev parent reply other threads:[~2024-02-09 14:28 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-16 11:50 [PATCH RFC gmem v1 0/8] KVM: gmem hooks/changes needed for x86 (other archs?) Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 1/8] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 2/8] KVM: Use AS_INACCESSIBLE when creating guest_memfd inode Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 3/8] KVM: x86: Add gmem hook for initializing memory Michael Roth
2024-02-08 10:57 ` Suzuki K Poulose
2024-02-08 17:29 ` Sean Christopherson
2023-10-16 11:50 ` [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory Michael Roth
2024-02-09 10:11 ` Steven Price
2024-02-09 14:28 ` Sean Christopherson [this message]
2024-02-09 15:02 ` Steven Price
2024-02-09 15:13 ` Sean Christopherson
2024-03-11 17:24 ` Michael Roth
2024-03-12 20:26 ` Sean Christopherson
2024-03-13 17:11 ` Steven Price
2023-10-16 11:50 ` [PATCH RFC gmem v1 5/8] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 6/8] KVM: x86: Add KVM_X86_SNP_VM vm_type Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 7/8] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
2023-10-16 11:50 ` [PATCH RFC gmem v1 8/8] KVM: x86: Determine shared/private faults based on vm_type Michael Roth
2024-01-31 1:13 ` Sean Christopherson
2024-02-08 0:24 ` Michael Roth
2024-02-08 17:27 ` Sean Christopherson
2024-02-08 17:30 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZcY2VRsRd03UQdF7@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=ashish.kalra@amd.com \
--cc=isaku.yamahata@intel.com \
--cc=jroedel@suse.de \
--cc=kvm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=michael.roth@amd.com \
--cc=nikunj.dadhania@amd.com \
--cc=pankaj.gupta@amd.com \
--cc=pbonzini@redhat.com \
--cc=steven.price@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox