From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>, Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
brauner@kernel.org, willy@infradead.org,
akpm@linux-foundation.org, xiaoyao.li@intel.com,
yilun.xu@intel.com, chao.p.peng@linux.intel.com,
jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
vannapurve@google.com, mail@maciej.szmigiero.name,
david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
will@kernel.org, qperret@google.com, keirf@google.com,
roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
fvdl@google.com, hughd@google.com, jthoughton@google.com,
peterx@redhat.com, pankaj.gupta@amd.com, ira.weiny@intel.com
Subject: Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
Date: Tue, 24 Jun 2025 16:40:18 -0700
Message-ID: <diqz1pr8lndp.fsf@ackerleytng-ctop.c.googlers.com>
In-Reply-To: <aEyhHgwQXW4zbx-k@google.com>
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> For memslots backed by guest_memfd with shared mem support, the KVM MMU
>> must always fault in pages from guest_memfd, and not from the host
>> userspace_addr. Update the fault handler to do so.
>
> And with a KVM_MEMSLOT_GUEST_MEMFD_ONLY flag, this becomes super obvious.
>
>> This patch also refactors related function names for accuracy:
>
> This patch. And phrase changelogs as commands.
>
>> kvm_mem_is_private() returns true only when the current private/shared
>> state (in the CoCo sense) of the memory is private, and returns false if
>> the current state is shared explicitly or implicitly, e.g., belongs to a
>> non-CoCo VM.
>
> Again, state changes as commands. For the above, it's not obvious if you're
> talking about the existing code versus the state of things after "this patch".
>
>
Will fix these, thanks!
>> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
>> fault in not just private memory, but more generally, from guest_memfd.
>
>> +static inline u8 kvm_max_level_for_order(int order)
>
> Do not use "inline" for functions that are visible only to the local compilation
> unit. "inline" is just a hint, and modern compilers are smart enough to inline
> functions when appropriate without a hint.
>
> A longer explanation/rant here: https://lore.kernel.org/all/ZAdfX+S323JVWNZC@google.com
>
Will fix this!
>> +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
>> + gfn_t gfn, int max_level)
>> +{
>> + int max_order;
>>
>> if (max_level == PG_LEVEL_4K)
>> return PG_LEVEL_4K;
>
> This is dead code, the one and only caller has *just* checked for this condition.
>>
>> - host_level = host_pfn_mapping_level(kvm, gfn, slot);
>> - return min(host_level, max_level);
>> + max_order = kvm_gmem_mapping_order(slot, gfn);
>> + return min(max_level, kvm_max_level_for_order(max_order));
>> }
>
> ...
>
>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
>> - u8 max_level, int gmem_order)
>> +static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>
> This is comically verbose. C ain't Java. And having two separate helpers makes
> it *really* hard to (a) even see there are TWO helpers in the first place, and
> (b) understand how they differ.
>
> Gah, and not your bug, but completely ignoring the RMP in kvm_mmu_max_mapping_level()
> is wrong. It "works" because guest_memfd doesn't (yet) support dirty logging,
> no one enables the NX hugepage mitigation on AMD hosts.
>
> We could plumb in the pfn and private info, but I don't really see the point,
> at least not at this time.
>
>> + struct kvm_page_fault *fault,
>> + int order)
>> {
>> - u8 req_max_level;
>> + u8 max_level = fault->max_level;
>>
>> if (max_level == PG_LEVEL_4K)
>> return PG_LEVEL_4K;
>>
>> - max_level = min(kvm_max_level_for_order(gmem_order), max_level);
>> + max_level = min(kvm_max_level_for_order(order), max_level);
>> if (max_level == PG_LEVEL_4K)
>> return PG_LEVEL_4K;
>>
>> - req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
>> - if (req_max_level)
>> - max_level = min(max_level, req_max_level);
>> + if (fault->is_private) {
>> + u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
>
> Hmm, so the interesting thing here is that (IIRC) the RMP restrictions aren't
> just on the private pages, they also apply to the HYPERVISOR/SHARED pages. (Don't
> quote me on that).
>
> Regardless, I'm leaning toward dropping the "private" part, and making SNP deal
> with the intricacies of the RMP:
>
> /* Some VM types have additional restrictions, e.g. SNP's RMP. */
> req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
> if (req_max_level)
> max_level = min(max_level, req_max_level);
>
> Then we can get to something like:
>
> static int kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> struct kvm_page_fault *fault)
> {
> int max_level, req_max_level;
>
> max_level = kvm_max_level_for_order(order);
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
> if (req_max_level)
> max_level = min(max_level, req_max_level);
>
> return max_level;
> }
>
> int kvm_mmu_max_mapping_level(struct kvm *kvm,
> const struct kvm_memory_slot *slot, gfn_t gfn)
> {
> int max_level;
>
> max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> /* TODO: Comment goes here about KVM not supporting this path (yet). */
Which path does KVM not support?
> if (kvm_mem_is_private(kvm, gfn))
> return PG_LEVEL_4K;
>
Just making sure - this suggestion does take into account that
kvm_mem_is_private() will be querying guest_memfd for memory privacy
status, right? So the check below for kvm_is_memslot_gmem_only() will
only be handling the cases where the memory is shared, and only
guest_memfd is used for this gfn?
> if (kvm_is_memslot_gmem_only(slot)) {
> int order = kvm_gmem_mapping_order(slot, gfn);
>
> return min(max_level, kvm_gmem_max_mapping_level(kvm, order, NULL));
> }
>
> return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
> }
>
> static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> struct kvm *kvm = vcpu->kvm;
> int order, r;
>
> if (!kvm_slot_has_gmem(fault->slot)) {
> kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> return -EFAULT;
> }
>
> r = kvm_gmem_get_pfn(kvm, fault->slot, fault->gfn, &fault->pfn,
> &fault->refcounted_page, &order);
> if (r) {
> kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> return r;
> }
>
> fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> fault->max_level = kvm_gmem_max_mapping_level(kvm, order, fault);
>
> return RET_PF_CONTINUE;
> }
>
> int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
> {
> int level, rc;
> bool assigned;
>
> if (!sev_snp_guest(kvm))
> return 0;
>
> if (WARN_ON_ONCE(!fault) || !fault->is_private)
> return 0;
>
> rc = snp_lookup_rmpentry(fault->pfn, &assigned, &level);
> if (rc || !assigned)
> return PG_LEVEL_4K;
>
> return level;
> }
I like this. Thanks for the suggestion, I'll pass Fuad some patch(es)
for v13.
>> +/*
>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> + * private.
>> + *
>> + * A return value of false indicates that the gfn is explicitly or implicitly
>> + * shared (i.e., non-CoCo VMs).
>> + */
>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> {
>> - return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> - kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> + struct kvm_memory_slot *slot;
>> +
>> + if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> + return false;
>> +
>> + slot = gfn_to_memslot(kvm, gfn);
>> + if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> + /*
>> + * Without in-place conversion support, if a guest_memfd memslot
>> + * supports shared memory, then all the slot's memory is
>> + * considered not private, i.e., implicitly shared.
>> + */
>> + return false;
>
> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> mappable guest_memfd. You need to do that no matter what.
Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
disallowed for gfn ranges whose slot is guest_memfd-only; I had missed
that. Where do people think we should check the mutual exclusivity?
In kvm_supported_mem_attributes(), I'm thinking that we should still
allow the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other,
non-guest_memfd-only gfn ranges. Or do people think we should disallow
KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as any one
memslot is guest_memfd-only?
If we check mutual exclusivity when handling
kvm_vm_set_memory_attributes(), then whenever any part of the range
where KVM_MEMORY_ATTRIBUTE_PRIVATE is requested intersects a
guest_memfd-only memslot, the ioctl would return -EINVAL.
> Then you don't need
> to sprinkle special case code all over the place.
>
That's true, thanks.
I guess the special-casing will come back once guest_memfd supports
conversions (and stores shareability): for a guest_memfd-only memslot,
check with guest_memfd; otherwise, look up the memory attributes with
kvm_get_memory_attributes().
>> + }
>> +
>> + return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> }
>> #else
>> static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> --
>> 2.50.0.rc0.642.g800a2b2222-goog
>>