From: Ackerley Tng <ackerleytng@google.com>
To: Sean Christopherson <seanjc@google.com>, Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org,  kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org,  mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com,  palmer@dabbelt.com,
	aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
	 brauner@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org,  xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	 vannapurve@google.com, mail@maciej.szmigiero.name,
	david@redhat.com,  michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com,  isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com,  suzuki.poulose@arm.com,
	steven.price@arm.com, quic_eberman@quicinc.com,
	 quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com,  quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	 catalin.marinas@arm.com, james.morse@arm.com,
	yuzenghui@huawei.com,  oliver.upton@linux.dev, maz@kernel.org,
	will@kernel.org, qperret@google.com,  keirf@google.com,
	roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	 jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com,  hughd@google.com, jthoughton@google.com,
	peterx@redhat.com,  pankaj.gupta@amd.com, ira.weiny@intel.com
Subject: Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
Date: Tue, 24 Jun 2025 16:40:18 -0700
Message-ID: <diqz1pr8lndp.fsf@ackerleytng-ctop.c.googlers.com>
In-Reply-To: <aEyhHgwQXW4zbx-k@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>> 
>> For memslots backed by guest_memfd with shared mem support, the KVM MMU
>> must always fault in pages from guest_memfd, and not from the host
>> userspace_addr. Update the fault handler to do so.
>
> And with a KVM_MEMSLOT_GUEST_MEMFD_ONLY flag, this becomes super obvious.
>
>> This patch also refactors related function names for accuracy:
>
> This patch.  And phrase changelogs as commands.
>
>> kvm_mem_is_private() returns true only when the current private/shared
>> state (in the CoCo sense) of the memory is private, and returns false if
>> the current state is shared explicitly or implicitly, e.g., belongs to a
>> non-CoCo VM.
>
> Again, state changes as commands.  For the above, it's not obvious if you're
> talking about the existing code versus the state of things after "this patch".
>

Will fix these, thanks!

>> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
>> fault in not just private memory, but more generally, from guest_memfd.
>
>> +static inline u8 kvm_max_level_for_order(int order)
>
> Do not use "inline" for functions that are visible only to the local compilation
> unit.  "inline" is just a hint, and modern compilers are smart enough to inline
> functions when appropriate without a hint.
>
> A longer explanation/rant here: https://lore.kernel.org/all/ZAdfX+S323JVWNZC@google.com
>

Will fix this!
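
For reference, the fixed declaration simply drops the hint and stays
static to the compilation unit:

	static u8 kvm_max_level_for_order(int order)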

>> +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
>> +					     gfn_t gfn, int max_level)
>> +{
>> +	int max_order;
>>  
>>  	if (max_level == PG_LEVEL_4K)
>>  		return PG_LEVEL_4K;
>
> This is dead code, the one and only caller has *just* checked for this condition.
>>  
>> -	host_level = host_pfn_mapping_level(kvm, gfn, slot);
>> -	return min(host_level, max_level);
>> +	max_order = kvm_gmem_mapping_order(slot, gfn);
>> +	return min(max_level, kvm_max_level_for_order(max_order));
>>  }
>
> ...
>
>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
>> -					u8 max_level, int gmem_order)
>> +static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>
> This is comically verbose.  C ain't Java.  And having two separate helpers makes
> it *really* hard to (a) even see there are TWO helpers in the first place, and
> (b) understand how they differ.
>
> Gah, and not your bug, but completely ignoring the RMP in kvm_mmu_max_mapping_level()
> is wrong.  It "works" because guest_memfd doesn't (yet) support dirty logging,
> and no one enables the NX hugepage mitigation on AMD hosts.
>
> We could plumb in the pfn and private info, but I don't really see the point,
> at least not at this time.
>
>> +					    struct kvm_page_fault *fault,
>> +					    int order)
>>  {
>> -	u8 req_max_level;
>> +	u8 max_level = fault->max_level;
>>  
>>  	if (max_level == PG_LEVEL_4K)
>>  		return PG_LEVEL_4K;
>>  
>> -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
>> +	max_level = min(kvm_max_level_for_order(order), max_level);
>>  	if (max_level == PG_LEVEL_4K)
>>  		return PG_LEVEL_4K;
>>  
>> -	req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
>> -	if (req_max_level)
>> -		max_level = min(max_level, req_max_level);
>> +	if (fault->is_private) {
>> +		u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
>
> Hmm, so the interesting thing here is that (IIRC) the RMP restrictions aren't
> just on the private pages, they also apply to the HYPERVISOR/SHARED pages.  (Don't
> quote me on that).
>
> Regardless, I'm leaning toward dropping the "private" part, and making SNP deal
> with the intricacies of the RMP:
>
> 	/* Some VM types have additional restrictions, e.g. SNP's RMP. */
> 	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
> 	if (req_max_level)
> 		max_level = min(max_level, req_max_level);
>
> Then we can get to something like:
>
> static int kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> 				      struct kvm_page_fault *fault)
> {
> 	int max_level, req_max_level;
>
> 	max_level = kvm_max_level_for_order(order);
> 	if (max_level == PG_LEVEL_4K)
> 		return PG_LEVEL_4K;
>
> 	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
> 	if (req_max_level)
> 		max_level = min(max_level, req_max_level);
>
> 	return max_level;
> }
>
> int kvm_mmu_max_mapping_level(struct kvm *kvm,
> 			      const struct kvm_memory_slot *slot, gfn_t gfn)
> {
> 	int max_level;
>
> 	max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
> 	if (max_level == PG_LEVEL_4K)
> 		return PG_LEVEL_4K;
>
> 	/* TODO: Comment goes here about KVM not supporting this path (yet). */

Which path does KVM not support?

> 	if (kvm_mem_is_private(kvm, gfn))
> 		return PG_LEVEL_4K;
>

Just making sure: this suggestion does take into account that
kvm_mem_is_private() will be querying guest_memfd for memory privacy
status, right? If so, the check below for kvm_is_memslot_gmem_only()
will only handle the cases where the memory is shared and only
guest_memfd is used for this gfn?

> 	if (kvm_is_memslot_gmem_only(slot)) {
> 		int order = kvm_gmem_mapping_order(slot, gfn);
>
> 		return min(max_level, kvm_gmem_max_mapping_level(kvm, order, NULL));
> 	}
>
> 	return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
> }
>
> static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
> 				    struct kvm_page_fault *fault)
> {
> 	struct kvm *kvm = vcpu->kvm;
> 	int order, r;
>
> 	if (!kvm_slot_has_gmem(fault->slot)) {
> 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> 		return -EFAULT;
> 	}
>
> 	r = kvm_gmem_get_pfn(kvm, fault->slot, fault->gfn, &fault->pfn,
> 			     &fault->refcounted_page, &order);
> 	if (r) {
> 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> 		return r;
> 	}
>
> 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> 	fault->max_level = kvm_gmem_max_mapping_level(kvm, order, fault);
>
> 	return RET_PF_CONTINUE;
> }
>
> int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
> {
> 	int level, rc;
> 	bool assigned;
>
> 	if (!sev_snp_guest(kvm))
> 		return 0;
>
> 	if (WARN_ON_ONCE(!fault) || !fault->is_private)
> 		return 0;
>
> 	rc = snp_lookup_rmpentry(fault->pfn, &assigned, &level);
> 	if (rc || !assigned)
> 		return PG_LEVEL_4K;
>
> 	return level;
> }

I like this. Thanks for the suggestion, I'll pass Fuad some patch(es)
for v13.

>> +/*
>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> + * private.
>> + *
>> + * A return value of false indicates that the gfn is explicitly or implicitly
>> + * shared (i.e., non-CoCo VMs).
>> + */
>>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>  {
>> -	return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> -	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> +	struct kvm_memory_slot *slot;
>> +
>> +	if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> +		return false;
>> +
>> +	slot = gfn_to_memslot(kvm, gfn);
>> +	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> +		/*
>> +		 * Without in-place conversion support, if a guest_memfd memslot
>> +		 * supports shared memory, then all the slot's memory is
>> +		 * considered not private, i.e., implicitly shared.
>> +		 */
>> +		return false;
>
> Why!?!?  Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> mappable guest_memfd.  You need to do that no matter what. 

Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
disallowed for gfn ranges whose slot is guest_memfd-only; I had missed
that. Where do people think we should check the mutual exclusivity?

In kvm_supported_mem_attributes(), I'm thinking that we should still
allow the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other,
non-guest_memfd-only gfn ranges. Or do people think we should just
disallow KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as even
one memslot is guest_memfd-only?

If we instead check mutual exclusivity when handling
kvm_vm_set_memory_attributes(), the ioctl would return -EINVAL whenever
any part of the range where KVM_MEMORY_ATTRIBUTE_PRIVATE is requested
intersects a range whose slot is guest_memfd-only.
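
Something like the following rough sketch, where
kvm_range_has_gmem_only_slot() is a made-up helper standing in for a
walk over the memslots that overlap the requested range:

	static int kvm_vm_set_memory_attributes(struct kvm *kvm, gfn_t start,
						gfn_t end, unsigned long attrs)
	{
		/*
		 * Reject PRIVATE for any range that intersects a
		 * guest_memfd-only slot, since such slots are implicitly
		 * shared (hypothetical helper, for illustration only).
		 */
		if ((attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE) &&
		    kvm_range_has_gmem_only_slot(kvm, start, end))
			return -EINVAL;

		/* ... existing attribute-setting logic ... */
		return 0;
	}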

> Then you don't need
> to sprinkle special case code all over the place.
>

That's true, thanks.

I guess the special-casing will come back when guest_memfd supports
conversions (and stores shareability): once conversions are supported,
a guest_memfd-only memslot would be checked with guest_memfd, while
other slots would still look up memory attributes with
kvm_get_memory_attributes().
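
As a rough sketch, with a made-up kvm_gmem_is_private() standing in for
guest_memfd's future shareability lookup:

	static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
	{
		struct kvm_memory_slot *slot;

		if (!IS_ENABLED(CONFIG_KVM_GMEM))
			return false;

		slot = gfn_to_memslot(kvm, gfn);

		/* With conversion support, guest_memfd tracks the state. */
		if (kvm_slot_has_gmem(slot) &&
		    kvm_gmem_memslot_supports_shared(slot))
			return kvm_gmem_is_private(slot, gfn);

		return kvm_get_memory_attributes(kvm, gfn) &
		       KVM_MEMORY_ATTRIBUTE_PRIVATE;
	}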

>> +	}
>> +
>> +	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>  }
>>  #else
>>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> -- 
>> 2.50.0.rc0.642.g800a2b2222-goog
>> 