From: Ackerley Tng <ackerleytng@google.com>
To: David Hildenbrand <david@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Vishal Annapurve <vannapurve@google.com>
Cc: Fuad Tabba <tabba@google.com>,
kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
mpe@ellerman.id.au, anup@brainfault.org,
paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
brauner@kernel.org, willy@infradead.org,
akpm@linux-foundation.org, xiaoyao.li@intel.com,
yilun.xu@intel.com, chao.p.peng@linux.intel.com,
jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
mail@maciej.szmigiero.name, michael.roth@amd.com,
wei.w.wang@intel.com, liam.merwick@oracle.com,
isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
suzuki.poulose@arm.com, steven.price@arm.com,
quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
quic_pheragu@quicinc.com, catalin.marinas@arm.com,
james.morse@arm.com, yuzenghui@huawei.com,
oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
hughd@google.com, jthoughton@google.com, peterx@redhat.com,
pankaj.gupta@amd.com
Subject: Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
Date: Tue, 06 May 2025 13:46:58 -0700
Message-ID: <diqzh61xqxfh.fsf@ackerleytng-ctop.c.googlers.com>
In-Reply-To: <39ea3946-6683-462e-af5d-fe7d28ab7d00@redhat.com>
David Hildenbrand <david@redhat.com> writes:
> On 06.05.25 15:58, Sean Christopherson wrote:
>> On Mon, May 05, 2025, Vishal Annapurve wrote:
>>> On Mon, May 5, 2025 at 10:17 PM Vishal Annapurve <vannapurve@google.com> wrote:
>>>>
>>>> On Mon, May 5, 2025 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
>>>>>> ...
>>>>>> And not worry about lpage_info for the time being, until we actually do
>>>>>> support larger pages.
>>>>>
>>>>> I don't want to completely punt on this, because if it gets messy, then I want
>>>>> to know now and have a solution in hand, not find out N months from now.
>>>>>
>>>>> That said, I don't expect it to be difficult. What we could punt on is
>>>>> performance of the lookups, which is the real reason KVM maintains the rather
>>>>> expensive disallow_lpage array.
>>>>>
>>>>> And that said, memslots can only bind to one guest_memfd instance, so I don't
>>>>> immediately see any reason why the guest_memfd ioctl() couldn't process the
>>>>> slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
>>>>> guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
>>>>
>>>> I am missing the point here to update KVM_LPAGE_MIXED_FLAG for the
>>>> scenarios where in-place memory conversion will be supported with
>>>> guest_memfd. As guest_memfd support for hugepages comes with the
>>>> design that hugepages can't have mixed attributes. i.e. max_order
>>>> returned by get_pfn will always have the same attributes for the folio
>>>> range.
>>
>> Oh, if this will naturally be handled by guest_memfd, then do that. I was purely
>> reacting to David's suggestion to "not worry about lpage_info for the time being,
>> until we actually do support larger pages".
>>
>>>> Is your suggestion around using guest_memfd ioctl() to also toggle
>>>> memory attributes for the scenarios where guest_memfd instance doesn't
>>>> have in-place memory conversion feature enabled?
>>>
>>> Reading more into your response, I guess your suggestion is about
>>> covering different usecases present today and new usecases which may
>>> land in future, that rely on kvm_lpage_info for faster lookup. If so,
>>> then it should be easy to modify guest_memfd ioctl to update
>>> kvm_lpage_info as you suggested.
>>
>> Nah, I just missed/forgot that using a single guest_memfd for private and shared
>> would naturally need to split the folio and thus this would Just Work.
Sean, David, I'm circling back to make sure I'm following the discussion
correctly before Fuad sends out the next revision of this series.
>
> Yeah, I ignored that fact as well. So essentially, this patch should be
> mostly good for now.
>
From here [1], these changes will make it into v9:

+ Rename kvm_max_private_mapping_level to kvm_max_gmem_mapping_level
+ Rename kvm_mmu_faultin_pfn_private to kvm_mmu_faultin_pfn_gmem
> Only kvm_mmu_hugepage_adjust() must be taught to not rely on
> fault->is_private.
>
I think fault->is_private should contribute to determining the max
mapping level.
By the time kvm_mmu_hugepage_adjust() is called:

* For CoCo VMs using guest_memfd only for private memory,
  fault->is_private would already have been checked to align with
  kvm->mem_attr_array.

* For CoCo VMs using guest_memfd for both private and shared memory,
  fault->is_private would already have been checked to align with
  guest_memfd's shareability.

* For non-CoCo VMs using guest_memfd, fault->is_private would be false.

Hence fault->is_private can be relied on when calling
kvm_mmu_hugepage_adjust().
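To be concrete about what "checked to align" means, the check I have in
mind is along the lines of the existing mem_attr check in the x86 fault
path; this is a sketch from memory, and for guest_memfds that support
shared memory the analogous comparison would be against guest_memfd's
shareability rather than kvm->mem_attr_array:

        /*
         * Sketch of the alignment check (modeled on the existing check
         * in the x86 fault path); shared-capable guest_memfds would
         * compare against guest_memfd's shareability instead.
         */
        if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
                kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
                return -EFAULT;
        }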
If fault->is_private, there is no host userspace mapping to check, so in
__kvm_mmu_max_mapping_level() we should skip querying host page tables.

If !fault->is_private and the VM uses guest_memfd only for private
memory, shared ranges are backed by host userspace mappings, so we
should query host page tables.

If !fault->is_private and the VM uses guest_memfd for both shared and
private memory, we should not query host page tables.

If !fault->is_private and the VM is a non-CoCo VM using guest_memfd, we
should not query host page tables.
I propose to rename the parameter is_private to skip_host_page_tables,
so

-       if (is_private)
+       if (skip_host_page_tables)
                return max_level;

and pass

        skip_host_page_tables = fault->is_private ||
                                kvm_gmem_memslot_supports_shared(fault->slot);

where kvm_gmem_memslot_supports_shared() checks the inode in the memslot
for GUEST_MEMFD_FLAG_SUPPORT_SHARED.
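For concreteness, a minimal sketch of the helper I have in mind; where
the flags live (inode->i_private below) and the refcounting/locking
around slot->gmem.file are assumptions, not settled code:

        static bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
        {
                /*
                 * Assumes the guest_memfd creation flags are stashed in
                 * inode->i_private; taking a reference on the gmem file
                 * is elided for brevity.
                 */
                struct file *file = slot ? READ_ONCE(slot->gmem.file) : NULL;

                if (!file)
                        return false;

                return (u64)file_inode(file)->i_private &
                        GUEST_MEMFD_FLAG_SUPPORT_SHARED;
        }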
For recover_huge_pages_range(), the other user of
__kvm_mmu_max_mapping_level(), currently there's no prior call to
kvm_gmem_get_pfn() to get max_order or max_level, so I propose to call
__kvm_mmu_max_mapping_level() with
        if (kvm_gmem_memslot_supports_shared(slot)) {
                max_level = kvm_gmem_max_mapping_level(slot, gfn);
                skip_host_page_tables = true;
        } else {
                max_level = PG_LEVEL_NUM;
                skip_host_page_tables = kvm_slot_has_gmem(slot) &&
                                        kvm_mem_is_private(kvm, gfn);
        }
Without 1G support, kvm_gmem_max_mapping_level(slot, gfn) would always
return PG_LEVEL_4K.

With 1G support, kvm_gmem_max_mapping_level(slot, gfn) would return the
mapping level corresponding to the order of the folio backing the
offset for that gfn.
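Something along these lines would do, mirroring the existing
order-to-level conversion in the x86 MMU; kvm_gmem_mapping_order() is a
stand-in for however the 1G series ends up reporting the order of the
folio backing that offset:

        static int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
                                              gfn_t gfn)
        {
                /* Stand-in helper for querying the backing folio's order. */
                int order = kvm_gmem_mapping_order(slot, gfn);

                if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
                        return PG_LEVEL_1G;
                if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
                        return PG_LEVEL_2M;
                return PG_LEVEL_4K;
        }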
> Once we support large folios in guest_memfd, only the "alignment"
> consideration might have to be taken into account.
>
I'll handle this alignment as part of the 1G page support series (it
won't be part of Fuad's first-stage series) [2].
> Anything else?
>
> --
> Cheers,
>
> David / dhildenb
[1] https://lore.kernel.org/all/20250430165655.605595-7-tabba@google.com/
[2] https://lore.kernel.org/all/diqz1pt1sfw8.fsf@ackerleytng-ctop.c.googlers.com/