From: Ackerley Tng <ackerleytng@google.com>
To: Michael Roth <michael.roth@amd.com>
Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
david@redhat.com, tabba@google.com, vannapurve@google.com,
ira.weiny@intel.com, thomas.lendacky@amd.com,
pbonzini@redhat.com, seanjc@google.com, vbabka@suse.cz,
joro@8bytes.org, pratikrajesh.sampat@amd.com,
liam.merwick@oracle.com, yan.y.zhao@intel.com, aik@amd.com
Subject: Re: [PATCH RFC v1 1/5] KVM: guest_memfd: Remove preparation tracking
Date: Thu, 18 Sep 2025 09:29:14 +0000 [thread overview]
Message-ID: <diqza52sp0r9.fsf@google.com> (raw)
In-Reply-To: <diqzcy7op5wg.fsf@google.com>
Ackerley Tng <ackerleytng@google.com> writes:
> Ackerley Tng <ackerleytng@google.com> writes:
>
>> Michael Roth <michael.roth@amd.com> writes:
>>
>>> On Mon, Aug 25, 2025 at 04:08:19PM -0700, Ackerley Tng wrote:
>>>> Michael Roth <michael.roth@amd.com> writes:
>>>>
>>>>
>>>> [...snip...]
>>>>
>>>> > @@ -435,13 +430,7 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>>> > static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>>>> > gfn_t gfn, struct folio *folio)
>>>> > {
>>>> > - unsigned long nr_pages, i;
>>>> > pgoff_t index;
>>>> > - int r;
>>>> > -
>>>> > - nr_pages = folio_nr_pages(folio);
>>>> > - for (i = 0; i < nr_pages; i++)
>>>> > - clear_highpage(folio_page(folio, i));
>>>> >
>>>> > /*
>>>> > * Preparing huge folios should always be safe, since it should
>>>> > @@ -459,11 +448,8 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>>>>
>>>> While working on HugeTLB support for guest_memfd, I added a test that
>>>> tries to map a non-huge-page-aligned gmem.pgoff to a huge-page aligned
>>>> gfn.
>>>>
>>>> I understand that config would destroy the performance advantages of
>>>> huge pages, but I think the test is necessary since Yan brought up the
>>>> use case here [1].
>>>>
>>>> The conclusion in that thread, I believe, was to allow binding of
>>>> unaligned GFNs to offsets, but disallow large pages in that case. The
>>>> next series for guest_memfd HugeTLB support will include a fix similar
>>>> to this [2].
>>>>
>>>> While testing, I hit this WARN_ON with a non-huge-page-aligned
>>>> gmem.pgoff.
>>>>
>>>> > WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
>>>>
>>>> Do you all think this WARN_ON can be removed?
>>>
>>> I think so.. I actually ended up dropping this WARN_ON() for a similar
>>> reason:
>>>
>>
>> Thanks for confirming!
>>
>
> Dropping this WARN_ON() actually further highlights the importance of
> separating preparedness from folio flags (and the folio).
>
> With huge pages being supported in guest_memfd, it's possible for just
> part of a folio to be mapped into the stage 2 page tables. One example
> of this is if userspace were to request populating just 2M in a 1G
> page. If preparedness were recorded in folio flags, then the entire 1G
> would be considered prepared even though only 2M of that page was
> prepared (updated in RMP tables).
>
> So I do support making the uptodate flag only mean zeroed, and taking
> preparedness out of the picture.
>
> With this change, kvm_gmem_prepare_folio() and
> __kvm_gmem_prepare_folio() seems to be a misnomer, since conceptually
> we're not preparing a folio, we can't assume that we're always preparing
> a whole folio once huge pages are in the picture.
>
> What do you all think of taking this even further? Instead of keeping
> kvm_gmem_prepare_folio() within guest_memfd, what if we
>
> 1. Focus on preparing pfn ranges (retaining kvm_arch_gmem_prepare() is
> good) and not folios
>
> 2. More clearly and directly associate preparing pfns with mapping
> (rather than with getting a folio to be mapped) into stage 2 page
> tables
>
Thought about this a little more and maybe this is not quite accurate
either. On a conversion, for SNP, does the memory actually need to be
unmapped from the NPTs, or would it be possible to just flip the C bit?
If conversion only involves flipping the C bit and updating RMP tables,
then perhaps preparation and invalidation shouldn't be associated with
mapping, but directly with conversions, or setting page private/shared
state.
> What I have in mind for (2) is to update kvm_tdp_mmu_map() to do an
> arch-specific call, when fault->is_private, to call
> kvm_arch_gmem_prepare() just before mapping the pfns and when the
> mapping level is known.
>
> The cleanup counterpart would then be to call kvm_arch_gmem_invalidate()
> somewhere in tdp_mmu_zap_leafs().
>
> kvm_arch_gmem_prepare() and kvm_arch_gmem_invalidate() would then drop
> out of guest_memfd and be moved back into the core of KVM.
>
> Technically these two functions don't even need to have gmem in the name
> since any memory can be prepared in the SNP sense, though for the
> foreseeable future gmem is the only memory supported for private memory
> in CoCo VMs.
>
> Also, to push this along a little, I feel that this series does a few
> things. What do you all think of re-focusing this series (or a part of
> this series) as "Separating SNP preparation from guest_memfd" or
> "Separating arch-specific preparation from guest_memfd"?
>
>>>
>>> [...snip...]
>>>
next prev parent reply other threads:[~2025-09-18 9:29 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 0:53 [PATCH RFC v1 0/5] KVM: guest_memfd: Support in-place conversion for CoCo VMs Michael Roth
2025-06-13 0:53 ` [PATCH RFC v1 1/5] KVM: guest_memfd: Remove preparation tracking Michael Roth
2025-07-15 12:47 ` Vishal Annapurve
2025-07-15 22:55 ` Michael Roth
2025-08-25 23:08 ` Ackerley Tng
2025-09-16 23:33 ` Michael Roth
2025-09-18 6:31 ` Ackerley Tng
2025-09-18 7:38 ` Ackerley Tng
2025-09-18 9:29 ` Ackerley Tng [this message]
2025-11-07 13:05 ` Yan Zhao
2025-06-13 0:53 ` [PATCH RFC v1 2/5] KVM: guest_memfd: Only access KVM memory attributes when appropriate Michael Roth
2025-06-13 0:53 ` [PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared Michael Roth
2025-07-15 13:20 ` Vishal Annapurve
2025-07-15 22:48 ` Michael Roth
2025-07-16 13:04 ` Vishal Annapurve
2025-06-13 0:53 ` [PATCH RFC v1 4/5] KVM: guest_memfd: Don't prepare shared folios Michael Roth
2025-06-13 0:54 ` [PATCH RFC v1 5/5] KVM: SEV: Make SNP_LAUNCH_UPDATE ignore 'uaddr' if guest_memfd is shareable Michael Roth
2025-06-13 7:36 ` [PATCH RFC v1 0/5] KVM: guest_memfd: Support in-place conversion for CoCo VMs David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=diqza52sp0r9.fsf@google.com \
--to=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=david@redhat.com \
--cc=ira.weiny@intel.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=liam.merwick@oracle.com \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=pratikrajesh.sampat@amd.com \
--cc=seanjc@google.com \
--cc=tabba@google.com \
--cc=thomas.lendacky@amd.com \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox