From: Quentin Perret <qperret@google.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Sean Christopherson <seanjc@google.com>,
Ackerley Tng <ackerleytng@google.com>,
Alexey Kardashevskiy <aik@amd.com>,
cgroups@vger.kernel.org, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
x86@kernel.org, akpm@linux-foundation.org,
binbin.wu@linux.intel.com, bp@alien8.de, brauner@kernel.org,
chao.p.peng@intel.com, chenhuacai@kernel.org, corbet@lwn.net,
dave.hansen@intel.com, dave.hansen@linux.intel.com,
david@redhat.com, dmatlack@google.com, erdemaktas@google.com,
fan.du@intel.com, fvdl@google.com, haibo1.xu@intel.com,
hannes@cmpxchg.org, hch@infradead.org, hpa@zytor.com,
hughd@google.com, ira.weiny@intel.com, isaku.yamahata@intel.com,
jack@suse.cz, james.morse@arm.com, jarkko@kernel.org,
jgowans@amazon.com, jhubbard@nvidia.com, jroedel@suse.de,
jthoughton@google.com, jun.miao@intel.com, kai.huang@intel.com,
keirf@google.com, kent.overstreet@linux.dev,
liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
mail@maciej.szmigiero.name, maobibo@loongson.cn,
mathieu.desnoyers@efficios.com, maz@kernel.org,
mhiramat@kernel.org, mhocko@kernel.org, mic@digikod.net,
michael.roth@amd.com, mingo@redhat.com, mlevitsk@redhat.com,
mpe@ellerman.id.au, muchun.song@linux.dev, nikunj@amd.com,
nsaenz@amazon.es, oliver.upton@linux.dev, palmer@dabbelt.com,
pankaj.gupta@amd.com, paul.walmsley@sifive.com,
pbonzini@redhat.com, peterx@redhat.com, pgonda@google.com,
prsampat@amd.com, pvorel@suse.cz, richard.weiyang@gmail.com,
rick.p.edgecombe@intel.com, rientjes@google.com,
rostedt@goodmis.org, roypat@amazon.co.uk, rppt@kernel.org,
shakeel.butt@linux.dev, shuah@kernel.org, steven.price@arm.com,
steven.sistare@oracle.com, suzuki.poulose@arm.com,
tabba@google.com, tglx@linutronix.de, thomas.lendacky@amd.com,
vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
willy@infradead.org, wyihan@google.com, xiaoyao.li@intel.com,
yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com,
zhiquan1.li@intel.com
Subject: Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
Date: Thu, 29 Jan 2026 11:10:12 +0000
Message-ID: <i22yykvttpc2e4expluuzucczqnetdnpee2wx2fzqwg7cnt45x@ovx7e7hok5iz>
In-Reply-To: <20260129011618.GA2307128@ziepe.ca>

Hi all,

On Wednesday 28 Jan 2026 at 21:16:18 (-0400), Jason Gunthorpe wrote:
> On Wed, Jan 28, 2026 at 05:03:27PM -0800, Sean Christopherson wrote:
>
> > For a dmabuf fd, the story is the same as guest_memfd. Unless private vs. shared
> > is all or nothing, and can never change, then the only entity that can track that
> > info is the owner of the dmabuf. And even if the private vs. shared attributes
> > are constant, tracking it external to KVM makes sense, because then the provider
> > can simply hardcode %true/%false.
>
> Oh my I had not given that bit any thought. My remarks were just about
> normal non-CC systems.
>
> So MMIO starts out shared, and then converts to private when the guest
> triggers it. It is not all or nothing, there are permanent shared
> holes in the MMIO ranges too.
>
> Beyond that I don't know what people are thinking.
>
> Clearly VFIO has to revoke and disable the DMABUF once any of it
> becomes private. VFIO will somehow have to know when it changes modes
> from the TSM subsystem.
>
> I guess we could have a special channel for KVM to learn the
> shared/private page by page from VFIO as some kind of "aware of CC"
> importer.

Slightly out of my depth, but I figured I should jump into this
discussion nonetheless; turns out dmabuf vs CoCo is a hot topic for
pKVM[*], so please bear with me :)

It occurred to me that lazily faulting a dmabuf page by page into a
guest isn't particularly useful, because the entire dmabuf is 'paged in'
by construction on the host side (regardless of whether that dmabuf is
backed by memory or MMIO). There is a weird edge case where a memslot
may not cover an entire dmabuf, but perhaps we could simply say 'don't
do that'. Faulting in the entire dmabuf in one go on the first guest
access would be good for performance, but it doesn't really solve any of
the problems you've listed above.

A not-fully-thought-through-and-possibly-ridiculous idea that crossed
my mind some time ago was to make KVM itself a proper dmabuf
importer. You'd essentially see a guest as a 'device' (probably with an
actual struct device representing it), and the stage-2 MMU in front of
it as its IOMMU. That could potentially allow KVM to implement
dma_map_ops for that guest 'device' by mapping/unmapping pages into its
stage-2 and such. And in order to get KVM to import a dmabuf, host
userspace would have to pass a dmabuf fd to
KVM_SET_USER_MEMORY_REGION2, at which point KVM could check properties
of the dmabuf before proceeding with the import. We could set different
expectations about the properties we want for CoCo vs non-CoCo guests
at that level (and yes, this could include having KVM use a special
channel with the exporter to check that).
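
To make the import-time check a bit more concrete, here is a toy
userspace model in C. It is not kernel code and does not use the real
dma-buf or KVM APIs: every name in it (toy_dmabuf, toy_guest,
toy_memslot, toy_import, and the cc_aware flag) is hypothetical and
only stands in for "properties KVM could check before importing". It
assumes the 'don't do that' rule for partial memslots and a
whole-buffer private transition for CoCo guests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exporter-side view of a dmabuf (toy model only). */
struct toy_dmabuf {
	size_t size;      /* size of the buffer in bytes */
	bool cc_aware;    /* assumption: exporter can handle shared/private */
	bool is_private;  /* whole-buffer state in this toy model */
};

/* Hypothetical guest: the 'device' behind the stage-2 'IOMMU'. */
struct toy_guest {
	bool is_coco;
};

/* Hypothetical memslot describing which part of the dmabuf is mapped. */
struct toy_memslot {
	size_t offset;    /* offset into the dmabuf */
	size_t size;      /* length of the mapping */
};

enum toy_import_result {
	TOY_IMPORT_OK = 0,
	TOY_IMPORT_PARTIAL = -1,      /* memslot does not cover the dmabuf */
	TOY_IMPORT_NOT_CC_AWARE = -2, /* CoCo guest, exporter not CC-aware */
};

/*
 * Model of the check done when userspace hands a dmabuf to the guest.
 * Non-CoCo guests only need full coverage; CoCo guests additionally
 * need a CC-aware exporter, and the whole buffer goes private at once.
 */
static int toy_import(const struct toy_guest *guest,
		      struct toy_dmabuf *buf,
		      const struct toy_memslot *slot)
{
	/* Enforce the 'don't do that' rule: no partial coverage. */
	if (slot->offset != 0 || slot->size != buf->size)
		return TOY_IMPORT_PARTIAL;

	if (guest->is_coco) {
		if (!buf->cc_aware)
			return TOY_IMPORT_NOT_CC_AWARE;
		/* Single, explicit transition of the entire dmabuf. */
		buf->is_private = true;
	}

	return TOY_IMPORT_OK;
}
```

In this sketch a non-CoCo guest importing a full-size memslot always
succeeds and leaves is_private untouched, while a CoCo guest either
gets the whole buffer flipped to private or gets a refusal. Sharing a
subset back with the host is deliberately not modelled.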

That has the nice benefit of having a clear KVM-level API to transition
an entire dmabuf fd to 'private' in one go in the CoCo case. And in the
non-CoCo case, we avoid the unnecessary lazy faulting of the dmabuf.

It gets really funny when a CoCo guest decides to share back a subset of
that dmabuf with the host, and I'm still wrapping my head around how
we'd make that work, but at this point I'm ready to be told how all the
above already doesn't work and that I should go back to the peanut
gallery :-)

Cheers,
Quentin

[*] https://www.youtube.com/watch?v=zaBxoyRepzA&list=PLW3ep1uCIRfxwmllXTOA2txfDWN6vUOHp&index=35