From: Frank van der Linden <fvdl@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Nikita Kalyazin <kalyazin@amazon.co.uk>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"kernel@xen0n.name" <kernel@xen0n.name>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"loongarch@lists.linux.dev" <loongarch@lists.linux.dev>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"maz@kernel.org" <maz@kernel.org>,
"oupton@kernel.org" <oupton@kernel.org>,
"joey.gouly@arm.com" <joey.gouly@arm.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"will@kernel.org" <will@kernel.org>,
"tglx@kernel.org" <tglx@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"luto@kernel.org" <luto@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"willy@infradead.org" <willy@infradead.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"david@kernel.org" <david@kernel.org>,
"lorenzo.stoakes@oracle.com" <lorenzo.stoakes@oracle.com>,
"vbabka@kernel.org" <vbabka@kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"surenb@google.com" <surenb@google.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"andrii@kernel.org" <andrii@kernel.org>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"song@kernel.org" <song@kernel.org>,
"yonghong.song@linux.dev" <yonghong.song@linux.dev>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"sdf@fomichev.me" <sdf@fomichev.me>,
"haoluo@google.com" <haoluo@google.com>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"jannh@google.com" <jannh@google.com>,
"pfalcato@suse.de" <pfalcato@suse.de>,
"skhan@linuxfoundation.org" <skhan@linuxfoundation.org>,
"riel@surriel.com" <riel@surriel.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"jgross@suse.com" <jgross@suse.com>,
"yu-cheng.yu@intel.com" <yu-cheng.yu@intel.com>,
"kas@kernel.org" <kas@kernel.org>,
"coxu@redhat.com" <coxu@redhat.com>,
"ackerleytng@google.com" <ackerleytng@google.com>,
"yosry@kernel.org" <yosry@kernel.org>,
"ajones@ventanamicro.com" <ajones@ventanamicro.com>,
"maobibo@loongson.cn" <maobibo@loongson.cn>,
"tabba@google.com" <tabba@google.com>,
"prsampat@amd.com" <prsampat@amd.com>,
"wu.fei9@sanechips.com.cn" <wu.fei9@sanechips.com.cn>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"jmattson@google.com" <jmattson@google.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"agordeev@linux.ibm.com" <agordeev@linux.ibm.com>,
"alex@ghiti.fr" <alex@ghiti.fr>,
"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
"borntraeger@linux.ibm.com" <borntraeger@linux.ibm.com>,
"chenhuacai@kernel.org" <chenhuacai@kernel.org>,
"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
"dev.jain@arm.com" <dev.jain@arm.com>,
"gor@linux.ibm.com" <gor@linux.ibm.com>,
"hca@linux.ibm.com" <hca@linux.ibm.com>,
"palmer@dabbelt.com" <palmer@dabbelt.com>,
"pjw@kernel.org" <pjw@kernel.org>,
"shijie@os.amperecomputing.com" <shijie@os.amperecomputing.com>,
"svens@linux.ibm.com" <svens@linux.ibm.com>,
"thuth@redhat.com" <thuth@redhat.com>,
"yang@os.amperecomputing.com" <yang@os.amperecomputing.com>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"urezki@gmail.com" <urezki@gmail.com>,
"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
"gerald.schaefer@linux.ibm.com" <gerald.schaefer@linux.ibm.com>,
"jiayuan.chen@shopee.com" <jiayuan.chen@shopee.com>,
"lenb@kernel.org" <lenb@kernel.org>,
"pavel@kernel.org" <pavel@kernel.org>,
"rafael@kernel.org" <rafael@kernel.org>,
"yangyicong@hisilicon.com" <yangyicong@hisilicon.com>,
"vannapurve@google.com" <vannapurve@google.com>,
"jackmanb@google.com" <jackmanb@google.com>,
"patrick.roy@linux.dev" <patrick.roy@linux.dev>,
Jack Thomson <jackabt@amazon.co.uk>,
Takahiro Itazuri <itazur@amazon.co.uk>,
Derek Manwaring <derekmn@amazon.com>,
Nikita Kalyazin <nikita.kalyazin@linux.dev>
Subject: Re: [PATCH v12 10/16] KVM: guest_memfd: Add flag to remove from direct map
Date: Tue, 21 Apr 2026 10:08:48 -0700
Message-ID: <CAPTztWb67XZvfcMVnbegDNNW0LJa9UsaTGx3M898xJUJrekk0w@mail.gmail.com>
In-Reply-To: <aeemS2wm38Cm4qAf@google.com>
On Tue, Apr 21, 2026 at 9:31 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Apr 10, 2026, Nikita Kalyazin wrote:
> > From: Patrick Roy <patrick.roy@linux.dev>
> >
> > Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
> > ioctl. When set, guest_memfd folios will be removed from the direct map
> > after preparation, with direct map entries only restored when the folios
> > are freed.
> >
> > To ensure these folios do not end up in places where the kernel cannot
> > deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
> > address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
> >
> > Note that this flag causes removal of direct map entries for all
> > guest_memfd folios independent of whether they are "shared" or "private"
> > (although current guest_memfd only supports either all folios in the
> > "shared" state, or all folios in the "private" state if
> > GUEST_MEMFD_FLAG_MMAP is not set). The use case for also removing the
> > direct map entries of the shared parts of guest_memfd is a special type
> > of non-CoCo VM where host userspace is trusted to have access to all of
> > guest memory, but where Spectre-style transient execution attacks
> > through the host kernel's direct map should still be mitigated. In this
> > setup, KVM retains access to guest memory via userspace mappings of
> > guest_memfd, which are reflected back into KVM's memslots via
> > userspace_addr. This is needed for things like MMIO emulation on x86_64
> > to work.
> >
> > Direct map entries are zapped right before guest or userspace mappings
> > of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
> > kvm_gmem_get_pfn() [called from the KVM MMU code].
>
> ...
>
> > +#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
> > +
> > +static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
> > +{
> > +	return ((u64)folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
> > +}
> > +
> > +static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
> > +{
> > +	int r = 0;
> > +
> > +	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > +
> > +	if (WARN_ON_ONCE(!(GMEM_I(folio_inode(folio))->flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)))
> > +		return -EINVAL;
> > +
> > +	if (kvm_gmem_folio_no_direct_map(folio))
> > +		goto out;
> > +
> > +	r = folio_zap_direct_map(folio);
> > +	if (!r)
> > +		folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
> > +
> > +out:
> > +	return r;
> > +}
> > +
> > +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
> > +{
> > +	folio_restore_direct_map(folio);
> > +	folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
> > +}
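A minimal userspace model of the bit-tagging done by kvm_gmem_folio_zap_direct_map() and kvm_gmem_folio_restore_direct_map() in the quoted patch, assuming bit 0 of folio->private is otherwise unused. struct model_folio and the model_* helpers are hypothetical stand-ins, not kernel API, and the inode-flag WARN check is left out. The model uses uintptr_t rather than u64 so the pointer/integer casts always stay pointer-width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's KVM_GMEM_FOLIO_NO_DIRECT_MAP bit. */
#define MODEL_NO_DIRECT_MAP ((uintptr_t)1)

/* Hypothetical stand-in for struct folio: only what the model needs. */
struct model_folio {
	void *private;		/* opaque, pointer-sized per-owner data */
	bool locked;		/* caller must hold the folio lock */
	bool direct_mapped;	/* models the folio's direct map entry */
	int zap_calls;		/* how often the (expensive) zap really ran */
};

static bool model_no_direct_map(const struct model_folio *f)
{
	/* uintptr_t keeps the cast the same width as the pointer. */
	return (uintptr_t)f->private & MODEL_NO_DIRECT_MAP;
}

/* Models folio_zap_direct_map(): returns 0 on success. */
static int model_arch_zap(struct model_folio *f)
{
	f->direct_mapped = false;
	f->zap_calls++;
	return 0;
}

static int model_zap(struct model_folio *f)
{
	int r;

	assert(f->locked);	/* models VM_WARN_ON_FOLIO(!folio_test_locked()) */
	if (model_no_direct_map(f))
		return 0;	/* already zapped: idempotent, no second zap */

	r = model_arch_zap(f);
	if (!r)
		f->private = (void *)((uintptr_t)f->private | MODEL_NO_DIRECT_MAP);
	return r;
}

static void model_restore(struct model_folio *f)
{
	f->direct_mapped = true;
	f->private = (void *)((uintptr_t)f->private & ~MODEL_NO_DIRECT_MAP);
}
```

A second model_zap() on the same folio returns 0 without touching the modeled direct map, which is what the early "goto out" in the patch achieves.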
>
> Making guest_memfd responsible for zapping and restoring the direct map on a per-
> folio basis feels wrong given the addition of AS_NO_DIRECT_MAP. I especially don't
> like that the "rules" for when an AS_NO_DIRECT_MAP folio has a direct map will vary
> based on the owner, and even within an owner (e.g. guest_memfd) will be ad hoc.
>
> E.g. as per the series to add guest_memfd write() support[*]:
>
>   When direct map removal is implemented [2]
>   - write() will not be allowed to access pages that have already
>     been removed from direct map
>   - on completion, write() will remove the populated pages from
>     direct map
>
> That's pretty gross ABI, because with KVM_GMEM_FOLIO_NO_DIRECT_MAP, userspace can
> write() exactly once. To re-write memory, I assume userspace would need to do a
> PUNCH_HOLE or truncate.
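A toy model of the write-once behavior described above, following the two rules quoted from the write() series: a write() to a page that has already been zapped is refused, and a successful write() zaps the page, so only freeing the folio (PUNCH_HOLE or truncate) makes the page writable again. All names are hypothetical, and the -EPERM error code is an assumption; the series may use a different one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16	/* tiny "page" to keep the model small */

/* Hypothetical one-page guest_memfd under the per-folio scheme. */
struct model_gmem {
	bool present;		/* a folio currently backs the page */
	bool direct_mapped;	/* that folio's direct map entry is valid */
	char data[MODEL_PAGE_SIZE];
};

/* A freshly allocated folio is still in the direct map. */
static void model_get_folio(struct model_gmem *g)
{
	if (!g->present) {
		g->present = true;
		g->direct_mapped = true;
		memset(g->data, 0, sizeof(g->data));
	}
}

/* write(): copy through the direct map, then zap it on completion. */
static int model_write(struct model_gmem *g, const char *buf, size_t len)
{
	if (len > sizeof(g->data))
		return -EINVAL;
	model_get_folio(g);
	if (!g->direct_mapped)
		return -EPERM;		/* assumed error: page already zapped */
	memcpy(g->data, buf, len);
	g->direct_mapped = false;	/* "write() will remove the populated pages" */
	return (int)len;
}

/*
 * PUNCH_HOLE/truncate: the folio is freed (its direct map entry restored
 * on the way back to the allocator); the next access allocates a new one.
 */
static void model_punch_hole(struct model_gmem *g)
{
	g->present = false;
	g->direct_mapped = false;
}
```

The second write to the same offset fails until the hole is punched, which is the "write() exactly once" ABI being objected to.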
>
> What's preventing us from handling this automagically in e.g. filemap_add_folio()
> and filemap_remove_folio()? Then the usage rules are pretty straightforward: the
> kernel must *always* assume the direct map is invalid for folios from
> AS_NO_DIRECT_MAP mappings.
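A toy model of the alternative proposed above, in which the page cache rather than guest_memfd owns the policy: adding a folio to a mapping flagged AS_NO_DIRECT_MAP zaps its direct map entry, removing it restores the entry, and every other kernel user consults only the mapping flag. The model_* types and functions are hypothetical and deliberately simpler than the real filemap_add_folio()/filemap_remove_folio(); handling a failed zap is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the AS_NO_DIRECT_MAP address_space flag. */
#define MODEL_AS_NO_DIRECT_MAP (1u << 0)

struct model_mapping {
	unsigned int flags;
};

struct model_folio {
	struct model_mapping *mapping;	/* NULL when not in a page cache */
	bool direct_mapped;
};

static bool model_mapping_no_direct_map(const struct model_mapping *m)
{
	return m->flags & MODEL_AS_NO_DIRECT_MAP;
}

/* Add: a folio entering an AS_NO_DIRECT_MAP mapping always loses its entry. */
static int model_filemap_add_folio(struct model_mapping *m, struct model_folio *f)
{
	if (model_mapping_no_direct_map(m))
		f->direct_mapped = false;	/* real code would zap and could fail */
	f->mapping = m;
	return 0;
}

/* Remove: restore the entry before the folio can be freed or reused. */
static void model_filemap_remove_folio(struct model_folio *f)
{
	if (f->mapping && model_mapping_no_direct_map(f->mapping))
		f->direct_mapped = true;
	f->mapping = NULL;
}

/* The single rule for every other user: never trust the direct map here. */
static bool model_may_use_direct_map(const struct model_folio *f)
{
	return !(f->mapping && model_mapping_no_direct_map(f->mapping));
}
```

Because the check depends only on the mapping, the same rule also covers memory that was never in the direct map to begin with.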
>
> Then if KVM needs to utilize a kernel mapping, e.g. in kvm_gmem_populate(), KVM
> could use dedicated variants of kmap_local_xxx() to deal with a local mapping for
> a folio/page without a direct map. Or, KVM could simply disallow the specific
> sequence that would require KVM to do the memcpy (I'm pretty sure we can do that
> with in-place shared=>private conversion support).
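The "dedicated variants of kmap_local_xxx()" mentioned above do not exist yet, so the following is only a userspace analogy under that assumption: the data lives in a memfd that the process keeps unmapped, and populating it means creating a temporary mapping, copying, and tearing the mapping down immediately. populate_via_local_map() is a hypothetical name; memfd_create(), mmap() and munmap() are the standard Linux/glibc calls. Unlike kmap_local_*(), an mmap() mapping is visible to every thread in the process, so the analogy covers only the map/copy/unmap lifetime, not CPU locality.

```c
#define _GNU_SOURCE		/* for memfd_create() in <sys/mman.h> */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes into the backing object at page-aligned offset off through
 * a short-lived mapping, analogous to kmap_local + memcpy + kunmap_local.
 * Returns 0 on success, -1 on failure.
 */
static int populate_via_local_map(int fd, off_t off, const void *src, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);

	if (p == MAP_FAILED)
		return -1;
	memcpy(p, src, len);
	/* Drop the mapping right away: nothing long-lived aliases the memory. */
	return munmap(p, len);
}
```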
>
> I realize that could throw a big wrench into write() performance, but IMO, before
> merging either series, we need a complete story for exactly how this will all fit
> together, in a maintainable fashion and with sane ABI.
>
> [*] https://lore.kernel.org/all/20251114151828.98165-1-kalyazin@amazon.com
>
I agree with this. This approach would also cover memory that was
never in the direct map to begin with, or that has already been taken
out of it (for which I happen to have a use case :-)). guest_memfd and
other code can then assume that AS_NO_DIRECT_MAP means they must take
explicit action to map the memory if needed. It's a clean, simple ABI.
With the current set of patches, it doesn't seem like this could be
done cleanly.
- Frank
Thread overview: 23+ messages
2026-04-10 15:17 [PATCH v12 00/16] Direct Map Removal Support for guest_memfd Kalyazin, Nikita
2026-04-10 15:17 ` [PATCH v12 01/16] set_memory: set_direct_map_* to take address Kalyazin, Nikita
2026-04-21 14:43 ` Lorenzo Stoakes
2026-04-10 15:18 ` [PATCH v12 02/16] set_memory: add folio_{zap,restore}_direct_map helpers Kalyazin, Nikita
2026-04-10 15:18 ` [PATCH v12 03/16] mm/secretmem: make use of folio_{zap,restore}_direct_map Kalyazin, Nikita
2026-04-10 15:18 ` [PATCH v12 04/16] mm/gup: drop secretmem optimization from gup_fast_folio_allowed Kalyazin, Nikita
2026-04-10 15:18 ` [PATCH v12 05/16] mm/gup: drop local variable in gup_fast_folio_allowed Kalyazin, Nikita
2026-04-10 15:18 ` [PATCH v12 06/16] mm: introduce AS_NO_DIRECT_MAP Kalyazin, Nikita
2026-04-10 15:19 ` [PATCH v12 07/16] KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate Kalyazin, Nikita
2026-04-10 15:19 ` [PATCH v12 08/16] KVM: x86: define kvm_arch_gmem_supports_no_direct_map() Kalyazin, Nikita
2026-04-10 15:19 ` [PATCH v12 09/16] KVM: arm64: " Kalyazin, Nikita
2026-04-21 16:55 ` Marc Zyngier
2026-04-10 15:19 ` [PATCH v12 10/16] KVM: guest_memfd: Add flag to remove from direct map Kalyazin, Nikita
2026-04-21 16:31 ` Sean Christopherson
2026-04-21 17:08 ` Frank van der Linden [this message]
2026-04-10 15:19 ` [PATCH v12 11/16] KVM: selftests: load elf via bounce buffer Kalyazin, Nikita
2026-04-10 15:19 ` [PATCH v12 12/16] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 Kalyazin, Nikita
2026-04-10 15:20 ` [PATCH v12 13/16] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types Kalyazin, Nikita
2026-04-10 15:20 ` [PATCH v12 14/16] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests Kalyazin, Nikita
2026-04-10 15:20 ` [PATCH v12 15/16] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape Kalyazin, Nikita
2026-04-10 15:20 ` [PATCH v12 16/16] KVM: selftests: Test guest execution from direct map removed gmem Kalyazin, Nikita
2026-04-21 13:40 ` [PATCH v12 00/16] Direct Map Removal Support for guest_memfd Lorenzo Stoakes
2026-04-21 16:36 ` Sean Christopherson