linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Kalyazin, Nikita" <kalyazin@amazon.co.uk>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"maz@kernel.org" <maz@kernel.org>,
	"oupton@kernel.org" <oupton@kernel.org>,
	"joey.gouly@arm.com" <joey.gouly@arm.com>,
	"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
	"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"seanjc@google.com" <seanjc@google.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"luto@kernel.org" <luto@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"david@kernel.org" <david@kernel.org>,
	"lorenzo.stoakes@oracle.com" <lorenzo.stoakes@oracle.com>,
	"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"surenb@google.com" <surenb@google.com>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"martin.lau@linux.dev" <martin.lau@linux.dev>,
	"eddyz87@gmail.com" <eddyz87@gmail.com>,
	"song@kernel.org" <song@kernel.org>,
	"yonghong.song@linux.dev" <yonghong.song@linux.dev>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"sdf@fomichev.me" <sdf@fomichev.me>,
	"haoluo@google.com" <haoluo@google.com>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"jannh@google.com" <jannh@google.com>,
	"pfalcato@suse.de" <pfalcato@suse.de>,
	"shuah@kernel.org" <shuah@kernel.org>,
	"riel@surriel.com" <riel@surriel.com>,
	"baohua@kernel.org" <baohua@kernel.org>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"jgross@suse.com" <jgross@suse.com>,
	"yu-cheng.yu@intel.com" <yu-cheng.yu@intel.com>,
	"kas@kernel.org" <kas@kernel.org>,
	"coxu@redhat.com" <coxu@redhat.com>,
	"kevin.brodsky@arm.com" <kevin.brodsky@arm.com>,
	"ackerleytng@google.com" <ackerleytng@google.com>,
	"maobibo@loongson.cn" <maobibo@loongson.cn>,
	"prsampat@amd.com" <prsampat@amd.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"isaku.yamahata@intel.com" <isaku.yamahata@intel.com>,
	"jmattson@google.com" <jmattson@google.com>,
	"jthoughton@google.com" <jthoughton@google.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"vannapurve@google.com" <vannapurve@google.com>,
	"jackmanb@google.com" <jackmanb@google.com>,
	"aneesh.kumar@kernel.org" <aneesh.kumar@kernel.org>,
	"patrick.roy@linux.dev" <patrick.roy@linux.dev>,
	"Thomson, Jack" <jackabt@amazon.co.uk>,
	"Itazuri, Takahiro" <itazur@amazon.co.uk>,
	"Manwaring, Derek" <derekmn@amazon.com>,
	"Cali, Marco" <xmarcalx@amazon.co.uk>,
	"Kalyazin, Nikita" <kalyazin@amazon.co.uk>
Subject: [PATCH v8 05/13] KVM: guest_memfd: Add flag to remove from direct map
Date: Fri, 5 Dec 2025 16:58:43 +0000	[thread overview]
Message-ID: <20251205165743.9341-6-kalyazin@amazon.com> (raw)
In-Reply-To: <20251205165743.9341-1-kalyazin@amazon.com>

From: Patrick Roy <patrick.roy@linux.dev>

Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
ioctl. When set, guest_memfd folios will be removed from the direct map
after preparation, with direct map entries only restored when the folios
are freed.

To ensure these folios do not end up in places where the kernel cannot
deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.

Note that this flag causes removal of direct map entries for all
guest_memfd folios independent of whether they are "shared" or "private"
(although current guest_memfd only supports either all folios in the
"shared" state, or all folios in the "private" state if
GUEST_MEMFD_FLAG_MMAP is not set). The usecase for removing direct map
entries of also the shared parts of guest_memfd are a special type of
non-CoCo VM where, host userspace is trusted to have access to all of
guest memory, but where Spectre-style transient execution attacks
through the host kernel's direct map should still be mitigated.  In this
setup, KVM retains access to guest memory via userspace mappings of
guest_memfd, which are reflected back into KVM's memslots via
userspace_addr. This is needed for things like MMIO emulation on x86_64
to work.

Direct map entries are zapped right before guest or userspace mappings
of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
kvm_gmem_get_pfn() [called from the KVM MMU code]. The only place where
a gmem folio can be allocated without being mapped anywhere is
kvm_gmem_populate(), where handling potential failures of direct map
removal is not possible (by the time direct map removal is attempted,
the folio is already marked as prepared, meaning attempting to re-try
kvm_gmem_populate() would just result in -EEXIST without fixing up the
direct map state). These folios are then removed form the direct map
upon kvm_gmem_get_pfn(), e.g. when they are mapped into the guest later.

Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 Documentation/virt/kvm/api.rst | 22 ++++++++-----
 include/linux/kvm_host.h       | 12 +++++++
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/guest_memfd.c         | 60 ++++++++++++++++++++++++++++++++++
 4 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..c5f54f1370c8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6440,15 +6440,19 @@ a single guest_memfd file, but the bound ranges must not overlap).
 The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
 specified via KVM_CREATE_GUEST_MEMFD.  Currently defined flags:
 
-  ============================ ================================================
-  GUEST_MEMFD_FLAG_MMAP        Enable using mmap() on the guest_memfd file
-                               descriptor.
-  GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
-                               KVM_CREATE_GUEST_MEMFD (memory files created
-                               without INIT_SHARED will be marked private).
-                               Shared memory can be faulted into host userspace
-                               page tables. Private memory cannot.
-  ============================ ================================================
+  ============================== ================================================
+  GUEST_MEMFD_FLAG_MMAP          Enable using mmap() on the guest_memfd file
+                                 descriptor.
+  GUEST_MEMFD_FLAG_INIT_SHARED   Make all memory in the file shared during
+                                 KVM_CREATE_GUEST_MEMFD (memory files created
+                                 without INIT_SHARED will be marked private).
+                                 Shared memory can be faulted into host userspace
+                                 page tables. Private memory cannot.
+  GUEST_MEMFD_FLAG_NO_DIRECT_MAP The guest_memfd instance will behave similarly
+                                 to memfd_secret, and unmaps the memory backing
+                                 it from the kernel's address space before
+                                 being passed off to userspace or the guest.
+  ============================== ================================================
 
 When the KVM MMU performs a PFN lookup to service a guest fault and the backing
 guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27796a09d29b..d4d5306075bf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -738,10 +738,22 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
 	if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
 		flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
 
+	if (kvm_arch_gmem_supports_no_direct_map())
+		flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+
 	return flags;
 }
 #endif
 
+#ifdef CONFIG_KVM_GUEST_MEMFD
+#ifndef kvm_arch_gmem_supports_no_direct_map
+static inline bool kvm_arch_gmem_supports_no_direct_map(void)
+{
+	return false;
+}
+#endif
+#endif /* CONFIG_KVM_GUEST_MEMFD */
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..60341e1ba1be 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1612,6 +1612,7 @@ struct kvm_memory_attributes {
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
 #define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
 #define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)
+#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP	(1ULL << 2)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 92e7f8c1f303..ec4966a47d5e 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,6 +7,9 @@
 #include <linux/mempolicy.h>
 #include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
+#include <linux/set_memory.h>
+
+#include <asm/tlbflush.h>
 
 #include "kvm_mm.h"
 
@@ -76,6 +79,49 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
+
+static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
+{
+	return ((u64) folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
+}
+
+static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
+{
+	int r = 0;
+	unsigned long addr = (unsigned long) folio_address(folio);
+	u64 gmem_flags = GMEM_I(folio_inode(folio))->flags;
+
+	if (kvm_gmem_folio_no_direct_map(folio) || !(gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
+		goto out;
+
+	r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
+					 false);
+
+	if (r)
+		goto out;
+
+	folio->private = (void *) KVM_GMEM_FOLIO_NO_DIRECT_MAP;
+	flush_tlb_kernel_range(addr, addr + folio_size(folio));
+
+out:
+	return r;
+}
+
+static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
+{
+	/*
+	 * Direct map restoration cannot fail, as the only error condition
+	 * for direct map manipulation is failure to allocate page tables
+	 * when splitting huge pages, but this split would have already
+	 * happened in set_direct_map_invalid_noflush() in kvm_gmem_folio_zap_direct_map().
+	 * Thus set_direct_map_valid_noflush() here only updates prot bits.
+	 */
+	if (kvm_gmem_folio_no_direct_map(folio))
+		set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
+					 true);
+}
+
 static inline void kvm_gmem_mark_prepared(struct folio *folio)
 {
 	folio_mark_uptodate(folio);
@@ -398,6 +444,7 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct folio *folio;
 	vm_fault_t ret = VM_FAULT_LOCKED;
+	int err;
 
 	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
 		return VM_FAULT_SIGBUS;
@@ -423,6 +470,12 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
 		kvm_gmem_mark_prepared(folio);
 	}
 
+	err = kvm_gmem_folio_zap_direct_map(folio);
+	if (err) {
+		ret = vmf_error(err);
+		goto out_folio;
+	}
+
 	vmf->page = folio_file_page(folio, vmf->pgoff);
 
 out_folio:
@@ -533,6 +586,8 @@ static void kvm_gmem_free_folio(struct folio *folio)
 	kvm_pfn_t pfn = page_to_pfn(page);
 	int order = folio_order(folio);
 
+	kvm_gmem_folio_restore_direct_map(folio);
+
 	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
 }
 
@@ -596,6 +651,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
+	if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
+		mapping_set_no_direct_map(inode->i_mapping);
+
 	GMEM_I(inode)->flags = flags;
 
 	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
@@ -807,6 +865,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (!is_prepared)
 		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
 
+	kvm_gmem_folio_zap_direct_map(folio);
+
 	folio_unlock(folio);
 
 	if (!r)
-- 
2.50.1



  parent reply	other threads:[~2025-12-05 16:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-05 16:57 [PATCH v8 00/13] Direct Map Removal Support for guest_memfd Kalyazin, Nikita
2025-12-05 16:57 ` [PATCH v8 01/13] x86: export set_direct_map_valid_noflush to KVM module Kalyazin, Nikita
2025-12-05 17:26   ` Dave Hansen
2025-12-05 16:58 ` [PATCH v8 02/13] x86/tlb: export flush_tlb_kernel_range " Kalyazin, Nikita
2025-12-05 16:58 ` [PATCH v8 03/13] mm: introduce AS_NO_DIRECT_MAP Kalyazin, Nikita
2025-12-05 18:35   ` John Hubbard
2025-12-05 16:58 ` [PATCH v8 04/13] KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate Kalyazin, Nikita
2025-12-05 16:58 ` Kalyazin, Nikita [this message]
2025-12-05 17:30   ` [PATCH v8 05/13] KVM: guest_memfd: Add flag to remove from direct map Dave Hansen
2025-12-05 16:58 ` [PATCH v8 06/13] KVM: x86: define kvm_arch_gmem_supports_no_direct_map() Kalyazin, Nikita
2025-12-05 16:59 ` [PATCH v8 07/13] KVM: arm64: " Kalyazin, Nikita
2025-12-05 16:59 ` [PATCH v8 08/13] KVM: selftests: load elf via bounce buffer Kalyazin, Nikita
2025-12-05 16:59 ` [PATCH v8 09/13] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 Kalyazin, Nikita
2025-12-05 16:59 ` [PATCH v8 10/13] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types Kalyazin, Nikita
2025-12-05 16:59 ` [PATCH v8 11/13] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests Kalyazin, Nikita
2025-12-05 17:00 ` [PATCH v8 12/13] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape Kalyazin, Nikita
2025-12-05 17:00 ` [PATCH v8 13/13] KVM: selftests: Test guest execution from direct map removed gmem Kalyazin, Nikita

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251205165743.9341-6-kalyazin@amazon.com \
    --to=kalyazin@amazon.co.uk \
    --cc=Liam.Howlett@oracle.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=ast@kernel.org \
    --cc=baohua@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=coxu@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@kernel.org \
    --cc=derekmn@amazon.com \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=hpa@zytor.com \
    --cc=isaku.yamahata@intel.com \
    --cc=itazur@amazon.co.uk \
    --cc=jackabt@amazon.co.uk \
    --cc=jackmanb@google.com \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jgross@suse.com \
    --cc=jhubbard@nvidia.com \
    --cc=jmattson@google.com \
    --cc=joey.gouly@arm.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=jthoughton@google.com \
    --cc=kas@kernel.org \
    --cc=kevin.brodsky@arm.com \
    --cc=kpsingh@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=luto@kernel.org \
    --cc=maobibo@loongson.cn \
    --cc=martin.lau@linux.dev \
    --cc=maz@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=oupton@kernel.org \
    --cc=patrick.roy@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pfalcato@suse.de \
    --cc=prsampat@amd.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=sdf@fomichev.me \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=surenb@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=xmarcalx@amazon.co.uk \
    --cc=yonghong.song@linux.dev \
    --cc=yu-cheng.yu@intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox