linux-mm.kvack.org archive mirror
* [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support
@ 2024-08-01  9:01 Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 01/10] KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

This series adds restricted mmap() support to guest_memfd, as
well as support for guest_memfd on pKVM/arm64. It is based on
Linux 6.10.

Main changes since V1 [1]:

- Decoupled whether guest memory is mappable from KVM memory
attributes (SeanC)

Mappability is now tracked in the guest_memfd object, orthogonal to
whether userspace wants the memory to be private or shared.
Moreover, the memory attributes capability (i.e.,
KVM_CAP_MEMORY_ATTRIBUTES) is not enabled for pKVM, since for
software-based hypervisors such as pKVM and Gunyah, userspace is
informed of the state of the memory via hypervisor exits if
needed.

Even if attributes are enabled, this patch series would still
work (modulo bugs), without compromising guest memory or
crashing the system.

- Use page_mapped() instead of page_mapcount() to check whether
a page is mapped (DavidH)

- Add a new capability, KVM_CAP_GUEST_MEMFD_MAPPABLE, to query
whether guest private memory can be mapped (with the
aforementioned restrictions); a usage sketch follows this list

- Add a selftest to check whether memory is mappable when the
capability is enabled, and not mappable otherwise. Also, test the
effect of punching holes in mapped memory. (DavidH)

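As a rough illustration (not part of this series), userspace
could probe the new capability as follows. This is a minimal
sketch; KVM_CAP_GUEST_MEMFD_MAPPABLE is the definition added in
patch 4 of this series:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm_fd = open("/dev/kvm", O_RDWR);

        if (kvm_fd < 0)
            return 1;

        /*
         * Returns 1 if guest_memfd memory can be mmap()ed (with
         * the restrictions described below), 0 otherwise.
         */
        printf("mappable: %d\n",
               ioctl(kvm_fd, KVM_CHECK_EXTENSION,
                     KVM_CAP_GUEST_MEMFD_MAPPABLE));
        return 0;
    }
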
By design, guest_memfd cannot be mapped, read, or written by the
host. In pKVM, memory shared between a protected guest and the
host is shared in-place, unlike the other confidential computing
solutions that guest_memfd was originally envisaged for (e.g.,
TDX). When initializing a guest, as well as when accessing memory
shared by the guest with the host, it would be useful to support
mapping that memory at the host to avoid copying its contents.

One of the benefits of guest_memfd is that it prevents a
misbehaving host process from crashing the system when attempting
to access (deliberately or accidentally) protected guest memory,
since this memory isn't mapped to begin with. Without
guest_memfd, the hypervisor would still prevent such accesses,
but in certain cases the host kernel wouldn't be able to recover,
causing the system to crash.

Support for mmap() in this patch series maintains the invariant
that only memory shared with the host, either explicitly by the
guest or implicitly before the guest has started running (in
order to populate its memory), is allowed to have a valid
mapping at the host. At no time should private (as viewed by the
hypervisor) guest memory be mapped at the host.

This patch series is divided into two parts:

The first part applies to the KVM core code. It adds opt-in support
for mapping guest memory only as long as it is shared. For that,
the host needs to know the mappability status of guest memory.
Therefore, the series adds a structure to track whether memory is
mappable. This new structure is associated with each guest_memfd
object.

The second part of the series adds guest_memfd support for
pKVM/arm64.

We don't enforce the invariant that only memory shared with the
host can be mapped by host userspace in file_operations::mmap();
instead, we enforce it in vm_operations_struct::fault(). There,
we check whether the page is allowed to be mapped. If not, we
deliver a SIGBUS to the current task, as discussed in the Linux
MM Alignment Session on this topic [2].

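For illustration only, here is roughly how a host task could
observe this behaviour. This is a sketch under the assumption
that gmem_fd refers to a guest_memfd whose first page is
private; the helper name is hypothetical:

    #include <setjmp.h>
    #include <signal.h>
    #include <sys/mman.h>

    static sigjmp_buf env;

    static void sigbus_handler(int sig)
    {
        siglongjmp(env, 1);
    }

    /* Returns 1 if the first page can be faulted in, 0 on SIGBUS. */
    static int gmem_page_is_faultable(int gmem_fd, size_t size)
    {
        struct sigaction sa = { .sa_handler = sigbus_handler };
        int faultable = 1;
        char *mem;

        sigaction(SIGBUS, &sa, NULL);

        /* mmap() itself succeeds; enforcement happens at fault time. */
        mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   gmem_fd, 0);
        if (mem == MAP_FAILED)
            return -1;

        if (sigsetjmp(env, 1))
            faultable = 0;              /* fault() delivered SIGBUS */
        else
            (void)*(volatile char *)mem; /* triggers vm_ops->fault() */

        munmap(mem, size);
        return faultable;
    }
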
Currently there's no support for huge pages, which is something
we hope to add in the future, and seems to be a hot topic for the
upcoming LPC 2024 [3].

Cheers,
/fuad

[1] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com/

[2] https://lore.kernel.org/all/20240712232937.2861788-1-ackerleytng@google.com/

[3] https://lpc.events/event/18/sessions/183/#20240919

Fuad Tabba (10):
  KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock
  KVM: Add restricted support for mapping guestmem by the host
  KVM: Implement kvm_(read|write)_guest_page for private memory slots
  KVM: Add KVM capability to check if guest_memfd can be mapped by the
    host
  KVM: selftests: guest_memfd mmap() test when mapping is allowed
  KVM: arm64: Skip VMA checks for slots without userspace address
  KVM: arm64: Do not allow changes to private memory slots
  KVM: arm64: Handle guest_memfd()-backed guest page faults
  KVM: arm64: arm64 has private memory support when config is enabled
  KVM: arm64: Enable private memory kconfig for arm64

 arch/arm64/include/asm/kvm_host.h             |   3 +
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          | 139 +++++++++-
 include/linux/kvm_host.h                      |  72 +++++
 include/uapi/linux/kvm.h                      |   3 +-
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  47 +++-
 virt/kvm/Kconfig                              |   4 +
 virt/kvm/guest_memfd.c                        | 129 ++++++++-
 virt/kvm/kvm_main.c                           | 253 ++++++++++++++++--
 10 files changed, 628 insertions(+), 24 deletions(-)


base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 01/10] KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host Fuad Tabba
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Create a new variant of kvm_gmem_get_pfn(), which retains the
folio lock if it returns successfully.

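For context, a minimal sketch of the intended calling pattern,
mirroring how later patches in this series use it (kvm, slot,
gfn, data and len are assumed to be in scope):

    kvm_pfn_t pfn;
    struct page *page;
    int r;

    r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, NULL);
    if (r)
        return r;

    /*
     * The folio is locked at this point, so its contents can be
     * accessed without racing against truncation.
     */
    page = pfn_to_page(pfn);
    memcpy(data, page_address(page), len);

    unlock_page(page);
    kvm_release_pfn_clean(pfn);
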
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 11 +++++++++++
 virt/kvm/guest_memfd.c   | 19 ++++++++++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 692c01e41a18..43a157f8171a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2431,6 +2431,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+int kvm_gmem_get_pfn_locked(struct kvm *kvm, struct kvm_memory_slot *slot,
+			      gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2439,6 +2441,15 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+
+static inline int kvm_gmem_get_pfn_locked(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn, kvm_pfn_t *pfn,
+					  int *max_order)
+{
+	KVM_BUG_ON(1, kvm);
+	return -EIO;
+}
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 747fe251e445..f3f4334a9ccb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -482,8 +482,8 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 	fput(file);
 }
 
-int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+int kvm_gmem_get_pfn_locked(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
 {
 	pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
 	struct kvm_gmem *gmem;
@@ -524,10 +524,23 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 
 	r = 0;
 
-	folio_unlock(folio);
 out_fput:
 	fput(file);
 
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn_locked);
+
+int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+{
+	int r;
+
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, pfn, max_order);
+	if (r)
+		return r;
+
+	unlock_page(pfn_to_page(*pfn));
+	return 0;
+}
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 01/10] KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-05 17:14   ` Ackerley Tng
  2024-08-01  9:01 ` [RFC PATCH v2 03/10] KVM: Implement kvm_(read|write)_guest_page for private memory slots Fuad Tabba
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Add support for mmap() and fault() for guest_memfd in the host.
The ability to fault in a guest page is contingent on that page
being shared with the host. To track this, this patch adds a new
xarray to each guest_memfd object, which tracks the mappability
of guest frames.

The guest_memfd PRIVATE memory attribute is not used, for two
reasons. First, because it reflects the userspace expectation
for that memory location, and can therefore be toggled by
userspace. Second, although each guest_memfd file has a 1:1
binding with a KVM instance, the plan is to allow multiple files
per inode, e.g. to allow intra-host migration to a new KVM
instance without destroying the guest_memfd.

This new feature is gated with a new configuration option,
CONFIG_KVM_PRIVATE_MEM_MAPPABLE.

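Distilled from the code below, the xarray convention is: absent
entry means mappable, present entry means unmappable. An
illustrative fragment (not a complete function):

    struct xarray unmappable_gfns;

    xa_init(&unmappable_gfns);

    /* Mark gfns [start, end) unmappable: store a marker value. */
    xa_store_range(&unmappable_gfns, start, end - 1,
                   xa_mk_value(true), GFP_KERNEL);

    /* Make them mappable again: storing NULL erases the entries. */
    xa_store_range(&unmappable_gfns, start, end - 1, NULL, GFP_KERNEL);

    /* A gfn is mappable iff no entry covers it. */
    bool mappable = !xa_load(&unmappable_gfns, gfn);
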
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h |  61 ++++++++++++++++++++
 virt/kvm/Kconfig         |   4 ++
 virt/kvm/guest_memfd.c   | 110 +++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 122 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 297 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 43a157f8171a..ab1344327e57 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2452,4 +2452,65 @@ static inline int kvm_gmem_get_pfn_locked(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
+bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot, gfn_t start,
+				  gfn_t end, bool is_mappable);
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
+			       gfn_t end);
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
+				 gfn_t end);
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+#else
+static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
+					  gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot,
+						gfn_t start, gfn_t end,
+						bool is_mappable)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
+					     gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
+					       gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
+					     gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 29b73eedfe74..a3970c5eca7b 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -109,3 +109,7 @@ config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_PRIVATE_MEM
        bool
+
+config KVM_PRIVATE_MEM_MAPPABLE
+       select KVM_PRIVATE_MEM
+       bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f3f4334a9ccb..0a1f266a16f9 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -11,6 +11,9 @@ struct kvm_gmem {
 	struct kvm *kvm;
 	struct xarray bindings;
 	struct list_head entry;
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+	struct xarray unmappable_gfns;
+#endif
 };
 
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
@@ -230,6 +233,11 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	mutex_unlock(&kvm->slots_lock);
 
 	xa_destroy(&gmem->bindings);
+
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+	xa_destroy(&gmem->unmappable_gfns);
+#endif
+
 	kfree(gmem);
 
 	kvm_put_kvm(kvm);
@@ -248,7 +256,105 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
 	return get_file_active(&slot->gmem.file);
 }
 
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot, gfn_t start,
+				  gfn_t end, bool is_mappable)
+{
+	struct kvm_gmem *gmem = slot->gmem.file->private_data;
+	void *xval = is_mappable ? NULL : xa_mk_value(true);
+	void *r;
+
+	r = xa_store_range(&gmem->unmappable_gfns, start, end - 1, xval, GFP_KERNEL);
+
+	return xa_err(r);
+}
+
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	return kvm_slot_gmem_toggle_mappable(slot, start, end, true);
+}
+
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	return kvm_slot_gmem_toggle_mappable(slot, start, end, false);
+}
+
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct kvm_gmem *gmem = slot->gmem.file->private_data;
+	unsigned long _gfn = gfn;
+
+	return !xa_find(&gmem->unmappable_gfns, &_gfn, _gfn, XA_PRESENT);
+}
+
+static bool kvm_gmem_isfaultable(struct vm_fault *vmf)
+{
+	struct kvm_gmem *gmem = vmf->vma->vm_file->private_data;
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	pgoff_t pgoff = vmf->pgoff;
+	struct kvm_memory_slot *slot;
+	unsigned long index;
+	bool r = true;
+
+	filemap_invalidate_lock(inode->i_mapping);
+
+	xa_for_each_range(&gmem->bindings, index, slot, pgoff, pgoff) {
+		pgoff_t base_gfn = slot->base_gfn;
+		pgoff_t gfn_pgoff = slot->gmem.pgoff;
+		pgoff_t gfn = base_gfn + max(gfn_pgoff, pgoff) - gfn_pgoff;
+
+		if (!kvm_slot_gmem_is_mappable(slot, gfn)) {
+			r = false;
+			break;
+		}
+	}
+
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct folio *folio;
+
+	folio = kvm_gmem_get_folio(file_inode(vmf->vma->vm_file), vmf->pgoff);
+	if (!folio)
+		return VM_FAULT_SIGBUS;
+
+	if (!kvm_gmem_isfaultable(vmf)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return VM_FAULT_SIGBUS;
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+	return VM_FAULT_LOCKED;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+#define kvm_gmem_mmap NULL
+#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -369,6 +475,10 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	xa_init(&gmem->bindings);
 	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
 
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+	xa_init(&gmem->unmappable_gfns);
+#endif
+
 	fd_install(fd, file);
 	return fd;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1192942aef91..f4b4498d4de6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3265,6 +3265,128 @@ static int next_segment(unsigned long len, int offset)
 		return len;
 }
 
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+static bool __kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end, i;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		for (i = gfn_start; i < gfn_end; i++) {
+			if (!kvm_slot_gmem_is_mappable(memslot, i))
+				return false;
+		}
+	}
+
+	return true;
+}
+
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	bool r;
+
+	mutex_lock(&kvm->slots_lock);
+	r = __kvm_gmem_is_mappable(kvm, start, end);
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+static bool __kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end, i;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		for (i = gfn_start; i < gfn_end; i++) {
+			struct page *page;
+			bool is_mapped;
+			kvm_pfn_t pfn;
+
+			if (WARN_ON_ONCE(kvm_gmem_get_pfn_locked(kvm, memslot, i, &pfn, NULL)))
+				continue;
+
+			page = pfn_to_page(pfn);
+			is_mapped = page_mapped(page) || page_maybe_dma_pinned(page);
+			unlock_page(page);
+			put_page(page);
+
+			if (is_mapped)
+				return true;
+		}
+	}
+
+	return false;
+}
+
+bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	bool r;
+
+	mutex_lock(&kvm->slots_lock);
+	r = __kvm_gmem_is_mapped(kvm, start, end);
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+static int kvm_gmem_toggle_mappable(struct kvm *kvm, gfn_t start, gfn_t end,
+				    bool is_mappable)
+{
+	struct kvm_memslot_iter iter;
+	int r = 0;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		r = kvm_slot_gmem_toggle_mappable(memslot, gfn_start, gfn_end, is_mappable);
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	return kvm_gmem_toggle_mappable(kvm, start, end, true);
+}
+
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	return kvm_gmem_toggle_mappable(kvm, start, end, false);
+}
+
+#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
+
 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
 static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 				 void *data, int offset, int len)
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 03/10] KVM: Implement kvm_(read|write)_guest_page for private memory slots
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 01/10] KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-16 19:32   ` Sean Christopherson
  2024-08-01  9:01 ` [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host Fuad Tabba
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Make __kvm_read_guest_page/__kvm_write_guest_page capable of
accessing guest memory if no userspace address is available.
Moreover, check that the memory being accessed is shared with the
host before attempting the access.

KVM at the host might need to access shared memory that is not
mapped in the host userspace but is in fact shared with the host,
e.g., when accounting for stolen time. This allows the access
without relying on the slot's userspace_addr being set.

This does not circumvent protection, since the access is only
attempted if the memory is mappable by the host, which implies
shareability.

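A hypothetical caller-visible summary, with error values taken
from the code below (the steal-time record is just an example):

    u64 steal_time;
    int r;

    r = kvm_read_guest_page(kvm, gfn, &steal_time, 0,
                            sizeof(steal_time));
    switch (r) {
    case 0:
        /* gfn is shared: the read went through the gmem folio. */
        break;
    case -EPERM:
        /* gfn is private: access refused, guest memory untouched. */
        break;
    case -EFAULT:
        /* no valid mapping or userspace address for this gfn. */
        break;
    }
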
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 virt/kvm/kvm_main.c | 127 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 111 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f4b4498d4de6..ec6255c7325e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3385,20 +3385,108 @@ int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
 	return kvm_gmem_toggle_mappable(kvm, start, end, false);
 }
 
+static int __kvm_read_private_guest_page(struct kvm *kvm,
+					 struct kvm_memory_slot *slot,
+					 gfn_t gfn, void *data, int offset,
+					 int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r = 0;
+
+	if (size_add(offset, len) > PAGE_SIZE)
+		return -E2BIG;
+
+	mutex_lock(&kvm->slots_lock);
+
+	if (!__kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
+		r = -EPERM;
+		goto unlock;
+	}
+
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, NULL);
+	if (r)
+		goto unlock;
+
+	page = pfn_to_page(pfn);
+	memcpy(data, page_address(page) + offset, len);
+	unlock_page(page);
+	kvm_release_pfn_clean(pfn);
+unlock:
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+static int __kvm_write_private_guest_page(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn, const void *data,
+					  int offset, int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r = 0;
+
+	if (size_add(offset, len) > PAGE_SIZE)
+		return -E2BIG;
+
+	mutex_lock(&kvm->slots_lock);
+
+	if (!__kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
+		r = -EPERM;
+		goto unlock;
+	}
+
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, NULL);
+	if (r)
+		goto unlock;
+
+	page = pfn_to_page(pfn);
+	memcpy(page_address(page) + offset, data, len);
+	unlock_page(page);
+	kvm_release_pfn_dirty(pfn);
+unlock:
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+#else
+static int __kvm_read_private_guest_page(struct kvm *kvm,
+					 struct kvm_memory_slot *slot,
+					 gfn_t gfn, void *data, int offset,
+					 int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+
+static int __kvm_write_private_guest_page(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn, const void *data,
+					  int offset, int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
 #endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
 
 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-				 void *data, int offset, int len)
+
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+				 gfn_t gfn, void *data, int offset, int len)
 {
-	int r;
 	unsigned long addr;
 
+	if (IS_ENABLED(CONFIG_KVM_PRIVATE_MEM_MAPPABLE) &&
+	    kvm_slot_can_be_private(slot)) {
+		return __kvm_read_private_guest_page(kvm, slot, gfn, data,
+						     offset, len);
+	}
+
 	addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
-	r = __copy_from_user(data, (void __user *)addr + offset, len);
-	if (r)
+	if (__copy_from_user(data, (void __user *)addr + offset, len))
 		return -EFAULT;
 	return 0;
 }
@@ -3408,7 +3496,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 {
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3417,7 +3505,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
 
@@ -3492,17 +3580,24 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
 				  struct kvm_memory_slot *memslot, gfn_t gfn,
-			          const void *data, int offset, int len)
+				  const void *data, int offset, int len)
 {
-	int r;
-	unsigned long addr;
+	if (IS_ENABLED(CONFIG_KVM_PRIVATE_MEM_MAPPABLE) &&
+	    kvm_slot_can_be_private(memslot)) {
+		int r = __kvm_write_private_guest_page(kvm, memslot, gfn, data,
+						       offset, len);
+
+		if (r)
+			return r;
+	} else {
+		unsigned long addr = gfn_to_hva_memslot(memslot, gfn);
+
+		if (kvm_is_error_hva(addr))
+			return -EFAULT;
+		if (__copy_to_user((void __user *)addr + offset, data, len))
+			return -EFAULT;
+	}
 
-	addr = gfn_to_hva_memslot(memslot, gfn);
-	if (kvm_is_error_hva(addr))
-		return -EFAULT;
-	r = __copy_to_user((void __user *)addr + offset, data, len);
-	if (r)
-		return -EFAULT;
 	mark_page_dirty_in_slot(kvm, memslot, gfn);
 	return 0;
 }
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (2 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 03/10] KVM: Implement kvm_(read|write)_guest_page for private memory slots Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-05 17:19   ` Ackerley Tng
  2024-08-01  9:01 ` [RFC PATCH v2 05/10] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Add the KVM capability KVM_CAP_GUEST_MEMFD_MAPPABLE, which
returns true if the host supports mapping guest memory.

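Note that the result is VM-dependent, matching the existing
KVM_CAP_GUEST_MEMFD behaviour. A sketch, assuming the usual
KVM_CHECK_EXTENSION flow:

    /* System-wide: true if the kernel supports mappable guest_memfd. */
    ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD_MAPPABLE);

    /* Per-VM: additionally requires the VM to support private memory. */
    ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD_MAPPABLE);
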
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/uapi/linux/kvm.h | 3 ++-
 virt/kvm/kvm_main.c      | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d03842abae57..783d0c3f4cb1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -916,7 +916,8 @@ struct kvm_enable_cap {
 #define KVM_CAP_MEMORY_FAULT_INFO 232
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
-#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_GUEST_MEMFD_MAPPABLE 235
+#define KVM_CAP_VM_TYPES 236
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ec6255c7325e..485c39fc373c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5077,6 +5077,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
+#endif
+#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
+	case KVM_CAP_GUEST_MEMFD_MAPPABLE:
+		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
 	default:
 		break;
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 05/10] KVM: selftests: guest_memfd mmap() test when mapping is allowed
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (3 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 06/10] KVM: arm64: Skip VMA checks for slots without userspace address Fuad Tabba
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Expand the guest_memfd selftests to test mapping guest memory
when the capability is supported, and to check that the memory
is not mappable when it isn't.

Also, build the guest_memfd selftest for aarch64.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 47 ++++++++++++++++++-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ac280dcba996..fb63f7e956d4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -166,6 +166,7 @@ TEST_GEN_PROGS_aarch64 += arch_timer
 TEST_GEN_PROGS_aarch64 += demand_paging_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
+TEST_GEN_PROGS_aarch64 += guest_memfd_test
 TEST_GEN_PROGS_aarch64 += guest_print_test
 TEST_GEN_PROGS_aarch64 += get-reg-list
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ba0c8e996035..c6bb2be5b6e2 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,55 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t total_size)
 {
+	size_t page_size = getpagesize();
+	char *mem;
+	int ret;
+	int i;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap()ing guest memory should pass.");
+
+	memset(mem, 0xaa, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	memset(mem, 0xaa, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t total_size)
+{
+	size_t page_size = getpagesize();
 	char *mem;
 
 	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
+}
+
+static void test_mmap(int fd, size_t total_size)
+{
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		test_mmap_allowed(fd, total_size);
+	else
+		test_mmap_denied(fd, total_size);
 }
 
 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -190,7 +233,7 @@ int main(int argc, char *argv[])
 	fd = vm_create_guest_memfd(vm, total_size, 0);
 
 	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+	test_mmap(fd, total_size);
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 06/10] KVM: arm64: Skip VMA checks for slots without userspace address
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (4 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 05/10] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 07/10] KVM: arm64: Do not allow changes to private memory slots Fuad Tabba
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Memory slots backed by guest memory might be created with no
intention of being mapped by the host. Such slots are recognized
by the absence of a userspace address in the memory slot.

VMA checks are neither possible nor necessary for this kind of
slot, so skip them.

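For example (hypothetical values), such a slot would be created
with a zero userspace_addr:

    struct kvm_userspace_memory_region2 region = {
        .slot = 0,
        .flags = KVM_MEM_GUEST_MEMFD,
        .guest_phys_addr = gpa,
        .memory_size = size,
        .userspace_addr = 0,    /* no host mapping intended */
        .guest_memfd = gmem_fd,
        .guest_memfd_offset = 0,
    };

    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
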
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8bcab0cc3fe9..e632e10ea395 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -948,6 +948,10 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 	phys_addr_t size = PAGE_SIZE * memslot->npages;
 	hva_t reg_end = hva + size;
 
+	/* Host will not map this private memory without a userspace address. */
+	if (kvm_slot_can_be_private(memslot) && !hva)
+		return;
+
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
 	 * between them, so iterate over all of them to find out if we should
@@ -1976,6 +1980,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	hva = new->userspace_addr;
 	reg_end = hva + (new->npages << PAGE_SHIFT);
 
+	/* Host will not map this private memory without a userspace address. */
+	if ((kvm_slot_can_be_private(new)) && !hva)
+		return 0;
+
 	mmap_read_lock(current->mm);
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 07/10] KVM: arm64: Do not allow changes to private memory slots
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (5 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 06/10] KVM: arm64: Skip VMA checks for slots without userspace address Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Handling changes to private memory slots can be difficult, since
it would probably require some cooperation from the hypervisor
and/or the guest. Do not allow such changes for now.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e632e10ea395..b1fc636fb670 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1970,6 +1970,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 			change != KVM_MR_FLAGS_ONLY)
 		return 0;
 
+	if ((change == KVM_MR_MOVE || change == KVM_MR_FLAGS_ONLY) &&
+	    ((kvm_slot_can_be_private(old)) || (kvm_slot_can_be_private(new))))
+		return -EPERM;
+
 	/*
 	 * Prevent userspace from creating a memory region outside of the IPA
 	 * space addressable by the KVM guest IPA space.
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (6 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 07/10] KVM: arm64: Do not allow changes to private memory slots Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-01  9:01 ` [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled Fuad Tabba
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Add arm64 support for resolving guest page faults on
guest_memfd()-backed memslots. This support is not contingent on
pKVM or other confidential computing support, and works in both
VHE and nVHE modes.

Without confidential computing, this support is useful for
testing and debugging. In the future, it might also be useful
should a user want to use guest_memfd() in all cases, whether
for a protected guest or not.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 127 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 125 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b1fc636fb670..e15167865cab 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1378,6 +1378,123 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static int guest_memfd_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     struct kvm_memory_slot *memslot, bool fault_is_perm)
+{
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	bool exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+	bool logging_active = memslot_is_logging(memslot);
+	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	bool write_fault = kvm_is_write_fault(vcpu);
+	struct mm_struct *mm = current->mm;
+	gfn_t gfn = gpa_to_gfn(fault_ipa);
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long mmu_seq;
+	struct page *page;
+	kvm_pfn_t pfn;
+	int ret;
+
+	/* For now, guest_memfd() only supports PAGE_SIZE granules. */
+	if (WARN_ON_ONCE(fault_is_perm &&
+			 kvm_vcpu_trap_get_perm_fault_granule(vcpu) != PAGE_SIZE)) {
+		return -EFAULT;
+	}
+
+	VM_BUG_ON(write_fault && exec_fault);
+
+	if (fault_is_perm && !write_fault && !exec_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	/*
+	 * Permission faults just need to update the existing leaf entry,
+	 * and so normally don't require allocations from the memcache. The
+	 * only exception to this is when dirty logging is enabled at runtime
+	 * and a write fault needs to collapse a block entry into a table.
+	 */
+	if (!fault_is_perm || (logging_active && write_fault)) {
+		ret = kvm_mmu_topup_memory_cache(memcache,
+						 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * Read mmu_invalidate_seq so that KVM can detect if the results of
+	 * kvm_gmem_get_pfn_locked() become stale prior to acquiring
+	 * kvm->mmu_lock.
+	 */
+	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+
+	/* To pair with the smp_wmb() in kvm_mmu_invalidate_end(). */
+	smp_rmb();
+
+	ret = kvm_gmem_get_pfn_locked(kvm, memslot, gfn, &pfn, NULL);
+	if (ret)
+		return ret;
+
+	page = pfn_to_page(pfn);
+
+	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1) &&
+	    (page_mapped(page) || page_maybe_dma_pinned(page))) {
+		return -EPERM;
+	}
+
+	/*
+	 * Once it's faulted in, a guest_memfd() page will stay in memory.
+	 * Therefore, count it as locked.
+	 */
+	if (!fault_is_perm) {
+		ret = account_locked_vm(mm, 1, true);
+		if (ret)
+			goto unlock_page;
+	}
+
+	read_lock(&kvm->mmu_lock);
+	if (mmu_invalidate_retry(kvm, mmu_seq))
+		goto unlock_mmu;
+
+	if (write_fault)
+		prot |= KVM_PGTABLE_PROT_W;
+
+	if (exec_fault)
+		prot |= KVM_PGTABLE_PROT_X;
+
+	if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
+		prot |= KVM_PGTABLE_PROT_X;
+
+	/*
+	 * Under the premise of getting a FSC_PERM fault, we just need to relax
+	 * permissions.
+	 */
+	if (fault_is_perm)
+		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+	else
+		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+					__pfn_to_phys(pfn), prot,
+					memcache,
+					KVM_PGTABLE_WALK_HANDLE_FAULT |
+					KVM_PGTABLE_WALK_SHARED);
+
+	/* Mark the page dirty only if the fault is handled successfully */
+	if (write_fault && !ret) {
+		kvm_set_pfn_dirty(pfn);
+		mark_page_dirty_in_slot(kvm, memslot, gfn);
+	}
+
+unlock_mmu:
+	read_unlock(&kvm->mmu_lock);
+
+	if (ret && !fault_is_perm)
+		account_locked_vm(mm, 1, false);
+unlock_page:
+	unlock_page(page);
+	put_page(page);
+	return ret != -EAGAIN ? ret : 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  bool fault_is_perm)
@@ -1748,8 +1865,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
-			     esr_fsc_is_permission_fault(esr));
+	if (kvm_slot_can_be_private(memslot)) {
+		ret = guest_memfd_abort(vcpu, fault_ipa, memslot,
+					esr_fsc_is_permission_fault(esr));
+	} else {
+		ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
+				     esr_fsc_is_permission_fault(esr));
+	}
+
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (7 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-15  6:27   ` Patrick Roy
  2024-08-01  9:01 ` [RFC PATCH v2 10/10] KVM: arm64: Enable private memory kconfig for arm64 Fuad Tabba
  2024-08-05 16:53 ` [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Ackerley Tng
  10 siblings, 1 reply; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Implement kvm_arch_has_private_mem() on arm64, making it
dependent on CONFIG_KVM_PRIVATE_MEM and on protected KVM being
enabled.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 36b8e97bf49e..8f7d78ee9557 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1414,4 +1414,7 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
 		(pa + pi + pa3) == 1;					\
 	})
 
+#define kvm_arch_has_private_mem(kvm)					\
+	(IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) && is_protected_kvm_enabled())
+
 #endif /* __ARM64_KVM_HOST_H__ */
-- 
2.46.0.rc1.232.g9752f9e123-goog




* [RFC PATCH v2 10/10] KVM: arm64: Enable private memory kconfig for arm64
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (8 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled Fuad Tabba
@ 2024-08-01  9:01 ` Fuad Tabba
  2024-08-05 16:53 ` [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Ackerley Tng
  10 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-01  9:01 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Now that the infrastructure is in place for arm64 to support
guest private memory, enable it in the arm64 kernel
configuration.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 58f09370d17e..8b166c697930 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -37,6 +37,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_PRIVATE_MEM_MAPPABLE
 	help
 	  Support hosting virtualized guest machines.
 
-- 
2.46.0.rc1.232.g9752f9e123-goog




* Re: [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support
  2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
                   ` (9 preceding siblings ...)
  2024-08-01  9:01 ` [RFC PATCH v2 10/10] KVM: arm64: Enable private memory kconfig for arm64 Fuad Tabba
@ 2024-08-05 16:53 ` Ackerley Tng
  2024-08-05 18:13   ` Fuad Tabba
  10 siblings, 1 reply; 21+ messages in thread
From: Ackerley Tng @ 2024-08-05 16:53 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Fuad Tabba <tabba@google.com> writes:

> This series adds restricted mmap() support to guest_memfd, as
> well as support for guest_memfd on pKVM/arm64. It is based on
> Linux 6.10.
>
> Main changes since V1 [1]:
>
> - Decoupled whether guest memory is mappable from KVM memory
> attributes (SeanC)
>
> Mappability is now tracked in the guest_memfd object, orthogonal to
> whether userspace wants the memory to be private or shared.
> Moreover, the memory attributes capability (i.e.,
> KVM_CAP_MEMORY_ATTRIBUTES) is not enabled for pKVM, since for
> software-based hypervisors such as pKVM and Gunyah, userspace is
> informed of the state of the memory via hypervisor exits if
> needed.
>
> Even if attributes are enabled, this patch series would still
> work (modulo bugs), without compromising guest memory or
> crashing the system.
>
> - Use page_mapped() instead of page_mapcount() to check whether
> a page is mapped (DavidH)
>
> - Add a new capability, KVM_CAP_GUEST_MEMFD_MAPPABLE, to query
> whether guest private memory can be mapped (with aforementioned
> restrictions)
>
> - Add a selftest to check whether memory is mappable when the
> capability is enabled, and not mappable otherwise. Also, test the
> effect of punching holes in mapped memory. (DavidH)
>
> By design, guest_memfd cannot be mapped, read, or written by the
> host. In pKVM, memory shared between a protected guest and the

I think we should use "cannot be faulted in" to be clear that
guest_memfd can be mmap()ed but not faulted in.

Would it be better to have all the variables/config macros be something
about faultability instead of mappability?

> host is shared in-place, unlike the other confidential computing
> solutions that guest_memfd was originally envisaged for (e.g.,
> TDX). When initializing a guest, as well as when accessing memory
> shared by the guest with the host, it would be useful to support
> mapping that memory at the host to avoid copying its contents.
>
> One of the benefits of guest_memfd is that it prevents a
> misbehaving host process from crashing the system when attempting
> to access (deliberately or accidentally) protected guest memory,
> since this memory isn't mapped to begin with. Without
> guest_memfd, the hypervisor would still prevent such accesses,
> but in certain cases the host kernel wouldn't be able to recover,
> causing the system to crash.
>
> Support for mmap() in this patch series maintains the invariant
> that only memory shared with the host, either explicitly by the
> guest or implicitly before the guest has started running (in
> order to populate its memory), is allowed to have a valid
> mapping at the host. At no time should private (as viewed by the
> hypervisor) guest memory be mapped at the host.
>
> This patch series is divided into two parts:
>
> The first part applies to the KVM core code. It adds opt-in support
> for mapping guest memory only as long as it is shared. For that,
> the host needs to know the mappability status of guest memory.
> Therefore, the series adds a structure to track whether memory is
> mappable. This new structure is associated with each guest_memfd
> object.
>
> The second part of the series adds guest_memfd support for
> pKVM/arm64.
>
> We don't enforce the invariant that only memory shared with the
> host can be mapped by host userspace in file_operations::mmap();
> instead, we enforce it in vm_operations_struct::fault(). There,
> we check whether the page is allowed to be mapped. If not, we
> deliver a SIGBUS to the current task, as discussed in the Linux
> MM Alignment Session on this topic [2].
>
> Currently there's no support for huge pages, which is something
> we hope to add in the future, and seems to be a hot topic for the
> upcoming LPC 2024 [3].
>
> Cheers,
> /fuad
>
> [1] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com/
>
> [2] https://lore.kernel.org/all/20240712232937.2861788-1-ackerleytng@google.com/
>
> [3] https://lpc.events/event/18/sessions/183/#20240919
>
> Fuad Tabba (10):
>   KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock
>   KVM: Add restricted support for mapping guestmem by the host
>   KVM: Implement kvm_(read|/write)_guest_page for private memory slots
>   KVM: Add KVM capability to check if guest_memfd can be mapped by the
>     host
>   KVM: selftests: guest_memfd mmap() test when mapping is allowed
>   KVM: arm64: Skip VMA checks for slots without userspace address
>   KVM: arm64: Do not allow changes to private memory slots
>   KVM: arm64: Handle guest_memfd()-backed guest page faults
>   KVM: arm64: arm64 has private memory support when config is enabled
>   KVM: arm64: Enable private memory kconfig for arm64
>
>  arch/arm64/include/asm/kvm_host.h             |   3 +
>  arch/arm64/kvm/Kconfig                        |   1 +
>  arch/arm64/kvm/mmu.c                          | 139 +++++++++-
>  include/linux/kvm_host.h                      |  72 +++++
>  include/uapi/linux/kvm.h                      |   3 +-
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../testing/selftests/kvm/guest_memfd_test.c  |  47 +++-
>  virt/kvm/Kconfig                              |   4 +
>  virt/kvm/guest_memfd.c                        | 129 ++++++++-
>  virt/kvm/kvm_main.c                           | 253 ++++++++++++++++--
>  10 files changed, 628 insertions(+), 24 deletions(-)
>
>
> base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd



* Re: [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host
  2024-08-01  9:01 ` [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host Fuad Tabba
@ 2024-08-05 17:14   ` Ackerley Tng
  2024-08-05 18:08     ` Fuad Tabba
  0 siblings, 1 reply; 21+ messages in thread
From: Ackerley Tng @ 2024-08-05 17:14 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Fuad Tabba <tabba@google.com> writes:

> Add support for mmap() and fault() for guest_memfd in the host.
> The ability to fault in a guest page is contingent on that page
> being shared with the host. To track this, this patch adds a new
> xarray to each guest_memfd object, which tracks the mappability
> of guest frames.
>
> The guest_memfd PRIVATE memory attribute is not used for two
> reasons. First because it reflects the userspace expectation for
> that memory location, and therefore can be toggled by userspace.

Thank you for clarifying why the PRIVATE memory attribute cannot be
used. I think that having the attributes with guest_memfd is a good
idea.

Since faultability is a property of the memory contents, shouldn't
faultability be stored on the inode rather than the file?

> The second is, although each guest_memfd file has a 1:1 binding
> with a KVM instance, the plan is to allow multiple files per
> inode, e.g. to allow intra-host migration to a new KVM instance,
> without destroying guest_memfd.

I think you also alluded to the concept of inodes vs files above.

To store the xarray on the inode, we would probably have to make
guest_memfd use its own mount so that we can use .evict_inode() from
struct super_operations to clean up the inode neatly, perhaps
the way it was done in guest_memfd v12 [1].
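
For illustration, a rough sketch of that cleanup path under a
dedicated mount (all names and the i_private layout here are
hypothetical, not taken from this series or from v12):

/*
 * Hedged sketch: if per-inode state (e.g. a faultability xarray)
 * lived in inode->i_private, a dedicated guest_memfd filesystem
 * could tear it down when the inode is evicted.
 */
static void kvm_gmem_evict_inode(struct inode *inode)
{
	struct xarray *faultable_gfns = inode->i_private;

	if (faultable_gfns) {
		xa_destroy(faultable_gfns);
		kfree(faultable_gfns);
	}
	truncate_inode_pages_final(inode->i_mapping);
	clear_inode(inode);
}

static const struct super_operations kvm_gmem_super_ops = {
	.statfs		= simple_statfs,
	.evict_inode	= kvm_gmem_evict_inode,
};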

This RFC to enable intra-host migration [2] was built with the version
of guest_memfd that used its own mount. IIUC, the gmem struct stored on
the file is meant to be the binding between struct kvm and the memory
contents [3], so the gmem struct won't be transferred and a new gmem
struct will be created.

>
> This new feature is gated with a new configuration option,
> CONFIG_KVM_PRIVATE_MEM_MAPPABLE.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  include/linux/kvm_host.h |  61 ++++++++++++++++++++
>  virt/kvm/Kconfig         |   4 ++
>  virt/kvm/guest_memfd.c   | 110 +++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      | 122 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 297 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 43a157f8171a..ab1344327e57 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2452,4 +2452,65 @@ static inline int kvm_gmem_get_pfn_locked(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_PRIVATE_MEM */
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> +bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
> +bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end);
> +int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);

How will kvm_gmem_is_mapped() and kvm_gmem_set_mappable() be used?

> +int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
> +int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot, gfn_t start,
> +				  gfn_t end, bool is_mappable);
> +int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
> +			       gfn_t end);
> +int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
> +				 gfn_t end);
> +bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
> +#else
> +static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
> +{
> +	WARN_ON_ONCE(1);
> +	return false;
> +}
> +static inline bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	WARN_ON_ONCE(1);
> +	return false;
> +}
> +static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
> +					  gfn_t end)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +static inline int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot,
> +						gfn_t start, gfn_t end,
> +						bool is_mappable)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
> +					     gfn_t start, gfn_t end)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
> +					       gfn_t start, gfn_t end)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
> +					     gfn_t gfn)
> +{
> +	WARN_ON_ONCE(1);
> +	return false;
> +}
> +#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
> +
>  #endif
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 29b73eedfe74..a3970c5eca7b 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -109,3 +109,7 @@ config KVM_GENERIC_PRIVATE_MEM
>         select KVM_GENERIC_MEMORY_ATTRIBUTES
>         select KVM_PRIVATE_MEM
>         bool
> +
> +config KVM_PRIVATE_MEM_MAPPABLE
> +       select KVM_PRIVATE_MEM
> +       bool
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index f3f4334a9ccb..0a1f266a16f9 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -11,6 +11,9 @@ struct kvm_gmem {
>  	struct kvm *kvm;
>  	struct xarray bindings;
>  	struct list_head entry;
> +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> +	struct xarray unmappable_gfns;
> +#endif
>  };
>
>  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
> @@ -230,6 +233,11 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
>  	mutex_unlock(&kvm->slots_lock);
>
>  	xa_destroy(&gmem->bindings);
> +
> +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> +	xa_destroy(&gmem->unmappable_gfns);
> +#endif
> +
>  	kfree(gmem);
>
>  	kvm_put_kvm(kvm);
> @@ -248,7 +256,105 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
>  	return get_file_active(&slot->gmem.file);
>  }
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> +int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot, gfn_t start,
> +				  gfn_t end, bool is_mappable)
> +{
> +	struct kvm_gmem *gmem = slot->gmem.file->private_data;
> +	void *xval = is_mappable ? NULL : xa_mk_value(true);

IIUC storing stuff in the xarray takes memory, so if we want to save
memory, we should minimize entries in the xarray. For pKVM, do you
expect more mappable, or more unmappable gfns?

For TDX most of the memory will be private, so we expect fewer mappable
gfns, and we'd prefer "entry in xarray == mappable".

> +	void *r;
> +
> +	r = xa_store_range(&gmem->unmappable_gfns, start, end - 1, xval, GFP_KERNEL);
> +
> +	return xa_err(r);
> +}
> +
> +int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> +{
> +	return kvm_slot_gmem_toggle_mappable(slot, start, end, true);
> +}
> +
> +int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> +{
> +	return kvm_slot_gmem_toggle_mappable(slot, start, end, false);
> +}
> +
> +bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> +	struct kvm_gmem *gmem = slot->gmem.file->private_data;
> +	unsigned long _gfn = gfn;
> +
> +	return !xa_find(&gmem->unmappable_gfns, &_gfn, ULONG_MAX, XA_PRESENT);
> +}
> +
> +static bool kvm_gmem_isfaultable(struct vm_fault *vmf)
> +{
> +	struct kvm_gmem *gmem = vmf->vma->vm_file->private_data;
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	pgoff_t pgoff = vmf->pgoff;
> +	struct kvm_memory_slot *slot;
> +	unsigned long index;
> +	bool r = true;
> +
> +	filemap_invalidate_lock(inode->i_mapping);
> +
> +	xa_for_each_range(&gmem->bindings, index, slot, pgoff, pgoff) {
> +		pgoff_t base_gfn = slot->base_gfn;
> +		pgoff_t gfn_pgoff = slot->gmem.pgoff;
> +		pgoff_t gfn = base_gfn + max(gfn_pgoff, pgoff) - gfn_pgoff;
> +
> +		if (!kvm_slot_gmem_is_mappable(slot, gfn)) {
> +			r = false;
> +			break;
> +		}
> +	}
> +
> +	filemap_invalidate_unlock(inode->i_mapping);
> +
> +	return r;
> +}
> +
> +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> +{
> +	struct folio *folio;
> +
> +	folio = kvm_gmem_get_folio(file_inode(vmf->vma->vm_file), vmf->pgoff);
> +	if (!folio)
> +		return VM_FAULT_SIGBUS;
> +
> +	if (!kvm_gmem_isfaultable(vmf)) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +	return VM_FAULT_LOCKED;
> +}
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +	.fault = kvm_gmem_fault,
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> +	    (VM_SHARED | VM_MAYSHARE)) {
> +		return -EINVAL;
> +	}
> +
> +	file_accessed(file);
> +	vm_flags_set(vma, VM_DONTDUMP);
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +
> +	return 0;
> +}
> +#else
> +#define kvm_gmem_mmap NULL
> +#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
> +
>  static struct file_operations kvm_gmem_fops = {
> +	.mmap		= kvm_gmem_mmap,
>  	.open		= generic_file_open,
>  	.release	= kvm_gmem_release,
>  	.fallocate	= kvm_gmem_fallocate,
> @@ -369,6 +475,10 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>  	xa_init(&gmem->bindings);
>  	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> +	xa_init(&gmem->unmappable_gfns);
> +#endif
> +
>  	fd_install(fd, file);
>  	return fd;
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1192942aef91..f4b4498d4de6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3265,6 +3265,128 @@ static int next_segment(unsigned long len, int offset)
>  		return len;
>  }
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> +static bool __kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	struct kvm_memslot_iter iter;
> +
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> +		struct kvm_memory_slot *memslot = iter.slot;
> +		gfn_t gfn_start, gfn_end, i;
> +
> +		gfn_start = max(start, memslot->base_gfn);
> +		gfn_end = min(end, memslot->base_gfn + memslot->npages);
> +		if (WARN_ON_ONCE(gfn_start >= gfn_end))
> +			continue;
> +
> +		for (i = gfn_start; i < gfn_end; i++) {
> +			if (!kvm_slot_gmem_is_mappable(memslot, i))
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> +bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	bool r;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	r = __kvm_gmem_is_mappable(kvm, start, end);
> +	mutex_unlock(&kvm->slots_lock);
> +
> +	return r;
> +}
> +
> +static bool __kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	struct kvm_memslot_iter iter;
> +
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> +		struct kvm_memory_slot *memslot = iter.slot;
> +		gfn_t gfn_start, gfn_end, i;
> +
> +		gfn_start = max(start, memslot->base_gfn);
> +		gfn_end = min(end, memslot->base_gfn + memslot->npages);
> +		if (WARN_ON_ONCE(gfn_start >= gfn_end))
> +			continue;
> +
> +		for (i = gfn_start; i < gfn_end; i++) {
> +			struct page *page;
> +			bool is_mapped;
> +			kvm_pfn_t pfn;
> +
> +			if (WARN_ON_ONCE(kvm_gmem_get_pfn_locked(kvm, memslot, i, &pfn, NULL)))
> +				continue;
> +
> +			page = pfn_to_page(pfn);
> +			is_mapped = page_mapped(page) || page_maybe_dma_pinned(page);
> +			unlock_page(page);
> +			put_page(page);
> +
> +			if (is_mapped)
> +				return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	bool r;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	r = __kvm_gmem_is_mapped(kvm, start, end);
> +	mutex_unlock(&kvm->slots_lock);
> +
> +	return r;
> +}
> +
> +static int kvm_gmem_toggle_mappable(struct kvm *kvm, gfn_t start, gfn_t end,
> +				    bool is_mappable)
> +{
> +	struct kvm_memslot_iter iter;
> +	int r = 0;
> +
> +	mutex_lock(&kvm->slots_lock);
> +
> +	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> +		struct kvm_memory_slot *memslot = iter.slot;
> +		gfn_t gfn_start, gfn_end;
> +
> +		gfn_start = max(start, memslot->base_gfn);
> +		gfn_end = min(end, memslot->base_gfn + memslot->npages);
> +		if (WARN_ON_ONCE(gfn_start >= gfn_end))
> +			continue;
> +
> +		r = kvm_slot_gmem_toggle_mappable(memslot, gfn_start, gfn_end, is_mappable);
> +		if (WARN_ON_ONCE(r))
> +			break;
> +	}
> +
> +	mutex_unlock(&kvm->slots_lock);
> +
> +	return r;
> +}
> +
> +int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	return kvm_gmem_toggle_mappable(kvm, start, end, true);
> +}
> +
> +int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> +{
> +	return kvm_gmem_toggle_mappable(kvm, start, end, false);
> +}
> +
> +#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
> +
>  /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
>  static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
>  				 void *data, int offset, int len)

[1] https://lore.kernel.org/all/20230914015531.1419405-15-seanjc@google.com/
[2] https://lore.kernel.org/all/cover.1691446946.git.ackerleytng@google.com/T/
[3] https://lore.kernel.org/all/ZQOmcc969s90DwNz@google.com/



* Re: [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host
  2024-08-01  9:01 ` [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host Fuad Tabba
@ 2024-08-05 17:19   ` Ackerley Tng
  2024-08-05 18:12     ` Fuad Tabba
  0 siblings, 1 reply; 21+ messages in thread
From: Ackerley Tng @ 2024-08-05 17:19 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, tabba

Fuad Tabba <tabba@google.com> writes:

> Add the KVM capability KVM_CAP_GUEST_MEMFD_MAPPABLE, which is
> true if mapping guest memory is supported by the host.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  include/uapi/linux/kvm.h | 3 ++-
>  virt/kvm/kvm_main.c      | 4 ++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> <snip>

Why do we need a cap for `KVM_CAP_GUEST_MEMFD_MAPPABLE` instead of just
making guest_memfd mmap-able?

Is this to prevent breaking userspace, because a user might be relying
on guest_memfd being not mmap-able?



* Re: [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host
  2024-08-05 17:14   ` Ackerley Tng
@ 2024-08-05 18:08     ` Fuad Tabba
  0 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-05 18:08 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd

Hi Ackerley,

On Mon, 5 Aug 2024 at 18:14, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Add support for mmap() and fault() for guest_memfd in the host.
> > The ability to fault in a guest page is contingent on that page
> > being shared with the host. To track this, this patch adds a new
> > xarray to each guest_memfd object, which tracks the mappability
> > of guest frames.
> >
> > The guest_memfd PRIVATE memory attribute is not used for two
> > reasons. First because it reflects the userspace expectation for
> > that memory location, and therefore can be toggled by userspace.
>
> Thank you for clarifying why the PRIVATE memory attribute cannot be
> used. I think that having the attributes with guest_memfd is a good
> idea.
>
> Since faultability is a property of the memory contents, shouldn't
> faultability be stored on the inode rather than the file?

I don't have a strong opinion about this, but by "not mappable" I
mean not having a valid mapping in the host, not that the mmap()
call in and of itself is disallowed.
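
To illustrate that distinction, a hedged userspace sketch (gmem_fd
and len are assumed to come from a prior KVM_CREATE_GUEST_MEMFD
call; error handling elided):

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

static void touch_gmem(int gmem_fd, size_t len)
{
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       gmem_fd, 0);
	char c;

	assert(p != MAP_FAILED);	/* the mmap() call itself succeeds */

	c = *(volatile char *)p;	/* SIGBUS if this gfn is not shared */
	(void)c;
}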

>
> > The second is, although each guest_memfd file has a 1:1 binding
> > with a KVM instance, the plan is to allow multiple files per
> > inode, e.g. to allow intra-host migration to a new KVM instance,
> > without destroying guest_memfd.
>
> I think you also alluded to the concept of inodes vs files above.
>
> To store the xarray on the inode, we would probably have to make
> guest_memfd use its own mount so that we can use .evict_inode() from
> struct super_operations to clean up the inode neatly, perhaps
> the way it was done in guest_memfd v12 [1].
>
> This RFC to enable intra-host migration [2] was built with the version
> of guest_memfd that used its own mount. IIUC, the gmem struct stored on
> the file is meant to be the binding between struct kvm and the memory
> contents [3], so the gmem struct won't be transferred and a new gmem
> struct will be created.

I thought that the gmem object itself is unique, and that even if
we're transferring it from one inode to another it would still be
there. If not, then you're right and we need to store this data in the
inode.

Thanks for pointing me to the RFC. I'll have a look at it and use it
as a reference when I respin this.

> >
> > This new feature is gated with a new configuration option,
> > CONFIG_KVM_PRIVATE_MEM_MAPPABLE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  include/linux/kvm_host.h |  61 ++++++++++++++++++++
> >  virt/kvm/Kconfig         |   4 ++
> >  virt/kvm/guest_memfd.c   | 110 +++++++++++++++++++++++++++++++++++
> >  virt/kvm/kvm_main.c      | 122 +++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 297 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 43a157f8171a..ab1344327e57 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2452,4 +2452,65 @@ static inline int kvm_gmem_get_pfn_locked(struct kvm *kvm,
> >  }
> >  #endif /* CONFIG_KVM_PRIVATE_MEM */
> >
> > +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> > +bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
> > +bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end);
> > +int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
>
> How will kvm_gmem_is_mapped() and kvm_gmem_set_mappable() be used?

You're right: in this patch series I only use __kvm_gmem_is_mapped().
In later patches (not posted, pKVM-specific), I use
kvm_gmem_is_mapped() when processing an unshare call from the
guest.
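
For context, a hedged sketch of how that unposted flow might use
these helpers (the function name and error choices are illustrative
only, not from the series):

static int pkvm_handle_unshare(struct kvm *kvm, gfn_t start, gfn_t end)
{
	/* Refuse while the host still has the range mapped or pinned. */
	if (kvm_gmem_is_mapped(kvm, start, end))
		return -EBUSY;

	/* Flip the range back to unmappable before accepting the unshare. */
	return kvm_gmem_clear_mappable(kvm, start, end);
}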

> > +int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
> > +int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot, gfn_t start,
> > +                               gfn_t end, bool is_mappable);
> > +int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
> > +                            gfn_t end);
> > +int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
> > +                              gfn_t end);
> > +bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
> > +#else
> > +static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return false;
> > +}
> > +static inline bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return false;
> > +}
> > +static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return -EINVAL;
> > +}
> > +static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
> > +                                       gfn_t end)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return -EINVAL;
> > +}
> > +static inline int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot,
> > +                                             gfn_t start, gfn_t end,
> > +                                             bool is_mappable)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return -EINVAL;
> > +}
> > +static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
> > +                                          gfn_t start, gfn_t end)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return -EINVAL;
> > +}
> > +static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
> > +                                            gfn_t start, gfn_t end)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return -EINVAL;
> > +}
> > +static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
> > +                                          gfn_t gfn)
> > +{
> > +     WARN_ON_ONCE(1);
> > +     return false;
> > +}
> > +#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
> > +
> >  #endif
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 29b73eedfe74..a3970c5eca7b 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -109,3 +109,7 @@ config KVM_GENERIC_PRIVATE_MEM
> >         select KVM_GENERIC_MEMORY_ATTRIBUTES
> >         select KVM_PRIVATE_MEM
> >         bool
> > +
> > +config KVM_PRIVATE_MEM_MAPPABLE
> > +       select KVM_PRIVATE_MEM
> > +       bool
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index f3f4334a9ccb..0a1f266a16f9 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -11,6 +11,9 @@ struct kvm_gmem {
> >       struct kvm *kvm;
> >       struct xarray bindings;
> >       struct list_head entry;
> > +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> > +     struct xarray unmappable_gfns;
> > +#endif
> >  };
> >
> >  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
> > @@ -230,6 +233,11 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
> >       mutex_unlock(&kvm->slots_lock);
> >
> >       xa_destroy(&gmem->bindings);
> > +
> > +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> > +     xa_destroy(&gmem->unmappable_gfns);
> > +#endif
> > +
> >       kfree(gmem);
> >
> >       kvm_put_kvm(kvm);
> > @@ -248,7 +256,105 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
> >       return get_file_active(&slot->gmem.file);
> >  }
> >
> > +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> > +int kvm_slot_gmem_toggle_mappable(struct kvm_memory_slot *slot, gfn_t start,
> > +                               gfn_t end, bool is_mappable)
> > +{
> > +     struct kvm_gmem *gmem = slot->gmem.file->private_data;
> > +     void *xval = is_mappable ? NULL : xa_mk_value(true);
>
> IIUC storing stuff in the xarray takes memory, so if we want to save
> memory, we should minimize entries in the xarray. For pKVM, do you
> expect more mappable, or more unmappable gfns?
>
> For TDX most of the memory will be private, so we expect fewer mappable
> gfns, and we'd prefer "entry in xarray == mappable".

This is similar to a discussion I've had earlier regarding the PRIVATE
attribute [*].

I don't have a strong opinion. It's trivial to reverse the polarity.
But to answer your question, most of the memory in pKVM would be
private for protected guests, but shared for non-protected guests. So,
it's not easy to know which would be the more common use case.
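
For what it's worth, a sketch of the reversed polarity (names are
hypothetical; a present entry would then mean "mappable", so a
TDX-like mostly-private guest stores few entries):

static int gmem_toggle_mappable(struct xarray *mappable_gfns,
				gfn_t start, gfn_t end, bool is_mappable)
{
	void *xval = is_mappable ? xa_mk_value(true) : NULL;

	return xa_err(xa_store_range(mappable_gfns, start, end - 1,
				     xval, GFP_KERNEL));
}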

Cheers,
/fuad

[*] https://lore.kernel.org/lkml/20231027182217.3615211-1-seanjc@google.com/T/#:~:text=consistent%20across%20implementations.-,Yeah%2C%20we%20discussed%20this%20in%20v12%5B*%5D.%20%20The%20default%20really%20doesn%27t%20matter%20for%20memory,-overheads%20or%20performances


> > +     void *r;
> > +
> > +     r = xa_store_range(&gmem->unmappable_gfns, start, end - 1, xval, GFP_KERNEL);
> > +
> > +     return xa_err(r);
> > +}
> > +
> > +int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> > +{
> > +     return kvm_slot_gmem_toggle_mappable(slot, start, end, true);
> > +}
> > +
> > +int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> > +{
> > +     return kvm_slot_gmem_toggle_mappable(slot, start, end, false);
> > +}
> > +
> > +bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
> > +{
> > +     struct kvm_gmem *gmem = slot->gmem.file->private_data;
> > +     unsigned long _gfn = gfn;
> > +
> > +     return !xa_find(&gmem->unmappable_gfns, &_gfn, ULONG_MAX, XA_PRESENT);
> > +}
> > +
> > +static bool kvm_gmem_isfaultable(struct vm_fault *vmf)
> > +{
> > +     struct kvm_gmem *gmem = vmf->vma->vm_file->private_data;
> > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > +     pgoff_t pgoff = vmf->pgoff;
> > +     struct kvm_memory_slot *slot;
> > +     unsigned long index;
> > +     bool r = true;
> > +
> > +     filemap_invalidate_lock(inode->i_mapping);
> > +
> > +     xa_for_each_range(&gmem->bindings, index, slot, pgoff, pgoff) {
> > +             pgoff_t base_gfn = slot->base_gfn;
> > +             pgoff_t gfn_pgoff = slot->gmem.pgoff;
> > +             pgoff_t gfn = base_gfn + max(gfn_pgoff, pgoff) - gfn_pgoff;
> > +
> > +             if (!kvm_slot_gmem_is_mappable(slot, gfn)) {
> > +                     r = false;
> > +                     break;
> > +             }
> > +     }
> > +
> > +     filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +     return r;
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> > +{
> > +     struct folio *folio;
> > +
> > +     folio = kvm_gmem_get_folio(file_inode(vmf->vma->vm_file), vmf->pgoff);
> > +     if (!folio)
> > +             return VM_FAULT_SIGBUS;
> > +
> > +     if (!kvm_gmem_isfaultable(vmf)) {
> > +             folio_unlock(folio);
> > +             folio_put(folio);
> > +             return VM_FAULT_SIGBUS;
> > +     }
> > +
> > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > +     return VM_FAULT_LOCKED;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> > +     .fault = kvm_gmem_fault,
> > +};
> > +
> > +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +     if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> > +         (VM_SHARED | VM_MAYSHARE)) {
> > +             return -EINVAL;
> > +     }
> > +
> > +     file_accessed(file);
> > +     vm_flags_set(vma, VM_DONTDUMP);
> > +     vma->vm_ops = &kvm_gmem_vm_ops;
> > +
> > +     return 0;
> > +}
> > +#else
> > +#define kvm_gmem_mmap NULL
> > +#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
> > +
> >  static struct file_operations kvm_gmem_fops = {
> > +     .mmap           = kvm_gmem_mmap,
> >       .open           = generic_file_open,
> >       .release        = kvm_gmem_release,
> >       .fallocate      = kvm_gmem_fallocate,
> > @@ -369,6 +475,10 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> >       xa_init(&gmem->bindings);
> >       list_add(&gmem->entry, &inode->i_mapping->i_private_list);
> >
> > +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> > +     xa_init(&gmem->unmappable_gfns);
> > +#endif
> > +
> >       fd_install(fd, file);
> >       return fd;
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1192942aef91..f4b4498d4de6 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3265,6 +3265,128 @@ static int next_segment(unsigned long len, int offset)
> >               return len;
> >  }
> >
> > +#ifdef CONFIG_KVM_PRIVATE_MEM_MAPPABLE
> > +static bool __kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     struct kvm_memslot_iter iter;
> > +
> > +     lockdep_assert_held(&kvm->slots_lock);
> > +
> > +     kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> > +             struct kvm_memory_slot *memslot = iter.slot;
> > +             gfn_t gfn_start, gfn_end, i;
> > +
> > +             gfn_start = max(start, memslot->base_gfn);
> > +             gfn_end = min(end, memslot->base_gfn + memslot->npages);
> > +             if (WARN_ON_ONCE(gfn_start >= gfn_end))
> > +                     continue;
> > +
> > +             for (i = gfn_start; i < gfn_end; i++) {
> > +                     if (!kvm_slot_gmem_is_mappable(memslot, i))
> > +                             return false;
> > +             }
> > +     }
> > +
> > +     return true;
> > +}
> > +
> > +bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     bool r;
> > +
> > +     mutex_lock(&kvm->slots_lock);
> > +     r = __kvm_gmem_is_mappable(kvm, start, end);
> > +     mutex_unlock(&kvm->slots_lock);
> > +
> > +     return r;
> > +}
> > +
> > +static bool __kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     struct kvm_memslot_iter iter;
> > +
> > +     lockdep_assert_held(&kvm->slots_lock);
> > +
> > +     kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> > +             struct kvm_memory_slot *memslot = iter.slot;
> > +             gfn_t gfn_start, gfn_end, i;
> > +
> > +             gfn_start = max(start, memslot->base_gfn);
> > +             gfn_end = min(end, memslot->base_gfn + memslot->npages);
> > +             if (WARN_ON_ONCE(gfn_start >= gfn_end))
> > +                     continue;
> > +
> > +             for (i = gfn_start; i < gfn_end; i++) {
> > +                     struct page *page;
> > +                     bool is_mapped;
> > +                     kvm_pfn_t pfn;
> > +
> > +                     if (WARN_ON_ONCE(kvm_gmem_get_pfn_locked(kvm, memslot, i, &pfn, NULL)))
> > +                             continue;
> > +
> > +                     page = pfn_to_page(pfn);
> > +                     is_mapped = page_mapped(page) || page_maybe_dma_pinned(page);
> > +                     unlock_page(page);
> > +                     put_page(page);
> > +
> > +                     if (is_mapped)
> > +                             return true;
> > +             }
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     bool r;
> > +
> > +     mutex_lock(&kvm->slots_lock);
> > +     r = __kvm_gmem_is_mapped(kvm, start, end);
> > +     mutex_unlock(&kvm->slots_lock);
> > +
> > +     return r;
> > +}
> > +
> > +static int kvm_gmem_toggle_mappable(struct kvm *kvm, gfn_t start, gfn_t end,
> > +                                 bool is_mappable)
> > +{
> > +     struct kvm_memslot_iter iter;
> > +     int r = 0;
> > +
> > +     mutex_lock(&kvm->slots_lock);
> > +
> > +     kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
> > +             struct kvm_memory_slot *memslot = iter.slot;
> > +             gfn_t gfn_start, gfn_end;
> > +
> > +             gfn_start = max(start, memslot->base_gfn);
> > +             gfn_end = min(end, memslot->base_gfn + memslot->npages);
> > +             if (WARN_ON_ONCE(gfn_start >= gfn_end))
> > +                     continue;
> > +
> > +             r = kvm_slot_gmem_toggle_mappable(memslot, gfn_start, gfn_end, is_mappable);
> > +             if (WARN_ON_ONCE(r))
> > +                     break;
> > +     }
> > +
> > +     mutex_unlock(&kvm->slots_lock);
> > +
> > +     return r;
> > +}
> > +
> > +int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     return kvm_gmem_toggle_mappable(kvm, start, end, true);
> > +}
> > +
> > +int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> > +{
> > +     return kvm_gmem_toggle_mappable(kvm, start, end, false);
> > +}
> > +
> > +#endif /* CONFIG_KVM_PRIVATE_MEM_MAPPABLE */
> > +
> >  /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
> >  static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
> >                                void *data, int offset, int len)
>
> [1] https://lore.kernel.org/all/20230914015531.1419405-15-seanjc@google.com/
> [2] https://lore.kernel.org/all/cover.1691446946.git.ackerleytng@google.com/T/
> [3] https://lore.kernel.org/all/ZQOmcc969s90DwNz@google.com/



* Re: [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host
  2024-08-05 17:19   ` Ackerley Tng
@ 2024-08-05 18:12     ` Fuad Tabba
  0 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-05 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd

Hi Ackerley,

On Mon, 5 Aug 2024 at 18:19, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Add the KVM capability KVM_CAP_GUEST_MEMFD_MAPPABLE, which is
> > true if mapping guest memory is supported by the host.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  include/uapi/linux/kvm.h | 3 ++-
> >  virt/kvm/kvm_main.c      | 4 ++++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > <snip>
>
> Why do we need a cap for `KVM_CAP_GUEST_MEMFD_MAPPABLE` instead of just
> making guest_memfd mmap-able?
>
> Is this to prevent breaking userspace, because a user might be relying
> on guest_memfd being not mmap-able?

To be able to check that the ability is there, since it is a new
capability that is not available in Linux 6.9 or 6.10 (i.e., after
guest_memfd() was introduced).
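
For completeness, a hedged sketch of the intended userspace probe
(vm_fd is assumed to be a KVM VM file descriptor, and
KVM_CAP_GUEST_MEMFD_MAPPABLE comes from this series' uapi change):

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static bool gmem_mappable_supported(int vm_fd)
{
	/* KVM_CHECK_EXTENSION returns > 0 if the capability exists. */
	return ioctl(vm_fd, KVM_CHECK_EXTENSION,
		     KVM_CAP_GUEST_MEMFD_MAPPABLE) > 0;
}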

Cheers,
/fuad



* Re: [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support
  2024-08-05 16:53 ` [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Ackerley Tng
@ 2024-08-05 18:13   ` Fuad Tabba
  0 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-05 18:13 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd

Hi Ackerley,

On Mon, 5 Aug 2024 at 17:53, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > This series adds restricted mmap() support to guest_memfd, as
> > well as support for guest_memfd on pKVM/arm64. It is based on
> > Linux 6.10.
> >
> > Main changes since V1 [1]:
> >
> > - Decoupled whether guest memory is mappable from KVM memory
> > attributes (SeanC)
> >
> > Mappability is now tracked in the guest_mem object, orthogonal to
> > whether userspace wants the memory to be private or shared.
> > Moreover, the memory attributes capability (i.e.,
> > KVM_CAP_MEMORY_ATTRIBUTES) is not enabled for pKVM, since for
> > software-based hypervisors such as pKVM and Gunyah, userspace is
> > informed of the state of the memory via hypervisor exits if
> > needed.
> >
> > Even if attributes are enabled, this patch series would still
> > work (modulo bugs), without compromising guest memory or
> > crashing the system.
> >
> > - Use page_mapped() instead of page_mapcount() to check if page
> > is mapped (DavidH)
> >
> > - Add a new capability, KVM_CAP_GUEST_MEMFD_MAPPABLE, to query
> > whether guest private memory can be mapped (with aforementioned
> > restrictions)
> >
> > - Add a selftest to check whether memory is mappable when the
> > capability is enabled, and not mappable otherwise. Also, test the
> > effect of punching holes in mapped memory. (DavidH)
> >
> > By design, guest_memfd cannot be mapped, read, or written by the
> > host. In pKVM, memory shared between a protected guest and the
>
> I think we should use "cannot be faulted in" to be clear that
> guest_memfd can be mmaped but not faulted in.
>
> Would it be better to have all the variables/config macros be something
> about faultability instead of mappability?

By mappability, I mean having a valid mapping in the host. But as I
said in my reply to the other patch, I don't have a strong opinion
about this.

Cheers,
/fuad

> > host is shared in-place, unlike the other confidential computing
> > solutions that guest_memfd was originally envisaged for (e.g.,
> > TDX). When initializing a guest, as well as when accessing memory
> > shared by the guest with the host, it would be useful to support
> > mapping that memory at the host to avoid copying its contents.
> >
> > One of the benefits of guest_memfd is that it prevents a
> > misbehaving host process from crashing the system when attempting
> > to access (deliberately or accidentally) protected guest memory,
> > since this memory isn't mapped to begin with. Without
> > guest_memfd, the hypervisor would still prevent such accesses,
> > but in certain cases the host kernel wouldn't be able to recover,
> > causing the system to crash.
> >
> > Support for mmap() in this patch series maintains the invariant
> > that only memory shared with the host, either explicitly by the
> > guest or implicitly before the guest has started running (in
> > order to populate its memory), is allowed to have a valid mapping
> > at the host. At no time should private (as viewed by the
> > hypervisor) guest memory be mapped at the host.
> >
> > This patch series is divided into two parts:
> >
> > The first part is to the KVM core code. It adds opt-in support
> > for mapping guest memory only as long as it is shared. For that,
> > the host needs to know the mappability status of guest memory.
> > Therefore, the series adds a structure to track whether memory is
> > mappable. This new structure is associated with each guest_memfd
> > object.
> >
> > The second part of the series adds guest_memfd support for
> > pKVM/arm64.
> >
> > We don't enforce the invariant that only memory shared with the
> > host can be mapped by the host userspace in
> > file_operations::mmap(), but we enforce it in
> > vm_operations_struct::fault(). On vm_operations_struct::fault(),
> > we check whether the page is allowed to be mapped. If not, we
> > deliver a SIGBUS to the current task, as discussed in the Linux
> > MM Alignment Session on this topic [2].
> >
> > Currently there's no support for huge pages, which is something
> > we hope to add in the future, and seems to be a hot topic for the
> > upcoming LPC 2024 [3].
> >
> > Cheers,
> > /fuad
> >
> > [1] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com/
> >
> > [2] https://lore.kernel.org/all/20240712232937.2861788-1-ackerleytng@google.com/
> >
> > [3] https://lpc.events/event/18/sessions/183/#20240919
> >
> > Fuad Tabba (10):
> >   KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock
> >   KVM: Add restricted support for mapping guestmem by the host
> >   KVM: Implement kvm_(read|/write)_guest_page for private memory slots
> >   KVM: Add KVM capability to check if guest_memfd can be mapped by the
> >     host
> >   KVM: selftests: guest_memfd mmap() test when mapping is allowed
> >   KVM: arm64: Skip VMA checks for slots without userspace address
> >   KVM: arm64: Do not allow changes to private memory slots
> >   KVM: arm64: Handle guest_memfd()-backed guest page faults
> >   KVM: arm64: arm64 has private memory support when config is enabled
> >   KVM: arm64: Enable private memory kconfig for arm64
> >
> >  arch/arm64/include/asm/kvm_host.h             |   3 +
> >  arch/arm64/kvm/Kconfig                        |   1 +
> >  arch/arm64/kvm/mmu.c                          | 139 +++++++++-
> >  include/linux/kvm_host.h                      |  72 +++++
> >  include/uapi/linux/kvm.h                      |   3 +-
> >  tools/testing/selftests/kvm/Makefile          |   1 +
> >  .../testing/selftests/kvm/guest_memfd_test.c  |  47 +++-
> >  virt/kvm/Kconfig                              |   4 +
> >  virt/kvm/guest_memfd.c                        | 129 ++++++++-
> >  virt/kvm/kvm_main.c                           | 253 ++++++++++++++++--
> >  10 files changed, 628 insertions(+), 24 deletions(-)
> >
> >
> > base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd



* Re: [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled
  2024-08-01  9:01 ` [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled Fuad Tabba
@ 2024-08-15  6:27   ` Patrick Roy
  2024-08-15  7:26     ` Fuad Tabba
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick Roy @ 2024-08-15  6:27 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, yu.c.zhang,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, shuah, hch,
	jgg, rientjes, jhubbard, fvdl, hughd

Hi Fuad,

On Thu, 2024-08-01 at 10:01 +0100, Fuad Tabba wrote:
> Implement kvm_arch_has_private_mem() in arm64, making it
> dependent on the configuration option.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 36b8e97bf49e..8f7d78ee9557 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1414,4 +1414,7 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
>                 (pa + pi + pa3) == 1;                                   \
>         })
> 
> +#define kvm_arch_has_private_mem(kvm)                                  \
> +       (IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) && is_protected_kvm_enabled())
> +

Would it make sense to have some ARM equivalent of
KVM_X86_SW_PROTECTED_VM here? Both for easier testing of guest_memfd on
ARM, as well as for future non-CoCo use cases.

>  #endif /* __ARM64_KVM_HOST_H__ */
> --
> 2.46.0.rc1.232.g9752f9e123-goog
> 

Best,
Patrick



* Re: [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled
  2024-08-15  6:27   ` Patrick Roy
@ 2024-08-15  7:26     ` Fuad Tabba
  0 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-08-15  7:26 UTC (permalink / raw)
  To: Patrick Roy
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd

Hi Patrick,

On Thu, 15 Aug 2024 at 07:27, Patrick Roy <roypat@amazon.co.uk> wrote:
>
> Hi Fuad,
>
> On Thu, 2024-08-01 at 10:01 +0100, Fuad Tabba wrote:
> > Implement kvm_arch_has_private_mem() in arm64, making it
> > dependent on the configuration option.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 36b8e97bf49e..8f7d78ee9557 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1414,4 +1414,7 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
> >                 (pa + pi + pa3) == 1;                                   \
> >         })
> >
> > +#define kvm_arch_has_private_mem(kvm)                                  \
> > +       (IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) && is_protected_kvm_enabled())
> > +
>
> Would it make sense to have some ARM equivalent of
> KVM_X86_SW_PROTECTED_VM here? Both for easier testing of guest_memfd on
> ARM, as well as for future non-CoCo use cases.

I don't really have a strong opinion about this. I thought that
if/when that were to happen, it would be trivial to modify this macro.
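
For what it's worth, the trivial modification could look roughly like
this (a sketch only; KVM_VM_TYPE_ARM_SW_PROTECTED and the vm_type
field are hypothetical, not part of this series):

#define kvm_arch_has_private_mem(kvm)					\
	(IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&				\
	 (is_protected_kvm_enabled() ||					\
	  (kvm)->arch.vm_type == KVM_VM_TYPE_ARM_SW_PROTECTED))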

Cheers,
/fuad

> >  #endif /* __ARM64_KVM_HOST_H__ */
> > --
> > 2.46.0.rc1.232.g9752f9e123-goog
> >
>
> Best,
> Patrick



* Re: [RFC PATCH v2 03/10] KVM: Implement kvm_(read|/write)_guest_page for private memory slots
  2024-08-01  9:01 ` [RFC PATCH v2 03/10] KVM: Implement kvm_(read|/write)_guest_page for private memory slots Fuad Tabba
@ 2024-08-16 19:32   ` Sean Christopherson
  2024-09-03  9:28     ` Fuad Tabba
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2024-08-16 19:32 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd

On Thu, Aug 01, 2024, Fuad Tabba wrote:
> Make __kvm_read_guest_page/__kvm_write_guest_page capable of
> accessing guest memory if no userspace address is available.
> Moreover, check that the memory being accessed is shared with the
> host before attempting the access.
> 
> KVM at the host might need to access shared memory that is not
> mapped in the host userspace but is in fact shared with the host,
> e.g., when accounting for stolen time. This allows the access
> without relying on the slot's userspace_addr being set.

Why?  As evidenced by the amount of code below, special casing guest_memfd isn't
trivial, and taking kvm->slots_lock is likely a complete non-starter.  In the
happy case, uaccess is about as fast as can be, and has no inherent scaling issues.

> This does not circumvent protection, since the access is only
> attempted if the memory is mappable by the host, which implies
> shareability.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  virt/kvm/kvm_main.c | 127 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 111 insertions(+), 16 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f4b4498d4de6..ec6255c7325e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3385,20 +3385,108 @@ int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
>  	return kvm_gmem_toggle_mappable(kvm, start, end, false);
>  }
>  
> +static int __kvm_read_private_guest_page(struct kvm *kvm,

The changelog says this is for accessing memory that is shared, but this says
"private".

> +					 struct kvm_memory_slot *slot,
> +					 gfn_t gfn, void *data, int offset,
> +					 int len)



* Re: [RFC PATCH v2 03/10] KVM: Implement kvm_(read|/write)_guest_page for private memory slots
  2024-08-16 19:32   ` Sean Christopherson
@ 2024-09-03  9:28     ` Fuad Tabba
  0 siblings, 0 replies; 21+ messages in thread
From: Fuad Tabba @ 2024-09-03  9:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	yu.c.zhang, isaku.yamahata, mic, vbabka, vannapurve, ackerleytng,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd

Hi Sean,

On Fri, 16 Aug 2024 at 20:32, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Aug 01, 2024, Fuad Tabba wrote:
> > Make __kvm_read_guest_page/__kvm_write_guest_page capable of
> > accessing guest memory if no userspace address is available.
> > Moreover, check that the memory being accessed is shared with the
> > host before attempting the access.
> >
> > KVM at the host might need to access shared memory that is not
> > mapped in the host userspace but is in fact shared with the host,
> > e.g., when accounting for stolen time. This allows the access
> > without relying on the slot's userspace_addr being set.
>
> Why?  As evidenced by the amount of code below, special casing guest_memfd isn't
> trivial, and taking kvm->slots_lock is likely a complete non-starter.  In the
> happy case, uaccess is about as fast as can be, and has no inherent scaling issues.
>
> > This does not circumvent protection, since the access is only
> > attempted if the memory is mappable by the host, which implies
> > shareability.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  virt/kvm/kvm_main.c | 127 ++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 111 insertions(+), 16 deletions(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index f4b4498d4de6..ec6255c7325e 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3385,20 +3385,108 @@ int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
> >       return kvm_gmem_toggle_mappable(kvm, start, end, false);
> >  }
> >
> > +static int __kvm_read_private_guest_page(struct kvm *kvm,
>
> The changelog says this is for accessing memory that is shared, but this says
> "private".

This is bad naming on my part. Instead, I should call this function
something like read_guestmem_page() (and similarly for the write one).
Thanks for pointing this out.

Cheers,
/fuad

> > +                                      struct kvm_memory_slot *slot,
> > +                                      gfn_t gfn, void *data, int offset,
> > +                                      int len)



Thread overview: 21+ messages
2024-08-01  9:01 [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 01/10] KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 02/10] KVM: Add restricted support for mapping guestmem by the host Fuad Tabba
2024-08-05 17:14   ` Ackerley Tng
2024-08-05 18:08     ` Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 03/10] KVM: Implement kvm_(read|/write)_guest_page for private memory slots Fuad Tabba
2024-08-16 19:32   ` Sean Christopherson
2024-09-03  9:28     ` Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 04/10] KVM: Add KVM capability to check if guest_memfd can be mapped by the host Fuad Tabba
2024-08-05 17:19   ` Ackerley Tng
2024-08-05 18:12     ` Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 05/10] KVM: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 06/10] KVM: arm64: Skip VMA checks for slots without userspace address Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 07/10] KVM: arm64: Do not allow changes to private memory slots Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 09/10] KVM: arm64: arm64 has private memory support when config is enabled Fuad Tabba
2024-08-15  6:27   ` Patrick Roy
2024-08-15  7:26     ` Fuad Tabba
2024-08-01  9:01 ` [RFC PATCH v2 10/10] KVM: arm64: Enable private memory kconfig for arm64 Fuad Tabba
2024-08-05 16:53 ` [RFC PATCH v2 00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Ackerley Tng
2024-08-05 18:13   ` Fuad Tabba
