linux-mm.kvack.org archive mirror
* [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs
@ 2025-03-12 17:58 Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 01/10] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Main changes since v5 [1]:
- Added handling of folio_put() when KVM is configured as a module
- KVM_GMEM_SHARED_MEM is orthogonal to KVM_GENERIC_MEMORY_ATTRIBUTES
  (Ackerley)
- kvm_gmem_offset_is_shared() takes folio as parameter to check locking
  (Kirill)
- Refactoring and fixes from comments on previous version
- Rebased on Linux 6.14-rc6

The purpose of this series is to serve as a base for _restricted_
mmap() support for guest_memfd backed memory at the host [2]. It
allows experimentation with what that support would look like in
the safe environment of software-protected and other
non-confidential VM types.

For more background and for how to test this series, please refer
to v2 [3]. Note that an updated version of kvmtool that works
with this series is available here [4].

I'm working on respinning the series that tracks folio sharing [5]. I'll
post that one soon.

Cheers,
/fuad

[1] https://lore.kernel.org/all/20250303171013.3548775-1-tabba@google.com/
[2] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/
[3] https://lore.kernel.org/all/20250129172320.950523-1-tabba@google.com/
[4] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-6.14
[5] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/

Fuad Tabba (10):
  mm: Consolidate freeing of typed folios on final folio_put()
  KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
  KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a
    module
  KVM: guest_memfd: Allow host to map guest_memfd() pages
  KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed
    memory
  KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd
    shared memory
  KVM: arm64: Refactor user_mem_abort() calculation of force_pte
  KVM: arm64: Handle guest_memfd()-backed guest page faults
  KVM: arm64: Enable mapping guest_memfd in arm64
  KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is
    allowed

 arch/arm64/include/asm/kvm_host.h             |  10 ++
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          |  76 +++++++-----
 arch/x86/include/asm/kvm_host.h               |   5 +
 arch/x86/kvm/Kconfig                          |   3 +-
 include/linux/kvm_host.h                      |  23 +++-
 include/linux/page-flags.h                    |  31 +++++
 include/uapi/linux/kvm.h                      |   1 +
 mm/debug.c                                    |   1 +
 mm/swap.c                                     |  50 +++++++-
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  75 +++++++++++-
 virt/kvm/Kconfig                              |   4 +
 virt/kvm/guest_memfd.c                        | 110 ++++++++++++++++++
 virt/kvm/kvm_main.c                           |   9 +-
 15 files changed, 354 insertions(+), 46 deletions(-)


base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 01/10] mm: Consolidate freeing of typed folios on final folio_put()
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 02/10] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages Fuad Tabba
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Some folio types, such as hugetlb, handle freeing their own
folios. Moreover, guest_memfd will need to be notified once a
folio's reference count reaches 0, to facilitate shared-to-private
folio conversion, without the folio actually being freed at that
point.

As a first step towards that, this patch consolidates the freeing
of folios that have a type. The first user is hugetlb folios;
later in this patch series, guest_memfd will become the second
user.
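
As an illustration (not part of the patch), the dispatch this patch
introduces boils down to reading the top byte of page_type and
switching on it. A minimal userspace model, with constants mirroring
enum pagetype from include/linux/page-flags.h (the guestmem value is
only added later in this series):

#include <stdint.h>
#include <stdio.h>

/* Mirrors the kernel encoding: the top byte of page_type names the type. */
enum pagetype {
	PGTY_hugetlb	= 0xf5,
	PGTY_guestmem	= 0xf8,	/* added later in this series */
};

static int page_get_type(uint32_t page_type)
{
	return page_type >> 24;	/* same shift as the new kernel helper */
}

static void free_typed_folio(uint32_t page_type)
{
	switch (page_get_type(page_type)) {
	case PGTY_hugetlb:
		printf("-> free_huge_folio()\n");
		return;
	case PGTY_guestmem:
		printf("-> kvm_gmem_handle_folio_put()\n");
		return;
	default:
		printf("-> WARN_ON_ONCE(1)\n");
	}
}

int main(void)
{
	free_typed_folio(0xf5000000u);	/* a hugetlb folio */
	free_typed_folio(0xf8000000u);	/* a guestmem folio */
	return 0;
}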

Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/page-flags.h | 15 +++++++++++++++
 mm/swap.c                  | 23 ++++++++++++++++++-----
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 36d283552f80..6dc2494bd002 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -953,6 +953,21 @@ static inline bool page_has_type(const struct page *page)
 	return page_mapcount_is_type(data_race(page->page_type));
 }
 
+static inline int page_get_type(const struct page *page)
+{
+	return page->page_type >> 24;
+}
+
+static inline bool folio_has_type(const struct folio *folio)
+{
+	return page_has_type(&folio->page);
+}
+
+static inline int folio_get_type(const struct folio *folio)
+{
+	return page_get_type(&folio->page);
+}
+
 #define FOLIO_TYPE_OPS(lname, fname)					\
 static __always_inline bool folio_test_##fname(const struct folio *folio) \
 {									\
diff --git a/mm/swap.c b/mm/swap.c
index fc8281ef4241..47bc1bb919cc 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -94,6 +94,19 @@ static void page_cache_release(struct folio *folio)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 }
 
+static void free_typed_folio(struct folio *folio)
+{
+	switch (folio_get_type(folio)) {
+#ifdef CONFIG_HUGETLBFS
+	case PGTY_hugetlb:
+		free_huge_folio(folio);
+		return;
+#endif
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 void __folio_put(struct folio *folio)
 {
 	if (unlikely(folio_is_zone_device(folio))) {
@@ -101,8 +114,8 @@ void __folio_put(struct folio *folio)
 		return;
 	}
 
-	if (folio_test_hugetlb(folio)) {
-		free_huge_folio(folio);
+	if (unlikely(folio_has_type(folio))) {
+		free_typed_folio(folio);
 		return;
 	}
 
@@ -966,13 +979,13 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		if (!folio_ref_sub_and_test(folio, nr_refs))
 			continue;
 
-		/* hugetlb has its own memcg */
-		if (folio_test_hugetlb(folio)) {
+		if (unlikely(folio_has_type(folio))) {
+			/* typed folios have their own memcg, if any */
 			if (lruvec) {
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			free_huge_folio(folio);
+			free_typed_folio(folio);
 			continue;
 		}
 		folio_unqueue_deferred_split(folio);
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 02/10] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 01/10] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module Fuad Tabba
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Before transitioning a guest_memfd folio to unshared, thereby
disallowing access by the host and allowing the hypervisor to
transition its view of the guest page to private, we need to be
sure that the host doesn't have any references to the folio.

This patch introduces a new type for guest_memfd folios, which
isn't activated in this series but is here as a placeholder, and
to facilitate the code in subsequent patch series. It will be
used in the future to register a callback that informs the
guest_memfd subsystem when the last reference is dropped, so
that it knows the host doesn't hold any remaining references.

This patch also introduces the configuration option,
KVM_GMEM_SHARED_MEM, which toggles support for mapping
guest_memfd shared memory at the host.
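
To sketch the intended use (illustrative only; neither the helper
below nor the conversion flow exists in this series), guest_memfd
would mark a folio with the new type before handing it to the guest
as private, so the final folio_put() is routed back to guest_memfd
rather than to the page allocator:

/* Hypothetical future helper, shown only to illustrate the intent: */
static void gmem_mark_folio_private(struct folio *folio)
{
	/*
	 * Generated by FOLIO_TYPE_OPS(guestmem, guestmem), alongside
	 * folio_test_guestmem() and __folio_clear_guestmem().
	 */
	__folio_set_guestmem(folio);

	/*
	 * From this point on, when the last reference is dropped,
	 * __folio_put() dispatches to kvm_gmem_handle_folio_put()
	 * instead of freeing the folio.
	 */
}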

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 include/linux/kvm_host.h   |  7 +++++++
 include/linux/page-flags.h | 16 ++++++++++++++++
 mm/debug.c                 |  1 +
 mm/swap.c                  |  9 +++++++++
 virt/kvm/Kconfig           |  4 ++++
 5 files changed, 37 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f34f4cfaa513..7788e3625f6d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2571,4 +2571,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range);
 #endif
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+static inline void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
+}
+#endif
+
 #endif
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6dc2494bd002..daeee9a38e4c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -933,6 +933,7 @@ enum pagetype {
 	PGTY_slab	= 0xf5,
 	PGTY_zsmalloc	= 0xf6,
 	PGTY_unaccepted	= 0xf7,
+	PGTY_guestmem	= 0xf8,
 
 	PGTY_mapcount_underflow = 0xff
 };
@@ -1082,6 +1083,21 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
 FOLIO_TEST_FLAG_FALSE(hugetlb)
 #endif
 
+/*
+ * guestmem folios are used to back VM memory as managed by guest_memfd. Once
+ * the last reference is put, instead of freeing these folios back to the page
+ * allocator, they are returned to guest_memfd.
+ *
+ * For now, guestmem will only be set on these folios as long as they cannot be
+ * mapped to user space ("private state"), with the plan of always setting that
+ * type once typed folios can be mapped to user space cleanly.
+ */
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+FOLIO_TYPE_OPS(guestmem, guestmem)
+#else
+FOLIO_TEST_FLAG_FALSE(guestmem)
+#endif
+
 PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
 
 /*
diff --git a/mm/debug.c b/mm/debug.c
index 8d2acf432385..08bc42c6cba8 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -56,6 +56,7 @@ static const char *page_type_names[] = {
 	DEF_PAGETYPE_NAME(table),
 	DEF_PAGETYPE_NAME(buddy),
 	DEF_PAGETYPE_NAME(unaccepted),
+	DEF_PAGETYPE_NAME(guestmem),
 };
 
 static const char *page_type_name(unsigned int page_type)
diff --git a/mm/swap.c b/mm/swap.c
index 47bc1bb919cc..241880a46358 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -38,6 +38,10 @@
 #include <linux/local_lock.h>
 #include <linux/buffer_head.h>
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+#include <linux/kvm_host.h>
+#endif
+
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -101,6 +105,11 @@ static void free_typed_folio(struct folio *folio)
 	case PGTY_hugetlb:
 		free_huge_folio(folio);
 		return;
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	case PGTY_guestmem:
+		kvm_gmem_handle_folio_put(folio);
+		return;
 #endif
 	default:
 		WARN_ON_ONCE(1);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..4e759e8020c5 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
        bool
        depends on KVM_PRIVATE_MEM
+
+config KVM_GMEM_SHARED_MEM
+       select KVM_PRIVATE_MEM
+       bool
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 01/10] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 02/10] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-13 13:46   ` Ackerley Tng
  2025-03-13 13:49   ` Ackerley Tng
  2025-03-12 17:58 ` [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

On some architectures, KVM can be built as a module. If a
folio_put() is still pending when the KVM module is unloaded, the
system could crash. By having a helper check for that and call the
function only if it's available, we are able to handle that case
more gracefully.

Signed-off-by: Fuad Tabba <tabba@google.com>

---

This patch could be squashed with the previous one if the maintainers
think it would be better.
---
 include/linux/kvm_host.h |  5 +----
 mm/swap.c                | 20 +++++++++++++++++++-
 virt/kvm/guest_memfd.c   |  8 ++++++++
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7788e3625f6d..3ad0719bfc4f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 #endif
 
 #ifdef CONFIG_KVM_GMEM_SHARED_MEM
-static inline void kvm_gmem_handle_folio_put(struct folio *folio)
-{
-	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
-}
+void kvm_gmem_handle_folio_put(struct folio *folio);
 #endif
 
 #endif
diff --git a/mm/swap.c b/mm/swap.c
index 241880a46358..27dfd75536c8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 }
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+static void gmem_folio_put(struct folio *folio)
+{
+#if IS_MODULE(CONFIG_KVM)
+	void (*fn)(struct folio *folio);
+
+	fn = symbol_get(kvm_gmem_handle_folio_put);
+	if (WARN_ON_ONCE(!fn))
+		return;
+
+	fn(folio);
+	symbol_put(kvm_gmem_handle_folio_put);
+#else
+	kvm_gmem_handle_folio_put(folio);
+#endif
+}
+#endif
+
 static void free_typed_folio(struct folio *folio)
 {
 	switch (folio_get_type(folio)) {
@@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
 #endif
 #ifdef CONFIG_KVM_GMEM_SHARED_MEM
 	case PGTY_guestmem:
-		kvm_gmem_handle_folio_put(folio);
+		gmem_folio_put(folio);
 		return;
 #endif
 	default:
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..5fc414becae5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,6 +13,14 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (2 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-14 18:46   ` Ackerley Tng
  2025-03-12 17:58 ` [PATCH v6 05/10] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory Fuad Tabba
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Add support for mmap() and fault() for guest_memfd backed memory
in the host for VMs that support in-place conversion between
shared and private. To that end, this patch adds the ability to
check whether the VM type supports in-place conversion, and only
allows mapping its memory if that's the case.

Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM which, when
queried on a VM, indicates that the VM supports shared memory in
guest_memfd, and, when queried system-wide, that the host can
create VMs that support it. Supporting shared memory implies that
memory can be mapped when shared with the host.

This is controlled by the KVM_GMEM_SHARED_MEM configuration
option.
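
From userspace, the intended flow looks roughly as follows. This is
an illustrative sketch, not part of the series; error handling is
trimmed, and KVM_CAP_GMEM_SHARED_MEM is the capability added by this
patch:

#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Map a freshly created guest_memfd, if this VM supports shared gmem. */
static void *map_gmem(int vm_fd, size_t size)
{
	struct kvm_create_guest_memfd args = { .size = size };
	void *mem;
	int gmem_fd;

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_SHARED_MEM) <= 0)
		return NULL;	/* mmap() below would fail with ENODEV */

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
	if (gmem_fd < 0)
		return NULL;

	/* Must be MAP_SHARED: kvm_gmem_mmap() rejects private mappings. */
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
	return mem == MAP_FAILED ? NULL : mem;
}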

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h |  11 +++++
 include/uapi/linux/kvm.h |   1 +
 virt/kvm/guest_memfd.c   | 102 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |   4 ++
 4 files changed, 118 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3ad0719bfc4f..601bbcaa5e41 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -728,6 +728,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
 }
 #endif
 
+/*
+ * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
+ * private memory is enabled and it supports in-place shared/private conversion.
+ */
+#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
+static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
+{
+	return false;
+}
+#endif
+
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 45e6d8fca9b9..117937a895da 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -929,6 +929,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_GMEM_SHARED_MEM 239
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5fc414becae5..eea44e003ed1 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -320,7 +320,109 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+static bool folio_offset_is_shared(const struct folio *folio, struct file *file, pgoff_t index)
+{
+	struct kvm_gmem *gmem = file->private_data;
+
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+	/* For now, VMs that support shared memory share all their memory. */
+	return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
+}
+
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		int err = PTR_ERR(folio);
+
+		if (err == -EAGAIN)
+			ret = VM_FAULT_RETRY;
+		else
+			ret = vmf_error(err);
+
+		goto out_filemap;
+	}
+
+	if (folio_test_hwpoison(folio)) {
+		ret = VM_FAULT_HWPOISON;
+		goto out_folio;
+	}
+
+	if (!folio_offset_is_shared(folio, vmf->vma->vm_file, vmf->pgoff)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	/*
+	 * Shared folios would not be marked as "guestmem" so far, and we only
+	 * expect shared folios at this point.
+	 */
+	if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	/* No support for huge pages. */
+	if (WARN_ON_ONCE(folio_test_large(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		clear_highpage(folio_page(folio, 0));
+		kvm_gmem_mark_prepared(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+out_filemap:
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct kvm_gmem *gmem = file->private_data;
+
+	if (!kvm_arch_gmem_supports_shared_mem(gmem->kvm))
+		return -ENODEV;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+#define kvm_gmem_mmap NULL
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba0327e2d0d3..38f0f402ea46 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4830,6 +4830,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	case KVM_CAP_GMEM_SHARED_MEM:
+		return !kvm || kvm_arch_gmem_supports_shared_mem(kvm);
 #endif
 	default:
 		break;
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 05/10] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (3 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 06/10] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory Fuad Tabba
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

For VMs that allow sharing guest_memfd backed memory in-place,
handle that memory the same as "private" guest_memfd memory. This
means that faulting that memory in the host or in the guest will
go through the guest_memfd subsystem.

Note that the word "private" in the name of the function
kvm_mem_is_private() doesn't necessarily indicate that the memory
isn't shared; the name is a relic of the history and evolution of
guest_memfd and the various names it has received. In effect,
this function multiplexes between the path of a normal page fault
and the path of a guest_memfd backed page fault.
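
In other words, the fault-path dispatch becomes the following
(simplified sketch; the arm64 faultin_pfn() added in patch 08 of
this series has this shape):

	if (kvm_mem_is_private(kvm, gfn)) {
		/* guest_memfd-backed: fault through the gmem subsystem. */
		ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, NULL);
	} else {
		/* Ordinary memory: fault through the host's VMAs. */
		pfn = __kvm_faultin_pfn(slot, gfn, foll_flags, &writable, &page);
	}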

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/linux/kvm_host.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 601bbcaa5e41..3d5595a71a2a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2521,7 +2521,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #else
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
-	return false;
+	return kvm_arch_gmem_supports_shared_mem(kvm) &&
+	       kvm_slot_can_be_private(gfn_to_memslot(kvm, gfn));
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 06/10] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (4 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 05/10] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 07/10] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

The KVM_X86_SW_PROTECTED_VM type is meant for experimentation and
does not have any underlying support for protected guests. This
makes it a good candidate for testing the mapping of shared
memory. Therefore, when the kconfig option is enabled, mark
KVM_X86_SW_PROTECTED_VM as supporting shared memory.

This means that this memory is considered by guest_memfd to be
shared with the host, with the possibility of in-place conversion
between shared and private. This allows the host to map and fault
in guest_memfd memory belonging to this VM type.
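
For instance, a VMM or selftest opts into this VM type at creation
time (illustrative snippet):

	/*
	 * Illustrative: create a software-protected x86 VM; with
	 * KVM_GMEM_SHARED_MEM enabled, its guest_memfd memory can be
	 * mmap()ed by the host.
	 */
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SW_PROTECTED_VM);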

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/include/asm/kvm_host.h | 5 +++++
 arch/x86/kvm/Kconfig            | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32ae3aa50c7e..b874e54a5ee4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2246,8 +2246,13 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
 #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+
+#define kvm_arch_gmem_supports_shared_mem(kvm)         \
+	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&      \
+	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
 #else
 #define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_gmem_supports_shared_mem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ea2c4f21c1ca..22d1bcdaad58 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -45,7 +45,8 @@ config KVM_X86
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_PRE_FAULT_MEMORY
-	select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+	select KVM_PRIVATE_MEM if KVM_SW_PROTECTED_VM
+	select KVM_GMEM_SHARED_MEM if KVM_SW_PROTECTED_VM
 	select KVM_WERROR if WERROR
 
 config KVM
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 07/10] KVM: arm64: Refactor user_mem_abort() calculation of force_pte
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (5 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 06/10] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

To simplify the code and to make the assumptions clearer,
refactor user_mem_abort() by immediately setting force_pte to
true if the conditions are met. Also, remove the comment about
logging_active being guaranteed to never be true for VM_PFNMAP
memslots, since it's not technically correct right now.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1f55b0c7b11d..887ffa1f5b14 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1460,7 +1460,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  bool fault_is_perm)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault, mte_allowed;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
@@ -1472,6 +1472,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
+	bool force_pte = logging_active || is_protected_kvm_enabled();
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
@@ -1521,16 +1522,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (logging_active || is_protected_kvm_enabled()) {
-		force_pte = true;
+	if (force_pte)
 		vma_shift = PAGE_SHIFT;
-	} else {
+	else
 		vma_shift = get_vma_page_shift(vma, hva);
-	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (6 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 07/10] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
  2025-03-12 17:58 ` [PATCH v6 10/10] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Add arm64 support for handling guest page faults on guest_memfd
backed memslots.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c     | 65 +++++++++++++++++++++++++++-------------
 include/linux/kvm_host.h |  5 ++++
 virt/kvm/kvm_main.c      |  5 ----
 3 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 887ffa1f5b14..adb0681fc1c6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1454,6 +1454,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn, bool write_fault, bool *writable,
+			     struct page **page, bool is_private)
+{
+	kvm_pfn_t pfn;
+	int ret;
+
+	if (!is_private)
+		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+
+	*writable = false;
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
+	if (!ret) {
+		*writable = !memslot_is_readonly(slot);
+		return pfn;
+	}
+
+	if (ret == -EHWPOISON)
+		return KVM_PFN_ERR_HWPOISON;
+
+	return KVM_PFN_ERR_NOSLOT_MASK;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1461,19 +1485,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable;
-	bool exec_fault, mte_allowed;
+	bool exec_fault, mte_allowed = false;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma = NULL;
 	short vma_shift;
 	void *memcache;
-	gfn_t gfn;
+	gfn_t gfn = ipa >> PAGE_SHIFT;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool force_pte = logging_active || is_protected_kvm_enabled();
-	long vma_pagesize, fault_granule;
+	bool is_gmem = kvm_mem_is_private(kvm, gfn);
+	bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
+	long vma_pagesize, fault_granule = PAGE_SIZE;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
@@ -1510,16 +1535,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			return ret;
 	}
 
+	mmap_read_lock(current->mm);
+
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, hva);
-	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		mmap_read_unlock(current->mm);
-		return -EFAULT;
+	if (!is_gmem) {
+		vma = vma_lookup(current->mm, hva);
+		if (unlikely(!vma)) {
+			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+			mmap_read_unlock(current->mm);
+			return -EFAULT;
+		}
+
+		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+		mte_allowed = kvm_vma_mte_allowed(vma);
 	}
 
 	if (force_pte)
@@ -1590,18 +1621,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ipa &= ~(vma_pagesize - 1);
 	}
 
-	gfn = ipa >> PAGE_SHIFT;
-	mte_allowed = kvm_vma_mte_allowed(vma);
-
-	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
-
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
 
 	/*
 	 * Read mmu_invalidate_seq so that KVM can detect if the results of
-	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
-	 * acquiring kvm->mmu_lock.
+	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
+	 * kvm->mmu_lock.
 	 *
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
@@ -1609,8 +1635,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
-	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
-				&writable, &page);
+	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3d5595a71a2a..ec3bedc18eab 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 	return gfn_to_memslot(kvm, gfn)->id;
 }
 
+static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
 static inline gfn_t
 hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 38f0f402ea46..3e40acb9f5c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2624,11 +2624,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 	return size;
 }
 
-static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
-{
-	return slot->flags & KVM_MEM_READONLY;
-}
-
 static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (7 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  2025-03-13 14:20   ` kernel test robot
  2025-03-12 17:58 ` [PATCH v6 10/10] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
  9 siblings, 1 reply; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Enable mapping guest_memfd on arm64. For now, this applies to all
arm64 VMs that use guest_memfd. In the future, new VM types
can restrict this via kvm_arch_gmem_supports_shared_mem().

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
 arch/arm64/kvm/Kconfig            |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d919557af5e5..b3b154b81d97 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1543,4 +1543,14 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
 #define kvm_has_s1poe(k)				\
 	(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
 
+static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
+{
+	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM);
+}
+
+static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
+{
+	return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
+}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index ead632ad01b4..4830d8805bed 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GMEM_SHARED_MEM
 	help
 	  Support hosting virtualized guest machines.
 
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* [PATCH v6 10/10] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
  2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
                   ` (8 preceding siblings ...)
  2025-03-12 17:58 ` [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-03-12 17:58 ` Fuad Tabba
  9 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-12 17:58 UTC (permalink / raw)
  To: kvm, linux-arm-msm, linux-mm
  Cc: pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	seanjc, viro, brauner, willy, akpm, xiaoyao.li, yilun.xu,
	chao.p.peng, jarkko, amoorthy, dmatlack, isaku.yamahata, mic,
	vbabka, vannapurve, ackerleytng, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx, tabba

Expand the guest_memfd selftests to include testing mapping guest
memory for VM types that support it.

Also, build the guest_memfd selftest for aarch64.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c  | 75 +++++++++++++++++--
 2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 4277b983cace..c9a3f30e28dd 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -160,6 +160,7 @@ TEST_GEN_PROGS_arm64 += coalesced_io_test
 TEST_GEN_PROGS_arm64 += demand_paging_test
 TEST_GEN_PROGS_arm64 += dirty_log_test
 TEST_GEN_PROGS_arm64 += dirty_log_perf_test
+TEST_GEN_PROGS_arm64 += guest_memfd_test
 TEST_GEN_PROGS_arm64 += guest_print_test
 TEST_GEN_PROGS_arm64 += get-reg-list
 TEST_GEN_PROGS_arm64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..38c501e49e0e 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,48 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t total_size)
 {
+	size_t page_size = getpagesize();
+	const char val = 0xaa;
+	char *mem;
+	int ret;
+	int i;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap() of guest memory should succeed.");
+
+	memset(mem, val, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	memset(mem, val, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], val);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t total_size)
+{
+	size_t page_size = getpagesize();
 	char *mem;
 
 	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
 }
 
 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -170,19 +206,27 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
-int main(int argc, char *argv[])
+unsigned long get_shared_type(void)
 {
-	size_t page_size;
+#ifdef __x86_64__
+	return KVM_X86_SW_PROTECTED_VM;
+#endif
+	return 0;
+}
+
+void test_vm_type(unsigned long type, bool is_shared)
+{
+	struct kvm_vm *vm;
 	size_t total_size;
+	size_t page_size;
 	int fd;
-	struct kvm_vm *vm;
 
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
 
 	page_size = getpagesize();
 	total_size = page_size * 4;
 
-	vm = vm_create_barebones();
+	vm = vm_create_barebones_type(type);
 
 	test_create_guest_memfd_invalid(vm);
 	test_create_guest_memfd_multiple(vm);
@@ -190,10 +234,29 @@ int main(int argc, char *argv[])
 	fd = vm_create_guest_memfd(vm, total_size, 0);
 
 	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+
+	if (is_shared)
+		test_mmap_allowed(fd, total_size);
+	else
+		test_mmap_denied(fd, total_size);
+
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
 	close(fd);
+	kvm_vm_release(vm);
+}
+
+int main(int argc, char *argv[])
+{
+#ifndef __aarch64__
+	/* For now, arm64 only supports shared guest memory. */
+	test_vm_type(VM_TYPE_DEFAULT, false);
+#endif
+
+	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM))
+		test_vm_type(get_shared_type(), true);
+
+	return 0;
 }
-- 
2.49.0.rc0.332.g42c0ae87b1-goog




* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-12 17:58 ` [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module Fuad Tabba
@ 2025-03-13 13:46   ` Ackerley Tng
  2025-03-13 13:49   ` Ackerley Tng
  1 sibling, 0 replies; 22+ messages in thread
From: Ackerley Tng @ 2025-03-13 13:46 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx, tabba

Fuad Tabba <tabba@google.com> writes:

> On some architectures, KVM can be built as a module. If a
> folio_put() is still pending when the KVM module is unloaded, the
> system could crash. By having a helper check for that and call the
> function only if it's available, we are able to handle that case
> more gracefully.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
>
> ---
>
> This patch could be squashed with the previous one if the maintainers
> think it would be better.
> ---
>  include/linux/kvm_host.h |  5 +----
>  mm/swap.c                | 20 +++++++++++++++++++-
>  virt/kvm/guest_memfd.c   |  8 ++++++++
>  3 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7788e3625f6d..3ad0719bfc4f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>  #endif
>  
>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> -{
> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> -}
> +void kvm_gmem_handle_folio_put(struct folio *folio);
>  #endif
>  
>  #endif
> diff --git a/mm/swap.c b/mm/swap.c
> index 241880a46358..27dfd75536c8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
>  		unlock_page_lruvec_irqrestore(lruvec, flags);
>  }
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +static void gmem_folio_put(struct folio *folio)
> +{
> +#if IS_MODULE(CONFIG_KVM)
> +	void (*fn)(struct folio *folio);
> +
> +	fn = symbol_get(kvm_gmem_handle_folio_put);
> +	if (WARN_ON_ONCE(!fn))
> +		return;
> +
> +	fn(folio);
> +	symbol_put(kvm_gmem_handle_folio_put);
> +#else
> +	kvm_gmem_handle_folio_put(folio);
> +#endif
> +}
> +#endif

I was thinking of having a static function pointer within mm/swap.c that
will be filled in and mm/swap.c 

> +
>  static void free_typed_folio(struct folio *folio)
>  {
>  	switch (folio_get_type(folio)) {
> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
>  #endif
>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>  	case PGTY_guestmem:
> -		kvm_gmem_handle_folio_put(folio);
> +		gmem_folio_put(folio);
>  		return;
>  #endif
>  	default:
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b2aa6bf24d3a..5fc414becae5 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -13,6 +13,14 @@ struct kvm_gmem {
>  	struct list_head entry;
>  };
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +void kvm_gmem_handle_folio_put(struct folio *folio)
> +{
> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> +}
> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
>  /**
>   * folio_file_pfn - like folio_file_page, but return a pfn.
>   * @folio: The folio which contains this index.



* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-12 17:58 ` [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module Fuad Tabba
  2025-03-13 13:46   ` Ackerley Tng
@ 2025-03-13 13:49   ` Ackerley Tng
  2025-03-13 13:57     ` Fuad Tabba
  2025-03-17 13:43     ` Vlastimil Babka
  1 sibling, 2 replies; 22+ messages in thread
From: Ackerley Tng @ 2025-03-13 13:49 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx, tabba

Fuad Tabba <tabba@google.com> writes:

> On some architectures, KVM can be built as a module. If a
> folio_put() is still pending when the KVM module is unloaded, the
> system could crash. By having a helper check for that and call the
> function only if it's available, we are able to handle that case
> more gracefully.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
>
> ---
>
> This patch could be squashed with the previous one if the maintainers
> think it would be better.
> ---
>  include/linux/kvm_host.h |  5 +----
>  mm/swap.c                | 20 +++++++++++++++++++-
>  virt/kvm/guest_memfd.c   |  8 ++++++++
>  3 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7788e3625f6d..3ad0719bfc4f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>  #endif
>  
>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> -{
> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> -}
> +void kvm_gmem_handle_folio_put(struct folio *folio);
>  #endif
>  
>  #endif
> diff --git a/mm/swap.c b/mm/swap.c
> index 241880a46358..27dfd75536c8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
>  		unlock_page_lruvec_irqrestore(lruvec, flags);
>  }
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +static void gmem_folio_put(struct folio *folio)
> +{
> +#if IS_MODULE(CONFIG_KVM)
> +	void (*fn)(struct folio *folio);
> +
> +	fn = symbol_get(kvm_gmem_handle_folio_put);
> +	if (WARN_ON_ONCE(!fn))
> +		return;
> +
> +	fn(folio);
> +	symbol_put(kvm_gmem_handle_folio_put);
> +#else
> +	kvm_gmem_handle_folio_put(folio);
> +#endif
> +}
> +#endif

Sorry about the premature sending earlier!

I was thinking about having a static function pointer in mm/swap.c that
will be filled in when KVM is loaded and cleared when KVM is unloaded.

One benefit I see is that it'll avoid the lookup that symbol_get() does
on every folio_put(), but some other pinning of KVM would have to be
done to prevent KVM from being unloaded in the middle of a
kvm_gmem_handle_folio_put() call.

Do you/anyone else see pros/cons either way?
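
For reference, a minimal sketch of that alternative; all names below
(kvm_gmem_register_folio_put_cb() and friends) are hypothetical and
not part of this series:

/* In mm/swap.c: a registration hook instead of a per-put symbol_get(). */
static void (*gmem_folio_put_cb)(struct folio *folio);

void kvm_gmem_register_folio_put_cb(void (*cb)(struct folio *))
{
	/* KVM would pass its callback at init, and NULL at module exit. */
	WRITE_ONCE(gmem_folio_put_cb, cb);
}

static void gmem_folio_put(struct folio *folio)
{
	void (*fn)(struct folio *folio) = READ_ONCE(gmem_folio_put_cb);

	/*
	 * As noted above, something must still prevent KVM from being
	 * unloaded between the READ_ONCE() and the callback returning.
	 */
	if (!WARN_ON_ONCE(!fn))
		fn(folio);
}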

> +
>  static void free_typed_folio(struct folio *folio)
>  {
>  	switch (folio_get_type(folio)) {
> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
>  #endif
>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>  	case PGTY_guestmem:
> -		kvm_gmem_handle_folio_put(folio);
> +		gmem_folio_put(folio);
>  		return;
>  #endif
>  	default:
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b2aa6bf24d3a..5fc414becae5 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -13,6 +13,14 @@ struct kvm_gmem {
>  	struct list_head entry;
>  };
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +void kvm_gmem_handle_folio_put(struct folio *folio)
> +{
> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> +}
> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> +
>  /**
>   * folio_file_pfn - like folio_file_page, but return a pfn.
>   * @folio: The folio which contains this index.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-13 13:49   ` Ackerley Tng
@ 2025-03-13 13:57     ` Fuad Tabba
  2025-03-17 13:43     ` Vlastimil Babka
  1 sibling, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-13 13:57 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx

Hi Ackerley,

On Thu, 13 Mar 2025 at 13:49, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > On some architectures, KVM can be built as a module. If there is a
> > pending folio_put() after KVM has been unloaded, the system could
> > crash. By having a helper check for that and call the function only
> > if it is available, we can handle that case more gracefully.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> >
> > ---
> >
> > This patch could be squashed with the previous one if the maintainers
> > think it would be better.
> > ---
> >  include/linux/kvm_host.h |  5 +----
> >  mm/swap.c                | 20 +++++++++++++++++++-
> >  virt/kvm/guest_memfd.c   |  8 ++++++++
> >  3 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 7788e3625f6d..3ad0719bfc4f 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >  #endif
> >
> >  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> > -{
> > -     WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> > -}
> > +void kvm_gmem_handle_folio_put(struct folio *folio);
> >  #endif
> >
> >  #endif
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 241880a46358..27dfd75536c8 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
> >               unlock_page_lruvec_irqrestore(lruvec, flags);
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +static void gmem_folio_put(struct folio *folio)
> > +{
> > +#if IS_MODULE(CONFIG_KVM)
> > +     void (*fn)(struct folio *folio);
> > +
> > +     fn = symbol_get(kvm_gmem_handle_folio_put);
> > +     if (WARN_ON_ONCE(!fn))
> > +             return;
> > +
> > +     fn(folio);
> > +     symbol_put(kvm_gmem_handle_folio_put);
> > +#else
> > +     kvm_gmem_handle_folio_put(folio);
> > +#endif
> > +}
> > +#endif
>
> Sorry about the premature sending earlier!
>
> I was thinking about having a static function pointer in mm/swap.c that
> will be filled in when KVM is loaded and cleared when KVM is unloaded.
>
> One benefit I see is that it'll avoid the lookup that symbol_get() does
> on every folio_put(), but some other way of pinning KVM would have to
> be used to prevent it from being unloaded in the middle of a
> kvm_gmem_handle_folio_put() call.
>
> Do you/anyone else see pros/cons either way?

To be honest, I didn't do it that way (or even think of doing it that
way); I simply followed how vfio does it, e.g.:

https://elixir.bootlin.com/linux/v6.14-rc6/source/virt/kvm/vfio.c#L38

Like you said, your suggestion would require something like
try_module_get()/module_put() to ensure that KVM is not unloaded. Is that
better? I don't know... Maybe someone with more experience could chime
in.
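
For illustration, the extra pinning would look roughly like this
(hypothetical shape; gmem_folio_put_cb and gmem_cb_owner are made-up
names, and the window between the READ_ONCE() and try_module_get()
would still need closing):

	static struct module *gmem_cb_owner; /* registered with the pointer */

	static void gmem_folio_put(struct folio *folio)
	{
		void (*fn)(struct folio *folio) = READ_ONCE(gmem_folio_put_cb);

		/* Pin KVM so it can't be unloaded mid-call. */
		if (!fn || !try_module_get(gmem_cb_owner))
			return;

		fn(folio);
		module_put(gmem_cb_owner);
	}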

Thanks!
/fuad


> > +
> >  static void free_typed_folio(struct folio *folio)
> >  {
> >       switch (folio_get_type(folio)) {
> > @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
> >  #endif
> >  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >       case PGTY_guestmem:
> > -             kvm_gmem_handle_folio_put(folio);
> > +             gmem_folio_put(folio);
> >               return;
> >  #endif
> >       default:
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index b2aa6bf24d3a..5fc414becae5 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -13,6 +13,14 @@ struct kvm_gmem {
> >       struct list_head entry;
> >  };
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +void kvm_gmem_handle_folio_put(struct folio *folio)
> > +{
> > +     WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> > +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> > +
> >  /**
> >   * folio_file_pfn - like folio_file_page, but return a pfn.
> >   * @folio: The folio which contains this index.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64
  2025-03-12 17:58 ` [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
@ 2025-03-13 14:20   ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-03-13 14:20 UTC (permalink / raw)
  To: Fuad Tabba, kvm, linux-arm-msm, linux-mm
  Cc: llvm, oe-kbuild-all, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, ackerleytng, mail,
	david, michael.roth, wei.w.wang

Hi Fuad,

kernel test robot noticed the following build errors:

[auto build test ERROR on 80e54e84911a923c40d7bee33a34c1b4be148d7a]

url:    https://github.com/intel-lab-lkp/linux/commits/Fuad-Tabba/mm-Consolidate-freeing-of-typed-folios-on-final-folio_put/20250313-020010
base:   80e54e84911a923c40d7bee33a34c1b4be148d7a
patch link:    https://lore.kernel.org/r/20250312175824.1809636-10-tabba%40google.com
patch subject: [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64
config: arm64-allnoconfig (https://download.01.org/0day-ci/archive/20250313/202503132205.Ajz52k8I-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250313/202503132205.Ajz52k8I-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503132205.Ajz52k8I-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from arch/arm64/kernel/asm-offsets.c:15:
>> include/linux/kvm_host.h:725:20: error: redefinition of 'kvm_arch_has_private_mem'
     725 | static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/kvm_host.h:45:
   arch/arm64/include/asm/kvm_host.h:1546:20: note: previous definition of 'kvm_arch_has_private_mem' with type 'bool(struct kvm *)' {aka '_Bool(struct kvm *)'}
    1546 | static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~
>> include/linux/kvm_host.h:736:20: error: redefinition of 'kvm_arch_gmem_supports_shared_mem'
     736 | static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_host.h:1551:20: note: previous definition of 'kvm_arch_gmem_supports_shared_mem' with type 'bool(struct kvm *)' {aka '_Bool(struct kvm *)'}
    1551 | static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   make[3]: *** [scripts/Makefile.build:102: arch/arm64/kernel/asm-offsets.s] Error 1
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1269: prepare0] Error 2
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:251: __sub-make] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:251: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +/kvm_arch_has_private_mem +725 include/linux/kvm_host.h

f481b069e674378 Paolo Bonzini       2015-05-17  719  
a7800aa80ea4d53 Sean Christopherson 2023-11-13  720  /*
a7800aa80ea4d53 Sean Christopherson 2023-11-13  721   * Arch code must define kvm_arch_has_private_mem if support for private memory
a7800aa80ea4d53 Sean Christopherson 2023-11-13  722   * is enabled.
a7800aa80ea4d53 Sean Christopherson 2023-11-13  723   */
a7800aa80ea4d53 Sean Christopherson 2023-11-13  724  #if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
a7800aa80ea4d53 Sean Christopherson 2023-11-13 @725  static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
a7800aa80ea4d53 Sean Christopherson 2023-11-13  726  {
a7800aa80ea4d53 Sean Christopherson 2023-11-13  727  	return false;
a7800aa80ea4d53 Sean Christopherson 2023-11-13  728  }
a7800aa80ea4d53 Sean Christopherson 2023-11-13  729  #endif
a7800aa80ea4d53 Sean Christopherson 2023-11-13  730  
a765e4ca28eb657 Fuad Tabba          2025-03-12  731  /*
a765e4ca28eb657 Fuad Tabba          2025-03-12  732   * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
a765e4ca28eb657 Fuad Tabba          2025-03-12  733   * private memory is enabled and it supports in-place shared/private conversion.
a765e4ca28eb657 Fuad Tabba          2025-03-12  734   */
a765e4ca28eb657 Fuad Tabba          2025-03-12  735  #if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
a765e4ca28eb657 Fuad Tabba          2025-03-12 @736  static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
a765e4ca28eb657 Fuad Tabba          2025-03-12  737  {
a765e4ca28eb657 Fuad Tabba          2025-03-12  738  	return false;
a765e4ca28eb657 Fuad Tabba          2025-03-12  739  }
a765e4ca28eb657 Fuad Tabba          2025-03-12  740  #endif
a765e4ca28eb657 Fuad Tabba          2025-03-12  741  
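
Note that the generic fallbacks above are compiled only when the arch
header has not defined a macro alias of the same name. The usual idiom
for an arch override, sketched below for one of the two helpers (this
shows the idiom, not necessarily the actual fix for this report):

	/* arch/arm64/include/asm/kvm_host.h: idiom sketch, body illustrative */
	static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
	{
		return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM);
	}
	#define kvm_arch_has_private_mem kvm_arch_has_private_mem

The same applies to kvm_arch_gmem_supports_shared_mem.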

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-03-12 17:58 ` [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
@ 2025-03-14 18:46   ` Ackerley Tng
  2025-03-17 10:42     ` Fuad Tabba
  0 siblings, 1 reply; 22+ messages in thread
From: Ackerley Tng @ 2025-03-14 18:46 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx, tabba

Fuad Tabba <tabba@google.com> writes:

> Add support for mmap() and fault() for guest_memfd backed memory
> in the host for VMs that support in-place conversion between
> shared and private. To that end, this patch adds the ability to
> check whether the VM type supports in-place conversion, and only
> allows mapping its memory if that's the case.
>
> Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> indicates that the VM supports shared memory in guest_memfd, or
> that the host can create VMs that support shared memory.
> Supporting shared memory implies that memory can be mapped when
> shared with the host.
>
> This is controlled by the KVM_GMEM_SHARED_MEM configuration
> option.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  include/linux/kvm_host.h |  11 +++++
>  include/uapi/linux/kvm.h |   1 +
>  virt/kvm/guest_memfd.c   | 102 +++++++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      |   4 ++
>  4 files changed, 118 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3ad0719bfc4f..601bbcaa5e41 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -728,6 +728,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
>  }
>  #endif
>  
> +/*
> + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> + * private memory is enabled and it supports in-place shared/private conversion.
> + */
> +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
> +static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifndef kvm_arch_has_readonly_mem
>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>  {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 45e6d8fca9b9..117937a895da 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -929,6 +929,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_PRE_FAULT_MEMORY 236
>  #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
>  #define KVM_CAP_X86_GUEST_MODE 238
> +#define KVM_CAP_GMEM_SHARED_MEM 239
>  
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 5fc414becae5..eea44e003ed1 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -320,7 +320,109 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
>  
> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> +static bool folio_offset_is_shared(const struct folio *folio, struct file *file, pgoff_t index)
> +{
> +	struct kvm_gmem *gmem = file->private_data;
> +
> +	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);

I should've commented on this in the last series, but why must the
folio lock be held to check if this offset is shared?

I was thinking of using the filemap's lock (filemap_invalidate_lock())
to guard against mappability races. Does that work too?

> +
> +	/* For now, VMs that support shared memory share all their memory. */
> +	return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
> +}
> +
> +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	filemap_invalidate_lock_shared(inode->i_mapping);
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			ret = VM_FAULT_RETRY;
> +		else
> +			ret = vmf_error(err);
> +
> +		goto out_filemap;
> +	}
> +
> +	if (folio_test_hwpoison(folio)) {
> +		ret = VM_FAULT_HWPOISON;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_offset_is_shared(folio, vmf->vma->vm_file, vmf->pgoff)) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	/*
> +	 * Shared folios would not be marked as "guestmem" so far, and we only
> +	 * expect shared folios at this point.
> +	 */
> +	if (WARN_ON_ONCE(folio_test_guestmem(folio)))  {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	/* No support for huge pages. */
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +out_filemap:
> +	filemap_invalidate_unlock_shared(inode->i_mapping);
> +
> +	return ret;
> +}
> +
> <snip>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages
  2025-03-14 18:46   ` Ackerley Tng
@ 2025-03-17 10:42     ` Fuad Tabba
  0 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-17 10:42 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vbabka, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx

Hi Ackerley,

On Fri, 14 Mar 2025 at 18:47, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > Add support for mmap() and fault() for guest_memfd backed memory
> > in the host for VMs that support in-place conversion between
> > shared and private. To that end, this patch adds the ability to
> > check whether the VM type supports in-place conversion, and only
> > allows mapping its memory if that's the case.
> >
> > Also add the KVM capability KVM_CAP_GMEM_SHARED_MEM, which
> > indicates that the VM supports shared memory in guest_memfd, or
> > that the host can create VMs that support shared memory.
> > Supporting shared memory implies that memory can be mapped when
> > shared with the host.
> >
> > This is controlled by the KVM_GMEM_SHARED_MEM configuration
> > option.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  include/linux/kvm_host.h |  11 +++++
> >  include/uapi/linux/kvm.h |   1 +
> >  virt/kvm/guest_memfd.c   | 102 +++++++++++++++++++++++++++++++++++++++
> >  virt/kvm/kvm_main.c      |   4 ++
> >  4 files changed, 118 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3ad0719bfc4f..601bbcaa5e41 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -728,6 +728,17 @@ static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> >  }
> >  #endif
> >
> > +/*
> > + * Arch code must define kvm_arch_gmem_supports_shared_mem if support for
> > + * private memory is enabled and it supports in-place shared/private conversion.
> > + */
> > +#if !defined(kvm_arch_gmem_supports_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)
> > +static inline bool kvm_arch_gmem_supports_shared_mem(struct kvm *kvm)
> > +{
> > +     return false;
> > +}
> > +#endif
> > +
> >  #ifndef kvm_arch_has_readonly_mem
> >  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> >  {
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 45e6d8fca9b9..117937a895da 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -929,6 +929,7 @@ struct kvm_enable_cap {
> >  #define KVM_CAP_PRE_FAULT_MEMORY 236
> >  #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
> >  #define KVM_CAP_X86_GUEST_MODE 238
> > +#define KVM_CAP_GMEM_SHARED_MEM 239
> >
> >  struct kvm_irq_routing_irqchip {
> >       __u32 irqchip;
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index 5fc414becae5..eea44e003ed1 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -320,7 +320,109 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
> >       return gfn - slot->base_gfn + slot->gmem.pgoff;
> >  }
> >
> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > +static bool folio_offset_is_shared(const struct folio *folio, struct file *file, pgoff_t index)
> > +{
> > +     struct kvm_gmem *gmem = file->private_data;
> > +
> > +     VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>
> I should've commented on this in the last series, but why must the
> folio lock be held to check if this offset is shared?
>
> I was thinking of using the filemap's lock (filemap_invalidate_lock())
> to guard against mappability races. Does that work too?

I was thinking the same thing while preparing the sharing-state patch
series for posting. I had (wrongly) assumed that it wasn't possible to
protect all the cases with the invalidate_lock, but it turns out they
already are.
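
A minimal sketch of what the check might become once it relies on the
invalidate_lock rather than the folio lock (illustrative only, not the
actual respin):

	static bool folio_offset_is_shared(struct file *file, pgoff_t index)
	{
		struct kvm_gmem *gmem = file->private_data;

		/* Mappability is guarded by the filemap invalidate lock,
		 * which kvm_gmem_fault() already holds (shared). */
		lockdep_assert_held(&file_inode(file)->i_mapping->invalidate_lock);

		/* For now, VMs that support shared memory share all their memory. */
		return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
	}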

I will fix this in the respin of both series. Thanks!


/fuad

> > +
> > +     /* For now, VMs that support shared memory share all their memory. */
> > +     return kvm_arch_gmem_supports_shared_mem(gmem->kvm);
> > +}
> > +
> > +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> > +{
> > +     struct inode *inode = file_inode(vmf->vma->vm_file);
> > +     struct folio *folio;
> > +     vm_fault_t ret = VM_FAULT_LOCKED;
> > +
> > +     filemap_invalidate_lock_shared(inode->i_mapping);
> > +
> > +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > +     if (IS_ERR(folio)) {
> > +             int err = PTR_ERR(folio);
> > +
> > +             if (err == -EAGAIN)
> > +                     ret = VM_FAULT_RETRY;
> > +             else
> > +                     ret = vmf_error(err);
> > +
> > +             goto out_filemap;
> > +     }
> > +
> > +     if (folio_test_hwpoison(folio)) {
> > +             ret = VM_FAULT_HWPOISON;
> > +             goto out_folio;
> > +     }
> > +
> > +     if (!folio_offset_is_shared(folio, vmf->vma->vm_file, vmf->pgoff)) {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }
> > +
> > +     /*
> > +      * Shared folios would not be marked as "guestmem" so far, and we only
> > +      * expect shared folios at this point.
> > +      */
> > +     if (WARN_ON_ONCE(folio_test_guestmem(folio)))  {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }
> > +
> > +     /* No support for huge pages. */
> > +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> > +             ret = VM_FAULT_SIGBUS;
> > +             goto out_folio;
> > +     }
> > +
> > +     if (!folio_test_uptodate(folio)) {
> > +             clear_highpage(folio_page(folio, 0));
> > +             kvm_gmem_mark_prepared(folio);
> > +     }
> > +
> > +     vmf->page = folio_file_page(folio, vmf->pgoff);
> > +
> > +out_folio:
> > +     if (ret != VM_FAULT_LOCKED) {
> > +             folio_unlock(folio);
> > +             folio_put(folio);
> > +     }
> > +
> > +out_filemap:
> > +     filemap_invalidate_unlock_shared(inode->i_mapping);
> > +
> > +     return ret;
> > +}
> > +
> > <snip>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-13 13:49   ` Ackerley Tng
  2025-03-13 13:57     ` Fuad Tabba
@ 2025-03-17 13:43     ` Vlastimil Babka
  2025-03-17 14:38       ` Sean Christopherson
  2025-03-17 16:27       ` Ackerley Tng
  1 sibling, 2 replies; 22+ messages in thread
From: Vlastimil Babka @ 2025-03-17 13:43 UTC (permalink / raw)
  To: Ackerley Tng, Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vannapurve, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx

On 3/13/25 14:49, Ackerley Tng wrote:
> Fuad Tabba <tabba@google.com> writes:
> 
>> On some architectures, KVM can be built as a module. If there is a
>> pending folio_put() after KVM has been unloaded, the system could
>> crash. By having a helper check for that and call the function only
>> if it is available, we can handle that case more gracefully.
>>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>
>> ---
>>
>> This patch could be squashed with the previous one if the maintainers
>> think it would be better.
>> ---
>>  include/linux/kvm_host.h |  5 +----
>>  mm/swap.c                | 20 +++++++++++++++++++-
>>  virt/kvm/guest_memfd.c   |  8 ++++++++
>>  3 files changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 7788e3625f6d..3ad0719bfc4f 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>>  #endif
>>  
>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
>> -{
>> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>> -}
>> +void kvm_gmem_handle_folio_put(struct folio *folio);
>>  #endif
>>  
>>  #endif
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 241880a46358..27dfd75536c8 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
>>  		unlock_page_lruvec_irqrestore(lruvec, flags);
>>  }
>>  
>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> +static void gmem_folio_put(struct folio *folio)
>> +{
>> +#if IS_MODULE(CONFIG_KVM)
>> +	void (*fn)(struct folio *folio);
>> +
>> +	fn = symbol_get(kvm_gmem_handle_folio_put);
>> +	if (WARN_ON_ONCE(!fn))
>> +		return;
>> +
>> +	fn(folio);
>> +	symbol_put(kvm_gmem_handle_folio_put);
>> +#else
>> +	kvm_gmem_handle_folio_put(folio);
>> +#endif
>> +}
>> +#endif

Yeah, this is not great. The vfio code isn't setting a good example to follow :(

> Sorry about the premature sending earlier!
> 
> I was thinking about having a static function pointer in mm/swap.c that
> will be filled in when KVM is loaded and cleared when KVM is unloaded.
> 
> One benefit I see is that it'll avoid the lookup that symbol_get() does
> on every folio_put(), but some other way of pinning KVM would have to
> be used to prevent it from being unloaded in the middle of a
> kvm_gmem_handle_folio_put() call.

Isn't there some "natural" dependency between things such that, by the
point the KVM module is able to unload itself, no guest_memfd areas
should exist anymore, and thus no pages that would use this callback
should exist either? In that case a late folio_put() would mean there's
a memory leak, so while we might still want to avoid calling a function
that was unloaded, we wouldn't need to try as hard as symbol_get()/put()
on every invocation; a racy check would be good enough?
Or would such a late folio_put() be legitimate, because some short-lived
folio_get() from e.g. a pfn scanner could prolong the page's lifetime
beyond the KVM module? I'd hope that since you want to make pages
PGTY_guestmem only at certain points of their lifetime, this should not
be possible.

> Do you/anyone else see pros/cons either way?
> 
>> +
>>  static void free_typed_folio(struct folio *folio)
>>  {
>>  	switch (folio_get_type(folio)) {
>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
>>  #endif
>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>  	case PGTY_guestmem:
>> -		kvm_gmem_handle_folio_put(folio);
>> +		gmem_folio_put(folio);
>>  		return;
>>  #endif
>>  	default:
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index b2aa6bf24d3a..5fc414becae5 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -13,6 +13,14 @@ struct kvm_gmem {
>>  	struct list_head entry;
>>  };
>>  
>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> +void kvm_gmem_handle_folio_put(struct folio *folio)
>> +{
>> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>> +
>>  /**
>>   * folio_file_pfn - like folio_file_page, but return a pfn.
>>   * @folio: The folio which contains this index.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-17 13:43     ` Vlastimil Babka
@ 2025-03-17 14:38       ` Sean Christopherson
  2025-03-17 15:04         ` Fuad Tabba
  2025-03-17 16:27       ` Ackerley Tng
  1 sibling, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-03-17 14:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Ackerley Tng, Fuad Tabba, kvm, linux-arm-msm, linux-mm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, viro, brauner,
	willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy,
	dmatlack, isaku.yamahata, mic, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx

On Mon, Mar 17, 2025, Vlastimil Babka wrote:
> On 3/13/25 14:49, Ackerley Tng wrote:
> >> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >> +static void gmem_folio_put(struct folio *folio)
> >> +{
> >> +#if IS_MODULE(CONFIG_KVM)
> >> +	void (*fn)(struct folio *folio);
> >> +
> >> +	fn = symbol_get(kvm_gmem_handle_folio_put);
> >> +	if (WARN_ON_ONCE(!fn))
> >> +		return;
> >> +
> >> +	fn(folio);
> >> +	symbol_put(kvm_gmem_handle_folio_put);
> >> +#else
> >> +	kvm_gmem_handle_folio_put(folio);
> >> +#endif
> >> +}
> >> +#endif
> 
> Yeah, this is not great. The vfio code isn't setting a good example to follow :(

+1000

I haven't been following guest_memfd development, so I've no idea what the context
of this patch is, but...

NAK to any approach that requires symbol_get().  Not only is it beyond gross,
it's also broken on x86 as it fails to pin the vendor module, i.e. kvm-amd.ko or
kvm-intel.ko.

> > Sorry about the premature sending earlier!
> > 
> > I was thinking about having a static function pointer in mm/swap.c that
> > will be filled in when KVM is loaded and cleared when KVM is unloaded.
> > 
> > One benefit I see is that it'll avoid the lookup that symbol_get() does
> > on every folio_put(), but some other way of pinning KVM would have to
> > be used to prevent it from being unloaded in the middle of a
> > kvm_gmem_handle_folio_put() call.
> 
> Isn't there some "natural" dependency between things such that at the point
> the KVM module is able to unload itself, no guest_memfd areas should be
> existing anymore at that point, and thus also not any pages that would use
> this callback should exist?

Yes.  File-backed VMAs hold a reference to the file (e.g. see get_file() usage
in vma.c), and keeping the guest_memfd file alive in turn prevents kvm.ko from
being unloaded.

The "magic" is this bit of code in kvm_gmem_init():

	kvm_gmem_fops.owner = module;

The fops->owner pointer is then processed by the try_get_module() call in
__anon_inode_getfile() to obtain a reference to the module which owns the fops.
The module reference won't be put until the file is fully closed/released; see
__fput() => fops_put().

On x86, that pins not only kvm.ko, but also the vendor module, because the
@module passed to kvm_gmem_init() points at the vendor module, not at kvm.ko.

If that's not working, y'all broke something :-)
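
Spelled out, the chain is roughly (a simplified sketch, not the literal
upstream code):

	/* virt/kvm/guest_memfd.c */
	void kvm_gmem_init(struct module *module)
	{
		/* On x86, @module is the vendor module, not kvm.ko. */
		kvm_gmem_fops.owner = module;
	}

	/*
	 * Creating a guest_memfd file then does, via __anon_inode_getfile():
	 *	try_get_module(kvm_gmem_fops.owner);
	 * and the reference is dropped only by __fput() -> fops_put() once
	 * the file is fully released.
	 */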


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-17 14:38       ` Sean Christopherson
@ 2025-03-17 15:04         ` Fuad Tabba
  0 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-17 15:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vlastimil Babka, Ackerley Tng, kvm, linux-arm-msm, linux-mm,
	pbonzini, chenhuacai, mpe, anup, paul.walmsley, palmer, aou,
	viro, brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng,
	jarkko, amoorthy, dmatlack, isaku.yamahata, mic, vannapurve,
	mail, david, michael.roth, wei.w.wang, liam.merwick,
	isaku.yamahata, kirill.shutemov, suzuki.poulose, steven.price,
	quic_eberman, quic_mnalajal, quic_tsoni, quic_svaddagi,
	quic_cvanscha, quic_pderrin, quic_pheragu, catalin.marinas,
	james.morse, yuzenghui, oliver.upton, maz, will, qperret, keirf,
	roypat, shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd,
	jthoughton, peterx

Hi Sean and Vlastimil,

On Mon, 17 Mar 2025 at 14:39, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 17, 2025, Vlastimil Babka wrote:
> > On 3/13/25 14:49, Ackerley Tng wrote:
> > >> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > >> +static void gmem_folio_put(struct folio *folio)
> > >> +{
> > >> +#if IS_MODULE(CONFIG_KVM)
> > >> +  void (*fn)(struct folio *folio);
> > >> +
> > >> +  fn = symbol_get(kvm_gmem_handle_folio_put);
> > >> +  if (WARN_ON_ONCE(!fn))
> > >> +          return;
> > >> +
> > >> +  fn(folio);
> > >> +  symbol_put(kvm_gmem_handle_folio_put);
> > >> +#else
> > >> +  kvm_gmem_handle_folio_put(folio);
> > >> +#endif
> > >> +}
> > >> +#endif
> >
> > Yeah, this is not great. The vfio code isn't setting a good example to follow :(
>
> +1000
>
> I haven't been following guest_memfd development, so I've no idea what the context
> of this patch is, but...
>
> NAK to any approach that requires symbol_get().  Not only is it beyond gross,
> it's also broken on x86 as it fails to pin the vendor module, i.e. kvm-amd.ko or
> kvm-intel.ko.
>
> > > Sorry about the premature sending earlier!
> > >
> > > I was thinking about having a static function pointer in mm/swap.c that
> > > will be filled in when KVM is loaded and cleared when KVM is unloaded.
> > >
> > > One benefit I see is that it'll avoid the lookup that symbol_get() does
> > > on every folio_put(), but some other pinning on KVM would have to be
> > > done to prevent KVM from being unloaded in the middle of
> > > kvm_gmem_handle_folio_put() call.
> >
> > Isn't there some "natural" dependency between things such that at the point
> > the KVM module is able to unload itself, no guest_memfd areas should be
> > existing anymore at that point, and thus also not any pages that would use
> > this callback should exist?
>
> Yes.  File-backed VMAs hold a reference to the file (e.g. see get_file() usage
> in vma.c), and keeping the guest_memfd file alive in turn prevents kvm.ko from
> being unloaded.
>
> The "magic" is this bit of code in kvm_gmem_init():
>
>         kvm_gmem_fops.owner = module;
>
> The fops->owner pointer is then processed by the try_get_module() call in
> __anon_inode_getfile() to obtain a reference to the module which owns the fops.
> The module reference won't be put until the file is fully closed/released; see
> __fput() => fops_put().
>
> On x86, that pins not only kvm.ko, but also the vendor module, because the
> @module passed to kvm_gmem_init() points at the vendor module, not at kvm.ko.
>
> If that's not working, y'all broke something :-)

Thank you for your feedback and for clarifying things. You're right:
with a reference to the module held, no one can unload it as long as
there are in-flight references, so there should be no stragglers.

Nothing is broken. I will fix this in the respin.

Cheers,
/fuad


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-17 13:43     ` Vlastimil Babka
  2025-03-17 14:38       ` Sean Christopherson
@ 2025-03-17 16:27       ` Ackerley Tng
  2025-03-17 16:50         ` Fuad Tabba
  1 sibling, 1 reply; 22+ messages in thread
From: Ackerley Tng @ 2025-03-17 16:27 UTC (permalink / raw)
  To: Vlastimil Babka, Fuad Tabba
  Cc: kvm, linux-arm-msm, linux-mm, pbonzini, chenhuacai, mpe, anup,
	paul.walmsley, palmer, aou, seanjc, viro, brauner, willy, akpm,
	xiaoyao.li, yilun.xu, chao.p.peng, jarkko, amoorthy, dmatlack,
	isaku.yamahata, mic, vannapurve, mail, david, michael.roth,
	wei.w.wang, liam.merwick, isaku.yamahata, kirill.shutemov,
	suzuki.poulose, steven.price, quic_eberman, quic_mnalajal,
	quic_tsoni, quic_svaddagi, quic_cvanscha, quic_pderrin,
	quic_pheragu, catalin.marinas, james.morse, yuzenghui,
	oliver.upton, maz, will, qperret, keirf, roypat, shuah, hch, jgg,
	rientjes, jhubbard, fvdl, hughd, jthoughton, peterx

Vlastimil Babka <vbabka@suse.cz> writes:

> On 3/13/25 14:49, Ackerley Tng wrote:
>> Fuad Tabba <tabba@google.com> writes:
>> 
>>> On some architectures, KVM can be built as a module. If there is a
>>> pending folio_put() after KVM has been unloaded, the system could
>>> crash. By having a helper check for that and call the function only
>>> if it is available, we can handle that case more gracefully.
>>>
>>> Signed-off-by: Fuad Tabba <tabba@google.com>
>>>
>>> ---
>>>
>>> This patch could be squashed with the previous one if the maintainers
>>> think it would be better.
>>> ---
>>>  include/linux/kvm_host.h |  5 +----
>>>  mm/swap.c                | 20 +++++++++++++++++++-
>>>  virt/kvm/guest_memfd.c   |  8 ++++++++
>>>  3 files changed, 28 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index 7788e3625f6d..3ad0719bfc4f 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>>>  #endif
>>>  
>>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
>>> -{
>>> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>>> -}
>>> +void kvm_gmem_handle_folio_put(struct folio *folio);
>>>  #endif
>>>  
>>>  #endif
>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index 241880a46358..27dfd75536c8 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
>>>  		unlock_page_lruvec_irqrestore(lruvec, flags);
>>>  }
>>>  
>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> +static void gmem_folio_put(struct folio *folio)
>>> +{
>>> +#if IS_MODULE(CONFIG_KVM)
>>> +	void (*fn)(struct folio *folio);
>>> +
>>> +	fn = symbol_get(kvm_gmem_handle_folio_put);
>>> +	if (WARN_ON_ONCE(!fn))
>>> +		return;
>>> +
>>> +	fn(folio);
>>> +	symbol_put(kvm_gmem_handle_folio_put);
>>> +#else
>>> +	kvm_gmem_handle_folio_put(folio);
>>> +#endif
>>> +}
>>> +#endif
>
> Yeah, this is not great. The vfio code isn't setting a good example to follow :(
>
>> Sorry about the premature sending earlier!
>> 
>> I was thinking about having a static function pointer in mm/swap.c that
>> will be filled in when KVM is loaded and cleared when KVM is unloaded.
>> 
>> One benefit I see is that it'll avoid the lookup that symbol_get() does
>> on every folio_put(), but some other way of pinning KVM would have to
>> be used to prevent it from being unloaded in the middle of a
>> kvm_gmem_handle_folio_put() call.
>
> Isn't there some "natural" dependency between things such that, by the
> point the KVM module is able to unload itself, no guest_memfd areas
> should exist anymore, and thus no pages that would use this callback
> should exist either? In that case a late folio_put() would mean there's
> a memory leak, so while we might still want to avoid calling a function
> that was unloaded, we wouldn't need to try as hard as symbol_get()/put()
> on every invocation; a racy check would be good enough?
> Or would such a late folio_put() be legitimate, because some short-lived
> folio_get() from e.g. a pfn scanner could prolong the page's lifetime
> beyond the KVM module? I'd hope that since you want to make pages
> PGTY_guestmem only at certain points of their lifetime, this should not
> be possible.
>

IIUC the last refcount on a guest_memfd folio may not be held by
guest_memfd if the folio has already been truncated from guest_memfd.
The inode could already be closed, and if the inode is closed then KVM
is free to be unloaded.

This means that someone could hold on to the last refcount, unload KVM,
and then drop the last refcount and have the folio_put() call a
non-existent callback.

If we first check that folio->mapping != NULL and then do
kvm_gmem_handle_folio_put(), then I think what you suggested would work,
since folio->mapping is only NULL when the folio has been disassociated
from the inode.

gmem_folio_put() should probably end with

if (folio_ref_count(folio) == 0)
	__folio_put(folio);

so that once kvm_gmem_handle_folio_put() is done with whatever it needs
to do (e.g. completing the conversion), gmem_folio_put() will free the
folio.
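
Putting both pieces together, gmem_folio_put() could look roughly like
this (a sketch of the suggestion, not a posted patch):

	static void gmem_folio_put(struct folio *folio)
	{
		/*
		 * Only call back into KVM for folios still associated with a
		 * guest_memfd inode; folio->mapping is NULL once the folio
		 * has been disassociated, and by then KVM may be unloaded.
		 */
		if (folio->mapping)
			kvm_gmem_handle_folio_put(folio);

		/*
		 * If the callback left the refcount at zero (i.e. it didn't
		 * take a new reference to finish work later), free the folio.
		 */
		if (folio_ref_count(folio) == 0)
			__folio_put(folio);
	}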

>> Do you/anyone else see pros/cons either way?
>> 
>>> +
>>>  static void free_typed_folio(struct folio *folio)
>>>  {
>>>  	switch (folio_get_type(folio)) {
>>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
>>>  #endif
>>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>>  	case PGTY_guestmem:
>>> -		kvm_gmem_handle_folio_put(folio);
>>> +		gmem_folio_put(folio);
>>>  		return;
>>>  #endif
>>>  	default:
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index b2aa6bf24d3a..5fc414becae5 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -13,6 +13,14 @@ struct kvm_gmem {
>>>  	struct list_head entry;
>>>  };
>>>  
>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> +void kvm_gmem_handle_folio_put(struct folio *folio)
>>> +{
>>> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>>> +}
>>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
>>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>>> +
>>>  /**
>>>   * folio_file_pfn - like folio_file_page, but return a pfn.
>>>   * @folio: The folio which contains this index.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module
  2025-03-17 16:27       ` Ackerley Tng
@ 2025-03-17 16:50         ` Fuad Tabba
  0 siblings, 0 replies; 22+ messages in thread
From: Fuad Tabba @ 2025-03-17 16:50 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Vlastimil Babka, kvm, linux-arm-msm, linux-mm, pbonzini,
	chenhuacai, mpe, anup, paul.walmsley, palmer, aou, seanjc, viro,
	brauner, willy, akpm, xiaoyao.li, yilun.xu, chao.p.peng, jarkko,
	amoorthy, dmatlack, isaku.yamahata, mic, vannapurve, mail, david,
	michael.roth, wei.w.wang, liam.merwick, isaku.yamahata,
	kirill.shutemov, suzuki.poulose, steven.price, quic_eberman,
	quic_mnalajal, quic_tsoni, quic_svaddagi, quic_cvanscha,
	quic_pderrin, quic_pheragu, catalin.marinas, james.morse,
	yuzenghui, oliver.upton, maz, will, qperret, keirf, roypat,
	shuah, hch, jgg, rientjes, jhubbard, fvdl, hughd, jthoughton,
	peterx

Hi Ackerley,

On Mon, 17 Mar 2025 at 16:27, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Vlastimil Babka <vbabka@suse.cz> writes:
>
> > On 3/13/25 14:49, Ackerley Tng wrote:
> >> Fuad Tabba <tabba@google.com> writes:
> >>
> >>> On some architectures, KVM can be built as a module. If there is a
> >>> pending folio_put() after KVM has been unloaded, the system could
> >>> crash. By having a helper check for that and call the function only
> >>> if it is available, we can handle that case more gracefully.
> >>>
> >>> Signed-off-by: Fuad Tabba <tabba@google.com>
> >>>
> >>> ---
> >>>
> >>> This patch could be squashed with the previous one if the maintainers
> >>> think it would be better.
> >>> ---
> >>>  include/linux/kvm_host.h |  5 +----
> >>>  mm/swap.c                | 20 +++++++++++++++++++-
> >>>  virt/kvm/guest_memfd.c   |  8 ++++++++
> >>>  3 files changed, 28 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index 7788e3625f6d..3ad0719bfc4f 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >>>  #endif
> >>>
> >>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> >>> -{
> >>> -   WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >>> -}
> >>> +void kvm_gmem_handle_folio_put(struct folio *folio);
> >>>  #endif
> >>>
> >>>  #endif
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 241880a46358..27dfd75536c8 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
> >>>             unlock_page_lruvec_irqrestore(lruvec, flags);
> >>>  }
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +static void gmem_folio_put(struct folio *folio)
> >>> +{
> >>> +#if IS_MODULE(CONFIG_KVM)
> >>> +   void (*fn)(struct folio *folio);
> >>> +
> >>> +   fn = symbol_get(kvm_gmem_handle_folio_put);
> >>> +   if (WARN_ON_ONCE(!fn))
> >>> +           return;
> >>> +
> >>> +   fn(folio);
> >>> +   symbol_put(kvm_gmem_handle_folio_put);
> >>> +#else
> >>> +   kvm_gmem_handle_folio_put(folio);
> >>> +#endif
> >>> +}
> >>> +#endif
> >
> > Yeah, this is not great. The vfio code isn't setting a good example to follow :(
> >
> >> Sorry about the premature sending earlier!
> >>
> >> I was thinking about having a static function pointer in mm/swap.c that
> >> will be filled in when KVM is loaded and cleared when KVM is unloaded.
> >>
> >> One benefit I see is that it'll avoid the lookup that symbol_get() does
> >> on every folio_put(), but some other way of pinning KVM would have to
> >> be used to prevent it from being unloaded in the middle of a
> >> kvm_gmem_handle_folio_put() call.
> >
> > Isn't there some "natural" dependency between things such that, by the
> > point the KVM module is able to unload itself, no guest_memfd areas
> > should exist anymore, and thus no pages that would use this callback
> > should exist either? In that case a late folio_put() would mean there's
> > a memory leak, so while we might still want to avoid calling a function
> > that was unloaded, we wouldn't need to try as hard as symbol_get()/put()
> > on every invocation; a racy check would be good enough?
> > Or would such a late folio_put() be legitimate, because some short-lived
> > folio_get() from e.g. a pfn scanner could prolong the page's lifetime
> > beyond the KVM module? I'd hope that since you want to make pages
> > PGTY_guestmem only at certain points of their lifetime, this should not
> > be possible.
> >
>
> IIUC the last refcount on a guest_memfd folio may not be held by
> guest_memfd if the folio has already been truncated from guest_memfd.
> The inode could already be closed, and if the inode is closed then KVM
> is free to be unloaded.
>
> This means that someone could hold on to the last refcount, unload KVM,
> and then drop the last refcount and have the folio_put() call a
> non-existent callback.
>
> If we first check that folio->mapping != NULL and then do
> kvm_gmem_handle_folio_put(), then I think what you suggested would work,
> since folio->mapping is only NULL when the folio has been disassociated
> from the inode.
>
> gmem_folio_put() should probably end with
>
> if (folio_ref_count(folio) == 0)
>         __folio_put(folio);
>
> so that once kvm_gmem_handle_folio_put() is done with whatever it
> needs to do (e.g. completing the conversion), gmem_folio_put() will
> free the folio.

Right, this is important. Into the next respin.

Thanks,
/fuad

> >> Do you/anyone else see pros/cons either way?
> >>
> >>> +
> >>>  static void free_typed_folio(struct folio *folio)
> >>>  {
> >>>     switch (folio_get_type(folio)) {
> >>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
> >>>  #endif
> >>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>>     case PGTY_guestmem:
> >>> -           kvm_gmem_handle_folio_put(folio);
> >>> +           gmem_folio_put(folio);
> >>>             return;
> >>>  #endif
> >>>     default:
> >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >>> index b2aa6bf24d3a..5fc414becae5 100644
> >>> --- a/virt/kvm/guest_memfd.c
> >>> +++ b/virt/kvm/guest_memfd.c
> >>> @@ -13,6 +13,14 @@ struct kvm_gmem {
> >>>     struct list_head entry;
> >>>  };
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +void kvm_gmem_handle_folio_put(struct folio *folio)
> >>> +{
> >>> +   WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> >>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> >>> +
> >>>  /**
> >>>   * folio_file_pfn - like folio_file_page, but return a pfn.
> >>>   * @folio: The folio which contains this index.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-03-17 16:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-12 17:58 [PATCH v6 00/10] KVM: Mapping guest_memfd backed memory at the host for software protected VMs Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 01/10] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 02/10] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module Fuad Tabba
2025-03-13 13:46   ` Ackerley Tng
2025-03-13 13:49   ` Ackerley Tng
2025-03-13 13:57     ` Fuad Tabba
2025-03-17 13:43     ` Vlastimil Babka
2025-03-17 14:38       ` Sean Christopherson
2025-03-17 15:04         ` Fuad Tabba
2025-03-17 16:27       ` Ackerley Tng
2025-03-17 16:50         ` Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 04/10] KVM: guest_memfd: Allow host to map guest_memfd() pages Fuad Tabba
2025-03-14 18:46   ` Ackerley Tng
2025-03-17 10:42     ` Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 05/10] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 06/10] KVM: x86: Mark KVM_X86_SW_PROTECTED_VM as supporting guest_memfd shared memory Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 07/10] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 08/10] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2025-03-12 17:58 ` [PATCH v6 09/10] KVM: arm64: Enable mapping guest_memfd in arm64 Fuad Tabba
2025-03-13 14:20   ` kernel test robot
2025-03-12 17:58 ` [PATCH v6 10/10] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox