linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor
@ 2025-04-02 16:07 Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs Nikita Kalyazin
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-02 16:07 UTC (permalink / raw)
  To: akpm, pbonzini, shuah
  Cc: kvm, linux-kselftest, linux-kernel, linux-mm, lorenzo.stoakes,
	david, ryan.roberts, quic_eberman, jthoughton, peterx, graf,
	jgowans, roypat, derekmn, nsaenz, xmarcalx, kalyazin

This series is built on top of Fuad's v7 "mapping guest_memfd backed
memory at the host" [1].

With James's KVM userfault [2], it is possible to handle stage-2 faults
in guest_memfd in userspace.  However, KVM itself also triggers faults
in guest_memfd in some cases, for example: PV interfaces like kvmclock,
PV EOI and page table walking code when fetching the MMIO instruction on
x86.  It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3]
that KVM would be accessing those pages via userspace page tables.  In
order for such faults to be handled in userspace, guest_memfd needs to
support userfaultfd.

Changes since v1 [4]:
 - James, Peter: implement a full minor trap instead of a hybrid
   missing/minor trap
 - James, Peter: to avoid shmem- and guest_memfd-specific code in the
   UFFDIO_CONTINUE implementation make it generic by calling
vm_ops->fault()

While generalising UFFDIO_CONTINUE implementation helped avoid
guest_memfd-specific code in mm/userfaulfd, userfaultfd still needs
access to KVM code to be able to verify the VMA type when handling
UFFDIO_REGISTER_MODE_MINOR, so I used a similar approach to what Fuad
did for now [5].

In v1, Peter was mentioning a potential for eliminating taking a folio
lock [6].  I did not implement that, but according to my testing, the
performance of shmem minor fault handling stayed the same after the
migration to calling vm_ops->fault() (tested on an x86).

Before:

./demand_paging_test -u MINOR -s shmem
Random seed: 0x6b8b4567
Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
guest physical test memory: [0x3fffbffff000, 0x3ffffffff000)
Finished creating vCPUs and starting uffd threads
Started all vCPUs
All vCPU threads joined
Total guest execution time:	10.979277020s
Per-vcpu demand paging rate:	23876.253375 pgs/sec/vcpu
Overall demand paging rate:	23876.253375 pgs/sec

After:

./demand_paging_test -u MINOR -s shmem
Random seed: 0x6b8b4567
Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
guest physical test memory: [0x3fffbffff000, 0x3ffffffff000)
Finished creating vCPUs and starting uffd threads
Started all vCPUs
All vCPU threads joined
Total guest execution time:	10.978893504s
Per-vcpu demand paging rate:	23877.087423 pgs/sec/vcpu
Overall demand paging rate:	23877.087423 pgs/sec

Nikita

[1] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/T/
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?tab=t.0#heading=h.w1126rgli5e3
[4] https://lore.kernel.org/kvm/20250303133011.44095-1-kalyazin@amazon.com/T/
[5] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/#Z2e.:..:20250318161823.4005529-3-tabba::40google.com:1mm:swap.c
[6] https://lore.kernel.org/kvm/20250303133011.44095-1-kalyazin@amazon.com/T/#m8695dc24d2cc633a6a486a8990e3f7d50d4efb79

Nikita Kalyazin (5):
  mm: userfaultfd: generic continue for non hugetlbfs
  KVM: guest_memfd: add kvm_gmem_vma_is_gmem
  mm: userfaultfd: allow to register continue for guest_memfd
  KVM: guest_memfd: add support for userfaultfd minor
  KVM: selftests: test userfaultfd minor for guest_memfd

 include/linux/mm_types.h                      |  3 +
 include/linux/userfaultfd_k.h                 | 13 ++-
 mm/hugetlb.c                                  |  2 +-
 mm/shmem.c                                    |  3 +-
 mm/userfaultfd.c                              | 25 +++--
 .../testing/selftests/kvm/guest_memfd_test.c  | 94 +++++++++++++++++++
 virt/kvm/guest_memfd.c                        | 15 +++
 virt/kvm/kvm_mm.h                             |  1 +
 8 files changed, 146 insertions(+), 10 deletions(-)


base-commit: 3cc51efc17a2c41a480eed36b31c1773936717e0
-- 
2.47.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs
  2025-04-02 16:07 [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor Nikita Kalyazin
@ 2025-04-02 16:07 ` Nikita Kalyazin
  2025-04-02 19:04   ` James Houghton
  2025-04-02 16:07 ` [PATCH v2 2/5] KVM: guest_memfd: add kvm_gmem_vma_is_gmem Nikita Kalyazin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-02 16:07 UTC (permalink / raw)
  To: akpm, pbonzini, shuah
  Cc: kvm, linux-kselftest, linux-kernel, linux-mm, lorenzo.stoakes,
	david, ryan.roberts, quic_eberman, jthoughton, peterx, graf,
	jgowans, roypat, derekmn, nsaenz, xmarcalx, kalyazin

Remove shmem-specific code from UFFDIO_CONTINUE implementation for
non-huge pages by calling vm_ops->fault().  A new VMF flag,
FAULT_FLAG_NO_USERFAULT_MINOR, is introduced to avoid recursive call to
handle_userfault().

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 include/linux/mm_types.h |  3 +++
 mm/hugetlb.c             |  2 +-
 mm/shmem.c               |  3 ++-
 mm/userfaultfd.c         | 25 ++++++++++++++++++-------
 4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0234f14f2aa6..91a00f2cd565 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1429,6 +1429,8 @@ enum tlb_flush_reason {
  * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
  *                        We should only access orig_pte if this flag set.
  * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock.
+ * @FAULT_FLAG_NO_USERFAULT_MINOR: The fault handler must not call userfaultfd
+ *                                 minor handler.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -1467,6 +1469,7 @@ enum fault_flag {
 	FAULT_FLAG_UNSHARE =		1 << 10,
 	FAULT_FLAG_ORIG_PTE_VALID =	1 << 11,
 	FAULT_FLAG_VMA_LOCK =		1 << 12,
+	FAULT_FLAG_NO_USERFAULT_MINOR = 1 << 13,
 };
 
 typedef unsigned int __bitwise zap_flags_t;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 97930d44d460..ba90d48144fc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 		}
 
 		/* Check for page in userfault range. */
-		if (userfaultfd_minor(vma)) {
+		if (userfaultfd_minor(vma) && !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
 			folio_unlock(folio);
 			folio_put(folio);
 			/* See comment in userfaultfd_missing() block above */
diff --git a/mm/shmem.c b/mm/shmem.c
index 1ede0800e846..5e1911e39dec 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2467,7 +2467,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	fault_mm = vma ? vma->vm_mm : NULL;
 
 	folio = filemap_get_entry(inode->i_mapping, index);
-	if (folio && vma && userfaultfd_minor(vma)) {
+	if (folio && vma && userfaultfd_minor(vma) &&
+	    !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
 		if (!xa_is_value(folio))
 			folio_put(folio);
 		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d06453fa8aba..68a995216789 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -386,24 +386,35 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
 				     unsigned long dst_addr,
 				     uffd_flags_t flags)
 {
-	struct inode *inode = file_inode(dst_vma->vm_file);
-	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
 	struct folio *folio;
 	struct page *page;
 	int ret;
+	struct vm_fault vmf = {
+		.vma = dst_vma,
+		.address = dst_addr,
+		.flags = FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE |
+		    FAULT_FLAG_NO_USERFAULT_MINOR,
+		.pte = NULL,
+		.page = NULL,
+		.pgoff = linear_page_index(dst_vma, dst_addr),
+	};
+
+	if (!dst_vma->vm_ops || !dst_vma->vm_ops->fault)
+		return -EINVAL;
 
-	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
-	/* Our caller expects us to return -EFAULT if we failed to find folio */
-	if (ret == -ENOENT)
+	ret = dst_vma->vm_ops->fault(&vmf);
+	if (ret & VM_FAULT_ERROR) {
 		ret = -EFAULT;
-	if (ret)
 		goto out;
+	}
+
+	page = vmf.page;
+	folio = page_folio(page);
 	if (!folio) {
 		ret = -EFAULT;
 		goto out;
 	}
 
-	page = folio_file_page(folio, pgoff);
 	if (PageHWPoison(page)) {
 		ret = -EIO;
 		goto out_release;
-- 
2.47.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 2/5] KVM: guest_memfd: add kvm_gmem_vma_is_gmem
  2025-04-02 16:07 [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs Nikita Kalyazin
@ 2025-04-02 16:07 ` Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd Nikita Kalyazin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-02 16:07 UTC (permalink / raw)
  To: akpm, pbonzini, shuah
  Cc: kvm, linux-kselftest, linux-kernel, linux-mm, lorenzo.stoakes,
	david, ryan.roberts, quic_eberman, jthoughton, peterx, graf,
	jgowans, roypat, derekmn, nsaenz, xmarcalx, kalyazin

It will be used to distinguish the vma type in userfaultfd code.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 virt/kvm/guest_memfd.c | 5 +++++
 virt/kvm/kvm_mm.h      | 1 +
 2 files changed, 6 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fbf89e643add..26b1734b9623 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -416,6 +416,11 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
 
 	return 0;
 }
+
+bool kvm_gmem_vma_is_gmem(struct vm_area_struct *vma)
+{
+	return vma->vm_ops == &kvm_gmem_vm_ops;
+}
 #else
 #define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..09fcdf18a072 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -73,6 +73,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 		  unsigned int fd, loff_t offset);
 void kvm_gmem_unbind(struct kvm_memory_slot *slot);
+bool kvm_gmem_vma_is_gmem(struct vm_area_struct *vma);
 #else
 static inline void kvm_gmem_init(struct module *module)
 {
-- 
2.47.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd
  2025-04-02 16:07 [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 2/5] KVM: guest_memfd: add kvm_gmem_vma_is_gmem Nikita Kalyazin
@ 2025-04-02 16:07 ` Nikita Kalyazin
  2025-04-02 21:25   ` James Houghton
  2025-04-02 16:07 ` [PATCH v2 4/5] KVM: guest_memfd: add support for userfaultfd minor Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Nikita Kalyazin
  4 siblings, 1 reply; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-02 16:07 UTC (permalink / raw)
  To: akpm, pbonzini, shuah
  Cc: kvm, linux-kselftest, linux-kernel, linux-mm, lorenzo.stoakes,
	david, ryan.roberts, quic_eberman, jthoughton, peterx, graf,
	jgowans, roypat, derekmn, nsaenz, xmarcalx, kalyazin

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 include/linux/userfaultfd_k.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 75342022d144..bc184edfbb85 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -212,6 +212,10 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+bool kvm_gmem_vma_is_gmem(struct vm_area_struct *vma);
+#endif
+
 static inline bool vma_can_userfault(struct vm_area_struct *vma,
 				     unsigned long vm_flags,
 				     bool wp_async)
@@ -222,7 +226,11 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 		return false;
 
 	if ((vm_flags & VM_UFFD_MINOR) &&
-	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
+	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma))
+#ifdef CONFIG_KVM_PRIVATE_MEM
+	     && !kvm_gmem_vma_is_gmem(vma)
+#endif
+	    )
 		return false;
 
 	/*
@@ -244,6 +252,9 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 
 	/* By default, allow any of anon|shmem|hugetlb */
 	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+#ifdef CONFIG_KVM_PRIVATE_MEM
+	    kvm_gmem_vma_is_gmem(vma) ||
+#endif
 	    vma_is_shmem(vma);
 }
 
-- 
2.47.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 4/5] KVM: guest_memfd: add support for userfaultfd minor
  2025-04-02 16:07 [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor Nikita Kalyazin
                   ` (2 preceding siblings ...)
  2025-04-02 16:07 ` [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd Nikita Kalyazin
@ 2025-04-02 16:07 ` Nikita Kalyazin
  2025-04-02 16:07 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Nikita Kalyazin
  4 siblings, 0 replies; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-02 16:07 UTC (permalink / raw)
  To: akpm, pbonzini, shuah
  Cc: kvm, linux-kselftest, linux-kernel, linux-mm, lorenzo.stoakes,
	david, ryan.roberts, quic_eberman, jthoughton, peterx, graf,
	jgowans, roypat, derekmn, nsaenz, xmarcalx, kalyazin

Add support for sending a pagefault event if userfaultfd is registered.
Only page minor event is currently supported.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 virt/kvm/guest_memfd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 26b1734b9623..92d3e6b51dc2 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,9 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#ifdef CONFIG_KVM_PRIVATE_MEM
+#include <linux/userfaultfd_k.h>
+#endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #include "kvm_mm.h"
 
@@ -380,6 +383,13 @@ static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
 		kvm_gmem_mark_prepared(folio);
 	}
 
+	if (userfaultfd_minor(vmf->vma) &&
+	    !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
+		folio_unlock(folio);
+		filemap_invalidate_unlock_shared(inode->i_mapping);
+		return handle_userfault(vmf, VM_UFFD_MINOR);
+	}
+
 	vmf->page = folio_file_page(folio, vmf->pgoff);
 
 out_folio:
-- 
2.47.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-04-02 16:07 [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor Nikita Kalyazin
                   ` (3 preceding siblings ...)
  2025-04-02 16:07 ` [PATCH v2 4/5] KVM: guest_memfd: add support for userfaultfd minor Nikita Kalyazin
@ 2025-04-02 16:07 ` Nikita Kalyazin
  2025-04-02 21:10   ` James Houghton
  4 siblings, 1 reply; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-02 16:07 UTC (permalink / raw)
  To: akpm, pbonzini, shuah
  Cc: kvm, linux-kselftest, linux-kernel, linux-mm, lorenzo.stoakes,
	david, ryan.roberts, quic_eberman, jthoughton, peterx, graf,
	jgowans, roypat, derekmn, nsaenz, xmarcalx, kalyazin

The test demonstrates that a minor userfaultfd event in guest_memfd can
be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 94 +++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 38c501e49e0e..9b47b796f3aa 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,12 +10,16 @@
 #include <errno.h>
 #include <stdio.h>
 #include <fcntl.h>
+#include <pthread.h>
 
 #include <linux/bitmap.h>
 #include <linux/falloc.h>
+#include <linux/userfaultfd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
 
 #include "kvm_util.h"
 #include "test_util.h"
@@ -206,6 +210,93 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
+struct fault_args {
+	char *addr;
+	volatile char value;
+};
+
+static void *fault_thread_fn(void *arg)
+{
+	struct fault_args *args = arg;
+
+	/* Trigger page fault */
+	args->value = *args->addr;
+	return NULL;
+}
+
+static void test_uffd_missing(int fd, size_t page_size, size_t total_size)
+{
+	struct uffdio_register uffd_reg;
+	struct uffdio_continue uffd_cont;
+	struct uffd_msg msg;
+	struct fault_args args;
+	pthread_t fault_thread;
+	void *mem, *mem_nofault, *buf = NULL;
+	int uffd, ret;
+	off_t offset = page_size;
+	void *fault_addr;
+
+	ret = posix_memalign(&buf, page_size, total_size);
+	TEST_ASSERT_EQ(ret, 0);
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
+	TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
+
+	struct uffdio_api uffdio_api = {
+		.api = UFFD_API,
+		.features = UFFD_FEATURE_MISSING_SHMEM,
+	};
+	ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
+
+	mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
+
+	uffd_reg.range.start = (unsigned long)mem;
+	uffd_reg.range.len = total_size;
+	uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
+	ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			offset, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	fault_addr = mem + offset;
+	args.addr = fault_addr;
+
+	ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
+	TEST_ASSERT(ret == 0, "pthread_create should succeed");
+
+	ret = read(uffd, &msg, sizeof(msg));
+	TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
+	TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
+	TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
+		    "pagefault should occur at expected address");
+
+	memcpy(mem_nofault + offset, buf + offset, page_size);
+
+	uffd_cont.range.start = (unsigned long)fault_addr;
+	uffd_cont.range.len = page_size;
+	uffd_cont.mode = 0;
+	ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
+
+	ret = pthread_join(fault_thread, NULL);
+	TEST_ASSERT(ret == 0, "pthread_join should succeed");
+
+	ret = munmap(mem_nofault, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+	free(buf);
+	close(uffd);
+}
+
 unsigned long get_shared_type(void)
 {
 #ifdef __x86_64__
@@ -244,6 +335,9 @@ void test_vm_type(unsigned long type, bool is_shared)
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
+	if (is_shared)
+		test_uffd_missing(fd, page_size, total_size);
+
 	close(fd);
 	kvm_vm_release(vm);
 }
-- 
2.47.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs
  2025-04-02 16:07 ` [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs Nikita Kalyazin
@ 2025-04-02 19:04   ` James Houghton
  2025-04-03 17:01     ` Nikita Kalyazin
  0 siblings, 1 reply; 16+ messages in thread
From: James Houghton @ 2025-04-02 19:04 UTC (permalink / raw)
  To: Nikita Kalyazin
  Cc: akpm, pbonzini, shuah, kvm, linux-kselftest, linux-kernel,
	linux-mm, lorenzo.stoakes, david, ryan.roberts, quic_eberman,
	peterx, graf, jgowans, roypat, derekmn, nsaenz, xmarcalx

On Wed, Apr 2, 2025 at 9:07 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>
> Remove shmem-specific code from UFFDIO_CONTINUE implementation for
> non-huge pages by calling vm_ops->fault().  A new VMF flag,
> FAULT_FLAG_NO_USERFAULT_MINOR, is introduced to avoid recursive call to
> handle_userfault().
>
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> ---
>  include/linux/mm_types.h |  3 +++
>  mm/hugetlb.c             |  2 +-
>  mm/shmem.c               |  3 ++-
>  mm/userfaultfd.c         | 25 ++++++++++++++++++-------
>  4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 0234f14f2aa6..91a00f2cd565 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1429,6 +1429,8 @@ enum tlb_flush_reason {
>   * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
>   *                        We should only access orig_pte if this flag set.
>   * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock.
> + * @FAULT_FLAG_NO_USERFAULT_MINOR: The fault handler must not call userfaultfd
> + *                                 minor handler.

Perhaps instead a flag that says to avoid the userfaultfd minor fault
handler, maybe we should have a flag to indicate that vm_ops->fault()
has been called by UFFDIO_CONTINUE. See below.

>   *
>   * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
>   * whether we would allow page faults to retry by specifying these two
> @@ -1467,6 +1469,7 @@ enum fault_flag {
>         FAULT_FLAG_UNSHARE =            1 << 10,
>         FAULT_FLAG_ORIG_PTE_VALID =     1 << 11,
>         FAULT_FLAG_VMA_LOCK =           1 << 12,
> +       FAULT_FLAG_NO_USERFAULT_MINOR = 1 << 13,
>  };
>
>  typedef unsigned int __bitwise zap_flags_t;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 97930d44d460..ba90d48144fc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>                 }
>
>                 /* Check for page in userfault range. */
> -               if (userfaultfd_minor(vma)) {
> +               if (userfaultfd_minor(vma) && !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
>                         folio_unlock(folio);
>                         folio_put(folio);
>                         /* See comment in userfaultfd_missing() block above */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1ede0800e846..5e1911e39dec 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2467,7 +2467,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>         fault_mm = vma ? vma->vm_mm : NULL;
>
>         folio = filemap_get_entry(inode->i_mapping, index);
> -       if (folio && vma && userfaultfd_minor(vma)) {
> +       if (folio && vma && userfaultfd_minor(vma) &&
> +           !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
>                 if (!xa_is_value(folio))
>                         folio_put(folio);
>                 *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index d06453fa8aba..68a995216789 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -386,24 +386,35 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
>                                      unsigned long dst_addr,
>                                      uffd_flags_t flags)
>  {
> -       struct inode *inode = file_inode(dst_vma->vm_file);
> -       pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
>         struct folio *folio;
>         struct page *page;
>         int ret;
> +       struct vm_fault vmf = {
> +               .vma = dst_vma,
> +               .address = dst_addr,
> +               .flags = FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE |
> +                   FAULT_FLAG_NO_USERFAULT_MINOR,
> +               .pte = NULL,
> +               .page = NULL,
> +               .pgoff = linear_page_index(dst_vma, dst_addr),
> +       };
> +
> +       if (!dst_vma->vm_ops || !dst_vma->vm_ops->fault)
> +               return -EINVAL;
>
> -       ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
> -       /* Our caller expects us to return -EFAULT if we failed to find folio */
> -       if (ret == -ENOENT)
> +       ret = dst_vma->vm_ops->fault(&vmf);

shmem_get_folio() was being called with SGP_NOALLOC, and now it is
being called with SGP_CACHE (by shmem_fault()). This will result in a
UAPI change: UFFDIO_CONTINUE for a VA without a page in the page cache
should result in EFAULT, but now the page will be allocated.
SGP_NOALLOC was carefully chosen[1], so I think a better way to do
this will be to:

1. Have a FAULT_FLAG_USERFAULT_CONTINUE (or something)
2. In shmem_fault(), if FAULT_FLAG_USERFAULT_CONTINUE, use SGP_NOALLOC
instead of SGP_CACHE (and make sure not to drop into
handle_userfault(), of course)

[1]: https://lore.kernel.org/linux-mm/20220610173812.1768919-1-axelrasmussen@google.com/

> +       if (ret & VM_FAULT_ERROR) {
>                 ret = -EFAULT;
> -       if (ret)
>                 goto out;
> +       }
> +
> +       page = vmf.page;
> +       folio = page_folio(page);
>         if (!folio) {

What if ret == VM_FAULT_RETRY? I think we should retry instead instead
of returning -EFAULT.

And I'm not sure how VM_FAULT_NOPAGE should be handled, like if we
need special logic for it or not.

>                 ret = -EFAULT;
>                 goto out;
>         }
>
> -       page = folio_file_page(folio, pgoff);
>         if (PageHWPoison(page)) {
>                 ret = -EIO;
>                 goto out_release;
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-04-02 16:07 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Nikita Kalyazin
@ 2025-04-02 21:10   ` James Houghton
  2025-04-03 17:02     ` Nikita Kalyazin
  0 siblings, 1 reply; 16+ messages in thread
From: James Houghton @ 2025-04-02 21:10 UTC (permalink / raw)
  To: Nikita Kalyazin
  Cc: akpm, pbonzini, shuah, kvm, linux-kselftest, linux-kernel,
	linux-mm, lorenzo.stoakes, david, ryan.roberts, quic_eberman,
	peterx, graf, jgowans, roypat, derekmn, nsaenz, xmarcalx

On Wed, Apr 2, 2025 at 9:08 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>
> The test demonstrates that a minor userfaultfd event in guest_memfd can
> be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.
>
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 94 +++++++++++++++++++
>  1 file changed, 94 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 38c501e49e0e..9b47b796f3aa 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -10,12 +10,16 @@
>  #include <errno.h>
>  #include <stdio.h>
>  #include <fcntl.h>
> +#include <pthread.h>
>
>  #include <linux/bitmap.h>
>  #include <linux/falloc.h>
> +#include <linux/userfaultfd.h>
>  #include <sys/mman.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/ioctl.h>
>
>  #include "kvm_util.h"
>  #include "test_util.h"
> @@ -206,6 +210,93 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>         close(fd1);
>  }
>
> +struct fault_args {
> +       char *addr;
> +       volatile char value;

I think you should/must put volatile on `addr` and not on `value`.

> +};
> +
> +static void *fault_thread_fn(void *arg)
> +{
> +       struct fault_args *args = arg;
> +
> +       /* Trigger page fault */
> +       args->value = *args->addr;
> +       return NULL;
> +}
> +
> +static void test_uffd_missing(int fd, size_t page_size, size_t total_size)

test_uffd_minor? :)

> +{
> +       struct uffdio_register uffd_reg;
> +       struct uffdio_continue uffd_cont;
> +       struct uffd_msg msg;
> +       struct fault_args args;
> +       pthread_t fault_thread;
> +       void *mem, *mem_nofault, *buf = NULL;
> +       int uffd, ret;
> +       off_t offset = page_size;
> +       void *fault_addr;
> +
> +       ret = posix_memalign(&buf, page_size, total_size);
> +       TEST_ASSERT_EQ(ret, 0);
> +
> +       uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +       TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
> +
> +       struct uffdio_api uffdio_api = {
> +               .api = UFFD_API,
> +               .features = UFFD_FEATURE_MISSING_SHMEM,

I think you mean UFFD_FEATURE_MINOR_SHMEM...?

And I'm trying to think through what feature we should expose for
guest_memfd; UFFD_FEATURE_MINOR_SHMEM already indicates support for
shmem.

We could have UFFD_FEATURE_MINOR_GUESTMEMFD, perhaps that's enough.

Or we could have UFFD_FEATURE_MINOR_GENERIC (or nothing at all!). Some
VMAs might not support the minor mode, and the user will figure that
out when UFFDIO_REGISTER fails.

> +       };
> +       ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
> +
> +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
> +
> +       mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
> +
> +       uffd_reg.range.start = (unsigned long)mem;
> +       uffd_reg.range.len = total_size;
> +       uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
> +       ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
> +
> +       ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
> +                       offset, page_size);
> +       TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
> +
> +       fault_addr = mem + offset;
> +       args.addr = fault_addr;
> +
> +       ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
> +       TEST_ASSERT(ret == 0, "pthread_create should succeed");
> +
> +       ret = read(uffd, &msg, sizeof(msg));
> +       TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
> +       TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
> +       TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
> +                   "pagefault should occur at expected address");
> +
> +       memcpy(mem_nofault + offset, buf + offset, page_size);
> +
> +       uffd_cont.range.start = (unsigned long)fault_addr;
> +       uffd_cont.range.len = page_size;
> +       uffd_cont.mode = 0;
> +       ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
> +
> +       ret = pthread_join(fault_thread, NULL);
> +       TEST_ASSERT(ret == 0, "pthread_join should succeed");

And maybe also:

/* Right value? */
TEST_ASSERT(args.value == *(char *)mem_nofault));
/* No second fault? */
TEST_ASSERT(args.value == *(char *)mem);

> +
> +       ret = munmap(mem_nofault, total_size);
> +       TEST_ASSERT(!ret, "munmap should succeed");
> +
> +       ret = munmap(mem, total_size);
> +       TEST_ASSERT(!ret, "munmap should succeed");
> +       free(buf);
> +       close(uffd);
> +}
> +
>  unsigned long get_shared_type(void)
>  {
>  #ifdef __x86_64__
> @@ -244,6 +335,9 @@ void test_vm_type(unsigned long type, bool is_shared)
>         test_fallocate(fd, page_size, total_size);
>         test_invalid_punch_hole(fd, page_size, total_size);
>
> +       if (is_shared)
> +               test_uffd_missing(fd, page_size, total_size);
> +
>         close(fd);
>         kvm_vm_release(vm);
>  }
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd
  2025-04-02 16:07 ` [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd Nikita Kalyazin
@ 2025-04-02 21:25   ` James Houghton
  2025-04-03 17:01     ` Nikita Kalyazin
  0 siblings, 1 reply; 16+ messages in thread
From: James Houghton @ 2025-04-02 21:25 UTC (permalink / raw)
  To: Nikita Kalyazin
  Cc: akpm, pbonzini, shuah, kvm, linux-kselftest, linux-kernel,
	linux-mm, lorenzo.stoakes, david, ryan.roberts, quic_eberman,
	peterx, graf, jgowans, roypat, derekmn, nsaenz, xmarcalx

On Wed, Apr 2, 2025 at 9:08 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> ---
>  include/linux/userfaultfd_k.h | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 75342022d144..bc184edfbb85 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -212,6 +212,10 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
>         return vma->vm_flags & __VM_UFFD_FLAGS;
>  }
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +bool kvm_gmem_vma_is_gmem(struct vm_area_struct *vma);
> +#endif
> +
>  static inline bool vma_can_userfault(struct vm_area_struct *vma,
>                                      unsigned long vm_flags,
>                                      bool wp_async)
> @@ -222,7 +226,11 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
>                 return false;
>
>         if ((vm_flags & VM_UFFD_MINOR) &&
> -           (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
> +           (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma))
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +            && !kvm_gmem_vma_is_gmem(vma)
> +#endif

Maybe a better way to do this is to add a vm_ops->can_userfault() or
something, so we could write something like this:

if (vma->vm_ops && !vma->vm_ops->can_userfault)
  return false;
if (vma->vm_ops && !vma->vm_ops->can_userfault(vm_flags))
  return false;

And shmem/hugetlbfs can advertise support for everything they already
support that way.

> +           )
>                 return false;
>
>         /*
> @@ -244,6 +252,9 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
>
>         /* By default, allow any of anon|shmem|hugetlb */
>         return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +           kvm_gmem_vma_is_gmem(vma) ||
> +#endif
>             vma_is_shmem(vma);
>  }
>
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs
  2025-04-02 19:04   ` James Houghton
@ 2025-04-03 17:01     ` Nikita Kalyazin
  0 siblings, 0 replies; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-03 17:01 UTC (permalink / raw)
  To: James Houghton
  Cc: akpm, pbonzini, shuah, kvm, linux-kselftest, linux-kernel,
	linux-mm, lorenzo.stoakes, david, ryan.roberts, quic_eberman,
	peterx, graf, jgowans, roypat, derekmn, nsaenz, xmarcalx



On 02/04/2025 20:04, James Houghton wrote:
> On Wed, Apr 2, 2025 at 9:07 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>>
>> Remove shmem-specific code from UFFDIO_CONTINUE implementation for
>> non-huge pages by calling vm_ops->fault().  A new VMF flag,
>> FAULT_FLAG_NO_USERFAULT_MINOR, is introduced to avoid recursive call to
>> handle_userfault().
>>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>>   include/linux/mm_types.h |  3 +++
>>   mm/hugetlb.c             |  2 +-
>>   mm/shmem.c               |  3 ++-
>>   mm/userfaultfd.c         | 25 ++++++++++++++++++-------
>>   4 files changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 0234f14f2aa6..91a00f2cd565 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -1429,6 +1429,8 @@ enum tlb_flush_reason {
>>    * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
>>    *                        We should only access orig_pte if this flag set.
>>    * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock.
>> + * @FAULT_FLAG_NO_USERFAULT_MINOR: The fault handler must not call userfaultfd
>> + *                                 minor handler.
> 
> Perhaps instead a flag that says to avoid the userfaultfd minor fault
> handler, maybe we should have a flag to indicate that vm_ops->fault()
> has been called by UFFDIO_CONTINUE. See below.
> 
>>    *
>>    * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
>>    * whether we would allow page faults to retry by specifying these two
>> @@ -1467,6 +1469,7 @@ enum fault_flag {
>>          FAULT_FLAG_UNSHARE =            1 << 10,
>>          FAULT_FLAG_ORIG_PTE_VALID =     1 << 11,
>>          FAULT_FLAG_VMA_LOCK =           1 << 12,
>> +       FAULT_FLAG_NO_USERFAULT_MINOR = 1 << 13,
>>   };
>>
>>   typedef unsigned int __bitwise zap_flags_t;
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 97930d44d460..ba90d48144fc 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>>                  }
>>
>>                  /* Check for page in userfault range. */
>> -               if (userfaultfd_minor(vma)) {
>> +               if (userfaultfd_minor(vma) && !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
>>                          folio_unlock(folio);
>>                          folio_put(folio);
>>                          /* See comment in userfaultfd_missing() block above */
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 1ede0800e846..5e1911e39dec 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2467,7 +2467,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>>          fault_mm = vma ? vma->vm_mm : NULL;
>>
>>          folio = filemap_get_entry(inode->i_mapping, index);
>> -       if (folio && vma && userfaultfd_minor(vma)) {
>> +       if (folio && vma && userfaultfd_minor(vma) &&
>> +           !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
>>                  if (!xa_is_value(folio))
>>                          folio_put(folio);
>>                  *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
>> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
>> index d06453fa8aba..68a995216789 100644
>> --- a/mm/userfaultfd.c
>> +++ b/mm/userfaultfd.c
>> @@ -386,24 +386,35 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
>>                                       unsigned long dst_addr,
>>                                       uffd_flags_t flags)
>>   {
>> -       struct inode *inode = file_inode(dst_vma->vm_file);
>> -       pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
>>          struct folio *folio;
>>          struct page *page;
>>          int ret;
>> +       struct vm_fault vmf = {
>> +               .vma = dst_vma,
>> +               .address = dst_addr,
>> +               .flags = FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE |
>> +                   FAULT_FLAG_NO_USERFAULT_MINOR,
>> +               .pte = NULL,
>> +               .page = NULL,
>> +               .pgoff = linear_page_index(dst_vma, dst_addr),
>> +       };
>> +
>> +       if (!dst_vma->vm_ops || !dst_vma->vm_ops->fault)
>> +               return -EINVAL;
>>
>> -       ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
>> -       /* Our caller expects us to return -EFAULT if we failed to find folio */
>> -       if (ret == -ENOENT)
>> +       ret = dst_vma->vm_ops->fault(&vmf);
> 
> shmem_get_folio() was being called with SGP_NOALLOC, and now it is
> being called with SGP_CACHE (by shmem_fault()). This will result in a
> UAPI change: UFFDIO_CONTINUE for a VA without a page in the page cache
> should result in EFAULT, but now the page will be allocated.
> SGP_NOALLOC was carefully chosen[1], so I think a better way to do
> this will be to:
> 
> 1. Have a FAULT_FLAG_USERFAULT_CONTINUE (or something)
> 2. In shmem_fault(), if FAULT_FLAG_USERFAULT_CONTINUE, use SGP_NOALLOC
> instead of SGP_CACHE (and make sure not to drop into
> handle_userfault(), of course)

Yes, that is very true.  Thanks for pointing out.  Will apply in the 
next version.

> 
> [1]: https://lore.kernel.org/linux-mm/20220610173812.1768919-1-axelrasmussen@google.com/
> 
>> +       if (ret & VM_FAULT_ERROR) {
>>                  ret = -EFAULT;
>> -       if (ret)
>>                  goto out;
>> +       }
>> +
>> +       page = vmf.page;
>> +       folio = page_folio(page);
>>          if (!folio) {
> 
> What if ret == VM_FAULT_RETRY? I think we should retry instead instead
> of returning -EFAULT.

True.  I'm going to retry the vm_ops->fault() call in this case.

> 
> And I'm not sure how VM_FAULT_NOPAGE should be handled, like if we
> need special logic for it or not.

As far as I see, the only case it can come up in shmem is during a fault 
when a hole is being punched [1].  I'm thinking of returning EAGAIN to 
the userspace if this happens.

[1] https://elixir.bootlin.com/linux/v6.14-rc6/source/mm/shmem.c#L2670

> 
>>                  ret = -EFAULT;
>>                  goto out;
>>          }
>>
>> -       page = folio_file_page(folio, pgoff);
>>          if (PageHWPoison(page)) {
>>                  ret = -EIO;
>>                  goto out_release;
>> --
>> 2.47.1
>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd
  2025-04-02 21:25   ` James Houghton
@ 2025-04-03 17:01     ` Nikita Kalyazin
  0 siblings, 0 replies; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-03 17:01 UTC (permalink / raw)
  To: James Houghton
  Cc: akpm, pbonzini, shuah, kvm, linux-kselftest, linux-kernel,
	linux-mm, lorenzo.stoakes, david, ryan.roberts, quic_eberman,
	peterx, graf, jgowans, roypat, derekmn, nsaenz, xmarcalx



On 02/04/2025 22:25, James Houghton wrote:
> On Wed, Apr 2, 2025 at 9:08 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>>   include/linux/userfaultfd_k.h | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
>> index 75342022d144..bc184edfbb85 100644
>> --- a/include/linux/userfaultfd_k.h
>> +++ b/include/linux/userfaultfd_k.h
>> @@ -212,6 +212,10 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
>>          return vma->vm_flags & __VM_UFFD_FLAGS;
>>   }
>>
>> +#ifdef CONFIG_KVM_PRIVATE_MEM
>> +bool kvm_gmem_vma_is_gmem(struct vm_area_struct *vma);
>> +#endif
>> +
>>   static inline bool vma_can_userfault(struct vm_area_struct *vma,
>>                                       unsigned long vm_flags,
>>                                       bool wp_async)
>> @@ -222,7 +226,11 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
>>                  return false;
>>
>>          if ((vm_flags & VM_UFFD_MINOR) &&
>> -           (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
>> +           (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma))
>> +#ifdef CONFIG_KVM_PRIVATE_MEM
>> +            && !kvm_gmem_vma_is_gmem(vma)
>> +#endif
> 
> Maybe a better way to do this is to add a vm_ops->can_userfault() or
> something, so we could write something like this:
> 
> if (vma->vm_ops && !vma->vm_ops->can_userfault)
>    return false;
> if (vma->vm_ops && !vma->vm_ops->can_userfault(vm_flags))
>    return false;

I like that, thanks!  What do you see passing vm_flags being useful for? 
  Shall we pass the entire vma struct like in most of other callbacks?

> 
> And shmem/hugetlbfs can advertise support for everything they already
> support that way.
> 
>> +           )
>>                  return false;
>>
>>          /*
>> @@ -244,6 +252,9 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
>>
>>          /* By default, allow any of anon|shmem|hugetlb */
>>          return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
>> +#ifdef CONFIG_KVM_PRIVATE_MEM
>> +           kvm_gmem_vma_is_gmem(vma) ||
>> +#endif
>>              vma_is_shmem(vma);
>>   }
>>
>> --
>> 2.47.1
>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-04-02 21:10   ` James Houghton
@ 2025-04-03 17:02     ` Nikita Kalyazin
  0 siblings, 0 replies; 16+ messages in thread
From: Nikita Kalyazin @ 2025-04-03 17:02 UTC (permalink / raw)
  To: James Houghton
  Cc: akpm, pbonzini, shuah, kvm, linux-kselftest, linux-kernel,
	linux-mm, lorenzo.stoakes, david, ryan.roberts, quic_eberman,
	peterx, graf, jgowans, roypat, derekmn, nsaenz, xmarcalx



On 02/04/2025 22:10, James Houghton wrote:
> On Wed, Apr 2, 2025 at 9:08 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>>
>> The test demonstrates that a minor userfaultfd event in guest_memfd can
>> be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.
>>
>> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
>> ---
>>   .../testing/selftests/kvm/guest_memfd_test.c  | 94 +++++++++++++++++++
>>   1 file changed, 94 insertions(+)
>>
>> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
>> index 38c501e49e0e..9b47b796f3aa 100644
>> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
>> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
>> @@ -10,12 +10,16 @@
>>   #include <errno.h>
>>   #include <stdio.h>
>>   #include <fcntl.h>
>> +#include <pthread.h>
>>
>>   #include <linux/bitmap.h>
>>   #include <linux/falloc.h>
>> +#include <linux/userfaultfd.h>
>>   #include <sys/mman.h>
>>   #include <sys/types.h>
>>   #include <sys/stat.h>
>> +#include <sys/syscall.h>
>> +#include <sys/ioctl.h>
>>
>>   #include "kvm_util.h"
>>   #include "test_util.h"
>> @@ -206,6 +210,93 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
>>          close(fd1);
>>   }
>>
>> +struct fault_args {
>> +       char *addr;
>> +       volatile char value;
> 
> I think you should/must put volatile on `addr` and not on `value`.

This was to prevent the compiler from omitting the write to the value, 
because it's never read later on.

> 
>> +};
>> +
>> +static void *fault_thread_fn(void *arg)
>> +{
>> +       struct fault_args *args = arg;
>> +
>> +       /* Trigger page fault */
>> +       args->value = *args->addr;
>> +       return NULL;
>> +}
>> +
>> +static void test_uffd_missing(int fd, size_t page_size, size_t total_size)
> 
> test_uffd_minor? :)
> 
>> +{
>> +       struct uffdio_register uffd_reg;
>> +       struct uffdio_continue uffd_cont;
>> +       struct uffd_msg msg;
>> +       struct fault_args args;
>> +       pthread_t fault_thread;
>> +       void *mem, *mem_nofault, *buf = NULL;
>> +       int uffd, ret;
>> +       off_t offset = page_size;
>> +       void *fault_addr;
>> +
>> +       ret = posix_memalign(&buf, page_size, total_size);
>> +       TEST_ASSERT_EQ(ret, 0);
>> +
>> +       uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
>> +       TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
>> +
>> +       struct uffdio_api uffdio_api = {
>> +               .api = UFFD_API,
>> +               .features = UFFD_FEATURE_MISSING_SHMEM,
> 
> I think you mean UFFD_FEATURE_MINOR_SHMEM...?
> 
> And I'm trying to think through what feature we should expose for
> guest_memfd; UFFD_FEATURE_MINOR_SHMEM already indicates support for
> shmem.
> 
> We could have UFFD_FEATURE_MINOR_GUESTMEMFD, perhaps that's enough.

Yes, I will introduce UFFD_FEATURE_MINOR_GUEST_MEMFD in the next version.

> 
> Or we could have UFFD_FEATURE_MINOR_GENERIC (or nothing at all!). Some
> VMAs might not support the minor mode, and the user will figure that
> out when UFFDIO_REGISTER fails.

My concern is the exact reason of the failure may not be apparent to the 
caller in that case.

> 
>> +       };
>> +       ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
>> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
>> +
>> +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> +       TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
>> +
>> +       mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> +       TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
>> +
>> +       uffd_reg.range.start = (unsigned long)mem;
>> +       uffd_reg.range.len = total_size;
>> +       uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
>> +       ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
>> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
>> +
>> +       ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
>> +                       offset, page_size);
>> +       TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
>> +
>> +       fault_addr = mem + offset;
>> +       args.addr = fault_addr;
>> +
>> +       ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
>> +       TEST_ASSERT(ret == 0, "pthread_create should succeed");
>> +
>> +       ret = read(uffd, &msg, sizeof(msg));
>> +       TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
>> +       TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
>> +       TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
>> +                   "pagefault should occur at expected address");
>> +
>> +       memcpy(mem_nofault + offset, buf + offset, page_size);
>> +
>> +       uffd_cont.range.start = (unsigned long)fault_addr;
>> +       uffd_cont.range.len = page_size;
>> +       uffd_cont.mode = 0;
>> +       ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
>> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
>> +
>> +       ret = pthread_join(fault_thread, NULL);
>> +       TEST_ASSERT(ret == 0, "pthread_join should succeed");
> 
> And maybe also:
> 
> /* Right value? */
> TEST_ASSERT(args.value == *(char *)mem_nofault));
> /* No second fault? */
> TEST_ASSERT(args.value == *(char *)mem);

Good idea, thanks.  I don't need the volatile anymore :)

> 
>> +
>> +       ret = munmap(mem_nofault, total_size);
>> +       TEST_ASSERT(!ret, "munmap should succeed");
>> +
>> +       ret = munmap(mem, total_size);
>> +       TEST_ASSERT(!ret, "munmap should succeed");
>> +       free(buf);
>> +       close(uffd);
>> +}
>> +
>>   unsigned long get_shared_type(void)
>>   {
>>   #ifdef __x86_64__
>> @@ -244,6 +335,9 @@ void test_vm_type(unsigned long type, bool is_shared)
>>          test_fallocate(fd, page_size, total_size);
>>          test_invalid_punch_hole(fd, page_size, total_size);
>>
>> +       if (is_shared)
>> +               test_uffd_missing(fd, page_size, total_size);
>> +
>>          close(fd);
>>          kvm_vm_release(vm);
>>   }
>> --
>> 2.47.1
>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-11-26 16:49   ` Nikita Kalyazin
@ 2025-11-27 10:39     ` Mike Rapoport
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Rapoport @ 2025-11-27 10:39 UTC (permalink / raw)
  To: Nikita Kalyazin
  Cc: linux-mm, Andrea Arcangeli, Andrew Morton, Axel Rasmussen,
	Baolin Wang, David Hildenbrand, Hugh Dickins, James Houghton,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Paolo Bonzini,
	Peter Xu, Sean Christopherson, Shuah Khan, Suren Baghdasaryan,
	Vlastimil Babka, linux-kernel, kvm, linux-kselftest

On Wed, Nov 26, 2025 at 04:49:46PM +0000, Nikita Kalyazin wrote:
> On 25/11/2025 18:38, Mike Rapoport wrote:
> > From: Nikita Kalyazin <kalyazin@amazon.com>
> > 
> > +static void test_uffd_minor(int fd, size_t total_size)
> > +{
> > +       struct uffdio_api uffdio_api = {
> > +               .api = UFFD_API,
> > +               .features = UFFD_FEATURE_MINOR_GENERIC,
> 
> Should it be UFFD_FEATURE_MINOR_SHMEM instead? UFFD_FEATURE_MINOR_GENERIC
> was removed in the v1.

I'll drop .features completely, the checks in UFFDIO_REGISTER are
sufficient.
 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-11-25 18:38 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Mike Rapoport
  2025-11-26 15:23   ` Liam R. Howlett
@ 2025-11-26 16:49   ` Nikita Kalyazin
  2025-11-27 10:39     ` Mike Rapoport
  1 sibling, 1 reply; 16+ messages in thread
From: Nikita Kalyazin @ 2025-11-26 16:49 UTC (permalink / raw)
  To: Mike Rapoport, linux-mm
  Cc: Andrea Arcangeli, Andrew Morton, Axel Rasmussen, Baolin Wang,
	David Hildenbrand, Hugh Dickins, James Houghton, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Paolo Bonzini, Peter Xu,
	Sean Christopherson, Shuah Khan, Suren Baghdasaryan,
	Vlastimil Babka, linux-kernel, kvm, linux-kselftest



On 25/11/2025 18:38, Mike Rapoport wrote:
> From: Nikita Kalyazin <kalyazin@amazon.com>
> 
> The test demonstrates that a minor userfaultfd event in guest_memfd can
> be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.
> 
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
>   .../testing/selftests/kvm/guest_memfd_test.c  | 103 ++++++++++++++++++
>   1 file changed, 103 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index e7d9aeb418d3..a5d3ed21d7bb 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -10,13 +10,17 @@
>   #include <errno.h>
>   #include <stdio.h>
>   #include <fcntl.h>
> +#include <pthread.h>
> 
>   #include <linux/bitmap.h>
>   #include <linux/falloc.h>
>   #include <linux/sizes.h>
> +#include <linux/userfaultfd.h>
>   #include <sys/mman.h>
>   #include <sys/types.h>
>   #include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/ioctl.h>
> 
>   #include "kvm_util.h"
>   #include "test_util.h"
> @@ -254,6 +258,104 @@ static void test_guest_memfd_flags(struct kvm_vm *vm)
>          }
>   }
> 
> +struct fault_args {
> +       char *addr;
> +       volatile char value;
> +};
> +
> +static void *fault_thread_fn(void *arg)
> +{
> +       struct fault_args *args = arg;
> +
> +       /* Trigger page fault */
> +       args->value = *args->addr;
> +       return NULL;
> +}
> +
> +static void test_uffd_minor(int fd, size_t total_size)
> +{
> +       struct uffdio_api uffdio_api = {
> +               .api = UFFD_API,
> +               .features = UFFD_FEATURE_MINOR_GENERIC,

Should it be UFFD_FEATURE_MINOR_SHMEM instead? 
UFFD_FEATURE_MINOR_GENERIC was removed in the v1.

> +       };
> +       struct uffdio_register uffd_reg;
> +       struct uffdio_continue uffd_cont;
> +       struct uffd_msg msg;
> +       struct fault_args args;
> +       pthread_t fault_thread;
> +       void *mem, *mem_nofault, *buf = NULL;
> +       int uffd, ret;
> +       off_t offset = page_size;
> +       void *fault_addr;
> +
> +       ret = posix_memalign(&buf, page_size, total_size);
> +       TEST_ASSERT_EQ(ret, 0);
> +
> +       memset(buf, 0xaa, total_size);
> +
> +       uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +       TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
> +
> +       ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
> +
> +       mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
> +
> +       mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
> +
> +       uffd_reg.range.start = (unsigned long)mem;
> +       uffd_reg.range.len = total_size;
> +       uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
> +       ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
> +
> +       ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
> +                       offset, page_size);
> +       TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
> +
> +       fault_addr = mem + offset;
> +       args.addr = fault_addr;
> +
> +       ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
> +       TEST_ASSERT(ret == 0, "pthread_create should succeed");
> +
> +       ret = read(uffd, &msg, sizeof(msg));
> +       TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
> +       TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
> +       TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
> +                   "pagefault should occur at expected address");
> +
> +       memcpy(mem_nofault + offset, buf + offset, page_size);
> +
> +       uffd_cont.range.start = (unsigned long)fault_addr;
> +       uffd_cont.range.len = page_size;
> +       uffd_cont.mode = 0;
> +       ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
> +       TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
> +
> +       /*
> +        * wait for fault_thread to finish to make sure fault happened and was
> +        * resolved before we verify the values
> +        */
> +       ret = pthread_join(fault_thread, NULL);
> +       TEST_ASSERT(ret == 0, "pthread_join should succeed");
> +
> +       TEST_ASSERT(args.value == *(char *)(mem_nofault + offset),
> +                   "memory should contain the value that was copied");
> +       TEST_ASSERT(args.value == *(char *)(mem + offset),
> +                   "no further fault is expected");
> +
> +       ret = munmap(mem_nofault, total_size);
> +       TEST_ASSERT(!ret, "munmap should succeed");
> +
> +       ret = munmap(mem, total_size);
> +       TEST_ASSERT(!ret, "munmap should succeed");
> +       free(buf);
> +       close(uffd);
> +}
> +
>   #define gmem_test(__test, __vm, __flags)                               \
>   do {                                                                   \
>          int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);   \
> @@ -273,6 +375,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>                  if (flags & GUEST_MEMFD_FLAG_INIT_SHARED) {
>                          gmem_test(mmap_supported, vm, flags);
>                          gmem_test(fault_overflow, vm, flags);
> +                       gmem_test(uffd_minor, vm, flags);
>                  } else {
>                          gmem_test(fault_private, vm, flags);
>                  }
> --
> 2.50.1
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-11-25 18:38 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Mike Rapoport
@ 2025-11-26 15:23   ` Liam R. Howlett
  2025-11-26 16:49   ` Nikita Kalyazin
  1 sibling, 0 replies; 16+ messages in thread
From: Liam R. Howlett @ 2025-11-26 15:23 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, Andrea Arcangeli, Andrew Morton, Axel Rasmussen,
	Baolin Wang, David Hildenbrand, Hugh Dickins, James Houghton,
	Lorenzo Stoakes, Michal Hocko, Nikita Kalyazin, Paolo Bonzini,
	Peter Xu, Sean Christopherson, Shuah Khan, Suren Baghdasaryan,
	Vlastimil Babka, linux-kernel, kvm, linux-kselftest

* Mike Rapoport <rppt@kernel.org> [251125 13:39]:
> From: Nikita Kalyazin <kalyazin@amazon.com>
> 
> The test demonstrates that a minor userfaultfd event in guest_memfd can
> be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.
> 
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  .../testing/selftests/kvm/guest_memfd_test.c  | 103 ++++++++++++++++++
>  1 file changed, 103 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index e7d9aeb418d3..a5d3ed21d7bb 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -10,13 +10,17 @@
>  #include <errno.h>
>  #include <stdio.h>
>  #include <fcntl.h>
> +#include <pthread.h>
>  
>  #include <linux/bitmap.h>
>  #include <linux/falloc.h>
>  #include <linux/sizes.h>
> +#include <linux/userfaultfd.h>
>  #include <sys/mman.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/ioctl.h>
>  
>  #include "kvm_util.h"
>  #include "test_util.h"
> @@ -254,6 +258,104 @@ static void test_guest_memfd_flags(struct kvm_vm *vm)
>  	}
>  }
>  
> +struct fault_args {
> +	char *addr;
> +	volatile char value;
> +};
> +
> +static void *fault_thread_fn(void *arg)
> +{
> +	struct fault_args *args = arg;
> +
> +	/* Trigger page fault */
> +	args->value = *args->addr;
> +	return NULL;
> +}
> +
> +static void test_uffd_minor(int fd, size_t total_size)
> +{
> +	struct uffdio_api uffdio_api = {
> +		.api = UFFD_API,
> +		.features = UFFD_FEATURE_MINOR_GENERIC,
> +	};
> +	struct uffdio_register uffd_reg;
> +	struct uffdio_continue uffd_cont;
> +	struct uffd_msg msg;
> +	struct fault_args args;
> +	pthread_t fault_thread;
> +	void *mem, *mem_nofault, *buf = NULL;
> +	int uffd, ret;
> +	off_t offset = page_size;
> +	void *fault_addr;
> +
> +	ret = posix_memalign(&buf, page_size, total_size);
> +	TEST_ASSERT_EQ(ret, 0);
> +
> +	memset(buf, 0xaa, total_size);
> +
> +	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
> +	TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
> +
> +	ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
> +	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
> +
> +	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
> +
> +	mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
> +
> +	uffd_reg.range.start = (unsigned long)mem;
> +	uffd_reg.range.len = total_size;
> +	uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
> +	ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
> +	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
> +
> +	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
> +			offset, page_size);
> +	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
> +
> +	fault_addr = mem + offset;
> +	args.addr = fault_addr;
> +
> +	ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
> +	TEST_ASSERT(ret == 0, "pthread_create should succeed");
> +
> +	ret = read(uffd, &msg, sizeof(msg));
> +	TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
> +	TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
> +	TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
> +		    "pagefault should occur at expected address");
> +
> +	memcpy(mem_nofault + offset, buf + offset, page_size);
> +
> +	uffd_cont.range.start = (unsigned long)fault_addr;
> +	uffd_cont.range.len = page_size;
> +	uffd_cont.mode = 0;
> +	ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
> +	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
> +
> +	/*
> +	 * wait for fault_thread to finish to make sure fault happened and was
> +	 * resolved before we verify the values
> +	 */
> +	ret = pthread_join(fault_thread, NULL);
> +	TEST_ASSERT(ret == 0, "pthread_join should succeed");
> +
> +	TEST_ASSERT(args.value == *(char *)(mem_nofault + offset),
> +		    "memory should contain the value that was copied");
> +	TEST_ASSERT(args.value == *(char *)(mem + offset),
> +		    "no further fault is expected");
> +
> +	ret = munmap(mem_nofault, total_size);
> +	TEST_ASSERT(!ret, "munmap should succeed");
> +
> +	ret = munmap(mem, total_size);
> +	TEST_ASSERT(!ret, "munmap should succeed");
> +	free(buf);
> +	close(uffd);
> +}
> +
>  #define gmem_test(__test, __vm, __flags)				\
>  do {									\
>  	int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);	\
> @@ -273,6 +375,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>  		if (flags & GUEST_MEMFD_FLAG_INIT_SHARED) {
>  			gmem_test(mmap_supported, vm, flags);
>  			gmem_test(fault_overflow, vm, flags);
> +			gmem_test(uffd_minor, vm, flags);
>  		} else {
>  			gmem_test(fault_private, vm, flags);
>  		}
> -- 
> 2.50.1
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd
  2025-11-25 18:38 [PATCH v2 0/5] mm, kvm: add guest_memfd support for uffd minor faults Mike Rapoport
@ 2025-11-25 18:38 ` Mike Rapoport
  2025-11-26 15:23   ` Liam R. Howlett
  2025-11-26 16:49   ` Nikita Kalyazin
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Rapoport @ 2025-11-25 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrea Arcangeli, Andrew Morton, Axel Rasmussen, Baolin Wang,
	David Hildenbrand, Hugh Dickins, James Houghton, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Nikita Kalyazin,
	Paolo Bonzini, Peter Xu, Sean Christopherson, Shuah Khan,
	Suren Baghdasaryan, Vlastimil Babka, linux-kernel, kvm,
	linux-kselftest

From: Nikita Kalyazin <kalyazin@amazon.com>

The test demonstrates that a minor userfaultfd event in guest_memfd can
be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 103 ++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index e7d9aeb418d3..a5d3ed21d7bb 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,13 +10,17 @@
 #include <errno.h>
 #include <stdio.h>
 #include <fcntl.h>
+#include <pthread.h>
 
 #include <linux/bitmap.h>
 #include <linux/falloc.h>
 #include <linux/sizes.h>
+#include <linux/userfaultfd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
 
 #include "kvm_util.h"
 #include "test_util.h"
@@ -254,6 +258,104 @@ static void test_guest_memfd_flags(struct kvm_vm *vm)
 	}
 }
 
+struct fault_args {
+	char *addr;
+	volatile char value;
+};
+
+static void *fault_thread_fn(void *arg)
+{
+	struct fault_args *args = arg;
+
+	/* Trigger page fault */
+	args->value = *args->addr;
+	return NULL;
+}
+
+static void test_uffd_minor(int fd, size_t total_size)
+{
+	struct uffdio_api uffdio_api = {
+		.api = UFFD_API,
+		.features = UFFD_FEATURE_MINOR_GENERIC,
+	};
+	struct uffdio_register uffd_reg;
+	struct uffdio_continue uffd_cont;
+	struct uffd_msg msg;
+	struct fault_args args;
+	pthread_t fault_thread;
+	void *mem, *mem_nofault, *buf = NULL;
+	int uffd, ret;
+	off_t offset = page_size;
+	void *fault_addr;
+
+	ret = posix_memalign(&buf, page_size, total_size);
+	TEST_ASSERT_EQ(ret, 0);
+
+	memset(buf, 0xaa, total_size);
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
+	TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
+
+	ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
+
+	mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
+
+	uffd_reg.range.start = (unsigned long)mem;
+	uffd_reg.range.len = total_size;
+	uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
+	ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			offset, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	fault_addr = mem + offset;
+	args.addr = fault_addr;
+
+	ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
+	TEST_ASSERT(ret == 0, "pthread_create should succeed");
+
+	ret = read(uffd, &msg, sizeof(msg));
+	TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
+	TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
+	TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
+		    "pagefault should occur at expected address");
+
+	memcpy(mem_nofault + offset, buf + offset, page_size);
+
+	uffd_cont.range.start = (unsigned long)fault_addr;
+	uffd_cont.range.len = page_size;
+	uffd_cont.mode = 0;
+	ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
+
+	/*
+	 * wait for fault_thread to finish to make sure fault happened and was
+	 * resolved before we verify the values
+	 */
+	ret = pthread_join(fault_thread, NULL);
+	TEST_ASSERT(ret == 0, "pthread_join should succeed");
+
+	TEST_ASSERT(args.value == *(char *)(mem_nofault + offset),
+		    "memory should contain the value that was copied");
+	TEST_ASSERT(args.value == *(char *)(mem + offset),
+		    "no further fault is expected");
+
+	ret = munmap(mem_nofault, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+	free(buf);
+	close(uffd);
+}
+
 #define gmem_test(__test, __vm, __flags)				\
 do {									\
 	int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);	\
@@ -273,6 +375,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 		if (flags & GUEST_MEMFD_FLAG_INIT_SHARED) {
 			gmem_test(mmap_supported, vm, flags);
 			gmem_test(fault_overflow, vm, flags);
+			gmem_test(uffd_minor, vm, flags);
 		} else {
 			gmem_test(fault_private, vm, flags);
 		}
-- 
2.50.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-11-27 10:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-02 16:07 [PATCH v2 0/5] KVM: guest_memfd: support for uffd minor Nikita Kalyazin
2025-04-02 16:07 ` [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs Nikita Kalyazin
2025-04-02 19:04   ` James Houghton
2025-04-03 17:01     ` Nikita Kalyazin
2025-04-02 16:07 ` [PATCH v2 2/5] KVM: guest_memfd: add kvm_gmem_vma_is_gmem Nikita Kalyazin
2025-04-02 16:07 ` [PATCH v2 3/5] mm: userfaultfd: allow to register continue for guest_memfd Nikita Kalyazin
2025-04-02 21:25   ` James Houghton
2025-04-03 17:01     ` Nikita Kalyazin
2025-04-02 16:07 ` [PATCH v2 4/5] KVM: guest_memfd: add support for userfaultfd minor Nikita Kalyazin
2025-04-02 16:07 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Nikita Kalyazin
2025-04-02 21:10   ` James Houghton
2025-04-03 17:02     ` Nikita Kalyazin
2025-11-25 18:38 [PATCH v2 0/5] mm, kvm: add guest_memfd support for uffd minor faults Mike Rapoport
2025-11-25 18:38 ` [PATCH v2 5/5] KVM: selftests: test userfaultfd minor for guest_memfd Mike Rapoport
2025-11-26 15:23   ` Liam R. Howlett
2025-11-26 16:49   ` Nikita Kalyazin
2025-11-27 10:39     ` Mike Rapoport

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox