* [PATCH v1 0/3] mm: follow_pte() improvements and acrn follow_pte() fixes
@ 2024-04-10 15:55 David Hildenbrand
  2024-04-10 15:55 ` [PATCH v1 1/3] drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-04-10 15:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, linux-s390, kvm, David Hildenbrand, Andrew Morton,
	Yonghua Huang, Fei Li, Christoph Hellwig, Gerald Schaefer,
	Heiko Carstens, Ingo Molnar, Alex Williamson, Paolo Bonzini

Patch #1 fixes a bunch of issues I spotted in the acrn driver. It
compiles; that's all I know. I'd appreciate some review and testing from
the acrn folks.

Patches #2 and #3 improve follow_pte(): they pass a VMA instead of the MM,
add more sanity checks, and improve the documentation. I gave it a quick
test on x86-64 using VM_PAT, which ends up using follow_pte().

Not CCing all s390x and x86 maintainers individually (only the lists), to
reduce noise.

As this depends on other stuff in mm-unstable, this should likely go via
the MM tree.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yonghua Huang <yonghua.huang@intel.com>
Cc: Fei Li <fei1.li@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

David Hildenbrand (3):
  drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()
  mm: pass VMA instead of MM to follow_pte()
  mm: follow_pte() improvements

 arch/s390/pci/pci_mmio.c        |  4 +--
 arch/x86/mm/pat/memtype.c       |  5 +--
 drivers/vfio/vfio_iommu_type1.c |  4 +--
 drivers/virt/acrn/mm.c          | 62 ++++++++++++++++++++++++---------
 include/linux/mm.h              |  2 +-
 mm/memory.c                     | 35 ++++++++++++-------
 virt/kvm/kvm_main.c             |  4 +--
 7 files changed, 77 insertions(+), 39 deletions(-)

-- 
2.44.0




* [PATCH v1 1/3] drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()
  2024-04-10 15:55 [PATCH v1 0/3] mm: follow_pte() improvements and acrn follow_pte() fixes David Hildenbrand
@ 2024-04-10 15:55 ` David Hildenbrand
  2024-04-10 20:12   ` Andrew Morton
  2024-04-10 15:55 ` [PATCH v1 2/3] mm: pass VMA instead of MM to follow_pte() David Hildenbrand
  2024-04-10 15:55 ` [PATCH v1 3/3] mm: follow_pte() improvements David Hildenbrand
  2 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-04-10 15:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, linux-s390, kvm, David Hildenbrand, Andrew Morton,
	Yonghua Huang, Fei Li, Christoph Hellwig, Gerald Schaefer,
	Heiko Carstens, Ingo Molnar, Alex Williamson, Paolo Bonzini

We currently fail to handle various cases, resulting in dangerous
follow_pte() (previously follow_pfn()) usage.

(1) We're not checking PTE write permissions.

Maybe we should simply always require pte_write() like we do for
pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for
ACRN_MEM_ACCESS_WRITE for now.

(2) We're not rejecting refcounted pages.

As we are not using MMU notifiers, messing with refcounted pages is
dangerous and can result in use-after-free. Let's make sure to reject them.

(3) We are only looking at the first PTE of a bigger range.

We only look up a single PTE, but memmap->len may span a larger area.
Let's loop over all involved PTEs and make sure the PFN range is
actually contiguous. Reject everything else: it couldn't have worked
either way, and it made us access PFNs we shouldn't be accessing.
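
In essence, each PTE covered by the range is now validated like this
(condensed sketch of the loop in the diff below; declarations trimmed):

	for (i = 0; i < nr_pages; i++) {
		ret = follow_pte(vma->vm_mm, memmap->vma_base + i * PAGE_SIZE,
				 &ptep, &ptl);
		if (ret)
			break;
		cur_pfn = pte_pfn(ptep_get(ptep));
		if (i == 0)
			start_pfn = cur_pfn;
		writable = pte_write(ptep_get(ptep));
		pte_unmap_unlock(ptep, ptl);

		/* (1) PTE must be writable when write access is requested. */
		if (!writable && (memmap->attr & ACRN_MEM_ACCESS_WRITE))
			ret = -EFAULT;
		/* (2) No refcounted pages: we don't use MMU notifiers. */
		else if (pfn_valid(cur_pfn) && !PageReserved(pfn_to_page(cur_pfn)))
			ret = -EFAULT;
		/* (3) The PFN range must be contiguous. */
		else if (cur_pfn != start_pfn + i)
			ret = -EINVAL;
		if (ret)
			break;
	}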

Fixes: 8a6e85f75a83 ("virt: acrn: obtain pa from VMA with PFNMAP flag")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virt/acrn/mm.c | 63 +++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 16 deletions(-)

diff --git a/drivers/virt/acrn/mm.c b/drivers/virt/acrn/mm.c
index b30077baf352..2d98e1e185c4 100644
--- a/drivers/virt/acrn/mm.c
+++ b/drivers/virt/acrn/mm.c
@@ -156,23 +156,29 @@ int acrn_vm_memseg_unmap(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 {
 	struct vm_memory_region_batch *regions_info;
-	int nr_pages, i = 0, order, nr_regions = 0;
+	int nr_pages, i, order, nr_regions = 0;
 	struct vm_memory_mapping *region_mapping;
 	struct vm_memory_region_op *vm_region;
 	struct page **pages = NULL, *page;
 	void *remap_vaddr;
 	int ret, pinned;
 	u64 user_vm_pa;
-	unsigned long pfn;
 	struct vm_area_struct *vma;
 
 	if (!vm || !memmap)
 		return -EINVAL;
 
+	/* Get the page number of the map region */
+	nr_pages = memmap->len >> PAGE_SHIFT;
+	if (!nr_pages)
+		return -EINVAL;
+
 	mmap_read_lock(current->mm);
 	vma = vma_lookup(current->mm, memmap->vma_base);
 	if (vma && ((vma->vm_flags & VM_PFNMAP) != 0)) {
+		unsigned long start_pfn, cur_pfn;
 		spinlock_t *ptl;
+		bool writable;
 		pte_t *ptep;
 
 		if ((memmap->vma_base + memmap->len) > vma->vm_end) {
@@ -180,25 +186,53 @@ int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 			return -EINVAL;
 		}
 
-		ret = follow_pte(vma->vm_mm, memmap->vma_base, &ptep, &ptl);
-		if (ret < 0) {
-			mmap_read_unlock(current->mm);
+		for (i = 0; i < nr_pages; i++) {
+			ret = follow_pte(vma->vm_mm,
+					 memmap->vma_base + i * PAGE_SIZE,
+					 &ptep, &ptl);
+			if (ret)
+				break;
+
+			cur_pfn = pte_pfn(ptep_get(ptep));
+			if (i == 0)
+				start_pfn = cur_pfn;
+			writable = !!pte_write(ptep_get(ptep));
+			pte_unmap_unlock(ptep, ptl);
+
+			/* Disallow write access if the PTE is not writable. */
+			if (!writable &&
+			    (memmap->attr & ACRN_MEM_ACCESS_WRITE)) {
+				ret = -EFAULT;
+				break;
+			}
+
+			/* Disallow refcounted pages. */
+			if (pfn_valid(cur_pfn) &&
+			    !PageReserved(pfn_to_page(cur_pfn))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			/* Disallow non-contiguous ranges. */
+			if (cur_pfn != start_pfn + i) {
+				ret = -EINVAL;
+				break;
+			}
+		}
+		mmap_read_unlock(current->mm);
+
+		if (ret) {
 			dev_dbg(acrn_dev.this_device,
 				"Failed to lookup PFN at VMA:%pK.\n", (void *)memmap->vma_base);
 			return ret;
 		}
-		pfn = pte_pfn(ptep_get(ptep));
-		pte_unmap_unlock(ptep, ptl);
-		mmap_read_unlock(current->mm);
 
 		return acrn_mm_region_add(vm, memmap->user_vm_pa,
-			 PFN_PHYS(pfn), memmap->len,
+			 PFN_PHYS(start_pfn), memmap->len,
 			 ACRN_MEM_TYPE_WB, memmap->attr);
 	}
 	mmap_read_unlock(current->mm);
 
-	/* Get the page number of the map region */
-	nr_pages = memmap->len >> PAGE_SHIFT;
 	pages = vzalloc(array_size(nr_pages, sizeof(*pages)));
 	if (!pages)
 		return -ENOMEM;
@@ -242,12 +276,11 @@ int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 	mutex_unlock(&vm->regions_mapping_lock);
 
 	/* Calculate count of vm_memory_region_op */
-	while (i < nr_pages) {
+	for (i = 0; i < nr_pages; i += 1 << order) {
 		page = pages[i];
 		VM_BUG_ON_PAGE(PageTail(page), page);
 		order = compound_order(page);
 		nr_regions++;
-		i += 1 << order;
 	}
 
 	/* Prepare the vm_memory_region_batch */
@@ -264,8 +297,7 @@ int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 	regions_info->vmid = vm->vmid;
 	regions_info->regions_gpa = virt_to_phys(vm_region);
 	user_vm_pa = memmap->user_vm_pa;
-	i = 0;
-	while (i < nr_pages) {
+	for (i = 0; i < nr_pages; i += 1 << order) {
 		u32 region_size;
 
 		page = pages[i];
@@ -281,7 +313,6 @@ int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 
 		vm_region++;
 		user_vm_pa += region_size;
-		i += 1 << order;
 	}
 
 	/* Inform the ACRN Hypervisor to set up EPT mappings */
-- 
2.44.0




* [PATCH v1 2/3] mm: pass VMA instead of MM to follow_pte()
  2024-04-10 15:55 [PATCH v1 0/3] mm: follow_pte() improvements and acrn follow_pte() fixes David Hildenbrand
  2024-04-10 15:55 ` [PATCH v1 1/3] drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() David Hildenbrand
@ 2024-04-10 15:55 ` David Hildenbrand
  2024-04-10 18:08   ` Sean Christopherson
  2024-04-10 15:55 ` [PATCH v1 3/3] mm: follow_pte() improvements David Hildenbrand
  2 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-04-10 15:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, linux-s390, kvm, David Hildenbrand, Andrew Morton,
	Yonghua Huang, Fei Li, Christoph Hellwig, Gerald Schaefer,
	Heiko Carstens, Ingo Molnar, Alex Williamson, Paolo Bonzini

... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll
now also perform these sanity checks for direct follow_pte()
invocations.
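
A direct invocation now boils down to (minimal sketch; mmap lock held
for read, declarations omitted):

	mmap_read_lock(mm);
	vma = vma_lookup(mm, addr);
	if (vma) {
		/* No open-coded VM_IO/VM_PFNMAP check needed anymore. */
		if (!follow_pte(vma, addr, &ptep, &ptl)) {
			pfn = pte_pfn(ptep_get(ptep));
			pte_unmap_unlock(ptep, ptl);
		}
	}
	mmap_read_unlock(mm);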

For generic_access_phys(), we might now check multiple times: nothing to
worry about, really.
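
The "multiple times" comes from the double lookup in there, roughly
(sketch; labels and the mapping step as in the function body below):

	retry:
	if (follow_pte(vma, addr, &ptep, &ptl))
		return -EINVAL;
	pte = ptep_get(ptep);
	pte_unmap_unlock(ptep, ptl);

	/* ... map the PFN, which may sleep ... */

	/* Re-lookup; the PTE might have changed in the meantime. */
	if (follow_pte(vma, addr, &ptep, &ptl))
		goto out_unmap;
	if (!pte_same(pte, ptep_get(ptep))) {
		pte_unmap_unlock(ptep, ptl);
		goto retry;
	}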

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/pci/pci_mmio.c        |  4 ++--
 arch/x86/mm/pat/memtype.c       |  5 +----
 drivers/vfio/vfio_iommu_type1.c |  4 ++--
 drivers/virt/acrn/mm.c          |  3 +--
 include/linux/mm.h              |  2 +-
 mm/memory.c                     | 15 ++++++++-------
 virt/kvm/kvm_main.c             |  4 ++--
 7 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index a90499c087f0..5398729bfe1b 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -169,7 +169,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	if (!(vma->vm_flags & VM_WRITE))
 		goto out_unlock_mmap;
 
-	ret = follow_pte(vma->vm_mm, mmio_addr, &ptep, &ptl);
+	ret = follow_pte(vma, mmio_addr, &ptep, &ptl);
 	if (ret)
 		goto out_unlock_mmap;
 
@@ -308,7 +308,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	if (!(vma->vm_flags & VM_WRITE))
 		goto out_unlock_mmap;
 
-	ret = follow_pte(vma->vm_mm, mmio_addr, &ptep, &ptl);
+	ret = follow_pte(vma, mmio_addr, &ptep, &ptl);
 	if (ret)
 		goto out_unlock_mmap;
 
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d01c3b0bd6eb..bdc2a240c2aa 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -954,10 +954,7 @@ static int follow_phys(struct vm_area_struct *vma, unsigned long *prot,
 	pte_t *ptep, pte;
 	spinlock_t *ptl;
 
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
-		return -EINVAL;
-
-	if (follow_pte(vma->vm_mm, vma->vm_start, &ptep, &ptl))
+	if (follow_pte(vma, vma->vm_start, &ptep, &ptl))
 		return -EINVAL;
 
 	pte = ptep_get(ptep);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b5c15fe8f9fc..3a0218171cfa 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -518,7 +518,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 	spinlock_t *ptl;
 	int ret;
 
-	ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
+	ret = follow_pte(vma, vaddr, &ptep, &ptl);
 	if (ret) {
 		bool unlocked = false;
 
@@ -532,7 +532,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 		if (ret)
 			return ret;
 
-		ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
+		ret = follow_pte(vma, vaddr, &ptep, &ptl);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/virt/acrn/mm.c b/drivers/virt/acrn/mm.c
index 2d98e1e185c4..db8ff1d0ac23 100644
--- a/drivers/virt/acrn/mm.c
+++ b/drivers/virt/acrn/mm.c
@@ -187,8 +187,7 @@ int acrn_vm_ram_map(struct acrn_vm *vm, struct acrn_vm_memmap *memmap)
 		}
 
 		for (i = 0; i < nr_pages; i++) {
-			ret = follow_pte(vma->vm_mm,
-					 memmap->vma_base + i * PAGE_SIZE,
+			ret = follow_pte(vma, memmap->vma_base + i * PAGE_SIZE,
 					 &ptep, &ptl);
 			if (ret)
 				break;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef34cf54c14f..374b307abfc1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2420,7 +2420,7 @@ void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma);
-int follow_pte(struct mm_struct *mm, unsigned long address,
+int follow_pte(struct vm_area_struct *vma, unsigned long address,
 	       pte_t **ptepp, spinlock_t **ptlp);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 			void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index 78422d1c7381..ab01fb69dc72 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5928,7 +5928,7 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 
 /**
  * follow_pte - look up PTE at a user virtual address
- * @mm: the mm_struct of the target address space
+ * @vma: the memory mapping
  * @address: user virtual address
  * @ptepp: location to store found PTE
  * @ptlp: location to store the lock for the PTE
@@ -5947,15 +5947,19 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
  *
  * Return: zero on success, -ve otherwise.
  */
-int follow_pte(struct mm_struct *mm, unsigned long address,
+int follow_pte(struct vm_area_struct *vma, unsigned long address,
 	       pte_t **ptepp, spinlock_t **ptlp)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *ptep;
 
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out;
+
 	pgd = pgd_offset(mm, address);
 	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
 		goto out;
@@ -6009,11 +6013,8 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 	int offset = offset_in_page(addr);
 	int ret = -EINVAL;
 
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
-		return -EINVAL;
-
 retry:
-	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+	if (follow_pte(vma, addr, &ptep, &ptl))
 		return -EINVAL;
 	pte = ptep_get(ptep);
 	pte_unmap_unlock(ptep, ptl);
@@ -6028,7 +6029,7 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 	if (!maddr)
 		return -ENOMEM;
 
-	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+	if (follow_pte(vma, addr, &ptep, &ptl))
 		goto out_unmap;
 
 	if (!pte_same(pte, ptep_get(ptep))) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb49c2a60200..f57dbacb8689 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2902,7 +2902,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	int r;
 
-	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+	r = follow_pte(vma, addr, &ptep, &ptl);
 	if (r) {
 		/*
 		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -2917,7 +2917,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		if (r)
 			return r;
 
-		r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+		r = follow_pte(vma, addr, &ptep, &ptl);
 		if (r)
 			return r;
 	}
-- 
2.44.0




* [PATCH v1 3/3] mm: follow_pte() improvements
  2024-04-10 15:55 [PATCH v1 0/3] mm: follow_pte() improvements and acrn follow_pte() fixes David Hildenbrand
  2024-04-10 15:55 ` [PATCH v1 1/3] drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() David Hildenbrand
  2024-04-10 15:55 ` [PATCH v1 2/3] mm: pass VMA instead of MM to follow_pte() David Hildenbrand
@ 2024-04-10 15:55 ` David Hildenbrand
  2 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-04-10 15:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, x86, linux-s390, kvm, David Hildenbrand, Andrew Morton,
	Yonghua Huang, Fei Li, Christoph Hellwig, Gerald Schaefer,
	Heiko Carstens, Ingo Molnar, Alex Williamson, Paolo Bonzini

follow_pte() is now our main function to look up PTEs in VM_PFNMAP/VM_IO
VMAs. Let's perform some more sanity checks to make this exported function
harder to abuse.

Further, extend the doc a bit: it still focuses on the KVM use case with
MMU notifiers. Drop the KVM+follow_pfn() comment; follow_pfn() is no more,
and we have other users nowadays.

Also extend the doc regarding refcounted pages and the interaction with MMU
notifiers.

KVM is one example that uses MMU notifiers and can deal with refcounted
pages properly. VFIO is one example that doesn't use MMU notifiers, and
to prevent use-after-free, rejects refcounted pages:
pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)). Protection changes are
less of a concern for users like VFIO: the behavior is similar to
longterm-pinning a page and then having the PTE protection changed
afterwards.
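
Spelled out, that rejection is simply (sketch):

	/*
	 * Without MMU notifiers, refcounted ("normal") pages are unsafe
	 * to use once the PTL is dropped; reject them.
	 */
	if (pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
		return -EFAULT;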

The primary concern with refcounted pages is use-after-free, which
callers should be aware of.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ab01fb69dc72..535ef2686f95 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5935,15 +5935,21 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
  *
  * On a successful return, the pointer to the PTE is stored in @ptepp;
  * the corresponding lock is taken and its location is stored in @ptlp.
- * The contents of the PTE are only stable until @ptlp is released;
- * any further use, if any, must be protected against invalidation
- * with MMU notifiers.
+ *
+ * The contents of the PTE are only stable until @ptlp is released using
+ * pte_unmap_unlock(). This function will fail if the PTE is non-present.
+ * Present PTEs may include PTEs that map refcounted pages, such as
+ * anonymous folios in COW mappings.
+ *
+ * Callers must be careful when relying on PTE content after
+ * pte_unmap_unlock(). Especially if the PTE maps a refcounted page,
+ * callers must protect against invalidation with MMU notifiers; otherwise
+ * access to the PFN at a later point in time can trigger use-after-free.
  *
  * Only IO mappings and raw PFN mappings are allowed.  The mmap semaphore
  * should be taken for read.
  *
- * KVM uses this function.  While it is arguably less bad than the historic
- * ``follow_pfn``, it is not a good general-purpose API.
+ * This function must not be used to modify PTE content.
  *
  * Return: zero on success, -ve otherwise.
  */
@@ -5957,6 +5963,10 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
 	pmd_t *pmd;
 	pte_t *ptep;
 
+	mmap_assert_locked(mm);
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+		goto out;
+
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
 		goto out;
 
-- 
2.44.0




* Re: [PATCH v1 2/3] mm: pass VMA instead of MM to follow_pte()
  2024-04-10 15:55 ` [PATCH v1 2/3] mm: pass VMA instead of MM to follow_pte() David Hildenbrand
@ 2024-04-10 18:08   ` Sean Christopherson
  0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2024-04-10 18:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, x86, linux-s390, kvm, Andrew Morton,
	Yonghua Huang, Fei Li, Christoph Hellwig, Gerald Schaefer,
	Heiko Carstens, Ingo Molnar, Alex Williamson, Paolo Bonzini

On Wed, Apr 10, 2024, David Hildenbrand wrote:
> ... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll
> now also perform these sanity checks for direct follow_pte()
> invocations.

Nice!

> For generic_access_phys(), we might now check multiple times: nothing to
> worry about, really.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

For KVM, a very hearty

Acked-by: Sean Christopherson <seanjc@google.com>



* Re: [PATCH v1 1/3] drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map()
  2024-04-10 15:55 ` [PATCH v1 1/3] drivers/virt/acrn: fix PFNMAP PTE checks in acrn_vm_ram_map() David Hildenbrand
@ 2024-04-10 20:12   ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2024-04-10 20:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, x86, linux-s390, kvm, Yonghua Huang,
	Fei Li, Christoph Hellwig, Gerald Schaefer, Heiko Carstens,
	Ingo Molnar, Alex Williamson, Paolo Bonzini

On Wed, 10 Apr 2024 17:55:25 +0200 David Hildenbrand <david@redhat.com> wrote:

> We currently fail to handle various cases, resulting in dangerous
> follow_pte() (previously follow_pfn()) usage.
> 
> (1) We're not checking PTE write permissions.
> 
> Maybe we should simply always require pte_write() like we do for
> pin_user_pages_fast(FOLL_WRITE)? Hard to tell, so let's check for
> ACRN_MEM_ACCESS_WRITE for now.
> 
> (2) We're not rejecting refcounted pages.
> 
> As we are not using MMU notifiers, messing with refcounted pages is
> dangerous and can result in use-after-free. Let's make sure to reject them.
> 
> (3) We are only looking at the first PTE of a bigger range.
> 
> We only look up a single PTE, but memmap->len may span a larger area.
> Let's loop over all involved PTEs and make sure the PFN range is
> actually contiguous. Reject everything else: it couldn't have worked
> either way, and it made us access PFNs we shouldn't be accessing.
> 

This all sounds rather nasty, and the maintainers of this driver may
choose to turn your fixes into something suitable for current mainline
and for -stable backporting.

If they choose to do this then please just go ahead.  Once such a
change appears in linux-next, the mm-unstable patch "virt: acrn: stop
using follow_pfn" will start generating rejects, which will be easy
enough to handle.  Or they may choose to incorporate that change at the
same time.  Here it is:


From: Christoph Hellwig <hch@lst.de>
Subject: virt: acrn: stop using follow_pfn
Date: Mon, 25 Mar 2024 07:45:40 +0800

Switch from follow_pfn to follow_pte so that we can get rid of follow_pfn.
Note that this doesn't fix any of the pre-existing raciness and lack of
permission checking in the code.

Link: https://lkml.kernel.org/r/20240324234542.2038726-1-hch@lst.de
Link: https://lkml.kernel.org/r/20240324234542.2038726-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fei Li <fei1.li@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/virt/acrn/mm.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/drivers/virt/acrn/mm.c~virt-acrn-stop-using-follow_pfn
+++ a/drivers/virt/acrn/mm.c
@@ -172,18 +172,24 @@ int acrn_vm_ram_map(struct acrn_vm *vm,
 	mmap_read_lock(current->mm);
 	vma = vma_lookup(current->mm, memmap->vma_base);
 	if (vma && ((vma->vm_flags & VM_PFNMAP) != 0)) {
+		spinlock_t *ptl;
+		pte_t *ptep;
+
 		if ((memmap->vma_base + memmap->len) > vma->vm_end) {
 			mmap_read_unlock(current->mm);
 			return -EINVAL;
 		}
 
-		ret = follow_pfn(vma, memmap->vma_base, &pfn);
-		mmap_read_unlock(current->mm);
+		ret = follow_pte(vma->vm_mm, memmap->vma_base, &ptep, &ptl);
 		if (ret < 0) {
+			mmap_read_unlock(current->mm);
 			dev_dbg(acrn_dev.this_device,
 				"Failed to lookup PFN at VMA:%pK.\n", (void *)memmap->vma_base);
 			return ret;
 		}
+		pfn = pte_pfn(ptep_get(ptep));
+		pte_unmap_unlock(ptep, ptl);
+		mmap_read_unlock(current->mm);
 
 		return acrn_mm_region_add(vm, memmap->user_vm_pa,
 			 PFN_PHYS(pfn), memmap->len,
_



