linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: lizhe.67@bytedance.com
Cc: david@redhat.com, akpm@linux-foundation.org,
	alex.williamson@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterx@redhat.com
Subject: Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()
Date: Wed, 18 Jun 2025 10:23:50 -0300	[thread overview]
Message-ID: <20250618132350.GN1376515@ziepe.ca> (raw)
In-Reply-To: <20250618121928.36287-1-lizhe.67@bytedance.com>

On Wed, Jun 18, 2025 at 08:19:28PM +0800, lizhe.67@bytedance.com wrote:
> On Wed, 18 Jun 2025 08:56:22 -0300, jgg@ziepe.ca wrote:
>  
> > On Wed, Jun 18, 2025 at 01:52:37PM +0200, David Hildenbrand wrote:
> > 
> > > I thought we also wanted to optimize out the
> > > is_invalid_reserved_pfn() check for each subpage of a folio.
> 
> Yes, that is an important aspect of our optimization.
> 
> > VFIO keeps a tracking structure for the ranges, you can record there
> > if a reserved PFN was ever placed into this range and skip the check
> > entirely.
> > 
> > It would be very rare for reserved PFNs and non reserved will to be
> > mixed within the same range, userspace could cause this but nothing
> > should.
> 
> Yes, but it seems we don't have a very straightforward interface to
> obtain the reserved attribute of this large range of pfns.

vfio_unmap_unpin()  has the struct vfio_dma, you'd store the
indication there and pass it down.

It already builds the longest run of physical contiguity here:

		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
			next = iommu_iova_to_phys(domain->domain, iova + len);
			if (next != phys + len)
				break;
		}

And we pass down a physically contiguous range to
unmap_unpin_fast()/unmap_unpin_slow().

The only thing you need to do is to detect reserved in
vfio_unmap_unpin() optimized flag in the dma, and break up the above
loop if it crosses a reserved boundary.

If you have a reserved range then just directly call iommu_unmap and
forget about any page pinning.

Then in the page pinning side you use the range version.

Something very approximately like the below. But again, I would
implore you to just use iommufd that is already much better here.

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b597e..097b97c67e3f0d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -738,12 +738,13 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 	long unlocked = 0, locked = 0;
 	long i;
 
+	/* The caller has already ensured the pfn range is not reserved */
+	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
+					 dma->prot & IOMMU_WRITE);
 	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
-		if (put_pfn(pfn++, dma->prot)) {
 			unlocked++;
 			if (vfio_find_vpfn(dma, iova))
 				locked++;
-		}
 	}
 
 	if (do_accounting)
@@ -1082,6 +1083,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	while (iova < end) {
 		size_t unmapped, len;
 		phys_addr_t phys, next;
+		bool reserved = false;
 
 		phys = iommu_iova_to_phys(domain->domain, iova);
 		if (WARN_ON(!phys)) {
@@ -1089,6 +1091,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			continue;
 		}
 
+		if (dma->has_reserved)
+			reserved = is_invalid_reserved_pfn(phys >> PAGE_SHIFT);
+
 		/*
 		 * To optimize for fewer iommu_unmap() calls, each of which
 		 * may require hardware cache flushing, try to find the
@@ -1098,21 +1103,31 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			next = iommu_iova_to_phys(domain->domain, iova + len);
 			if (next != phys + len)
 				break;
+			if (dma->has_reserved &&
+			    reserved != is_invalid_reserved_pfn(next >> PAGE_SHIFT))
+				break;
 		}
 
 		/*
 		 * First, try to use fast unmap/unpin. In case of failure,
 		 * switch to slow unmap/unpin path.
 		 */
-		unmapped = unmap_unpin_fast(domain, dma, &iova, len, phys,
-					    &unlocked, &unmapped_region_list,
-					    &unmapped_region_cnt,
-					    &iotlb_gather);
-		if (!unmapped) {
-			unmapped = unmap_unpin_slow(domain, dma, &iova, len,
-						    phys, &unlocked);
-			if (WARN_ON(!unmapped))
-				break;
+		if (reserved) {
+			unmapped = iommu_unmap(domain->domain, iova, len);
+			*iova += unmapped;
+		} else {
+			unmapped = unmap_unpin_fast(domain, dma, &iova, len,
+						    phys, &unlocked,
+						    &unmapped_region_list,
+						    &unmapped_region_cnt,
+						    &iotlb_gather);
+			if (!unmapped) {
+				unmapped = unmap_unpin_slow(domain, dma, &iova,
+							    len, phys,
+							    &unlocked);
+				if (WARN_ON(!unmapped))
+					break;
+			}
 		}
 	}
 


  reply	other threads:[~2025-06-18 13:23 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-17  4:18 [PATCH v4 0/3] optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-17  4:18 ` [PATCH v4 1/3] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
2025-06-17  4:18 ` [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked() lizhe.67
2025-06-17  7:35   ` David Hildenbrand
2025-06-17 13:42   ` Jason Gunthorpe
2025-06-17 13:45     ` David Hildenbrand
2025-06-17 13:58       ` David Hildenbrand
2025-06-17 14:04         ` David Hildenbrand
2025-06-17 15:22           ` Jason Gunthorpe
2025-06-18  6:28             ` lizhe.67
2025-06-18  8:20               ` David Hildenbrand
2025-06-18 11:36               ` Jason Gunthorpe
2025-06-18 11:40                 ` David Hildenbrand
2025-06-18 11:42                   ` David Hildenbrand
2025-06-18 11:46                     ` Jason Gunthorpe
2025-06-18 11:52                       ` David Hildenbrand
2025-06-18 11:56                         ` Jason Gunthorpe
2025-06-18 12:19                           ` lizhe.67
2025-06-18 13:23                             ` Jason Gunthorpe [this message]
2025-06-19  9:05                               ` lizhe.67
2025-06-19 12:35                                 ` Jason Gunthorpe
2025-06-19 12:49                                   ` lizhe.67
2025-06-17  4:18 ` [PATCH v4 3/3] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-17  7:43   ` David Hildenbrand
2025-06-17  9:21     ` [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked() lizhe.67
2025-06-17  9:27       ` David Hildenbrand
2025-06-17  9:47         ` [PATCH v4 3/3] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-17  9:49           ` David Hildenbrand
2025-06-17 12:42             ` lizhe.67
2025-06-17 13:47               ` David Hildenbrand
2025-06-18  6:11                 ` lizhe.67
2025-06-18  7:22                   ` lizhe.67
2025-06-18  8:54                   ` David Hildenbrand
2025-06-18  9:39                     ` lizhe.67

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250618132350.GN1376515@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=david@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizhe.67@bytedance.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox