From: Jason Gunthorpe <jgg@ziepe.ca>
To: lizhe.67@bytedance.com
Cc: david@redhat.com, akpm@linux-foundation.org,
alex.williamson@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
peterx@redhat.com
Subject: Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()
Date: Wed, 18 Jun 2025 10:23:50 -0300 [thread overview]
Message-ID: <20250618132350.GN1376515@ziepe.ca> (raw)
In-Reply-To: <20250618121928.36287-1-lizhe.67@bytedance.com>
On Wed, Jun 18, 2025 at 08:19:28PM +0800, lizhe.67@bytedance.com wrote:
> On Wed, 18 Jun 2025 08:56:22 -0300, jgg@ziepe.ca wrote:
>
> > On Wed, Jun 18, 2025 at 01:52:37PM +0200, David Hildenbrand wrote:
> >
> > > I thought we also wanted to optimize out the
> > > is_invalid_reserved_pfn() check for each subpage of a folio.
>
> Yes, that is an important aspect of our optimization.
>
> > VFIO keeps a tracking structure for the ranges, you can record there
> > if a reserved PFN was ever placed into this range and skip the check
> > entirely.
> >
> > It would be very rare for reserved PFNs and non reserved will to be
> > mixed within the same range, userspace could cause this but nothing
> > should.
>
> Yes, but it seems we don't have a very straightforward interface to
> obtain the reserved attribute of this large range of pfns.
vfio_unmap_unpin() has the struct vfio_dma, you'd store the
indication there and pass it down.
It already builds the longest run of physical contiguity here:
for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
next = iommu_iova_to_phys(domain->domain, iova + len);
if (next != phys + len)
break;
}
And we pass down a physically contiguous range to
unmap_unpin_fast()/unmap_unpin_slow().
The only thing you need to do is to detect reserved in
vfio_unmap_unpin() optimized flag in the dma, and break up the above
loop if it crosses a reserved boundary.
If you have a reserved range then just directly call iommu_unmap and
forget about any page pinning.
Then in the page pinning side you use the range version.
Something very approximately like the below. But again, I would
implore you to just use iommufd that is already much better here.
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b597e..097b97c67e3f0d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -738,12 +738,13 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
long unlocked = 0, locked = 0;
long i;
+ /* The caller has already ensured the pfn range is not reserved */
+ unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
+ dma->prot & IOMMU_WRITE);
for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
- if (put_pfn(pfn++, dma->prot)) {
unlocked++;
if (vfio_find_vpfn(dma, iova))
locked++;
- }
}
if (do_accounting)
@@ -1082,6 +1083,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
while (iova < end) {
size_t unmapped, len;
phys_addr_t phys, next;
+ bool reserved = false;
phys = iommu_iova_to_phys(domain->domain, iova);
if (WARN_ON(!phys)) {
@@ -1089,6 +1091,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
continue;
}
+ if (dma->has_reserved)
+ reserved = is_invalid_reserved_pfn(phys >> PAGE_SHIFT);
+
/*
* To optimize for fewer iommu_unmap() calls, each of which
* may require hardware cache flushing, try to find the
@@ -1098,21 +1103,31 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
next = iommu_iova_to_phys(domain->domain, iova + len);
if (next != phys + len)
break;
+ if (dma->has_reserved &&
+ reserved != is_invalid_reserved_pfn(next >> PAGE_SHIFT))
+ break;
}
/*
* First, try to use fast unmap/unpin. In case of failure,
* switch to slow unmap/unpin path.
*/
- unmapped = unmap_unpin_fast(domain, dma, &iova, len, phys,
- &unlocked, &unmapped_region_list,
- &unmapped_region_cnt,
- &iotlb_gather);
- if (!unmapped) {
- unmapped = unmap_unpin_slow(domain, dma, &iova, len,
- phys, &unlocked);
- if (WARN_ON(!unmapped))
- break;
+ if (reserved) {
+ unmapped = iommu_unmap(domain->domain, iova, len);
+ *iova += unmapped;
+ } else {
+ unmapped = unmap_unpin_fast(domain, dma, &iova, len,
+ phys, &unlocked,
+ &unmapped_region_list,
+ &unmapped_region_cnt,
+ &iotlb_gather);
+ if (!unmapped) {
+ unmapped = unmap_unpin_slow(domain, dma, &iova,
+ len, phys,
+ &unlocked);
+ if (WARN_ON(!unmapped))
+ break;
+ }
}
}
next prev parent reply other threads:[~2025-06-18 13:23 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-17 4:18 [PATCH v4 0/3] optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-17 4:18 ` [PATCH v4 1/3] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
2025-06-17 4:18 ` [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked() lizhe.67
2025-06-17 7:35 ` David Hildenbrand
2025-06-17 13:42 ` Jason Gunthorpe
2025-06-17 13:45 ` David Hildenbrand
2025-06-17 13:58 ` David Hildenbrand
2025-06-17 14:04 ` David Hildenbrand
2025-06-17 15:22 ` Jason Gunthorpe
2025-06-18 6:28 ` lizhe.67
2025-06-18 8:20 ` David Hildenbrand
2025-06-18 11:36 ` Jason Gunthorpe
2025-06-18 11:40 ` David Hildenbrand
2025-06-18 11:42 ` David Hildenbrand
2025-06-18 11:46 ` Jason Gunthorpe
2025-06-18 11:52 ` David Hildenbrand
2025-06-18 11:56 ` Jason Gunthorpe
2025-06-18 12:19 ` lizhe.67
2025-06-18 13:23 ` Jason Gunthorpe [this message]
2025-06-19 9:05 ` lizhe.67
2025-06-19 12:35 ` Jason Gunthorpe
2025-06-19 12:49 ` lizhe.67
2025-06-17 4:18 ` [PATCH v4 3/3] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-17 7:43 ` David Hildenbrand
2025-06-17 9:21 ` [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked() lizhe.67
2025-06-17 9:27 ` David Hildenbrand
2025-06-17 9:47 ` [PATCH v4 3/3] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-17 9:49 ` David Hildenbrand
2025-06-17 12:42 ` lizhe.67
2025-06-17 13:47 ` David Hildenbrand
2025-06-18 6:11 ` lizhe.67
2025-06-18 7:22 ` lizhe.67
2025-06-18 8:54 ` David Hildenbrand
2025-06-18 9:39 ` lizhe.67
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250618132350.GN1376515@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=david@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizhe.67@bytedance.com \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox