From: lizhe.67@bytedance.com
To: jgg@ziepe.ca, david@redhat.com
Cc: akpm@linux-foundation.org, alex.williamson@redhat.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, lizhe.67@bytedance.com, peterx@redhat.com
Subject: Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()
Date: Thu, 19 Jun 2025 17:05:42 +0800	[thread overview]
Message-ID: <20250619090542.29974-1-lizhe.67@bytedance.com> (raw)
In-Reply-To: <20250618132350.GN1376515@ziepe.ca>

On Wed, 18 Jun 2025 10:23:50 -0300, jgg@ziepe.ca wrote:
 
> On Wed, Jun 18, 2025 at 08:19:28PM +0800, lizhe.67@bytedance.com wrote:
> > On Wed, 18 Jun 2025 08:56:22 -0300, jgg@ziepe.ca wrote:
> >  
> > > On Wed, Jun 18, 2025 at 01:52:37PM +0200, David Hildenbrand wrote:
> > > 
> > > > I thought we also wanted to optimize out the
> > > > is_invalid_reserved_pfn() check for each subpage of a folio.
> > 
> > Yes, that is an important aspect of our optimization.
> > 
> > > VFIO keeps a tracking structure for the ranges, you can record there
> > > if a reserved PFN was ever placed into this range and skip the check
> > > entirely.
> > > 
> > > It would be very rare for reserved PFNs and non-reserved PFNs to be
> > > mixed within the same range; userspace could cause this, but nothing
> > > should.
> > 
> > Yes, but it seems we don't have a very straightforward interface to
> > obtain the reserved attribute of this large range of pfns.
> 
> vfio_unmap_unpin() has the struct vfio_dma; you'd store the
> indication there and pass it down.
> 
> It already builds the longest run of physical contiguity here:
> 
> 		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
> 			next = iommu_iova_to_phys(domain->domain, iova + len);
> 			if (next != phys + len)
> 				break;
> 		}
> 
> And we pass down a physically contiguous range to
> unmap_unpin_fast()/unmap_unpin_slow().
> 
> The only thing you need to do is detect reserved PFNs in
> vfio_unmap_unpin() via an optimized flag stored in the dma, and break
> up the above loop if it crosses a reserved boundary.
> 
> If you have a reserved range then just directly call iommu_unmap and
> forget about any page pinning.
> 
> Then in the page pinning side you use the range version.
> 
> Something very approximately like the below. But again, I would
> implore you to just use iommufd that is already much better here.

Thank you for your suggestion. We are also working on migrating to
iommufd, but it is not something that can be completed in a short
time. In the near term, we still expect to use the type1 method.

> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 1136d7ac6b597e..097b97c67e3f0d 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -738,12 +738,13 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
>  	long unlocked = 0, locked = 0;
>  	long i;
>  
> +	/* The caller has already ensured the pfn range is not reserved */
> +	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
> +					 dma->prot & IOMMU_WRITE);
>  	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
> -		if (put_pfn(pfn++, dma->prot)) {
> -			unlocked++;
> -			if (vfio_find_vpfn(dma, iova))
> -				locked++;
> -		}
> +		unlocked++;
> +		if (vfio_find_vpfn(dma, iova))
> +			locked++;
>  	}
>  
>  	if (do_accounting)
> @@ -1082,6 +1083,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>  	while (iova < end) {
>  		size_t unmapped, len;
>  		phys_addr_t phys, next;
> +		bool reserved = false;
>  
>  		phys = iommu_iova_to_phys(domain->domain, iova);
>  		if (WARN_ON(!phys)) {
> @@ -1089,6 +1091,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>  			continue;
>  		}
>  
> +		if (dma->has_reserved)
> +			reserved = is_invalid_reserved_pfn(phys >> PAGE_SHIFT);
> +
>  		/*
>  		 * To optimize for fewer iommu_unmap() calls, each of which
>  		 * may require hardware cache flushing, try to find the
> @@ -1098,21 +1103,31 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>  			next = iommu_iova_to_phys(domain->domain, iova + len);
>  			if (next != phys + len)
>  				break;
> +			if (dma->has_reserved &&
> +			    reserved != is_invalid_reserved_pfn(next >> PAGE_SHIFT))
> +				break;
>  		}
>  
>  		/*
>  		 * First, try to use fast unmap/unpin. In case of failure,
>  		 * switch to slow unmap/unpin path.
>  		 */
> -		unmapped = unmap_unpin_fast(domain, dma, &iova, len, phys,
> -					    &unlocked, &unmapped_region_list,
> -					    &unmapped_region_cnt,
> -					    &iotlb_gather);
> -		if (!unmapped) {
> -			unmapped = unmap_unpin_slow(domain, dma, &iova, len,
> -						    phys, &unlocked);
> -			if (WARN_ON(!unmapped))
> -				break;
> +		if (reserved) {
> +			unmapped = iommu_unmap(domain->domain, iova, len);
> +			iova += unmapped;
> +		} else {
> +			unmapped = unmap_unpin_fast(domain, dma, &iova, len,
> +						    phys, &unlocked,
> +						    &unmapped_region_list,
> +						    &unmapped_region_cnt,
> +						    &iotlb_gather);
> +			if (!unmapped) {
> +				unmapped = unmap_unpin_slow(domain, dma, &iova,
> +							    len, phys,
> +							    &unlocked);
> +				if (WARN_ON(!unmapped))
> +					break;
> +			}
>  		}
>  	}

As I understand it, there are a couple of issues with this
implementation. How do we obtain the value of dma->has_reserved
(setting it within vfio_pin_pages_remote() might be a good option),
and how do we ensure that this value remains unchanged from the time
of assignment until we perform the unpin operation? I've searched
through the code, and it appears that there are instances where
SetPageReserved() is called outside of the initialization phase.
Please correct me if I am wrong.

Thanks,
Zhe

