Date: Wed, 18 Jun 2025 10:23:50 -0300
From: Jason Gunthorpe
To: lizhe.67@bytedance.com
Cc: david@redhat.com, akpm@linux-foundation.org, alex.williamson@redhat.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterx@redhat.com
Subject: Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()
Message-ID: <20250618132350.GN1376515@ziepe.ca>
References: <20250618115622.GM1376515@ziepe.ca>
 <20250618121928.36287-1-lizhe.67@bytedance.com>
In-Reply-To: <20250618121928.36287-1-lizhe.67@bytedance.com>

On Wed, Jun 18, 2025 at 08:19:28PM +0800, lizhe.67@bytedance.com wrote:
> On Wed, 18 Jun 2025 08:56:22 -0300, jgg@ziepe.ca wrote:
>
> > On Wed, Jun 18, 2025 at 01:52:37PM +0200, David Hildenbrand wrote:
> >
> > > I thought we also wanted to optimize out the
> > > is_invalid_reserved_pfn() check for each subpage of a folio.
>
> Yes, that is an important aspect of our optimization.
>
> > VFIO keeps a tracking structure for the ranges, you can record there
> > if a reserved PFN was ever placed into this range and skip the check
> > entirely.
> >
> > It would be very rare for reserved PFNs and non reserved will to be
> > mixed within the same range, userspace could cause this but nothing
> > should.
>
> Yes, but it seems we don't have a very straightforward interface to
> obtain the reserved attribute of this large range of pfns.

vfio_unmap_unpin() has the struct vfio_dma, you'd store the indication
there and pass it down.

It already builds the longest run of physical contiguity here:

		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
			next = iommu_iova_to_phys(domain->domain, iova + len);
			if (next != phys + len)
				break;
		}

And we pass down a physically contiguous range to
unmap_unpin_fast()/unmap_unpin_slow().

The only thing you need to do is to detect reserved PFNs in
vfio_unmap_unpin(), using a flag in the dma as the optimization so the
check is skipped when no reserved PFN was ever recorded, and break up
the above loop if it crosses a reserved boundary.

If you have a reserved range then just directly call iommu_unmap() and
forget about any page pinning.

Then on the page pinning side you use the range version.

Something very approximately like the below. But again, I would
implore you to just use iommufd, which is already much better here.

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b597e..097b97c67e3f0d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -738,12 +738,13 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 	long unlocked = 0, locked = 0;
 	long i;
 
+	/* The caller has already ensured the pfn range is not reserved */
+	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
+					 dma->prot & IOMMU_WRITE);
 	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
-		if (put_pfn(pfn++, dma->prot)) {
 			unlocked++;
 			if (vfio_find_vpfn(dma, iova))
 				locked++;
-		}
 	}
 
 	if (do_accounting)
@@ -1082,6 +1083,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	while (iova < end) {
 		size_t unmapped, len;
 		phys_addr_t phys, next;
+		bool reserved = false;
 
 		phys = iommu_iova_to_phys(domain->domain, iova);
 		if (WARN_ON(!phys)) {
@@ -1089,6 +1091,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			continue;
 		}
 
+		if (dma->has_reserved)
+			reserved = is_invalid_reserved_pfn(phys >> PAGE_SHIFT);
+
 		/*
 		 * To optimize for fewer iommu_unmap() calls, each of which
 		 * may require hardware cache flushing, try to find the
@@ -1098,21 +1103,31 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			next = iommu_iova_to_phys(domain->domain, iova + len);
 			if (next != phys + len)
 				break;
+			if (dma->has_reserved &&
+			    reserved != is_invalid_reserved_pfn(next >> PAGE_SHIFT))
+				break;
 		}
 
 		/*
 		 * First, try to use fast unmap/unpin. In case of failure,
 		 * switch to slow unmap/unpin path.
 		 */
-		unmapped = unmap_unpin_fast(domain, dma, &iova, len, phys,
-					    &unlocked, &unmapped_region_list,
-					    &unmapped_region_cnt,
-					    &iotlb_gather);
-		if (!unmapped) {
-			unmapped = unmap_unpin_slow(domain, dma, &iova, len,
-						    phys, &unlocked);
-			if (WARN_ON(!unmapped))
-				break;
+		if (reserved) {
+			unmapped = iommu_unmap(domain->domain, iova, len);
+			iova += unmapped;
+		} else {
+			unmapped = unmap_unpin_fast(domain, dma, &iova, len,
+						    phys, &unlocked,
+						    &unmapped_region_list,
+						    &unmapped_region_cnt,
+						    &iotlb_gather);
+			if (!unmapped) {
+				unmapped = unmap_unpin_slow(domain, dma, &iova,
+							    len, phys,
+							    &unlocked);
+				if (WARN_ON(!unmapped))
+					break;
+			}
 		}
 	}
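
To make the run-splitting rule easier to reason about outside the
kernel, here is a rough user-space model of the loop in the
vfio_unmap_unpin() hunk above. It is only a sketch under assumptions:
iova_to_phys(), pfn_is_reserved(), struct run and split_runs() are
hypothetical stand-ins invented for illustration (they are not kernel
APIs), and the page table contents are made up. The has_reserved
argument plays the role of the dma->has_reserved flag from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* Hypothetical stand-in for iommu_iova_to_phys(): one entry per IOVA page. */
static const uint64_t phys_of_page[] = {
	0x100000, 0x101000, 0x102000,	/* contiguous, ordinary memory */
	0x103000, 0x104000,		/* still contiguous, but "reserved" */
	0x200000,			/* discontiguous, ordinary memory */
};
#define NPAGES (sizeof(phys_of_page) / sizeof(phys_of_page[0]))

static uint64_t iova_to_phys(uint64_t iova)
{
	return phys_of_page[iova >> PAGE_SHIFT];
}

/* Hypothetical stand-in for is_invalid_reserved_pfn(). */
static bool pfn_is_reserved(uint64_t pfn)
{
	return pfn >= 0x103 && pfn <= 0x104;
}

struct run {
	uint64_t iova;
	uint64_t len;
	bool reserved;
};

/*
 * Split [iova, end) into runs that are physically contiguous and, when
 * has_reserved is set, never cross a reserved/non-reserved boundary.
 * Returns the number of runs written to out (at most max).
 */
static size_t split_runs(uint64_t iova, uint64_t end, bool has_reserved,
			 struct run *out, size_t max)
{
	size_t n = 0;

	assert((iova & (PAGE_SIZE - 1)) == 0);

	while (iova < end && n < max) {
		uint64_t phys = iova_to_phys(iova);
		uint64_t len;
		bool reserved = false;

		/* Only pay for the reserved check if the flag was ever set. */
		if (has_reserved)
			reserved = pfn_is_reserved(phys >> PAGE_SHIFT);

		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
			uint64_t next = iova_to_phys(iova + len);

			if (next != phys + len)
				break;
			if (has_reserved &&
			    reserved != pfn_is_reserved(next >> PAGE_SHIFT))
				break;
		}

		out[n++] = (struct run){ iova, len, reserved };
		iova += len;
	}
	return n;
}
```

With the made-up table above and has_reserved set, the six pages split
into three runs: three ordinary pages, two reserved pages, then the
one discontiguous page. With has_reserved clear the reserved check is
skipped entirely and the first five pages form a single run, which is
the common case the flag is meant to keep cheap.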