From: lizhe.67@bytedance.com
To: alex.williamson@redhat.com, akpm@linux-foundation.org, david@redhat.com, jgg@ziepe.ca, peterx@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lizhe.67@bytedance.com
Subject: [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote()
Date: Mon, 7 Jul 2025 14:49:47 +0800
Message-ID: <20250707064950.72048-3-lizhe.67@bytedance.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20250707064950.72048-1-lizhe.67@bytedance.com>
References: <20250707064950.72048-1-lizhe.67@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Li Zhe

When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the
function currently performs individual statistics counting operations for
each page. This can lead to significant performance overhead, especially
when dealing with large ranges of pages. Batching the statistics counting
operations can effectively improve performance.

In addition, the pages obtained through longterm GUP are neither invalid
nor reserved. Therefore, we can reduce the overhead associated with some
calls to is_invalid_reserved_pfn().

The performance test results for completing the 16G VFIO IOMMU DMA mapping
are as follows.

Base(v6.16-rc4):
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)

With this patch:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (602.1 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.257 s (62.4 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (517.4 GB/s)

For large folios, we achieve a performance improvement of over 40%. For
small folios, the performance test results indicate a slight improvement.

Signed-off-by: Li Zhe
Co-developed-by: Alex Williamson
Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio_iommu_type1.c | 83 ++++++++++++++++++++++++++++-----
 1 file changed, 71 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b59..03fce54e1372 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -318,7 +318,13 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
 /*
  * Helper Functions for host iova-pfn list
  */
-static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+
+/*
+ * Find the highest vfio_pfn that overlaps the range
+ * [iova_start, iova_end) in the rb tree.
+ */
+static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
+		dma_addr_t iova_start, dma_addr_t iova_end)
 {
 	struct vfio_pfn *vpfn;
 	struct rb_node *node = dma->pfn_list.rb_node;
@@ -326,9 +332,9 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
 	while (node) {
 		vpfn = rb_entry(node, struct vfio_pfn, node);
 
-		if (iova < vpfn->iova)
+		if (iova_end <= vpfn->iova)
 			node = node->rb_left;
-		else if (iova > vpfn->iova)
+		else if (iova_start > vpfn->iova)
 			node = node->rb_right;
 		else
 			return vpfn;
@@ -336,6 +342,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
 	return NULL;
 }
 
+static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+{
+	return vfio_find_vpfn_range(dma, iova, iova + PAGE_SIZE);
+}
+
 static void vfio_link_pfn(struct vfio_dma *dma,
 			  struct vfio_pfn *new)
 {
@@ -614,6 +625,39 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	return ret;
 }
 
+
+static long vpfn_pages(struct vfio_dma *dma,
+		dma_addr_t iova_start, long nr_pages)
+{
+	dma_addr_t iova_end = iova_start + (nr_pages << PAGE_SHIFT);
+	struct vfio_pfn *top = vfio_find_vpfn_range(dma, iova_start, iova_end);
+	long ret = 1;
+	struct vfio_pfn *vpfn;
+	struct rb_node *prev;
+	struct rb_node *next;
+
+	if (likely(!top))
+		return 0;
+
+	prev = next = &top->node;
+
+	while ((prev = rb_prev(prev))) {
+		vpfn = rb_entry(prev, struct vfio_pfn, node);
+		if (vpfn->iova < iova_start)
+			break;
+		ret++;
+	}
+
+	while ((next = rb_next(next))) {
+		vpfn = rb_entry(next, struct vfio_pfn, node);
+		if (vpfn->iova >= iova_end)
+			break;
+		ret++;
+	}
+
+	return ret;
+}
+
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -680,32 +724,47 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 		 * and rsvd here, and therefore continues to use the batch.
 		 */
 		while (true) {
+			long nr_pages, acct_pages = 0;
+
 			if (pfn != *pfn_base + pinned ||
 			    rsvd != is_invalid_reserved_pfn(pfn))
 				goto out;
 
+			/*
+			 * Using GUP with the FOLL_LONGTERM in
+			 * vaddr_get_pfns() will not return invalid
+			 * or reserved pages.
+			 */
+			nr_pages = num_pages_contiguous(
+					&batch->pages[batch->offset],
+					batch->size);
+			if (!rsvd) {
+				acct_pages = nr_pages;
+				acct_pages -= vpfn_pages(dma, iova, nr_pages);
+			}
+
 			/*
 			 * Reserved pages aren't counted against the user,
 			 * externally pinned pages are already counted against
 			 * the user.
 			 */
-			if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+			if (acct_pages) {
 				if (!dma->lock_cap &&
-				    mm->locked_vm + lock_acct + 1 > limit) {
+				    mm->locked_vm + lock_acct + acct_pages > limit) {
 					pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
 						__func__, limit << PAGE_SHIFT);
 					ret = -ENOMEM;
 					goto unpin_out;
 				}
-				lock_acct++;
+				lock_acct += acct_pages;
 			}
 
-			pinned++;
-			npage--;
-			vaddr += PAGE_SIZE;
-			iova += PAGE_SIZE;
-			batch->offset++;
-			batch->size--;
+			pinned += nr_pages;
+			npage -= nr_pages;
+			vaddr += PAGE_SIZE * nr_pages;
+			iova += PAGE_SIZE * nr_pages;
+			batch->offset += nr_pages;
+			batch->size -= nr_pages;
 
 			if (!batch->size)
 				break;
-- 
2.20.1
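
Note for readers: the batched accounting in the last hunk can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the kernel code. It assumes 4 KiB pages and replaces the vfio_pfn rb tree with a sorted array of page-aligned iovas. The names SKETCH_PAGE_SHIFT, count_tracked_in_range() and pages_to_account() are hypothetical and exist only for this illustration. The idea it shows is the one the patch relies on: for a run of nr_pages contiguous pages, charge nr_pages minus the number of pages in that iova range that are already tracked as externally pinned, and charge nothing for a reserved run.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical stand-in for the vfio_pfn rb tree: "tracked" is a sorted
 * array of page-aligned iovas that were already pinned (and accounted)
 * through the external pinning path. Returns how many of them fall in
 * [iova_start, iova_start + nr_pages pages).
 */
static long count_tracked_in_range(const unsigned long *tracked, size_t n,
				   unsigned long iova_start, long nr_pages)
{
	unsigned long iova_end = iova_start +
		((unsigned long)nr_pages << SKETCH_PAGE_SHIFT);
	long hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tracked[i] >= iova_end)
			break;	/* sorted input: nothing further can overlap */
		if (tracked[i] >= iova_start)
			hits++;
	}

	return hits;
}

/*
 * Number of pages in a contiguous run that still have to be charged
 * against the locked-memory limit. A reserved run is never charged;
 * pages already tracked as externally pinned were charged earlier.
 */
static long pages_to_account(const unsigned long *tracked, size_t n,
			     unsigned long iova, long nr_pages, int rsvd)
{
	assert(nr_pages > 0);

	if (rsvd)
		return 0;

	return nr_pages - count_tracked_in_range(tracked, n, iova, nr_pages);
}
```

For example, with tracked iovas 0x1000, 0x3000 and 0x9000, a four-page run starting at iova 0 would be charged for two pages in a single step rather than through four separate lookups.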