Date: Fri, 11 Jul 2025 15:35:23 -0600
From: Alex Williamson
To: lizhe.67@bytedance.com
Cc: akpm@linux-foundation.org, david@redhat.com, jgg@ziepe.ca, peterx@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v4 2/5] vfio/type1: optimize vfio_pin_pages_remote()
Message-ID: <20250711153523.42d68ec0.alex.williamson@redhat.com>
In-Reply-To: <20250710085355.54208-3-lizhe.67@bytedance.com>
References: <20250710085355.54208-1-lizhe.67@bytedance.com> <20250710085355.54208-3-lizhe.67@bytedance.com>
Organization: Red Hat

On Thu, 10 Jul 2025 16:53:52 +0800
lizhe.67@bytedance.com wrote:

> From: Li Zhe
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
> Batch processing of statistical counting operations can effectively enhance
> performance.
>
> In addition, the pages obtained through longterm GUP are neither invalid
> nor reserved. Therefore, we can reduce the overhead associated with some
> calls to function is_invalid_reserved_pfn().
>
> The performance test results for completing the 16G VFIO IOMMU DMA mapping
> are as follows.
>
> Base(v6.16-rc4):
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.047 s (340.2 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.280 s (57.2 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.052 s (310.5 GB/s)
>
> With this patch:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.027 s (602.1 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.257 s (62.4 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.031 s (517.4 GB/s)
>
> For large folio, we achieve an over 40% performance improvement.
> For small folios, the performance test results indicate a
> slight improvement.
>
> Signed-off-by: Li Zhe
> Co-developed-by: Alex Williamson
> Signed-off-by: Alex Williamson
> Acked-by: David Hildenbrand
> ---
>  drivers/vfio/vfio_iommu_type1.c | 83 ++++++++++++++++++++++++++++-----
>  1 file changed, 71 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 1136d7ac6b59..6909275e46c2 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -318,7 +318,13 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
>  /*
>   * Helper Functions for host iova-pfn list
>   */
> -static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +
> +/*
> + * Find the highest vfio_pfn that overlapping the range
> + * [iova_start, iova_end) in rb tree.
> + */
> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
> +		dma_addr_t iova_start, dma_addr_t iova_end)
>  {
>  	struct vfio_pfn *vpfn;
>  	struct rb_node *node = dma->pfn_list.rb_node;
> @@ -326,9 +332,9 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
>  	while (node) {
>  		vpfn = rb_entry(node, struct vfio_pfn, node);
>
> -		if (iova < vpfn->iova)
> +		if (iova_end <= vpfn->iova)
>  			node = node->rb_left;
> -		else if (iova > vpfn->iova)
> +		else if (iova_start > vpfn->iova)
>  			node = node->rb_right;
>  		else
>  			return vpfn;
> @@ -336,6 +342,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
>  	return NULL;
>  }
>
> +static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +{
> +	return vfio_find_vpfn_range(dma, iova, iova + PAGE_SIZE);
> +}
> +
>  static void vfio_link_pfn(struct vfio_dma *dma,
>  			  struct vfio_pfn *new)
>  {
> @@ -614,6 +625,39 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
>  	return ret;
>  }
>
> +
> +static long vpfn_pages(struct vfio_dma *dma,
> +		       dma_addr_t iova_start, long nr_pages)
> +{
> +	dma_addr_t iova_end = iova_start + (nr_pages << PAGE_SHIFT);
> +	struct vfio_pfn *top = vfio_find_vpfn_range(dma, iova_start, iova_end);
> +	long ret = 1;
> +	struct vfio_pfn *vpfn;
> +	struct rb_node *prev;
> +	struct rb_node *next;
> +
> +	if (likely(!top))
> +		return 0;
> +
> +	prev = next = &top->node;
> +
> +	while ((prev = rb_prev(prev))) {
> +		vpfn = rb_entry(prev, struct vfio_pfn, node);
> +		if (vpfn->iova < iova_start)
> +			break;
> +		ret++;
> +	}
> +
> +	while ((next = rb_next(next))) {
> +		vpfn = rb_entry(next, struct vfio_pfn, node);
> +		if (vpfn->iova >= iova_end)
> +			break;
> +		ret++;
> +	}
> +
> +	return ret;
> +}
> +
>  /*
>   * Attempt to pin pages. We really don't want to track all the pfns and
>   * the iommu can only map chunks of consecutive pfns anyway, so get the
> @@ -680,32 +724,47 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
>  		 * and rsvd here, and therefore continues to use the batch.
>  		 */
>  		while (true) {
> +			long nr_pages, acct_pages = 0;
> +
>  			if (pfn != *pfn_base + pinned ||
>  			    rsvd != is_invalid_reserved_pfn(pfn))
>  				goto out;
>
> +			/*
> +			 * Using GUP with the FOLL_LONGTERM in
> +			 * vaddr_get_pfns() will not return invalid
> +			 * or reserved pages.
> +			 */
> +			nr_pages = num_pages_contiguous(
> +					&batch->pages[batch->offset],
> +					batch->size);
> +			if (!rsvd) {
> +				acct_pages = nr_pages;
> +				acct_pages -= vpfn_pages(dma, iova, nr_pages);
> +			}
> +
>  			/*
>  			 * Reserved pages aren't counted against the user,
>  			 * externally pinned pages are already counted against
>  			 * the user.
>  			 */
> -			if (!rsvd && !vfio_find_vpfn(dma, iova)) {
> +			if (acct_pages) {
>  				if (!dma->lock_cap &&
> -				    mm->locked_vm + lock_acct + 1 > limit) {
> +				    mm->locked_vm + lock_acct + acct_pages > limit) {

Don't resend; I'll fix it on commit, but there's still a gratuitous
difference in leading whitespace from the original line. Otherwise the
series looks good to me, but I'll give Jason a little more time to
provide reviews since he's been so active in the thread (though he'd
rather we just use iommufd ;). Thanks,

Alex

>  					pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
>  						__func__, limit << PAGE_SHIFT);
>  					ret = -ENOMEM;
>  					goto unpin_out;
>  				}
> -				lock_acct++;
> +				lock_acct += acct_pages;
>  			}
>
> -			pinned++;
> -			npage--;
> -			vaddr += PAGE_SIZE;
> -			iova += PAGE_SIZE;
> -			batch->offset++;
> -			batch->size--;
> +			pinned += nr_pages;
> +			npage -= nr_pages;
> +			vaddr += PAGE_SIZE * nr_pages;
> +			iova += PAGE_SIZE * nr_pages;
> +			batch->offset += nr_pages;
> +			batch->size -= nr_pages;
>
>  			if (!batch->size)
>  				break;