From: lizhe.67@bytedance.com
To: alex.williamson@redhat.com, akpm@linux-foundation.org, david@redhat.com, jgg@ziepe.ca, peterx@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lizhe.67@bytedance.com
Subject: [PATCH v4 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()
Date: Thu, 10 Jul 2025 16:53:50 +0800
Message-ID: <20250710085355.54208-1-lizhe.67@bytedance.com>

From: Li Zhe

This patchset is an integration of the two previous patchsets[1][2].

When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to
significant performance overhead, especially when dealing with large
ranges of pages.

The function vfio_unpin_pages_remote() has a similar issue: executing
put_pfn() for each pfn incurs considerable overhead.

This patchset primarily optimizes the performance of the relevant
functions by batching the per-page operations mentioned above. The
first two patches optimize the performance of the function
vfio_pin_pages_remote(), while the remaining patches optimize the
performance of the function vfio_unpin_pages_remote().

The performance test results, based on v6.16-rc4, for completing the
16G VFIO MAP/UNMAP DMA, obtained through unit test[3] with slight
modifications[4], are as follows.

Base(6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)

With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (600.7 GB/s)
VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.261 s (61.4 GB/s)
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (516.4 GB/s)
VFIO UNMAP DMA in 0.045 s (353.9 GB/s)

For large folios, we achieve an over 40% performance improvement for
VFIO MAP DMA and an over 66% performance improvement for VFIO UNMAP
DMA. For small folios, the performance test results show a slight
improvement over the performance before optimization.

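To illustrate the batching idea described above, here is a minimal
userspace sketch. It is not the code in this series: the names
contiguous_run() and account_batched() are hypothetical, a plain array
of pfns stands in for the pinned pages, and the "accounting step" is
only a counter. It shows how a run of consecutive pfns can be handled
with one update instead of one update per page.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not the kernel code from this series:
 * return how many entries at the start of pfns[] form a run of
 * consecutive page frame numbers (at least 1 when n > 0).
 */
static size_t contiguous_run(const unsigned long *pfns, size_t n)
{
	size_t i;

	assert(pfns != NULL || n == 0);
	if (n == 0)
		return 0;
	for (i = 1; i < n; i++)
		if (pfns[i] != pfns[0] + i)
			break;
	return i;
}

/*
 * Account for n pages one run at a time. *updates counts how often the
 * (assumed expensive) accounting step ran; the return value is the
 * total number of pages accounted.
 */
static size_t account_batched(const unsigned long *pfns, size_t n,
			      size_t *updates)
{
	size_t done = 0, total = 0;

	*updates = 0;
	while (done < n) {
		size_t run = contiguous_run(pfns + done, n - done);

		total += run;	/* one update covers the whole run */
		(*updates)++;
		done += run;
	}
	return total;
}
```

In this model, the fewer and longer the contiguous runs, the fewer
accounting updates are needed, which is consistent with the larger
gains reported for large folios than for small folios.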
[1]: https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/
[2]: https://lore.kernel.org/all/20250620032344.13382-1-lizhe.67@bytedance.com/#t
[3]: https://github.com/awilliam/tests/blob/vfio-pci-mem-dma-map/vfio-pci-mem-dma-map.c
[4]: https://lore.kernel.org/all/20250610031013.98556-1-lizhe.67@bytedance.com/

Li Zhe (5):
  mm: introduce num_pages_contiguous()
  vfio/type1: optimize vfio_pin_pages_remote()
  vfio/type1: batch vfio_find_vpfn() in function
    vfio_unpin_pages_remote()
  vfio/type1: introduce a new member has_rsvd for struct vfio_dma
  vfio/type1: optimize vfio_unpin_pages_remote()

 drivers/vfio/vfio_iommu_type1.c | 111 ++++++++++++++++++++++++++------
 include/linux/mm.h              |  23 +++++++
 2 files changed, 113 insertions(+), 21 deletions(-)

---
Changelogs:

v3->v4:
- Fix an indentation issue in patch #2.

v2->v3:
- Add a "Suggested-by" and a "Reviewed-by" tag.
- Address the compilation errors introduced by patch #1.
- Resolved several variable type issues.
- Add clarification for function num_pages_contiguous().

v1->v2:
- Update the performance test results.
- The function num_pages_contiguous() is extracted and placed in a
  separate commit.
- The phrase 'for large folio' has been removed from the patchset title.

v3: https://lore.kernel.org/all/20250707064950.72048-1-lizhe.67@bytedance.com/
v2: https://lore.kernel.org/all/20250704062602.33500-1-lizhe.67@bytedance.com/
v1: https://lore.kernel.org/all/20250630072518.31846-1-lizhe.67@bytedance.com/

-- 
2.20.1