From: lizhe.67@bytedance.com
To: alex.williamson@redhat.com, akpm@linux-foundation.org, david@redhat.com, jgg@ziepe.ca, peterx@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lizhe.67@bytedance.com
Subject: [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()
Date: Mon, 7 Jul 2025 14:49:45 +0800
Message-ID: <20250707064950.72048-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

This patchset is an integration of the two previous patchsets[1][2].

When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs a separate
statistics counting operation for each page. This can lead to significant
performance overhead, especially when dealing with large ranges of pages.

The function vfio_unpin_pages_remote() has a similar issue: calling
put_pfn() for each pfn incurs considerable overhead.

This patchset optimizes both functions by batching these per-page
operations.

The first two patches optimize vfio_pin_pages_remote(), while the
remaining patches optimize vfio_unpin_pages_remote().

The performance test results below, based on v6.16-rc4, are for
completing a 16G VFIO MAP/UNMAP DMA. They were obtained with the unit
test[3], slightly modified[4].

Base(6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)

With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (600.7 GB/s)
VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.261 s (61.4 GB/s)
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (516.4 GB/s)
VFIO UNMAP DMA in 0.045 s (353.9 GB/s)

For large folios, we achieve an over 40% reduction in elapsed time for
VFIO MAP DMA and an over 66% reduction for VFIO UNMAP DMA. For small
folios (the MAP_POPULATE case), the results show a slight improvement
over the baseline.
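
To illustrate the batching idea described above, here is a minimal
user-space sketch. It is not the code from this series: the helper names
contiguous_run_len() and count_batched_ops() are hypothetical, and plain
unsigned long values stand in for page frame numbers. It only models the
general technique of finding a run of consecutive pages and then doing one
bookkeeping operation per run instead of one per page.

```c
#include <stddef.h>

/*
 * Hypothetical user-space model (not kernel code): count how many
 * entries at the start of pfns[] form a run of consecutive page frame
 * numbers. Returns 0 when n == 0, otherwise at least 1.
 */
static size_t contiguous_run_len(const unsigned long *pfns, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++)
		if (pfns[i] != pfns[0] + i)
			break;

	return i;
}

/*
 * Walk pfns[] one contiguous run at a time and return how many
 * bookkeeping operations a batched caller would perform, i.e. one per
 * run rather than one per page.
 */
static size_t count_batched_ops(const unsigned long *pfns, size_t n)
{
	size_t done = 0, ops = 0;

	while (done < n) {
		done += contiguous_run_len(pfns + done, n - done);
		ops++;
	}

	return ops;
}
```

In this model, an array whose entries are all consecutive costs a single
operation regardless of its length, while an array with no two
consecutive entries costs one operation per entry, the same as the
unbatched case.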

[1]: https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/
[2]: https://lore.kernel.org/all/20250620032344.13382-1-lizhe.67@bytedance.com/#t
[3]: https://github.com/awilliam/tests/blob/vfio-pci-mem-dma-map/vfio-pci-mem-dma-map.c
[4]: https://lore.kernel.org/all/20250610031013.98556-1-lizhe.67@bytedance.com/

Li Zhe (5):
  mm: introduce num_pages_contiguous()
  vfio/type1: optimize vfio_pin_pages_remote()
  vfio/type1: batch vfio_find_vpfn() in function
    vfio_unpin_pages_remote()
  vfio/type1: introduce a new member has_rsvd for struct vfio_dma
  vfio/type1: optimize vfio_unpin_pages_remote()

 drivers/vfio/vfio_iommu_type1.c | 111 ++++++++++++++++++++++++++------
 include/linux/mm.h              |  23 +++++++
 2 files changed, 113 insertions(+), 21 deletions(-)

---
Changelogs:

v2->v3:
- Add a "Suggested-by" and a "Reviewed-by" tag.
- Address the compilation errors introduced by patch #1.
- Resolved several variable type issues.
- Add clarification for function num_pages_contiguous().

v1->v2:
- Update the performance test results.
- The function num_pages_contiguous() is extracted and placed in a
  separate commit.
- The phrase 'for large folio' has been removed from the patchset title.

v2: https://lore.kernel.org/all/20250704062602.33500-1-lizhe.67@bytedance.com/
v1: https://lore.kernel.org/all/20250630072518.31846-1-lizhe.67@bytedance.com/

-- 
2.20.1