Message-ID: <18761ea2-46a7-4c79-a5b7-933e26362559@linux.alibaba.com>
Date: Tue, 29 Oct 2024 11:32:49 +0800
From: qinyuntan
Subject: Re: [PATCH v1: vfio: avoid unnecessary pin memory when dma map io address space 0/2]
To: Alex Williamson
Cc: Andrew Morton, linux-mm@kvack.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20241024110624.63871cfa.alex.williamson@redhat.com>
In-Reply-To: <20241024110624.63871cfa.alex.williamson@redhat.com>

You are right, it seems I missed the relevant updates. Commit
f9e54c3a2f5b7 ("vfio/pci: implement huge_fault support") introduced
huge_fault, so maybe we can achieve the same effect by adjusting the
order parameter of vfio_pci_mmap_huge_fault.

Thanks,
Qinyun Tan

On 2024/10/25 01:06, Alex Williamson wrote:
> On Thu, 24 Oct 2024 17:34:42 +0800
> Qinyun Tan wrote:
>
>> When user application call ioctl(VFIO_IOMMU_MAP_DMA) to map a dma address,
>> the general handler 'vfio_pin_map_dma' attempts to pin the memory and
>> then create the mapping in the iommu.
>>
>> However, some mappings aren't backed by a struct page, for example an
>> mmap'd MMIO range for our own or another device. In this scenario, a vma
>> with flag VM_IO | VM_PFNMAP, the pin operation will fail. Moreover, the
>> pin operation incurs a large overhead which will result in a longer
>> startup time for the VM. We don't actually need a pin in this scenario.
>>
>> To address this issue, we introduce a new DMA MAP flag
>> 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN' to skip the 'vfio_pin_pages_remote'
>> operation in the DMA map process for mmio memory. Additionally, we add
>> the 'VM_PGOFF_IS_PFN' flag for vfio_pci_mmap address, ensuring that we can
>> directly obtain the pfn through vma->vm_pgoff.
>>
>> This approach allows us to avoid unnecessary memory pinning operations,
>> which would otherwise introduce additional overhead during DMA mapping.
>>
>> In my tests, using vfio to pass through an 8-card AMD GPU which with a
>> large bar size (128GB*8), the time mapping the 192GB*8 bar was reduced
>> from about 50.79s to 1.57s.
>
> If the vma has a flag to indicate pfnmap, why does the user need to
> provide a mapping flag to indicate not to pin? We generally cannot
> trust such a user directive anyway, nor do we in this series, so it all
> seems rather redundant.
>
> What about simply improving the batching of pfnmap ranges rather than
> imposing any sort of mm or uapi changes? Or perhaps, since we're now
> using huge_fault to populate the vma, maybe we can iterate at PMD or
> PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
> optimizations to pursue that could be done transparently to the user.
> Thanks,
>
> Alex
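
To put rough numbers on the PMD/PUD granularity idea, here is a minimal
userspace sketch. It is not kernel code and not the vfio implementation.
It only counts how many steps a walk over a physically contiguous range
would take if each step could cover the largest naturally aligned block
that still fits. The sketch_* names are hypothetical, and the 4 KiB /
2 MiB / 1 GiB sizes are the usual x86-64 values, assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 sizes: 4 KiB page, 2 MiB PMD, 1 GiB PUD. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PUD_SHIFT  30

/*
 * Pick the largest block (PUD, PMD or page) that is allowed by max_shift,
 * naturally aligned at addr, and no larger than the remaining length.
 */
static unsigned int sketch_best_shift(uint64_t addr, uint64_t remaining,
                                      unsigned int max_shift)
{
    static const unsigned int shifts[] = {
        SKETCH_PUD_SHIFT, SKETCH_PMD_SHIFT, SKETCH_PAGE_SHIFT,
    };

    for (size_t i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
        uint64_t size = 1ULL << shifts[i];

        if (shifts[i] <= max_shift &&
            (addr & (size - 1)) == 0 && remaining >= size)
            return shifts[i];
    }
    return SKETCH_PAGE_SHIFT;
}

/*
 * Count the steps needed to walk [start, start + len) when each step may
 * cover at most (1 << max_shift) bytes. Both start and len must be
 * page aligned.
 */
static uint64_t sketch_count_steps(uint64_t start, uint64_t len,
                                   unsigned int max_shift)
{
    const uint64_t page_mask = (1ULL << SKETCH_PAGE_SHIFT) - 1;
    uint64_t steps = 0;

    assert((start & page_mask) == 0 && (len & page_mask) == 0);

    while (len) {
        uint64_t size = 1ULL << sketch_best_shift(start, len, max_shift);

        start += size;
        len -= size;
        steps++;
    }
    return steps;
}
```

For a 128 GiB range starting at offset 0, this counts 33554432 steps at
4 KiB granularity, 65536 at 2 MiB and 128 at 1 GiB. Ranges that are not
aligned to the larger sizes fall back to smaller steps at the edges, so
the real gain depends on BAR alignment and on what huge_fault actually
installs.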