Message-ID: <0c265a9b-fdc5-40d7-845f-30910f1ac6ea@infradead.org>
Date: Fri, 7 Nov 2025 10:58:27 -0800
Subject: Re: [PATCH v7 05/11] PCI/P2PDMA: Document DMABUF model
To: Leon Romanovsky
Cc: Bjorn Helgaas, Logan Gunthorpe, Jens Axboe, Robin Murphy,
 Joerg Roedel, Will Deacon, Marek Szyprowski, Jason Gunthorpe,
 Andrew Morton, Jonathan Corbet, Sumit Semwal, Christian König,
 Kees Cook, "Gustavo A. R. Silva", Ankit Agrawal, Yishai Hadas,
 Shameer Kolothum, Kevin Tian, Alex Williamson, Krishnakant Jaju,
 Matt Ochs, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, iommu@lists.linux.dev, linux-mm@kvack.org,
 linux-doc@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
 kvm@vger.kernel.org, linux-hardening@vger.kernel.org
References: <20251106-dmabuf-vfio-v7-0-2503bf390699@nvidia.com>
 <20251106-dmabuf-vfio-v7-5-2503bf390699@nvidia.com>
 <135df7eb-9291-428b-9c86-d58c2e19e052@infradead.org>
 <20251107160120.GD15456@unreal>
From: Randy Dunlap
In-Reply-To: <20251107160120.GD15456@unreal>

On 11/7/25 8:01 AM, Leon Romanovsky wrote:
> On Thu, Nov 06, 2025 at 10:15:07PM -0800, Randy Dunlap wrote:
>>
>>
>> On 11/6/25 6:16 AM, Leon Romanovsky wrote:
>>> From: Jason Gunthorpe
>>>
>>> Reflect latest changes in p2p implementation to support DMABUF lifecycle.
>>>
>>> Signed-off-by: Leon Romanovsky
>>> Signed-off-by: Jason Gunthorpe
>>> ---
>>>  Documentation/driver-api/pci/p2pdma.rst | 95 +++++++++++++++++++++++++--------
>>>  1 file changed, 72 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
>>> index d0b241628cf1..69adea45f73e 100644
>>> --- a/Documentation/driver-api/pci/p2pdma.rst
>>> +++ b/Documentation/driver-api/pci/p2pdma.rst
>>> @@ -9,22 +9,47 @@ between two devices on the bus. This type of transaction is henceforth
>>>  called Peer-to-Peer (or P2P). However, there are a number of issues that
>>>  make P2P transactions tricky to do in a perfectly safe way.
>>>
>>> -One of the biggest issues is that PCI doesn't require forwarding
>>> -transactions between hierarchy domains, and in PCIe, each Root Port
>>> -defines a separate hierarchy domain. To make things worse, there is no
>>> -simple way to determine if a given Root Complex supports this or not.
>>> -(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
>>> -only supports doing P2P when the endpoints involved are all behind the
>>> -same PCI bridge, as such devices are all in the same PCI hierarchy
>>> -domain, and the spec guarantees that all transactions within the
>>> -hierarchy will be routable, but it does not require routing
>>> -between hierarchies.
>>> -
>>> -The second issue is that to make use of existing interfaces in Linux,
>>> -memory that is used for P2P transactions needs to be backed by struct
>>> -pages. However, PCI BARs are not typically cache coherent so there are
>>> -a few corner case gotchas with these pages so developers need to
>>> -be careful about what they do with them.
>>> +For PCIe the routing of TLPs is well defined up until they reach a host bridge
>>
>> Define what TLP means?
>
> In PCIe "world", TLP is very well-known and well-defined acronym, which
> means Transaction Layer Packet.

It's your choice (or Bjorn's).
I'm just reviewing...

>> well-defined
>
> Thanks
>
> diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
> index 69adea45f73e..7530296a5dea 100644
> --- a/Documentation/driver-api/pci/p2pdma.rst
> +++ b/Documentation/driver-api/pci/p2pdma.rst
> @@ -9,17 +9,17 @@ between two devices on the bus. This type of transaction is henceforth
>  called Peer-to-Peer (or P2P). However, there are a number of issues that
>  make P2P transactions tricky to do in a perfectly safe way.
>
> -For PCIe the routing of TLPs is well defined up until they reach a host bridge
> -or root port. If the path includes PCIe switches then based on the ACS settings
> -the transaction can route entirely within the PCIe hierarchy and never reach the
> -root port. The kernel will evaluate the PCIe topology and always permit P2P
> -in these well defined cases.
> +For PCIe the routing of Transaction Layer Packets (TLPs) is well-defined up
> +until they reach a host bridge or root port. If the path includes PCIe switches
> +then based on the ACS settings the transaction can route entirely within
> +the PCIe hierarchy and never reach the root port. The kernel will evaluate
> +the PCIe topology and always permit P2P in these well-defined cases.
>
>  However, if the P2P transaction reaches the host bridge then it might have to
>  hairpin back out the same root port, be routed inside the CPU SOC to another
>  PCIe root port, or routed internally to the SOC.
>
> -As this is not well defined or well supported in real HW the kernel defaults to
> +As this is not well-defined or well supported in real HW the kernel defaults to

Nit: well-supported

The rest of it looks good. Thanks.

>  blocking such routing. There is an allow list to allow detecting known-good HW,
>  in which case P2P between any two PCIe devices will be permitted.
>
> @@ -39,7 +39,7 @@ delegates lifecycle management to the providing driver. It is expected that
>  drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF
>  to provide an invalidation shutdown. These MMIO pages have no struct page, and
>  if used with mmap() must create special PTEs. As such there are very few
> -kernel uAPIs that can accept pointers to them, in particular they cannot be used
> +kernel uAPIs that can accept pointers to them; in particular they cannot be used
>  with read()/write(), including O_DIRECT.
>
>  Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE
> @@ -154,7 +154,7 @@ access happens.
>  Usage With DMABUF
>  =================
>
> -DMABUF provides an alternative to the above struct page based
> +DMABUF provides an alternative to the above struct page-based
>  client/provider/orchestrator system. In this mode the exporting driver will wrap
>  some of its MMIO in a DMABUF and give the DMABUF FD to userspace.
>
> @@ -162,10 +162,10 @@ Userspace can then pass the FD to an importing driver which will ask the
>  exporting driver to map it.
>
>  In this case the initiator and target pci_devices are known and the P2P subsystem
> -is used to determine the mapping type. The phys_addr_t based DMA API is used to
> +is used to determine the mapping type. The phys_addr_t-based DMA API is used to
>  establish the dma_addr_t.
>
> -Lifecycle is controlled by DMABUF move_notify(), when the exporting driver wants
> +Lifecycle is controlled by DMABUF move_notify(). When the exporting driver wants
>  to remove() it must deliver an invalidation shutdown to all DMABUF importing
>  drivers through move_notify() and synchronously DMA unmap all the MMIO.
>

--
~Randy