Date: Wed, 19 Mar 2025 10:30:58 +0200
From: Leon Romanovsky
To: Marek Szyprowski
Cc: Robin Murphy, Christoph Hellwig, Jason Gunthorpe, Jens Axboe,
 Joerg Roedel, Will Deacon, Sagi Grimberg, Keith Busch, Bjorn Helgaas,
 Logan Gunthorpe, Yishai Hadas, Shameer Kolothum, Kevin Tian,
 Alex Williamson, Jérôme Glisse, Andrew Morton, Jonathan Corbet,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
 iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
 linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
 Randy Dunlap
Subject: Re: [PATCH v7 00/17] Provide a new two step DMA mapping API
Message-ID: <20250319083058.GG1322339@unreal>
In-Reply-To: <20250314184911.GR1322339@unreal>
References: <20250220124827.GR53094@unreal>
 <1166a5f5-23cc-4cce-ba40-5e10ad2606de@arm.com>
 <20250312193249.GI1322339@unreal>
 <20250314184911.GR1322339@unreal>

On Fri, Mar 14, 2025 at 08:49:11PM +0200, Leon Romanovsky wrote:
> On Fri, Mar 14, 2025 at 11:52:58AM +0100, Marek Szyprowski wrote:
> > On 12.03.2025 20:32, Leon Romanovsky wrote:
> > > On Wed, Mar 12, 2025 at 10:28:32AM +0100, Marek Szyprowski wrote:
> > >> Hi Robin
> > >>
> > >> On 28.02.2025 20:54, Robin Murphy wrote:
> > >>> On 20/02/2025 12:48 pm, Leon Romanovsky wrote:
> > >>>> On Wed, Feb 05, 2025 at 04:40:20PM +0200, Leon Romanovsky wrote:
> > >>>>> From: Leon Romanovsky
> > >>>>>
> > >>>>> Changelog:
> > >>>>> v7:
> > >>>>>   * Rebased to v6.14-rc1
> > >>>> <...>
> > >>>>
> > >>>>> Christoph Hellwig (6):
> > >>>>>    PCI/P2PDMA: Refactor the p2pdma mapping helpers
> > >>>>>    dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
> > >>>>>    iommu: generalize the batched sync after map interface
> > >>>>>    iommu/dma: Factor out a iommu_dma_map_swiotlb helper
> > >>>>>    dma-mapping: add a dma_need_unmap helper
> > >>>>>    docs: core-api: document the IOVA-based API
> > >>>>>
> > >>>>> Leon Romanovsky (11):
> > >>>>>    iommu: add kernel-doc for iommu_unmap and iommu_unmap_fast
> > >>>>>    dma-mapping: Provide an interface to allow allocate IOVA
> > >>>>>    dma-mapping: Implement link/unlink ranges API
> > >>>>>    mm/hmm: let users to tag specific PFN with DMA mapped bit
> > >>>>>    mm/hmm: provide generic DMA managing logic
> > >>>>>    RDMA/umem: Store ODP access mask information in PFN
> > >>>>>    RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page
> > >>>>>      linkage
> > >>>>>    RDMA/umem: Separate implicit ODP initialization from explicit ODP
> > >>>>>    vfio/mlx5: Explicitly use number of pages instead of allocated length
> > >>>>>    vfio/mlx5: Rewrite create mkey flow to allow better code reuse
> > >>>>>    vfio/mlx5: Enable the DMA link API
> > >>>>>
> > >>>>>   Documentation/core-api/dma-api.rst   |  70 ++++
> > >>>>>   drivers/infiniband/core/umem_odp.c   | 250 +++++---------
> > >>>>>   drivers/infiniband/hw/mlx5/mlx5_ib.h |  12 +-
> > >>>>>   drivers/infiniband/hw/mlx5/odp.c     |  65 ++--
> > >>>>>   drivers/infiniband/hw/mlx5/umr.c     |  12 +-
> > >>>>>   drivers/iommu/dma-iommu.c            | 468 +++++++++++++++++++++++----
> > >>>>>   drivers/iommu/iommu.c                |  84 ++---
> > >>>>>   drivers/pci/p2pdma.c                 |  38 +--
> > >>>>>   drivers/vfio/pci/mlx5/cmd.c          | 375 +++++++++++----------
> > >>>>>   drivers/vfio/pci/mlx5/cmd.h          |  35 +-
> > >>>>>   drivers/vfio/pci/mlx5/main.c         |  87 +++--
> > >>>>>   include/linux/dma-map-ops.h          |  54 ----
> > >>>>>   include/linux/dma-mapping.h          |  85 +++++
> > >>>>>   include/linux/hmm-dma.h              |  33 ++
> > >>>>>   include/linux/hmm.h                  |  21 ++
> > >>>>>   include/linux/iommu.h                |   4 +
> > >>>>>   include/linux/pci-p2pdma.h           |  84 +++++
> > >>>>>   include/rdma/ib_umem_odp.h           |  25 +-
> > >>>>>   kernel/dma/direct.c                  |  44 +--
> > >>>>>   kernel/dma/mapping.c                 |  18 ++
> > >>>>>   mm/hmm.c                             | 264 +++++++++++++--
> > >>>>>   21 files changed, 1435 insertions(+), 693 deletions(-)
> > >>>>>   create mode 100644 include/linux/hmm-dma.h
> > >>>> Kind reminder.
> > > <...>
> > >
> > >> Removing the need for scatterlists was advertised as the main goal of
> > >> this new API, but it looks that similar effects can be achieved with
> > >> just iterating over the pages and calling page-based DMA API directly.
> > > Such iteration can't be enough because P2P pages don't have struct pages,
> > > so you can't use reliably and efficiently dma_map_page_attrs() call.
> > >
> > > The only way to do so is to use dma_map_sg_attrs(), which relies on SG
> > > (the one that we want to remove) to map P2P pages.
> >
> > That's something I don't get yet. How P2P pages can be used with
> > dma_map_sg_attrs(), but not with dma_map_page_attrs()? Both operate
> > internally on struct page pointer.
>
> Yes, and no.
> See users of is_pci_p2pdma_page(...) function. In dma_*_sg() APIs, there
> is a real check and support for p2p. In dma_map_page_attrs() variants,
> this support is missing (ignored, or error is returned).
>
> > >
> > >> Maybe I missed something. I still see some advantages in this DMA API
> > >> extension, but I would also like to see the clear benefits from
> > >> introducing it, like perf logs or other benchmark summary.
> > > We didn't focus yet on performance, however Christoph mentioned in his
> > > block RFC [1] that even simple conversion should improve performance as
> > > we are performing one P2P lookup per-bio and not per-SG entry as was
> > > before [2]. In addition it decreases memory [3] too.
> > >
> > > [1] https://lore.kernel.org/all/cover.1730037261.git.leon@kernel.org/
> > > [2] https://lore.kernel.org/all/34d44537a65aba6ede215a8ad882aeee028b423a.1730037261.git.leon@kernel.org/
> > > [3] https://lore.kernel.org/all/383557d0fa1aa393dbab4e1daec94b6cced384ab.1730037261.git.leon@kernel.org/
> > >
> > > So clear benefits are:
> > > 1. Ability to use native for subsystem structure, e.g. bio for block,
> > > umem for RDMA, dmabuf for DRM, e.t.c. It removes current wasteful
> > > conversions from and to SG in order to work with DMA API.
> > > 2. Batched request and iotlb sync optimizations (perform only once).
> > > 3. Avoid very expensive call to pgmap pointer.
> > > 4. Expose MMIO over VFIO without hacks (PCI BAR doesn't have struct pages).
> > > See this series for such a hack
> > > https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.com/
> >
> > I see those benefits and I admit that for typical DMA-with-IOMMU case it
> > would improve some things. I think that main concern from Robin was how
> > to handle it for the cases without an IOMMU.
>
> In such case, we fallback to non-IOMMU flow (old, well-established one).
> See this HMM patch as an example https://lore.kernel.org/all/a796da065fa8a9cb35d591ce6930400619572dcc.1738765879.git.leonro@nvidia.com/
> +dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
> +			   size_t idx,
> +			   struct pci_p2pdma_map_state *p2pdma_state)
> ...
> +	if (dma_use_iova(state)) {
> ...
> +	} else {
> ...
> +		dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
> +					DMA_BIDIRECTIONAL);
>
> Thanks

Marek,

Did it answer your concerns? How can we progress here?

As you can see, the chances of getting a meaningful response to your
review request and my questions are not high.

Thanks

>
> >
> > Best regards
> > --
> > Marek Szyprowski, PhD
> > Samsung R&D Institute Poland
> >
>
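A toy, user-space sketch of the two points argued in this thread: batching IOTLB syncs ("perform only once") and falling back to per-page mapping when there is no IOMMU, mirroring the dma_use_iova()/dma_map_page() split in the quoted hmm_dma_map_pfn() excerpt. Every name below (model_dev, model_map_page, model_iova_alloc, model_iova_link, model_map_pages) is hypothetical and is not the kernel's API; the cost model, where each per-page map issues one sync, is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model: none of these names are kernel APIs.
 * It only tracks IOVA assignment and counts IOTLB syncs per strategy,
 * assuming each per-page map costs one sync.
 */
struct model_dev {
	bool has_iommu;			/* false models the direct (no IOMMU) case */
	uint64_t next_iova;		/* trivial bump allocator for IOVA space */
	unsigned int iotlb_syncs;	/* total IOTLB syncs issued so far */
};

/* Per-page mapping: each page gets its own IOVA and its own sync. */
static uint64_t model_map_page(struct model_dev *dev, uint64_t phys,
			       size_t len)
{
	uint64_t iova;

	if (!dev->has_iommu)
		return phys;	/* direct mapping: DMA address == phys */
	iova = dev->next_iova;
	dev->next_iova += len;
	dev->iotlb_syncs++;
	return iova;
}

/* Two-step mapping state: one contiguous IOVA range per request. */
struct model_iova_state {
	uint64_t base;
	size_t linked;		/* bytes linked into the range so far */
};

/* Step 1: reserve the whole IOVA range up front. */
static bool model_iova_alloc(struct model_dev *dev,
			     struct model_iova_state *st, size_t size)
{
	if (!dev->has_iommu)
		return false;	/* caller must fall back to per-page mapping */
	st->base = dev->next_iova;
	st->linked = 0;
	dev->next_iova += size;
	return true;
}

/* Step 2: link one page at the next offset, without syncing. */
static uint64_t model_iova_link(struct model_iova_state *st, uint64_t phys,
				size_t len)
{
	uint64_t iova = st->base + st->linked;

	(void)phys;	/* a real IOMMU would install iova -> phys here */
	st->linked += len;
	return iova;
}

/* Maps n pages into dma[] and returns the device's total sync count. */
static unsigned int model_map_pages(struct model_dev *dev, bool two_step,
				    const uint64_t *phys, size_t n,
				    size_t page_size, uint64_t *dma)
{
	struct model_iova_state st;
	size_t i;

	if (two_step && model_iova_alloc(dev, &st, n * page_size)) {
		for (i = 0; i < n; i++)
			dma[i] = model_iova_link(&st, phys[i], page_size);
		dev->iotlb_syncs++;	/* one sync for the whole range */
	} else {
		for (i = 0; i < n; i++)
			dma[i] = model_map_page(dev, phys[i], page_size);
	}
	return dev->iotlb_syncs;
}
```

In this model, mapping four 4 KiB pages on a fresh device with has_iommu set returns 4 for the per-page path and 1 for the two-step path; with has_iommu false the two-step request falls back, returns 0, and every dma[i] equals phys[i]. Real IOMMU drivers differ in when a sync after map is actually required, so the counts only illustrate the batching idea.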