From: Chuck Lever <chuck.lever@oracle.com>
To: Jason Gunthorpe <jgg@ziepe.ca>,
	Marek Szyprowski <m.szyprowski@samsung.com>
Cc: "Leon Romanovsky" <leon@kernel.org>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
	"Joerg Roedel" <joro@8bytes.org>, "Will Deacon" <will@kernel.org>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Keith Busch" <kbusch@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, "Randy Dunlap" <rdunlap@infradead.org>
Subject: Re: [PATCH v7 00/17] Provide a new two step DMA mapping API
Date: Mon, 31 Mar 2025 10:46:40 -0400
Message-ID: <913df4b4-fc4a-409d-9007-088a3e2c8291@oracle.com>
In-Reply-To: <20250322004130.GS126678@ziepe.ca>

On 3/21/25 8:41 PM, Jason Gunthorpe wrote:
> On Fri, Mar 21, 2025 at 12:52:30AM +0100, Marek Szyprowski wrote:
>>> Christoph's vision was to make a performance DMA API path that could
>>> be used to implement any scatterlist-like data structure very
>>> efficiently without having to teach the DMA API about all sorts of
>>> scatterlist-like things.
>>
>> Thanks for explaining one more motivation behind this patchset!
> 
> Sure, no problem.
> 
> To close the loop on the bigger picture here..
> 
> When you put the parts together:
> 
>  1) dma_map_sg is the only API that is both performant and fully
>     functional
> 
>  2) scatterlist is a horrible leaky design and badly misused all over
>     the place. When Logan added SG_DMA_BUS_ADDRESS it became quite
>     clear that any significant changes to scatterlist are infeasible,
>     or at least we'd break a huge number of untestable legacy drivers
>     in the process.
> 
>  3) We really want to do full featured performance DMA *without* a
>     struct page. This requires changing scatterlist, inventing a new
>     scatterlist v2 and DMA map for it, or this idea here of a flexible
>     lower level DMA API entry point.
> 
>     Matthew has been talking about struct-pageless for a long time now
>     from the block/mm direction using folio & memdesc and this is
>     meeting his work from the other end of the stack by starting to
>     build a way to do DMA on future struct pageless things. This is
>     going to be a huge multi-year project but small parts like this need
>     to be solved and agreed to make progress.
> 
>  4) In the immediate moment we still have problems in VFIO, RDMA, and
>     DRM managing P2P transfers because dma_map_resource/page() don't
>     properly work, and we don't have struct pages to use
>     dma_map_sg(). Hacks around the DMA API have been in the kernel for
>     a long time now; we want to see a properly architected solution.

The in-kernel NFS stack, for example, already has a mechanism for
receiving and sending RPC messages using arrays of bio_vecs. The stack
can use bio_vecs natively for communicating with both the page cache and
the kernel socket API.

But NFS's RPC/RDMA transport still has to convert these pages into a
scatterlist so that they can be mapped and then handed to the RDMA core.
Instead, having a DMA mapping API that can take an array of bio_vecs
directly (and then, a similar API within the RDMA core) would make
NFS/RDMA a lot more CPU-efficient.

The lack of a bio_vec DMA mapping API has held up a full conversion of
the in-kernel NFS stack to use folios. That's the reason I tried my
own hand at adding a bio_vec DMA mapping API last summer.

Leon and Christoph have provided a clean step in the right direction
and it looks to me like they have thought carefully about next steps.
Robin pointed out some areas that might be lacking in v7, but IMHO
there is a plan to address many of these areas in subsequent work. I
don't see a reason not to proceed with this first step.


-- 
Chuck Lever

