From: Jason Gunthorpe <jgg@ziepe.ca>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: "Leon Romanovsky" <leon@kernel.org>,
"Robin Murphy" <robin.murphy@arm.com>,
"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
"Joerg Roedel" <joro@8bytes.org>, "Will Deacon" <will@kernel.org>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Keith Busch" <kbusch@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Logan Gunthorpe" <logang@deltatee.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>,
"Kevin Tian" <kevin.tian@intel.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Jonathan Corbet" <corbet@lwn.net>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
linux-pci@vger.kernel.org, kvm@vger.kernel.org,
linux-mm@kvack.org, "Randy Dunlap" <rdunlap@infradead.org>
Subject: Re: [PATCH v7 00/17] Provide a new two step DMA mapping API
Date: Wed, 19 Mar 2025 14:58:40 -0300
Message-ID: <20250319175840.GG10600@ziepe.ca>
In-Reply-To: <adb63b87-d8f2-4ae6-90c4-125bde41dc29@samsung.com>
On Fri, Mar 14, 2025 at 11:52:58AM +0100, Marek Szyprowski wrote:
> > The only way to do so is to use dma_map_sg_attrs(), which relies on SG
> > (the one that we want to remove) to map P2P pages.
>
> That's something I don't get yet. How P2P pages can be used with
> dma_map_sg_attrs(), but not with dma_map_page_attrs()? Both operate
> internally on struct page pointer.
It is a bit subtle; I ran into this when exploring how to enable
proper P2P for dma_map_resource() too.
The API signatures are:
  dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
		size_t offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs);
  void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
		enum dma_data_direction dir, unsigned long attrs);
The thing to notice immediately is that the unmap path does not get
passed a struct page.
So, let's think about the flow when the IOMMU is turned on.
For normal struct page memory:
- dma_map_page_attrs() allocates some IOVA, maps the struct page into
the IOMMU page table at that IOVA, and returns the IOVA as the
dma_addr_t
- dma_unmap_page_attrs() frees the IOVA from the given dma_addr_t
If we think about P2P now:
- dma_map_page_attrs() can inspect the struct page and determine it
is P2P. It computes a bus address which is not an IOVA, and does
not transit through the IOMMU. No IOVA allocation is performed. The
bus address is returned as the dma_addr_t
- dma_unmap_page_attrs() ... is impossible. We just get this
dma_addr_t, which doesn't carry enough information to tell anymore
whether the address is a P2P bus address or not, so we can't tell if
we should unmap an IOVA from the dma_addr_t :\
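To make the asymmetry concrete, here is a small userspace toy model of
the two flows above. It is only a sketch: every toy_* name, the
is_p2p field, the addresses and the counter are made up for
illustration and are not kernel code. The point is just that the unmap
side sees nothing but the address, so it has to pick one behaviour for
both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;

/* Toy stand-in for struct page; only the map side ever sees it. */
struct toy_page {
	uint64_t bus_addr;	/* only meaningful when is_p2p is set */
	bool is_p2p;		/* hypothetical flag for this sketch */
};

static int toy_live_iovas;		/* outstanding toy IOVA allocations */
static uint64_t toy_next_iova = 0x1000;

/* Map: the page tells us which path to take. */
static toy_dma_addr_t toy_map_page(const struct toy_page *page)
{
	toy_dma_addr_t iova;

	assert(page);
	if (page->is_p2p)
		return page->bus_addr;	/* bus address, no IOVA allocated */

	iova = toy_next_iova;
	toy_next_iova += 0x1000;
	toy_live_iovas++;
	return iova;
}

/* Unmap: only the address arrives, so assume it is always an IOVA. */
static void toy_unmap_page(toy_dma_addr_t addr)
{
	(void)addr;
	toy_live_iovas--;
}

/* Returns the allocation count after one map/unmap round trip. */
static int toy_round_trip(bool is_p2p)
{
	struct toy_page page = { .bus_addr = 0xfe000000, .is_p2p = is_p2p };

	toy_live_iovas = 0;
	toy_unmap_page(toy_map_page(&page));
	return toy_live_iovas;
}
```

In this model toy_round_trip(false) balances out, while
toy_round_trip(true) leaves the counter negative because the unmap
"freed" an IOVA that was never allocated.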
The sg path fixes this because it introduced a new flag in the
scatterlist, SG_DMA_BUS_ADDRESS, that allows the sg map path to record
the information for the unmap path so it can do the right thing.
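As a rough userspace sketch of that idea (again a toy, not the kernel
implementation: struct toy_sg, TOY_SG_DMA_BUS_ADDRESS and the helper
names are assumptions modeled loosely on the scatterlist flag), the
entry itself carries a bit that the map side sets and the unmap side
reads:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag, modeled on the idea behind SG_DMA_BUS_ADDRESS. */
#define TOY_SG_DMA_BUS_ADDRESS (1u << 0)

struct toy_sg {
	uint64_t dma_address;
	unsigned int dma_flags;	/* written by map, read by unmap */
};

static int toy_sg_live_iovas;		/* outstanding toy IOVA allocations */
static uint64_t toy_sg_next_iova = 0x1000;

static void toy_map_sg(struct toy_sg *sg, uint64_t bus_addr, bool is_p2p)
{
	assert(sg);
	if (is_p2p) {
		/* Record in the entry that this is a bus address. */
		sg->dma_address = bus_addr;
		sg->dma_flags |= TOY_SG_DMA_BUS_ADDRESS;
		return;
	}

	sg->dma_address = toy_sg_next_iova;
	sg->dma_flags &= ~TOY_SG_DMA_BUS_ADDRESS;
	toy_sg_next_iova += 0x1000;
	toy_sg_live_iovas++;
}

static void toy_unmap_sg(struct toy_sg *sg)
{
	assert(sg);
	if (sg->dma_flags & TOY_SG_DMA_BUS_ADDRESS) {
		/* Nothing was allocated; just clear the marker. */
		sg->dma_flags &= ~TOY_SG_DMA_BUS_ADDRESS;
		return;
	}
	toy_sg_live_iovas--;
}

/* Returns the allocation count after one map/unmap round trip. */
static int toy_sg_round_trip(bool is_p2p)
{
	struct toy_sg sg = { 0 };

	toy_sg_live_iovas = 0;
	toy_map_sg(&sg, 0xfe000000, is_p2p);
	toy_unmap_sg(&sg);
	return toy_sg_live_iovas;
}
```

Here both the P2P and the non-P2P round trip leave the counter
balanced, because the flag travels with the entry from map to unmap.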
Leon's approach fixes this by putting an overarching transaction state
around the DMA operation so that map and unmap operations can look in
the state and determine if this is a P2P or non-P2P map and then know
how to unmap.
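A toy sketch of the transaction-state idea, in the same spirit. Note
that struct toy_dma_state and the toy_state_* helpers are hypothetical
names for illustration only and do not reflect the actual structures
or function names in the series. The caller owns a state object that
is filled in once and handed to both map and unmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caller-owned state shared by map and unmap. */
struct toy_dma_state {
	bool is_p2p;		/* recorded once when the transaction starts */
	int live_iovas;		/* outstanding toy IOVA allocations */
	uint64_t next_iova;
};

static void toy_state_begin(struct toy_dma_state *st, bool is_p2p)
{
	assert(st);
	st->is_p2p = is_p2p;
	st->live_iovas = 0;
	st->next_iova = 0x1000;
}

static uint64_t toy_state_map(struct toy_dma_state *st, uint64_t bus_addr)
{
	uint64_t iova;

	assert(st);
	if (st->is_p2p)
		return bus_addr;	/* bus address, no IOVA allocated */

	iova = st->next_iova;
	st->next_iova += 0x1000;
	st->live_iovas++;
	return iova;
}

/* Unmap consults the state instead of guessing from the address. */
static void toy_state_unmap(struct toy_dma_state *st, uint64_t addr)
{
	assert(st);
	(void)addr;
	if (st->is_p2p)
		return;			/* nothing to free */
	st->live_iovas--;
}

/* Returns the allocation count after one map/unmap round trip. */
static int toy_state_round_trip(bool is_p2p)
{
	struct toy_dma_state st;

	toy_state_begin(&st, is_p2p);
	toy_state_unmap(&st, toy_state_map(&st, 0xfe000000));
	return st.live_iovas;
}
```

Compared with a per-entry flag, the decision lives outside any
particular list structure, which is what lets the same map/unmap pair
sit underneath different scatterlist-like containers.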
For some background here, Christoph gave me this idea back at LSF/MM
in Vancouver (two years ago now). At the time I was looking at
replacing scatterlist and giving new DMA API ops to operate on a
"scatterlist v2" structure.
Christoph's vision was to make a performance DMA API path that could
be used to implement any scatterlist-like data structure very
efficiently without having to teach the DMA API about all sorts of
scatterlist-like things.
Jason