From: Baolu Lu <baolu.lu@linux.intel.com>
To: Leon Romanovsky <leon@kernel.org>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
Keith Busch <kbusch@kernel.org>
Cc: "Leon Romanovsky" <leonro@nvidia.com>, "Jake Edge" <jake@lwn.net>,
"Jonathan Corbet" <corbet@lwn.net>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Zhu Yanjun" <zyjzyj2000@gmail.com>,
"Robin Murphy" <robin.murphy@arm.com>,
"Joerg Roedel" <joro@8bytes.org>, "Will Deacon" <will@kernel.org>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Logan Gunthorpe" <logang@deltatee.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>,
"Kevin Tian" <kevin.tian@intel.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
linux-pci@vger.kernel.org, kvm@vger.kernel.org,
linux-mm@kvack.org, "Niklas Schnelle" <schnelle@linux.ibm.com>,
"Chuck Lever" <chuck.lever@oracle.com>,
"Luis Chamberlain" <mcgrof@kernel.org>,
"Matthew Wilcox" <willy@infradead.org>,
"Dan Williams" <dan.j.williams@intel.com>,
"Kanchan Joshi" <joshi.k@samsung.com>,
"Chaitanya Kulkarni" <kch@nvidia.com>
Subject: Re: [PATCH v10 05/24] dma-mapping: Provide an interface to allow allocate IOVA
Date: Tue, 29 Apr 2025 11:10:54 +0800
Message-ID: <0086302d-1cb3-43dd-a989-e4b1995a0d22@linux.intel.com>
In-Reply-To: <30f0601d400711b3859deeb8fef3090f5b2020a4.1745831017.git.leon@kernel.org>

On 4/28/25 17:22, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> The existing .map_page() callback provides both allocation of an IOVA

Nit: the callback is named .map_pages().

> and linking of DMA pages. That combination works great for most of the
> callers, who use it in control paths, but is less effective in fast
> paths where there may be multiple calls to map_page().
>
> These advanced callers already manage their data in some sort of
> database and can perform IOVA allocation in advance, leaving the range
> linkage operation in the fast path.
>
> Provide an interface to allocate/deallocate an IOVA; the next patch
> will link/unlink DMA ranges to that specific IOVA.
>
> In the new API, a DMA mapping transaction is identified by a
> struct dma_iova_state, which holds some precomputed information
> for the transaction that does not change for each page being
> mapped, so add a check whether IOVA can be used for the specific
> transaction.
>
> The API is exported from dma-iommu as it is the only supported
> implementation; the namespace is clearly different from the iommu_*
> functions, which are not allowed to be used directly. This code layout
> allows us to save a function call per API call in the datapath, as
> well as a lot of boilerplate code.
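
Presumably the check mentioned above ends up as a small inline helper;
a minimal sketch, assuming a nonzero __size marks a usable IOVA state:

/*
 * Minimal sketch of the "can IOVA be used" check described above,
 * assuming a nonzero __size marks a successfully allocated state.
 */
static inline bool dma_use_iova(struct dma_iova_state *state)
{
	return state->__size != 0;
}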
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Jens Axboe <axboe@kernel.dk>
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/iommu/dma-iommu.c | 86 +++++++++++++++++++++++++++++++++++++
> include/linux/dma-mapping.h | 48 +++++++++++++++++++++
> 2 files changed, 134 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 9ba8d8bc0ce9..d3211a8d755e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1723,6 +1723,92 @@ size_t iommu_dma_max_mapping_size(struct device *dev)
>  	return SIZE_MAX;
>  }
>  
> +/**
> + * dma_iova_try_alloc - Try to allocate an IOVA space
> + * @dev: Device to allocate the IOVA space for
> + * @state: IOVA state
> + * @phys: physical address
> + * @size: IOVA size
> + *
> + * Check if @dev supports the IOVA-based DMA API, and if so allocate IOVA
> + * space for the given base address and size.
> + *
> + * Note: @phys is only used to calculate the IOVA alignment. Callers that always
> + * do PAGE_SIZE aligned transfers can safely pass 0 here.
Have you considered adding an explicit alignment parameter to
dma_iova_try_alloc()? '0' could simply mean the default PAGE_SIZE
alignment. I imagine that some devices might have particular alignment
needs for better performance, especially for ATS cache efficiency. This
would allow those device drivers to express their requirements directly
during IOVA allocation; see the hypothetical sketch below.
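
For illustration only - the function name and the 'align' parameter
below are hypothetical, not part of this patch:

/*
 * Hypothetical sketch of the suggestion above. align == 0 would keep
 * the current behavior of aligning to the IOVA granule (PAGE_SIZE).
 */
bool dma_iova_try_alloc_align(struct device *dev,
			      struct dma_iova_state *state,
			      phys_addr_t phys, size_t size, size_t align);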
> + *
> + * Returns %true if the IOVA-based DMA API can be used and IOVA space has been
> + * allocated, or %false if the regular DMA API should be used.
> + */
> +bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *state,
> +		phys_addr_t phys, size_t size)
> +{
> +	struct iommu_dma_cookie *cookie;
> +	struct iommu_domain *domain;
> +	struct iova_domain *iovad;
> +	size_t iova_off;
> +	dma_addr_t addr;
> +
> +	memset(state, 0, sizeof(*state));
> +	if (!use_dma_iommu(dev))
> +		return false;
> +
> +	domain = iommu_get_dma_domain(dev);
> +	cookie = domain->iova_cookie;
> +	iovad = &cookie->iovad;
> +	iova_off = iova_offset(iovad, phys);
> +
> +	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
> +	    iommu_deferred_attach(dev, iommu_get_domain_for_dev(dev)))
> +		return false;
> +
> +	if (WARN_ON_ONCE(!size))
> +		return false;
> +
> +	/*
> +	 * DMA_IOVA_USE_SWIOTLB is a flag which is set by dma-iommu
> +	 * internals; make sure that the caller didn't set it and/or
> +	 * didn't use this interface to map SIZE_MAX.
> +	 */
> +	if (WARN_ON_ONCE((u64)size & DMA_IOVA_USE_SWIOTLB))
I'm a little concerned that device drivers might inadvertently misuse
state->__size, forgetting that the high bit is reserved for
DMA_IOVA_USE_SWIOTLB. Perhaps add a separate flag within struct
dma_iova_state to prevent such issues? Something like the rough sketch
below.
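
/*
 * Rough sketch only - the posted patch instead encodes the flag in the
 * high bit of __size. The field name here is hypothetical.
 */
struct dma_iova_state {
	dma_addr_t addr;
	u64 __size;
	bool __use_swiotlb;	/* set by dma-iommu internals only */
};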
> +		return false;
> +
> +	addr = iommu_dma_alloc_iova(domain,
> +			iova_align(iovad, size + iova_off),
> +			dma_get_mask(dev), dev);
> +	if (!addr)
> +		return false;
> +
> +	state->addr = addr + iova_off;
> +	state->__size = size;
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(dma_iova_try_alloc);
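
FWIW, my mental model of how a fast-path caller would use this two-step
API is roughly the sketch below. dma_iova_link() and dma_iova_free()
come from later patches in this series, so their signatures here are my
assumptions:

/*
 * Minimal usage sketch, assuming the link/free helpers added later in
 * this series; their signatures are assumptions, not posted code.
 */
static dma_addr_t example_map_one(struct device *dev,
				  struct dma_iova_state *state,
				  phys_addr_t phys, size_t size)
{
	if (!dma_iova_try_alloc(dev, state, phys, size))
		return DMA_MAPPING_ERROR;	/* fall back to dma_map_page() */

	/* Fast path: link the physical range into the reserved IOVA. */
	if (dma_iova_link(dev, state, phys, 0, size, DMA_TO_DEVICE, 0)) {
		dma_iova_free(dev, state);
		return DMA_MAPPING_ERROR;
	}

	return state->addr;
}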
Thanks,
baolu