From: Leon Romanovsky <leon@kernel.org>
To: Marek Szyprowski <m.szyprowski@samsung.com>,
Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
Keith Busch <kbusch@kernel.org>
Cc: "Jake Edge" <jake@lwn.net>, "Jonathan Corbet" <corbet@lwn.net>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Zhu Yanjun" <zyjzyj2000@gmail.com>,
"Robin Murphy" <robin.murphy@arm.com>,
"Joerg Roedel" <joro@8bytes.org>, "Will Deacon" <will@kernel.org>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Logan Gunthorpe" <logang@deltatee.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>,
"Kevin Tian" <kevin.tian@intel.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
linux-pci@vger.kernel.org, kvm@vger.kernel.org,
linux-mm@kvack.org, "Niklas Schnelle" <schnelle@linux.ibm.com>,
"Chuck Lever" <chuck.lever@oracle.com>,
"Luis Chamberlain" <mcgrof@kernel.org>,
"Matthew Wilcox" <willy@infradead.org>,
"Dan Williams" <dan.j.williams@intel.com>,
"Kanchan Joshi" <joshi.k@samsung.com>,
"Chaitanya Kulkarni" <kch@nvidia.com>,
"Leon Romanovsky" <leonro@nvidia.com>
Subject: [PATCH v10 19/24] block: don't merge different kinds of P2P transfers in a single bio
Date: Mon, 28 Apr 2025 12:22:25 +0300 [thread overview]
Message-ID: <e31df76edb282f648d22c3c9040320cf18b1652f.1745831017.git.leon@kernel.org> (raw)
In-Reply-To: <cover.1745831017.git.leon@kernel.org>
From: Christoph Hellwig <hch@lst.de>
To get out of the dma mapping helpers having to check every segment for
it's P2P status, ensure that bios either contain P2P transfers or non-P2P
transfers, and that a P2P bio only contains ranges from a single device.
This means we do the page zone access in the bio add path where it should
be still page hot, and will only have do the fairly expensive P2P topology
lookup once per bio down in the dma mapping path, and only for already
marked bios.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
block/bio.c | 17 ++++++++++-------
block/blk-merge.c | 17 +++++++++++------
include/linux/blk_types.h | 2 ++
3 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 3047fa3f4b32..279eac2396bf 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -928,8 +928,6 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
return false;
if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
return false;
- if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
- return false;
*same_page = ((vec_end_addr & PAGE_MASK) == ((page_addr + off) &
PAGE_MASK));
@@ -998,11 +996,16 @@ static int bio_add_page_int(struct bio *bio, struct page *page,
if (bio->bi_iter.bi_size > UINT_MAX - len)
return 0;
- if (bio->bi_vcnt > 0 &&
- bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
- page, len, offset, same_page)) {
- bio->bi_iter.bi_size += len;
- return len;
+ if (bio->bi_vcnt > 0) {
+ struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+
+ if (bvec_try_merge_page(bv, page, len, offset, same_page)) {
+ bio->bi_iter.bi_size += len;
+ return len;
+ }
+ } else {
+ if (is_pci_p2pdma_page(page))
+ bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
}
if (bio->bi_vcnt >= bio->bi_max_vecs)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index fdd4efb54c6c..d9691e900cc6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -320,12 +320,17 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
unsigned nsegs = 0, bytes = 0;
bio_for_each_bvec(bv, bio, iter) {
- /*
- * If the queue doesn't support SG gaps and adding this
- * offset would create a gap, disallow it.
- */
- if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv.bv_offset))
- goto split;
+ if (bvprvp) {
+ /*
+ * If the queue doesn't support SG gaps and adding this
+ * offset would create a gap, disallow it.
+ */
+ if (bvec_gap_to_prev(lim, bvprvp, bv.bv_offset))
+ goto split;
+ } else {
+ if (is_pci_p2pdma_page(bv.bv_page))
+ bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
+ }
if (nsegs < lim->max_segments &&
bytes + bv.bv_len <= max_bytes &&
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7..94cf146e8ce6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -378,6 +378,7 @@ enum req_flag_bits {
__REQ_DRV, /* for driver use */
__REQ_FS_PRIVATE, /* for file system (submitter) use */
__REQ_ATOMIC, /* for atomic write operations */
+ __REQ_P2PDMA, /* contains P2P DMA pages */
/*
* Command specific flags, keep last:
*/
@@ -410,6 +411,7 @@ enum req_flag_bits {
#define REQ_DRV (__force blk_opf_t)(1ULL << __REQ_DRV)
#define REQ_FS_PRIVATE (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
#define REQ_ATOMIC (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
+#define REQ_P2PDMA (__force blk_opf_t)(1ULL << __REQ_P2PDMA)
#define REQ_NOUNMAP (__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
--
2.49.0
next prev parent reply other threads:[~2025-04-28 9:24 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-28 9:22 [PATCH v10 00/24] Provide a new two step DMA mapping API Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 01/24] PCI/P2PDMA: Refactor the p2pdma mapping helpers Leon Romanovsky
2025-04-29 2:08 ` Baolu Lu
2025-04-28 9:22 ` [PATCH v10 02/24] dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h Leon Romanovsky
2025-04-29 2:09 ` Baolu Lu
2025-04-28 9:22 ` [PATCH v10 03/24] iommu: generalize the batched sync after map interface Leon Romanovsky
2025-04-29 2:19 ` Baolu Lu
2025-04-29 6:09 ` Leon Romanovsky
2025-04-29 11:53 ` Jason Gunthorpe
2025-04-28 9:22 ` [PATCH v10 04/24] iommu: add kernel-doc for iommu_unmap_fast Leon Romanovsky
2025-04-29 2:37 ` Baolu Lu
2025-04-28 9:22 ` [PATCH v10 05/24] dma-mapping: Provide an interface to allow allocate IOVA Leon Romanovsky
2025-04-29 3:10 ` Baolu Lu
2025-04-29 5:46 ` Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 06/24] iommu/dma: Factor out a iommu_dma_map_swiotlb helper Leon Romanovsky
2025-04-29 4:58 ` Baolu Lu
2025-04-29 5:53 ` Leon Romanovsky
2025-04-29 5:58 ` Baolu Lu
2025-04-29 6:18 ` Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 07/24] dma-mapping: Implement link/unlink ranges API Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 08/24] dma-mapping: add a dma_need_unmap helper Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 09/24] docs: core-api: document the IOVA-based API Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 10/24] mm/hmm: let users to tag specific PFN with DMA mapped bit Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 11/24] mm/hmm: provide generic DMA managing logic Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 12/24] RDMA/umem: Store ODP access mask information in PFN Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 13/24] RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 14/24] RDMA/umem: Separate implicit ODP initialization from explicit ODP Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 15/24] vfio/mlx5: Explicitly use number of pages instead of allocated length Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 16/24] vfio/mlx5: Rewrite create mkey flow to allow better code reuse Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 17/24] vfio/mlx5: Enable the DMA link API Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 18/24] block: share more code for bio addition helper Leon Romanovsky
2025-04-28 9:22 ` Leon Romanovsky [this message]
2025-04-28 9:22 ` [PATCH v10 20/24] blk-mq: add scatterlist-less DMA mapping helpers Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 21/24] nvme-pci: remove struct nvme_descriptor Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 22/24] nvme-pci: use a better encoding for small prp pool allocations Leon Romanovsky
2025-04-28 9:22 ` [PATCH v10 23/24] nvme-pci: convert to blk_rq_dma_map Leon Romanovsky
2025-04-28 16:46 ` Keith Busch
2025-04-28 17:22 ` Leon Romanovsky
2025-04-28 17:30 ` Keith Busch
2025-04-28 9:22 ` [PATCH v10 24/24] nvme-pci: store aborted state in flags variable Leon Romanovsky
2025-05-12 10:07 ` (subset) [PATCH v10 00/24] Provide a new two step DMA mapping API Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e31df76edb282f648d22c3c9040320cf18b1652f.1745831017.git.leon@kernel.org \
--to=leon@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=axboe@kernel.dk \
--cc=bhelgaas@google.com \
--cc=chuck.lever@oracle.com \
--cc=corbet@lwn.net \
--cc=dan.j.williams@intel.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux.dev \
--cc=jake@lwn.net \
--cc=jgg@ziepe.ca \
--cc=jglisse@redhat.com \
--cc=joro@8bytes.org \
--cc=joshi.k@samsung.com \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=m.szyprowski@samsung.com \
--cc=mcgrof@kernel.org \
--cc=robin.murphy@arm.com \
--cc=sagi@grimberg.me \
--cc=schnelle@linux.ibm.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=yishaih@nvidia.com \
--cc=zyjzyj2000@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox