Date: Sat, 2 Nov 2024 08:39:35 +0100
From: Zhu Yanjun
To: Leon Romanovsky, Jens Axboe, Jason Gunthorpe, Robin Murphy, Joerg Roedel, Will Deacon, Christoph Hellwig, Sagi Grimberg
Cc: Keith Busch, Bjorn Helgaas, Logan Gunthorpe, Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson, Marek Szyprowski, Jérôme Glisse, Andrew Morton, Jonathan Corbet, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 2/7] block: don't merge different kinds of P2P transfers in a single bio
In-Reply-To: <34d44537a65aba6ede215a8ad882aeee028b423a.1730037261.git.leon@kernel.org>
References: <34d44537a65aba6ede215a8ad882aeee028b423a.1730037261.git.leon@kernel.org>

On 2024/10/27 15:21, Leon Romanovsky wrote:
> From: Christoph Hellwig
>
> To get out of the dma mapping helpers having to check every segment for
> it's P2P status, ensure that bios either contain P2P transfers or non-P2P
> transfers, and that a P2P bio only contains ranges from a single device.
>
> This means we do the page zone access in the bio add path where it should
> be still page hot, and will only have do the fairly expensive P2P topology
> lookup once per bio down in the dma mapping path, and only for already
> marked bios.
>
> Signed-off-by: Christoph Hellwig
> Signed-off-by: Leon Romanovsky
> ---
>  block/bio.c               | 36 +++++++++++++++++++++++++++++-------
>  block/blk-map.c           | 32 ++++++++++++++++++++++++--------
>  include/linux/blk_types.h |  2 ++
>  3 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 2d3bc8bfb071..943a6d78cb3e 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -928,8 +928,6 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
>  		return false;
>  	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
>  		return false;
> -	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
> -		return false;
>
>  	*same_page = ((vec_end_addr & PAGE_MASK) == ((page_addr + off) &
>  		     PAGE_MASK));
> @@ -993,6 +991,14 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
>  	if (bio->bi_vcnt > 0) {
>  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>
> +		/*
> +		 * When doing ZONE_DEVICE-based P2P transfers, all pages in a
> +		 * bio must be P2P pages from the same device.
> +		 */
> +		if ((bio->bi_opf & REQ_P2PDMA) &&
> +		    !zone_device_pages_have_same_pgmap(bv->bv_page, page))
> +			return 0;
> +
>  		if (bvec_try_merge_hw_page(q, bv, page, len, offset,
>  				same_page)) {
>  			bio->bi_iter.bi_size += len;
> @@ -1009,6 +1015,9 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
>  		 */
>  		if (bvec_gap_to_prev(&q->limits, bv, offset))
>  			return 0;
> +	} else {
> +		if (is_pci_p2pdma_page(page))
> +			bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
>  	}
>
>  	bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, offset);
> @@ -1133,11 +1142,24 @@ static int bio_add_page_int(struct bio *bio, struct page *page,
>  	if (bio->bi_iter.bi_size > UINT_MAX - len)
>  		return 0;
>
> -	if (bio->bi_vcnt > 0 &&
> -	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
> -				page, len, offset, same_page)) {
> -		bio->bi_iter.bi_size += len;
> -		return len;
> +	if (bio->bi_vcnt > 0) {
> +		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> +
> +		/*
> +		 * When doing ZONE_DEVICE-based P2P transfers, all pages in a
> +		 * bio must be P2P pages from the same device.
> +		 */
> +		if ((bio->bi_opf & REQ_P2PDMA) &&
> +		    !zone_device_pages_have_same_pgmap(bv->bv_page, page))
> +			return 0;
> +
> +		if (bvec_try_merge_page(bv, page, len, offset, same_page)) {
> +			bio->bi_iter.bi_size += len;
> +			return len;
> +		}
> +	} else {
> +		if (is_pci_p2pdma_page(page))
> +			bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
>  	}
>
>  	if (bio->bi_vcnt >= bio->bi_max_vecs)
> diff --git a/block/blk-map.c b/block/blk-map.c
> index 0e1167b23934..03192b1ca6ea 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -568,6 +568,7 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
>  	const struct queue_limits *lim = &q->limits;
>  	unsigned int nsegs = 0, bytes = 0;
>  	struct bio *bio;
> +	int error;
>  	size_t i;
>
>  	if (!nr_iter || (nr_iter >> SECTOR_SHIFT) > queue_max_hw_sectors(q))
> @@ -588,15 +589,30 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
>  	for (i = 0; i < nr_segs; i++) {
>  		struct bio_vec *bv = &bvecs[i];
>
> -		/*
> -		 * If the queue doesn't support SG gaps and adding this
> -		 * offset would create a gap, fallback to copy.
> -		 */
> -		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
> -			blk_mq_map_bio_put(bio);
> -			return -EREMOTEIO;
> +		error = -EREMOTEIO;
> +		if (bvprvp) {
> +			/*
> +			 * If the queue doesn't support SG gaps and adding this
> +			 * offset would create a gap, fallback to copy.
> +			 */
> +			if (bvec_gap_to_prev(lim, bvprvp, bv->bv_offset))
> +				goto put_bio;
> +
> +			/*
> +			 * When doing ZONE_DEVICE-based P2P transfers, all pages
> +			 * in a bio must be P2P pages, and from the same device.
> +			 */
> +			if ((bio->bi_opf & REQ_P2PDMA) &&
> +			    zone_device_pages_have_same_pgmap(bvprvp->bv_page,
> +					bv->bv_page))
> +				goto put_bio;
> +		} else {
> +			if (is_pci_p2pdma_page(bv->bv_page))
> +				bio->bi_opf |= REQ_P2PDMA | REQ_NOMERGE;
>  		}
> +
>  		/* check full condition */
> +		error = -EINVAL;
>  		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len)
>  			goto put_bio;
>  		if (bytes + bv->bv_len > nr_iter)
> @@ -611,7 +627,7 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
>  	return 0;
>  put_bio:
>  	blk_mq_map_bio_put(bio);
> -	return -EINVAL;
> +	return error;
>  }
>
>  /**
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index dce7615c35e7..94cf146e8ce6 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -378,6 +378,7 @@ enum req_flag_bits {
>  	__REQ_DRV,		/* for driver use */
>  	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
>  	__REQ_ATOMIC,		/* for atomic write operations */
> +	__REQ_P2PDMA,		/* contains P2P DMA pages */
>  	/*
>  	 * Command specific flags, keep last:
>  	 */
> @@ -410,6 +411,7 @@ enum req_flag_bits {
>  #define REQ_DRV		(__force blk_opf_t)(1ULL << __REQ_DRV)
>  #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
>  #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
> +#define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)

#define REQ_P2PDMA	(__force blk_opf_t)BIT_ULL(__REQ_P2PDMA)

Use BIT_ULL instead of a direct left shift.

Zhu Yanjun

>
>  #define REQ_NOUNMAP	(__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
>
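
To illustrate the BIT_ULL suggestion, here is a minimal userspace sketch. It assumes BIT_ULL(nr) expands to a 64-bit 1 shifted left by nr, which is how the kernel's bits header defines it. The bit positions, the SKETCH_ and REQ_P2PDMA_SHIFT/REQ_P2PDMA_BIT names, and the helper functions are hypothetical and exist only for this illustration; they are not part of the patch or of blk_types.h.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL() (assumed definition). */
#define BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical bit positions; the real enum req_flag_bits differs. */
enum req_flag_bits_sketch {
	SKETCH_REQ_ATOMIC = 8,
	SKETCH_REQ_P2PDMA,	/* 9 in this sketch */
};

/* Spelling used in the patch vs. the spelling suggested in the review. */
#define REQ_P2PDMA_SHIFT ((uint64_t)(1ULL << SKETCH_REQ_P2PDMA))
#define REQ_P2PDMA_BIT   ((uint64_t)BIT_ULL(SKETCH_REQ_P2PDMA))

/* Both spellings produce the same mask; only readability changes. */
_Static_assert(REQ_P2PDMA_SHIFT == REQ_P2PDMA_BIT,
	       "open-coded shift and BIT_ULL() must agree");

/* Hypothetical helper: test whether the P2PDMA flag is set in opf. */
static inline int opf_has_p2pdma(uint64_t opf)
{
	return (opf & REQ_P2PDMA_BIT) != 0;
}

/* Small runtime check of setting and testing the flag. */
static inline void req_p2pdma_selfcheck(void)
{
	uint64_t opf = 0;

	assert(!opf_has_p2pdma(opf));
	opf |= REQ_P2PDMA_BIT;
	assert(opf_has_p2pdma(opf));
}
```

The generated mask is identical either way, so the change is purely about readability. Note that the neighbouring REQ_* defines in the quoted hunk use the open-coded shift, so switching only this one define would make it inconsistent with them.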