From mboxrd@z Thu Jan 1 00:00:00 1970
From: Leon Romanovsky <leon@kernel.org>
To: Jens Axboe, Jason Gunthorpe, Robin Murphy, Joerg Roedel, Will Deacon,
	Christoph Hellwig, Sagi Grimberg
Cc: Keith Busch, Bjorn Helgaas, Logan Gunthorpe, Yishai Hadas,
	Shameer Kolothum, Kevin Tian, Alex Williamson, Marek Szyprowski,
	Jérôme Glisse, Andrew Morton, Jonathan Corbet,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 3/7] blk-mq: add a dma mapping iterator
Date: Sun, 27 Oct 2024 16:21:56 +0200
Message-ID: <9b7c6203fbbae90821f2220dba1bd600e3bd23ba.1730037261.git.leon@kernel.org>
X-Mailer: git-send-email 2.46.2
In-Reply-To: 
References: 
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Christoph Hellwig

blk_rq_map_sg is a maze of nested loops.  Untangle it by creating an
iterator that returns [paddr,len] tuples for DMA mapping, and then
implement the DMA logic on top of this.  This not only removes code at
the source level, but also generates nicer binary code:

$ size block/blk-merge.o.*
   text	   data	    bss	    dec	    hex	filename
  10001	    432	      0	  10433	   28c1	block/blk-merge.o.new
  10317	    468	      0	  10785	   2a21	block/blk-merge.o.old

Last but not least it will be used as a building block for a new DMA
mapping helper that doesn't rely on struct scatterlist.

Signed-off-by: Christoph Hellwig
Signed-off-by: Leon Romanovsky
---
 block/blk-merge.c | 182 ++++++++++++++++++++--------------------------
 1 file changed, 77 insertions(+), 105 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index ad763ec313b6..b63fd754a5de 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -451,137 +451,109 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
 	return nr_phys_segs;
 }
 
-static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
-		struct scatterlist *sglist)
+struct phys_vec {
+	phys_addr_t paddr;
+	u32 len;
+};
+
+static bool blk_map_iter_next(struct request *req,
+		struct req_iterator *iter, struct phys_vec *vec)
 {
-	if (!*sg)
-		return sglist;
+	unsigned int max_size;
+	struct bio_vec bv;
 
 	/*
-	 * If the driver previously mapped a shorter list, we could see a
-	 * termination bit prematurely unless it fully inits the sg table
-	 * on each mapping. We KNOW that there must be more entries here
-	 * or the driver would be buggy, so force clear the termination bit
-	 * to avoid doing a full sg_init_table() in drivers for each command.
+	 * For special payload requests there is only a single segment.
+	 * Return it now and make sure blk_map_iter_next stops iterating.
 	 */
-	sg_unmark_end(*sg);
-	return sg_next(*sg);
-}
+	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
+		if (!iter->bio)
+			return false;
+		vec->paddr = bvec_phys(&req->special_vec);
+		vec->len = req->special_vec.bv_len;
+		iter->bio = NULL;
+		return true;
+	}
 
-static unsigned blk_bvec_map_sg(struct request_queue *q,
-		struct bio_vec *bvec, struct scatterlist *sglist,
-		struct scatterlist **sg)
-{
-	unsigned nbytes = bvec->bv_len;
-	unsigned nsegs = 0, total = 0;
+	if (!iter->iter.bi_size)
+		return false;
 
-	while (nbytes > 0) {
-		unsigned offset = bvec->bv_offset + total;
-		unsigned len = get_max_segment_size(&q->limits,
-				bvec_phys(bvec) + total, nbytes);
-		struct page *page = bvec->bv_page;
+	bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
+	vec->paddr = bvec_phys(&bv);
+	max_size = get_max_segment_size(&req->q->limits, vec->paddr, UINT_MAX);
+	bv.bv_len = min(bv.bv_len, max_size);
+	bio_advance_iter_single(iter->bio, &iter->iter, bv.bv_len);
 
-		/*
-		 * Unfortunately a fair number of drivers barf on scatterlists
-		 * that have an offset larger than PAGE_SIZE, despite other
-		 * subsystems dealing with that invariant just fine. For now
-		 * stick to the legacy format where we never present those from
-		 * the block layer, but the code below should be removed once
-		 * these offenders (mostly MMC/SD drivers) are fixed.
-		 */
-		page += (offset >> PAGE_SHIFT);
-		offset &= ~PAGE_MASK;
+	/*
+	 * If we are entirely done with this bi_io_vec entry, check if the next
+	 * one could be merged into it. This typically happens when moving to
+	 * the next bio, but some callers also don't pack bvecs tight.
+	 */
+	while (!iter->iter.bi_size || !iter->iter.bi_bvec_done) {
+		struct bio_vec next;
+
+		if (!iter->iter.bi_size) {
+			if (!iter->bio->bi_next)
+				break;
+			iter->bio = iter->bio->bi_next;
+			iter->iter = iter->bio->bi_iter;
+		}
 
-		*sg = blk_next_sg(sg, sglist);
-		sg_set_page(*sg, page, len, offset);
+		next = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
+		if (bv.bv_len + next.bv_len > max_size ||
+		    !biovec_phys_mergeable(req->q, &bv, &next))
+			break;
 
-		total += len;
-		nbytes -= len;
-		nsegs++;
+		bv.bv_len += next.bv_len;
+		bio_advance_iter_single(iter->bio, &iter->iter, next.bv_len);
 	}
 
-	return nsegs;
+	vec->len = bv.bv_len;
+	return true;
 }
 
-static inline int __blk_bvec_map_sg(struct bio_vec bv,
-		struct scatterlist *sglist, struct scatterlist **sg)
-{
-	*sg = blk_next_sg(sg, sglist);
-	sg_set_page(*sg, bv.bv_page, bv.bv_len, bv.bv_offset);
-	return 1;
-}
+#define blk_phys_to_page(_paddr)				\
+	(pfn_to_page(__phys_to_pfn(_paddr)))
 
-/* only try to merge bvecs into one sg if they are from two bios */
-static inline bool
-__blk_segment_map_sg_merge(struct request_queue *q, struct bio_vec *bvec,
-			   struct bio_vec *bvprv, struct scatterlist **sg)
+static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
+		struct scatterlist *sglist)
 {
-
-	int nbytes = bvec->bv_len;
-
 	if (!*sg)
-		return false;
-
-	if ((*sg)->length + nbytes > queue_max_segment_size(q))
-		return false;
-
-	if (!biovec_phys_mergeable(q, bvprv, bvec))
-		return false;
-
-	(*sg)->length += nbytes;
-
-	return true;
-}
-
-static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
-		struct scatterlist *sglist,
-		struct scatterlist **sg)
-{
-	struct bio_vec bvec, bvprv = { NULL };
-	struct bvec_iter iter;
-	int nsegs = 0;
-	bool new_bio = false;
-
-	for_each_bio(bio) {
-		bio_for_each_bvec(bvec, bio, iter) {
-			/*
-			 * Only try to merge bvecs from two bios given we
-			 * have done bio internal merge when adding pages
-			 * to bio
-			 */
-			if (new_bio &&
-			    __blk_segment_map_sg_merge(q, &bvec,
-						       &bvprv, sg))
-				goto next_bvec;
-
-			if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE)
-				nsegs += __blk_bvec_map_sg(bvec, sglist, sg);
-			else
-				nsegs += blk_bvec_map_sg(q, &bvec, sglist, sg);
- next_bvec:
-			new_bio = false;
-		}
-		if (likely(bio->bi_iter.bi_size)) {
-			bvprv = bvec;
-			new_bio = true;
-		}
-	}
+		return sglist;
 
-	return nsegs;
+	/*
+	 * If the driver previously mapped a shorter list, we could see a
+	 * termination bit prematurely unless it fully inits the sg table
+	 * on each mapping. We KNOW that there must be more entries here
+	 * or the driver would be buggy, so force clear the termination bit
+	 * to avoid doing a full sg_init_table() in drivers for each command.
+	 */
+	sg_unmark_end(*sg);
+	return sg_next(*sg);
 }
 
 /*
- * map a request to scatterlist, return number of sg entries setup. Caller
- * must make sure sg can hold rq->nr_phys_segments entries
+ * Map a request to scatterlist, return number of sg entries setup. Caller
+ * must make sure sg can hold rq->nr_phys_segments entries.
  */
 int __blk_rq_map_sg(struct request_queue *q, struct request *rq,
 		struct scatterlist *sglist, struct scatterlist **last_sg)
 {
+	struct req_iterator iter = {
+		.bio	= rq->bio,
+		.iter	= rq->bio->bi_iter,
+	};
+	struct phys_vec vec;
 	int nsegs = 0;
 
-	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
-		nsegs = __blk_bvec_map_sg(rq->special_vec, sglist, last_sg);
-	else if (rq->bio)
-		nsegs = __blk_bios_map_sg(q, rq->bio, sglist, last_sg);
+	while (blk_map_iter_next(rq, &iter, &vec)) {
+		struct page *page = blk_phys_to_page(vec.paddr);
+		unsigned int offset = offset_in_page(vec.paddr);
+
+		*last_sg = blk_next_sg(last_sg, sglist);
+		sg_set_page(*last_sg, page, vec.len, offset);
+		nsegs++;
+	}
 
 	if (*last_sg)
 		sg_mark_end(*last_sg);
-- 
2.46.2