Date: Wed, 6 Mar 2024 17:20:22 +0100
From: Christoph Hellwig
To: Jason Gunthorpe
Cc: Christoph Hellwig, Leon Romanovsky, Robin Murphy, Marek Szyprowski,
	Joerg Roedel, Will Deacon, Chaitanya Kulkarni, Jonathan Corbet,
	Jens Axboe, Keith Busch, Sagi Grimberg, Yishai Hadas,
	Shameer Kolothum, Kevin Tian, Alex Williamson, Jérôme Glisse,
	Andrew Morton, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
	linux-nvme@lists.infradead.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, Bart Van Assche, Damien Le Moal,
	Amir Goldstein, "josef@toxicpanda.com", "Martin K. Petersen",
	"daniel@iogearbox.net", Dan Williams, "jack@suse.com", Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240306162022.GB28427@lst.de>
References: <47afacda-3023-4eb7-b227-5f725c3187c2@arm.com>
	<20240305122935.GB36868@unreal> <20240306144416.GB19711@lst.de>
	<20240306154328.GM9225@ziepe.ca>
In-Reply-To: <20240306154328.GM9225@ziepe.ca>

On Wed, Mar 06, 2024 at 11:43:28AM -0400, Jason Gunthorpe wrote:
> I don't think they are so fundamentally different, at least in our
> past conversations I never came out with the idea we should burden the
> driver with two different flows based on what kind of alignment the
> transfer happens to have.

Then we talked past each other..

> At least the RDMA drivers could productively use just a page aligned
> interface. But I didn't think this would make BIO users happy so never
> even thought about it..

Page aligned is generally the right thing for the block layer.  NVMe
for example already requires that anyway due to PRPs.

> > The total transfer size should just be passed in by the callers and
> > be known, and there should be no offset.
>
> The API needs the caller to figure out the total number of IOVA pages
> it needs, rounding up the CPU ranges to full aligned pages. That
> becomes the IOVA allocation.

Yes, it's a basic align up to the granularity, assuming we don't bother
with non-aligned transfers.

> > So if we want to efficiently be able to handle these cases we need
> > two APIs in the driver and a good framework to switch between them.
>
> But, what does the non-page-aligned version look like? Doesn't it
> still look basically like this?

I'd just rather have the non-aligned case for those who really need it
be the loop over map single region that is needed for the direct
mapping anyway.

> And what is the actual difference if the input is aligned? The caller
> can assume it doesn't need to provide a per-range dma_addr_t during
> unmap.

A per-range dma_addr_t doesn't really make sense for the aligned and
coalesced case.

> It still can't assume the HW programming will be linear due to the P2P
> !ACS support.
>
> And it still has to call an API per-cpu range to actually program the
> IOMMU.
>
> So are they really so different to want different APIs? That strikes
> me as a big driver cost.

To not have to store a dma_address range per CPU range that doesn't
actually get used at all.
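
For readers following along, the "basic align up to the granularity" step
mentioned above can be illustrated with a small user-space sketch.  This is
not code from the patch series: struct cpu_range, granule_align_down(),
granule_align_up() and iova_bytes_needed() are hypothetical names made up
for illustration, and a power-of-two granule (e.g. 4096) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one CPU range; not a kernel structure. */
struct cpu_range {
	uint64_t addr;	/* start address of the range */
	uint64_t len;	/* length in bytes */
};

/* Round x down to a multiple of granule (granule must be a power of two). */
static uint64_t granule_align_down(uint64_t x, uint64_t granule)
{
	return x & ~(granule - 1);
}

/* Round x up to a multiple of granule (granule must be a power of two). */
static uint64_t granule_align_up(uint64_t x, uint64_t granule)
{
	return (x + granule - 1) & ~(granule - 1);
}

/*
 * Sum the IOVA space, in bytes, needed when every non-empty range is
 * expanded to full granules.  A caller would allocate this much IOVA
 * space once, up front, for the whole transfer.
 */
static uint64_t iova_bytes_needed(const struct cpu_range *r, size_t n,
				  uint64_t granule)
{
	uint64_t total = 0;
	size_t i;

	/* The masking above is only valid for power-of-two granules. */
	assert(granule && (granule & (granule - 1)) == 0);

	for (i = 0; i < n; i++) {
		uint64_t start, end;

		if (!r[i].len)
			continue;
		start = granule_align_down(r[i].addr, granule);
		end = granule_align_up(r[i].addr + r[i].len, granule);
		total += end - start;
	}
	return total;
}
```

With a 4096-byte granule, an already aligned range contributes exactly its
own length, while a small unaligned range that straddles a granule boundary
contributes two full granules.  That rounding cost is what the aligned-only
interface avoids having to reason about per range.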