Date: Wed, 6 Mar 2024 11:43:28 -0400
From: Jason Gunthorpe
To: Christoph Hellwig
Cc: Leon Romanovsky, Robin Murphy, Marek Szyprowski, Joerg Roedel, Will Deacon, Chaitanya Kulkarni, Jonathan Corbet, Jens Axboe, Keith Busch, Sagi Grimberg, Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson, Jérôme Glisse, Andrew Morton, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, Bart Van Assche, Damien Le Moal, Amir Goldstein, josef@toxicpanda.com, Martin K. Petersen, daniel@iogearbox.net, Dan Williams, jack@suse.com, Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240306154328.GM9225@ziepe.ca>
In-Reply-To: <20240306144416.GB19711@lst.de>

On Wed, Mar 06, 2024 at 03:44:16PM +0100, Christoph Hellwig wrote:
> Except that the flows are fundamentally different for the "can coalesce"
> vs "can't coalesce" case. In the former we have one dma_addr_t range,
> and in the latter as many as there are input vectors (this is ignoring
> the weird iommu merging case where we coalesce some but not all
> segments, but I'd rather not have that in a new API).

I don't think they are so fundamentally different. At least in our past
conversations, I never came away with the idea that we should burden the
driver with two different flows based on what kind of alignment the
transfer happens to have.

Certainly, if we split the API so that one API handles only page-aligned
transfers, the aligned part does become a little simpler. At least the
RDMA drivers could productively use just a page-aligned interface, but I
didn't think this would make BIO users happy, so I never really
considered it.

> The total transfer size should just be passed in by the callers and
> be known, and there should be no offset.

The API needs the caller to figure out the total number of IOVA pages it
needs by rounding each CPU range out to full, aligned pages. That total
becomes the IOVA allocation. The offset arises to support non-aligned
transfers.

> So if we want to efficiently be able to handle these cases we need
> two APIs in the driver and a good framework to switch between them.

But what does the non-page-aligned version look like? Doesn't it still
look basically like this?

And what is the actual difference if the input is aligned? The caller can
assume it doesn't need to provide a per-range dma_addr_t during unmap. It
still can't assume the HW programming will be linear, because of the P2P
!ACS support. And it still has to call an API per CPU range to actually
program the IOMMU.
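
A minimal user-space sketch of the bookkeeping described above, assuming 4 KiB pages and an IOMMU granule equal to the page size. The names (cpu_range, range_pages, iova_pages_needed, range_dma_addr, can_coalesce) are hypothetical and are not part of any kernel API. It illustrates one flow serving both cases: each CPU range is rounded out to whole pages to size the IOVA allocation, each range's DMA address is its page slot plus its in-page offset, and when every interior boundary is page aligned the same layout happens to produce one contiguous dma_addr_t span.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and an IOMMU granule equal to the page size. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_OFFSET_MASK (SKETCH_PAGE_SIZE - 1)

/* Hypothetical description of one CPU physical range to map. */
struct cpu_range {
	uint64_t phys;
	uint64_t len;
};

/* IOVA pages one range occupies once rounded out to page boundaries. */
uint64_t range_pages(const struct cpu_range *r)
{
	uint64_t start = r->phys & ~SKETCH_OFFSET_MASK;
	uint64_t end = (r->phys + r->len + SKETCH_OFFSET_MASK) & ~SKETCH_OFFSET_MASK;

	return (end - start) / SKETCH_PAGE_SIZE;
}

/* Size of the IOVA allocation: the sum of the rounded-out ranges. */
uint64_t iova_pages_needed(const struct cpu_range *r, size_t n)
{
	uint64_t pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += range_pages(&r[i]);
	return pages;
}

/*
 * DMA address of range i inside an IOVA block starting at iova_base:
 * skip the pages used by earlier ranges, then add the in-page offset.
 * The same formula serves aligned and non-aligned input.
 */
uint64_t range_dma_addr(uint64_t iova_base, const struct cpu_range *r,
			size_t n, size_t i)
{
	uint64_t slot = 0;

	assert(i < n);
	for (size_t k = 0; k < i; k++)
		slot += range_pages(&r[k]);
	return iova_base + slot * SKETCH_PAGE_SIZE + (r[i].phys & SKETCH_OFFSET_MASK);
}

/*
 * For non-empty ranges, the layout above yields one contiguous dma_addr_t
 * span when every interior boundary is page aligned: all but the first
 * range start on a page boundary and all but the last end on one.
 */
bool can_coalesce(const struct cpu_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && (r[i].phys & SKETCH_OFFSET_MASK))
			return false;
		if (i + 1 < n && ((r[i].phys + r[i].len) & SKETCH_OFFSET_MASK))
			return false;
	}
	return true;
}
```

For example, the ranges {0x10000200, 0xe00}, {0x20000000, 0x2000} and {0x30000000, 0x100} need 4 IOVA pages; with a hypothetical IOVA base of 0x80000000 they land at 0x80000200, 0x80001000 and 0x80003000, back to back, so can_coalesce() reports true. If the second range were instead {0x20000040, 0x100}, it would land at 0x80001040 and leave a gap after the first range, so the caller would need per-range addresses. The mapping code itself is unchanged.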

So are they really different enough to warrant different APIs? That
strikes me as a big cost for drivers.

> I'd still prefer to wrap it with dma callers to handle things like
> swiotlb and maybe Xen grant tables and to avoid the type confusion
> between dma_addr_t and then untyped iova in the iommu layer, but
> having this layer or not is probably worth a discussion.

I'm surprised by the idea of random drivers reaching past dma-iommu.c and
into the iommu layer to set up DMA directly on the DMA API's
iommu_domain. That seems like completely giving up on the DMA API
abstraction to me. :(

IMHO, it needs to be wrapped, and at a minimum the wrapper needs to do all
the special P2P handling. The wrapper should multiplex to all the
non-IOMMU cases for the driver too.

We still need to achieve some kind of abstraction here that doesn't
burden every driver with different code paths for each DMA back end!
Don't we?

Jason
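
The wrapper argued for above could look roughly like the following sketch, in which the driver makes one call per CPU range and the wrapper picks the back end. Everything here is hypothetical (sketch_state, sketch_range, sketch_link_range, the backend enum, and the p2p_no_acs and bus_offset fields). It assumes 4 KiB pages, omits swiotlb and Xen, and only computes addresses where a real implementation would program IOMMU page tables. It also shows why the returned addresses need not be linear: a peer-to-peer range routed directly through a switch (the P2P !ACS case) gets a PCI bus address outside the IOVA block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, IOMMU granule equal to the page size. */
#define WRAP_PAGE_SIZE 4096ULL

/* Hypothetical back ends the wrapper hides from the driver. */
enum sketch_backend {
	BACKEND_DIRECT, /* no IOMMU: the DMA address is the physical address */
	BACKEND_IOMMU,  /* map into a pre-allocated IOVA block */
};

/* Hypothetical per-device mapping state owned by the wrapper. */
struct sketch_state {
	enum sketch_backend backend;
	uint64_t iova_base; /* start of the IOVA block reserved up front */
	uint64_t iova_size; /* bytes reserved */
	uint64_t iova_used; /* bytes already handed out */
};

/* Hypothetical description of one CPU range passed in by the driver. */
struct sketch_range {
	uint64_t phys;
	uint64_t len;
	bool p2p_no_acs;     /* peer BAR reached directly through a switch */
	uint64_t bus_offset; /* CPU physical address minus PCI bus address */
};

/* One call per CPU range; returns the DMA address of its first byte. */
uint64_t sketch_link_range(struct sketch_state *s, const struct sketch_range *r)
{
	/* P2P traffic that never reaches the IOMMU uses the PCI bus address. */
	if (r->p2p_no_acs)
		return r->phys - r->bus_offset;

	switch (s->backend) {
	case BACKEND_DIRECT:
		return r->phys;
	case BACKEND_IOMMU: {
		uint64_t start = r->phys & ~(WRAP_PAGE_SIZE - 1);
		uint64_t end = (r->phys + r->len + WRAP_PAGE_SIZE - 1) &
			       ~(WRAP_PAGE_SIZE - 1);
		uint64_t dma = s->iova_base + s->iova_used + (r->phys - start);

		assert(s->iova_used + (end - start) <= s->iova_size);
		/* A real implementation would program IOMMU PTEs here. */
		s->iova_used += end - start;
		return dma;
	}
	}
	return 0; /* unreachable for valid back ends */
}
```

With an IOMMU back end, a normal range, a P2P range and another normal range come back as IOVA, bus address, IOVA, so the driver still programs its hardware per range, but it never has to know which back end produced each address.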