Date: Wed, 27 Mar 2024 14:14:24 -0300
From: Jason Gunthorpe
To: Christoph Hellwig
Cc: Leon Romanovsky, Robin Murphy, Marek Szyprowski, Joerg Roedel,
	Will Deacon, Chaitanya Kulkarni, Jonathan Corbet, Jens Axboe,
	Keith Busch, Sagi Grimberg, Yishai Hadas, Shameer Kolothum,
	Kevin Tian, Alex Williamson, Jérôme Glisse, Andrew Morton,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, Bart Van Assche,
	Damien Le Moal, Amir Goldstein, "josef@toxicpanda.com",
	"Martin K. Petersen", "daniel@iogearbox.net", Dan Williams,
	"jack@suse.com", Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240327171424.GI8419@ziepe.ca>
References: <20240307000036.GP9225@ziepe.ca> <20240307150505.GA28978@lst.de>
	<20240307210116.GQ9225@ziepe.ca> <20240308164920.GA17991@lst.de>
	<20240308202342.GZ9225@ziepe.ca> <20240309161418.GA27113@lst.de>
	<20240319153620.GB66976@ziepe.ca> <20240321223910.GA22663@lst.de>
	<20240322184330.GL66976@ziepe.ca> <20240324232215.GC20765@lst.de>
In-Reply-To: <20240324232215.GC20765@lst.de>

On Mon, Mar 25, 2024 at 12:22:15AM +0100, Christoph Hellwig wrote:
> On Fri, Mar 22, 2024 at 03:43:30PM -0300, Jason Gunthorpe wrote:
> > If we are going to make caller provided uniformity a requirement, let's
> > imagine a formal memory type idea to help keep this a little
> > abstracted?
> >
> >  DMA_MEMORY_TYPE_NORMAL
> >  DMA_MEMORY_TYPE_P2P_NOT_ACS
> >  DMA_MEMORY_TYPE_ENCRYPTED
> >  DMA_MEMORY_TYPE_BOUNCE_BUFFER  // ??
> >
> > Then maybe the driver flow looks like:
> >
> > 	if (transaction.memory_type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev)) {
>
> Add a nice helper to make this somewhat readable, but yes.
>
> > 	} else if (transaction.memory_type == DMA_MEMORY_TYPE_P2P_NOT_ACS) {
> > 		num_hwsgls = transaction.num_sgls;
> > 		for_each_range(transaction, range) {
> > 			hwsgl[i].addr = dma_api_p2p_not_acs_map(range.start_physical, range.length, p2p_memory_provider);
> > 			hwsgl[i].len = range.size;
> > 		}
> > 	} else {
> > 		/* Must be DMA_MEMORY_TYPE_NORMAL, DMA_MEMORY_TYPE_ENCRYPTED, DMA_MEMORY_TYPE_BOUNCE_BUFFER? */
> > 		num_hwsgls = transaction.num_sgls;
> > 		for_each_range(transaction, range) {
> > 			hwsgl[i].addr = dma_api_map_cpu_page(range.start_page, range.length);
> > 			hwsgl[i].len = range.size;
> > 		}
> >
>
> And these two are really the same except that we call a different map
> helper underneath.
> So I think as far as the driver is concerned
> they should be the same, the DMA API just needs to key off the
> memory type.

Yeah.. If the caller is going to have to compute the memory type of the
range then let's pass it to the helper

   dma_api_map_memory_type(transaction.memory_type, range.start_page, range.length);

Then we can just hide all the differences under the API without doing
duplicated work.

Function names need some work ...

> > > > So I take it as a requirement that RDMA MUST make single MR's out of a
> > > > hodgepodge of page types. RDMA MRs cannot be split. Multiple MR's are
> > > > not a functional replacement for a single MR.
> > >
> > > But MRs consolidate multiple dma addresses anyway.
> >
> > I'm not sure I understand this?
>
> The RDMA MRs take a list of PFNish addresses, (or SGLs with the
> enhanced MRs from Mellanox) and give you back a single rkey/lkey.

Yes, that is the desire.

> > To go back to my main thesis - I would like a high performance low
> > level DMA API that is capable enough that it could implement
> > scatterlist dma_map_sg() and thus also implement any future
> > scatterlist_v2, bio, hmm_range_fault or any other thing we come up
> > with on top of it. This is broadly what I thought we agreed to at LSF
> > last year.
>
> I think the biggest underlying problem of the scatterlist based
> DMA implementation for IOMMUs is that it's trying to handle too much,
> that is magic coalescing even if the segment boundaries don't align
> with the IOMMU page size.  If we can get rid of that misfeature I
> think we'd greatly simplify the API and implementation.

Yeah, that stuff is not easy at all and takes extra computation to
figure out. I always assumed it was there for block...

Leon & Chaitanya will make an RFC v2 along these lines, let's see how it
goes.

Thanks,
Jason
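
For illustration only, the dma_api_map_memory_type() idea discussed above
could be sketched as a userspace mock along these lines. Nothing here is
an existing kernel interface: the enum names and the helper name are taken
from the thread, while every mock_* type, function and constant is a
hypothetical stand-in, the helper takes a plain physical address instead
of a page, and the "mapping" is just arithmetic. The point is only that
the driver loop stays identical for every memory type and the per-type
difference lives behind one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory types floated in the thread; the numeric values are arbitrary. */
enum dma_memory_type {
	DMA_MEMORY_TYPE_NORMAL,
	DMA_MEMORY_TYPE_P2P_NOT_ACS,
	DMA_MEMORY_TYPE_ENCRYPTED,
	DMA_MEMORY_TYPE_BOUNCE_BUFFER,
};

/* Hypothetical stand-ins, not kernel definitions. */
typedef uint64_t mock_dma_addr_t;
#define MOCK_DMA_MAPPING_ERROR	(~(mock_dma_addr_t)0)
#define MOCK_P2P_BUS_OFFSET	0x100000000ULL

struct mock_range { uint64_t start; size_t length; };
struct mock_hwsgl { mock_dma_addr_t addr; size_t len; };

/* Mock backend for CPU memory: identity mapping. */
static mock_dma_addr_t mock_map_cpu_page(uint64_t start, size_t length)
{
	(void)length;
	return start;
}

/* Mock backend for P2P without ACS: pretend bus addresses are offset. */
static mock_dma_addr_t mock_map_p2p_not_acs(uint64_t start, size_t length)
{
	(void)length;
	return start + MOCK_P2P_BUS_OFFSET;
}

/* Single entry point: the API keys off the caller-provided memory type. */
mock_dma_addr_t dma_api_map_memory_type(enum dma_memory_type type,
					uint64_t start, size_t length)
{
	assert(length > 0);

	switch (type) {
	case DMA_MEMORY_TYPE_P2P_NOT_ACS:
		return mock_map_p2p_not_acs(start, length);
	case DMA_MEMORY_TYPE_NORMAL:
	case DMA_MEMORY_TYPE_ENCRYPTED:
	case DMA_MEMORY_TYPE_BOUNCE_BUFFER:
		return mock_map_cpu_page(start, length);
	}
	return MOCK_DMA_MAPPING_ERROR;	/* unknown type */
}

/* Driver side: one loop, no per-type branches. Returns entries filled. */
size_t mock_fill_hwsgl(enum dma_memory_type type,
		       const struct mock_range *ranges, size_t n,
		       struct mock_hwsgl *hwsgl)
{
	for (size_t i = 0; i < n; i++) {
		hwsgl[i].addr = dma_api_map_memory_type(type, ranges[i].start,
							ranges[i].length);
		hwsgl[i].len = ranges[i].length;
	}
	return n;
}
```

A caller would compute the memory type once per transaction and then call
mock_fill_hwsgl() with its range array; whether a real API would take a
page, a physical address or something else is exactly the kind of detail
left open in the discussion.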