From: wangtao
To: Christoph Hellwig, Christian König
CC: sumit.semwal@linaro.org, kraxel@redhat.com, vivek.kasireddy@intel.com, viro@zeniv.linux.org.uk, brauner@kernel.org, hughd@google.com, akpm@linux-foundation.org, amir73il@gmail.com, benjamin.gaignard@collabora.com, Brian.Starkey@arm.com, jstultz@google.com, tjmercier@google.com, jack@suse.cz, baolin.wang@linux.alibaba.com, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, wangbintian(BintianWang), yipengxiang, liulu 00013167, hanfeng 00012985
Subject: RE: [PATCH v4 0/4] Implement dmabuf direct I/O via copy_file_range
Date: Mon, 9 Jun 2025 09:32:20 +0000
Message-ID: <761986ec0f404856b6f21c3feca67012@honor.com>
References: <20250603095245.17478-1-tao.wangtao@honor.com> <09c8fb7c-a337-4813-9f44-3a538c4ee8b1@amd.com> <5d36abace6bf492aadd847f0fabc38be@honor.com>
> -----Original Message-----
> From: Christoph Hellwig
> Sent: Monday, June 9, 2025 12:35 PM
> To: Christian König
> Cc: wangtao; Christoph Hellwig; sumit.semwal@linaro.org; kraxel@redhat.com;
> vivek.kasireddy@intel.com; viro@zeniv.linux.org.uk; brauner@kernel.org;
> hughd@google.com; akpm@linux-foundation.org; amir73il@gmail.com;
> benjamin.gaignard@collabora.com; Brian.Starkey@arm.com;
> jstultz@google.com; tjmercier@google.com; jack@suse.cz;
> baolin.wang@linux.alibaba.com; linux-media@vger.kernel.org;
> dri-devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org;
> linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-mm@kvack.org; wangbintian(BintianWang); yipengxiang;
> liulu 00013167; hanfeng 00012985
> Subject: Re: [PATCH v4 0/4] Implement dmabuf direct I/O via
> copy_file_range
>
> On Fri, Jun 06, 2025 at 01:20:48PM +0200, Christian König wrote:
> > > dmabuf acts as a driver and shouldn't be handled by VFS, so I made
> > > dmabuf implement copy_file_range callbacks to support direct I/O
> > > zero-copy. I'm open to both approaches. What's the preference of VFS
> > > experts?
> >
> > That would probably be illegal. Using the sg_table in the DMA-buf
> > implementation turned out to be a mistake.
>
> Two thing here that should not be directly conflated. Using the sg_table was
> a huge mistake, and we should try to move dmabuf to switch that to a pure

I'm a bit confused: don't dmabuf importers need to traverse the sg_table to access the folios or the dma_addr/len pairs? Do you mean restricting sg_table access (e.g., allowing it only through iov_iter), or are you proposing an alternative approach?

> dma_addr_t/len array now that the new DMA API supporting that has been
> merged.
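To make the sg_table versus dma_addr_t/len array distinction concrete, here is a minimal userspace sketch of an importer walking a flat array of (DMA address, length) segments. With an sg_table, an importer gets the same two values per segment through for_each_sgtable_dma_sg(), sg_dma_address() and sg_dma_len(). Everything in the block is hypothetical: struct dma_seg and both helpers are illustrative names, not part of the new DMA API or of dma-buf, and dma_addr_t is modelled as a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's dma_addr_t (assumption: 64-bit). */
typedef uint64_t dma_addr_t;

/*
 * Hypothetical flat segment descriptor: one DMA address plus a length.
 * Not an existing kernel or dma-buf type; it only models the
 * "dma_addr_t/len array" idea.
 */
struct dma_seg {
	dma_addr_t addr;
	size_t len;
};

/* Sum the lengths of all segments, i.e. the mapped size of the buffer. */
static size_t dma_segs_total_len(const struct dma_seg *segs, size_t nr)
{
	size_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += segs[i].len;
	return total;
}

/*
 * Translate a byte offset within the buffer into a DMA address by
 * walking the segments in order. Returns 0 and stores the address in
 * *out on success, or -1 if offset is at or past the end of the buffer.
 */
static int dma_segs_offset_to_addr(const struct dma_seg *segs, size_t nr,
				   size_t offset, dma_addr_t *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (offset < segs[i].len) {
			*out = segs[i].addr + offset;
			return 0;
		}
		offset -= segs[i].len;
	}
	return -1;
}
```

For example, with segments {0x100000, 4096}, {0x200000, 8192} and {0x400000, 4096}, the total length is 16384 bytes, and offset 5000 falls 904 bytes into the second segment, i.e. DMA address 0x200388. Such an array carries no struct page or folio pointers, so it only covers the dma_addr/len half of the question above.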
> Is there any chance the dma-buf maintainers could start to kick this
> off? I'm of course happy to assist.
>
> But that notwithstanding, dma-buf is THE buffer sharing mechanism in the
> kernel, and we should promote it instead of reinventing it badly.
> And there is a use case for having a fully DMA mapped buffer in the block
> layer and I/O path, especially on systems with an IOMMU.
> So having an iov_iter backed by a dma-buf would be extremely helpful.
> That's mostly lib/iov_iter.c code, not VFS, though.

Are you suggesting adding an ITER_DMABUF type to iov_iter, or implementing a dmabuf-to-iov_bvec conversion within iov_iter?

>
> > The question Christoph raised was rather why is your CPU so slow that
> > walking the page tables has a significant overhead compared to the
> > actual I/O?
>
> Yes, that's really puzzling and should be addressed first.

On a fast CPU (tested at 3 GHz), GUP (get_user_pages) overhead is relatively low, as the 3 GHz results show. I/O% is throughput relative to row 1 (memfd direct R/W), and c_f_r stands for copy_file_range.

| 32x32MB Read 1024MB        |Creat-ms|Close-ms| I/O-ms |I/O-MB/s| I/O%
|----------------------------|--------|--------|--------|--------|-----
| 1) memfd direct R/W        |      1 |    118 |    312 |   3448 | 100%
| 2) u+memfd direct R/W      |    196 |    123 |    295 |   3651 | 105%
| 3) u+memfd direct sendfile |    175 |    102 |    976 |   1100 |  31%
| 4) u+memfd direct splice   |    173 |    103 |    443 |   2428 |  70%
| 5) udmabuf buffer R/W      |    183 |    100 |    453 |   2375 |  68%
| 6) dmabuf buffer R/W       |     34 |      4 |    427 |   2519 |  73%
| 7) udmabuf direct c_f_r    |    200 |    102 |    278 |   3874 | 112%
| 8) dmabuf direct c_f_r     |     36 |      5 |    269 |   4002 | 116%

On a slower CPU (tested at 1 GHz), GUP overhead becomes much more significant, as the 1 GHz results show:
| 32x32MB Read 1024MB        |Creat-ms|Close-ms| I/O-ms |I/O-MB/s| I/O%
|----------------------------|--------|--------|--------|--------|-----
| 1) memfd direct R/W        |      2 |    393 |    969 |   1109 | 100%
| 2) u+memfd direct R/W      |    592 |    424 |    570 |   1884 | 169%
| 3) u+memfd direct sendfile |    587 |    356 |   2229 |    481 |  43%
| 4) u+memfd direct splice   |    568 |    352 |    795 |   1350 | 121%
| 5) udmabuf buffer R/W      |    597 |    343 |   1238 |    867 |  78%
| 6) dmabuf buffer R/W       |     69 |     13 |   1128 |    952 |  85%
| 7) udmabuf direct c_f_r    |    595 |    345 |    372 |   2889 | 260%
| 8) dmabuf direct c_f_r     |     80 |     13 |    274 |   3929 | 354%

At 1 GHz, dmabuf direct c_f_r reaches 354% of the memfd direct R/W baseline, compared with 116% at 3 GHz.

Regards,
Wangtao.
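For anyone cross-checking the numbers: the I/O% column in both tables is consistent with each row's I/O-MB/s divided by row 1's I/O-MB/s, truncated (not rounded) to a whole percent. For instance, 1884 / 1109 is about 169.9%, which is listed as 169% rather than 170%. The helper below encodes that reading; it is a reconstruction from the published figures, not the benchmark's own code, and io_percent() is a hypothetical name.

```c
#include <assert.h>

/*
 * Reconstruction (assumption): I/O% = this row's MB/s relative to the
 * row 1 baseline (memfd direct R/W), truncated toward zero by integer
 * division.
 */
static unsigned int io_percent(unsigned int mb_per_s,
			       unsigned int baseline_mb_per_s)
{
	assert(baseline_mb_per_s != 0);
	return mb_per_s * 100u / baseline_mb_per_s;
}
```

Feeding it the I/O-MB/s column with 3448 (3 GHz) or 1109 (1 GHz) as the baseline reproduces every I/O% entry in the two tables.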