From: Bernd Schubert
Date: Sat, 3 Aug 2024 01:03:15 +0200
Subject: Re: [PATCH RFC v2 00/19] fuse: fuse-over-io-uring
To: Bernd Schubert, Miklos Szeredi
Cc: Amir Goldstein, "linux-fsdevel@vger.kernel.org", Andrew Morton,
 "linux-mm@kvack.org", Ingo Molnar, Peter Zijlstra, Andrei Vagin,
 "io-uring@vger.kernel.org", Kent Overstreet, Josef Bacik
Message-ID: <4c1118d0-b871-44e8-93ca-6b0cf8643144@fastmail.fm>
In-Reply-To: <3b74f850-c74c-49d0-be63-a806119cbfbd@ddn.com>
References: <20240529-fuse-uring-for-6-9-rfc2-out-v1-0-d149476b1d65@ddn.com>
 <99d13ae4-8250-4308-b86d-14abd1de2867@fastmail.fm>
 <62ecc4cf-97c8-43e6-84a1-72feddf07d29@fastmail.fm>
 <0615e79d-9397-48eb-b89e-f0be1d814baf@ddn.com>
 <3b74f850-c74c-49d0-be63-a806119cbfbd@ddn.com>

On 6/12/24 16:56, Bernd Schubert wrote:
> On 6/12/24 16:07, Miklos Szeredi wrote:
>> On Wed, 12 Jun 2024 at 15:33, Bernd Schubert wrote:
>>
>>> I didn't do that yet, as we are going to use the ring buffer for requests,
>>> i.e. the ring buffer immediately gets all the data from network, there is
>>> no copy. Even if the ring buffer would get data from local disk - there
>>> is no need to use a separate application buffer anymore. And with that
>>> there is just no extra copy
>>
>> Let's just tackle this shared request buffer, as it seems to be a
>> central part of your design.
>>
>> You say the shared buffer is used to immediately get the data from the
>> network (or various other sources), which is completely viable.
>>
>> And then the kernel will do the copy from the shared buffer. Single copy, fine.
>>
>> But if the buffer wasn't shared? What would be the difference?
>> Single copy also.
>>
>> Why is the shared buffer better? I mean it may even be worse due to
>> cache aliasing issues on certain architectures. copy_to_user() /
>> copy_from_user() are pretty darn efficient.
>
> Right now we have:
>
> - Application thread writes into the buffer, then calls io_uring_cmd_done
>
> I can try to do without mmap and set a pointer to the user buffer in the
> 80B section of the SQE.
> I'm not sure if the application is allowed to
> write into that buffer, possibly/probably we will be forced to use
> io_uring_cmd_complete_in_task() in all cases (without 19/19 we have that
> anyway). My greatest fear here is that the extra task has performance
> implications for sync requests.
>
>
>>
>> Why is it better to have that buffer managed by kernel? Being locked
>> in memory (being unswappable) is probably a disadvantage as well. And
>> if locking is required, it can be done on the user buffer.
>
> Well, let me try to give the buffer in the 80B section.
>
>>
>> And there are all the setup and teardown complexities...
>
> If the buffer in the 80B section works setup becomes easier, mmap and
> ioctls go away. Teardown, well, we still need the workaround as we need
> to handle io_uring_cmd_done, but if you could live with that for the
> instance, I would ask Jens or Pavel or Ming for help if we could solve
> that in io-uring itself.
> Is the ring workaround in fuse_dev_release() acceptable for you? Or do
> you have any another idea about it?
>
>>

Short update: I have had this working for some time now with a hack
patch that just adds a user buffer (mmap is not removed, it is just
unused). Initially I thought it was a lot slower, but after removing
all the kernel debug options the performance loss is only around 5%,
and I think I can recover the remainder by calling
iov_iter_get_pages2() on the user buffer during initialization (at the
cost of some additional code).

I hope to have new patches by the middle of next week. I also want to
get rid of the difference in buffer layout between uring and /dev/fuse,
as that can be troublesome for other changes like alignment. That might
require an io-uring CQE128, though.

Thanks,
Bernd
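
For readers following the thread, here is a minimal userspace sketch of
what "a pointer to the user buffer in the 80B section of the SQE" could
look like. It is only an illustration, not the layout used by the
patches: the struct fuse_uring_buf_desc, its fields, and the helpers
desc_pack() and desc_unpack() are hypothetical names, and a plain
80-byte array stands in for the command area of a 128-byte SQE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of the command area in a 128-byte SQE (the "80B section"). */
#define SQE128_CMD_BYTES 80

/*
 * Hypothetical descriptor, assumed for this sketch only.
 * It is NOT the layout used by the fuse-over-io-uring patches.
 */
struct fuse_uring_buf_desc {
    uint64_t buf_addr; /* user-space address of the request buffer */
    uint32_t buf_len;  /* length of that buffer in bytes */
    uint32_t flags;    /* unused in this sketch, always zero */
};

/* The descriptor has to fit into the command area. */
static_assert(sizeof(struct fuse_uring_buf_desc) <= SQE128_CMD_BYTES,
              "descriptor must fit in the SQE command area");

/* Store the buffer address and length in the command area, zero the rest. */
static void desc_pack(uint8_t cmd[SQE128_CMD_BYTES], const void *buf,
                      uint32_t len)
{
    struct fuse_uring_buf_desc d = {
        .buf_addr = (uint64_t)(uintptr_t)buf,
        .buf_len = len,
        .flags = 0,
    };

    memset(cmd, 0, SQE128_CMD_BYTES);
    memcpy(cmd, &d, sizeof(d));
}

/* Read the descriptor back, as the consumer of the command area would. */
static struct fuse_uring_buf_desc desc_unpack(const uint8_t cmd[SQE128_CMD_BYTES])
{
    struct fuse_uring_buf_desc d;

    memcpy(&d, cmd, sizeof(d));
    return d;
}
```

With something along these lines the application passes an ordinary
buffer by address instead of sharing an mmap'ed region, which is why
the mmap and ioctl setup could go away; the copy itself would still be
done by the kernel.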