Date: Tue, 11 Jun 2024 19:35:01 -0400
From: Kent Overstreet
To: Bernd Schubert
Cc: Miklos Szeredi, Bernd Schubert, Amir Goldstein,
	linux-fsdevel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org,
	Ingo Molnar, Peter Zijlstra, Andrei Vagin, io-uring@vger.kernel.org
Subject: Re: [PATCH RFC v2 00/19] fuse: fuse-over-io-uring
References: <20240529-fuse-uring-for-6-9-rfc2-out-v1-0-d149476b1d65@ddn.com>
	<99d13ae4-8250-4308-b86d-14abd1de2867@fastmail.fm>
	<62ecc4cf-97c8-43e6-84a1-72feddf07d29@fastmail.fm>
In-Reply-To: <62ecc4cf-97c8-43e6-84a1-72feddf07d29@fastmail.fm>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Tue, Jun 11, 2024 at 07:37:30PM GMT, Bernd Schubert wrote:
>
>
> On 6/11/24 17:35, Miklos Szeredi wrote:
> > On Tue, 11 Jun 2024 at 12:26, Bernd Schubert wrote:
> >
> >> Secondly, with IORING_OP_URING_CMD we already have only a single command
> >> to submit requests and fetch the next one - half of the system calls.
> >>
> >> Wouldn't IORING_OP_READV/IORING_OP_WRITEV be just this approach?
> >> https://github.com/uroni/fuseuring?
> >> I.e. it hooks into the existing fuse and just changes from read()/write()
> >> of /dev/fuse to io-uring of /dev/fuse. With the disadvantage of zero
> >> control which ring/queue and which ring-entry handles the request.
> >
> > Unlike system calls, io_uring ops should have very little overhead.
> > That's one of the main selling points of io_uring (as described in the
> > io_uring(7) man page).
> >
> > So I don't think it matters to performance whether there's a combined
> > WRITEV + READV (or COMMIT + FETCH) op or separate ops.
>
> This has to be performance proven and is by no means what I'm seeing. How
> should io-uring improve performance if you have the same number of
> system calls?
>
> As I see it (@Jens or @Pavel or anyone else please correct me if I'm
> wrong), the advantage of io-uring comes when there is no syscall overhead
> at all - either you have a ring with multiple entries and then one side
> operates on multiple entries or you have polling and no syscall overhead
> either. We cannot afford cpu intensive polling - out of the question,
> besides that I had even tried SQPOLL and it made things worse (that is
> actually where my idea about application polling comes from).
> As I see it, for sync blocking calls (like meta operations) with one
> entry in the queue, you would get no advantage with
> IORING_OP_READV/IORING_OP_WRITEV - io-uring has to do two system calls -
> one to submit from kernel to userspace and another from userspace to
> kernel. Why should io-uring be faster there?
>
> And from my testing this is exactly what I had seen - io-uring for meta
> requests (i.e. without a large request queue and *without* core
> affinity) makes meta operations even slower than /dev/fuse.
>
> For anything that imposes a large ring queue and where either side
> (kernel or userspace) needs to process multiple ring entries - system
> call overhead gets reduced by the queue size. Just for DIO or meta
> operations that is hard to reach.
>
> Also, if you are using IORING_OP_READV/IORING_OP_WRITEV, nothing would
> change in fuse kernel? I.e. IOs would go via fuse_dev_read()?
> I.e. we would not have encoded in the request which queue it belongs to?

Want to try out my new ringbuffer syscall?

I haven't dug far into the fuse protocol or /dev/fuse code yet, only
skimmed. But using it to replace the read/write syscall overhead should
be straightforward; you'll want to spin up a kthread for responding to
requests.

The next thing I was going to look at is how you guys are using splice,
we want to get away from that too.

Brian was also saying the fuse virtio_fs code may be worth
investigating, maybe that could be adapted?
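
For readers following the syscall-count argument quoted above, here is a
rough back-of-the-envelope model of it. It is only a sketch: the transport
names, the CYCLE_COST table and syscalls_per_request() are hypothetical,
and the per-cycle costs are the ones asserted in the quoted text (two
syscalls for read()/write() or separate READV/WRITEV, one for a combined
commit-and-fetch command), not measurements. It also assumes that
read()/write() on /dev/fuse handles one request per call pair, while a
ring can spread one cycle over several entries.

```python
# Back-of-the-envelope model of the syscall counts argued about above.
# All costs are assumptions taken from the thread, not measurements.

# Syscalls needed for one "fetch request(s) + send reply/replies" cycle.
CYCLE_COST = {
    "dev_fuse_read_write": 2,     # read() + write() on /dev/fuse
    "uring_readv_writev": 2,      # separate submits, as described in the quote
    "uring_commit_and_fetch": 1,  # one combined command
}


def syscalls_per_request(transport: str, entries_per_cycle: int = 1) -> float:
    """Syscalls charged to each request when one cycle handles several entries."""
    if entries_per_cycle < 1:
        raise ValueError("entries_per_cycle must be >= 1")
    if transport == "dev_fuse_read_write" and entries_per_cycle != 1:
        # Model assumption: read()/write() moves one request per call pair.
        raise ValueError("/dev/fuse read()/write() cannot batch in this model")
    return CYCLE_COST[transport] / entries_per_cycle


if __name__ == "__main__":
    for transport in CYCLE_COST:
        print(f"{transport:24s} depth 1: {syscalls_per_request(transport):.3f}")
    for depth in (8, 64):
        cost = syscalls_per_request("uring_commit_and_fetch", depth)
        print(f"uring_commit_and_fetch   depth {depth}: {cost:.3f}")
```

With one entry per cycle the separate-op ring variant costs the same as
/dev/fuse in this model, which is the "no advantage for sync meta
operations" case; the per-request cost only drops once a cycle covers
several ring entries.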