Message-ID: <8d270a22-edf4-4e38-8b62-6504c4101c6a@fastmail.fm>
Date: Wed, 12 Jun 2024 18:44:53 +0200
Subject: Re: [PATCH RFC v2 00/19] fuse: fuse-over-io-uring
From: Bernd Schubert
To: Kent Overstreet
Cc: Bernd Schubert, Miklos Szeredi, Amir Goldstein, "linux-fsdevel@vger.kernel.org", Andrew Morton, "linux-mm@kvack.org", Ingo Molnar, Peter Zijlstra, Andrei Vagin, "io-uring@vger.kernel.org"
References: <99d13ae4-8250-4308-b86d-14abd1de2867@fastmail.fm> <62ecc4cf-97c8-43e6-84a1-72feddf07d29@fastmail.fm> <4e5a84ab-4aa5-4d8b-aa12-625082d92073@ddn.com> <3bh7pncpg3qpeia5m7kgtolbvxwe2u46uwfixjhb5dcgni5k4m@kqode5qrywls>
In-Reply-To: <3bh7pncpg3qpeia5m7kgtolbvxwe2u46uwfixjhb5dcgni5k4m@kqode5qrywls>

On 6/12/24 18:24, Kent Overstreet wrote:
> On Wed, Jun 12, 2024 at 06:15:57PM GMT, Bernd Schubert wrote:
>>
>>
>> On 6/12/24 17:55, Kent Overstreet wrote:
>>> On Wed, Jun 12, 2024 at 03:40:14PM GMT, Bernd Schubert wrote:
>>>> On 6/12/24 16:19, Kent Overstreet wrote:
>>>>> On Wed, Jun 12, 2024 at 03:53:42PM GMT, Bernd Schubert wrote:
>>>>>> I will definitely look at it this week. Although I don't like the idea
>>>>>> to have a new kthread. We already have an application thread and have
>>>>>> the fuse server thread, why do we need another one?
>>>>>
>>>>> Ok, I hadn't found the fuse server thread - that should be fine.
>>>>>
>>>>>>>
>>>>>>> The next thing I was going to look at is how you guys are using splice,
>>>>>>> we want to get away from that too.
>>>>>>
>>>>>> Well, Ming Lei is working on that for ublk_drv and I guess that new approach
>>>>>> could be adapted as well onto the current way of io-uring.
>>>>>> It _probably_ wouldn't work with IORING_OP_READV/IORING_OP_WRITEV.
>>>>>>
>>>>>> https://lore.gnuweeb.org/io-uring/20240511001214.173711-6-ming.lei@redhat.com/T/
>>>>>>
>>>>>>>
>>>>>>> Brian was also saying the fuse virtio_fs code may be worth
>>>>>>> investigating, maybe that could be adapted?
>>>>>>
>>>>>> I need to check, but really, the majority of the new additions
>>>>>> is just to set up things, shutdown and to have sanity checks.
>>>>>> Request sending/completing to/from the ring is not that many new lines.
>>>>>
>>>>> What I'm wondering is how read/write requests are handled. Are the data
>>>>> payloads going in the same ringbuffer as the commands? That could work,
>>>>> if the ringbuffer is appropriately sized, but alignment is an issue.
>>>>
>>>> That is exactly the big discussion Miklos and I have. Basically in my
>>>> series another buffer is vmalloced, mmaped and then assigned to ring entries.
>>>> Fuse meta headers and application payload go into that buffer.
>>>> In both kernel/userspace directions. io-uring only allows 80B, so only a
>>>> really small request would fit into it.
>>>
>>> Well, the generic ringbuffer would lift that restriction.
>>
>> Yeah, kind of. Instead of allocating the buffer in fuse, it would now be allocated
>> in that code. At least all that setup code would be moved out of fuse. I will
>> eventually come to your patches today.
>> Now we only need to convince Miklos that your ring is better ;)
>>
>>>
>>>> Legacy /dev/fuse has an alignment issue as payload follows directly after the fuse
>>>> header - intrinsically fixed in the ring patches.
>>>
>>> *nod*
>>>
>>> That's the big question, put the data inline (with potential alignment
>>> hassles) or manage (and map) a separate data structure.
>>>
>>> Maybe padding could be inserted to solve alignment?
>>
>> Right now I have this struct:
>>
>> struct fuse_ring_req {
>> 	union {
>> 		/* The first 4K are command data */
>> 		char ring_header[FUSE_RING_HEADER_BUF_SIZE];
>>
>> 		struct {
>> 			uint64_t flags;
>>
>> 			/* enum fuse_ring_buf_cmd */
>> 			uint32_t in_out_arg_len;
>> 			uint32_t padding;
>>
>> 			/* kernel fills in, reads out */
>> 			union {
>> 				struct fuse_in_header in;
>> 				struct fuse_out_header out;
>> 			};
>> 		};
>> 	};
>>
>> 	char in_out_arg[];
>> };
>>
>>
>> Data go into in_out_arg, i.e. headers are padded by the union.
>> I actually wonder if FUSE_RING_HEADER_BUF_SIZE should be page size
>> and not a fixed 4K.
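
To illustrate how the union in the quoted struct pads the header area, here is a minimal userspace sketch. All demo_* names and DEMO_RING_HEADER_BUF_SIZE are hypothetical, and the two header structs are cut-down stand-ins rather than the real fuse_in_header/fuse_out_header definitions. It only checks that the payload member starts right after the fixed-size header area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins; NOT the real fuse header layouts. */
struct demo_in_header {
	uint32_t len;
	uint32_t opcode;
	uint64_t unique;
	uint64_t nodeid;
};

struct demo_out_header {
	uint32_t len;
	int32_t error;
	uint64_t unique;
};

/* Assumed fixed 4K header area, mirroring the comment in the quoted struct. */
#define DEMO_RING_HEADER_BUF_SIZE 4096

struct demo_ring_req {
	union {
		/* Reserves the whole header area, whatever the fields below need. */
		char ring_header[DEMO_RING_HEADER_BUF_SIZE];

		struct {
			uint64_t flags;
			uint32_t in_out_arg_len;
			uint32_t padding;

			union {
				struct demo_in_header in;
				struct demo_out_header out;
			};
		};
	};

	/* Payload; begins where the padded header area ends. */
	char in_out_arg[];
};

/* The union forces the payload to start at the end of the header area. */
static_assert(offsetof(struct demo_ring_req, in_out_arg) ==
	      DEMO_RING_HEADER_BUF_SIZE,
	      "payload must start right after the header area");

/* Bytes of the header area that the named fields occupy in this sketch. */
static size_t demo_header_used(void)
{
	return offsetof(struct demo_ring_req, in) +
	       sizeof(struct demo_in_header);
}
```

With these stand-in headers the named fields use only 40 bytes of the 4096-byte area; the rest is padding that the union reserves.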
>
> I would make the commands variable sized, so that commands with no data
> buffers don't need padding, and then when you do have a data command you
> only pad out that specific command so that the data buffer starts on a
> page boundary.

The same buffer is used for kernel to userspace and the other way around -
it is attached to the ring entry. Either direction will always have data,
so where would dynamic sizing be useful?

Well, some "data" like the node id doesn't need to be aligned - we could
save memory for that. I would still like to have some padding so that headers
can grow without any kind of compat issues. Though almost 4K is probably
too much for that.

Thanks for pointing it out, will improve it!


Cheers,
Bernd
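
The variable-sized layout suggested in the quoted reply could be sketched roughly as below. This is only an illustration under assumptions: the demo_* helpers and the [header][pad][data] layout are hypothetical and not part of the patch series. A command without data gets no padding; a command with data is padded so that the data starts on a page boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t demo_align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical command layout inside a page-aligned buffer:
 *   [header][pad][data]
 * Returns the pad bytes needed after the header so that the data of a
 * command starting at cmd_off begins on a page boundary. Commands
 * without data need no padding.
 */
static size_t demo_data_pad(size_t cmd_off, size_t hdr_len,
			    size_t data_len, size_t page_size)
{
	size_t hdr_end = cmd_off + hdr_len;

	if (data_len == 0)
		return 0;

	return demo_align_up(hdr_end, page_size) - hdr_end;
}

/* Total bytes the command occupies, including any padding. */
static size_t demo_cmd_size(size_t cmd_off, size_t hdr_len,
			    size_t data_len, size_t page_size)
{
	return hdr_len +
	       demo_data_pad(cmd_off, hdr_len, data_len, page_size) +
	       data_len;
}
```

In userspace, page_size could be taken from sysconf(_SC_PAGESIZE) instead of assuming 4K. Whether this saves anything in practice depends on how many requests carry no page-sized payload.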