Date: Wed, 12 Jun 2024 11:55:03 -0400
From: Kent Overstreet
To: Bernd Schubert
Cc: Bernd Schubert, Miklos Szeredi, Amir Goldstein, linux-fsdevel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org, Ingo Molnar, Peter Zijlstra, Andrei Vagin, io-uring@vger.kernel.org
Subject: Re: [PATCH RFC v2 00/19] fuse: fuse-over-io-uring
In-Reply-To: <4e5a84ab-4aa5-4d8b-aa12-625082d92073@ddn.com>

On Wed, Jun 12, 2024 at 03:40:14PM GMT, Bernd Schubert wrote:
> On 6/12/24 16:19, Kent Overstreet wrote:
> > On Wed, Jun 12, 2024 at 03:53:42PM GMT, Bernd Schubert wrote:
> >> I will definitely look at it this week. Although I don't like the idea
> >> to have a new kthread. We already have an application thread and have
> >> the fuse server thread, why do we need another one?
> >
> > Ok, I hadn't found the fuse server thread - that should be fine.
> >
> >>>
> >>> The next thing I was going to look at is how you guys are using splice,
> >>> we want to get away from that too.
> >>
> >> Well, Ming Lei is working on that for ublk_drv and I guess that new approach
> >> could be adapted as well onto the current way of io-uring.
> >> It _probably_ wouldn't work with IORING_OP_READV/IORING_OP_WRITEV.
> >>
> >> https://lore.gnuweeb.org/io-uring/20240511001214.173711-6-ming.lei@redhat.com/T/
> >>
> >>>
> >>> Brian was also saying the fuse virtio_fs code may be worth
> >>> investigating, maybe that could be adapted?
> >>
> >> I need to check, but really, the majority of the new additions
> >> is just to set up things, shutdown and to have sanity checks.
> >> Request sending/completing to/from the ring is not that many new lines.
> >
> > What I'm wondering is how read/write requests are handled. Are the data
> > payloads going in the same ringbuffer as the commands? That could work,
> > if the ringbuffer is appropriately sized, but alignment is an issue.
>
> That is exactly the big discussion Miklos and I have.
> Basically in my series another buffer is vmalloced, mmapped and then
> assigned to ring entries.
> Fuse meta headers and application payload go into that buffer.
> In both kernel/userspace directions. io-uring only allows 80B, so only a
> really small request would fit into it.

Well, the generic ringbuffer would lift that restriction.

> Legacy /dev/fuse has an alignment issue as payload follows directly after the
> fuse header - intrinsically fixed in the ring patches.

*nod*

That's the big question: put the data inline (with potential alignment
hassles) or manage (and map) a separate data structure.

Maybe padding could be inserted to solve alignment?

A separate data structure would only really be useful if it enabled zero
copy, but that should probably be a secondary enhancement.

> I will now try without mmap and just provide a user buffer as pointer in the 80B
> section.
>
>
> >
> > We just looked up the device DMA requirements and with modern NVMe only
> > 4 byte alignment is required, but the block layer likely isn't set up to
> > handle that.
>
> I think existing fuse headers and their data have a 4 byte alignment.
> Maybe even 8 byte, I don't remember without looking through all request types.
> If you try a simple O_DIRECT read/write to libfuse/example_passthrough_hp
> without the ring patches it will fail because of alignment. Needs to be fixed
> in legacy fuse and would also avoid compat issues we had in libfuse when the
> kernel header was updated.
>
> >
> > So - prearranged buffer? Or are you using splice to get pages that
> > userspace has read into the kernel pagecache?
>
> I didn't even try to use splice yet, because for the DDN (my employer) use case
> we cannot use zero copy, at least not without violating the rule that one
> cannot access the application buffer in userspace.

DDN - Lustre related?

>
> I will definitely look into Ming's work, as it will be useful for others.
>
>
> Cheers,
> Bernd
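
To make the padding question above concrete, here is a minimal sketch of an inline layout where each ring entry is [header][padding][payload]. It is not taken from the fuse-over-io-uring patches: the helper names (align_up, payload_offset, entry_size), the 8-byte header alignment and the header sizes used with them are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align; align must be a power of two. */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/*
 * Hypothetical inline layout for one ring entry: [header][padding][payload].
 * entry_off is where the header starts, measured from the start of the ring.
 * Returns the offset at which the payload starts so that it is aligned.
 */
static size_t payload_offset(size_t entry_off, size_t hdr_len, size_t align)
{
	return align_up(entry_off + hdr_len, align);
}

/*
 * Bytes consumed by the whole entry (header + padding + payload), rounded up
 * so that the next header starts 8-byte aligned (an assumption of this sketch).
 */
static size_t entry_size(size_t entry_off, size_t hdr_len,
			 size_t payload_len, size_t align)
{
	size_t end = payload_offset(entry_off, hdr_len, align) + payload_len;

	return align_up(end, 8) - entry_off;
}
```

With a 40-byte header, payload_offset(0, 40, 8) is 40 (no padding needed), while payload_offset(0, 40, 4096) is 4096, i.e. 4056 bytes of padding for that entry. So whether inline padding is acceptable depends mostly on how strict the payload alignment has to be.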