From: Gabriel Krisman Bertazi
To: Ming Lei
Cc:
 Hannes Reinecke, lsf-pc@lists.linux-foundation.org,
 linux-block@vger.kernel.org, Xiaoguang Wang, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] block drivers in user space
Organization: Collabora
References: <87tucsf0sr.fsf@collabora.com>
 <986caf55-65d1-0755-383b-73834ec04967@suse.de>
 <87o81prfrg.fsf@collabora.com>
Date: Tue, 29 Mar 2022 13:20:57 -0400
In-Reply-To: (Ming Lei's message of "Tue, 29 Mar 2022 08:30:57 +0800")
Message-ID: <87bkxor7ye.fsf@collabora.com>

Ming Lei writes:

>> I was thinking of something like this, or having a way for the server to
>> only operate on the fds and do splice/sendfile. But, I don't know if it
>> would be useful for many use cases. We also want to be able to send the
>> data to userspace, for instance, for userspace networking.
>
> I understand the big point is that how to pass the io data to ubd driver's
> request/bio pages. But splice/sendfile just transfers data between two FDs,
> then how can the block request/bio's pages get filled with expected data?
> Can you explain a bit in detail?

Hi Ming,

My idea was to split the control and data planes into different file
descriptors.

A queue has an fd that is mapped to a shared memory area where the
request descriptors are.
Submission and completion are done by reading/writing the index of the
request in the shared memory area.

For the data plane, each request descriptor in the queue has an
associated file descriptor to be used for data transfer, which is
preallocated at queue creation time.  I'm mapping the bio linearly,
from offset 0, onto these descriptors in .queue_rq().  Userspace
operates on these data file descriptors with regular RW syscalls,
splices them directly to another fd or pipe, or mmaps them to move data
around.  The data is available on that fd until the IO is completed
through the queue fd.  After an operation is completed, the fds are
reused for the next IO on that queue position.

Hannes has pointed out the issues with fd limits. :)

> If block layer is bypassed, it won't be exposed as block disk to userspace.

I implemented it as a block-mq driver, but it still only supports one
queue.

-- 
Gabriel Krisman Bertazi
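
For illustration only, here is a rough userspace toy model of the
control/data split described above. It is not the driver code and makes
several assumptions: the descriptor layout (opcode, length, sector), the
names Queue, driver_submit and server_poll, and the 512-byte sector size
are all hypothetical, temporary files stand in for the per-request data
fds, and plain Python lists stand in for the submission/completion index
areas in shared memory.

```python
import mmap
import os
import struct
import tempfile

QUEUE_DEPTH = 4
# Hypothetical descriptor layout: opcode (u32), length (u32), sector (u64).
DESC = struct.Struct("<IIQ")
OP_READ, OP_WRITE = 0, 1


class Queue:
    """Toy model of one queue: a descriptor area plus one data fd per slot."""

    def __init__(self, depth=QUEUE_DEPTH):
        # Control plane: anonymous mapping holding the request descriptors.
        self.descs = mmap.mmap(-1, depth * DESC.size)
        # Data plane: one file per queue position, created with the queue.
        self.data_files = [tempfile.TemporaryFile() for _ in range(depth)]
        self.submitted = []  # indices published by the "driver" side
        self.completed = []  # indices published by the server side

    def driver_submit(self, idx, op, sector, payload=b"", length=0):
        """Stand-in for .queue_rq(): fill the descriptor, data at offset 0."""
        fd = self.data_files[idx].fileno()
        os.ftruncate(fd, 0)  # the slot is reused, so drop the previous data
        if op == OP_WRITE:
            os.pwrite(fd, payload, 0)
            length = len(payload)
        DESC.pack_into(self.descs, idx * DESC.size, op, length, sector)
        self.submitted.append(idx)

    def server_poll(self, backing):
        """Server side: take an index, move data via the slot fd, complete."""
        idx = self.submitted.pop(0)
        op, length, sector = DESC.unpack_from(self.descs, idx * DESC.size)
        fd = self.data_files[idx].fileno()
        off = sector * 512
        if op == OP_WRITE:
            backing[off:off + length] = os.pread(fd, length, 0)
        else:
            os.pwrite(fd, bytes(backing[off:off + length]), 0)
        self.completed.append(idx)
        return idx
```

A server loop would call server_poll() with its backing store (here a
bytearray) for each submitted index. The same slot, and therefore the
same data fd, is used again for the next IO at that queue position.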