Date: Wed, 8 Feb 2023 17:04:22 +0100
From: Jan Kara
To: Matthew Wilcox
Cc: Luis Chamberlain, lsf-pc@lists.linux-foundation.org, Christoph Hellwig,
	David Howells, Keith Busch, Pankaj Raghav,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: LSF/MM/BPF 2023 IOMAP conversion status update
Message-ID: <20230208160422.m4d4rx6kg57xm5xk@quack3>
References: <20230129044645.3cb2ayyxwxvxzhah@garbanzo>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun 29-01-23 05:06:47, Matthew Wilcox wrote:
> On Sat, Jan 28, 2023 at 08:46:45PM -0800, Luis Chamberlain wrote:
> > I'm hoping this *might* be useful to some, but I fear it may leave quite
> > a bit of folks with more questions than answers as it did for me. And
> > hence I figured that *this aspect of this topic* perhaps might be a good
> > topic for LSF. The end goal would hopefully then be finally enabling us
> > to document IOMAP API properly and helping with the whole conversion
> > effort.
>
> +1 from me.
>
> I've made a couple of abortive efforts to try and convert a "trivial"
> filesystem like ext2/ufs/sysv/jfs to iomap, and I always get hung up on
> what the semantics are for get_block_t and iomap_begin().

Yeah, I'd also be interested in this discussion, in particular as a
maintainer of some of these legacy filesystems (ext2, udf, isofs).

> > Perhaps fs/buffer.c could be converted to folios only, and be done
> > with it. But would we be losing out on something? What would that be?
>
> buffer_heads are inefficient for multi-page folios because some of the
> algorithms are O(n^2) for n being the number of buffers in a folio.
> It's fine for 8x 512b buffers in a 4k page, but for 512x 4kb buffers in
> a 2MB folio, it's pretty sticky. Things like "Read I/O has completed on
> this buffer, can I mark the folio as Uptodate now?" For iomap, that's a
> scan of a 64 byte bitmap up to 512 times; for BHs, it's a loop over 512
> allocations, looking at one bit in each BH before moving on to the next.
> Similarly for writeback, iirc.
>
> So +1 from me for a "How do we convert 35-ish block based filesystems
> from BHs to iomap for their buffered & direct IO paths". There's maybe a
> separate discussion to be had for "What should the API be for filesystems
> to access metadata on the block device" because I don't believe the
> page-cache based APIs are easy for fs authors to use.

Yeah, so the actual data paths should be relatively easy for these old
filesystems as they usually don't do anything special (those that do, like
reiserfs, are deprecated and to be removed). But for metadata we do need
some convenience functions like: give me the block of metadata at this
block number, make it dirty / clean / uptodate (block granularity dirtying
& uptodate state is an absolute must for metadata, otherwise we'll have
data corruption issues).
From the more complex functionality we need stuff like: lock a particular
block of metadata (the equivalent of the buffer lock), and track that this
block is metadata for a given inode so that it can be written on fsync(2).
Then fancier filesystems like ext4 also need to attach more private state
to each metadata block, but that needs to be dealt with on a case-by-case
basis anyway.

> Maybe some related topics are
> "What testing should we require for some of these ancient filesystems?"
> "Whose job is it to convert these 35 filesystems anyway, can we just
> delete some of them?"

I certainly would not miss a few more filesystems, like minix, sysv, ...
But before really threatening to remove some of these ancient and long
untouched filesystems, we should convert at least those we do care about.
Once there's a precedent for what a simple filesystem conversion looks
like, it is easier to argue about what to do with the ones we don't care
about so much.

> "Is there a lower-performance but easier-to-implement API than iomap
> for old filesystems that only exist for compatibility reasons?"

As I wrote above, for metadata there ought to be something, as otherwise it
will be a real pain (and no gain really). But I guess the concrete API will
only materialize once we attempt a conversion of some filesystem like ext2.
I'll try to have a look into that, at least the obvious preparatory steps
like converting the data paths to iomap.

								Honza
-- 
Jan Kara
SUSE Labs, CR
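
To make the metadata helpers asked for above a bit more concrete, here is a
minimal userspace sketch in C of what such an interface might look like. It
is only a toy model under stated assumptions: every name in it (meta_block,
meta_cache, meta_get, meta_lock, meta_mark_dirty, meta_attach_inode,
meta_sync_inode) is hypothetical and does not exist in the kernel, the
"disk" is just zeroed memory, and locking is a single-threaded flag. What it
tries to show is that dirty and uptodate state live in each block rather
than in a page, and that a block can be tied to an inode so an fsync(2)
path can find it.

```c
/* Userspace toy model; every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_BLOCK_SIZE  1024u	/* toy block size */
#define META_CACHE_SLOTS 16u	/* toy cache capacity, no eviction */

struct meta_block {
	uint64_t blocknr;
	uint64_t owner_ino;	/* inode whose fsync must write this block; 0 = none */
	bool in_use;
	bool uptodate;		/* tracked per block, not per page */
	bool dirty;		/* tracked per block, not per page */
	bool locked;		/* stand-in for the buffer lock */
	unsigned char data[META_BLOCK_SIZE];
};

struct meta_cache {
	struct meta_block slots[META_CACHE_SLOTS];
};

void meta_cache_init(struct meta_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/* "Give me the block of metadata at this block number." */
struct meta_block *meta_get(struct meta_cache *c, uint64_t blocknr)
{
	struct meta_block *free_slot = NULL;

	for (unsigned i = 0; i < META_CACHE_SLOTS; i++) {
		struct meta_block *mb = &c->slots[i];

		if (mb->in_use && mb->blocknr == blocknr)
			return mb;		/* cache hit */
		if (!mb->in_use && !free_slot)
			free_slot = mb;
	}
	if (!free_slot)
		return NULL;			/* toy cache is full */

	memset(free_slot, 0, sizeof(*free_slot));
	free_slot->blocknr = blocknr;
	free_slot->in_use = true;
	free_slot->uptodate = true;		/* pretend the read succeeded */
	return free_slot;
}

void meta_lock(struct meta_block *mb)
{
	assert(!mb->locked);	/* single-threaded toy: no waiting */
	mb->locked = true;
}

void meta_unlock(struct meta_block *mb)
{
	assert(mb->locked);
	mb->locked = false;
}

void meta_mark_dirty(struct meta_block *mb)
{
	assert(mb->uptodate);	/* never dirty a block with unknown contents */
	mb->dirty = true;
}

/* Remember that this block must be written when inode 'ino' is fsynced. */
void meta_attach_inode(struct meta_block *mb, uint64_t ino)
{
	mb->owner_ino = ino;
}

/* fsync(2) side: "write out" the dirty blocks of 'ino', return how many. */
unsigned meta_sync_inode(struct meta_cache *c, uint64_t ino)
{
	unsigned written = 0;

	for (unsigned i = 0; i < META_CACHE_SLOTS; i++) {
		struct meta_block *mb = &c->slots[i];

		if (mb->in_use && mb->dirty && mb->owner_ino == ino) {
			mb->dirty = false;	/* real code would submit I/O here */
			written++;
		}
	}
	return written;
}
```

A caller would take a block with meta_get(), modify it under meta_lock(),
call meta_mark_dirty() and meta_attach_inode(), and later have its fsync
path call meta_sync_inode() for that inode. Private per-block state of the
kind ext4 needs is deliberately left out, since the text above says it has
to be handled case by case.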