Date: Mon, 27 Feb 2023 01:46:23 +0530
Message-Id: <87ttz889ns.fsf@doe.com>
From: Ritesh Harjani (IBM)
To: Jan Kara, Matthew Wilcox
Cc: Luis Chamberlain, lsf-pc@lists.linux-foundation.org, Christoph Hellwig,
    David Howells, Keith Busch, Pankaj Raghav, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: LSF/MM/BPF 2023 IOMAP conversion status update
In-Reply-To: <20230208160422.m4d4rx6kg57xm5xk@quack3>

Jan Kara writes:

> On Sun 29-01-23 05:06:47, Matthew Wilcox wrote:
>> On Sat, Jan 28, 2023 at 08:46:45PM -0800, Luis Chamberlain wrote:
>> > I'm hoping this *might* be useful to some, but I fear it may leave quite
>> > a bit of folks with more questions than answers as it did for me. And
>> > hence I figured that *this aspect of this topic* perhaps might be a good
>> > topic for LSF.
>> > The end goal would hopefully then be finally enabling us
>> > to document IOMAP API properly and helping with the whole conversion
>> > effort.
>>
>> +1 from me.

+1 from my end as well.

I have also been working on adding sub-page dirty tracking support to the
iomap layer, so that buffered writes on bs < ps (block size smaller than
page size) systems avoid write amplification: without per-block dirty
state, writing a single block dirties, and later writes back, the whole
folio [1]. This also improves performance in some real-world use cases,
such as a PostgreSQL database workload.

There are some further points we would like to discuss on how to optimize
and improve iomap dirty bitmap tracking for large folios. I can try to come
up with some ideas in that regard so that we can discuss them with others
as well [2].

[1]: https://lore.kernel.org/all/cover.1677428794.git.ritesh.list@gmail.com/
[2]: https://lore.kernel.org/all/20230130210113.opdvyliooizicrsk@rh-tp/

>>
>> I've made a couple of abortive efforts to try and convert a "trivial"
>> filesystem like ext2/ufs/sysv/jfs to iomap, and I always get hung up on
>> what the semantics are for get_block_t and iomap_begin().
>
> Yeah, I'd be also interested in this discussion. In particular as a
> maintainer of part of these legacy filesystems (ext2, udf, isofs).
>
>> > Perhaps fs/buffer.c could be converted to folios only, and be done
>> > with it. But would we be losing out on something? What would that be?
>>
>> buffer_heads are inefficient for multi-page folios because some of the
>> algorithms are O(n^2) for n being the number of buffers in a folio.
>> It's fine for 8x 512b buffers in a 4k page, but for 512x 4kb buffers in
>> a 2MB folio, it's pretty sticky. Things like "Read I/O has completed on
>> this buffer, can I mark the folio as Uptodate now?" For iomap, that's a
>> scan of a 64 byte bitmap up to 512 times; for BHs, it's a loop over 512
>> allocations, looking at one bit in each BH before moving on to the next.
>> Similarly for writeback, iirc.
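A minimal user-space sketch in C of the per-block bitmap idea that both Matthew's point and the sub-page dirty tracking work rely on. It assumes 512 blocks per folio (for example 4 KiB blocks in a 2 MiB folio) and keeps one uptodate bit and one dirty bit per block; `struct folio_state` and the `fst_*()` helpers are hypothetical illustrations, not the kernel's iomap API. Checking whether the whole folio is uptodate compares eight 64-bit words (the 64-byte bitmap mentioned above) instead of visiting 512 separately allocated buffer_heads, and writeback only has to write the blocks whose dirty bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry: 512 blocks per folio, e.g. 4 KiB blocks in a 2 MiB folio. */
#define BLOCKS_PER_FOLIO 512
#define BITS_PER_WORD 64
#define NR_WORDS (BLOCKS_PER_FOLIO / BITS_PER_WORD) /* 8 words = 64 bytes */

/* Hypothetical per-folio state: one uptodate bit and one dirty bit per block. */
struct folio_state {
	uint64_t uptodate[NR_WORDS];
	uint64_t dirty[NR_WORDS];
};

static void fst_init(struct folio_state *st)
{
	memset(st, 0, sizeof(*st));
}

/* Set the bits for blocks [first, first + nr). */
static void set_blocks(uint64_t *map, unsigned first, unsigned nr)
{
	assert(first + nr <= BLOCKS_PER_FOLIO);
	for (unsigned b = first; b < first + nr; b++)
		map[b / BITS_PER_WORD] |= UINT64_C(1) << (b % BITS_PER_WORD);
}

/* Read I/O completed for blocks [first, first + nr) of the folio. */
static void fst_mark_uptodate(struct folio_state *st, unsigned first, unsigned nr)
{
	set_blocks(st->uptodate, first, nr);
}

/* A buffered write dirtied blocks [first, first + nr) of the folio. */
static void fst_mark_dirty(struct folio_state *st, unsigned first, unsigned nr)
{
	set_blocks(st->dirty, first, nr);
}

/* "Can I mark the folio Uptodate now?": compare 8 words, no per-block objects. */
static bool fst_folio_uptodate(const struct folio_state *st)
{
	for (size_t i = 0; i < NR_WORDS; i++)
		if (st->uptodate[i] != UINT64_MAX)
			return false;
	return true;
}

/* Number of blocks writeback has to write, instead of the whole folio. */
static unsigned fst_nr_dirty(const struct folio_state *st)
{
	unsigned n = 0;
	for (size_t i = 0; i < NR_WORDS; i++)
		for (uint64_t w = st->dirty[i]; w; w &= w - 1) /* clear lowest set bit */
			n++;
	return n;
}
```

For example, after a small write into block 3 only one dirty bit is set, so writeback would need to write one block rather than the whole folio.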
>>
>> So +1 from me for a "How do we convert 35-ish block based filesystems
>> from BHs to iomap for their buffered & direct IO paths". There's maybe a
>> separate discussion to be had for "What should the API be for filesystems
>> to access metadata on the block device" because I don't believe the
>> page-cache based APIs are easy for fs authors to use.
>
> Yeah, so the actual data paths should be relatively easy for these old
> filesystems as they usually don't do anything special (those that do - like
> reiserfs - are deprecated and to be removed). But for metadata we do need
> some convenience functions like - give me block of metadata at this block
> number, make it dirty / clean / uptodate (block granularity dirtying &
> uptodate state is an absolute must for metadata, otherwise we'll have data
> corruption issues). From the more complex functionality we need stuff like:
> lock particular block of metadata (equivalent of buffer lock), track that
> this block is metadata for given inode so that it can be written on
> fsync(2). Then more fancy filesystems like ext4 also need to attach more
> private state to each metadata block but that needs to be dealt with on a
> case-by-case basis anyway.
>
>> Maybe some related topics are
>> "What testing should we require for some of these ancient filesystems?"
>> "Whose job is it to convert these 35 filesystems anyway, can we just
>> delete some of them?"
>
> I certainly would not miss some more filesystems - like minix, sysv, ...
> But before really threatening to remove some of these ancient and long
> untouched filesystems, we should convert at least those we do care about.
> When there's a precedent for what a simple filesystem conversion looks
> like, it is easier to argue about what to do with the ones we don't care
> about so much.
>
>> "Is there a lower-performance but easier-to-implement API than iomap
>> for old filesystems that only exist for compatibility reasons?"
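Jan's list of metadata helpers can be made concrete with a small user-space sketch in C. Everything here is hypothetical: the `meta_*()` names, the in-memory `disk` array standing in for the block device, the 1 KiB block size, and the `locked` flag, which only models the buffer lock in a single-threaded program. It shows reading a metadata block by number, block-granular dirty/clean/uptodate state, locking a block, and associating a block with an inode so that an fsync(2) of that inode writes it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_BLOCK_SIZE 1024 /* assumed block size */
#define META_NR_BLOCKS 16    /* assumed device size, in blocks */

/* Hypothetical in-memory array standing in for the block device. */
static uint8_t disk[META_NR_BLOCKS][META_BLOCK_SIZE];

struct meta_block {
	bool uptodate;           /* data[] is valid in memory */
	bool dirty;              /* data[] must be written back */
	bool locked;             /* models the buffer lock; single-threaded only */
	unsigned long owner_ino; /* inode whose fsync(2) must write it, 0 = none */
	uint8_t data[META_BLOCK_SIZE];
};

/* One cache slot per device block keeps the sketch trivial. */
static struct meta_block cache[META_NR_BLOCKS];

/* "Give me the block of metadata at this block number", reading it if needed. */
static struct meta_block *meta_get(uint64_t blocknr)
{
	assert(blocknr < META_NR_BLOCKS);
	struct meta_block *mb = &cache[blocknr];
	if (!mb->uptodate) {
		memcpy(mb->data, disk[blocknr], META_BLOCK_SIZE);
		mb->uptodate = true;
	}
	return mb;
}

static void meta_lock(struct meta_block *mb)
{
	assert(!mb->locked); /* a real lock would sleep here instead */
	mb->locked = true;
}

static void meta_unlock(struct meta_block *mb)
{
	assert(mb->locked);
	mb->locked = false;
}

/* Block-granular dirtying, recording which inode's fsync(2) must cover it. */
static void meta_mark_dirty(struct meta_block *mb, unsigned long ino)
{
	assert(mb->uptodate);
	mb->dirty = true;
	mb->owner_ino = ino;
}

/* fsync(2) helper: write back (and clean) only the metadata blocks of @ino. */
static unsigned meta_fsync(unsigned long ino)
{
	unsigned written = 0;
	for (size_t i = 0; i < META_NR_BLOCKS; i++) {
		struct meta_block *mb = &cache[i];
		if (mb->dirty && mb->owner_ino == ino) {
			memcpy(disk[i], mb->data, META_BLOCK_SIZE);
			mb->dirty = false;
			written++;
		}
	}
	return written;
}
```

A real interface would also have to cope with concurrency, cache eviction, and metadata blocks shared by many inodes (for example, an inode table block), and, as Jan notes, ext4-like filesystems would attach further private state per block.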
>
> As I wrote above, for metadata there ought to be something as otherwise it
> will be real pain (and no gain really). But I guess the concrete API only
> materializes once we attempt a conversion of some filesystem like ext2.
> I'll try to have a look into that, at least the obvious preparatory steps
> like converting the data paths to iomap.

In the past I have worked with Jan and others on adding iomap support to
the ext4 DIO path. I have also moved the ext4 fiemap, bmap and swap
operations over to iomap [3], and I would like to continue in this
direction until ext4 switches to iomap completely (including the
buffered-io path).

I would definitely like to convert ext2 to iomap (working along with you on
this, of course :)). In my opinion this can also help us figure out whether
iomap requires any changes to support the ext* family of filesystems,
especially for the buffered-io path. IMO this is simpler than starting with
the ext4 buffered-io path, where fscrypt, fsverity etc. get in our way...

Let me also take a look at this and get back to you. I will first get
started with the ext2 DIO path (which I think should be very
straightforward, since ext2 already has iomap_ops for DAX). While at it, I
will also see what is required for the buffered I/O path conversion.

[3]: https://lore.kernel.org/all/?q=s%3Aiomap+and+f%3Aritesh

-ritesh
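On the get_block_t versus ->iomap_begin() semantics raised in the thread, and the ext2 DIO conversion planned above: roughly, get_block_t is block-oriented (it is asked about a starting file block and fills in a struct buffer_head, possibly for several contiguous blocks if the caller sized b_size for that), whereas ->iomap_begin() is asked about a byte range and returns a struct iomap describing one extent (file offset, length, type such as hole or mapped, and disk address). The user-space sketch below models only that difference. `block_map`, `map_one_block()`, `map_extent()` and `struct extent` are hypothetical, and the sketch ignores allocation (the create flag / IOMAP_WRITE), unwritten extents and ->iomap_end().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: a file's logical blocks map to disk
 * blocks through a simple table, with 0 meaning "hole".
 */
#define BLKBITS 12 /* assume 4 KiB blocks */
#define NBLOCKS 8

static const uint64_t block_map[NBLOCKS] = { 100, 101, 102, 0, 0, 200, 201, 300 };

enum ext_type { EXT_HOLE, EXT_MAPPED };

struct extent {
	int64_t offset;     /* file offset in bytes where the extent starts */
	int64_t length;     /* bytes covered by this extent */
	uint64_t addr;      /* disk byte address, meaningless for holes */
	enum ext_type type;
};

/* get_block-style: ask about one file block, get one disk block back (0 = hole). */
static uint64_t map_one_block(uint64_t iblock)
{
	assert(iblock < NBLOCKS);
	return block_map[iblock];
}

/*
 * iomap_begin-style: ask about the byte range [pos, pos + len) and return the
 * longest run starting at pos's block that is contiguous on disk or all hole.
 */
static struct extent map_extent(int64_t pos, int64_t len)
{
	assert(pos >= 0 && len > 0);
	uint64_t first = (uint64_t)pos >> BLKBITS;
	uint64_t last = (uint64_t)(pos + len - 1) >> BLKBITS;
	assert(last < NBLOCKS);

	uint64_t start = map_one_block(first);
	uint64_t b = first + 1;
	while (b <= last) {
		uint64_t phys = map_one_block(b);
		/* Stop when a hole turns mapped, or a mapped run stops being contiguous. */
		if ((start == 0) ? (phys != 0) : (phys != start + (b - first)))
			break;
		b++;
	}

	struct extent ext = {
		.offset = (int64_t)(first << BLKBITS),
		.length = (int64_t)((b - first) << BLKBITS),
		.addr = start << BLKBITS,
		.type = start ? EXT_MAPPED : EXT_HOLE,
	};
	return ext;
}
```

The point of the model is the shape of the callback: a single extent answer covers the first three blocks of this file, where a caller mapping one block per call would make three calls.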