Date: Mon, 27 Feb 2023 11:47:24 -0800
From: "Darrick J. Wong"
To: Matthew Wilcox
Cc: Luis Chamberlain, lsf-pc@lists.linux-foundation.org, Christoph Hellwig, David Howells, "kbus >> Keith Busch", Pankaj Raghav, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: LSF/MM/BPF 2023 IOMAP conversion status update

On Sun, Jan 29, 2023 at 05:06:47AM +0000, Matthew Wilcox wrote:
> On Sat, Jan 28, 2023 at 08:46:45PM -0800, Luis Chamberlain wrote:
> > I'm hoping this *might* be useful to some, but I fear it may leave quite
> > a bit of folks with more questions than answers as it did for me. And
> > hence I figured that *this aspect of this topic* perhaps might be a good
> > topic for LSF. The end goal would hopefully then be finally enabling us
> > to document IOMAP API properly and helping with the whole conversion
> > effort.
>
> +1 from me.
>
> I've made a couple of abortive efforts to try and convert a "trivial"
> filesystem like ext2/ufs/sysv/jfs to iomap, and I always get hung up on
> what the semantics are for get_block_t and iomap_begin().

Yup.  I wrote about it a little bit here:

https://lore.kernel.org/linux-fsdevel/Y%2Fz%2FJrV8qRhUcqE7@magnolia/T/#mda6c3175857d1e4cba88dca042fee030207df4f6

...and promised that I'd get back to writeback.

For buffered IO, iomap does things in a much different order than (I
think) most filesystems.  Traditionally, I think the order is i_rwsem ->
mapping->invalidate_lock(?) -> page lock -> get mapping.

iomap relies on the callers to take i_rwsem, asks the filesystem for a
mapping (with whatever locking that entails), and only then starts
locking pagecache folios to operate on them.  IOWs, involving the
filesystem earlier in the process enables it to make better decisions
about space allocations, which in turn should make things faster and
less fragmenty.

OTOH, it also means that we've learned the hard way that pagecache
operations need a means to revalidate mappings to avoid write races.
This applies both to the initial pagecache write and to scheduling
writeback, but the mechanisms for each were developed separately and
years apart.
See iomap::validity_cookie and xfs_writepage_ctx::{data,cow}_seq for
what I'm talking about.  We (xfs developers) ought to figure out if
these two mechanisms should be merged before more filesystems start
using iomap for buffered IO.

I'd like to have a discussion about how to clean up and clarify the
iomap interfaces, and a separate one about how to port the remaining
35+ filesystems.  I don't know how exactly to split this into LSF
sessions, other than to suggest at least two.

If hch or dchinner show up, I also want to drag them into this. :)

--D

> > Perhaps fs/buffer.c could be converted to folios only, and be done
> > with it. But would we be losing out on something? What would that be?
>
> buffer_heads are inefficient for multi-page folios because some of the
> algorithms are O(n^2) for n being the number of buffers in a folio.
> It's fine for 8x 512b buffers in a 4k page, but for 512x 4kb buffers in
> a 2MB folio, it's pretty sticky.  Things like "Read I/O has completed on
> this buffer, can I mark the folio as Uptodate now?"  For iomap, that's a
> scan of a 64 byte bitmap up to 512 times; for BHs, it's a loop over 512
> allocations, looking at one bit in each BH before moving on to the next.
> Similarly for writeback, iirc.
>
> So +1 from me for a "How do we convert 35-ish block based filesystems
> from BHs to iomap for their buffered & direct IO paths".  There's maybe a
> separate discussion to be had for "What should the API be for filesystems
> to access metadata on the block device" because I don't believe the
> page-cache based APIs are easy for fs authors to use.
>
> Maybe some related topics are
> "What testing should we require for some of these ancient filesystems?"
> "Whose job is it to convert these 35 filesystems anyway, can we just
> delete some of them?"
> "Is there a lower-performance but easier-to-implement API than iomap
> for old filesystems that only exist for compatibility reasons?"