Message-ID: <2b5c46a4dc3cb8206079d4dfc661df53939ee06a.camel@HansenPartnership.com>
Subject: Re: [LSF/MM TOPIC] Making pseudo file systems inodes/dentries more like normal file systems
From: James Bottomley
To: Linus Torvalds, Matthew Wilcox
Cc: Amir Goldstein, Steven Rostedt, Greg Kroah-Hartman,
 lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, Christian Brauner, Al Viro
Date: Sat, 27 Jan 2024 15:23:29 -0500
References: <2024012522-shorten-deviator-9f45@gregkh>
 <20240125205055.2752ac1c@rorschach.local.home>
 <2024012528-caviar-gumming-a14b@gregkh>
 <20240125214007.67d45fcf@rorschach.local.home>
 <2024012634-rotten-conjoined-0a98@gregkh>
 <20240126101553.7c22b054@gandalf.local.home>
 <2024012600-dose-happiest-f57d@gregkh>
 <20240126114451.17be7e15@gandalf.local.home>
Content-Type: text/plain; charset="UTF-8"

On Sat, 2024-01-27 at 11:44 -0800, Linus Torvalds wrote:
[...]
>  (c) none of the above is generally true of virtual filesystems
>
> Sure, *some* virtual filesystems are designed to act like a
> filesystem from the ground up. Something like "tmpfs" is obviously a
> virtual filesystem, but it's "virtual" only in the sense that it
> doesn't have much of a backing store.
> It's still designed primarily to *be* a filesystem, and the only
> operations that happen on it are filesystem operations.
>
> So ignore 'tmpfs' here, and think about all the other virtual
> filesystems we have.

Actually, I did look at tmpfs and it did help.

> And realize that they aren't really designed to be filesystems per se
> - they are literally designed to be something entirely different, and
> the filesystem interface is then only a secondary thing - it's a
> window into a strange non-filesystem world where normal filesystem
> operations don't even exist, even if sometimes there can be some kind
> of convoluted transformation for them.
>
> So you have "simple" things like just plain read-only files in /proc,
> and despite being about as simple as they come, they fail miserably
> at the most fundamental part of a file: you can't even 'stat()' them
> and get sane file size data from them.

Well, this is a big piece of the problem: when constructing a virtual
filesystem, what properties do I really need to care about (like stat or
uniqueness of inode numbers) and what can I simply ignore? Ideally this
should be documented, because you have to read a lot of code to get an
idea of what the must-have properties are. I think a simple summary of
this would go a long way to getting people somewhat out of the swamp
that sucks you in when you try to construct virtual filesystems.

> And "caching" - which was the #1 reason for most of the filesystem
> code - ends up being much less so, although it turns out that it's
> still hugely important because of the abstraction interface it
> allows.
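
For anyone who wants to see the stat() complaint above for themselves, here is a minimal userspace sketch. It compares the size that stat() reports with the number of bytes a read actually returns. The helper name reported_vs_actual_size and the choice of /proc/version and /proc/self/status are mine, picked for illustration. On a typical Linux system I would expect the /proc entries to report a size of 0 while still returning data, whereas a regular file agrees with itself.

```python
import os
import tempfile


def reported_vs_actual_size(path):
    """Return (size reported by stat(), number of bytes actually read)."""
    reported = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return reported, actual


if __name__ == "__main__":
    # A regular file on a "real" filesystem: the two numbers should match.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello\n")
    try:
        print(tmp.name, reported_vs_actual_size(tmp.name))
    finally:
        os.unlink(tmp.name)

    # Read-only procfs files (illustrative picks; skipped if /proc is absent).
    for path in ("/proc/version", "/proc/self/status"):
        if os.path.exists(path):
            print(path, reported_vs_actual_size(path))
```

This only looks at the problem from the outside. It says nothing about which of these properties a new virtual filesystem is obliged to get right, which is the documentation gap being discussed.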
>
> So all those dentries, and all the complicated lookup code, end up
> still being quite important to make the virtual filesystem look like
> a filesystem at all: it's what gives you the 'getcwd()' system call,
> it's what still gives you the whole bind mount thing, it really ends
> up giving a lot of "structure" to the virtual filesystem that would
> be an absolute nightmare without it.  But it's a structure that is
> really designed for something else.

I actually found dentries (which were the foundation of shiftfs) quite
easy. My biggest problem was the places in the code where we use a bare
dentry and I needed the struct mnt (or struct path) as well, but that's
a different discussion.

> Because the non-filesystem virtual part that a virtual filesystem is
> actually trying to expose _as_ a filesystem to user space usually has
> lifetime rules (and other rules) that are *entirely* unrelated to any
> filesystem activity. A user can "chdir()" into a directory that
> describes a process, but the lifetime of that process is then
> entirely unrelated to that, and it can go away as a process, while
> the directory still has to virtually exist.

On this alone, real filesystems do have the unplug problem as well
(device goes away while user is in the directory), so the solution that
works for them works for virtual filesystems as well.

> That's part of what the VFS code gives a virtual filesystem: the
> dentries etc end up being those things that hang around even when the
> virtual part that they described may have disappeared. And you *need*
> that, just to get sane UNIX 'home directory' semantics.
>
> I think people often don't think of how much that VFS infrastructure
> protects them from.
>
> But it's also why virtual filesystems are generally a complete mess:
> you have these two pieces, and they are really doing two *COMPLETELY*
> different things.
>
> It's why I told Steven so forcefully that tracefs must not mess
> around with VFS internals.
> A virtual filesystem either needs to be a
> "real filesystem" aka tmpfs and just leave it *all* to the VFS layer,
> or it needs to just treat the dentries as a separate cache that the
> virtual filesystem is *not* in charge of, and trust the VFS layer to
> do the filesystem parts.
>
> But no. You should *not* look at a virtual filesystem as a guide how
> to write a filesystem, or how to use the VFS. Look at a real FS. A
> simple one, and preferably one that is built from the ground up to
> look like a POSIX one, so that you don't end up getting confused by
> all the nasty hacks to make it all look ok.

Well, I did look at ext4 when I was wondering what a real filesystem
does, but we're back to having to read real and virtual filesystems now
just to understand what you have to do, and hence we're back to the "how
do we make this easier" problem.

> IOW, while FAT is a simple filesystem, don't look at that one, just
> because then you end up with all the complications that come from
> decades of non-UNIX filesystem history.
>
> I'd say "look at minix or sysv filesystems", except those may be
> simple but they also end up being so legacy that they aren't good
> examples. You shouldn't use buffer-heads for anything new. But they
> are still probably good examples for one thing: if you want to
> understand the real power of dentries, look at either of the minix or
> sysv 'namei.c' files. Just *look* at how simple they are. Ignore the
> internal implementation of how a directory entry is then looked up on
> disk - because that's obviously filesystem-specific - and instead
> just look at the interface.

So shall I put you down for helping with virtual filesystem
documentation then ... ?

James
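
A footnote on the chdir() lifetime point earlier in this mail: the "directory still has to exist after the thing it described went away" behaviour can be poked at from userspace on an ordinary filesystem too. The sketch below is only an illustration, and the helper name cwd_after_rmdir is made up for it. It enters a fresh temporary directory, removes it, and records which operations still work. On Linux I would typically expect stat(".") to keep succeeding while getcwd() fails, but the sketch records the outcome rather than assuming it.

```python
import os
import tempfile


def cwd_after_rmdir():
    """chdir() into a fresh directory, remove it, and report what still works."""
    start = os.getcwd()
    doomed = tempfile.mkdtemp()
    result = {}
    try:
        os.chdir(doomed)
        os.rmdir(doomed)  # the name is gone, but this process is still "in" it
        result["path_exists"] = os.path.exists(doomed)
        try:
            os.stat(".")  # goes through the still-referenced directory object
            result["stat_dot_ok"] = True
        except OSError:
            result["stat_dot_ok"] = False
        try:
            os.getcwd()  # needs a path back to the root, which no longer exists
            result["getcwd_ok"] = True
        except OSError:
            result["getcwd_ok"] = False
    finally:
        os.chdir(start)  # always return to a directory that has a name
    return result


if __name__ == "__main__":
    print(cwd_after_rmdir())
```

The removed directory here stands in for the process that went away or the device that was unplugged. In each case, something has to keep the working directory valid after the thing it named has gone.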