linux-mm.kvack.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	 linux-mm@kvack.org, Christian Brauner <brauner@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [LSF/MM TOPIC] Making pseudo file systems inodes/dentries more like normal file systems
Date: Sat, 27 Jan 2024 15:23:29 -0500
Message-ID: <2b5c46a4dc3cb8206079d4dfc661df53939ee06a.camel@HansenPartnership.com>
In-Reply-To: <CAHk-=whXg6zAHWZ7f+CdOg5GOMffR3RSDVyvORTZhipxp5iAFQ@mail.gmail.com>

On Sat, 2024-01-27 at 11:44 -0800, Linus Torvalds wrote:
[...]
>  (c) none of the above is generally true of virtual filesystems
> 
> Sure, *some* virtual filesystems are designed to act like a
> filesystem from the ground up. Something like "tmpfs" is obviously a
> virtual filesystem, but it's "virtual" only in the sense that it
> doesn't have much of a backing store. It's still designed primarily
> to *be* a filesystem, and the only operations that happen on it are
> filesystem operations.
> 
> So ignore 'tmpfs' here, and think about all the other virtual
> filesystems we have.

Actually, I did look at tmpfs and it did help.

> And realize that they aren't really designed to be filesystems per se
> - they are literally designed to be something entirely different, and
> the filesystem interface is then only a secondary thing - it's a
> window into a strange non-filesystem world where normal filesystem
> operations don't even exist, even if sometimes there can be some kind
> of convoluted transformation for them.
> 
> So you have "simple" things like just plain read-only files in /proc,
> and despite being about as simple as they come, they fail miserably
> at the most fundamental part of a file: you can't even 'stat()' them
> and get sane file size data from them.

Well, this is a big piece of the problem: when constructing a virtual
filesystem, which properties do I really need to care about (like stat
or uniqueness of inode numbers) and which can I simply ignore?  Ideally
this should be documented, because today you have to read a lot of code
to get an idea of what the must-have properties are.  I think a simple
summary of this would go a long way to getting people somewhat out of
the swamp that sucks you in when you try to construct virtual
filesystems.
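
The stat() complaint quoted above is easy to see from user space.  A
minimal sketch, assuming a Linux host with /proc mounted; the helper
name stat_size_vs_read and the choice of /proc/self/status are mine,
purely for illustration:

```python
import os
import tempfile


def stat_size_vs_read(path):
    """Return (st_size reported by stat(), number of bytes actually read)."""
    reported = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return reported, actual


if __name__ == "__main__":
    # A regular file: the size stat() reports matches what read() delivers.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello\n")
    print(tmp.name, stat_size_vs_read(tmp.name))
    os.unlink(tmp.name)

    # A read-only /proc file (assumption: Linux with procfs mounted).
    proc_file = "/proc/self/status"
    if os.path.exists(proc_file):
        print(proc_file, stat_size_vs_read(proc_file))
```

For the regular file the two numbers agree.  For the /proc file I would
expect stat() to report a size that says nothing useful about how much
data a read returns, which is exactly the sort of property a summary
document should mark as "optional" or "required".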

> And "caching" - which was the #1 reason for most of the filesystem
> code - ends up being much less so, although it turns out that it's
> still hugely important because of the abstraction interface it
> allows.
> 
> So all those dentries, and all the complicated lookup code, end up
> still being quite important to make the virtual filesystem look like
> a filesystem at all: it's what gives you the 'getcwd()' system call,
> it's what still gives you the whole bind mount thing, it really ends
> up giving a lot of "structure" to the virtual filesystem that would
> be an absolute nightmare without it.  But it's a structure that is
> really designed for something else.

I actually found dentries (which were the foundation of shiftfs) quite
easy.  My biggest problem was the places in the code where we use a
bare dentry and I needed the struct mnt (or struct path) as well, but
that's a different discussion.

> Because the non-filesystem virtual part that a virtual filesystem is
> actually trying to expose _as_ a filesystem to user space usually has
> lifetime rules (and other rules) that are *entirely* unrelated to any
> filesystem activity. A user can "chdir()" into a directory that
> describes a process, but the lifetime of that process is then
> entirely unrelated to that, and it can go away as a process, while
> the directory still has to virtually exist.

On this alone, real filesystems do have the unplug problem as well
(device goes away while user is in the directory), so the solution that
works for them works for virtual filesystems as well.
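
To illustrate with an ordinary filesystem: a rough sketch of "the thing
I am standing in went away", using rmdir rather than a real device
unplug (that substitution is my assumption, as is the helper name
cwd_after_rmdir).  The process keeps a working directory whose name no
longer exists:

```python
import os
import stat
import tempfile


def cwd_after_rmdir():
    """chdir into a fresh directory, remove it, and probe what still works."""
    saved = os.getcwd()
    parent = tempfile.mkdtemp()
    victim = os.path.join(parent, "victim")
    os.mkdir(victim)
    result = {}
    try:
        os.chdir(victim)
        os.rmdir(victim)  # the name is gone, but it is still our cwd

        # stat() on "." goes through the object we still hold a reference to.
        result["stat_ok"] = stat.S_ISDIR(os.stat(".").st_mode)

        # getcwd() needs a path back to the root, which no longer exists.
        try:
            os.getcwd()
            result["getcwd_ok"] = True
        except OSError:
            result["getcwd_ok"] = False

        # Creating a new entry in the removed directory should be refused.
        try:
            open("newfile", "w").close()
            result["create_ok"] = True
        except OSError:
            result["create_ok"] = False
    finally:
        os.chdir(saved)
        os.rmdir(parent)
    return result


if __name__ == "__main__":
    print(cwd_after_rmdir())
```

On Linux I would expect stat on "." to keep working while getcwd() and
file creation fail.  None of that logic lives in the individual
filesystem; the directory object is simply kept alive until the last
user lets go of it.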

> That's part of what the VFS code gives a virtual filesystem: the
> dentries etc end up being those things that hang around even when the
> virtual part that they described may have disappeared. And you *need*
> that, just to get sane UNIX 'home directory' semantics.
> 
> I think people often don't think of how much that VFS infrastructure
> protects them from.
> 
> But it's also why virtual filesystems are generally a complete mess:
> you have these two pieces, and they are really doing two *COMPLETELY*
> different things.
> 
> It's why I told Steven so forcefully that tracefs must not mess
> around with VFS internals. A virtual filesystem either needs to be a
> "real filesystem" aka tmpfs and just leave it *all* to the VFS layer,
> or it needs to just treat the dentries as a separate cache that the
> virtual filesystem is *not* in charge of, and trust the VFS layer to
> do the filesystem parts.
> 
> But no. You should *not* look at a virtual filesystem as a guide how
> to write a filesystem, or how to use the VFS. Look at a real FS. A
> simple one, and preferably one that is built from the ground up to
> look like a POSIX one, so that you don't end up getting confused by
> all the nasty hacks to make it all look ok.

Well, I did look at ext4 when I was wondering what a real filesystem
does, but we're now back to having to read both real and virtual
filesystems just to understand what you have to do, and hence we're
back to the "how do we make this easier" problem.

> IOW, while FAT is a simple filesystem, don't look at that one, just
> because then you end up with all the complications that come from
> decades of non-UNIX filesystem history.
> 
> I'd say "look at minix or sysv filesystems", except those may be
> simple but they also end up being so legacy that they aren't good
> examples. You shouldn't use buffer-heads for anything new. But they
> are still probably good examples for one thing: if you want to
> understand the real power of dentries, look at either of the minix or
> sysv 'namei.c' files. Just *look* at how simple they are. Ignore the
> internal implementation of how a directory entry is then looked up on
> disk - because that's obviously filesystem-specific - and instead
> just look at the interface.
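
As a very rough picture of the interface being described: a toy model,
not kernel code.  Every name in it (ToyDcache, toy_fs_lookup, DISK) is
hypothetical, and the real lookup interface in those namei.c files is
richer than this.  The point is only the division of labour: the
filesystem answers "what is this name in this directory?", and
everything about caching and path walking sits above it.

```python
class ToyDcache:
    """Toy name cache sitting in front of a filesystem-supplied lookup."""

    def __init__(self, fs_lookup):
        # fs_lookup(dir_id, name) -> inode id, or None if the name is absent
        self.fs_lookup = fs_lookup
        # (dir_id, name) -> inode id, or None for a cached "does not exist"
        self.cache = {}
        self.fs_calls = 0

    def lookup(self, dir_id, name):
        key = (dir_id, name)
        if key not in self.cache:
            # Cache miss: this is the only place the filesystem is consulted.
            self.fs_calls += 1
            self.cache[key] = self.fs_lookup(dir_id, name)
        return self.cache[key]

    def walk(self, root_id, path):
        """Resolve a slash-separated path one component at a time."""
        cur = root_id
        for part in filter(None, path.split("/")):
            cur = self.lookup(cur, part)
            if cur is None:
                return None
        return cur


# Hypothetical "on-disk" layout: directory id -> {name: inode id}
DISK = {1: {"etc": 2, "home": 3}, 2: {"passwd": 4}, 3: {}}


def toy_fs_lookup(dir_id, name):
    """The whole of the filesystem-specific part in this toy."""
    return DISK.get(dir_id, {}).get(name)


if __name__ == "__main__":
    dcache = ToyDcache(toy_fs_lookup)
    print(dcache.walk(1, "/etc/passwd"), dcache.fs_calls)
    print(dcache.walk(1, "/etc/passwd"), dcache.fs_calls)
    print(dcache.walk(1, "/etc/shadow"), dcache.fs_calls)
```

Repeating a walk does not call back into the filesystem, and a failed
lookup is remembered too.  The toy leaves out everything that makes the
real thing hard (lifetimes, locking, invalidation, mounts).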

So shall I put you down for helping with virtual filesystem
documentation then ... ?

James


