linux-mm.kvack.org archive mirror
From: Al Viro <viro@zeniv.linux.org.uk>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Max Kellermann <max.kellermann@ionos.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	ceph-devel@vger.kernel.org
Subject: Re: Need advice with iput() deadlock during writeback
Date: Wed, 17 Sep 2025 22:42:29 +0100
Message-ID: <20250917214229.GF39973@ZenIV>
In-Reply-To: <20250917210241.GD39973@ZenIV>

On Wed, Sep 17, 2025 at 10:02:41PM +0100, Al Viro wrote:
> On Wed, Sep 17, 2025 at 10:39:22PM +0200, Mateusz Guzik wrote:
> 
> > Linux has to have something of the sort for dentries, otherwise the
> > current fput stuff would not be safe. I find it surprising to learn
> > inodes are treated differently.
> 
> If you are looking at vnode counterparts, dentries are closer to that.
> Inodes are secondary.
> 
> And no, it's not a "wait for references to go away" - every file holds
> a _pair_ of references, one to mount and another to dentry.
> 
> Additional references to mount => umount() gets -EBUSY, lazy umount()
> (with MNT_DETACH) gets the sucker removed from the mount tree, with
> shutdown deferred (at least) until the last reference to mount goes away.
> 
> Once the mount refcount hits zero and the damn thing gets taken apart,
> an active reference to superblock (i.e. to filesystem instance) is
> dropped.
> 
> If that was not the last one (e.g. it's mounted elsewhere as well), we
> are not waiting for anything.  If it *was* the last active ref, we
> shut the filesystem instance down; that's _it_ - once you are into
> ->kill_sb(), it's all over.
> 
> Linux VFS is seriously different from Heidemann's-derived ones you'll find in
> BSD land these days.  Different taxonomy of objects, among other things...

FWIW, the basic overview of objects:

super_block: filesystem instance.  Two refcounts (passive and active, having
positive active refcount counts as one passive reference).  Shutdown when
active refcount gets to zero; freeing of in-core struct super_block - when
passive gets there.
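
A toy userspace model of the two counters may make the split easier to
see.  This is a sketch only: struct toy_sb and the toy_sb_*() helpers are
made-up names, not kernel API, and there is no locking or atomics here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel API. */
struct toy_sb {
	int active;      /* references that keep the fs instance running */
	int passive;     /* references that keep the in-core struct around */
	bool shut_down;  /* set once the active count has reached zero */
	bool freed;      /* set once the passive count has reached zero */
};

/* New instance: one active ref, which itself counts as one passive ref. */
static void toy_sb_init(struct toy_sb *sb)
{
	sb->active = 1;
	sb->passive = 1;
	sb->shut_down = false;
	sb->freed = false;
}

static void toy_sb_get_passive(struct toy_sb *sb)
{
	assert(sb->passive > 0);
	sb->passive++;
}

static void toy_sb_put_passive(struct toy_sb *sb)
{
	assert(sb->passive > 0);
	if (--sb->passive == 0)
		sb->freed = true;
}

static void toy_sb_get_active(struct toy_sb *sb)
{
	assert(sb->active > 0);	/* no reviving an instance after shutdown */
	sb->active++;
}

static void toy_sb_put_active(struct toy_sb *sb)
{
	assert(sb->active > 0);
	if (--sb->active == 0) {
		sb->shut_down = true;	/* the point of no return */
		/* "active > 0" no longer holds, so drop its passive ref */
		toy_sb_put_passive(sb);
	}
}
```

In this model a holder of a passive reference can still look at the
structure after shutdown, but cannot bring the instance back.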

mount: a subtree of an active filesystem.  Most of them are in mount tree(s),
but they might exist on their own - e.g. pipefs one, etc.  Has a refcount,
bears an active reference to fs instance (super_block) *and* a reference to
a dentry belonging to that instance - root of the (sub)tree visible in
it.  Shutdown when refcount hits zero.  Being in mount tree contributes
to refcount; that contribution goes away when it's detached from the tree
(on umount, normally).  Refcount is responsible for -EBUSY from non-lazy
umount; lazy one (umount -l, umount2(path, MNT_DETACH)) dissolves the entire
subtree that used to be mounted at that point and shuts down everything
that had refcounts reach zero, leaving the rest until their refcounts drop
to zero too.  Shutdown drops the superblock and root dentry refs.
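
The umount behaviour described above can be sketched the same way.  Again
a toy with hypothetical names (struct toy_mount, toy_umount() and friends);
the real busy check and the handling of whole subtrees are more involved
than a single counter compared against 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel API. */
struct toy_mount {
	int refcount;    /* includes 1 for being attached to the mount tree */
	bool attached;
	bool shut_down;  /* would drop the superblock and root dentry refs */
};

static void toy_mount_attach(struct toy_mount *m)
{
	m->refcount = 1;	/* contribution of the mount tree */
	m->attached = true;
	m->shut_down = false;
}

static void toy_mntget(struct toy_mount *m)
{
	assert(m->refcount > 0);
	m->refcount++;
}

static void toy_mntput(struct toy_mount *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0)
		m->shut_down = true;
}

/* lazy == true models umount -l / MNT_DETACH */
static int toy_umount(struct toy_mount *m, bool lazy)
{
	if (!m->attached)
		return -EINVAL;
	if (!lazy && m->refcount > 1)
		return -EBUSY;	/* somebody else still holds a reference */
	m->attached = false;
	toy_mntput(m);		/* the tree's contribution goes away */
	return 0;
}
```

With one extra reference held (say, by an open file), the non-lazy call
fails with -EBUSY, the lazy one detaches at once, and shutdown happens only
when that last reference is dropped.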

inode & dentry: that's what vnodes map onto.  Dentry is the main object,
inode is secondary.  Each belongs to a specific fs instance for the entire
lifetime.  Dentries form a forest; inodes are attached to some of them.
Details are a lot more involved than anything that would fit into a short
overview.  Both are refcounted; attaching a dentry to an inode contributes
1 to inode's refcount.  Child dentry contributes 1 to refcount of parent.
Shutdown does *not* happen until the dentry refcount hits zero; once it's
zero, the normal policy is "keep it around if it's still hashed", but
filesystem may say "no point keeping it".  Memory pressure => kill the
ones with zero refcount (and if their parents had been pinned only by
those children, take the parents out as well, etc.).  Filesystem shutdown =>
kick out everything with zero refcount, complain if anything's left after
that (shrink_dcache_for_umount() does it, so if filesystem kept anything
pinned internally, it had better drop those before we get to that
point).  evict_inodes() does the same to inodes.
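
The "children pin their parents" part, and the way eviction cascades
upwards, in toy form.  Names (struct toy_dentry, toy_d_init(), toy_shrink())
are hypothetical, and this ignores hashing, inodes, LRU lists and locking
entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not kernel API. */
struct toy_dentry {
	struct toy_dentry *parent;
	int refcount;
	bool alive;
};

/* Creating a child contributes 1 to the refcount of its parent. */
static void toy_d_init(struct toy_dentry *d, struct toy_dentry *parent)
{
	d->parent = parent;
	d->refcount = 0;
	d->alive = true;
	if (parent)
		parent->refcount++;
}

/*
 * Evict d if nothing references it, then keep walking up for as long as
 * the parent was pinned only by the child that has just gone away.
 * Returns the number of dentries evicted.
 */
static int toy_shrink(struct toy_dentry *d)
{
	int killed = 0;

	while (d && d->alive && d->refcount == 0) {
		struct toy_dentry *parent = d->parent;

		d->alive = false;
		killed++;
		if (parent)
			parent->refcount--;
		d = parent;
	}
	return killed;
}
```

For a root -> dir -> file chain where root carries one extra reference
(standing in for the one held by a mount), shrinking file takes out file
and dir and stops at root.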

file: the usual; open IO channel, as on any Unix.  Carries a reference to
dentry and to mount.  Shutdown happens when refcount goes to zero, normally
delayed until return to userland, when we are on shallow stack and without
any locks held.  Incidentally, sockets and pipes come with those as well -
none of the "sockets don't have a vnode" headache.
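
The delayed shutdown can be pictured as the final put queueing the file
and the actual teardown running later from a safe context.  A minimal
sketch with made-up names (struct toy_file, toy_fput(),
toy_return_to_user()); the real thing is per-task and has to cope with
contexts that never return to userland, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not kernel API. */
struct toy_file {
	int refcount;
	bool closed;			/* teardown has actually run */
	struct toy_file *next_deferred;
};

/* Files whose last reference is gone but whose teardown is pending. */
static struct toy_file *toy_deferred;

/* Dropping the last reference only queues the file. */
static void toy_fput(struct toy_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		f->next_deferred = toy_deferred;
		toy_deferred = f;
	}
}

/*
 * Stand-in for "return to userland": shallow stack, no locks held, so
 * this is where the queued teardowns run.  Returns how many were done.
 */
static int toy_return_to_user(void)
{
	int n = 0;

	while (toy_deferred) {
		struct toy_file *f = toy_deferred;

		toy_deferred = f->next_deferred;
		f->closed = true;  /* would drop the dentry and mount refs */
		n++;
	}
	return n;
}
```

The point of the split is that whoever drops the last reference does not
run the teardown on its own stack, whatever locks it happens to hold.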

cwd (and process's root as well): a pair of mount and dentry references.


