linux-mm.kvack.org archive mirror
* [PATCH v4 00/54] tree-in-dcache stuff
@ 2025-11-18  5:15 Al Viro
From: Al Viro @ 2025-11-18  5:15 UTC
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, raven, miklos, neil, a.hindborg,
	linux-mm, linux-efi, ocfs2-devel, kees, rostedt, gregkh,
	linux-usb, paul, casey, linuxppc-dev, john.johansen, selinux,
	borntraeger, bpf, clm

Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
Reference is grabbed and not released, but it's not actually _stored_
anywhere.  That works, but it's hard to follow and verify; among other
things, we have no way to tell _which_ of the increments is intended
to be an unpaired one.  Worse, on removal we need to decide whether
the reference had already been dropped, which can be non-trivial if
that removal is on umount and we need to figure out if this dentry is
pinned due to e.g. unlink() not done.  Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).

Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
marking those "leaked" dentries.  Having it set claims responsibility
for +1 in refcount.

The end result this series is aiming for:

* get these unbalanced dget() and dput() replaced with new primitives that
  would, in addition to adjusting the refcount, set and clear the persistency flag.
* instead of having kill_litter_super() mess with removing the remaining
  "leaked" references (e.g. for all tmpfs files that hadn't been removed
  prior to umount), have the regular shrink_dcache_for_umount() strip
  DCACHE_PERSISTENT from all dentries, dropping the corresponding
  reference if it had been set.  After that kill_litter_super() becomes
  an equivalent of kill_anon_super().
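
To make the second point concrete, here is a user-space toy model of the
umount-time sweep.  It is only a sketch: struct toy_dentry, TOY_PERSISTENT
and the toy_*() functions are made up for illustration and are not the
kernel code; the real shrink_dcache_for_umount() walks the dentry tree
rather than an array.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PERSISTENT 0x1u	/* hypothetical stand-in for DCACHE_PERSISTENT */

/* Hypothetical, heavily simplified dentry: just a refcount and flags. */
struct toy_dentry {
	int count;
	unsigned int flags;
};

/* Drop one reference; returns 1 if that was the last one. */
static int toy_dput(struct toy_dentry *d)
{
	assert(d->count > 0);
	return --d->count == 0;
}

/*
 * Umount-time sweep: for every dentry still marked persistent, clear
 * the mark and drop the reference the mark was responsible for.
 * Unmarked dentries are left alone.  Returns the number unpinned.
 */
static size_t toy_shrink_for_umount(struct toy_dentry *tree, size_t n)
{
	size_t unpinned = 0;

	for (size_t i = 0; i < n; i++) {
		if (tree[i].flags & TOY_PERSISTENT) {
			tree[i].flags &= ~TOY_PERSISTENT;
			toy_dput(&tree[i]);
			unpinned++;
		}
	}
	return unpinned;
}

/* Two files never removed before umount, one already unlinked. */
static size_t toy_umount_demo(void)
{
	struct toy_dentry tree[3] = {
		{ .count = 1, .flags = TOY_PERSISTENT },
		{ .count = 0, .flags = 0 },
		{ .count = 1, .flags = TOY_PERSISTENT },
	};
	size_t unpinned = toy_shrink_for_umount(tree, 3);

	for (size_t i = 0; i < 3; i++)
		assert(tree[i].count == 0 && tree[i].flags == 0);
	return unpinned;
}
```

The point of the model: the sweep never has to guess whether a given
reference had already been dropped; the flag answers that.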

Doing that in a single step is not feasible - it would affect too many places
in too many filesystems.  It has to be split into a series.

This work really started early in 2024; quite a few preliminary pieces
have already gone into mainline.  This chunk is finally getting to the
meat of that stuff - infrastructure and most of the conversions to it.

Some pieces are still sitting in the local branches, but the bulk of
that stuff is here.

Compared to v3:
	* fixed a functionfs braino around ffs_epfiles_destroy() (in #40/54,
used to be #36/50).
	* added fixes for a couple of UAF in functionfs (##36--39); that
does *NOT* include any fixes for dmabuf bugs Chris posted last week, though.

The branch is -rc5-based; it lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.persistency
Individual patches in followups.

Please, help with review and testing.  If nobody objects, in a few days it
goes into #for-next.

Shortlog:
      fuse_ctl_add_conn(): fix nlink breakage in case of early failure
      tracefs: fix a leak in eventfs_create_events_dir()
      new helper: simple_remove_by_name()
      new helper: simple_done_creating()
      introduce a flag for explicitly marking persistently pinned dentries
      primitives for maintaining persisitency
      convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
      convert ramfs and tmpfs
      procfs: make /self and /thread_self dentries persistent
      configfs, securityfs: kill_litter_super() not needed
      convert xenfs
      convert smackfs
      convert hugetlbfs
      convert mqueue
      convert bpf
      convert dlmfs
      convert fuse_ctl
      convert pstore
      convert tracefs
      convert debugfs
      debugfs: remove duplicate checks in callers of start_creating()
      convert efivarfs
      convert spufs
      convert ibmasmfs
      ibmasmfs: get rid of ibmasmfs_dir_ops
      convert devpts
      binderfs: use simple_start_creating()
      binderfs_binder_ctl_create(): kill a bogus check
      convert binderfs
      autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
      convert autofs
      convert binfmt_misc
      selinuxfs: don't stash the dentry of /policy_capabilities
      selinuxfs: new helper for attaching files to tree
      convert selinuxfs
      functionfs: don't abuse ffs_data_closed() on fs shutdown
      functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
      functionfs: need to cancel ->reset_work in ->kill_sb()
      functionfs: fix the open/removal races
      functionfs: switch to simple_remove_by_name()
      convert functionfs
      gadgetfs: switch to simple_remove_by_name()
      convert gadgetfs
      hypfs: don't pin dentries twice
      hypfs: switch hypfs_create_str() to returning int
      hypfs: swich hypfs_create_u64() to returning int
      convert hypfs
      convert rpc_pipefs
      convert nfsctl
      convert rust_binderfs
      get rid of kill_litter_super()
      convert securityfs
      kill securityfs_recursive_remove()
      d_make_discardable(): warn if given a non-persistent dentry

Diffstat:
 Documentation/filesystems/porting.rst     |   7 ++
 arch/powerpc/platforms/cell/spufs/inode.c |  17 ++-
 arch/s390/hypfs/hypfs.h                   |   6 +-
 arch/s390/hypfs/hypfs_diag_fs.c           |  60 ++++------
 arch/s390/hypfs/hypfs_vm_fs.c             |  21 ++--
 arch/s390/hypfs/inode.c                   |  82 +++++--------
 drivers/android/binder/rust_binderfs.c    | 121 ++++++-------------
 drivers/android/binderfs.c                |  82 +++----------
 drivers/base/devtmpfs.c                   |   2 +-
 drivers/misc/ibmasm/ibmasmfs.c            |  24 ++--
 drivers/usb/gadget/function/f_fs.c        | 144 +++++++++++++----------
 drivers/usb/gadget/legacy/inode.c         |  49 ++++----
 drivers/xen/xenfs/super.c                 |   2 +-
 fs/autofs/inode.c                         |   2 +-
 fs/autofs/root.c                          |  11 +-
 fs/binfmt_misc.c                          |  69 ++++++-----
 fs/configfs/dir.c                         |  10 +-
 fs/configfs/inode.c                       |   3 +-
 fs/configfs/mount.c                       |   2 +-
 fs/dcache.c                               | 111 +++++++++++-------
 fs/debugfs/inode.c                        |  32 ++----
 fs/devpts/inode.c                         |  57 ++++-----
 fs/efivarfs/inode.c                       |   7 +-
 fs/efivarfs/super.c                       |   5 +-
 fs/fuse/control.c                         |  38 +++---
 fs/hugetlbfs/inode.c                      |  12 +-
 fs/internal.h                             |   1 -
 fs/libfs.c                                |  52 +++++++--
 fs/nfsd/nfsctl.c                          |  18 +--
 fs/ocfs2/dlmfs/dlmfs.c                    |   8 +-
 fs/proc/base.c                            |   6 +-
 fs/proc/internal.h                        |   1 +
 fs/proc/root.c                            |  14 +--
 fs/proc/self.c                            |  10 +-
 fs/proc/thread_self.c                     |  11 +-
 fs/pstore/inode.c                         |   7 +-
 fs/ramfs/inode.c                          |   8 +-
 fs/super.c                                |   8 --
 fs/tracefs/event_inode.c                  |   7 +-
 fs/tracefs/inode.c                        |  13 +--
 include/linux/dcache.h                    |   4 +-
 include/linux/fs.h                        |   6 +-
 include/linux/proc_fs.h                   |   2 -
 include/linux/security.h                  |   2 -
 init/do_mounts.c                          |   2 +-
 ipc/mqueue.c                              |  12 +-
 kernel/bpf/inode.c                        |  15 +--
 mm/shmem.c                                |  38 ++----
 net/sunrpc/rpc_pipe.c                     |  27 ++---
 security/apparmor/apparmorfs.c            |  13 ++-
 security/inode.c                          |  35 +++---
 security/selinux/selinuxfs.c              | 185 +++++++++++++-----------------
 security/smack/smackfs.c                  |   2 +-
 53 files changed, 649 insertions(+), 834 deletions(-)

	Overview:

First two commits are bugfixes (fusectl and tracefs resp.)

[1/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure
[2/54] tracefs: fix a leak in eventfs_create_events_dir()

Next come two commits adding a couple of useful helpers, then three adding
the infrastructure; the rest consists of per-filesystem conversions.

[3/54] new helper: simple_remove_by_name()
[4/54] new helper: simple_done_creating()
	end_creating_path() analogue for internal object creation; unlike
end_creating_path() no mount is passed to it (or guaranteed to exist, for
that matter - it might be used during the filesystem setup, before the
superblock gets attached to any mounts).

Infrastructure:
[5/54] introduce a flag for explicitly marking persistently pinned dentries
	* introduce the new flag
	* teach shrink_dcache_for_umount() to handle it (i.e. remove
and drop refcount on anything that survives to umount with that flag
still set)
	* teach kill_litter_super() that anything with that flag does
*not* need to be unpinned.
[6/54] primitives for maintaining persisitency
	* d_make_persistent(dentry, inode) - bump refcount, mark persistent
and make hashed positive.  Return value is a borrowed reference to dentry;
it can be used until something removes persistency (at the very least,
until the parent gets unlocked, but some filesystems may have stronger
exclusion).
	* d_make_discardable() - remove persistency mark and drop reference.

NOTE: at that stage d_make_discardable() does not reject dentries not
marked persistent - it acts as if the mark had been set.

Rationale: less noise in series splitup that way.  We want (and on the
next commit will get) simple_unlink() to do the right thing - remove
persistency, if it's there.  However, it's used by many filesystems.
We would have either to convert them all at once or split simple_unlink()
into "want persistent" and "don't want persistent" versions, the latter
being the old one.  In the course of the series almost all callers
would migrate to the replacement, leaving only two pathological cases
with the old one.  The same goes for simple_rmdir() (two callers left in
the end), simple_recursive_removal() (all callers gone in the end), etc.
That's a lot of noise and it's easier to start with d_make_discardable()
quietly accepting non-persistent dentries, then, in the end, add private
copies of simple_unlink() and simple_rmdir() for two weird users (configfs
and apparmorfs) and have those use dput() instead of d_make_discardable().
At that point we'd be left with all callers of d_make_discardable()
always passing persistent dentries, allowing us to add a warning to it.
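
A user-space toy model of the two primitives as described for this stage
of the series may help.  Everything named toy_* below is hypothetical and
only mirrors the description above (bump refcount, mark persistent, make
hashed positive; then the reverse); the real signatures and locking live
in the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PERSISTENT 0x1u	/* hypothetical stand-in for DCACHE_PERSISTENT */

struct toy_inode { int ino; };

/* Hypothetical simplified dentry. */
struct toy_dentry {
	int count;
	unsigned int flags;
	bool hashed;
	struct toy_inode *inode;	/* NULL means negative */
};

/*
 * Model of d_make_persistent(dentry, inode): bump refcount, mark
 * persistent, make hashed positive.  The return value is a borrowed
 * reference: the caller does not own an extra count for it.
 */
static struct toy_dentry *toy_make_persistent(struct toy_dentry *d,
					      struct toy_inode *inode)
{
	d->count++;
	d->flags |= TOY_PERSISTENT;
	d->inode = inode;
	d->hashed = true;
	return d;
}

/*
 * Model of d_make_discardable() at the transitional stage: it does not
 * check the mark, it simply clears it and drops a reference, i.e. it
 * acts as if the mark had been set.
 */
static void toy_make_discardable(struct toy_dentry *d)
{
	d->flags &= ~TOY_PERSISTENT;
	assert(d->count > 0);
	d->count--;
}

static int toy_persistency_demo(void)
{
	struct toy_inode ino = { .ino = 42 };
	struct toy_dentry d = { .count = 1 };	/* creator's own reference */
	struct toy_dentry *borrowed = toy_make_persistent(&d, &ino);

	assert(borrowed == &d && d.count == 2);
	d.count--;		/* creator drops its own reference */
	/* Still pinned: the flag accounts for the remaining +1. */
	assert(d.count == 1 && (d.flags & TOY_PERSISTENT));

	toy_make_discardable(&d);	/* removal undoes the pin */
	assert(!(d.flags & TOY_PERSISTENT));
	return d.count;
}
```

In this model every increment that is meant to stay unpaired goes through
toy_make_persistent(), so the intent is visible at the call site.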

[7/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
	See above re quietly accepting non-persistent dentries in
simple_unlink(), simple_rmdir(), etc.

	Converting filesystems:
[8/54] convert ramfs and tmpfs
[9/54] procfs: make /self and /thread_self dentries persistent
[10/54] configfs, securityfs: kill_litter_super() not needed
[11/54] convert xenfs
[12/54] convert smackfs
[13/54] convert hugetlbfs
[14/54] convert mqueue
[15/54] convert bpf
[16/54] convert dlmfs
[17/54] convert fuse_ctl
[18/54] convert pstore
[19/54] convert tracefs
[20/54] convert debugfs
[21/54] debugfs: remove duplicate checks in callers of start_creating()
[22/54] convert efivarfs
[23/54] convert spufs
[24/54] convert ibmasmfs
[25/54] ibmasmfs: get rid of ibmasmfs_dir_ops
[26/54] convert devpts
[27/54] binderfs: use simple_start_creating()
[28/54] binderfs_binder_ctl_create(): kill a bogus check
[29/54] convert binderfs
[30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
[31/54] convert autofs
[32/54] convert binfmt_misc
[33/54] selinuxfs: don't stash the dentry of /policy_capabilities
[34/54] selinuxfs: new helper for attaching files to tree
[35/54] convert selinuxfs

	Several functionfs fixes, before converting it, to make life
simpler for backporting:
[36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown
[37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
[38/54] functionfs: need to cancel ->reset_work in ->kill_sb()
[39/54] functionfs: fix the open/removal races

	... and back to filesystems conversions:

[40/54] functionfs: switch to simple_remove_by_name()
[41/54] convert functionfs
[42/54] gadgetfs: switch to simple_remove_by_name()
[43/54] convert gadgetfs
[44/54] hypfs: don't pin dentries twice
[45/54] hypfs: switch hypfs_create_str() to returning int
[46/54] hypfs: swich hypfs_create_u64() to returning int
[47/54] convert hypfs
[48/54] convert rpc_pipefs
[49/54] convert nfsctl
[50/54] convert rust_binderfs

	... and no kill_litter_super() callers remain, so we
can take it out:
[51/54] get rid of kill_litter_super()
	
	Followups:
[52/54] convert securityfs
	That was the last remaining user of simple_recursive_removal()
that did *not* mark things persistent.  Now the only places where
d_make_discardable() is still called for dentries that are not marked
persistent are the calls of simple_{unlink,rmdir}() in configfs and
apparmorfs.

[53/54] kill securityfs_recursive_remove()
	Unused macro...

[54/54] d_make_discardable(): warn if given a non-persistent dentry

At this point there are very few call chains that might lead to
d_make_discardable() on a dentry that hadn't been made persistent:
calls of simple_unlink() and simple_rmdir() in configfs and
apparmorfs.

Both filesystems do pin (part of) their contents in dcache, but
they are currently playing very unusual games with that.  Converting
them to more usual patterns might be possible, but it's definitely
going to be a long series of changes in both cases.

For now the easiest solution is to have both stop using simple_unlink()
and simple_rmdir() - that allows us to make d_make_discardable() warn
when given a non-persistent dentry.

Rather than giving them full-blown private copies (with calls of
d_make_discardable() replaced with dput()), let's pull the parts of
simple_unlink() and simple_rmdir() that deal with timestamps and link
counts into separate helpers (__simple_unlink() and __simple_rmdir()
resp.) and have those used by configfs and apparmorfs.
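
The shape of that split, as a user-space toy model.  All names are
hypothetical, and the assumption that the two odd callers follow the
bookkeeping helper with a plain reference drop is mine, inferred from
the "replaced with dput()" remark above; see the patch itself for what
configfs and apparmorfs actually do.

```c
#include <assert.h>

#define TOY_PERSISTENT 0x1u	/* hypothetical stand-in for DCACHE_PERSISTENT */

struct toy_inode { unsigned int nlink; long ctime; };
struct toy_dentry { int count; unsigned int flags; struct toy_inode *inode; };

/* Shared part: timestamps and link count only, no refcount games. */
static void toy_unlink_bookkeeping(struct toy_inode *dir,
				   struct toy_dentry *d, long now)
{
	d->inode->ctime = now;
	dir->ctime = now;
	d->inode->nlink--;
}

/* Usual case: the dentry was pinned, so unpin it. */
static void toy_simple_unlink(struct toy_inode *dir,
			      struct toy_dentry *d, long now)
{
	toy_unlink_bookkeeping(dir, d, now);
	/* Stand-in for the warning added in the last patch. */
	assert(d->flags & TOY_PERSISTENT);
	d->flags &= ~TOY_PERSISTENT;
	d->count--;
}

/* Odd case (assumed): bookkeeping plus an ordinary reference drop. */
static void toy_odd_unlink(struct toy_inode *dir,
			   struct toy_dentry *d, long now)
{
	toy_unlink_bookkeeping(dir, d, now);
	d->count--;
}

static unsigned int toy_unlink_demo(void)
{
	struct toy_inode dir = { .nlink = 2, .ctime = 0 };
	struct toy_inode a = { .nlink = 1, .ctime = 0 };
	struct toy_inode b = { .nlink = 1, .ctime = 0 };
	struct toy_dentry da = { .count = 1, .flags = TOY_PERSISTENT, .inode = &a };
	struct toy_dentry db = { .count = 1, .flags = 0, .inode = &b };

	toy_simple_unlink(&dir, &da, 100);
	toy_odd_unlink(&dir, &db, 200);

	assert(da.count == 0 && db.count == 0);
	assert(a.ctime == 100 && b.ctime == 200 && dir.ctime == 200);
	return a.nlink + b.nlink;
}
```

With the bookkeeping factored out, only the usual wrapper ever reaches
the persistency check, which is what makes the warning safe to add.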


