Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Christian Brauner <brauner@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton <jlayton@kernel.org>,
	 Josef Bacik <josef@toxicpanda.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	 linux-mm@kvack.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	 Tejun Heo <tj@kernel.org>, Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>, Jann Horn <jannh@google.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets
Date: Fri, 20 Feb 2026 10:23:55 +0100	[thread overview]
Message-ID: <20260220-biobauer-beilhieb-2e792f9453cf@brauner> (raw)
In-Reply-To: <20260220004454.GR6467@frogsfrogsfrogs>

On Thu, Feb 19, 2026 at 04:44:54PM -0800, Darrick J. Wong wrote:
> On Mon, Feb 16, 2026 at 02:31:56PM +0100, Christian Brauner wrote:
> > Hey,
> > 
> > This reworks the simple_xattr infrastructure and adds support for
> > user.* extended attributes on sockets.
> > 
> > The simple_xattr subsystem currently uses an rbtree protected by a
> > reader-writer spinlock. This series replaces the rbtree with an
> > rhashtable giving O(1) average-case lookup with RCU-based lockless
> > reads. This sped up concurrent access patterns on tmpfs quite a bit and
> > it's an overall easy enough conversion to do and gets rid or rwlock_t.
> > 
> > The conversion is done incrementally: a new rhashtable path is added
> > alongside the existing rbtree, consumers are migrated one at a time
> > (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> > consumers switch from embedded structs to pointer-based lazy allocation
> > so the rhashtable overhead is only paid for inodes that actually use
> > xattrs.
> 
> Patches 1-6 look ok to me, at least in the sense that nothing stood out
> to me as obviously wrong, so
> Acked-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> > With this infrastructure in place the series adds support for user.*
> > xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> > from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> > that is everything created via socket() including abstract namespace
> > AF_UNIX sockets - had no xattr support at all.
> > 
> > The xattr_permission() checks are reworked to allow user.* xattrs on
> > S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> > 128KB total value size matching the limits already in use for kernfs.
> > 
> > The practical motivation comes from several directions. systemd and
> > GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> > there are tools like dbus-monitor that can observe IPC traffic across
> > the system but this only works because D-Bus has a central broker. For
> > Varlink there is no broker and there is currently no way to identify
> 
> Hum.  I suppose there's never going to be a central varlink broker, is
> there?  That doesn't sound great for discoverability, unless the plan is

Varlink was explicitly designed to avoid having to have a broker.
Practically it would have been one option to have a a central registry
maintained as a bpf socket map. My naive take had always been something
like: systemd can have a global socket map. sockets are picked up
whenver the appropriate xattr is set and deleted from the map once the
socket goes away (or the xattr is unset). Right now this is something
that would require capabilities. Once signed bpf is more common it is
easy to load that on per-container basis. But...

> to try to concentrate them in (say) /run/varlink?  But even then, could

... the future is already here :)

  https://github.com/systemd/systemd/pull/40590

All public varlink services that are supposed to be announced are now
symlinked into:

  /run/varlink/registry

There are of-course non-public interfaces such as the interface
between PID 1 and oomd. Such interfaces are not exposed.

It's also possible to have per user registries at e.g.:

  /run/user/1000/varlink/registry/

Such varlink services can now also be listed via:

  valinkctl list-services

This then ties very neatly into the varlink bridge we're currently
building:

  https://github.com/mvo5/varlink-http-bridge

It takes a directory with varlink sockets (or symlinks to varlink
sockets) like /run/varlink/registry as the argument and will serve
whatever it finds in there. Sockets can be added or removed dynamically
in the dir as needed:

  curl -s http://localhost:8080/sockets | jq
  {
    "sockets": [
      "io.systemd.Login",
      "io.systemd.Hostname",
      "io.systemd.sysext",
      "io.systemd.BootControl",
      "io.systemd.Import",
      "io.systemd.Repart",
      "io.systemd.MuteConsole",
      "io.systemd.FactoryReset",
      "io.systemd.Credentials",
      "io.systemd.AskPassword",
      "io.systemd.Manager",
      "io.systemd.ManagedOOM"
    ]
  }

The xattrs allow to have a completely global view of such services and
the per-user sessions all have their own sub-view.

> you have N services that share the same otherwise private tmpfs in order
> to talk to each other via a varlink socket?  I suppose in that case, the

Yeah sure that's one way.

> N services probably don't care/want others to discover their socket.
> 
> > which sockets speak Varlink. With user.* xattrs on sockets a service
> > can label its socket with the IPC protocol it speaks (e.g.,
> > user.varlink=1) and an eBPF program can then selectively capture
> 
> Who gets to set xattrs?  Can a malicious varlink socket user who has
> connect() abilities also delete user.varlink to mess with everyone who
> comes afterwards?

The main focus is AF_UNIX sockets of course so a varlink service does:

  fd = socket(AF_UNIX)
  umask(0117);
  bind(fd, "/run/foobar");
  umask(original_umask);
  chown("/run/foobar", -1, MYACCESSGID);
  setxattr("/run/foobar", "user.varlink", "1");

For non-path based sockets the inodes for client and server are
inherently distinct so they cannot interfer with each other. But even
then a chmod() + chown(-1, MYACCESSGID) on the sockfs socket fd will
protect this.

Thanks for the review. Please keep going. :)

next prev parent reply	other threads:[~2026-02-20  9:24 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-16 13:31 Christian Brauner
2026-02-16 13:31 ` [PATCH 01/14] xattr: add rcu_head and rhash_head to struct simple_xattr Christian Brauner
2026-02-27 14:43   ` Jan Kara
2026-02-16 13:31 ` [PATCH 02/14] xattr: add rhashtable-based simple_xattr infrastructure Christian Brauner
2026-02-27 14:43   ` Jan Kara
2026-02-16 13:31 ` [PATCH 03/14] shmem: adapt to rhashtable-based simple_xattrs with lazy allocation Christian Brauner
2026-02-27 14:48   ` Jan Kara
2026-02-16 13:32 ` [PATCH 04/14] kernfs: " Christian Brauner
2026-02-27 15:00   ` Jan Kara
2026-03-02 10:06     ` Christian Brauner
2026-02-16 13:32 ` [PATCH 05/14] pidfs: adapt to rhashtable-based simple_xattrs Christian Brauner
2026-02-27 15:09   ` Jan Kara
2026-02-27 15:16     ` Jan Kara
2026-03-02 10:04       ` Christian Brauner
2026-02-16 13:32 ` [PATCH 06/14] xattr: remove rbtree-based simple_xattr infrastructure Christian Brauner
2026-02-27 15:14   ` Jan Kara
2026-02-16 13:32 ` [PATCH 07/14] xattr: add xattr_permission_error() Christian Brauner
2026-02-27 15:15   ` Jan Kara
2026-02-16 13:32 ` [PATCH 08/14] xattr: switch xattr_permission() to switch statement Christian Brauner
2026-02-27 15:17   ` Jan Kara
2026-02-16 13:32 ` [PATCH 09/14] xattr: move user limits for xattrs to generic infra Christian Brauner
2026-02-21  0:03   ` Darrick J. Wong
2026-02-23 12:13     ` Christian Brauner
2026-02-27 15:20   ` Jan Kara
2026-02-16 13:32 ` [PATCH 10/14] xattr,net: support limited amount of extended attributes on sockfs sockets Christian Brauner
2026-02-27 15:25   ` Jan Kara
2026-02-16 13:32 ` [PATCH 11/14] xattr: support extended attributes on sockets Christian Brauner
2026-02-27 15:26   ` Jan Kara
2026-02-16 13:32 ` [PATCH 12/14] selftests/xattr: path-based AF_UNIX socket xattr tests Christian Brauner
2026-02-27 15:29   ` Jan Kara
2026-02-16 13:32 ` [PATCH 13/14] selftests/xattr: sockfs " Christian Brauner
2026-02-27 15:30   ` Jan Kara
2026-02-16 13:32 ` [PATCH 14/14] selftests/xattr: test xattrs on various socket families Christian Brauner
2026-02-27 15:32   ` Jan Kara
2026-02-20  0:44 ` [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets Darrick J. Wong
2026-02-20  9:23   ` Christian Brauner [this message]
2026-02-21  0:14     ` Darrick J. Wong
2026-03-03 12:40 ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260220-biobauer-beilhieb-2e792f9453cf@brauner \
    --to=brauner@kernel.org \
    --cc=djwong@kernel.org \
    --cc=edumazet@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=kuba@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=netdev@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox