linux-mm.kvack.org archive mirror
From: Christian Brauner <brauner@kernel.org>
To: Pratyush Yadav <ptyadav@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	 Eric Biederman <ebiederm@xmission.com>,
	Arnd Bergmann <arnd@arndb.de>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
	 Alexander Graf <graf@amazon.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	 David Woodhouse <dwmw2@infradead.org>,
	James Gowans <jgowans@amazon.com>,
	 Mike Rapoport <rppt@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Pasha Tatashin <tatashin@google.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	 Dave Hansen <dave.hansen@intel.com>,
	David Hildenbrand <david@redhat.com>,
	 Jason Gunthorpe <jgg@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Wei Yang <richard.weiyang@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org,  kexec@lists.infradead.org
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
Date: Sun, 9 Mar 2025 13:03:31 +0100
Message-ID: <20250309-unerwartet-alufolie-96aae4d20e38@brauner>
In-Reply-To: <mafs0ikokidqz.fsf@amazon.de>

On Sat, Mar 08, 2025 at 12:10:12AM +0000, Pratyush Yadav wrote:
> Hi Christian,
> 
> Thanks for the review!

No worries, I'm not trying to be polemic. It's just that this whole
proposed concept is pretty lightweight in terms of thinking about
possible implications.

> > This use-case is covered with systemd's fdstore and it's available to
> > unprivileged userspace. Stashing arbitrary file descriptors in the
> > kernel in this way isn't a good idea.
> 
> For one, it can't be arbitrary FDs, but only explicitly enabled ones.
> Beyond that, while not intended, there is no way to stop userspace from
> using it as a stash. Stashing FDs is a needed operation for this to
> work, and there is no way to guarantee in advance that userspace will
> actually use it for KHO, and not just stash it to grab back later.

As written it can't ever function as a generic file descriptor store.

It only allows fully privileged processes to stash file descriptors,
which makes it useless for generic userspace. A generic fdstore should
have a model that makes it usable by unprivileged processes. It probably
should also be multi-instance and work easily with namespaces. This
doesn't, and hitching it to devtmpfs and character devices is guaranteed
not to work well with such use-cases.

It also has big-time security issues and implications. Any file you
stash in there will have the credentials of the opener attached to it.
So if someone stashes anything in there you need permission mechanisms
that ensure that Joe Random can't use FDBOX_GET_FD to pull out a file
for, e.g., someone else's cgroup and happily migrate processes under the
opener's credentials or mess around with some random executing binary.
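
A minimal sketch of the underlying behaviour, assuming Linux and Python
3.9+ (for socket.send_fds/recv_fds). It shows only the open-time
permission check travelling with the file: the path is made inaccessible
after open(), yet whoever receives the descriptor can still read through
it. The opener's credentials (kept by the kernel in the file's f_cred)
travel the same way. The function name and file contents are
illustrative, and a socketpair stands in for any fd hand-off mechanism;
this is not FDBox code.

```python
import os
import socket
import tempfile


def demo_fd_keeps_access() -> bytes:
    """Open a file, revoke its mode bits, pass the fd on, read via the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "secret")
        with open(path, "wb") as f:
            f.write(b"opened by the original owner")

        fd = os.open(path, os.O_RDONLY)  # access is checked here, once
        try:
            os.chmod(path, 0)  # later mode changes do not affect the open file
            # Hand the descriptor to the "other side" via SCM_RIGHTS.
            socket.send_fds(left, [b"x"], [fd])
            _, fds, _, _ = socket.recv_fds(right, 1, 1)
            received = fds[0]
            try:
                # The receiver never passed a permission check on the path.
                return os.read(received, 4096)
            finally:
                os.close(received)
        finally:
            os.close(fd)
            left.close()
            right.close()
```

Calling demo_fd_keeps_access() returns the file contents even though the
path itself no longer grants read permission, which is why a stash needs
its own rules about who may pull out which descriptor.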

So you need a model of who is allowed to pull out what file descriptors
from a file descriptor stash. What are the semantics for that? What's
the security model for that? What are possible corner cases?

For systemd's userspace fdstore that's covered by policy: it can
implement quite easily what fds it accepts. For the kernel it's a lot
more complicated.
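
For reference, a hedged sketch of how a service hands a descriptor to
systemd's fdstore through the sd_notify protocol: a datagram carrying
FDSTORE=1 (optionally with FDNAME=) sent to $NOTIFY_SOCKET, with the
descriptor attached via SCM_RIGHTS. It assumes Python 3.9+ and a unit
that sets FileDescriptorStoreMax= so the manager accepts the fd; the
helper name fdstore_push is made up for this example, and only
filesystem and abstract AF_UNIX addresses are handled.

```python
import os
import socket


def fdstore_push(fd: int, name: str) -> bool:
    """Ask the service manager to keep `fd` in the fd store under `name`.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process is
    not running under a manager that speaks the notify protocol.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading '@' denotes a socket in the abstract namespace.
        addr = "\0" + addr[1:]
    msg = f"FDSTORE=1\nFDNAME={name}".encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        # The descriptor itself travels as SCM_RIGHTS ancillary data.
        socket.send_fds(sock, [msg], [fd])
    return True
```

Whether the descriptor is accepted, and to whom it is handed back on
restart, is decided by the manager in userspace, which is the policy
point made above.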

If someone puts file descriptors in there for a bunch of files opened
in different mount namespaces, this will pin those mount namespaces.
Normally, when the last process in a mount namespace exits, the mount
namespace is cleaned up, but not anymore: it would stay pinned. That's
not wrong, but the implications need to be spelled out.
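
The same lifetime rule can be seen at the level of a single file. This
is an analogy only, since showing actual mount namespace pinning needs
CAP_SYS_ADMIN: an open descriptor keeps the object behind it alive after
every name for it is gone. A small sketch, with an illustrative function
name:

```python
import os
import tempfile


def read_after_unlink() -> tuple:
    """Return (name_is_gone, data_read_through_the_still_open_fd)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data")
        with open(path, "wb") as f:
            f.write(b"still here")

        fd = os.open(path, os.O_RDONLY)
        try:
            os.unlink(path)  # remove the only name for the file
            gone = not os.path.exists(path)
            data = os.read(fd, 64)  # the open file keeps the inode alive
        finally:
            os.close(fd)  # only now can the kernel release the file
    return gone, data
```

A descriptor parked in a stash holds such references for as long as it
stays there, regardless of whether the process that put it in is still
running.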

What if someone puts a file descriptor from devtmpfs or for /dev/fdbox
into an fdbox? Even if that's blocked, what happens if someone creates a
detached bind-mount of a /dev/fdbox mount and mounts it into a different
mount namespace and then puts a file descriptor for that mount namespace
into the fdbox? Tons of other scenarios come to mind, and that's
ignoring what happens when networking is brought into the mix as well.

It's not enough to just let the kernel stash some files, get them out
later somehow, and then see whether that's useful in the future for
other stuff. A generic globally usable fdstore is not happening without
a clear and detailed analysis of what the semantics are going to be.

So either that work is done right from the start, or stashing files
goes out the window and the KHO part is instead implemented in a way
where, during a KHO dump, relevant userspace is notified that it must
now serialize its state into the serialization stash, and no files are
actually kept in there at all.
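
A rough sketch of what "serialize state instead of keeping the file"
could look like from userspace. Everything here is hypothetical: the
thread does not define a notification or stash interface, so the JSON
blob, serialize_fd() and restore_fd() are stand-ins, and only a regular
file with a path is covered (a memfd has no path to reopen).

```python
import json
import os
import tempfile


def serialize_fd(fd: int, path: str, flags: int) -> str:
    """Describe an open file as plain data: where it is and how far we read."""
    offset = os.lseek(fd, 0, os.SEEK_CUR)
    return json.dumps({"path": path, "flags": flags, "offset": offset})


def restore_fd(blob: str) -> int:
    """Reopen from the description; access is checked again at this open()."""
    state = json.loads(blob)
    fd = os.open(state["path"], state["flags"])
    os.lseek(fd, state["offset"], os.SEEK_SET)
    return fd


def demo_serialize_roundtrip() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log")
        with open(path, "wb") as f:
            f.write(b"headerpayload")

        fd = os.open(path, os.O_RDONLY)
        os.read(fd, 6)  # consume b"header"
        blob = serialize_fd(fd, path, os.O_RDONLY)
        os.close(fd)  # nothing stays open across the hand-over

        new_fd = restore_fd(blob)
        try:
            return os.read(new_fd, 64)  # continues where the old fd left off
        finally:
            os.close(new_fd)
```

Because the restoring process performs its own open(), permission checks
run against that process rather than being inherited from whoever opened
the file first.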


