From: Randy Dunlap <rdunlap@infradead.org>
To: Pratyush Yadav <ptyadav@amazon.de>, linux-kernel@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>,
Eric Biederman <ebiederm@xmission.com>,
Arnd Bergmann <arnd@arndb.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Hugh Dickins <hughd@google.com>, Alexander Graf <graf@amazon.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Woodhouse <dwmw2@infradead.org>,
James Gowans <jgowans@amazon.com>,
Mike Rapoport <rppt@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Pasha Tatashin <tatashin@google.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Dave Hansen <dave.hansen@intel.com>,
David Hildenbrand <david@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Matthew Wilcox <willy@infradead.org>,
Wei Yang <richard.weiyang@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 2/5] misc: add documentation for FDBox
Date: Thu, 06 Mar 2025 18:19:58 -0800
Message-ID: <E41DA7C8-635C-4E6E-A2CA-5D657526BE85@infradead.org>
In-Reply-To: <20250307005830.65293-3-ptyadav@amazon.de>
On March 6, 2025 4:57:36 PM PST, Pratyush Yadav <ptyadav@amazon.de> wrote:
>With FDBox in place, add documentation that describes what it is and how
>it is used, along with its UAPI and in-kernel API.
>
>Since the document refers to KHO, add a reference tag in kho/index.rst.
>
>Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
>---
> Documentation/filesystems/locking.rst | 21 +++
> Documentation/kho/fdbox.rst | 224 ++++++++++++++++++++++++++
> Documentation/kho/index.rst | 3 +
> MAINTAINERS | 1 +
> 4 files changed, 249 insertions(+)
> create mode 100644 Documentation/kho/fdbox.rst
>
>diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>index d20a32b77b60f..5526833faf79a 100644
>--- a/Documentation/filesystems/locking.rst
>+++ b/Documentation/filesystems/locking.rst
>@@ -607,6 +607,27 @@ used. To block changes to file contents via a memory mapping during the
> operation, the filesystem must take mapping->invalidate_lock to coordinate
> with ->page_mkwrite.
>
>+fdbox_file_ops
>+==============
>+
>+prototypes::
>+
>+ int (*kho_write)(struct fdbox_fd *box_fd, void *fdt);
>+ int (*seal)(struct fdbox *box);
>+ int (*unseal)(struct fdbox *box);
>+
>+
>+locking rules:
>+ all may block
>+
>+============== ==================================================
>+ops i_rwsem(box_fd->file->f_inode)
>+============== ==================================================
>+kho_write: exclusive
>+seal: no
>+unseal: no
>+============== ==================================================
>+
> dquot_operations
> ================
>
>diff --git a/Documentation/kho/fdbox.rst b/Documentation/kho/fdbox.rst
>new file mode 100644
>index 0000000000000..44a3f5cdf1efb
>--- /dev/null
>+++ b/Documentation/kho/fdbox.rst
>@@ -0,0 +1,224 @@
>+.. SPDX-License-Identifier: GPL-2.0-or-later
>+
>+===========================
>+File Descriptor Box (FDBox)
>+===========================
>+
>+:Author: Pratyush Yadav
>+
>+Introduction
>+============
>+
>+The File Descriptor Box (FDBox) is a mechanism for userspace to name file
>+descriptors and give them over to the kernel to hold. They can later be
>+retrieved by passing in the same name.
>+
>+The primary purpose of FDBox is to be used with :ref:`kho`. There are many kinds
Missing word: this should read "many kinds of".
>+anonymous file descriptors in the kernel like memfd, guest_memfd, iommufd, etc.
Add a comma after "etc.": "etc.,"
>+that would be useful to be preserved using KHO. To be able to do that, there
>+needs to be a mechanism to label FDs that allows userspace to set the label
>+before doing KHO and to use the label to map them back after KHO. FDBox achieves
>+that purpose by exposing a miscdevice which exposes ioctls to label and transfer
>+FDs between the kernel and userspace. FDBox is not intended to work with any
>+generic file descriptor. Support for each kind of FDs must be explicitly
>+enabled.
>+
>+FDBox can be enabled by setting the ``CONFIG_FDBOX`` option to ``y``. While the
>+primary purpose of FDBox is to be used with KHO, it does not explicitly require
>+``CONFIG_KEXEC_HANDOVER``, since it can be used without KHO, simply as a way to
>+preserve or transfer FDs when userspace exits.
>+
>+Concepts
>+========
>+
>+Box
>+---
>+
>+The box is a container for FDs. Boxes are identified by their name, which must
>+be unique. Userspace can put FDs in the box using the ``FDBOX_PUT_FD``
>+operation, and take them out of the box using the ``FDBOX_GET_FD`` operation.
Is this ioctl range documented in ioctl-number.rst?
I didn't notice a patch for that.
>+Once all the required FDs are put into the box, it can be sealed to make it
>+ready for shipping. This can be done by the ``FDBOX_SEAL`` operation. The seal
>+operation notifies each FD in the box. If any of the FDs have a dependency on
>+another, this gives them an opportunity to ensure all dependencies are met, or
>+fail the seal if not. Once a box is sealed, no FDs can be added or removed from
>+the box until it is unsealed. Only sealed boxes are transported to a new kernel
What if KHO is not being used?
>+via KHO. The box can be unsealed by the ``FDBOX_UNSEAL`` operation. This is the
>+opposite of seal. It also notifies each FD in the box to ensure all dependencies
>+are met. This can be useful in case some FDs fail to be restored after KHO.
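To check my reading of the seal rules above, here is a toy userspace model
of the box lifecycle. It is only a sketch, not the kernel implementation:
the model_* names, the 8-entry limit, and the plain -1 return values are
all assumptions made for illustration.

```c
/*
 * Toy model of the box lifecycle as described in the patch text.
 * NOT the kernel code: every model_* name, the entry limit, and the
 * -1 error convention are assumptions made for this sketch.
 */
#include <stdbool.h>
#include <string.h>

#define MODEL_NAME_LEN 256	/* mirrors FDBOX_NAME_LEN from the patch */
#define MODEL_MAX_FDS  8	/* arbitrary limit, for the sketch only */

struct model_box {
	bool sealed;
	int nr;
	int fds[MODEL_MAX_FDS];
	char names[MODEL_MAX_FDS][MODEL_NAME_LEN];
};

/* Return the slot holding @name, or -1 if no such entry exists. */
static int model_find(const struct model_box *box, const char *name)
{
	for (int i = 0; i < box->nr; i++)
		if (strcmp(box->names[i], name) == 0)
			return i;
	return -1;
}

/* Put: refused while sealed, when full, or on a duplicate/too-long name. */
static int model_put_fd(struct model_box *box, const char *name, int fd)
{
	if (box->sealed || box->nr == MODEL_MAX_FDS)
		return -1;
	if (strlen(name) >= MODEL_NAME_LEN || model_find(box, name) >= 0)
		return -1;
	strcpy(box->names[box->nr], name);
	box->fds[box->nr++] = fd;
	return 0;
}

/* Get: refused while sealed; removes the entry and returns the stored fd. */
static int model_get_fd(struct model_box *box, const char *name)
{
	int i, fd;

	if (box->sealed)
		return -1;
	i = model_find(box, name);
	if (i < 0)
		return -1;
	fd = box->fds[i];
	box->nr--;
	if (i != box->nr) {
		/* Move the last entry into the freed slot. */
		box->fds[i] = box->fds[box->nr];
		strcpy(box->names[i], box->names[box->nr]);
	}
	return fd;
}

/* Seal and unseal only flip the state; per-FD notification is omitted. */
static int model_seal(struct model_box *box)
{
	if (box->sealed)
		return -1;
	box->sealed = true;
	return 0;
}

static int model_unseal(struct model_box *box)
{
	if (!box->sealed)
		return -1;
	box->sealed = false;
	return 0;
}
```

If this matches the intended semantics, a short state description like this
in the document would help readers who are not using KHO.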
>+
>+Box FD
>+------
I can't tell in my email font, but is each underline at least as long as the title above it?
>+
>+The Box FD is a FD that is currently in a box. It is identified by its name,
>+which must be unique in the box it belongs to. The Box FD is created when a FD
>+is put into a box by using the ``FDBOX_PUT_FD`` operation. This operation
>+removes the FD from the calling task. The FD can be restored by passing the
>+unique name to the ``FDBOX_GET_FD`` operation.
>+
>+FDBox control device
>+--------------------
>+
>+This is the ``/dev/fdbox/fdbox`` device. A box can be created using the
>+``FDBOX_CREATE_BOX`` operation on the device. A box can be removed using the
>+``FDBOX_DELETE_BOX`` operation.
>+
>+UAPI
>+====
>+
>+FDBOX_NAME_LEN
>+--------------
>+
>+.. code-block:: c
>+
>+ #define FDBOX_NAME_LEN 256
>+
>+Maximum length of the name of a Box or Box FD.
>+
>+Ioctls on /dev/fdbox/fdbox
>+--------------------------
>+
>+FDBOX_CREATE_BOX
>+~~~~~~~~~~~~~~~~
>+
>+.. code-block:: c
>+
>+ #define FDBOX_CREATE_BOX _IO(FDBOX_TYPE, FDBOX_BASE + 0)
>+ struct fdbox_create_box {
>+ __u64 flags;
>+ __u8 name[FDBOX_NAME_LEN];
>+ };
>+
>+Create a box.
>+
>+After this returns, the box is available at ``/dev/fdbox/<name>``.
>+
>+``name``
>+ The name of the box to be created. Must be unique.
>+
>+``flags``
>+ Flags to the operation. Currently, no flags are defined.
>+
>+Returns:
>+ 0 on success, -1 on error, with errno set.
>+
>+FDBOX_DELETE_BOX
>+~~~~~~~~~~~~~~~~
>+
>+.. code-block:: c
>+
>+ #define FDBOX_DELETE_BOX _IO(FDBOX_TYPE, FDBOX_BASE + 1)
>+ struct fdbox_delete_box {
>+ __u64 flags;
>+ __u8 name[FDBOX_NAME_LEN];
>+ };
>+
>+Delete a box.
>+
>+After this returns, the box is no longer available at ``/dev/fdbox/<name>``.
>+
>+``name``
>+ The name of the box to be deleted.
>+
>+``flags``
>+ Flags to the operation. Currently, no flags are defined.
>+
>+Returns:
>+ 0 on success, -1 on error, with errno set.
>+
>+Ioctls on /dev/fdbox/<boxname>
>+------------------------------
>+
>+These must be performed on the ``/dev/fdbox/<boxname>`` device.
>+
>+FDBX_PUT_FD
>+~~~~~~~~~~~
>+
>+.. code-block:: c
>+
>+ #define FDBOX_PUT_FD _IO(FDBOX_TYPE, FDBOX_BASE + 2)
>+ struct fdbox_put_fd {
>+ __u64 flags;
>+ __u32 fd;
>+ __u32 pad;
>+ __u8 name[FDBOX_NAME_LEN];
>+ };
>+
>+
>+Put FD into the box.
>+
>+After this returns, ``fd`` is removed from the task and can no longer be used by
>+it.
>+
>+``name``
>+ The name of the FD.
>+
>+``fd``
>+ The file descriptor number to be
>+
>+``flags``
>+ Flags to the operation. Currently, no flags are defined.
>+
>+Returns:
>+ 0 on success, -1 on error, with errno set.
>+
>+FDBX_GET_FD
>+~~~~~~~~~~~
>+
>+.. code-block:: c
>+
>+ #define FDBOX_GET_FD _IO(FDBOX_TYPE, FDBOX_BASE + 3)
>+ struct fdbox_get_fd {
>+ __u64 flags;
>+ __u8 name[FDBOX_NAME_LEN];
>+ };
>+
>+Get an FD from the box.
>+
>+After this returns, the FD identified by ``name`` is mapped into the task and is
>+available for use.
>+
>+``name``
>+ The name of the FD to get.
>+
>+``flags``
>+ Flags to the operation. Currently, no flags are defined.
>+
>+Returns:
>+ FD number on success, -1 on error with errno set.
>+
>+FDBOX_SEAL
>+~~~~~~~~~~
>+
>+.. code-block:: c
>+
>+ #define FDBOX_SEAL _IO(FDBOX_TYPE, FDBOX_BASE + 4)
>+
>+Seal the box.
>+
>+Gives the kernel an opportunity to ensure all dependencies are met in the box.
>+After this returns, the box is sealed and FDs can no longer be added or removed
>+from it. A box must be sealed for it to be transported across KHO.
>+
>+Returns:
>+ 0 on success, -1 on error with errno set.
>+
>+FDBOX_UNSEAL
>+~~~~~~~~~~~~
>+
>+.. code-block:: c
>+
>+ #define FDBOX_UNSEAL _IO(FDBOX_TYPE, FDBOX_BASE + 5)
>+
>+Unseal the box.
>+
>+Gives the kernel an opportunity to ensure all dependencies are met in the box,
>+and in case of KHO, no FDs have been lost in transit.
>+
>+Returns:
>+ 0 on success, -1 on error with errno set.
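A short userspace example would help the UAPI section. Below is a minimal
sketch of the create/put/seal sequence as I understand it from the text
above. Assumptions: the FDBOX_TYPE and FDBOX_BASE values are placeholders
(the real ones would come from include/uapi/linux/fdbox.h), names are
treated as NUL-terminated strings, the devices are opened O_RDWR, and
fdbox_set_name() and fdbox_stash_fd() are hypothetical helpers that are
not part of the series.

```c
/*
 * Sketch only. FDBOX_TYPE/FDBOX_BASE are placeholder values; the structs
 * and ioctl numbers are copied from the documentation in this patch.
 */
#include <errno.h>
#include <fcntl.h>
#include <linux/ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define FDBOX_NAME_LEN 256
#define FDBOX_TYPE 0xF6	/* placeholder, not the real ioctl type */
#define FDBOX_BASE 0	/* placeholder */

#define FDBOX_CREATE_BOX _IO(FDBOX_TYPE, FDBOX_BASE + 0)
#define FDBOX_PUT_FD     _IO(FDBOX_TYPE, FDBOX_BASE + 2)
#define FDBOX_SEAL       _IO(FDBOX_TYPE, FDBOX_BASE + 4)

struct fdbox_create_box {
	uint64_t flags;
	uint8_t name[FDBOX_NAME_LEN];
};

struct fdbox_put_fd {
	uint64_t flags;
	uint32_t fd;
	uint32_t pad;
	uint8_t name[FDBOX_NAME_LEN];
};

/* Hypothetical helper: copy a name, rejecting empty or too-long input. */
static int fdbox_set_name(uint8_t dst[FDBOX_NAME_LEN], const char *src)
{
	size_t len = strlen(src);

	if (len == 0 || len >= FDBOX_NAME_LEN) {
		errno = EINVAL;
		return -1;
	}
	memset(dst, 0, FDBOX_NAME_LEN);
	memcpy(dst, src, len);
	return 0;
}

/* Hypothetical helper: create @box, move @fd into it as @name, seal it. */
static int fdbox_stash_fd(const char *box, const char *name, int fd)
{
	struct fdbox_create_box create = { 0 };
	struct fdbox_put_fd put = { 0 };
	char path[sizeof("/dev/fdbox/") + FDBOX_NAME_LEN];
	int ctl, boxfd, ret, saved;

	if (fd < 0) {
		errno = EBADF;
		return -1;
	}
	if (fdbox_set_name(create.name, box) || fdbox_set_name(put.name, name))
		return -1;
	put.fd = (uint32_t)fd;

	/* Box creation goes through the control device. */
	ctl = open("/dev/fdbox/fdbox", O_RDWR);
	if (ctl < 0)
		return -1;
	ret = ioctl(ctl, FDBOX_CREATE_BOX, &create);
	saved = errno;
	close(ctl);
	errno = saved;
	if (ret < 0)
		return -1;

	/* Put and seal are issued on the per-box device. */
	snprintf(path, sizeof(path), "/dev/fdbox/%s", box);
	boxfd = open(path, O_RDWR);
	if (boxfd < 0)
		return -1;
	ret = ioctl(boxfd, FDBOX_PUT_FD, &put);
	if (ret == 0)
		ret = ioctl(boxfd, FDBOX_SEAL);
	saved = errno;
	close(boxfd);
	errno = saved;
	return ret < 0 ? -1 : 0;
}
```

A caller would do something like fdbox_stash_fd("vm0", "guest-ram", memfd)
and, per the text above, must not use memfd afterwards.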
>+
>+Kernel functions and structures
>+===============================
>+
>+.. kernel-doc:: include/linux/fdbox.h
>diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst
>index 5e7eeeca8520f..051513b956075 100644
>--- a/Documentation/kho/index.rst
>+++ b/Documentation/kho/index.rst
>@@ -1,5 +1,7 @@
> .. SPDX-License-Identifier: GPL-2.0-or-later
>
>+.. _kho:
>+
> ========================
> Kexec Handover Subsystem
> ========================
>@@ -9,6 +11,7 @@ Kexec Handover Subsystem
>
> concepts
> usage
>+ fdbox
>
> .. only:: subproject and html
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index d329d3e5514c5..135427582e60f 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -8866,6 +8866,7 @@ FDBOX
> M: Pratyush Yadav <pratyush@kernel.org>
> L: linux-fsdevel@vger.kernel.org
> S: Maintained
>+F: Documentation/kho/fdbox.rst
> F: drivers/misc/fdbox.c
> F: include/linux/fdbox.h
> F: include/uapi/linux/fdbox.h
~Randy