From: Pratyush Yadav <ptyadav@amazon.de>
To: <linux-kernel@vger.kernel.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>,
	Jonathan Corbet <corbet@lwn.net>,
	"Eric Biederman" <ebiederm@xmission.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Hugh Dickins <hughd@google.com>, Alexander Graf <graf@amazon.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	"David Woodhouse" <dwmw2@infradead.org>,
	James Gowans <jgowans@amazon.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Pasha Tatashin" <tatashin@google.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Dave Hansen <dave.hansen@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Wei Yang" <richard.weiyang@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-mm@kvack.org>, <kexec@lists.infradead.org>
Subject: [RFC PATCH 2/5] misc: add documentation for FDBox
Date: Fri, 7 Mar 2025 00:57:36 +0000
Message-ID: <20250307005830.65293-3-ptyadav@amazon.de>
In-Reply-To: <20250307005830.65293-1-ptyadav@amazon.de>

With FDBox in place, add documentation that describes what it is and how
it is used, along with its UAPI and in-kernel API.

Since the document refers to KHO, add a reference tag in kho/index.rst.

Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
---
 Documentation/filesystems/locking.rst |  21 +++
 Documentation/kho/fdbox.rst           | 224 ++++++++++++++++++++++++++
 Documentation/kho/index.rst           |   3 +
 MAINTAINERS                           |   1 +
 4 files changed, 249 insertions(+)
 create mode 100644 Documentation/kho/fdbox.rst

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index d20a32b77b60f..5526833faf79a 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -607,6 +607,27 @@ used. To block changes to file contents via a memory mapping during the
 operation, the filesystem must take mapping->invalidate_lock to coordinate
 with ->page_mkwrite.
 
+fdbox_file_ops
+==============
+
+prototypes::
+
+	int (*kho_write)(struct fdbox_fd *box_fd, void *fdt);
+	int (*seal)(struct fdbox *box);
+	int (*unseal)(struct fdbox *box);
+
+
+locking rules:
+	all may block
+
+==============	==================================================
+ops		i_rwsem(box_fd->file->f_inode)
+==============	==================================================
+kho_write:	exclusive
+seal:		no
+unseal:		no
+==============	==================================================
+
 dquot_operations
 ================
 
diff --git a/Documentation/kho/fdbox.rst b/Documentation/kho/fdbox.rst
new file mode 100644
index 0000000000000..44a3f5cdf1efb
--- /dev/null
+++ b/Documentation/kho/fdbox.rst
@@ -0,0 +1,224 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+===========================
+File Descriptor Box (FDBox)
+===========================
+
+:Author: Pratyush Yadav
+
+Introduction
+============
+
+The File Descriptor Box (FDBox) is a mechanism for userspace to name file
+descriptors and hand them over to the kernel for safekeeping. They can later be
+retrieved by passing in the same name.
+
+The primary purpose of FDBox is to be used with :ref:`kho`. There are many kinds
+of anonymous file descriptors in the kernel, such as memfd, guest_memfd, and
+iommufd, that would be useful to preserve using KHO. To do that, there needs to
+be a mechanism to label FDs that lets userspace set the label before doing KHO
+and use the label to map them back after KHO. FDBox achieves this by exposing a
+miscdevice with ioctls to label FDs and transfer them between the kernel and
+userspace. FDBox is not intended to work with arbitrary file descriptors;
+support for each kind of FD must be explicitly enabled.
+
+FDBox can be enabled by setting the ``CONFIG_FDBOX`` option to ``y``. While the
+primary purpose of FDBox is to be used with KHO, it does not explicitly require
+``CONFIG_KEXEC_HANDOVER``, since it can be used without KHO, simply as a way to
+preserve or transfer FDs when userspace exits.
+
+Concepts
+========
+
+Box
+---
+
+The box is a container for FDs. Boxes are identified by their name, which must
+be unique. Userspace can put FDs in the box using the ``FDBOX_PUT_FD``
+operation, and take them out of the box using the ``FDBOX_GET_FD`` operation.
+
+Once all the required FDs are in the box, it can be sealed with the
+``FDBOX_SEAL`` operation to make it ready for shipping. The seal operation
+notifies each FD in the box. If any of the FDs depend on another, this gives
+them an opportunity to ensure all dependencies are met, or to fail the seal if
+not. Once a box is sealed, no FDs can be added to or removed from it until it
+is unsealed. Only sealed boxes are transported to a new kernel via KHO.
+
+A box can be unsealed with the ``FDBOX_UNSEAL`` operation, the opposite of
+seal. It also notifies each FD in the box to ensure all dependencies are met.
+This can be useful in case some FDs fail to be restored after KHO.
+
+Box FD
+------
+
+A Box FD is an FD that is currently in a box. It is identified by its name,
+which must be unique within the box it belongs to. A Box FD is created when an
+FD is put into a box using the ``FDBOX_PUT_FD`` operation. This operation
+removes the FD from the calling task. The FD can be restored by passing its
+unique name to the ``FDBOX_GET_FD`` operation.
+
+FDBox control device
+--------------------
+
+This is the ``/dev/fdbox/fdbox`` device. A box can be created using the
+``FDBOX_CREATE_BOX`` operation on the device. A box can be removed using the
+``FDBOX_DELETE_BOX`` operation.
+
+UAPI
+====
+
+FDBOX_NAME_LEN
+--------------
+
+.. code-block:: c
+
+    #define FDBOX_NAME_LEN			256
+
+Maximum length of the name of a Box or Box FD.
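+
+The sketch below shows one way userspace might fill a ``name`` field. It
+assumes that a name must be non-empty and NUL-terminated within
+``FDBOX_NAME_LEN`` bytes, i.e. at most ``FDBOX_NAME_LEN - 1`` characters; this
+document does not state either rule. ``fdbox_set_name()`` is a hypothetical
+helper, not part of the UAPI.
+
+.. code-block:: c
+
+    #include <errno.h>
+    #include <string.h>
+
+    #define FDBOX_NAME_LEN	256	/* value documented above */
+
+    /*
+     * Hypothetical helper: copy @src into a fixed-size FDBox name field.
+     * Assumes names are non-empty and NUL-terminated within FDBOX_NAME_LEN
+     * bytes. Returns 0 on success or a negative errno value.
+     */
+    static int fdbox_set_name(unsigned char dst[FDBOX_NAME_LEN], const char *src)
+    {
+    	size_t len = strlen(src);
+
+    	if (len == 0)
+    		return -EINVAL;
+    	if (len >= FDBOX_NAME_LEN)	/* no room for the terminator */
+    		return -ENAMETOOLONG;
+
+    	/* Zero the whole field so no stale bytes follow the name. */
+    	memset(dst, 0, FDBOX_NAME_LEN);
+    	memcpy(dst, src, len);
+    	return 0;
+    }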
+
+Ioctls on /dev/fdbox/fdbox
+--------------------------
+
+FDBOX_CREATE_BOX
+~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_CREATE_BOX	_IO(FDBOX_TYPE, FDBOX_BASE + 0)
+    struct fdbox_create_box {
+    	__u64 flags;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+
+Create a box.
+
+After this returns, the box is available at ``/dev/fdbox/<name>``.
+
+``name``
+    The name of the box to be created. Must be unique.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    0 on success, -1 on error, with errno set.
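+
+A minimal userspace sketch of creating a box. It assumes the control device
+can be opened with ``O_RDWR`` and that names need room for a NUL terminator.
+It uses ``__has_include`` to fall back to a local copy of the definitions above
+when ``<linux/fdbox.h>`` is not installed; in that fallback, ``FDBOX_TYPE`` and
+``FDBOX_BASE`` are placeholders, not the real values. ``create_box()`` is a
+hypothetical helper.
+
+.. code-block:: c
+
+    #include <errno.h>
+    #include <fcntl.h>
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <unistd.h>
+    #include <linux/types.h>
+
+    #if __has_include(<linux/fdbox.h>)
+    #include <linux/fdbox.h>
+    #else
+    /* Placeholder ioctl base values; the real ones live in the UAPI header. */
+    #define FDBOX_TYPE	0
+    #define FDBOX_BASE	0
+    #define FDBOX_NAME_LEN	256
+    #define FDBOX_CREATE_BOX	_IO(FDBOX_TYPE, FDBOX_BASE + 0)
+    struct fdbox_create_box {
+    	__u64 flags;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+    #endif
+
+    /* Hypothetical helper: returns 0 on success, -1 with errno set on error. */
+    static int create_box(const char *name)
+    {
+    	struct fdbox_create_box args;
+    	size_t len = strlen(name);
+    	int ctl, ret, saved_errno;
+
+    	if (len >= FDBOX_NAME_LEN) {	/* assumes room for a NUL is needed */
+    		errno = ENAMETOOLONG;
+    		return -1;
+    	}
+
+    	memset(&args, 0, sizeof(args));	/* flags = 0: none are defined */
+    	memcpy(args.name, name, len);
+
+    	ctl = open("/dev/fdbox/fdbox", O_RDWR);
+    	if (ctl < 0)
+    		return -1;
+
+    	ret = ioctl(ctl, FDBOX_CREATE_BOX, &args);
+    	saved_errno = errno;	/* keep the ioctl error across close() */
+    	close(ctl);
+    	errno = saved_errno;
+    	return ret;
+    }
+
+On success, the new box can be opened as ``/dev/fdbox/<name>`` for the per-box
+operations described below.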
+
+FDBOX_DELETE_BOX
+~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_DELETE_BOX	_IO(FDBOX_TYPE, FDBOX_BASE + 1)
+    struct fdbox_delete_box {
+    	__u64 flags;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+
+Delete a box.
+
+After this returns, the box is no longer available at ``/dev/fdbox/<name>``.
+
+``name``
+    The name of the box to be deleted.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    0 on success, -1 on error, with errno set.
+
+Ioctls on /dev/fdbox/<boxname>
+------------------------------
+
+These must be performed on the ``/dev/fdbox/<boxname>`` device.
+
+FDBOX_PUT_FD
+~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_PUT_FD	_IO(FDBOX_TYPE, FDBOX_BASE + 2)
+    struct fdbox_put_fd {
+    	__u64 flags;
+    	__u32 fd;
+    	__u32 pad;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+
+Put an FD into the box.
+
+After this returns, ``fd`` is removed from the task and can no longer be used by
+it.
+
+``name``
+    The name to give the FD. Must be unique within the box.
+
+``fd``
+    The file descriptor number to be put into the box.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    0 on success, -1 on error, with errno set.
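+
+A minimal userspace sketch of putting an FD into a box. ``open_box()`` and
+``box_put_fd()`` are hypothetical helpers. Opening the box device with
+``O_RDWR`` and requiring room for a NUL terminator in ``name`` are assumptions,
+and the fallback definitions (used when ``<linux/fdbox.h>`` is not installed)
+use placeholder values for ``FDBOX_TYPE`` and ``FDBOX_BASE``.
+
+.. code-block:: c
+
+    #include <errno.h>
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <linux/types.h>
+
+    #if __has_include(<linux/fdbox.h>)
+    #include <linux/fdbox.h>
+    #else
+    /* Placeholder ioctl base values; the real ones live in the UAPI header. */
+    #define FDBOX_TYPE	0
+    #define FDBOX_BASE	0
+    #define FDBOX_NAME_LEN	256
+    #define FDBOX_PUT_FD	_IO(FDBOX_TYPE, FDBOX_BASE + 2)
+    struct fdbox_put_fd {
+    	__u64 flags;
+    	__u32 fd;
+    	__u32 pad;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+    #endif
+
+    /* Hypothetical helper: open /dev/fdbox/<boxname>, -1 with errno on error. */
+    static int open_box(const char *boxname)
+    {
+    	char path[sizeof("/dev/fdbox/") + FDBOX_NAME_LEN];
+    	int n = snprintf(path, sizeof(path), "/dev/fdbox/%s", boxname);
+
+    	if (n < 0 || (size_t)n >= sizeof(path)) {
+    		errno = ENAMETOOLONG;
+    		return -1;
+    	}
+    	return open(path, O_RDWR);
+    }
+
+    /* Hypothetical helper: returns 0 on success, -1 with errno set on error. */
+    static int box_put_fd(int box, int fd, const char *name)
+    {
+    	struct fdbox_put_fd args;
+    	size_t len = strlen(name);
+
+    	if (fd < 0 || len >= FDBOX_NAME_LEN) {
+    		errno = fd < 0 ? EBADF : ENAMETOOLONG;
+    		return -1;
+    	}
+
+    	memset(&args, 0, sizeof(args));	/* flags = 0 and pad = 0 */
+    	args.fd = fd;
+    	memcpy(args.name, name, len);
+
+    	/* On success the kernel takes @fd away from this task: don't close it. */
+    	return ioctl(box, FDBOX_PUT_FD, &args);
+    }
+
+A caller would typically open the box once with ``open_box()`` and then call
+``box_put_fd()`` for each FD it wants to preserve, for example a memfd,
+provided FDBox support is enabled for that kind of FD.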
+
+FDBOX_GET_FD
+~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_GET_FD	_IO(FDBOX_TYPE, FDBOX_BASE + 3)
+    struct fdbox_get_fd {
+    	__u64 flags;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+
+Get an FD from the box.
+
+After this returns, the FD identified by ``name`` is mapped into the task and is
+available for use.
+
+``name``
+    The name of the FD to get.
+
+``flags``
+    Flags to the operation. Currently, no flags are defined.
+
+Returns:
+    FD number on success, -1 on error with errno set.
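+
+A minimal userspace sketch of taking an FD back out of a box.
+``box_get_fd()`` is a hypothetical helper, requiring room for a NUL terminator
+in ``name`` is an assumption, and the fallback definitions (used when
+``<linux/fdbox.h>`` is not installed) use placeholder values for
+``FDBOX_TYPE`` and ``FDBOX_BASE``. This document does not say whether the
+returned FD has ``O_CLOEXEC`` set.
+
+.. code-block:: c
+
+    #include <errno.h>
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <linux/types.h>
+
+    #if __has_include(<linux/fdbox.h>)
+    #include <linux/fdbox.h>
+    #else
+    /* Placeholder ioctl base values; the real ones live in the UAPI header. */
+    #define FDBOX_TYPE	0
+    #define FDBOX_BASE	0
+    #define FDBOX_NAME_LEN	256
+    #define FDBOX_GET_FD	_IO(FDBOX_TYPE, FDBOX_BASE + 3)
+    struct fdbox_get_fd {
+    	__u64 flags;
+    	__u8 name[FDBOX_NAME_LEN];
+    };
+    #endif
+
+    /*
+     * Hypothetical helper: @box is an open /dev/fdbox/<boxname> descriptor.
+     * Returns the new FD number on success, -1 with errno set on error.
+     */
+    static int box_get_fd(int box, const char *name)
+    {
+    	struct fdbox_get_fd args;
+    	size_t len = strlen(name);
+
+    	if (len >= FDBOX_NAME_LEN) {
+    		errno = ENAMETOOLONG;
+    		return -1;
+    	}
+
+    	memset(&args, 0, sizeof(args));	/* flags = 0: none are defined */
+    	memcpy(args.name, name, len);
+    	return ioctl(box, FDBOX_GET_FD, &args);
+    }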
+
+FDBOX_SEAL
+~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_SEAL	_IO(FDBOX_TYPE, FDBOX_BASE + 4)
+
+Seal the box.
+
+Gives each FD in the box an opportunity to ensure its dependencies are met; the
+seal fails if they are not. After this returns successfully, the box is sealed
+and FDs can no longer be added to or removed from it. A box must be sealed for
+it to be transported across KHO.
+
+Returns:
+    0 on success, -1 on error with errno set.
+
+FDBOX_UNSEAL
+~~~~~~~~~~~~
+
+.. code-block:: c
+
+    #define FDBOX_UNSEAL	_IO(FDBOX_TYPE, FDBOX_BASE + 5)
+
+Unseal the box.
+
+Gives each FD in the box an opportunity to ensure its dependencies are met
+and, in case of KHO, that no FDs have been lost in transit. After this returns
+successfully, FDs can again be added to or removed from the box.
+
+Returns:
+    0 on success, -1 on error with errno set.
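+
+A minimal sketch of where seal and unseal fit around a kexec. It assumes that,
+after KHO, the box reappears under the same ``/dev/fdbox/<boxname>`` path and,
+as implied by the Concepts section, that it must be unsealed before
+``FDBOX_GET_FD`` can be used. ``seal_box()`` and ``open_unsealed_box()`` are
+hypothetical helpers, and the fallback ``FDBOX_TYPE`` and ``FDBOX_BASE``
+values (used when ``<linux/fdbox.h>`` is not installed) are placeholders.
+
+.. code-block:: c
+
+    #include <errno.h>
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <sys/ioctl.h>
+    #include <unistd.h>
+
+    #if __has_include(<linux/fdbox.h>)
+    #include <linux/fdbox.h>
+    #else
+    /* Placeholder ioctl base values; the real ones live in the UAPI header. */
+    #define FDBOX_TYPE	0
+    #define FDBOX_BASE	0
+    #define FDBOX_NAME_LEN	256
+    #define FDBOX_SEAL	_IO(FDBOX_TYPE, FDBOX_BASE + 4)
+    #define FDBOX_UNSEAL	_IO(FDBOX_TYPE, FDBOX_BASE + 5)
+    #endif
+
+    /* Before kexec: fails if, e.g., an FD's dependencies are not met. */
+    static int seal_box(int box)
+    {
+    	return ioctl(box, FDBOX_SEAL);
+    }
+
+    /*
+     * After kexec: open the box and unseal it so FDs can be taken out again.
+     * Returns the box descriptor, or -1 with errno set.
+     */
+    static int open_unsealed_box(const char *boxname)
+    {
+    	char path[sizeof("/dev/fdbox/") + FDBOX_NAME_LEN];
+    	int n, box, saved_errno;
+
+    	n = snprintf(path, sizeof(path), "/dev/fdbox/%s", boxname);
+    	if (n < 0 || (size_t)n >= sizeof(path)) {
+    		errno = ENAMETOOLONG;
+    		return -1;
+    	}
+
+    	box = open(path, O_RDWR);
+    	if (box < 0)
+    		return -1;
+
+    	/* Unseal also lets the kernel check that no FDs were lost in transit. */
+    	if (ioctl(box, FDBOX_UNSEAL) < 0) {
+    		saved_errno = errno;
+    		close(box);
+    		errno = saved_errno;
+    		return -1;
+    	}
+    	return box;
+    }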
+
+Kernel functions and structures
+===============================
+
+.. kernel-doc:: include/linux/fdbox.h
diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst
index 5e7eeeca8520f..051513b956075 100644
--- a/Documentation/kho/index.rst
+++ b/Documentation/kho/index.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0-or-later
 
+.. _kho:
+
 ========================
 Kexec Handover Subsystem
 ========================
@@ -9,6 +11,7 @@ Kexec Handover Subsystem
 
    concepts
    usage
+   fdbox
 
 .. only::  subproject and html
 
diff --git a/MAINTAINERS b/MAINTAINERS
index d329d3e5514c5..135427582e60f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8866,6 +8866,7 @@ FDBOX
 M:	Pratyush Yadav <pratyush@kernel.org>
 L:	linux-fsdevel@vger.kernel.org
 S:	Maintained
+F:	Documentation/kho/fdbox.rst
 F:	drivers/misc/fdbox.c
 F:	include/linux/fdbox.h
 F:	include/uapi/linux/fdbox.h
-- 
2.47.1


