From: Yi Liu <yi.l.liu@intel.com>
To: David Matlack <dmatlack@google.com>,
Alex Williamson <alex@shazbot.org>,
Bjorn Helgaas <bhelgaas@google.com>
Cc: "Adithya Jayachandran" <ajayachandra@nvidia.com>,
"Alexander Graf" <graf@amazon.com>,
"Alex Mastro" <amastro@fb.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Ankit Agrawal" <ankita@nvidia.com>,
"Arnd Bergmann" <arnd@arndb.de>,
"Askar Safin" <safinaskar@gmail.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
"Chris Li" <chrisl@kernel.org>,
"Dapeng Mi" <dapeng1.mi@linux.intel.com>,
"David Rientjes" <rientjes@google.com>,
"Feng Tang" <feng.tang@linux.alibaba.com>,
"Jacob Pan" <jacob.pan@linux.microsoft.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Jonathan Corbet" <corbet@lwn.net>,
"Josh Hilke" <jrhilke@google.com>, "Kees Cook" <kees@kernel.org>,
"Kevin Tian" <kevin.tian@intel.com>,
kexec@lists.infradead.org, kvm@vger.kernel.org,
"Leon Romanovsky" <leon@kernel.org>,
"Leon Romanovsky" <leonro@nvidia.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, "Li RongQing" <lirongqing@baidu.com>,
"Lukas Wunner" <lukas@wunner.de>,
"Marco Elver" <elver@google.com>,
"Michał Winiarski" <michal.winiarski@intel.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Parav Pandit" <parav@nvidia.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Pranjal Shrivastava" <praan@google.com>,
"Pratyush Yadav" <pratyush@kernel.org>,
"Raghavendra Rao Ananta" <rananta@google.com>,
"Randy Dunlap" <rdunlap@infradead.org>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Saeed Mahameed" <saeedm@nvidia.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Shuah Khan" <skhan@linuxfoundation.org>,
"Vipin Sharma" <vipinsh@google.com>,
"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
"William Tu" <witu@nvidia.com>,
"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
Date: Tue, 24 Mar 2026 21:08:16 +0800
Message-ID: <df5dac48-8a54-49e2-acb8-9370b7078033@intel.com>
In-Reply-To: <20260323235817.1960573-8-dmatlack@google.com>
On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh@google.com>
>
> Implement the live update file handler callbacks to preserve a vfio-pci
> device across a Live Update. Subsequent commits will enable userspace to
> then retrieve this file after the Live Update.
>
> Live Update support is scoped only to cdev files (i.e. not
> VFIO_GROUP_GET_DEVICE_FD files).
>
> State about each device is serialized into a new ABI struct
> vfio_pci_core_device_ser. The contents of this struct are preserved
> across the Live Update to the next kernel using a combination of
> Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
> Live Update Orchestrator (LUO) to preserve the physical address of the
> struct.
>
> For now the only contents of struct vfio_pci_core_device_ser are the
> device's PCI segment number and BDF, so that the device can be uniquely
> identified after the Live Update.
>
> Require that userspace disables interrupts on the device prior to
> freeze() so that the device does not send any interrupts until new
> interrupt handlers have been set up by the next kernel.
>
> Reset the device and restore its state in the freeze() callback. This
> ensures the device can be received by the next kernel in a consistent
> state. Eventually this will be dropped and the device can be preserved
> across in a running state, but that requires further work in VFIO and
> the core PCI layer.
>
> Note that LUO holds a reference to this file when it is preserved. So
> VFIO is guaranteed that vfio_df_device_last_close() will not be called
> on this device no matter what userspace does.
>
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Co-developed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 2 +-
> drivers/vfio/pci/vfio_pci_core.c | 57 +++++----
> drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
> drivers/vfio/pci/vfio_pci_priv.h | 4 +
> drivers/vfio/vfio_main.c | 3 +-
> include/linux/kho/abi/vfio_pci.h | 15 +++
> include/linux/vfio.h | 2 +
> 7 files changed, 213 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 41dcbe4ace67..351480d13f6e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
> return 0;
> }
>
> -static const struct vfio_device_ops vfio_pci_ops = {
> +const struct vfio_device_ops vfio_pci_ops = {
> .name = "vfio-pci",
> .init = vfio_pci_core_init_dev,
> .release = vfio_pci_core_release_dev,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..81f941323641 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
>
> +void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + struct pci_dev *bridge = pci_upstream_bridge(pdev);
> +
> + lockdep_assert_held(&vdev->vdev.dev_set->lock);
> +
> + if (!vdev->reset_works)
> + return;
> +
> + /*
> + * Try to get the locks ourselves to prevent a deadlock. The
> + * success of this is dependent on being able to lock the device,
> + * which is not always possible.
> + *
> + * We cannot use the "try" reset interface here, since that will
> + * overwrite the previously restored configuration information.
> + */
> + if (bridge && !pci_dev_trylock(bridge))
> + return;
> +
> + if (!pci_dev_trylock(pdev))
> + goto out;
> +
> + if (!__pci_reset_function_locked(pdev))
> + vdev->needs_reset = false;
> +
> + pci_dev_unlock(pdev);
> +out:
> + if (bridge)
> + pci_dev_unlock(bridge);
> +}
> +EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
> +
> void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
> {
> - struct pci_dev *bridge;
> struct pci_dev *pdev = vdev->pdev;
> struct vfio_pci_dummy_resource *dummy_res, *tmp;
> struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
> @@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
> */
> pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>
> - /*
> - * Try to get the locks ourselves to prevent a deadlock. The
> - * success of this is dependent on being able to lock the device,
> - * which is not always possible.
> - * We can not use the "try" reset interface here, which will
> - * overwrite the previously restored configuration information.
> - */
> - if (vdev->reset_works) {
> - bridge = pci_upstream_bridge(pdev);
> - if (bridge && !pci_dev_trylock(bridge))
> - goto out_restore_state;
> - if (pci_dev_trylock(pdev)) {
> - if (!__pci_reset_function_locked(pdev))
> - vdev->needs_reset = false;
> - pci_dev_unlock(pdev);
> - }
> - if (bridge)
> - pci_dev_unlock(bridge);
> - }
> -
> -out_restore_state:
> + vfio_pci_core_try_reset(vdev);
> pci_restore_state(pdev);
> out:
> pci_disable_device(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> index 5ea5af46b159..c4ebc7c486e5 100644
> --- a/drivers/vfio/pci/vfio_pci_liveupdate.c
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -6,27 +6,178 @@
> * David Matlack <dmatlack@google.com>
> */
>
> +/**
> + * DOC: VFIO PCI Preservation via LUO
> + *
> + * VFIO PCI devices can be preserved over a kexec using the Live Update
> + * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
> + * to transfer an in-use device to the next kernel.
> + *
> + * .. note::
> + * The support for preserving VFIO PCI devices is currently *partial* and
> + * should be considered *experimental*. It should only be used by developers
> + * working on expanding the support for the time being.
> + *
> + * To avoid accidental usage while the support is still experimental, this
> + * support is hidden behind a default-disable config option
> + * ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
> + * become complete, this option will be enabled by default when
> + * ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
> + *
> + * Usage Example
> + * =============
> + *
> + * VFIO PCI devices can be preserved across a kexec by preserving the file
> + * associated with the device in a LUO session::
> + *
> + * device_fd = open("/dev/vfio/devices/X");
This should be /dev/vfio/devices/vfioX.
> + * ...
> + * ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
> + *
> + * .. note::
> + * LUO will hold an extra reference to the device file for as long as it is
> + * preserved, so there is no way for the file to be destroyed or the device
> + * to be unbound from the vfio-pci driver while it is preserved.
> + *
> + * Retrieving the file after kexec is not yet supported.
> + *
> + * Restrictions
> + * ============
> + *
> + * The kernel imposes the following restrictions when preserving VFIO devices:
> + *
> + * * The device must be bound to the ``vfio-pci`` driver.
> + *
> + * * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
> + * the future.
> + *
> + * * The device must not be an Intel display device. This may be relaxed in the
> + * future.
> + *
> + * * The device file must have been acquired from the VFIO character device,
> + * not ``VFIO_GROUP_GET_DEVICE_FD``.
How about "The device file descriptor must be obtained by opening the VFIO
device character device (``/dev/vfio/devices/vfioX``), not via
``VFIO_GROUP_GET_DEVICE_FD``."? That would align with the wording in vfio.rst:

"Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
user can now acquire a device fd by directly opening a character device
/dev/vfio/devices/vfioX"
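
For illustration only, here is a minimal userspace sketch of the cdev path that the suggested wording refers to. It is not part of the patch. The helper names vfio_cdev_path() and vfio_cdev_open() and the numeric index argument are hypothetical; the only thing taken from vfio.rst is the /dev/vfio/devices/vfioX naming.

```c
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the VFIO cdev path for device index idx.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int vfio_cdev_path(char *buf, size_t len, unsigned int idx)
{
	int n = snprintf(buf, len, "/dev/vfio/devices/vfio%u", idx);

	/* Treat an encoding error or truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: obtain a device fd by opening the character
 * device directly, i.e. without going through VFIO_GROUP_GET_DEVICE_FD.
 * Returns the fd, or -1 (errno set by open() when the open itself fails).
 */
static int vfio_cdev_open(unsigned int idx)
{
	char path[64];

	if (vfio_cdev_path(path, sizeof(path), idx))
		return -1;

	return open(path, O_RDWR);
}
```

A file descriptor obtained this way is the kind the restriction would accept; one returned by VFIO_GROUP_GET_DEVICE_FD is not.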
> + *
> + * * The device must have interrupts disabled prior to kexec. Failure to disable
> + * interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
> + * syscall (to initiate the kexec) to fail.
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The eventual goal of this support is to avoid disrupting the workload, state,
> + * or configuration of each preserved device during a Live Update. This would
> + * include allowing the device to perform DMA to preserved memory buffers and
> + * perform P2P DMA to other preserved devices. However, there are many pieces
> + * that still need to land in the kernel.
> + *
> + * For now, VFIO only preserves the following state for devices:
> + *
> + * * The PCI Segment, Bus, Device, and Function numbers of the device. The
> + * kernel guarantees that these will not change across a kexec when a device
> + * is preserved.
> + *
> + * Since the kernel is not yet prepared to preserve all parts of the device and
> + * its dependencies (such as DMA mappings), VFIO currently resets and restores
> + * preserved devices back into an idle state during kexec, before handing off
> + * control to the next kernel. This will be relaxed in future versions of the
> + * kernel once it is safe to allow the device to keep running across kexec.
> + */
> +
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/kexec_handover.h>
> #include <linux/kho/abi/vfio_pci.h>
> #include <linux/liveupdate.h>
> #include <linux/errno.h>
> +#include <linux/vfio.h>
Maybe follow alphabetical order here; errno.h would then move to the top.

Regards,
Yi Liu