From: Yi Liu <yi.l.liu@intel.com>
To: David Matlack <dmatlack@google.com>,
	Alex Williamson <alex@shazbot.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Cc: "Adithya Jayachandran" <ajayachandra@nvidia.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Askar Safin" <safinaskar@gmail.com>,
	"Borislav Petkov (AMD)" <bp@alien8.de>,
	"Chris Li" <chrisl@kernel.org>,
	"Dapeng Mi" <dapeng1.mi@linux.intel.com>,
	"David Rientjes" <rientjes@google.com>,
	"Feng Tang" <feng.tang@linux.alibaba.com>,
	"Jacob Pan" <jacob.pan@linux.microsoft.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Josh Hilke" <jrhilke@google.com>, "Kees Cook" <kees@kernel.org>,
	"Kevin Tian" <kevin.tian@intel.com>,
	kexec@lists.infradead.org, kvm@vger.kernel.org,
	"Leon Romanovsky" <leon@kernel.org>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org, "Li RongQing" <lirongqing@baidu.com>,
	"Lukas Wunner" <lukas@wunner.de>,
	"Marco Elver" <elver@google.com>,
	"Michał Winiarski" <michal.winiarski@intel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Parav Pandit" <parav@nvidia.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Pranjal Shrivastava" <praan@google.com>,
	"Pratyush Yadav" <pratyush@kernel.org>,
	"Raghavendra Rao Ananta" <rananta@google.com>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Vipin Sharma" <vipinsh@google.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"William Tu" <witu@nvidia.com>,
	"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
Date: Tue, 24 Mar 2026 21:08:16 +0800
Message-ID: <df5dac48-8a54-49e2-acb8-9370b7078033@intel.com>
In-Reply-To: <20260323235817.1960573-8-dmatlack@google.com>

On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh@google.com>
> 
> Implement the live update file handler callbacks to preserve a vfio-pci
> device across a Live Update. Subsequent commits will enable userspace to
> then retrieve this file after the Live Update.
> 
> Live Update support is scoped only to cdev files (i.e. not
> VFIO_GROUP_GET_DEVICE_FD files).
> 
> State about each device is serialized into a new ABI struct
> vfio_pci_core_device_ser. The contents of this struct are preserved
> across the Live Update to the next kernel using a combination of
> Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
> Live Update Orchestrator (LUO) to preserve the physical address of the
> struct.
> 
> For now the only contents of struct vfio_pci_core_device_ser are the
> device's PCI segment number and BDF, so that the device can be uniquely
> identified after the Live Update.
> 
> Require that userspace disables interrupts on the device prior to
> freeze() so that the device does not send any interrupts until new
> interrupt handlers have been set up by the next kernel.
> 
> Reset the device and restore its state in the freeze() callback. This
> ensures the device can be received by the next kernel in a consistent
> state. Eventually this will be dropped and the device can be preserved
> across in a running state, but that requires further work in VFIO and
> the core PCI layer.
> 
> Note that LUO holds a reference to this file when it is preserved. So
> VFIO is guaranteed that vfio_df_device_last_close() will not be called
> on this device no matter what userspace does.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Co-developed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   drivers/vfio/pci/vfio_pci.c            |   2 +-
>   drivers/vfio/pci/vfio_pci_core.c       |  57 +++++----
>   drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
>   drivers/vfio/pci/vfio_pci_priv.h       |   4 +
>   drivers/vfio/vfio_main.c               |   3 +-
>   include/linux/kho/abi/vfio_pci.h       |  15 +++
>   include/linux/vfio.h                   |   2 +
>   7 files changed, 213 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 41dcbe4ace67..351480d13f6e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
>   	return 0;
>   }
>   
> -static const struct vfio_device_ops vfio_pci_ops = {
> +const struct vfio_device_ops vfio_pci_ops = {
>   	.name		= "vfio-pci",
>   	.init		= vfio_pci_core_init_dev,
>   	.release	= vfio_pci_core_release_dev,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..81f941323641 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>   }
>   EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
>   
> +void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	struct pci_dev *bridge = pci_upstream_bridge(pdev);
> +
> +	lockdep_assert_held(&vdev->vdev.dev_set->lock);
> +
> +	if (!vdev->reset_works)
> +		return;
> +
> +	/*
> +	 * Try to get the locks ourselves to prevent a deadlock. The
> +	 * success of this is dependent on being able to lock the device,
> +	 * which is not always possible.
> +	 *
> +	 * We cannot use the "try" reset interface here, since that will
> +	 * overwrite the previously restored configuration information.
> +	 */
> +	if (bridge && !pci_dev_trylock(bridge))
> +		return;
> +
> +	if (!pci_dev_trylock(pdev))
> +		goto out;
> +
> +	if (!__pci_reset_function_locked(pdev))
> +		vdev->needs_reset = false;
> +
> +	pci_dev_unlock(pdev);
> +out:
> +	if (bridge)
> +		pci_dev_unlock(bridge);
> +}
> +EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
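
The helper keeps the lock order from vfio_pci_core_disable(): upstream
bridge first, then the device, giving up rather than blocking if either
trylock fails. A userspace sketch of that pattern, assuming pthread
mutexes as stand-ins for pci_dev_trylock()/pci_dev_unlock();
try_reset_locked() and reset_ok() are hypothetical names, not kernel
interfaces:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset callback: returns 0 on success, like the kernel helper. */
static int reset_ok(void)
{
	return 0;
}

/*
 * Userspace analogy of the locking in vfio_pci_core_try_reset(): take the
 * optional bridge lock first, then the device lock, and bail out without
 * blocking if either is contended. Returns true only if the reset callback
 * ran and reported success.
 */
static bool try_reset_locked(pthread_mutex_t *bridge, pthread_mutex_t *dev,
			     int (*reset)(void))
{
	bool done = false;

	if (bridge && pthread_mutex_trylock(bridge) != 0)
		return false;

	if (pthread_mutex_trylock(dev) != 0)
		goto out;

	done = (reset() == 0);
	pthread_mutex_unlock(dev);
out:
	/* Always drop the bridge lock, whether or not the reset ran. */
	if (bridge)
		pthread_mutex_unlock(bridge);
	return done;
}
```

As in the patch, a contended device lock still releases the bridge lock
on the way out, so a failed attempt leaves nothing held.
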
> +
>   void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   {
> -	struct pci_dev *bridge;
>   	struct pci_dev *pdev = vdev->pdev;
>   	struct vfio_pci_dummy_resource *dummy_res, *tmp;
>   	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
> @@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   	 */
>   	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>   
> -	/*
> -	 * Try to get the locks ourselves to prevent a deadlock. The
> -	 * success of this is dependent on being able to lock the device,
> -	 * which is not always possible.
> -	 * We can not use the "try" reset interface here, which will
> -	 * overwrite the previously restored configuration information.
> -	 */
> -	if (vdev->reset_works) {
> -		bridge = pci_upstream_bridge(pdev);
> -		if (bridge && !pci_dev_trylock(bridge))
> -			goto out_restore_state;
> -		if (pci_dev_trylock(pdev)) {
> -			if (!__pci_reset_function_locked(pdev))
> -				vdev->needs_reset = false;
> -			pci_dev_unlock(pdev);
> -		}
> -		if (bridge)
> -			pci_dev_unlock(bridge);
> -	}
> -
> -out_restore_state:
> +	vfio_pci_core_try_reset(vdev);
>   	pci_restore_state(pdev);
>   out:
>   	pci_disable_device(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> index 5ea5af46b159..c4ebc7c486e5 100644
> --- a/drivers/vfio/pci/vfio_pci_liveupdate.c
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -6,27 +6,178 @@
>    * David Matlack <dmatlack@google.com>
>    */
>   
> +/**
> + * DOC: VFIO PCI Preservation via LUO
> + *
> + * VFIO PCI devices can be preserved over a kexec using the Live Update
> + * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
> + * to transfer an in-use device to the next kernel.
> + *
> + * .. note::
> + *    The support for preserving VFIO PCI devices is currently *partial* and
> + *    should be considered *experimental*. It should only be used by developers
> + *    working on expanding the support for the time being.
> + *
> + *    To avoid accidental usage while the support is still experimental, this
> + *    support is hidden behind a default-disable config option
> + *    ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
> + *    become complete, this option will be enabled by default when
> + *    ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
> + *
> + * Usage Example
> + * =============
> + *
> + * VFIO PCI devices can be preserved across a kexec by preserving the file
> + * associated with the device in a LUO session::
> + *
> + *   device_fd = open("/dev/vfio/devices/X");

This should be /dev/vfio/devices/vfioX.

> + *   ...
> + *   ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
> + *
> + * .. note::
> + *    LUO will hold an extra reference to the device file for as long as it is
> + *    preserved, so there is no way for the file to be destroyed or the device
> + *    to be unbound from the vfio-pci driver while it is preserved.
> + *
> + * Retrieving the file after kexec is not yet supported.
> + *
> + * Restrictions
> + * ============
> + *
> + * The kernel imposes the following restrictions when preserving VFIO devices:
> + *
> + *  * The device must be bound to the ``vfio-pci`` driver.
> + *
> + *  * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
> + *    the future.
> + *
> + *  * The device must not be an Intel display device. This may be relaxed in the
> + *    future.
> + *
> + *  * The device file must have been acquired from the VFIO character device,
> + *    not ``VFIO_GROUP_GET_DEVICE_FD``.

How about "The device file descriptor must be obtained by opening the
VFIO device character device (``/dev/vfio/devices/vfioX``), not via
``VFIO_GROUP_GET_DEVICE_FD``."?

That would align it with the wording below from vfio.rst:

"Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
user can now acquire a device fd by directly opening a character device
/dev/vfio/devices/vfioX"
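
For reference, a minimal userspace sketch of the cdev way of getting the
device fd. The helper names vfio_cdev_path() and vfio_cdev_open() are
made up for illustration; only snprintf(3) and open(2) are real
interfaces here, and the index passed in is the "X" in vfioX:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Build the character device path for VFIO device index `idx`.
 * Returns 0 on success, -1 if the path does not fit in `buf`.
 */
static int vfio_cdev_path(char *buf, size_t len, unsigned int idx)
{
	int n = snprintf(buf, len, "/dev/vfio/devices/vfio%u", idx);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Open the device directly through its character device instead of
 * going through VFIO_GROUP_GET_DEVICE_FD. Returns the fd, or -1 on error.
 */
static int vfio_cdev_open(unsigned int idx)
{
	char path[64];

	if (vfio_cdev_path(path, sizeof(path), idx))
		return -1;

	return open(path, O_RDWR);
}
```
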

> + *
> + *  * The device must have interrupts disabled prior to kexec. Failure to disable
> + *    interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
> + *    syscall (to initiate the kexec) to fail.
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The eventual goal of this support is to avoid disrupting the workload, state,
> + * or configuration of each preserved device during a Live Update. This would
> + * include allowing the device to perform DMA to preserved memory buffers and
> + * perform P2P DMA to other preserved devices. However, there are many pieces
> + * that still need to land in the kernel.
> + *
> + * For now, VFIO only preserves the following state for devices:
> + *
> + *  * The PCI Segment, Bus, Device, and Function numbers of the device. The
> + *    kernel guarantees that these will not change across a kexec when a device
> + *    is preserved.
> + *
> + * Since the kernel is not yet prepared to preserve all parts of the device and
> + * its dependencies (such as DMA mappings), VFIO currently resets and restores
> + * preserved devices back into an idle state during kexec, before handing off
> + * control to the next kernel. This will be relaxed in future versions of the
> + * kernel once it is safe to allow the device to keep running across kexec.
> + */
> +
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
> +#include <linux/kexec_handover.h>
>   #include <linux/kho/abi/vfio_pci.h>
>   #include <linux/liveupdate.h>
>   #include <linux/errno.h>
> +#include <linux/vfio.h>

Maybe keep the includes in alphabetical order; errno.h would then move
to the top of the list.

Regards,
Yi Liu

