From: Samiullah Khawaja <skhawaja@google.com>
To: David Matlack <dmatlack@google.com>
Cc: "Alex Williamson" <alex@shazbot.org>,
	"Adithya Jayachandran" <ajayachandra@nvidia.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Chris Li" <chrisl@kernel.org>,
	"David Rientjes" <rientjes@google.com>,
	"Jacob Pan" <jacob.pan@linux.microsoft.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Josh Hilke" <jrhilke@google.com>,
	"Kevin Tian" <kevin.tian@intel.com>,
	kexec@lists.infradead.org, kvm@vger.kernel.org,
	"Leon Romanovsky" <leon@kernel.org>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org, "Lukas Wunner" <lukas@wunner.de>,
	"Michał Winiarski" <michal.winiarski@intel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Parav Pandit" <parav@nvidia.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Pranjal Shrivastava" <praan@google.com>,
	"Pratyush Yadav" <pratyush@kernel.org>,
	"Raghavendra Rao Ananta" <rananta@google.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Tomita Moeko" <tomitamoeko@gmail.com>,
	"Vipin Sharma" <vipinsh@google.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"William Tu" <witu@nvidia.com>, "Yi Liu" <yi.l.liu@intel.com>,
	"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v2 05/22] vfio/pci: Preserve vfio-pci device files across Live Update
Date: Mon, 23 Feb 2026 22:29:03 +0000
Message-ID: <qshnqqtfg7clbcqoq45ei55wt6xt4z4xtwv5ehjyjxwq47cwng@n3vd3lxigcs6>
In-Reply-To: <20260129212510.967611-6-dmatlack@google.com>

On Thu, Jan 29, 2026 at 09:24:52PM +0000, David Matlack wrote:
>From: Vipin Sharma <vipinsh@google.com>
>
>Implement the live update file handler callbacks to preserve a vfio-pci
>device across a Live Update. Subsequent commits will enable userspace to
>then retrieve this file after the Live Update.
>
>Live Update support is scoped only to cdev files (i.e. not
>VFIO_GROUP_GET_DEVICE_FD files).
>
>State about each device is serialized into a new ABI struct
>vfio_pci_core_device_ser. The contents of this struct are preserved
>across the Live Update to the next kernel using a combination of
>Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
>Live Update Orchestrator (LUO) to preserve the physical address of the
>struct.
>
>For now the only contents of struct vfio_pci_core_device_ser are the
>device's PCI segment number and BDF, so that the device can be uniquely
>identified after the Live Update.
>
>Require that userspace disables interrupts on the device prior to
>freeze() so that the device does not send any interrupts until new
>interrupt handlers have been set up by the next kernel.
>
>Reset the device and restore its state in the freeze() callback. This
>ensures the device can be received by the next kernel in a consistent
>state. Eventually this will be dropped and the device can be preserved
>across the Live Update in a running state, but that requires further
>work in VFIO and the core PCI layer.
>
>Note that LUO holds a reference to this file when it is preserved. So
>VFIO is guaranteed that vfio_df_device_last_close() will not be called
>on this device no matter what userspace does.
>
>Signed-off-by: Vipin Sharma <vipinsh@google.com>
>Co-developed-by: David Matlack <dmatlack@google.com>
>Signed-off-by: David Matlack <dmatlack@google.com>
>---
> drivers/vfio/pci/vfio_pci.c            |  2 +-
> drivers/vfio/pci/vfio_pci_liveupdate.c | 84 +++++++++++++++++++++++++-
> drivers/vfio/pci/vfio_pci_priv.h       |  2 +
> drivers/vfio/vfio.h                    | 13 ----
> drivers/vfio/vfio_main.c               | 10 +--
> include/linux/kho/abi/vfio_pci.h       | 15 +++++
> include/linux/vfio.h                   | 28 +++++++++
> 7 files changed, 129 insertions(+), 25 deletions(-)
>
>diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>index 19e88322af2c..0260afb9492d 100644
>--- a/drivers/vfio/pci/vfio_pci.c
>+++ b/drivers/vfio/pci/vfio_pci.c
>@@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
> 	return 0;
> }
>
>-static const struct vfio_device_ops vfio_pci_ops = {
>+const struct vfio_device_ops vfio_pci_ops = {
> 	.name		= "vfio-pci",
> 	.init		= vfio_pci_core_init_dev,
> 	.release	= vfio_pci_core_release_dev,
>diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
>index b84e63c0357b..f01de98f1b75 100644
>--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
>+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
>@@ -8,25 +8,104 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
>+#include <linux/kexec_handover.h>
> #include <linux/kho/abi/vfio_pci.h>
> #include <linux/liveupdate.h>
> #include <linux/errno.h>
>+#include <linux/vfio.h>
>
> #include "vfio_pci_priv.h"
>
> static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
> 					     struct file *file)
> {
>-	return false;
>+	struct vfio_device_file *df = to_vfio_device_file(file);
>+
>+	if (!df)
>+		return false;
>+
>+	/* Live Update support is limited to cdev files. */
>+	if (df->group)
>+		return false;
>+
>+	return df->device->ops == &vfio_pci_ops;
> }
>
> static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
> {
>-	return -EOPNOTSUPP;
>+	struct vfio_device *device = vfio_device_from_file(args->file);
>+	struct vfio_pci_core_device_ser *ser;
>+	struct vfio_pci_core_device *vdev;
>+	struct pci_dev *pdev;
>+
>+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
>+	pdev = vdev->pdev;
>+
>+	if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM))
>+		return -EINVAL;
>+
>+	if (vfio_pci_is_intel_display(pdev))
>+		return -EINVAL;
>+
>+	ser = kho_alloc_preserve(sizeof(*ser));
>+	if (IS_ERR(ser))
>+		return PTR_ERR(ser);
>+
>+	ser->bdf = pci_dev_id(pdev);
>+	ser->domain = pci_domain_nr(pdev->bus);
>+
>+	args->serialized_data = virt_to_phys(ser);
>+	return 0;
> }
>
> static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
> {
>+	kho_unpreserve_free(phys_to_virt(args->serialized_data));
>+}
>+
>+static int vfio_pci_liveupdate_freeze(struct liveupdate_file_op_args *args)
>+{
>+	struct vfio_device *device = vfio_device_from_file(args->file);
>+	struct vfio_pci_core_device *vdev;
>+	struct pci_dev *pdev;
>+	int ret;
>+
>+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
>+	pdev = vdev->pdev;
>+
>+	guard(mutex)(&device->dev_set->lock);
>+
>+	/*
>+	 * Userspace must disable interrupts on the device prior to freeze so
>+	 * that the device does not send any interrupts until new interrupt
>+	 * handlers have been established by the next kernel.
>+	 */
>+	if (vdev->irq_type != VFIO_PCI_NUM_IRQS) {
>+		pci_err(pdev, "Freeze failed! Interrupts are still enabled.\n");
>+		return -EINVAL;
>+	}
>+
>+	pci_dev_lock(pdev);
>+
>+	ret = pci_load_saved_state(pdev, vdev->pci_saved_state);
>+	if (ret)
>+		goto out;
>+
>+	/*
>+	 * Reset the device and restore it back to its original state before
>+	 * handing it to the next kernel.
>+	 *
>+	 * Eventually both of these should be dropped and the device should be
>+	 * kept running with its current state across the Live Update.
>+	 */
>+	if (vdev->reset_works)
>+		ret = __pci_reset_function_locked(pdev);
>+
>+	pci_restore_state(pdev);
>+
>+out:
>+	pci_dev_unlock(pdev);
>+	return ret;
> }
>
> static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
>@@ -42,6 +121,7 @@ static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
> 	.can_preserve = vfio_pci_liveupdate_can_preserve,
> 	.preserve = vfio_pci_liveupdate_preserve,
> 	.unpreserve = vfio_pci_liveupdate_unpreserve,
>+	.freeze = vfio_pci_liveupdate_freeze,
> 	.retrieve = vfio_pci_liveupdate_retrieve,
> 	.finish = vfio_pci_liveupdate_finish,
> 	.owner = THIS_MODULE,
>diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
>index 68966ec64e51..d3da79b7b03c 100644
>--- a/drivers/vfio/pci/vfio_pci_priv.h
>+++ b/drivers/vfio/pci/vfio_pci_priv.h
>@@ -11,6 +11,8 @@
> /* Cap maximum number of ioeventfds per device (arbitrary) */
> #define VFIO_PCI_IOEVENTFD_MAX		1000
>
>+extern const struct vfio_device_ops vfio_pci_ops;
>+
> struct vfio_pci_ioeventfd {
> 	struct list_head	next;
> 	struct vfio_pci_core_device	*vdev;
>diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
>index 50128da18bca..6b89edbbf174 100644
>--- a/drivers/vfio/vfio.h
>+++ b/drivers/vfio/vfio.h
>@@ -16,17 +16,6 @@ struct iommufd_ctx;
> struct iommu_group;
> struct vfio_container;
>
>-struct vfio_device_file {
>-	struct vfio_device *device;
>-	struct vfio_group *group;
>-
>-	u8 access_granted;
>-	u32 devid; /* only valid when iommufd is valid */
>-	spinlock_t kvm_ref_lock; /* protect kvm field */
>-	struct kvm *kvm;
>-	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>-};
>-
> void vfio_device_put_registration(struct vfio_device *device);
> bool vfio_device_try_get_registration(struct vfio_device *device);
> int vfio_df_open(struct vfio_device_file *df);
>@@ -34,8 +23,6 @@ void vfio_df_close(struct vfio_device_file *df);
> struct vfio_device_file *
> vfio_allocate_device_file(struct vfio_device *device);
>
>-extern const struct file_operations vfio_device_fops;
>-
> #ifdef CONFIG_VFIO_NOIOMMU
> extern bool vfio_noiommu __read_mostly;
> #else
>diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
>index f7df90c423b4..276f615f0c28 100644
>--- a/drivers/vfio/vfio_main.c
>+++ b/drivers/vfio/vfio_main.c
>@@ -1436,15 +1436,7 @@ const struct file_operations vfio_device_fops = {
> 	.show_fdinfo	= vfio_device_show_fdinfo,
> #endif
> };
>-
>-static struct vfio_device *vfio_device_from_file(struct file *file)
>-{
>-	struct vfio_device_file *df = file->private_data;
>-
>-	if (file->f_op != &vfio_device_fops)
>-		return NULL;
>-	return df->device;
>-}
>+EXPORT_SYMBOL_GPL(vfio_device_fops);
>
> /**
>  * vfio_file_is_valid - True if the file is valid vfio file
>diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
>index 37a845eed972..9bf58a2f3820 100644
>--- a/include/linux/kho/abi/vfio_pci.h
>+++ b/include/linux/kho/abi/vfio_pci.h
>@@ -9,6 +9,9 @@
> #ifndef _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
> #define _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
>
>+#include <linux/compiler.h>
>+#include <linux/types.h>
>+
> /**
>  * DOC: VFIO PCI Live Update ABI
>  *
>@@ -25,4 +28,16 @@
>
> #define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
>
>+/**
>+ * struct vfio_pci_core_device_ser - Serialized state of a single VFIO PCI
>+ * device.
>+ *
>+ * @bdf: The device's PCI bus, device, and function number.
>+ * @domain: The device's PCI domain number (segment).
>+ */
>+struct vfio_pci_core_device_ser {
>+	u16 bdf;
>+	u16 domain;
>+} __packed;
>+
> #endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
>diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>index e90859956514..9aa1587fea19 100644
>--- a/include/linux/vfio.h
>+++ b/include/linux/vfio.h
>@@ -81,6 +81,34 @@ struct vfio_device {
> #endif
> };
>
>+struct vfio_device_file {
>+	struct vfio_device *device;
>+	struct vfio_group *group;
>+
>+	u8 access_granted;
>+	u32 devid; /* only valid when iommufd is valid */
>+	spinlock_t kvm_ref_lock; /* protect kvm field */
>+	struct kvm *kvm;
>+	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>+};
>+
>+extern const struct file_operations vfio_device_fops;
>+
>+static inline struct vfio_device_file *to_vfio_device_file(struct file *file)
>+{
>+	if (file->f_op != &vfio_device_fops)
>+		return NULL;
>+
>+	return file->private_data;
>+}
>+
>+static inline struct vfio_device *vfio_device_from_file(struct file *file)
>+{
>+	struct vfio_device_file *df = to_vfio_device_file(file);
>+
>+	return df ? df->device : NULL;
>+}
>+
> /**
>  * struct vfio_device_ops - VFIO bus driver device callbacks
>  *
>-- 
>2.53.0.rc1.225.gd81095ad13-goog
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>

