linux-mm.kvack.org archive mirror
From: David Matlack <dmatlack@google.com>
To: Alex Williamson <alex@shazbot.org>
Cc: "Adithya Jayachandran" <ajayachandra@nvidia.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Chris Li" <chrisl@kernel.org>,
	"David Matlack" <dmatlack@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Jacob Pan" <jacob.pan@linux.microsoft.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Josh Hilke" <jrhilke@google.com>,
	"Kevin Tian" <kevin.tian@intel.com>,
	kexec@lists.infradead.org, kvm@vger.kernel.org,
	"Leon Romanovsky" <leon@kernel.org>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org, "Lukas Wunner" <lukas@wunner.de>,
	"Michał Winiarski" <michal.winiarski@intel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Parav Pandit" <parav@nvidia.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Pranjal Shrivastava" <praan@google.com>,
	"Pratyush Yadav" <pratyush@kernel.org>,
	"Raghavendra Rao Ananta" <rananta@google.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Tomita Moeko" <tomitamoeko@gmail.com>,
	"Vipin Sharma" <vipinsh@google.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"William Tu" <witu@nvidia.com>, "Yi Liu" <yi.l.liu@intel.com>,
	"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: [PATCH v2 05/22] vfio/pci: Preserve vfio-pci device files across Live Update
Date: Thu, 29 Jan 2026 21:24:52 +0000
Message-ID: <20260129212510.967611-6-dmatlack@google.com>
In-Reply-To: <20260129212510.967611-1-dmatlack@google.com>

From: Vipin Sharma <vipinsh@google.com>

Implement the Live Update file handler callbacks to preserve a vfio-pci
device across a Live Update. Subsequent commits will enable userspace to
retrieve this file after the Live Update.

Live Update support is scoped only to cdev files (i.e. not
VFIO_GROUP_GET_DEVICE_FD files).

State about each device is serialized into a new ABI struct
vfio_pci_core_device_ser. The contents of this struct are preserved
across the Live Update to the next kernel using a combination of
Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
Live Update Orchestrator (LUO) to preserve the physical address of the
struct.

For now the only contents of struct vfio_pci_core_device_ser are the
device's PCI segment number and BDF, so that the device can be uniquely
identified after the Live Update.
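
As an illustration of how a segment number plus a 16-bit BDF identifies
a device, the sketch below packs and unpacks a BDF using the same bit
layout as the kernel's PCI_DEVID() and PCI_DEVFN() macros (bus in bits
15:8, device in bits 7:3, function in bits 2:0). This is a userspace
sketch and not part of the patch: struct ser_sketch and the bdf_*()
helpers are hypothetical names, and the struct only mirrors the layout
of struct vfio_pci_core_device_ser.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct vfio_pci_core_device_ser. */
struct ser_sketch {
	uint16_t bdf;
	uint16_t domain;
} __attribute__((packed));

static_assert(sizeof(struct ser_sketch) == 4, "two u16 fields, no padding");

/* Pack bus/device/function: bus in bits 15:8, device 7:3, function 2:0. */
static inline uint16_t bdf_pack(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Accessors that undo bdf_pack(). */
static inline uint8_t bdf_bus(uint16_t bdf) { return bdf >> 8; }
static inline uint8_t bdf_dev(uint16_t bdf) { return (bdf >> 3) & 0x1f; }
static inline uint8_t bdf_fn(uint16_t bdf)  { return bdf & 0x07; }
```

Under this layout, a device at 0000:3b:00.1 would be recorded as
domain 0x0000 and bdf 0x3b01.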

Require that userspace disables interrupts on the device prior to
freeze() so that the device does not send any interrupts until new
interrupt handlers have been set up by the next kernel.
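
For reference, a minimal userspace sketch of disabling interrupts on a
VFIO device through the existing VFIO_DEVICE_SET_IRQS ioctl, using
VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER with a count of
zero. The helper name disable_irqs_sketch() is hypothetical, and which
IRQ index needs to be disabled (INTx, MSI or MSI-X) is an assumption
that depends on what the application enabled earlier.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: disable all interrupts for one IRQ index on an
 * open VFIO device fd. Returns 0 on success, -1 with errno set on
 * failure.
 */
static int disable_irqs_sketch(int device_fd, unsigned int index)
{
	struct vfio_irq_set irq_set;

	memset(&irq_set, 0, sizeof(irq_set));
	irq_set.argsz = sizeof(irq_set);
	/* DATA_NONE + ACTION_TRIGGER with count == 0 disables the index. */
	irq_set.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set.index = index;
	irq_set.start = 0;
	irq_set.count = 0;

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
```

A caller would typically invoke this with, for example,
VFIO_PCI_MSIX_IRQ_INDEX before asking LUO to freeze the session.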

Reset the device and restore its state in the freeze() callback. This
ensures the device can be received by the next kernel in a consistent
state. Eventually this will be dropped and the device will be preserved
across the Live Update in a running state, but that requires further
work in VFIO and the core PCI layer.

Note that LUO holds a reference to this file while it is preserved, so
VFIO is guaranteed that vfio_df_device_last_close() will not be called
on this device no matter what userspace does.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Co-developed-by: David Matlack <dmatlack@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 drivers/vfio/pci/vfio_pci.c            |  2 +-
 drivers/vfio/pci/vfio_pci_liveupdate.c | 84 +++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_priv.h       |  2 +
 drivers/vfio/vfio.h                    | 13 ----
 drivers/vfio/vfio_main.c               | 10 +--
 include/linux/kho/abi/vfio_pci.h       | 15 +++++
 include/linux/vfio.h                   | 28 +++++++++
 7 files changed, 129 insertions(+), 25 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 19e88322af2c..0260afb9492d 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
 	return 0;
 }
 
-static const struct vfio_device_ops vfio_pci_ops = {
+const struct vfio_device_ops vfio_pci_ops = {
 	.name		= "vfio-pci",
 	.init		= vfio_pci_core_init_dev,
 	.release	= vfio_pci_core_release_dev,
diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index b84e63c0357b..f01de98f1b75 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -8,25 +8,104 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/kexec_handover.h>
 #include <linux/kho/abi/vfio_pci.h>
 #include <linux/liveupdate.h>
 #include <linux/errno.h>
+#include <linux/vfio.h>
 
 #include "vfio_pci_priv.h"
 
 static bool vfio_pci_liveupdate_can_preserve(struct liveupdate_file_handler *handler,
 					     struct file *file)
 {
-	return false;
+	struct vfio_device_file *df = to_vfio_device_file(file);
+
+	if (!df)
+		return false;
+
+	/* Live Update support is limited to cdev files. */
+	if (df->group)
+		return false;
+
+	return df->device->ops == &vfio_pci_ops;
 }
 
 static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 {
-	return -EOPNOTSUPP;
+	struct vfio_device *device = vfio_device_from_file(args->file);
+	struct vfio_pci_core_device_ser *ser;
+	struct vfio_pci_core_device *vdev;
+	struct pci_dev *pdev;
+
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+	pdev = vdev->pdev;
+
+	if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM))
+		return -EINVAL;
+
+	if (vfio_pci_is_intel_display(pdev))
+		return -EINVAL;
+
+	ser = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(ser))
+		return PTR_ERR(ser);
+
+	ser->bdf = pci_dev_id(pdev);
+	ser->domain = pci_domain_nr(pdev->bus);
+
+	args->serialized_data = virt_to_phys(ser);
+	return 0;
 }
 
 static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
 {
+	kho_unpreserve_free(phys_to_virt(args->serialized_data));
+}
+
+static int vfio_pci_liveupdate_freeze(struct liveupdate_file_op_args *args)
+{
+	struct vfio_device *device = vfio_device_from_file(args->file);
+	struct vfio_pci_core_device *vdev;
+	struct pci_dev *pdev;
+	int ret;
+
+	vdev = container_of(device, struct vfio_pci_core_device, vdev);
+	pdev = vdev->pdev;
+
+	guard(mutex)(&device->dev_set->lock);
+
+	/*
+	 * Userspace must disable interrupts on the device prior to freeze so
+	 * that the device does not send any interrupts until new interrupt
+	 * handlers have been established by the next kernel.
+	 */
+	if (vdev->irq_type != VFIO_PCI_NUM_IRQS) {
+		pci_err(pdev, "Freeze failed! Interrupts are still enabled.\n");
+		return -EINVAL;
+	}
+
+	pci_dev_lock(pdev);
+
+	ret = pci_load_saved_state(pdev, vdev->pci_saved_state);
+	if (ret)
+		goto out;
+
+	/*
+	 * Reset the device and restore it back to its original state before
+	 * handing it to the next kernel.
+	 *
+	 * Eventually both of these should be dropped and the device should be
+	 * kept running with its current state across the Live Update.
+	 */
+	if (vdev->reset_works)
+		ret = __pci_reset_function_locked(pdev);
+
+	pci_restore_state(pdev);
+
+out:
+	pci_dev_unlock(pdev);
+	return ret;
 }
 
 static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
@@ -42,6 +121,7 @@ static const struct liveupdate_file_ops vfio_pci_liveupdate_file_ops = {
 	.can_preserve = vfio_pci_liveupdate_can_preserve,
 	.preserve = vfio_pci_liveupdate_preserve,
 	.unpreserve = vfio_pci_liveupdate_unpreserve,
+	.freeze = vfio_pci_liveupdate_freeze,
 	.retrieve = vfio_pci_liveupdate_retrieve,
 	.finish = vfio_pci_liveupdate_finish,
 	.owner = THIS_MODULE,
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 68966ec64e51..d3da79b7b03c 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -11,6 +11,8 @@
 /* Cap maximum number of ioeventfds per device (arbitrary) */
 #define VFIO_PCI_IOEVENTFD_MAX		1000
 
+extern const struct vfio_device_ops vfio_pci_ops;
+
 struct vfio_pci_ioeventfd {
 	struct list_head	next;
 	struct vfio_pci_core_device	*vdev;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 50128da18bca..6b89edbbf174 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,17 +16,6 @@ struct iommufd_ctx;
 struct iommu_group;
 struct vfio_container;
 
-struct vfio_device_file {
-	struct vfio_device *device;
-	struct vfio_group *group;
-
-	u8 access_granted;
-	u32 devid; /* only valid when iommufd is valid */
-	spinlock_t kvm_ref_lock; /* protect kvm field */
-	struct kvm *kvm;
-	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
-};
-
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_df_open(struct vfio_device_file *df);
@@ -34,8 +23,6 @@ void vfio_df_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
-extern const struct file_operations vfio_device_fops;
-
 #ifdef CONFIG_VFIO_NOIOMMU
 extern bool vfio_noiommu __read_mostly;
 #else
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f7df90c423b4..276f615f0c28 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1436,15 +1436,7 @@ const struct file_operations vfio_device_fops = {
 	.show_fdinfo	= vfio_device_show_fdinfo,
 #endif
 };
-
-static struct vfio_device *vfio_device_from_file(struct file *file)
-{
-	struct vfio_device_file *df = file->private_data;
-
-	if (file->f_op != &vfio_device_fops)
-		return NULL;
-	return df->device;
-}
+EXPORT_SYMBOL_GPL(vfio_device_fops);
 
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
index 37a845eed972..9bf58a2f3820 100644
--- a/include/linux/kho/abi/vfio_pci.h
+++ b/include/linux/kho/abi/vfio_pci.h
@@ -9,6 +9,9 @@
 #ifndef _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
 #define _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H
 
+#include <linux/compiler.h>
+#include <linux/types.h>
+
 /**
  * DOC: VFIO PCI Live Update ABI
  *
@@ -25,4 +28,16 @@
 
 #define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
 
+/**
+ * struct vfio_pci_core_device_ser - Serialized state of a single VFIO PCI
+ * device.
+ *
+ * @bdf: The device's PCI bus, device, and function number.
+ * @domain: The device's PCI domain number (segment).
+ */
+struct vfio_pci_core_device_ser {
+	u16 bdf;
+	u16 domain;
+} __packed;
+
 #endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e90859956514..9aa1587fea19 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -81,6 +81,34 @@ struct vfio_device {
 #endif
 };
 
+struct vfio_device_file {
+	struct vfio_device *device;
+	struct vfio_group *group;
+
+	u8 access_granted;
+	u32 devid; /* only valid when iommufd is valid */
+	spinlock_t kvm_ref_lock; /* protect kvm field */
+	struct kvm *kvm;
+	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
+};
+
+extern const struct file_operations vfio_device_fops;
+
+static inline struct vfio_device_file *to_vfio_device_file(struct file *file)
+{
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+
+	return file->private_data;
+}
+
+static inline struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device_file *df = to_vfio_device_file(file);
+
+	return df ? df->device : NULL;
+}
+
 /**
  * struct vfio_device_ops - VFIO bus driver device callbacks
  *
-- 
2.53.0.rc1.225.gd81095ad13-goog



Thread overview: 32+ messages
2026-01-29 21:24 [PATCH v2 00/22] vfio/pci: Base Live Update support for VFIO device files David Matlack
2026-01-29 21:24 ` [PATCH v2 01/22] liveupdate: Export symbols needed by modules David Matlack
2026-01-29 21:24 ` [PATCH v2 02/22] PCI: Add API to track PCI devices preserved across Live Update David Matlack
2026-02-01  6:38   ` Zhu Yanjun
2026-02-02 18:14     ` David Matlack
2026-02-04  0:10       ` Yanjun.Zhu
2026-02-20 19:03         ` David Matlack
2026-01-29 21:24 ` [PATCH v2 03/22] PCI: Inherit bus numbers from previous kernel during " David Matlack
2026-01-29 21:24 ` [PATCH v2 04/22] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
2026-02-06 22:37   ` Yanjun.Zhu
2026-02-06 23:14     ` David Matlack
2026-01-29 21:24 ` David Matlack [this message]
2026-01-29 21:24 ` [PATCH v2 06/22] vfio/pci: Retrieve preserved device files after Live Update David Matlack
2026-01-29 21:24 ` [PATCH v2 07/22] vfio/pci: Notify PCI subsystem about devices preserved across " David Matlack
2026-01-29 21:24 ` [PATCH v2 08/22] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD David Matlack
2026-01-29 21:24 ` [PATCH v2 09/22] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device David Matlack
2026-01-29 21:24 ` [PATCH v2 10/22] vfio/pci: Skip reset of preserved device after Live Update David Matlack
2026-01-29 22:21   ` Jacob Pan
2026-01-29 22:33     ` David Matlack
2026-01-30  0:31       ` Jacob Pan
2026-01-29 21:24 ` [PATCH v2 11/22] docs: liveupdate: Document VFIO device file preservation David Matlack
2026-01-29 21:24 ` [PATCH v2 12/22] selftests/liveupdate: Move luo_test_utils.* into a reusable library David Matlack
2026-01-29 21:25 ` [PATCH v2 13/22] selftests/liveupdate: Add helpers to preserve/retrieve FDs David Matlack
2026-01-29 21:25 ` [PATCH v2 14/22] vfio: selftests: Build liveupdate library in VFIO selftests David Matlack
2026-01-29 21:25 ` [PATCH v2 15/22] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED David Matlack
2026-01-29 21:25 ` [PATCH v2 16/22] vfio: selftests: Add vfio_pci_liveupdate_uapi_test David Matlack
2026-01-29 21:25 ` [PATCH v2 17/22] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD David Matlack
2026-01-29 21:25 ` [PATCH v2 18/22] vfio: selftests: Add vfio_pci_liveupdate_kexec_test David Matlack
2026-01-29 21:25 ` [PATCH v2 19/22] vfio: selftests: Expose iommu_modes to tests David Matlack
2026-01-29 21:25 ` [PATCH v2 20/22] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device David Matlack
2026-01-29 21:25 ` [PATCH v2 21/22] vfio: selftests: Verify that opening VFIO device fails during Live Update David Matlack
2026-01-29 21:25 ` [PATCH v2 22/22] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test David Matlack
