From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com,
changyuanl@google.com, pasha.tatashin@soleen.com,
rppt@kernel.org, dmatlack@google.com, rientjes@google.com,
corbet@lwn.net, rdunlap@infradead.org,
ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org,
akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr,
mmaurer@google.com, roman.gushchin@linux.dev,
chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
jannh@google.com, vincent.guittot@linaro.org,
hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com,
joel.granados@kernel.org, rostedt@goodmis.org,
anna.schumaker@oracle.com, song@kernel.org,
zhangguopeng@kylinos.cn, linux@weissschuh.net,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, gregkh@linuxfoundation.org,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
rafael@kernel.org, dakr@kernel.org,
bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
myungjoo.ham@samsung.com, yesanishhere@gmail.com,
Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
aleksander.lobakin@intel.com, ira.weiny@intel.com,
andriy.shevchenko@linux.intel.com, leon@kernel.org,
lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org,
djeffery@redhat.com, stuart.w.hayes@gmail.com,
ptyadav@amazon.de, lennart@poettering.net, brauner@kernel.org,
linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
saeedm@nvidia.com, ajayachandra@nvidia.com, jgg@nvidia.com,
parav@nvidia.com, leonro@nvidia.com, witu@nvidia.com,
hughd@google.com, skhawaja@google.com, chrisl@kernel.org,
steven.sistare@oracle.com
Subject: Re: [PATCH v4 00/30] Live Update Orchestrator
Date: Tue, 7 Oct 2025 13:10:30 -0400
Message-ID: <CA+CK2bB+RdapsozPHe84MP4NVSPLo6vje5hji5MKSg8L6ViAbw@mail.gmail.com>
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> This series introduces the Live Update Orchestrator (LUO), a kernel
> subsystem designed to facilitate live kernel updates. LUO enables
> kexec-based reboots with minimal downtime, a critical capability for
> cloud environments where hypervisors must be updated without disrupting
> running virtual machines. By preserving the state of selected resources,
> such as file descriptors and memory, LUO allows workloads to resume
> seamlessly in the new kernel.
>
> The git branch for this series can be found at:
> https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v4
>
> The patch series applies against linux-next tag: next-20250926
>
> While this series is showed cased using memfd preservation. There are
> works to preserve devices:
> 1. IOMMU: https://lore.kernel.org/all/20250928190624.3735830-16-skhawaja@google.com
> 2. PCI: https://lore.kernel.org/all/20250916-luo-pci-v2-0-c494053c3c08@kernel.org
>
> =======================================================================
> Changelog since v3:
> (https://lore.kernel.org/all/20250807014442.3829950-1-pasha.tatashin@soleen.com):
>
> - The main architectural change in this version is introduction of
> "sessions" to manage the lifecycle of preserved file descriptors.
> In v3, session management was left to a single userspace agent. This
> approach has been revised to improve robustness. Now, each session is
> represented by a file descriptor (/dev/liveupdate). The lifecycle of
> all preserved resources within a session is tied to this FD, ensuring
> automatic cleanup by the kernel if the controlling userspace agent
> crashes or exits unexpectedly.
>
> - The first three KHO fixes from the previous series have been merged
> into Linus' tree.
>
> - Various bug fixes and refactorings, including correcting memory
> unpreservation logic during a kho_abort() sequence.
>
> - Addressing all comments from reviewers.
>
> - Removing sysfs interface (/sys/kernel/liveupdate/state), the state
> can now be queried only via ioctl() API.
>
> =======================================================================
Hi all,
Following up on yesterday's Hypervisor Live Update meeting, we
discussed the requirements for the LUO to track dependencies,
particularly for IOMMU preservation and other stateful file
descriptors. This email summarizes the main design decisions and
outcomes from that discussion.
For context, the notes from the previous meeting can be found here:
https://lore.kernel.org/all/365acb25-4b25-86a2-10b0-1df98703e287@google.com
The notes for yesterday's meeting are not yet available.
The key outcomes are as follows:
1. User-Enforced Ordering
-------------------------
The responsibility for enforcing the correct order of operations will
lie with the userspace agent. If fd_A is a dependency for fd_B,
userspace must ensure that fd_A is preserved before fd_B. This same
ordering must be honored during the restoration phase after the reboot
(fd_A must be restored before fd_B). The kernel preserves the ordering.
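As a rough illustration of the rule above, here is a minimal userspace
sketch. Everything in it is hypothetical: struct agent_fd, mock_preserve()
and mock_restore() are stand-ins for whatever the agent uses to track and
issue the real requests, and the dependency graph is assumed to be acyclic.
It only shows the agent preserving a dependency before its dependent and
replaying the same order on restore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one FD the agent wants to preserve. */
struct agent_fd {
	int fd;        /* illustrative FD number */
	int dep;       /* table index of the FD this one depends on, or -1 */
	int preserved; /* set once the mock preserve step has run */
	int restored;  /* set once the mock restore step has run */
};

/* Stand-in for the preserve request: refuses out-of-order calls. */
static int mock_preserve(struct agent_fd *tbl, size_t i)
{
	if (tbl[i].dep >= 0 && !tbl[tbl[i].dep].preserved)
		return -1;
	tbl[i].preserved = 1;
	return 0;
}

/* Stand-in for the restore request: refuses out-of-order calls. */
static int mock_restore(struct agent_fd *tbl, size_t i)
{
	if (tbl[i].dep >= 0 && !tbl[tbl[i].dep].restored)
		return -1;
	tbl[i].restored = 1;
	return 0;
}

/*
 * Preserve entry i after its dependency chain and record the order used.
 * Assumes dependencies form no cycles and 'order' has room for every entry.
 */
static int preserve_in_order(struct agent_fd *tbl, size_t i,
			     size_t *order, size_t *n)
{
	assert(tbl && order && n);
	if (tbl[i].preserved)
		return 0;
	if (tbl[i].dep >= 0 &&
	    preserve_in_order(tbl, (size_t)tbl[i].dep, order, n))
		return -1;
	if (mock_preserve(tbl, i))
		return -1;
	order[(*n)++] = i;
	return 0;
}

/* After the reboot, replay exactly the order recorded before it. */
static int restore_in_order(struct agent_fd *tbl, const size_t *order,
			    size_t n)
{
	for (size_t k = 0; k < n; k++)
		if (mock_restore(tbl, order[k]))
			return -1;
	return 0;
}
```

Recording the order at preserve time and replaying it verbatim keeps the
ordering policy in one place in the agent.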
2. Serialization in PRESERVE_FD
-------------------------------
To keep the global prepare() phase lightweight and predictable, the
consensus was to shift the heavy serialization work into the
PRESERVE_FD ioctl handler. This means that when userspace requests to
preserve a file, the file handler should perform the bulk of the
state-saving work immediately.
The proposed sequence of operations reflects this shift:
Shutdown Flow:
    fd_preserve() (heavy serialization) -> prepare() (lightweight final
    checks) -> Suspend VM -> reboot(KEXEC) -> freeze() (lightweight)
Boot & Restore Flow:
    fd_restore() (lightweight object creation) -> Resume VM -> Heavy
    post-restore IOCTLs (e.g., hardware page table re-creation) ->
    finish() (lightweight cleanup)
This decision primarily serves as a guideline for file handler
implementations. For the LUO core, this implies minor API changes,
such as renaming can_preserve() to a more active preserve() and adding
a corresponding unpreserve() callback to be called during
UNPRESERVE_FD.
3. FD Data Query API
--------------------
We identified the need for a kernel API to allow subsystems to query
preserved FD data during the boot process, before userspace has
initiated the restore.
The proposed API would allow a file handler to retrieve a list of all
its preserved FDs, including their session names, tokens, and the
private data payload.
Proposed Data Structure:
    struct liveupdate_fd {
        char *session; /* session name */
        u64 token;     /* Preserved FD token */
        u64 data;      /* Private preserved data */
    };
Proposed Function:
    liveupdate_fd_data_query(struct liveupdate_file_handler *h,
                             struct liveupdate_fd *fds, long *count);
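The calling convention for 'count' was not settled in the meeting. Purely
as an assumption, the userspace mock below treats it as in/out: the caller
passes the capacity of 'fds' and gets back the total number of preserved
FDs, so a first call with a zero capacity can be used to size the buffer.
struct mock_handler, mock_fd_data_query() and the int return type are
hypothetical; only struct liveupdate_fd mirrors the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64; /* userspace stand-in for the kernel type */

/* Mirrors the proposed structure. */
struct liveupdate_fd {
	char *session; /* session name */
	u64 token;     /* preserved FD token */
	u64 data;      /* private preserved data */
};

/* Hypothetical stand-in for a file handler and its preserved records. */
struct mock_handler {
	struct liveupdate_fd *preserved;
	long nr;
};

/*
 * Assumed semantics (not from the proposal): on entry *count is the
 * capacity of 'fds'; on return it holds the total number of records.
 * At most 'capacity' records are copied. 'fds' may be NULL to query
 * the total only.
 */
static int mock_fd_data_query(const struct mock_handler *h,
			      struct liveupdate_fd *fds, long *count)
{
	long n;

	if (!h || !count || *count < 0)
		return -1;

	n = *count < h->nr ? *count : h->nr;
	for (long i = 0; fds && i < n; i++)
		fds[i] = h->preserved[i];

	*count = h->nr;
	return 0;
}

/* Example consumer: find the private data for a given token. */
static int mock_find_data(const struct liveupdate_fd *fds, long nr,
			  u64 token, u64 *data)
{
	for (long i = 0; i < nr; i++) {
		if (fds[i].token == token) {
			*data = fds[i].data;
			return 0;
		}
	}
	return -1;
}
```

A subsystem would typically query the total first, allocate, and then query
again to fill the array before userspace initiates any restore.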
4. New File-Lifecycle-Bound Global State
----------------------------------------
A new mechanism for managing global state was proposed, designed to be
tied to the lifecycle of the preserved files themselves. This would
allow a file owner (e.g., the IOMMU subsystem) to save and retrieve
global state that is only relevant when one or more of its FDs are
being managed by LUO.
The key characteristics of this new mechanism are:
- The global state is optionally created on the first preserve() call
  for a given file handler.
- The state can be updated on subsequent preserve() calls.
- The state is destroyed when the last corresponding file is unpreserved
  or finished.
- The data can be accessed during boot.
I am thinking of an API like this.
1. Add three more callbacks to liveupdate_file_ops:
    /*
     * Optional. Called by LUO during first get global state call.
     * The handler should allocate/KHO preserve its global state object and
     * return a pointer to it via 'obj'. It must also provide a u64 handle
     * (e.g., a physical address of preserved memory) via 'data_handle' that
     * LUO will save.
     * Return: 0 on success.
     */
    int (*global_state_create)(struct liveupdate_file_handler *h,
                               void **obj, u64 *data_handle);
    /*
     * Optional. Called by LUO in the new kernel
     * before the first access to the global state. The handler receives
     * the preserved u64 data_handle and should use it to reconstruct its
     * global state object, returning a pointer to it via 'obj'.
     * Return: 0 on success.
     */
    int (*global_state_restore)(struct liveupdate_file_handler *h,
                                u64 data_handle, void **obj);
    /*
     * Optional. Called by LUO after the last
     * file for this handler is unpreserved or finished. The handler
     * must free its global state object and any associated resources.
     */
    void (*global_state_destroy)(struct liveupdate_file_handler *h, void *obj);
2. Add get/put accessors for the global state data:
    /* Get and lock the data with file_handler scoped lock */
    int liveupdate_fh_global_state_get(struct liveupdate_file_handler *h,
                                       void **obj);
    /* Unlock the data */
    void liveupdate_fh_global_state_put(struct liveupdate_file_handler *h);
Execution Flow:
Outgoing Kernel (first preserve() call):
1. The handler's preserve() is called. It needs the global state, so it calls
   liveupdate_fh_global_state_get(h, &obj). LUO acquires h->global_state_lock.
2. LUO sees that h->global_state_obj is NULL and calls
   h->ops->global_state_create(h, &h->global_state_obj, &handle).
   The handler allocates its state, preserves it with KHO, and returns its live
   pointer and a u64 handle.
3. LUO stores the handle internally for later serialization.
4. LUO sets *obj = h->global_state_obj and returns 0 with the lock still held.
5. The preserve() callback does its work using obj.
6. It calls liveupdate_fh_global_state_put(h), which releases the lock.
Global PREPARE:
1. LUO iterates over the handlers. If h->count > 0, it writes the stored
   data_handle into the LUO FDT.
Incoming Kernel (first access):
1. liveupdate_fh_global_state_get(h, &obj) is called for the first time. LUO
   acquires h->global_state_lock.
2. LUO sees that h->global_state_obj is NULL, but it knows it has a preserved
   u64 handle from the FDT, so it calls h->ops->global_state_restore().
3. The handler reconstructs its state object and returns the live pointer.
4. LUO sets *obj = h->global_state_obj and returns 0 with the lock held.
5. The caller does its work.
6. It calls liveupdate_fh_global_state_put(h) to release the lock.
Last File Cleanup (in unpreserve or finish):
1. LUO decrements h->count to 0.
2. This triggers the cleanup logic.
3. LUO calls h->ops->global_state_destroy(h, h->global_state_obj).
4. The handler frees its memory and resources.
5. LUO sets h->global_state_obj = NULL, resetting it for a future live update
   cycle.
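To make the lifecycle above concrete, here is a small userspace model of
it. This is only a sketch under stated assumptions, not the LUO
implementation: struct mock_handler, struct mock_ops, the mock_* and demo_*
functions, the 'locked' flag (standing in for h->global_state_lock) and the
'handle_from_fdt' flag are all hypothetical, and a heap pointer stands in
for a KHO-preserved physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t u64; /* userspace stand-in for the kernel type */

struct mock_handler;

/* Mirrors the three proposed callbacks. */
struct mock_ops {
	int (*global_state_create)(struct mock_handler *h, void **obj,
				   u64 *data_handle);
	int (*global_state_restore)(struct mock_handler *h, u64 data_handle,
				    void **obj);
	void (*global_state_destroy)(struct mock_handler *h, void *obj);
};

struct mock_handler {
	const struct mock_ops *ops;
	bool locked;           /* stand-in for h->global_state_lock */
	void *global_state_obj;
	u64 handle;            /* what PREPARE would write into the FDT */
	bool handle_from_fdt;  /* true only in the "incoming kernel" */
	long count;            /* number of preserved files */
};

/* Lazily create (outgoing) or restore (incoming), return with lock held. */
static int mock_global_state_get(struct mock_handler *h, void **obj)
{
	int ret = 0;

	assert(!h->locked);
	h->locked = true;
	if (!h->global_state_obj) {
		if (h->handle_from_fdt)
			ret = h->ops->global_state_restore(h, h->handle,
						&h->global_state_obj);
		else
			ret = h->ops->global_state_create(h,
						&h->global_state_obj,
						&h->handle);
	}
	if (ret) {
		h->locked = false;
		return ret;
	}
	*obj = h->global_state_obj;
	return 0;
}

static void mock_global_state_put(struct mock_handler *h)
{
	assert(h->locked);
	h->locked = false;
}

/* Called when a file is unpreserved or finished. */
static void mock_file_gone(struct mock_handler *h)
{
	if (--h->count == 0 && h->global_state_obj) {
		h->ops->global_state_destroy(h, h->global_state_obj);
		h->global_state_obj = NULL;
		h->handle_from_fdt = false;
	}
}

/* Demo handler: the global state just counts preserved files. */
struct demo_state { int nr_files; };

static int demo_create(struct mock_handler *h, void **obj, u64 *data_handle)
{
	struct demo_state *s = calloc(1, sizeof(*s));

	(void)h;
	if (!s)
		return -1;
	*obj = s;
	*data_handle = (u64)(uintptr_t)s; /* pointer stands in for a phys addr */
	return 0;
}

static int demo_restore(struct mock_handler *h, u64 data_handle, void **obj)
{
	(void)h;
	*obj = (void *)(uintptr_t)data_handle;
	return 0;
}

static void demo_destroy(struct mock_handler *h, void *obj)
{
	(void)h;
	free(obj);
}

static const struct mock_ops demo_ops = {
	demo_create, demo_restore, demo_destroy,
};

/* Sketch of a preserve() callback using the get/put pair. */
static int demo_preserve(struct mock_handler *h)
{
	void *obj;
	int ret = mock_global_state_get(h, &obj);

	if (ret)
		return ret;
	((struct demo_state *)obj)->nr_files++;
	h->count++; /* in LUO this bookkeeping would live in the core */
	mock_global_state_put(h);
	return 0;
}
```

The point of the get/put pairing is that the handler never sees whether the
object was freshly created or restored from the preserved handle.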
Pasha