Subject: [Hypervisor Live Update] Notes from October 20, 2025
From: David Rientjes
Date: 2025-11-01 23:35 UTC
To: Alexander Graf, Anthony Yznaga, Dave Hansen, David Hildenbrand,
David Matlack, Frank van der Linden, James Gowans,
Jason Gunthorpe, Junaid Shahid, Mike Rapoport, Pankaj Gupta,
Pasha Tatashin, Pratyush Yadav, Praveen Kumar, Vipin Sharma,
Vishal Annapurve, David Woodhouse
Cc: linux-mm, kexec
Hi everybody,
Here are the notes from the last Hypervisor Live Update call that happened
on Monday, October 20. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.
----->o-----
I thought this instance of the meeting would be short and I turned out to
be very wrong :)
We touched on the discussion from the previous instance regarding fd
dependency checking happening at preserve time rather than at prepare
time. Pasha noted that the discussion continued upstream on the mailing
list afterwards. The biggest change is that the ordering will be
enforced by userspace. The preserve function itself now does the heavy
lifting; freeze and prepare are more for sanity checking.
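As a rough illustration of that split (a sketch only; these callback
names and signatures are hypothetical, not the actual LUO interface):

/*
 * Hypothetical per-fd live update callbacks, reflecting the discussion
 * above: userspace preserves fds in dependency order, preserve() does
 * the heavy lifting, and freeze()/prepare() mostly sanity check.
 */
#include <linux/fs.h>

struct luo_file_ops_sketch {
	/* Heavy lifting: serialize this fd's state for the next kernel. */
	int (*preserve)(struct file *file, void **private);
	/* Sanity check that nothing changed since preserve(). */
	int (*freeze)(struct file *file, void *private);
	/* Final pre-kexec sanity check; no real work here. */
	int (*prepare)(struct file *file, void *private);
};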
David Matlack asked how global state would work since it lives outside
the fd. Pasha said the subsystem support will be there, but there will
be another mechanism that follows the lifecycle of fds of a specific
type; for example, if a session has an fd of a specific type, then the
global state will follow the lifecycle of the aggregate. This will be
supported in v5.
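A rough sketch of that aggregate lifecycle (names are illustrative,
not the v5 interface):

/*
 * Hypothetical global state that follows the aggregate lifecycle of
 * all preserved fds of one type: set up with the first such fd in the
 * session and released when the last one goes away.
 */
#include <linux/kref.h>

struct luo_global_sketch {
	struct kref refs;	/* one reference per preserved fd of this type */
	void *state;		/* subsystem-wide data living outside any fd */
};

static void luo_global_release_sketch(struct kref *kref)
{
	/* Last preserved fd of this type is gone; drop the global state. */
}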
----->o-----
Pasha updated that he had sent the KHO patches that provide the groundwork
for LUO. Last week he also sent a KHO memory corruption fix. Once those
patches are merged, he will send LUO v5. He was targeting sending the
next series of changes before the next biweekly sync.
----->o-----
Vipin Sharma sent out RFC patches for VFIO and was looking for feedback
from the group at the next instance of the meeting. Jason was already
providing feedback on the upstream mailing list.
----->o-----
We shifted to discussing the main topic of the day, iommu persistence,
presented by Samiullah. His slides are available on the shared drive.
There was general alignment on what should be included in the next
series upstream. His demonstrator so far included iommufd, iommu core,
and iommu driver patches, but was only preserving root tables. He also
proposed hot swap. There was lots of discussion upstream around
selection of the HWPT to be preserved, preserved HWPT and iommu domain
lifecycle, fd dependencies, and LUO finish.
Pasha noted that LUO finish can now fail, which Jason asked about.
Pasha said that if the fd hasn't replaced the hardware page table, then
finish would have to fail.
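A minimal sketch of that failure mode (illustrative names only, not
the LUO API):

#include <linux/errno.h>
#include <linux/types.h>

/* Hypothetical state for a restored hardware page table. */
struct hwpt_sketch {
	bool replaced;	/* true once the preserved HWPT was hot swapped */
};

/*
 * Hypothetical LUO finish handler: fail while the fd is still running
 * on the restored translation that has not yet been replaced.
 */
static int luo_finish_sketch(struct hwpt_sketch *hwpt)
{
	if (!hwpt->replaced)
		return -EBUSY;	/* recovery may require another kexec */
	return 0;
}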
Sami noted that the HWPTs are also restored and associated with the
preserved iommu domains, and this would be done when the fd is
retrieved. We can't restore the domain during probe, but there is no
mechanism to have the HWPTs created at boot time. Jason said that at
probe time you put the domains back with placeholders so the iommu core
has some understanding of what the translation is.
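A sketch of the placeholder idea (all names hypothetical):

#include <linux/slab.h>
#include <linux/types.h>

/*
 * Hypothetical placeholder domain: created at probe time so the iommu
 * core knows a live translation exists, before the fd is retrieved and
 * the real HWPT is associated with it.
 */
struct placeholder_domain_sketch {
	phys_addr_t preserved_root;	/* root table carried across kexec */
	bool restored;			/* set when the fd is retrieved */
};

static struct placeholder_domain_sketch *
probe_placeholder_sketch(phys_addr_t preserved_root)
{
	struct placeholder_domain_sketch *d = kzalloc(sizeof(*d), GFP_KERNEL);

	if (!d)
		return NULL;
	/* Record the preserved translation; do not rebuild mappings yet. */
	d->preserved_root = preserved_root;
	return d;
}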
----->o-----
During the discussion of hot swap, Sami noted that once all the
preserved devices have had their iommu domains hot swapped, we can
destroy the restored iommu domains that are no longer being used.
Jason said that once the iommu domains are rehydrated back into an fd,
they should have the normal lifecycle of a hardware page table in an
fd: they will be destroyed when the hardware page table is destroyed,
either when the fd is closed or when the VMM asks for it to be
destroyed. Jason noted that the VMM needs the id so that the object
can be destroyed.
Jason suggested restoring, inside the devices, the hardware page table
pointers that represent the currently attached hardware page table;
this is done when you bring back the iommufd. We should likely retain,
for each hardware page table, a list of which VFIO device objects are
linked to it, and all of this needs to be brought back. An alternative
may be to serialize the devices. The IOMMU needs the VFIO devices, so
this needs careful orchestration.
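A sketch of that bookkeeping (struct and field names are illustrative):

#include <linux/list.h>
#include <linux/types.h>

/*
 * Hypothetical bookkeeping: each preserved hardware page table keeps a
 * list of the VFIO device objects linked to it, so the whole set can
 * be serialized and brought back together.
 */
struct preserved_hwpt_sketch {
	u32 tag;			/* identity carried across kexec */
	struct list_head devices;	/* preserved_vfio_dev_sketch.node */
};

struct preserved_vfio_dev_sketch {
	struct list_head node;			/* entry on hwpt->devices */
	struct preserved_hwpt_sketch *hwpt;	/* currently attached HWPT */
};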
Pasha suggested that since we have sessions, and sessions have a
specific order, things without any dependencies are preserved first and
things with dependencies are preserved last. The kernel could then call
restore on everything from lowest to highest. Jason said there needs to
be a two step process: the struct file needs to be brought back before
you fill it. VFIO needs the iommufd to be filled before it can auto
bind and complete its restoration.
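Combined, the two ideas might look roughly like this (a sketch with
hypothetical names, not a design):

/*
 * Hypothetical two pass restore over a session ordered by preservation:
 * pass one brings back every struct file, pass two fills them, so that
 * dependencies such as "VFIO needs a filled iommufd" are satisfied.
 */
struct luo_entry_sketch { int order; };	/* one preserved fd */

struct luo_session_sketch {
	int nr_entries;
	struct luo_entry_sketch *entries;	/* sorted, lowest order first */
};

/* Stubs standing in for the real per-fd restore work. */
static int recreate_file_sketch(struct luo_entry_sketch *e) { return 0; }
static int fill_file_sketch(struct luo_entry_sketch *e) { return 0; }

static int session_restore_sketch(struct luo_session_sketch *s)
{
	int i, ret;

	for (i = 0; i < s->nr_entries; i++) {	/* pass 1: struct files */
		ret = recreate_file_sketch(&s->entries[i]);
		if (ret)
			return ret;
	}
	for (i = 0; i < s->nr_entries; i++) {	/* pass 2: fill contents */
		ret = fill_file_sketch(&s->entries[i]);
		if (ret)
			return ret;
	}
	return 0;
}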
Sami suggested that we not restore the HWPT until we have all the
information: even if the fd is closed, it goes back to the state it was
in, and we would consider the iommufd not fully restored until then.
Jason suggested that this would require adding an iommufd ioctl to
restore individual sub objects: restore a HWPT that was preserved with
a given tag and give back the id; the restore would only be possible if
the VFIO devices are already present inside the iommufd.
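Such an ioctl might carry something like the following (emphatically
hypothetical; no such iommufd ioctl exists today):

#include <linux/types.h>

/*
 * Hypothetical uAPI sketch for restoring one preserved sub object:
 * look up a HWPT by the tag recorded at preserve time and return its
 * new object id. Fails unless the VFIO devices are already present
 * inside the iommufd. The layout loosely follows iommufd's convention
 * of a leading size field; nothing here is a real interface.
 */
struct iommu_hwpt_restore_sketch {
	__u32 size;		/* sizeof(struct iommu_hwpt_restore_sketch) */
	__u32 preserved_tag;	/* tag recorded when the HWPT was preserved */
	__u32 out_hwpt_id;	/* new HWPT object id, set on success */
	__u32 __reserved;	/* must be zero */
};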
----->o-----
When discussing LUO finish, Pasha suggested we need a way to discard a
session if it hasn't been reclaimed or there are exceptions. If the VM
is never restored, then we will have lingering sessions that need to be
discarded somehow. Jason suggested that all objects be brought back to
userspace before an error can be encountered; if there are problems up
to that point, then the cleanest way to address them is with another
kexec. Jason stressed the need for another kexec as a big hammer for
recovery and cleanup. For example, if there are 10 VMs and one did not
restore, do another live update to clean up the lingering VM.
----->o-----
Next meeting will be on Monday, November 3 at 8am PST (UTC-8), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq
NOTE!!! Daylight Saving Time has ended in the United States, so please
check your local time carefully:
Time zones
PST (UTC-8) 8:00am
MST (UTC-7) 9:00am
CST (UTC-6) 10:00am
EST (UTC-5) 11:00am
Rio de Janeiro (UTC-3) 1:00pm
London (UTC) 4:00pm
Berlin (UTC+1) 5:00pm
Moscow (UTC+3) 7:00pm
Dubai (UTC+4) 8:00pm
Mumbai (UTC+5:30) 9:30pm
Singapore (UTC+8) 12:00am Tuesday
Beijing (UTC+8) 12:00am Tuesday
Tokyo (UTC+9) 1:00am Tuesday
Sydney (UTC+11) 3:00am Tuesday
Auckland (UTC+13) 5:00am Tuesday
Topics for the next meeting:
- update on the status of stateless KHO RFC patches that should simplify
LUO support
- update on LUO v5 and patch series sent upstream after KHO changes and
fixes are staged
- VFIO RFC patch feedback based on the series sent to the mailing list a
couple weeks ago
- follow up on the status of iommu persistence and any additional
  discussion from last time
- update on memfd preservation, vmalloc support, and 1GB limitation
- discuss deferred struct page initialization and deferring when KHO is
enabled
- discuss guest_memfd preservation use cases for Confidential Computing
and any current work happening on it, including overlap with memfd
preservation being worked on by Pratyush
+ discuss any use cases for Confidential Computing where folios may
need to be split after being marked as preserved during brown out
- later: testing methodology to allow downstream consumers to qualify
that live update works from one version to another
- later: reducing blackout window during live update
Please let me know if you'd like to propose additional topics for
discussion, thank you!