Subject: [Hypervisor Live Update] Notes from October 20, 2025
From: David Rientjes @ 2025-11-01 23:35 UTC
  To: Alexander Graf, Anthony Yznaga, Dave Hansen, David Hildenbrand,
	David Matlack, Frank van der Linden, James Gowans,
	Jason Gunthorpe, Junaid Shahid, Mike Rapoport, Pankaj Gupta,
	Pasha Tatashin, Pratyush Yadav, Praveen Kumar, Vipin Sharma,
	Vishal Annapurve, Woodhouse, David
  Cc: linux-mm, kexec

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened 
on Monday, October 20.  Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
I thought this instance of the meeting would be short and I turned out to 
be very wrong :)

We touched on the discussion from the previous instance regarding fd 
dependency checking happening at preserve time rather than at prepare 
time; Pasha noted that the discussion had continued upstream on the 
mailing list afterwards.  The biggest change is that ordering will now 
be enforced by the user.  The preserve function itself does the heavy 
lifting now; freeze and prepare are more for sanity checking.
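
As a concrete illustration of the user-enforced ordering, here is a
minimal userspace sketch.  The ioctl names and struct below are
placeholders invented for illustration; the actual LUO uAPI is still
settling ahead of v5.

	#include <sys/ioctl.h>
	#include <linux/ioctl.h>
	#include <linux/types.h>

	/* Placeholder uAPI, invented for illustration only. */
	struct luo_preserve {
		int fd;		/* fd to preserve across the kexec */
		__u64 token;	/* out: token used to retrieve it later */
	};
	#define LUO_PRESERVE_FD	_IOWR('l', 0x01, struct luo_preserve)
	#define LUO_FREEZE	_IO('l', 0x02)

	/* The user enforces dependency order: the iommufd is preserved
	 * before the VFIO fd that depends on it.  preserve() does the
	 * heavy lifting; freeze only sanity-checks. */
	static int preserve_in_order(int luo, int iommufd, int vfio)
	{
		struct luo_preserve p = { .fd = iommufd };

		if (ioctl(luo, LUO_PRESERVE_FD, &p))	/* dependency first */
			return -1;
		p.fd = vfio;
		if (ioctl(luo, LUO_PRESERVE_FD, &p))	/* dependent last */
			return -1;
		return ioctl(luo, LUO_FREEZE);		/* sanity check */
	}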

David Matlack asked how global state would work since that's outside 
any fd.  Pasha said the subsystem will still be there, but there will be 
another mechanism that follows the lifecycle of fds of a specific type; 
for example, if a session has an fd of a specific type, the global state 
will follow the lifecycle of that aggregate.  This will be supported in 
v5.

----->o-----
Pasha updated that he had sent the KHO patches that provide the groundwork 
for LUO.  Last week he also sent a KHO memory corruption fix.  Once those 
patches are merged, he will send LUO v5.  He was targeting sending the 
next series of changes before the next biweekly sync.

----->o-----
Vipin Sharma sent out RFC patches for VFIO and was looking for feedback 
from the group in the next instance of the meeting.  Jason was providing 
feedback on the upstream mailing list already.

----->o-----
We shifted to discussing the main topic of the day which was iommu 
persistence from Samiullah.  His slides are available on the shared drive.

There was general alignment with what should be included in the next 
series upstream.  His demonstrator so far included iommufd, iommu core, 
and iommu driver patches but was just preserving root tables.  He also 
proposed hot swap.  There was lots of discussion upstream around selection 
of HWPT to be preserved, preserved HWPT and iommu domain lifecycle, fd 
dependencies, and LUO finish.

Pasha noted that LUO finish can now fail, which Jason asked about.  
Pasha said that if the fd hasn't replaced the hardware page table, then 
finish would have to fail.
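
In callback form, that constraint might look roughly like this (a
sketch only; the struct and the finish hook signature are hypothetical,
not the v5 API):

	#include <linux/errno.h>
	#include <linux/types.h>

	/* Hypothetical per-fd state; illustrative names only. */
	struct luo_file_state {
		bool hwpt_replaced;	/* hot swap has happened */
	};

	static int iommufd_luo_finish(struct luo_file_state *st)
	{
		/* If the restored fd has not replaced the preserved
		 * hardware page table yet, finish must fail rather than
		 * tear state down under a live translation. */
		if (!st->hwpt_replaced)
			return -EBUSY;
		return 0;
	}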

Sami noted that the HWPTs are also restored and associated with the 
preserved iommu domains, and this would be done when the fd is 
retrieved.  We can't restore the domain during probe, but there is also 
no mechanism for the HWPTs to be created at boot time.  Jason said that 
at probe time you put the domains back with placeholders so the iommu 
core has some understanding of what the translation is.
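
A rough kernel-side sketch of that placeholder idea follows;
iommu_get_preserved_root() and iommu_alloc_placeholder_domain() are
invented names, not existing iommu core APIs:

	#include <linux/iommu.h>
	#include <linux/err.h>

	static int probe_restore_placeholder(struct device *dev)
	{
		struct iommu_domain *dom;

		/* Recover the preserved page table root (e.g. from KHO
		 * state) without rebuilding a full HWPT yet.
		 * (Hypothetical helper.) */
		phys_addr_t root = iommu_get_preserved_root(dev);

		/* Wrap it in a placeholder domain so the iommu core has
		 * some understanding of what the live translation is.
		 * (Hypothetical helper.) */
		dom = iommu_alloc_placeholder_domain(dev, root);
		if (IS_ERR(dom))
			return PTR_ERR(dom);

		/* The real HWPT replaces this later, when the iommufd
		 * is retrieved and the hot swap happens. */
		return iommu_attach_device(dom, dev);
	}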

----->o-----
During the discussion of hot swap, Sami noted that once all the 
preserved devices have had their iommu domains hot swapped, we can 
destroy the restored iommu domains that are no longer being used.  
Jason said that once the iommu domains are rehydrated back into an fd, 
they should have the normal lifecycle of a hardware page table in an 
fd: they will be destroyed when the hardware page table is destroyed, 
either when the fd is closed or when the VMM asks for it to be 
destroyed.  Jason noted that the VMM needs the id so that it can ask 
for the destruction.

Jason suggested restoring, inside the devices, the hardware page table 
pointers that represent the currently attached hardware page table; 
this is done when you bring back the iommufd.  We should likely retain, 
for each hardware page table, the list of which VFIO device objects are 
linked to it, and all of this needs to be brought back.  An alternative 
may be to serialize the devices.  The iommu needs the VFIO devices, so 
this needs careful orchestration.
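
A rough shape for that bookkeeping (illustrative structs, not existing
kernel code):

	#include <linux/list.h>
	#include <linux/types.h>

	/* Each preserved HWPT carries the list of VFIO device objects
	 * linked to it, so both sides can be re-linked when the iommufd
	 * is brought back. */
	struct preserved_hwpt {
		u32 hwpt_id;			/* id the VMM knows */
		phys_addr_t pgtbl_root;		/* live translation */
		struct list_head devices;	/* of preserved_vfio_dev */
	};

	struct preserved_vfio_dev {
		struct list_head node;	/* on preserved_hwpt.devices */
		u32 dev_id;		/* iommufd device object id */
	};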

Pasha suggested that since we have sessions and sessions have a 
specific order, things without any dependencies are preserved first and 
things with dependencies are preserved last; the kernel could then call 
restore on everything from lowest to highest.  Jason said there needs 
to be a two step process: the struct file needs to be brought back 
before you fill it.  VFIO needs the iommufd to be filled before it can 
auto bind, and it must auto bind before it can complete its own 
restoration.
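
The two steps might be sequenced roughly like this (luo_session,
luo_obj, luo_recreate_file(), and the fill() callback are all
hypothetical names for illustration):

	#include <linux/list.h>
	#include <linux/err.h>
	#include <linux/fs.h>

	/* All types and callbacks here are hypothetical. */
	struct luo_obj {
		struct list_head node;
		struct file *file;
		int (*fill)(struct luo_obj *obj);
	};

	struct luo_session {
		struct list_head objects;	/* sorted: no-deps first */
	};

	static int restore_session(struct luo_session *s)
	{
		struct luo_obj *obj;
		int err;

		/* Step 1: bring back a bare struct file for every
		 * preserved object so the fds exist again. */
		list_for_each_entry(obj, &s->objects, node) {
			obj->file = luo_recreate_file(obj);
			if (IS_ERR(obj->file))
				return PTR_ERR(obj->file);
		}

		/* Step 2: fill in dependency order, lowest first: the
		 * iommufd is filled before VFIO, so VFIO can auto bind
		 * and complete its own restoration. */
		list_for_each_entry(obj, &s->objects, node) {
			err = obj->fill(obj);
			if (err)
				return err;
		}
		return 0;
	}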

Sami suggested that if we don't restore the HWPT until we have all the 
information, then even if the fd closes it goes back to the state that 
it was in, and we would consider the iommufd not fully restored until 
the HWPT is restored.  Jason suggested that would require adding an 
iommufd ioctl to restore individual sub-objects: restore a HWPT that 
was preserved with a given tag and hand back its id; the restore would 
only be possible if the VFIO devices are already present inside the 
iommufd.
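
Such an ioctl's argument might look roughly like this (purely
illustrative; no HWPT-restore ioctl exists in the iommufd uAPI today):

	#include <linux/types.h>

	/* Restore one preserved HWPT identified by its preservation
	 * tag.  Fails unless the VFIO devices it needs are already
	 * present inside the iommufd. */
	struct iommu_hwpt_restore {
		__u32 size;		/* sizeof(struct iommu_hwpt_restore) */
		__u32 flags;		/* must be 0 */
		__u64 tag;		/* tag recorded at preserve time */
		__u32 out_hwpt_id;	/* id handed back to the VMM */
		__u32 __reserved;
	};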

----->o-----
When discussing LUO finish, Pasha suggested we need a way to discard a 
session if it hasn't been reclaimed or if there are exceptions.  If the 
VM is never restored then we will have lingering sessions that need to 
be somehow discarded.  Jason suggested all objects should be brought 
back to userspace before you can encounter an error.  If there are 
problems up to that point, then the cleanest way to address this is 
with another kexec.

Jason stressed the need for another kexec as a big hammer to be able to do 
recovery and cleanup.  For example, if there are 10 VMs and one did not 
restore, do another live update to clean up the lingering VM.

----->o-----
Next meeting will be on Monday, November 3 at 8am PST (UTC-8); everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq

NOTE!!!  Daylight Saving Time has ended in the United States, so please
check your local time carefully:

Time zones

PST (UTC-8)             8:00am
MST (UTC-7)             9:00am
CST (UTC-6)             10:00am
EST (UTC-5)             11:00am
Rio de Janeiro (UTC-3)  1:00pm
London (UTC)            4:00pm
Berlin (UTC+1)          5:00pm
Moscow (UTC+3)          7:00pm
Dubai (UTC+4)           8:00pm
Mumbai (UTC+5:30)       9:30pm
Singapore (UTC+8)       12:00am Tuesday
Beijing (UTC+8)         12:00am Tuesday
Tokyo (UTC+9)           1:00am Tuesday
Sydney (UTC+11)         3:00am Tuesday
Auckland (UTC+13)       5:00am Tuesday

Topics for the next meeting:

 - update on the status of stateless KHO RFC patches that should simplify
   LUO support
 - update on LUO v5 and patch series sent upstream after KHO changes and
   fixes are staged
 - VFIO RFC patch feedback based on the series sent to the mailing list a
   couple weeks ago
 - follow up on the status of iommu persistence and any additional
   discussion from last time
 - update on memfd preservation, vmalloc support, and 1GB limitation 
 - discuss deferred struct page initialization and deferring when KHO is
   enabled
 - discuss guest_memfd preservation use cases for Confidential Computing
   and any current work happening on it, including overlap with memfd
   preservation being worked on by Pratyush
   + discuss any use cases for Confidential Computing where folios may
     need to be split after being marked as preserved during brown out
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing blackout window during live update

Please let me know if you'd like to propose additional topics for
discussion, thank you!

