Date: Sun, 13 Apr 2025 18:57:46 -0700 (PDT)
From: David Rientjes
To: Alexander Graf, Anthony Yznaga, Dave Hansen, David Hildenbrand,
    Frank van der Linden, James Gowans, Jason Gunthorpe, Junaid Shahid,
    Mike Rapoport, Pankaj Gupta, Pasha Tatashin, Pratyush Yadav,
    Vipin Sharma, Vishal Annapurve, "Woodhouse, David"
cc: linux-mm@kvack.org, kexec@lists.infradead.org
Subject: [Hypervisor Live Update] Notes from April 7, 2025
Message-ID: <963ddf5b-81ed-7be2-3fcd-0eec7fafa132@google.com>

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened
on Monday, April 7. Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.

----->o-----
We debriefed the discussions at LSF/MM/BPF.
The general understanding was that the core MM community didn't have any
major concerns or feedback for the approach discussed, as long as there
would not be intrusive changes made. This would likely only start to
become a concern when extensions would be made for preserving hugetlbfs
or tmpfs.

----->o-----
LUO and fdbox were discussed at LSF/MM/BPF. Jason suggested having
everything preserved using fds, including a single char device interface.

This could require some significant changes to the VMM: Pasha noted we'd
have a VM suspended to memory but some KVM specific state would need to
be preserved for Confidential VM use cases. The VMM would still do the
same call pattern as today (open /dev/kvm, lots of ioctls) but would also
note to the kernel that some specific state would need to be restored for
the VM, rather than retrieving the full fd for /dev/kvm that is preserved.

Jason said the VFIO and IOMMU need the /dev/kvm fd so there is no option
other than to preserve the full KVM as well -- otherwise we cannot
restore the full iommufd. Pasha noted an alternative would be to preserve
memory using the fd and the IOMMU is recreated with the memory that was
preserved. Jason noted KVM would have to be involved when we started to
preserve vIOMMU and for Confidential Computing.

Pasha was concerned with the amount of code changes that would be
required for qemu and other VMMs. Jason stressed that starting up a VM in
this case will inevitably be different from starting up a clean VM. This
will especially be required for vIOMMU, but not necessarily only for
vIOMMU; for example, the VMID must be the same as KVM uses on the IOMMU
and CPU side for ARM and this can't be disrupted during the KHO.

James Gowans asked if this state could all be serialized to/from
userspace, which would not be transparent.

There was general debate about preserving all fds; Jason argued that it
will be complex but likely there is not an alternative.
The underlying hardware state would be destroyed when attempting to
restore the IOMMUFD. We have to preserve the hardware state, which is
different than the challenges that KVM has to face because it does not
have the underlying hardware state. He offered an example of preserving
eight VMs with corresponding IOMMU hardware state and how to map this to
the correct VM on the other side of the kexec. He was also concerned
about what permissions would be required to open an fd and take over a
KHO; in this case, a security token would be needed.

Jason noted the only thing VFIO needs to preserve is the fact that it
does not need to FLR the device and which iommufd is controlling the
translation. Preferably, there would be a consistent way of doing this
throughout the kernel, such as preserving fds, rather than anything
hacky; for this, we have freedom to determine what is supported with KHO
and what is not.

----->o-----
We discussed open questions for KHO, fdbox, and LUO after LSF/MM/BPF.
Pratyush wanted a feel for where this goes so that the next version of
fdbox could be worked on; clarity was needed in establishing fdbox's role
and where it overlaps with KHO.

Pasha noted LUO was handling the state machine and the dependency chain
for devices -- this starts to fully overlap fdbox. Pratyush noted it
would be fine for fdbox to be part of LUO and he would follow up by
looking at the latest LUO series.

----->o-----
Changyuan Lyu discussed what should be saved in the KHO FDT. Alex's
original patches allowed for copying smaller amounts of memory, or it's
possible to specify a pointer to save larger chunks of memory that the
new kernel would fetch from the FDT. He suggested only allowing KHO users
to save pointers to memory into the FDT and leave it to the users to
interpret the preserved data. Jason noted that this made sense with the
simplest example of just using a u64.
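
To make the pointer-only proposal concrete, here is a minimal sketch of
the idea; it was not shown on the call. It assumes the FDT can be modeled
as a name-to-bytes map and preserved memory as an address-to-bytes map.
The names fdt, preserved_memory, save_state() and restore_state() are
hypothetical and are not KHO interfaces. The only detail taken from the
devicetree format is that the u64 is stored big-endian.

```python
import struct

# Hypothetical stand-ins, not kernel interfaces: one dict models the FDT
# handed across kexec, the other models memory that survives the kexec.
fdt = {}
preserved_memory = {}


def save_state(name: str, phys_addr: int, payload: bytes) -> None:
    """Old kernel side: keep the payload in preserved memory and record
    only its address in the FDT, as one big-endian u64 property."""
    preserved_memory[phys_addr] = payload
    fdt[name] = struct.pack(">Q", phys_addr)


def restore_state(name: str) -> bytes:
    """New kernel side: read the u64 back from the FDT and hand the
    pointed-to bytes to the owning subsystem, which interprets them."""
    (phys_addr,) = struct.unpack(">Q", fdt[name])
    return preserved_memory[phys_addr]


# Example: a made-up "memblock" user saves an opaque blob.
save_state("memblock", 0x1_0000_0000, b"reserved-regions-v1")
blob = restore_state("memblock")
```

With this split the FDT property stays eight bytes regardless of how much
state is preserved, but anything that wants to inspect the saved state
has to follow the pointer and know the owner's format.
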
James noted that one very attractive feature of storing everything
directly in the FDT, while acknowledging the size limitation, was that
the state can be dumped for debugging purposes. The ability to dump this
state would still be possible, but with more complex parsing.

There was not full alignment, so James suggested following up with Mike
and Alex Graf on this topic on the mailing list. Jason suggested
separating this topic entirely from KHO.

----->o-----
Jason suggested if VFIO or iommufd were users of LUO then the case for
upstreaming, as well as addressing many of the questions in the
discussions about it, would be much clearer.

----->o-----
Next meeting will be on Monday, April 21 at 8am PDT (UTC-7), everybody is
welcome: https://meet.google.com/rjn-dmzu-hgq

Topics I think we should cover in the next meeting:

 - finalize decision on everything being preserved by fds (complex
   solution) or recreating state on the other side of kexec
   + discuss Live Update Orchestrator (LUO) based on RFC patches to
     define the state machine
 - update on next steps for fdbox
   + is this going to be pursued separately or as part of LUO
     * does this support obsolete the need for guest_memfd in the long
       term
   + allocating swiotlb in low memory and any other device requirements
 - finalize decision on storing u64 in the KHO FDT to point to memory
   without storing all state directly in the FDT itself
 - alignment on memblock as the first use case for KHO to justify
   upstreaming, including ftrace use cases
   + update on Mike's patch series for memory reservation
 - discuss how KSTATE plays into KHO upstreaming and complementary or
   overlapping goals
 - decoupling 1GB pages for hugetlb, guest_memfd, and memfds and how fds
   can be added to an fdbox
 - iommufd patch series (as well as qemu) from James
 - establishing an API for callbacks into drivers to serialize state
   during brownout
 - reducing blackout window during live update
 - testing methodology for these components, including selftests

Please let
me know if you'd like to propose additional topics for discussion, thank you!