Date: Mon, 3 Feb 2025 20:00:10 -0800 (PST)
From: David Rientjes
To: Alexander Graf, Andrey Ryabinin, Anthony Yznaga, Dave Hansen,
    David Hildenbrand, Frank van der Linden, James Gowans,
    Jason Gunthorpe, Junaid Shahid, Matthew Wilcox, Mike Rapoport,
    Pankaj Gupta, Pasha Tatashin, Pratyush Yadav, Vipin Sharma,
    Vishal Annapurve, "Woodhouse, David"
cc: linux-mm@kvack.org, kexec@lists.infradead.org
Subject: [Hypervisor Live Update] Notes from January 27, 2025
Message-ID: <26a4b7ca-93a6-30e2-923b-f551ced03d62@google.com>

Hi everybody,

Here are the notes from the inaugural Hypervisor Live Update call that
happened on Monday, January 27.  Thanks to everybody who was involved!

----->o-----
I talked about the logistics and goals of the biweekly.  If you would like
to be added to the calendar invite, please email me privately.
I can also share our cover letter and the shared drive that contains
recordings and slides (if any) if you provide an email address associated
with a Google account.

We also discussed the scope of the biweekly series to include:

 - KHO
   + Including potential early adopters (hugetlbfs, tmpfs)
 - Persistence of PCIe devices
   + IOMMU(fd) persistence
 - Guest memory
   + Including Confidential Computing use cases
 - Reboot optimizations

----->o-----
Mike is planning on sending out another KHO patch series (v4), likely this
week.

----->o-----
James discussed work last year on iommufd persistence and hooking iommu
drivers into KHO to persist their state across kexec.  Feedback suggested
setting up new page tables and then transferring over to them after the
kexec is completed.  James will start implementing this on top of his
existing patch series and include qemu changes as well.

To minimize downtime, the plan was to resume the VM with the old page
tables.  He suggested userspace would initiate the switch to the new page
tables.

Jason noted KHO has been too focused on preserving memory and needs to
preserve file descriptors: we need to take iommufd, freeze it, give it to
KHO, and then pick it back up after kexec.  When you're done with it after
the hand over, like an atomic attach, then it gets destroyed.  Jason also
noted we'll have to consider preserving vIOMMU state to support the latest
NV hardware, which is highly complex.

Alexander Graf previously developed a concept called fdbox that turned out
to be very intrusive in the kernel.  Jason noted that all of this work will
be invasive, but we should prefer to compartmentalize it as much as
possible (like for iommufd stuff, a kho.c).

----->o-----
Pasha suggested KHO should be kept as a mechanism to preserve kernel memory
across kexec; the serialization requires different mechanisms.  He plans to
propose a separate API for callbacks into drivers.
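
As a rough illustration of the hand-over sequence Jason described for
iommufd (freeze, give to KHO, pick back up after kexec, destroy once the
hand over is done), here is a minimal sketch.  It is plain Python, not
kernel code; the state names, the ALLOWED table and the Handover class are
all hypothetical and do not correspond to any existing KHO or iommufd API.

```python
from enum import Enum, auto


class HandoverState(Enum):
    """Hypothetical lifecycle states for one preserved object."""
    ACTIVE = auto()     # normal operation before the live update
    FROZEN = auto()     # frozen, no further changes accepted
    PRESERVED = auto()  # given to the preservation mechanism before kexec
    RESTORED = auto()   # picked back up after kexec
    DESTROYED = auto()  # dropped once the hand over is complete


# Assumed strictly linear ordering; every other transition is rejected.
ALLOWED = {
    HandoverState.ACTIVE: {HandoverState.FROZEN},
    HandoverState.FROZEN: {HandoverState.PRESERVED},
    HandoverState.PRESERVED: {HandoverState.RESTORED},
    HandoverState.RESTORED: {HandoverState.DESTROYED},
    HandoverState.DESTROYED: set(),
}


class Handover:
    """Tracks one object through the hypothetical lifecycle above."""

    def __init__(self):
        self.state = HandoverState.ACTIVE

    def advance(self, new_state):
        # Refuse any step that skips or reorders the sequence.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
```

The only point of the sketch is that the order is enforced in one place: an
object cannot be picked back up unless it was frozen and preserved first.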

Jason noted it was going to be critical to provide a state machine that we
all agree on, including definitions.  One aspect he would like to align on
is whether you could put a guest_memfd into an fdbox or even a tmpfs into
an fdbox.

Mike Rapoport noted there are multiple layers here, where KHO is very low
level and fdbox is built on top of it.  Mike emphasized it will be critical
to establish a standardized format between multiple kernel versions.

----->o-----
There was lots of discussion on stable ABIs for allowing continuous
upgrading of kernels without requiring a reboot.  Jason suggested, as an
example, that upstream can provide a mechanism for upgrading from
6.12 -> 6.13 but not 6.12 -> 6.16.  Doing any version -> any version is
much harder and likely cannot be supported, at least in the short term,
because it's so invasive.  A good example would be the mlx driver.

James and Alexander noted that we must be able to roll back, and that this
can be enabled by the downstream customer; it may not be a burden on the
upstream kernel.  There was a general acknowledgment that upstream pairs
must be supported, but much of this could become the responsibility of the
downstream user.  (Alexander noted some users may care about mlx, others
may not, for example.)

David Woodhouse inquired about rollback functionality and how we would
support a VM that has deserialized after kexec using a new feature and then
still support a downgrade afterwards.  Alexander said it was important that
the user of KHO supports very controlled A->B environments for this to work
properly, and, if provided, they can control downgrade paths as well.

Dave Hansen noted this was similar to discussions about checkpoint restart
and CRIU.  The burden in this case may be very similar: it is taken on by
those who care about upgrading from one version to another and is not a
general upstream requirement.  It was acknowledged that this will be a ton
of work to maintain reliably, however.
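
To make the "supported pairs" idea concrete, here is a small sketch of how
a controlled A->B policy could be written down.  The 6.12 -> 6.13 pair is
taken from Jason's example; the table, the function name and the downstream
rollback entry are hypothetical, and nothing like this exists in the kernel
today.

```python
# Hypothetical set of upgrade pairs that upstream would support.
UPSTREAM_PAIRS = frozenset({("6.12", "6.13")})


def can_live_update(current, target, supported=UPSTREAM_PAIRS):
    """Return True only if (current -> target) is an explicitly listed pair."""
    return (current, target) in supported


# A downstream user who cares about rollback could extend the table with
# the downgrade path they are willing to maintain themselves.
DOWNSTREAM_PAIRS = UPSTREAM_PAIRS | {("6.13", "6.12")}
```

With this table, can_live_update("6.12", "6.16") is False because that pair
is not listed, and the 6.13 -> 6.12 rollback is accepted only when the
downstream table is passed in.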

Dave noted it will be important to socialize the work that needs to be done
with upstream developers, but that the work will be taken on by those who
care to use KHO.

It was agreed that once you roll out, you enable new features only when you
are confident there will not be a rollback, and then once the feature is
enabled you've passed the point of no return.

----->o-----
Jason provided a nice early milestone for KHO work: demonstrate a kexec
while the VM survives and the VFIO attached to it survives.  Pasha noted
this has been done before with PKRAM, but it now needs to be done in a way
that KHO would support.

----->o-----
Next meeting is scheduled for Monday, February 10 at 8am PST (UTC-8).  I'll
send a reminder on this mailing list.

Topics I think we should cover in the next meeting:

 - LSF/MM/BPF topics of interest for the group
 - v4 of the KHO patch series sent out by Mike Rapoport
 - iommufd patch series (as well as qemu) sent out (hopefully) by James
   Gowans this week, otherwise a week or two from now
 - establishing an API for callbacks into drivers to serialize state
 - establishing an FSM for all of the various states that are agreed upon
   with common language
 - finalizing the decision on upstream support for minor version upgrades
   across KHO and the burden of downstream users to define what versions
   can be upgraded
 - topics proposed by Pasha: reducing blackout window, relaxed
   serialization, KHO activation requirements, and decoupling KHO from
   kexec
 - implications of preserving vIOMMU state

Please let me know if you'd like to propose additional topics for
discussion, thank you!