Date: Sun, 2 Mar 2025 20:21:17 -0800 (PST)
From: David Rientjes
To: Alexander Graf, Anthony Yznaga, Dave Hansen, David Hildenbrand, Frank van der Linden, James Gowans, Jason Gunthorpe, Junaid Shahid, Matthew Wilcox, Mike Rapoport, Pankaj Gupta, Pasha Tatashin, Pratyush Yadav, Vipin Sharma, Vishal Annapurve, "Woodhouse, David"
cc: linux-mm@kvack.org, kexec@lists.infradead.org
Subject: [Hypervisor Live Update] Notes from February 24, 2025
Message-ID: <7154b857-01cf-a8c6-ea02-51d095bafabf@google.com>

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened
on Monday, February 24. Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.

----->o-----
We discussed the KHO v5 patch series that would be posted shortly. Two
major changes were noted: decoupling of KHO activation from kexec load (no
KHO finalize stage triggered from userspace or kexec reboot), and a more
dynamic internal representation of the tree that gets serialized into the
FDT at the end of the process. The memory preservation mechanism did not
change.

Specifically, we discussed the elements that were proposed for sysfs: the
activation interface, FDT blob, dt max, and scratch area definitions.
Jason noted a general concern about all of the UAPIs that are being
developed and suggested scaling them back.

Mike also noted a concern with the activation trigger: we may want it to
represent the state in the KHO state machine. He also agreed with Jason
that dt max may want to be removed altogether. The suggestion would be to
hard code it for prototyping; Mike said this should not be in sysfs.

Mike noted that recent changes in KHO would do vmap and vmalloc in kexec
reboot. Mike said that we need to have a point where state becomes
immutable and Jason suggested a simple read from sysfs, i.e. when you read
byte zero of the sysfs file the kernel would go and do the callback. This
could at least be used early in development.

----->o-----
Jason suggested that we agree on the FSM first because of unpredictability
with future changes, especially for UAPI. I suggested that we align this
across stakeholders as soon as possible, especially before going into v6
or v7 of KHO. Here I was referring to both state machines: the one for
the live update process and the one for activation.

Jason pushed to understand what the scope of the v5 that will be sent out
will be. Is the objective to get some things merged as a foot in the
door, or is it something that is complete and does everything we intend to
do? Mike suggested getting started with something that was minimal to
address concerns from devicetree and kexec communities.

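To make the "point where state becomes immutable" idea from the discussion
above concrete, here is a minimal userspace sketch in C. It is an
illustration only, not code from the KHO series: the state names,
struct kho_model, and the kho_model_*() helpers are all hypothetical, and
a plain function call stands in for a read of the sysfs (or debugfs) file.
The model assumes that a read at offset zero runs the serialization
callback once and freezes the state, after which further preservation
requests are refused.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical states; the real KHO state machine was still being discussed. */
enum kho_model_state {
	KHO_MODEL_MUTABLE,	/* preservation requests are still accepted */
	KHO_MODEL_FINALIZED,	/* blob has been serialized, state is immutable */
};

struct kho_model {
	enum kho_model_state state;
	int nr_preserved;	/* items recorded for preservation so far */
	int nr_callbacks;	/* how many times the serialize callback ran */
	char blob[64];		/* stand-in for the serialized FDT blob */
	size_t blob_len;
};

/* Stand-in for the callback that serializes the tree into the blob. */
void kho_model_serialize(struct kho_model *m)
{
	int len = snprintf(m->blob, sizeof(m->blob), "preserved=%d",
			   m->nr_preserved);

	m->blob_len = (size_t)len;
	m->nr_callbacks++;
}

/* Record one preserved item; refused once the state is immutable. */
int kho_model_preserve(struct kho_model *m)
{
	if (m->state != KHO_MODEL_MUTABLE)
		return -EBUSY;
	m->nr_preserved++;
	return 0;
}

/*
 * Model of the read handler (assumption): a read at offset zero triggers
 * the callback exactly once and freezes the state; every later read only
 * copies bytes out of the already serialized blob.
 */
long kho_model_read(struct kho_model *m, char *buf, size_t len, size_t off)
{
	if (off == 0 && m->state == KHO_MODEL_MUTABLE) {
		kho_model_serialize(m);
		m->state = KHO_MODEL_FINALIZED;
	}
	if (m->state != KHO_MODEL_FINALIZED)
		return -EINVAL;	/* nothing to read before byte zero was read */
	if (off >= m->blob_len)
		return 0;	/* end of blob */
	if (len > m->blob_len - off)
		len = m->blob_len - off;
	memcpy(buf, m->blob + off, len);
	return (long)len;
}
```

In this model a caller would preserve what it needs, read from offset zero
to freeze the state, and treat -EBUSY from any later preservation attempt
as a sign that the window has closed.
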
For a minimal first version, Jason strongly suggested minimizing the UAPI
and relying on the dt blob for now. I suggested that we use debugfs for
now, where we have more flexibility for changes, instead of relying on
sysfs. Jason noted this may not be sysfs in the end anyway; we might have
different interfaces later. Mike noted that Alex may have a strong
opinion about this, but suspected that debugfs would be ok.

----->o-----
Mike discussed playing around with hugetlb; Jason suggested having the fds
round trip through the kexec. If hugetlb is thrown into fdbox, then this
informs the kernel that this memory should be preserved. This would be
similar for memfd. Jason suggested against something like guestmemfs and
pointed in the direction of fdbox instead.

David Matlack asked what would be the point of the fdbox for things
already in a filesystem. Jason noted this was for the concept of creating
the inode in the first place (could be an anonymous file description),
then we give it a label, and expect it to be there on the other side of
the kexec.

Instead of hugetlbfs, this might be a memfd that supports 1GB hugepages;
additionally, guest_memfd development has been trying to untangle 1GB
hugepages from guest_memfd in general. We discussed what metadata should
be preserved, independent of whether it is regular memfd or guest_memfd.
The other side of the kexec would do memfd_create() to restore the
filesystem on the new kernel.

Pratyush took the action item (AI) to look at implementation on memfd.
Mike took the AI to look at memory reservation.

----->o-----
I pivoted the discussion toward fdbox, which ended up never being posted
upstream. Pratyush provided the link to the most recent code:
https://github.com/agraf/linux-2.6/blob/kvm-kho-gmem-test/drivers/misc/fdbox.c

This work likely would need to be picked up to pursue upstream because the
current code has several TODOs. (A struct miscdevice was called out as
curious.)

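As a rough illustration of the label idea Jason described (put a file into
the box under a label, expect to find it under the same label after the
kexec), here is a small userspace model in C. Nothing here reflects the
actual fdbox code or any proposed UAPI: struct box_model, struct box_entry,
and the box_*() helpers are hypothetical names, the preserved metadata is
reduced to a size, and a memcpy() through a byte buffer stands in for state
crossing the kexec.

```c
#include <stddef.h>
#include <string.h>

#define BOX_MAX_ENTRIES	8
#define BOX_LABEL_LEN	32

/* Hypothetical record of one preserved file: a label plus minimal metadata. */
struct box_entry {
	char label[BOX_LABEL_LEN];
	size_t size;		/* size of the backing memory in bytes */
};

/* Hypothetical container whose contents are expected to survive the kexec. */
struct box_model {
	struct box_entry entries[BOX_MAX_ENTRIES];
	size_t nr;
};

/* After kexec: look a file up again by the label it was stored under. */
const struct box_entry *box_get(const struct box_model *box, const char *label)
{
	for (size_t i = 0; i < box->nr; i++)
		if (strcmp(box->entries[i].label, label) == 0)
			return &box->entries[i];
	return NULL;
}

/* Before kexec: put a file into the box under a unique label. */
int box_put(struct box_model *box, const char *label, size_t size)
{
	if (box->nr == BOX_MAX_ENTRIES || strlen(label) >= BOX_LABEL_LEN)
		return -1;	/* box full or label too long */
	if (box_get(box, label))
		return -1;	/* label already in use */
	strcpy(box->entries[box->nr].label, label);
	box->entries[box->nr].size = size;
	box->nr++;
	return 0;
}

/* Stand-in for the kexec itself: only the serialized bytes cross over. */
void box_round_trip(const struct box_model *old_box, struct box_model *new_box)
{
	unsigned char wire[sizeof(*old_box)];

	memcpy(wire, old_box, sizeof(wire));
	memset(new_box, 0, sizeof(*new_box));
	memcpy(new_box, wire, sizeof(wire));
}
```

The only point of the model is that the label is the contract between the
old and new kernel: whoever restores state on the other side needs nothing
but the label to find the preserved entry and its metadata.
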
It was noted that it would be difficult to support memfd without something
like fdbox, so whether the fdbox code itself is upstreamed or something
more generic emerges from the work on memfd, the base support would need
to be provided somehow.

Mike and Jason suggested designing fdbox and then starting to use it for
memfd. Stakeholders would need to align on this, including the UAPI for
fdbox. Pratyush will be looking into fdbox or inventing something similar
while working on memfd. We need to propose the fdbox design and UAPI.

----->o-----
David Matlack suggested a future topic: testing of the live update
process, including kexec and KHO. It will be challenging to do full
integration testing with this, so low level selftests would be strongly
preferred as this becomes more mature.

----->o-----
Next meeting is scheduled for Monday, March 10 at 8am PDT (UTC-7). Note
the time change due to Daylight Saving Time: this is now UTC-7 instead of
UTC-8. I'll send a reminder on this mailing list.

Topics I think we should cover in the next meeting:

 - any objections to using debugfs as the initial interface for
   development and prototyping

 - update from Pratyush on implementation on memfd

 - update from Mike on implementation on memory reservation

 - design for fdbox and its use as a conceptual replacement for
   guestmemfs, gaining alignment within the group, and agreeing on its
   UAPI

 - decoupling 1GB pages for hugetlb, guest_memfd, and memfds and how fds
   can be added to an fdbox

 - establishing an FSM for all of the various states that are agreed upon
   with common language (when memory mappings can happen, what is
   disallowed at certain stages)

 - extending the above topic on a separate FSM for the entire live update
   process (what happens in brownout, blackout, shutdown, etc)

 - iommufd patch series (as well as qemu) from James

 - establishing an API for callbacks into drivers to serialize state
   during brownout

 - topics proposed by Pasha: reducing blackout window, relaxed
   serialization, and KHO activation requirements

 - implications of preserving vIOMMU state

 - testing methodology for these components, including selftests

Please let me know if you'd like to propose additional topics for
discussion, thank you!