Date: Tue, 18 Feb 2025 20:04:47 -0800 (PST)
From: David Rientjes
To: Alexander Graf, Andrey Ryabinin, Anthony Yznaga, Dave Hansen, David Hildenbrand, Frank van der Linden, James Gowans, Jason Gunthorpe, Junaid Shahid, Matthew Wilcox, Mike Rapoport, Pankaj Gupta, Pasha Tatashin, Pratyush Yadav, Vipin Sharma, Vishal Annapurve, "Woodhouse, David"
cc: linux-mm@kvack.org, kexec@lists.infradead.org
Subject: [Hypervisor Live Update] Notes from February 10, 2025

Hi everybody,

Here are the notes from the last Hypervisor Live Update call, which happened on Monday, February 10. Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up to speed, as well as to keep the conversation going between meetings.
----->o-----

James mentioned guest memory persistence and the future of guestmemfs, including feedback to allow for more prototyping. We didn't get into this topic during the call, so we'll touch on it in the next call.

----->o-----

Mike brought up the sysfs interface, whether or not we want an activate step, and aligning with the devicetree feedback that was received on the last upstream posting of KHO. Every new binding would need to go through devicetree code review, and Jason noted scalability and flexibility concerns with this. Jason noted that older kernels can ignore newer devicetree components and everything still works, which is not the case for live update. He suggested a much stronger compatibility guarantee for live update purposes, between pairs of kernel versions.

Andrey agreed that we don't really need devicetree here and pointed to a patch series he had developed that doesn't rely on it. Pasha agreed that for the short/medium term it may make sense to decouple this from devicetree. Alexander noted that FDT was chosen deliberately: we want a generic key/value store and the ability to add attributes without invalidating compatibility. Andrey noted this can be done without devicetree. There was discussion of treating this as a "KHO tree" rather than strictly a devicetree. Schema validation was another attractive characteristic of FDT.

Alexander noted the current discussion has focused on nodes and sub-nodes as a structure based on runtime data: a decision had to be made between structured data (including debuggability) and data that is always in memory and gets translated from one kernel to another. Jason thought we needed both.

Pasha noted Intel in 2021 had preserved VFIO passthrough devices using PKRAM. The PKRAM patches and their interfaces turned out to be very difficult to maintain, given that PKRAM did not maintain ABIs between kernels. It relied heavily on developer insight to specify, in YAML files, what needed to be maintained.
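To make the compatibility trade-off above concrete, here is a minimal C sketch of a generic key/value record format in which a reader restores the keys it recognizes and skips the rest. This is not FDT or libfdt; the record layout, the kv_put()/old_reader() functions, and the key names are all hypothetical. It shows both sides of the discussion: new attributes can be added without breaking an older reader, but that older reader silently drops state it does not understand, which is tolerable for devicetree-style configuration and not for live update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout (not FDT): a fixed-size key, a value length, then the value bytes. */
struct kv_hdr {
	char key[16];
	uint32_t len;
};

/* Hypothetical state that an "old" kernel knows how to restore. */
struct old_state {
	uint64_t nr_pages;
	bool have_nr_pages;
};

/* Append one record to buf at offset off; returns the new offset, or 0 if it does not fit. */
static size_t kv_put(uint8_t *buf, size_t cap, size_t off,
		     const char *key, const void *val, uint32_t len)
{
	struct kv_hdr h;

	if (off + sizeof(h) + len > cap)
		return 0;
	memset(&h, 0, sizeof(h));
	strncpy(h.key, key, sizeof(h.key) - 1); /* always NUL-terminated after memset */
	h.len = len;
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), val, len);
	return off + sizeof(h) + len;
}

/*
 * An "old" reader: restores the keys it knows and skips the rest.
 * Returns the number of records it skipped because it did not recognize them.
 */
static int old_reader(const uint8_t *buf, size_t used, struct old_state *st)
{
	size_t off = 0;
	int skipped = 0;

	memset(st, 0, sizeof(*st));
	while (off + sizeof(struct kv_hdr) <= used) {
		struct kv_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (off + h.len > used)
			break; /* truncated record: stop parsing */
		if (strncmp(h.key, "nr_pages", sizeof(h.key)) == 0 &&
		    h.len == sizeof(uint64_t)) {
			memcpy(&st->nr_pages, buf + off, sizeof(uint64_t));
			st->have_nr_pages = true;
		} else {
			skipped++; /* unknown key: ignored, so its state is lost */
		}
		off += h.len;
	}
	return skipped;
}
```

For example, if a newer kernel writes both an nr_pages record and a hypothetical iommu_domain record, old_reader() restores nr_pages and reports one skipped record. Returning the skip count lets a caller refuse the handover instead of continuing with partial state, which is one possible way to approximate the stricter pairwise compatibility Jason described.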
Mike suggested that drivers would need the flexibility to point to an area of memory, a struct, or a scalar. James discussed serializing all inodes in guestmemfs with KHO in previous work, which turned out to be very useful. Newer kernels were able to add new fields and move things around, but the downgrade path wasn't supported with this approach.

Jason noted there were similarities between the stable ABIs provided to userspace and filesystems. He stressed that the complexity here may become too burdensome for driver maintainers. Jason suggested starting simple: structs pointing to structs pointing to structs. Drivers would have versions 1, 2, etc., and this could become more complex when needed.

There was a general desire not to preserve the kernel direct map; the virtual address space would end up getting scrambled, and that must be supported.

Alexander noted that KHO's strategy so far has been to use FDT as the standard for compatibility, and whether to use it versus other solutions depends on the specific use case. We'd need to extend tooling to do validation in the future. Alexander noted that after writing 1 to the activate file, you can grab a snapshot of the devicetree from sysfs and use it for validation.

----->o-----

We shifted to discussion of KHO v5. Mike noted that there is a sysfs interface that enables KHO, and the KHO data (the devicetree and scratch description) then gets appended to kexec images. Only the scratch space would be touched by the new kernel; not all memory is preserved, although it is in the kernel direct map.

There was a discussion about using a bitmap (or an idr) to indicate which memory should be removed from the buddy allocator during early boot. Alexander noted you'd still need to be able to associate that bitmap or data structure with the specific driver that needs to find its memory. Jason noted this would be the driver's responsibility.
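As a rough illustration of the bitmap idea, here is a userspace C sketch assuming a small, flat physical memory of NR_PFNS pages: the outgoing kernel sets one bit per preserved page frame, and early boot in the new kernel releases only the pages whose bit is clear. All names here are hypothetical and do not correspond to KHO or buddy allocator interfaces. As Jason noted, mapping preserved pages back to the driver that owns them is left to that driver, so the sketch only covers the global set of pages to withhold.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS 1024 /* hypothetical: size of the modeled physical memory, in pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((NR_PFNS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per page frame: a set bit means "preserved, do not hand out". */
static unsigned long preserved[BITMAP_WORDS];

/* Callback that hands one free page frame to the (modeled) allocator. */
typedef void (*release_fn)(size_t pfn);

/* Outgoing kernel: mark the page frames [pfn, pfn + nr) as preserved. */
static void preserve_range(size_t pfn, size_t nr)
{
	for (size_t i = pfn; i < pfn + nr && i < NR_PFNS; i++)
		preserved[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static bool is_preserved(size_t pfn)
{
	return preserved[pfn / BITS_PER_WORD] & (1UL << (pfn % BITS_PER_WORD));
}

/*
 * New kernel, early boot: release every page frame whose bit is clear
 * (if a callback is given) and return how many were released.
 */
static size_t release_unpreserved(release_fn release)
{
	size_t n = 0;

	for (size_t pfn = 0; pfn < NR_PFNS; pfn++) {
		if (is_preserved(pfn))
			continue; /* withheld: still owned by a driver from the old kernel */
		if (release)
			release(pfn);
		n++;
	}
	return n;
}
```

A flat bitmap like this costs one bit per page of memory regardless of how much is actually preserved; an idr or another sparse structure would trade that fixed cost for a cost that grows with the number of preserved entries.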
Jason stressed how this would be used to establish the ABI: for example, if a driver calls alloc_pages(), stores its data in that memory, and then calls to_kho(), this is a nice, clean interface for preserving driver memory. Approaches like GFP_KHO would be more invasive.

----->o-----

Pasha led a discussion on the next KHO series to be sent upstream and on alignment among the people on the call. Pasha suggested that kexec file load should not be part of the KHO process; rather, the two should be decoupled from each other. We want to minimize the blackout window as much as possible. If the VM is still running during KHO activate, we would need to prevent any operation from changing this state, which limits VM functionality. Pasha wanted kexec file load to be completely decoupled from KHO.

Alexander noted that the point of the activate phase is to accelerate the kexec so that we can serialize state, with the goal of keeping 99% of VM operations possible. Pasha noted that some devices need to be preserved across kexec but others do not. In that case, Jason suggested not coupling this to a global activate state, and Alexander agreed on allowing certain drivers, not necessarily all, to participate. Jason stressed that we all need to agree on the state machine, given the discussion two weeks ago.

----->o-----

The next meeting is scheduled for Monday, February 24 at 8am PST (UTC-8). I'll send a reminder on this mailing list.
Topics I think we should cover in the next meeting:

 - the future of guestmemfs and what it becomes, including alignment so prototyping can be done
 - Andrey's patch series that doesn't rely on devicetree
 - alignment on not preserving the kernel direct map and using different virtual addresses in the new kernel
 - v5 of the KHO patch series with minor fixes
 - establishing an FSM for all of the various states that are agreed upon with common language (when memory mappings can happen, what is disallowed at certain stages)
 - extending the above topic to a separate FSM for the entire live update process (what happens in brownout, blackout, shutdown, etc.)
 - iommufd patch series (as well as qemu) from James
 - establishing an API for callbacks into drivers to serialize state during brownout
 - topics proposed by Pasha: reducing the blackout window, relaxed serialization, KHO activation requirements, and decoupling KHO from kexec
 - implications of preserving vIOMMU state

Please let me know if you'd like to propose additional topics for discussion. Thank you!