Date: Sat, 17 May 2025 21:07:52 -0700 (PDT)
From: David Rientjes
To: Alexander Graf, Anthony Yznaga, Dave Hansen, David Hildenbrand, David Matlack, Frank van der Linden, James Gowans, Jason Gunthorpe, Junaid Shahid, Mike Rapoport, Pankaj Gupta, Pasha Tatashin, Pratyush Yadav, Vipin Sharma, Vishal Annapurve, "Woodhouse, David"
cc: linux-mm@kvack.org, kexec@lists.infradead.org
Subject: [Hypervisor Live Update] Notes from May 5, 2025

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened on Monday, May 5.  Thanks to everybody who was involved!
These notes are intended to bring people who could not attend the call up to speed as well as keep the conversation going in between meetings.

----->o-----

We discussed reviews on the latest series of KHO changes.  Pasha noted that Dave Hansen's feedback should be fully addressed.  Small changes were requested that could be handled incrementally on top.  Changyuan echoed that an incremental x86 patch on top could be posted since KHO v7 is currently in mm-unstable.

I asked if there were any major blockers to the series and there did not appear to be.  Jason suggested that we just leave the current status of the series as-is so that it can graduate to mm-stable and then eventually to upstream.

It was noted that compound_folio() may not be handled correctly based on feedback from Mike Rapoport upstream.  Changyuan said there was currently an issue where KHO would have difficulty in preserving a folio for the same number of times that it was referenced; the same issue would likely apply to compound folios.  Jason suggested that if a memfd needs to be preserved by multiple processes, they still get a single fd; it doesn't get preserved twice.

----->o-----

The current status of KHO is such that it should be possible to build iommufd and memfd on top of it.  Pasha said that once the compound folio concern was addressed, and now that all of Dave's feedback was also addressed, everything else should be able to be incremental on top of the framework.

I started discussing the next phase of development for the KHO framework.  Pasha said that LUO is the next thing that should land upstream as it controls the lifecycle.  For KHO itself, additional support for other architectures, bug fixes, and scalability improvements are always possible.

----->o-----

Pasha said that he would be sending out an RFC v2 for LUO.  The first user of LUO would be memfd, which Changyuan is working on.  He was porting fdbox support on top of LUO.
Pratyush asked for the patches to be sent out ahead of time to fix some issues with what he was working on.

Pasha noted that he would present the LUO design in the next biweekly meeting to provide more visibility for the rest of the group.

We briefly chatted about the future of guestmemfs support after LUO and whether there is a future for it.  Until we hear more about additional requirements that necessitate a guestmemfs, we'll drop the topic for subsequent meetings.

----->o-----

Andrey discussed the next steps for KSTATE on top of KHO with Chris Li.  Chris said that he would work with Andrey on state saving with PCI code; he felt that KSTATE was a solid direction for PCI.  FDT is designed for information saving but not serializing state.  In LUO, there are many recursive FDT objects, and that conflicts with the need to store everything in one big save state structure.  Chris suggested that objects would need to support pointers.

Pasha asked about recursive FDT with LUO and said this was not the plan; LUO does not necessarily care itself.  He supported KSTATE for this.  LUO supports the 8-byte pointer for preservation.  Chris said they are shifting to a tree-like structure branched off of this 8-byte pointer.

Chris said that for every object we store, there will be a description of each member's ID and type.  Chris said this needed to be stored as part of the binary format for the new kernel to understand what the old kernel was using, including for rollback.  He also noted that FDT does not support descriptions for the acceptable ranges that members can take on very well (and a version number may be inappropriate to describe this).

Jason said that per-member schemas would probably be very complex.  Chris said this would be needed.  Jason suggested that per-field would be way too granular and was skeptical that rollback was something that the upstream kernel should support; if a CSP wants to deploy v2 -> v3, then this is entirely deferred to them.
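To make the per-member description that Chris outlined more concrete, here is a minimal Python sketch of a self-describing record layout.  It is purely illustrative and is not the KSTATE or KHO format: the type tags, the header layout, the explicit length field, and all function names are assumptions made up for this example.

```python
import struct

# Hypothetical member type tags; not taken from KSTATE or any kernel header.
TYPE_U64 = 1
TYPE_BYTES = 2

# Assumed layout per member: id (u16), type (u16), length (u32), then payload.
MEMBER_HEADER = struct.Struct("<HHI")


def pack_member(member_id, type_tag, payload):
    """Serialize one member as a self-describing (id, type, length) record."""
    return MEMBER_HEADER.pack(member_id, type_tag, len(payload)) + payload


def pack_object(members):
    """Concatenate member records; members is a list of (id, type, payload)."""
    return b"".join(pack_member(i, t, p) for i, t, p in members)


def unpack_object(blob, known_ids):
    """Decode records, keeping known member IDs and skipping unknown ones."""
    decoded, skipped, offset = {}, [], 0
    while offset < len(blob):
        member_id, type_tag, length = MEMBER_HEADER.unpack_from(blob, offset)
        offset += MEMBER_HEADER.size
        payload = blob[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated member record")
        offset += length
        if member_id not in known_ids:
            # The reader does not understand this member; step over it.
            skipped.append(member_id)
            continue
        if type_tag == TYPE_U64:
            decoded[member_id] = struct.unpack("<Q", payload)[0]
        else:
            decoded[member_id] = payload
    return decoded, skipped
```

Because every record carries its own length in this sketch, a reader that only knows some member IDs can still walk past the rest, which is the property the rollback discussion depends on.  Whether that granularity is worth the complexity is exactly the point Jason questioned.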
Jason suggested having very coarse versioning instead that captures everything.  Chris said it would be important for a vendor to be able to add their own versions.

Jason noted that it would not be possible to enable a new feature in the fleet until the CSP is no longer willing to roll back.  There may be some minor exceptions to this, but most features will need to remain off until fully deployed and it's not possible to roll back.  Amit Shah agreed that the feature must be available but not enabled; once available everywhere, it can be enabled throughout the fleet.  David Matlack observed the similarities with KVM features today and agreed.  The amount of state to preserve across kexec would have to be minimal and updating versions should be rare.  Andrey agreed with this.

Pasha asked about the situation where enabling a new feature would run into issues in the fleet and whether reboot would be the only way to recover from that.  Jason said this should go through the VMM so that the only way to enable a new feature is when the VM restarts; once that is committed, it's there until the VM reboots.

----->o-----

Jason noted that he posted his first patch series to make the iommu page tables common[1] that could become part of the KHO work.  Pasha said this was exciting to see and that it would be possible to add page table checks for this.

That patch series, which consolidates the iommu page table implementations, would benefit from review from the community, so people are strongly encouraged to take a look.

----->o-----

Pasha noted that support for dev dax as 1GB shards, similar to how hugetlb is managed, would be sent upstream soon.  Currently under review was support for sharding pmem regions into arbitrary lengths.  There was also support to provision fsdax by default, while dev dax would be optional as defined by the kernel command line.  This would avoid the need for a separate tool.

----->o-----

Frank van der Linden discussed his physical pool allocator.
His concept is for a common layer that is separate from hugetlb; the topic has come up a number of different times.  For example, if memfd needs to be backed by 1GB pages, then a physical pool allocator could provide this, as well as for guest_memfd.  This will decouple 1GB pages from hugetlb entirely.

His series provides a common allocator for physical memory and he will be sending an RFC prototype soon.  The 1GB pages come from a static pool or from a dynamic pool and can even be removed from the kernel direct map.  The goal is to send this out in June.

Jason noted that we should avoid having to store information about the vmemmap across kexec if at all possible.

----->o-----

Next meeting will be on Monday, May 19 at 8am PDT (UTC-7), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - 20 min: presentation of LUO v2 design
 - check back on latest status of KHO series in mm staging trees and any pending concerns
   + including possible refcount issues for compound folios across KHO
 - possibility of a Live Update Microconference for LPC this year
 - discuss support for sharding of dax devices into arbitrary lengths
 - discuss support for defaulting to fsdax with optional devdax as needed, provisioned by the kernel, on the command line without additional tooling (ndctl)
 - update on physical pool allocator that can be used to provide pages for hugetlb, guest_memfd, and memfds
 - SEV-SNP support for preserving guest memory and what foundational components AMD can depend on, building on top of KHO v6 or KSTATE
 - later: testing methodology to allow downstream consumers to qualify that live update works from one version to another
 - later: reducing blackout window during live update

Please let me know if you'd like to propose additional topics for discussion, thank you!

[1] https://marc.info/?l=linux-doc&m=174645437711873&q=mbox