From mboxrd@z Thu Jan 1 00:00:00 1970
From: RuiRui Yang
Date: Mon, 17 Feb 2025 11:19:45 +0800
Subject: Re: [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO)
To: Mike Rapoport
Cc: linux-kernel@vger.kernel.org, Alexander Graf, Andrew Morton,
 Andy Lutomirski, Anthony Yznaga, Arnd Bergmann, Ashish Kalra,
 Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas, Dave Hansen,
 David Woodhouse, Eric Biederman, Ingo Molnar, James Gowans,
 Jonathan Corbet, Krzysztof Kozlowski, Mark Rutland, Paolo Bonzini,
 Pasha Tatashin, "H. Peter Anvin", Peter Zijlstra, Pratyush Yadav,
 Rob Herring, Rob Herring, Saravana Kannan, Stanislav Kinsburskii,
 Steven Rostedt, Thomas Gleixner, Tom Lendacky, Usama Arif, Will Deacon,
 devicetree@vger.kernel.org, kexec@lists.infradead.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, x86@kernel.org
References: <20250206132754.2596694-1-rppt@kernel.org>
In-Reply-To: <20250206132754.2596694-1-rppt@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Thu, 6 Feb 2025 at 21:34, Mike Rapoport wrote:
>
> From: "Mike Rapoport (Microsoft)"
>
> Hi,
>
> This is the next version of Alex's "kexec: Allow preservation of ftrace
> buffers" series
> (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com).
> To make things simpler, we decided to preserve "reserve_mem" regions
> instead of ftrace buffers.
>
> The patches are also available in git:
> https://git.kernel.org/rppt/h/kho/v4
>
>
> Kexec today considers itself purely a boot loader: When we enter the new
> kernel, any state the previous kernel left behind is irrelevant and the
> new kernel reinitializes the system.
>
> However, there are use cases where this mode of operation is not what we
> actually want. In virtualization hosts for example, we want to use kexec
> to update the host kernel while virtual machine memory stays untouched.
> When we add device assignment to the mix, we also need to ensure that
> IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> need to do the same for the PCI subsystem. If we want to kexec while an
> SEV-SNP enabled virtual machine is running, we need to preserve the VM
> context pages and physical memory. See "pkernfs: Persisting guest memory
> and kernel/device state safely across kexec" Linux Plumbers
> Conference 2023 presentation for details:
>
> https://lpc.events/event/17/contributions/1485/
>
> To start us on the journey to support all the use cases above, this patch
> implements basic infrastructure to allow hand over of kernel state across
> kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> memblock's reserve_mem.
> With this patch set applied, memory that was reserved using "reserve_mem"
> command line options remains intact after kexec and it is guaranteed to
> reside at the same physical address.
>
> == Alternatives ==
>
> There are alternative approaches to (parts of) the problems above:
>
> * Memory Pools [1] - preallocated persistent memory region + allocator
> * PRMEM [2] - resizable persistent memory regions with fixed metadata
>   pointer on the kernel command line + allocator
> * Pkernfs [3] - preallocated file system for in-kernel data with fixed
>   address location on the kernel command line
> * PKRAM [4] - handover of user space pages using a fixed metadata page
>   specified via command line
>
> All of the approaches above fundamentally have the same problem: They
> require the administrator to explicitly carve out a physical memory
> location because they have no mechanism outside of the kernel command
> line to pass data (including memory reservations) between kexec'ing
> kernels.
>
> KHO provides that base foundation. We will determine later whether we
> still need any of the approaches above for fast bulk memory handover of,
> for example, IOMMU page tables. But IMHO they would all be users of KHO,
> with KHO providing the foundational primitive to pass metadata and bulk
> memory reservations as well as provide easy versioning for data.
>
> == Overview ==
>
> We introduce a metadata file that the kernels pass between each other. How
> they pass it is architecture specific. The file's format is a Flattened
> Device Tree (fdt) which has a generator and parser already included in
> Linux. When the root user enables KHO through /sys/kernel/kho/active, the
> kernel invokes callbacks to every driver that supports KHO to serialize
> its state. When the actual kexec happens, the fdt is part of the image
> set that we boot into. In addition, we keep "scratch regions" available
> for kexec: physically contiguous memory regions that are guaranteed not
> to contain any memory that KHO would preserve. The new kernel bootstraps
> itself using the scratch regions and sets all handed over memory as in use.
> When drivers that support KHO initialize, they introspect the fdt and
> recover their state from it. This includes memory reservations, where the
> driver can either discard or claim reservations.
>
> == Limitations ==
>
> Currently KHO is only implemented for file based kexec. The kernel
> interfaces in the patch set are already in place to support user space
> kexec as well, but it is not yet implemented in kexec tools.
>

On which architectures exactly does KHO work? Device Tree should be
OK on arm*, x86 and power*, but how about s390?

Thanks
Dae