From: Jason Gunthorpe <jgg@nvidia.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: linux-kernel@vger.kernel.org, Alexander Graf <graf@amazon.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Arnd Bergmann <arnd@arndb.de>,
Ashish Kalra <ashish.kalra@amd.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Borislav Petkov <bp@alien8.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Woodhouse <dwmw2@infradead.org>,
Eric Biederman <ebiederm@xmission.com>,
Ingo Molnar <mingo@redhat.com>, James Gowans <jgowans@amazon.com>,
Jonathan Corbet <corbet@lwn.net>,
Krzysztof Kozlowski <krzk@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
Pratyush Yadav <ptyadav@amazon.de>,
Rob Herring <robh+dt@kernel.org>, Rob Herring <robh@kernel.org>,
Saravana Kannan <saravanak@google.com>,
Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Tom Lendacky <thomas.lendacky@amd.com>,
Usama Arif <usama.arif@bytedance.com>,
Will Deacon <will@kernel.org>,
devicetree@vger.kernel.org, kexec@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers
Date: Mon, 10 Feb 2025 16:22:20 -0400 [thread overview]
Message-ID: <20250210202220.GC3765641@nvidia.com> (raw)
In-Reply-To: <20250206132754.2596694-6-rppt@kernel.org>
On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> new file mode 100644
> index 000000000000..f13b252bc303
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> @@ -0,0 +1,53 @@
> +What: /sys/kernel/kho/active
> +Date: December 2023
> +Contact: Alexander Graf <graf@amazon.com>
> +Description:
> + Kexec HandOver (KHO) allows Linux to transition the state of
> + compatible drivers into the next kexec'ed kernel. To do so,
> + device drivers will serialize their current state into a DT.
> + While the state is serialized, they are unable to perform
> + any modifications to state that was serialized, such as
> + handed over memory allocations.
> +
> + When this file contains "1", the system is in the transition
> + state. When contains "0", it is not. To switch between the
> + two states, echo the respective number into this file.
I don't think this is a great interface for the actual state machine..
> +What: /sys/kernel/kho/dt_max
> +Date: December 2023
> +Contact: Alexander Graf <graf@amazon.com>
> +Description:
> + KHO needs to allocate a buffer for the DT that gets
> + generated before it knows the final size. By default, it
> + will allocate 10 MiB for it. You can write to this file
> + to modify the size of that allocation.
Seems gross, why can't it use a non-contiguous page list to generate
the FDT? :\
See below for a suggestion..
> +static int kho_serialize(void)
> +{
> + void *fdt = NULL;
> + int err = -ENOMEM;
> +
> + fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> + if (!fdt)
> + goto out;
> +
> + if (fdt_create(fdt, kho_out.dt_max)) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + err = fdt_finish_reservemap(fdt);
> + if (err)
> + goto out;
> +
> + err = fdt_begin_node(fdt, "");
> + if (err)
> + goto out;
> +
> + err = fdt_property_string(fdt, "compatible", "kho-v1");
> + if (err)
> + goto out;
> +
> + /* Loop through all kho dump functions */
> + err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> + err = notifier_to_errno(err);
I don't see this really working long term. I think we'd like each
component to be able to serialize at its own pace under userspace
control.
This design requires that the whole thing be wrapped in a notifier
callback just so we can make use of the fdt APIs.
It seems like a poor fit me.
IMHO if you want to keep using FDT I suggest that each serializing
component (ie driver, ftrace whatever) allocate its own FDT fragment
from scratch and the main KHO one just link to the memories that holds
those fragements.
Ie the driver experience would be more like
kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);
fdt...(kho->fdt..)
kho_finish_storage(kho);
Where this ends up creating a stand alone FDT fragment:
/dts-v1/;
/ {
compatible = "linux-kho,my_compatible_string,v1";
instance = some_kind_of_instance_key;
key-value-1 = <..>;
key-value-1 = <..>;
};
And then kho_finish_storage() would remember the phys/length until the
kexec fdt is produced as the very last step.
This way we could do things like fdbox an iommufd and create the above
FDT fragment completely seperately from any notifier chain and,
crucially, disconnected from the fdt_create() for the kexec payload.
Further, if you split things like this (it will waste some small
amount of memory) you can probably get to a point where no single FDT
is more than 4k. That looks like it would simplify/robustify alot of
stuff?
Jason
next prev parent reply other threads:[~2025-02-10 20:22 UTC|newest]
Thread overview: 97+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-06 13:27 [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO) Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 01/14] mm/mm_init: rename init_reserved_page to init_deferred_page Mike Rapoport
2025-02-18 14:59 ` Wei Yang
2025-02-19 7:13 ` Mike Rapoport
2025-02-20 8:36 ` Wei Yang
2025-02-20 14:54 ` Mike Rapoport
2025-02-25 7:40 ` Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 02/14] memblock: add MEMBLOCK_RSRV_KERN flag Mike Rapoport
2025-02-18 15:50 ` Wei Yang
2025-02-19 7:24 ` Mike Rapoport
2025-02-23 0:22 ` Wei Yang
2025-03-10 9:51 ` Wei Yang
2025-03-11 5:27 ` Mike Rapoport
2025-03-11 13:41 ` Wei Yang
2025-03-12 5:22 ` Mike Rapoport
2025-02-24 1:31 ` Wei Yang
2025-02-25 7:46 ` Mike Rapoport
2025-02-26 2:09 ` Wei Yang
2025-03-10 7:56 ` Wei Yang
2025-03-10 8:28 ` Mike Rapoport
2025-03-10 9:42 ` Wei Yang
2025-02-26 1:53 ` Changyuan Lyu
2025-03-13 15:41 ` Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 03/14] memblock: Add support for scratch memory Mike Rapoport
2025-02-24 2:50 ` Wei Yang
2025-02-25 7:47 ` Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 04/14] memblock: introduce memmap_init_kho_scratch() Mike Rapoport
2025-02-24 3:02 ` Wei Yang
2025-02-06 13:27 ` [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers Mike Rapoport
2025-02-10 20:22 ` Jason Gunthorpe [this message]
2025-02-10 20:58 ` Pasha Tatashin
2025-02-11 12:49 ` Jason Gunthorpe
2025-02-11 16:14 ` Pasha Tatashin
2025-02-11 16:37 ` Jason Gunthorpe
2025-02-12 15:23 ` Jason Gunthorpe
2025-02-12 16:39 ` Mike Rapoport
2025-02-12 17:43 ` Jason Gunthorpe
2025-02-23 18:51 ` Mike Rapoport
2025-02-24 14:28 ` Jason Gunthorpe
2025-02-12 12:29 ` Thomas Weißschuh
2025-02-06 13:27 ` [PATCH v4 06/14] kexec: Add KHO parsing support Mike Rapoport
2025-02-10 20:50 ` Jason Gunthorpe
2025-03-10 16:20 ` Pratyush Yadav
2025-03-10 17:08 ` Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 07/14] kexec: Add KHO support to kexec file loads Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 08/14] kexec: Add config option for KHO Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 09/14] kexec: Add documentation " Mike Rapoport
2025-02-10 19:26 ` Jason Gunthorpe
2025-02-06 13:27 ` [PATCH v4 10/14] arm64: Add KHO support Mike Rapoport
2025-02-09 10:38 ` Krzysztof Kozlowski
2025-02-06 13:27 ` [PATCH v4 11/14] x86/setup: use memblock_reserve_kern for memory used by kernel Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 12/14] x86: Add KHO support Mike Rapoport
2025-02-24 7:13 ` Wei Yang
2025-02-24 14:36 ` Mike Rapoport
2025-02-25 0:00 ` Wei Yang
2025-02-06 13:27 ` [PATCH v4 13/14] memblock: Add KHO support for reserve_mem Mike Rapoport
2025-02-10 16:03 ` Rob Herring
2025-02-12 16:30 ` Mike Rapoport
2025-02-17 4:04 ` Wei Yang
2025-02-19 7:25 ` Mike Rapoport
2025-02-06 13:27 ` [PATCH v4 14/14] Documentation: KHO: Add memblock bindings Mike Rapoport
2025-02-09 10:29 ` Krzysztof Kozlowski
2025-02-09 15:10 ` Mike Rapoport
2025-02-09 15:23 ` Krzysztof Kozlowski
2025-02-09 20:41 ` Mike Rapoport
2025-02-09 20:49 ` Krzysztof Kozlowski
2025-02-09 20:50 ` Krzysztof Kozlowski
2025-02-10 19:15 ` Jason Gunthorpe
2025-02-10 19:27 ` Krzysztof Kozlowski
2025-02-10 20:20 ` Jason Gunthorpe
2025-02-12 16:00 ` Mike Rapoport
2025-02-07 0:29 ` [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO) Andrew Morton
2025-02-07 1:28 ` Pasha Tatashin
2025-02-08 1:38 ` Baoquan He
2025-02-08 8:41 ` Mike Rapoport
2025-02-08 11:13 ` Baoquan He
2025-02-09 0:23 ` Pasha Tatashin
2025-02-09 3:07 ` Baoquan He
2025-02-07 8:06 ` Mike Rapoport
2025-02-09 10:33 ` Krzysztof Kozlowski
2025-02-07 4:50 ` Andrew Morton
2025-02-07 8:01 ` Mike Rapoport
2025-02-08 23:39 ` Cong Wang
2025-02-09 0:13 ` Pasha Tatashin
2025-02-09 1:00 ` Cong Wang
2025-02-09 0:51 ` Cong Wang
2025-02-17 3:19 ` RuiRui Yang
2025-02-19 7:32 ` Mike Rapoport
2025-02-19 12:49 ` Dave Young
2025-02-19 13:54 ` Alexander Graf
2025-02-20 1:49 ` Dave Young
2025-02-20 16:43 ` Alexander Gordeev
2025-02-23 17:54 ` Mike Rapoport
2025-02-26 20:08 ` Pratyush Yadav
2025-02-28 20:20 ` Mike Rapoport
2025-02-28 23:04 ` Pratyush Yadav
2025-03-02 9:52 ` Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250210202220.GC3765641@nvidia.com \
--to=jgg@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=anthony.yznaga@oracle.com \
--cc=arnd@arndb.de \
--cc=ashish.kalra@amd.com \
--cc=benh@kernel.crashing.org \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=devicetree@vger.kernel.org \
--cc=dwmw2@infradead.org \
--cc=ebiederm@xmission.com \
--cc=graf@amazon.com \
--cc=hpa@zytor.com \
--cc=jgowans@amazon.com \
--cc=kexec@lists.infradead.org \
--cc=krzk@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=pasha.tatashin@soleen.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=ptyadav@amazon.de \
--cc=robh+dt@kernel.org \
--cc=robh@kernel.org \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
--cc=saravanak@google.com \
--cc=skinsburskii@linux.microsoft.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=usama.arif@bytedance.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox