From: Mike Rapoport <rppt@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Ashish Kalra <ashish.kalra@amd.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Eric Biederman <ebiederm@xmission.com>,
	Ingo Molnar <mingo@redhat.com>, James Gowans <jgowans@amazon.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Krzysztof Kozlowski <krzk@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Pratyush Yadav <ptyadav@amazon.de>,
	Rob Herring <robh+dt@kernel.org>, Rob Herring <robh@kernel.org>,
	Saravana Kannan <saravanak@google.com>,
	Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Usama Arif <usama.arif@bytedance.com>,
	Will Deacon <will@kernel.org>,
	devicetree@vger.kernel.org, kexec@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, x86@kernel.org
Subject: [PATCH v4 09/14] kexec: Add documentation for KHO
Date: Thu,  6 Feb 2025 15:27:49 +0200
Message-ID: <20250206132754.2596694-10-rppt@kernel.org>
In-Reply-To: <20250206132754.2596694-1-rppt@kernel.org>

From: Alexander Graf <graf@amazon.com>

With KHO in place, let's add documentation that describes what it is and
how to use it.

Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 Documentation/kho/concepts.rst   | 80 ++++++++++++++++++++++++++++++++
 Documentation/kho/index.rst      | 19 ++++++++
 Documentation/kho/usage.rst      | 60 ++++++++++++++++++++++++
 Documentation/subsystem-apis.rst |  1 +
 MAINTAINERS                      |  1 +
 5 files changed, 161 insertions(+)
 create mode 100644 Documentation/kho/concepts.rst
 create mode 100644 Documentation/kho/index.rst
 create mode 100644 Documentation/kho/usage.rst

diff --git a/Documentation/kho/concepts.rst b/Documentation/kho/concepts.rst
new file mode 100644
index 000000000000..232bddacc0ef
--- /dev/null
+++ b/Documentation/kho/concepts.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=======================
+Kexec Handover Concepts
+=======================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+It introduces multiple concepts:
+
+KHO Device Tree
+---------------
+
+Every KHO kexec carries a KHO specific flattened device tree blob that
+describes the state of the system. Device drivers can register with KHO to
+serialize their state before kexec. After a KHO kexec, device drivers can
+read the device tree and extract the previous state.
+
+KHO only uses the fdt container format and libfdt library, but does not
+adhere to the same property semantics that normal device trees do: Properties
+are passed in native endianness and standardized properties like ``reg`` and
+``ranges`` do not exist, hence there are no ``#...-cells`` properties.
+
+KHO introduces a new concept to its device tree: ``mem`` properties. A
+``mem`` property can be inside any subnode in the device tree. When present,
+it contains an array of physical memory ranges that the new kernel must mark
+as reserved on boot. It is recommended, but not required, to make these ranges
+as physically contiguous as possible to reduce the number of array elements ::
+
+    struct kho_mem {
+            __u64 addr;
+            __u64 len;
+    };
+
+After boot, drivers can call the KHO subsystem to transfer ownership of memory
+that was reserved via a ``mem`` property to themselves to continue using memory
+from the previous execution.
+
+The KHO device tree follows the in-Linux schema requirements. Any element in
+the device tree is documented via device tree schema YAML files that explain
+what data gets transferred.
+
+Scratch Regions
+---------------
+
+To boot into kexec, we need to have a physically contiguous memory range that
+contains no handed over memory. Kexec then places the target kernel and initrd
+into that region. The new kernel exclusively uses this region for memory
+allocations during boot, up to the initialization of the page allocator.
+
+We guarantee that we always have such regions through the scratch regions: On
+first boot KHO allocates several physically contiguous memory regions. Since
+after kexec these regions will be used by early memory allocations, there is a
+scratch region per NUMA node plus a scratch region to satisfy allocation
+requests that do not require a particular NUMA node assignment. By default,
+the size of the scratch regions is calculated based on the amount of memory
+allocated during boot. The ``kho_scratch`` kernel command line option may be
+used to set their size explicitly. The scratch regions are declared as CMA
+when the page allocator is initialized so that their memory can be used during
+system lifetime. CMA gives us the guarantee that no handover pages land in
+those regions, because handover pages must be at a static physical memory
+location and CMA enforces that only movable pages can be located inside.
+
+After KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
+instead reuse the exact same region that was originally allocated. This allows
+us to recursively execute any number of KHO kexecs. Because we used this region
+for boot memory allocations and as target memory for kexec blobs, some parts
+of that memory region may be reserved. These reservations are irrelevant for
+the next KHO kexec, because kexec can overwrite even the original kernel.
+
+KHO active phase
+----------------
+
+To enable a user space based kexec file loader, the kernel needs to be able
+to provide the device tree that describes the previous kernel's state before
+performing the actual kexec. The process of generating that device tree is
+called serialization. When the device tree is generated, some properties
+of the system may become immutable because they are already written down
+in the device tree. That state is called the KHO active phase.
diff --git a/Documentation/kho/index.rst b/Documentation/kho/index.rst
new file mode 100644
index 000000000000..5e7eeeca8520
--- /dev/null
+++ b/Documentation/kho/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+========================
+Kexec Handover Subsystem
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   concepts
+   usage
+
+.. only::  subproject and html
+
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/kho/usage.rst b/Documentation/kho/usage.rst
new file mode 100644
index 000000000000..e7300fbb309c
--- /dev/null
+++ b/Documentation/kho/usage.rst
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+====================
+Kexec Handover Usage
+====================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve state -
+arbitrary properties as well as memory locations - across kexec.
+
+This document expects that you are familiar with the base KHO concepts
+described in Documentation/kho/concepts.rst. If you have not read them yet,
+please do so now.
+
+Prerequisites
+-------------
+
+KHO is available when the ``CONFIG_KEXEC_HANDOVER`` config option is set to y
+at compile time. Each KHO producer may have its own config option, which
+you need to enable if you would like to preserve that producer's state
+across kexec.
+
+To use KHO, please boot the kernel with the ``kho=on`` command line
+parameter. You may use the ``kho_scratch`` parameter to define the size of
+the scratch regions. For example, ``kho_scratch=512M,512M`` will reserve
+512 MiB for a global scratch region and 512 MiB for each per NUMA node
+scratch region on boot.
+
+Perform a KHO kexec
+-------------------
+
+Before you can perform a KHO kexec, you need to move the system into the
+KHO active phase described in Documentation/kho/concepts.rst ::
+
+  # echo 1 > /sys/kernel/kho/active
+
+After this command, the KHO device tree is available in ``/sys/kernel/kho/dt``.
+
+Next, load the target payload and kexec into it. It is important that you
+use the ``-s`` parameter to use the in-kernel kexec file loader, as user
+space kexec tooling currently has no support for KHO with the user space
+based file loader ::
+
+  # kexec -l Image --initrd=initrd -s
+  # kexec -e
+
+The new kernel will boot up and contain some of the previous kernel's state.
+
+For example, if you used the ``reserve_mem`` command line parameter to
+create an early memory reservation, the new kernel will have that memory at
+the same physical address as the old kernel.
+
+Abort a KHO kexec
+-----------------
+
+You can move the system out of the KHO active phase again by calling ::
+
+  # echo 0 > /sys/kernel/kho/active
+
+After this command, the KHO device tree is no longer available in
+``/sys/kernel/kho/dt``.
diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst
index b52ad5b969d4..5fc69d6ff9f0 100644
--- a/Documentation/subsystem-apis.rst
+++ b/Documentation/subsystem-apis.rst
@@ -90,3 +90,4 @@ Other subsystems
    peci/index
    wmi/index
    tee/index
+   kho/index
diff --git a/MAINTAINERS b/MAINTAINERS
index e1e01b2a3727..82c2ef421c00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12828,6 +12828,7 @@ S:	Maintained
 W:	http://kernel.org/pub/linux/utils/kernel/kexec/
 F:	Documentation/ABI/testing/sysfs-firmware-kho
 F:	Documentation/ABI/testing/sysfs-kernel-kho
+F:	Documentation/kho/
 F:	include/linux/kexec.h
 F:	include/uapi/linux/kexec.h
 F:	kernel/kexec*
-- 
2.47.2
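
Illustration only, not part of the patch: concepts.rst above recommends
keeping the ranges in a ``mem`` property as physically contiguous as
possible to reduce the number of array elements. A minimal user-space
sketch of that idea follows. It assumes the struct kho_mem layout shown
in the documentation, with uint64_t standing in for __u64. The helper
names kho_mem_cmp() and kho_mem_coalesce() are hypothetical and are not
part of the KHO API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the struct in concepts.rst; uint64_t replaces __u64. */
struct kho_mem {
	uint64_t addr;
	uint64_t len;
};

/* qsort() comparator: order ranges by start address. */
static int kho_mem_cmp(const void *a, const void *b)
{
	const struct kho_mem *x = a;
	const struct kho_mem *y = b;

	if (x->addr < y->addr)
		return -1;
	if (x->addr > y->addr)
		return 1;
	return 0;
}

/*
 * Hypothetical helper: sort the array in place and merge ranges that
 * touch or overlap. Returns the number of entries that remain.
 * Assumes addr + len does not overflow 64 bits.
 */
size_t kho_mem_coalesce(struct kho_mem *r, size_t n)
{
	size_t out = 0;

	assert(r != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(r, n, sizeof(*r), kho_mem_cmp);

	for (size_t i = 1; i < n; i++) {
		uint64_t end = r[out].addr + r[out].len;

		if (r[i].addr <= end) {
			/* Adjacent or overlapping: extend the current entry. */
			uint64_t new_end = r[i].addr + r[i].len;

			if (new_end > end)
				r[out].len = new_end - r[out].addr;
		} else {
			/* Gap: start a new output entry. */
			r[++out] = r[i];
		}
	}

	return out + 1;
}
```

With this sketch, three adjacent 4 KiB ranges plus one distant range
collapse into two array elements, which is the kind of reduction the
recommendation is after.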


