linux-mm.kvack.org archive mirror
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>,
	linux-kernel@vger.kernel.org,  Alexander Graf <graf@amazon.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Andy Lutomirski <luto@kernel.org>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	 Arnd Bergmann <arnd@arndb.de>,
	Ashish Kalra <ashish.kalra@amd.com>,
	 Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Borislav Petkov <bp@alien8.de>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 David Woodhouse <dwmw2@infradead.org>,
	Eric Biederman <ebiederm@xmission.com>,
	 Ingo Molnar <mingo@redhat.com>,
	James Gowans <jgowans@amazon.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Krzysztof Kozlowski <krzk@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Pratyush Yadav <ptyadav@amazon.de>,
	 Rob Herring <robh+dt@kernel.org>, Rob Herring <robh@kernel.org>,
	 Saravana Kannan <saravanak@google.com>,
	 Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	 Usama Arif <usama.arif@bytedance.com>,
	Will Deacon <will@kernel.org>,
	devicetree@vger.kernel.org,  kexec@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO)
Date: Sat, 8 Feb 2025 19:13:40 -0500
Message-ID: <CA+CK2bBrO+khpX+U3F+d8wCb3GutVD=3HtU-94gHQJSoenQcKw@mail.gmail.com>
In-Reply-To: <CAM_iQpU9DDg2Oi33_dfPqVpd9j_2O+WD7ovo__f48BA9DztwXQ@mail.gmail.com>

On Sat, Feb 8, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi Mike,
>
> On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Hi,
> >
> > This is the next version of Alex's "kexec: Allow preservation of ftrace
> > buffers" series
> > (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com).
> > To keep things simple, we decided to preserve "reserve_mem" regions
> > instead of ftrace buffers.
> >
> > The patches are also available in git:
> > https://git.kernel.org/rppt/h/kho/v4
> >
> >
> > Kexec today considers itself purely a boot loader: When we enter the new
> > kernel, any state the previous kernel left behind is irrelevant and the
> > new kernel reinitializes the system.
> >
> > However, there are use cases where this mode of operation is not what we
> > actually want. In virtualization hosts for example, we want to use kexec
> > to update the host kernel while virtual machine memory stays untouched.
> > When we add device assignment to the mix, we also need to ensure that
> > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> > need to do the same for the PCI subsystem. If we want to kexec while an
> > SEV-SNP enabled virtual machine is running, we need to preserve the VM
> > context pages and physical memory. See "pkernfs: Persisting guest memory
> > and kernel/device state safely across kexec" Linux Plumbers
> > Conference 2023 presentation for details:
> >
> >   https://lpc.events/event/17/contributions/1485/
> >
> > To start us on the journey to support all the use cases above, this patch
> > implements basic infrastructure to allow hand over of kernel state across
> > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> > memblock's reserve_mem.
> > With this patch set applied, memory that was reserved using "reserve_mem"
> > command line options remains intact after kexec and it is guaranteed to
> > reside at the same physical address.
>
> Nice work!
>
> One concern there is that using memblock to reserve memory as crashkernel=
> is not flexible. I worked on kdump years ago and one of the biggest pains
> of kdump is how much memory should be reserved with crashkernel=. And
> it is still a pain today.
>
> If we reserve more, that would mean more waste for the 1st kernel. If we
> reserve less, that would induce more OOM for the 2nd kernel.
>
> I'd suggest considering CMA, where the "reserved" memory can still be
> reused for other purposes and its pages are migrated out of the
> reserved region on demand, that is, when loading a kexec kernel. Of course,
> we need to make sure they are not reused by what you want to preserve here,
> e.g., IOMMU. So you might need additional work to make it work, but still I
> believe this is the right direction.

This is exactly what scratch memory is used for. Unlike crashkernel=,
the entire scratch area is available to user applications as CMA, as
we know that no kernel-reserved memory will come from that area. This
doesn't work for crashkernel=, because in some cases, the user pages
might also need to be preserved in the crash dump. However, if user
pages are going to be discarded from the crash dump (as is done 99% of
the time), then it is better to back crashkernel= with CMA or
ZONE_MOVABLE as well, so that only the memory actually occupied by the
crash kernel is set aside and none is wasted. We have an internal
patch at Google that does this, and I think it would be a good
improvement for the upstream kernel to carry as well.
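
To make the idea concrete, here is a toy model (plain Python, not kernel
code) of a region that only accepts movable user pages and can therefore
be emptied on demand. The names Region, alloc and evacuate are made up
for illustration and do not correspond to any kernel API; the model only
assumes what is stated above, namely that no kernel-reserved memory is
ever placed in the area.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy stand-in for a scratch area exposed as CMA (hypothetical model)."""
    size_pages: int
    used: list = field(default_factory=list)  # kinds of pages placed here

    def alloc(self, kind):
        """Place one page of `kind` ("user" or "kernel"); True on success."""
        if len(self.used) >= self.size_pages:
            return False  # region is full
        if kind != "user":
            return False  # unmovable kernel memory must never land here
        self.used.append(kind)
        return True

    def evacuate(self):
        """Migrate every page out of the region; return how many moved."""
        moved = len(self.used)
        self.used.clear()
        return moved


scratch = Region(size_pages=4)
user_ok = scratch.alloc("user")      # user pages may borrow the area
kernel_ok = scratch.alloc("kernel")  # kernel allocations are refused
moved = scratch.evacuate()           # the area can always be emptied
```

Because only movable pages are ever admitted, the region can always be
handed back in full, which is what lets it be lent out instead of
sitting idle.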

Pasha

>
> Just my two cents.
>
> Thanks!

