From: Cong Wang <xiyou.wangcong@gmail.com>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Mike Rapoport <rppt@kernel.org>,
	linux-kernel@vger.kernel.org,  Alexander Graf <graf@amazon.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Andy Lutomirski <luto@kernel.org>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	 Arnd Bergmann <arnd@arndb.de>,
	Ashish Kalra <ashish.kalra@amd.com>,
	 Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Borislav Petkov <bp@alien8.de>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 David Woodhouse <dwmw2@infradead.org>,
	Eric Biederman <ebiederm@xmission.com>,
	 Ingo Molnar <mingo@redhat.com>,
	James Gowans <jgowans@amazon.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Krzysztof Kozlowski <krzk@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Pratyush Yadav <ptyadav@amazon.de>,
	 Rob Herring <robh+dt@kernel.org>, Rob Herring <robh@kernel.org>,
	 Saravana Kannan <saravanak@google.com>,
	 Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	 Usama Arif <usama.arif@bytedance.com>,
	Will Deacon <will@kernel.org>,
	devicetree@vger.kernel.org,  kexec@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO)
Date: Sat, 8 Feb 2025 17:00:15 -0800
Message-ID: <CAM_iQpW4--H6wqcx-=O5_PhEOkdrZN52qUhRRZO9xwpMxxLPaw@mail.gmail.com>
In-Reply-To: <CA+CK2bBrO+khpX+U3F+d8wCb3GutVD=3HtU-94gHQJSoenQcKw@mail.gmail.com>

On Sat, Feb 8, 2025 at 4:14 PM Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
>
> On Sat, Feb 8, 2025 at 6:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > Hi Mike,
> >
> > On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > >
> > > Hi,
> > >
> > > > This is the next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com).
> > > > To make things simpler, we decided to preserve "reserve_mem" regions
> > > > instead of ftrace buffers.
> > >
> > > The patches are also available in git:
> > > https://git.kernel.org/rppt/h/kho/v4
> > >
> > >
> > > Kexec today considers itself purely a boot loader: When we enter the new
> > > kernel, any state the previous kernel left behind is irrelevant and the
> > > new kernel reinitializes the system.
> > >
> > > However, there are use cases where this mode of operation is not what we
> > > actually want. In virtualization hosts for example, we want to use kexec
> > > to update the host kernel while virtual machine memory stays untouched.
> > > When we add device assignment to the mix, we also need to ensure that
> > > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> > > need to do the same for the PCI subsystem. If we want to kexec while an
> > > SEV-SNP enabled virtual machine is running, we need to preserve the VM
> > > context pages and physical memory. See "pkernfs: Persisting guest memory
> > > and kernel/device state safely across kexec" Linux Plumbers
> > > Conference 2023 presentation for details:
> > >
> > >   https://lpc.events/event/17/contributions/1485/
> > >
> > > To start us on the journey to support all the use cases above, this patch
> > > implements basic infrastructure to allow hand over of kernel state across
> > > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> > > memblock's reserve_mem.
> > > With this patch set applied, memory that was reserved using "reserve_mem"
> > > command line options remains intact after kexec and it is guaranteed to
> > > reside at the same physical address.
> >
> > Nice work!
> >
> > One concern is that reserving memory with memblock, as crashkernel= does,
> > is not flexible. I worked on kdump years ago, and one of its biggest pains
> > is deciding how much memory to reserve with crashkernel=. It is still a
> > pain today.
> >
> > If we reserve more, more memory is wasted for the 1st kernel. If we
> > reserve less, the 2nd kernel is more likely to hit OOM.
> >
> > I'd suggest considering CMA, where the "reserved" memory can still be
> > reused for other purposes, and pages are migrated out of the reserved
> > region on demand, that is, when loading a kexec kernel. Of course, we
> > need to make sure the region is not reused by what you want to preserve
> > here, e.g., IOMMU. So you might need additional work to make it work, but
> > I still believe this is the right direction.
>
> This is exactly what scratch memory is used for. Unlike crashkernel=,
> the entire scratch area is available to user applications as CMA, as
> we know that no kernel-reserved memory will come from that area. This
> doesn't work for crashkernel=, because in some cases, the user pages
> might also need to be preserved in the crash dump. However, if user
> pages are going to be discarded from the crash dump (as is done 99% of
> the time), then it is better to also make it use CMA or ZONE_MOVABLE,
> use only the memory occupied by the crash kernel, and not waste
> any memory at all. We have an internal patch at Google that does this,
> and I think it would be a good improvement for the upstream kernel to
> carry as well.

Good to know CMA is already used; I could not tell from the cover letter.

User-space pages need to be preserved in scenarios like RDMA, which pins
user-space pages for DMA transfers. Since the goal here is also to preserve
hardware state such as RDMA's, I guess the same concern remains.

Thanks!


