linux-mm.kvack.org archive mirror
From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
To: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	ebiederm@xmission.com, akpm@linux-foundation.org,
	stanislav.kinsburskii@gmail.com, corbet@lwn.net,
	linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
	linux-mm@kvack.org, kys@microsoft.com, jgowans@amazon.com,
	wei.liu@kernel.org, arnd@arndb.de, gregkh@linuxfoundation.org,
	graf@amazon.de, pbonzini@redhat.com, "Shutemov,
	Kirill" <kirill.shutemov@intel.com>
Subject: Re: [RFC PATCH v2 0/7] Introduce persistent memory pool
Date: Thu, 28 Sep 2023 00:18:01 -0700
Message-ID: <20230928071801.GA20527@skinsburskii.>
In-Reply-To: <ZRYStfGZ0/FrRh8Z@MiWiFi-R3L-srv>

On Fri, Sep 29, 2023 at 07:56:37AM +0800, Baoquan He wrote:
> On 09/27/23 at 07:46pm, Stanislav Kinsburskii wrote:
> > On Thu, Sep 28, 2023 at 12:16:31PM -0700, Dave Hansen wrote:
> > > On 9/27/23 17:38, Stanislav Kinsburskii wrote:
> > > > On Thu, Sep 28, 2023 at 11:00:12AM -0700, Dave Hansen wrote:
> > > >> On 9/27/23 17:02, Stanislav Kinsburskii wrote:
> > > >>> On Thu, Sep 28, 2023 at 10:29:32AM -0700, Dave Hansen wrote:
> > > >> ...
> > > >>> Well, not exactly. That's something I'd like to have indeed, but from my
> > > >>> POV this goal is out of scope of discussion at the moment.
> > > >>> Let me try to express it the same way you did above:
> > > >>>
> > > >>> 1. Boot some kernel
> > > >>> 2. Grow the deposited memory a bunch
> > > > >>> 3. Kexec
> > > >>> 4. Kernel panic due to GPF upon accessing the memory deposited to
> > > >>> hypervisor.
> > > >>
> > > >> I basically consider this a bug in the first kernel.  It *can't* kexec
> > > >> when it's left RAM in shambles.  It doesn't know what features the new
> > > >> kernel has and whether this is even safe.
> > > >>
> > > > 
> > > > Could you elaborate more on why this is a bug in the first kernel?
> > > > > Say, kernel memory can be allocated in big physically consecutive
> > > > > chunks by the first kernel for depositing. The information about these
> > > > > chunks is then passed to the second kernel via FDT or even command
> > > > > line, so the second kernel can reserve this region during booting.
> > > > What's wrong with this approach?
> > > 
> > > How do you know the second kernel can parse the FDT entry or the
> > > command-line you pass to it?
> > > 
> > > >> Can the new kernel even read the new device tree data?
> > > > 
> > > > I'm not sure I understand the question, to be honest.
> > > > > Why can't it? This series contains code parts for both first and second
> > > > > kernels.
> > > 
> > > How do you know the second kernel isn't the version *before* this series
> > > gets merged?
> > > 
> > 
> > The answer to both questions above is the following: the feature is deployed
> > fleet-wide first, and enabled only upon the next deployment.
> > It's worth mentioning that fleet-wide deployments usually don't need to support
> > updates to a version older than the previous one.
> > Also, since kexec is initiated by user space, it can always be
> > enlightened about kernel capabilities and simply not kexec to an
> > incompatible kernel version.
> > One more bit to mention: in real life this problem exists only
> > during the initial transition, as once the upgrade to a kernel with the
> > feature has happened, there won't be a revert to a version without it.
> > 
> > > ...
> > > >> I still think the only way this will possibly work when kexec'ing both
> > > >> old and new kernels is to do it with the memory maps that *all* kernels
> > > >> can read.
> > > > 
> > > > Could you elaborate more on this?
> > > > > The available memory map actually stays the same for both kernels. The
> > > > difference here can be in a different list of memory regions to reserve,
> > > > when the first kernel allocated and deposited another chunk, and thus
> > > > the second kernel needs to reserve this memory as a new region upon
> > > > booting.
> > > 
> > > Please take a step back from your implementation for a moment.  There
> > > are two basic design points that need to be considered.
> > > 
> > > First, *must* "System RAM" (according to the memory map) be persisted
> > > across kexec?  If no, then there's no problem to solve and we can stop
> > > this thread.  If yes, then some mechanism must be used to tell the new
> > > kernel that the "System RAM" in the memory map is not normal RAM.
> > > 
> > > Second, *if* we agree that some data must communicate across kexec, then
> > > what mechanism should be used?  You're arguing for a new mechanism that
> > > only new kernels can use.  I'm arguing that you should likely reuse an
> > > existing mechanism (probably the UEFI/e820 maps) so that *ALL* kernels
> > > can consume the information, old and new.
> > > 
> > 
> > I'd answer yes, "System RAM" must be persisted across kexec.
> > Could you elaborate on why there should be a mechanism to tell the
> > kernel anything special about the existing "System RAM" in this context?
> > Say, one can reserve a CMA region (or a crash kernel region, etc), store
> > some data there, and then pass it across kexec. The reserved CMA region will
> > still be a part of the "System RAM", won't it?
> 
> Well, I haven't gone through the whole discussion thread and clearly got
> your intention and motivation. But here I have to say there's a
> misunderstanding. At least I was astonished when I heard the above
> description. Who said a CMA region or a crash kernel region needs to be
> passed across kexec? Think of kexec as a bootloader; in essence it's no
> different than any other bootloader. When it jumps to the 2nd kernel, the
> whole system will be booted up and reconstructed on the system resources.
> The only difference with kexec is that it won't go through firmware to do
> that detecting/testing/init. If the intention is to preserve any state or
> region from the 1st kernel, you absolutely got it wrong.
> 
> This is not the first time people want to put a burden on kexec because
> of a specific scenario, and this is not the 2nd time, and not the 3rd time
> in the recent 2 years. But I would say please think about what a kexec
> reboot is, what we expect it to do, and whether the problem can be fixed on
> its own side.

Frankly, I'm confused, as I don't really understand what exactly you are
arguing with... Maybe I triggered some pain point, but I don't think you
are reacting to what I actually said.
I never said that either CMA or crash kernel needs to be passed across
kexec: I said they may be (and actually are) passed in real-world
scenarios. Also, it's not just CMA, but pmem backed by RAM as well.
What am I missing here?

And to me it looks like I do think about kexec as a bootloader, just
like you mentioned, as the proposal in this series is to construct a
device tree exactly the same way as it's constructed by (for example)
uboot for both x86 and arm64.
So, if we think about kexec as a bootloader, why can uboot pass a
resource to the new kernel, while the previous kernel can't do the same,
and why may it be considered an additional burden?
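
To make the hand-off being discussed more concrete, here is a minimal
user-space model of it. This is only a sketch and not code from the series:
the "start:size" hex string format and every function name in it are
hypothetical, and the real transport would be an FDT node or a command-line
parameter parsed by the kernel early during boot.

```python
# Illustrative user-space model only; not code from this series.
# The "start:size" hex format and all names below are hypothetical.
from typing import List, Tuple

Region = Tuple[int, int]  # (physical start address, size in bytes)


def encode_regions(regions: List[Region]) -> str:
    """First kernel: serialize the deposited regions for the next kernel."""
    return ",".join(f"{start:#x}:{size:#x}" for start, size in regions)


def decode_regions(arg: str) -> List[Region]:
    """Second kernel: parse the same string back into (start, size) pairs."""
    regions = []
    for item in filter(None, arg.split(",")):
        start, size = item.split(":")
        regions.append((int(start, 16), int(size, 16)))
    return regions


def usable_after_reserve(ram: Region, reserved: List[Region]) -> List[Region]:
    """Carve the reserved regions out of a single RAM range."""
    start, size = ram
    end = start + size
    usable = []
    cursor = start
    for r_start, r_size in sorted(reserved):
        r_end = r_start + r_size
        if r_end <= start or r_start >= end:
            continue  # region lies entirely outside this RAM range
        if r_start > cursor:
            usable.append((cursor, r_start - cursor))
        cursor = max(cursor, r_end)
    if cursor < end:
        usable.append((cursor, end - cursor))
    return usable
```

The memory map itself is the same on both sides in this model; only the list
of regions carved out of it changes. A kernel that doesn't know how to decode
the string would treat the whole range as usable.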

Thanks,
Stanislav

