From: Dave Hansen <dave.hansen@intel.com>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Baoquan He <bhe@redhat.com>,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
ebiederm@xmission.com, akpm@linux-foundation.org,
stanislav.kinsburskii@gmail.com, corbet@lwn.net,
linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
linux-mm@kvack.org, kys@microsoft.com, jgowans@amazon.com,
wei.liu@kernel.org, arnd@arndb.de, gregkh@linuxfoundation.org,
graf@amazon.de, pbonzini@redhat.com
Subject: Re: [RFC PATCH v2 0/7] Introduce persistent memory pool
Date: Thu, 28 Sep 2023 10:29:32 -0700
Message-ID: <b684d339-991d-be85-692c-75f21679ca69@intel.com>
In-Reply-To: <20230927232548.GA20221@skinsburskii.>

On 9/27/23 16:25, Stanislav Kinsburskii wrote:
> On Thu, Sep 28, 2023 at 06:22:54AM -0700, Dave Hansen wrote:
>> On 9/27/23 09:13, Stanislav Kinsburskii wrote:
>>> Once deposited, these pages can't be accessed by Linux anymore and thus
>>> must be preserved in "used" state across kexec, as hypervisor state is
>>> unaware of kexec.
>>
>> If Linux can't access them, they're not RAM any more. I'd much rather
>> remove them from the memory map and move on with life rather than
>> implement a bunch of new ABI that's got to be handed across kernels.
>
> Could you elaborate more on the new ABIs? FDT is handled by x86 already,
> and passing it over kexec looks like a natural extension.
> Also, adding more state to it doesn't look like a new ABI.
> Or does it?

FDT makes it easier to pass arbitrary data around, but you're still
creating a new "default_pmpool" device tree node on one end and
consuming it on the other. That's a new ABI in my book.

> Let me also comment on removing these regions from the memory map. The
> major peculiarity here is that the hypervisor distinguishes between the
> pages deposited for guests to run and the pages deposited for the Linux
> root partition to keep the guest-related portion of hypervisor state in
> the root partition. The latter is the matter in question.
>
> We can indeed isolate and deposit an excessive amount of memory upfront
> in the hope that the hypervisor never gets into a situation where it
> needs more memory.
> However, it's not reliable, as the amount of memory will always be an
> estimate, depending on the number of expected guests, guest-attached
> devices, etc. And this becomes an even bigger problem when most of the
> memory is already removed from the memory map to host guest partitions.
> It's also not efficient, as the amount of memory required by the
> hypervisor can grow or shrink depending on the use case or host
> configuration, and depositing an excessive amount of memory would be a
> waste.
>
> But, actually, the idea of removing the pages from memory map was
> reflected to some extent in the first version of this proposal,
> so let me elaborate on it a bit.
>
> Effectively, instead of reserving and depositing a lot of memory to
> the hypervisor upfront, the memory can be allocated from kernel memory
> when needed and then returned when unused.
> This would still require removing pages from the memory map upon kexec,
> but that's another problem.

Let's distill this down a bit.

I agree that it's a waste to reserve an obscene amount of memory up
front for all guests for rare cases. Having the amount of consumed
memory grow is a nice feature.

You can also quite easily *shrink* the amount of memory on a given
kernel without new code. Right?

The problem comes when you've grown the footprint of hypervisor-donated
memory, kexec, and *THEN* want to shrink it. That's what needs new
metadata to be communicated over to the new kernel.

 1. Boot some kernel
 2. Grow the deposited memory a bunch
 3. Kexec
 4. Shrink the deposited memory

Right?

That's where you lose me.

Can't the deposited memory just be shrunk before kexec? Surely there
aren't a bunch of pathological things consuming that memory right before
kexec, which is basically a reboot.
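
For illustration only (this is not from the patch series), the four
steps above can be modeled with a toy Python sketch. The Hypervisor and
Kernel classes, the page numbers, and the handover flag are all
hypothetical. The model assumes only that a kernel can withdraw just the
deposited pages it has a record of, and that the hypervisor releases
them on request, which a real hypervisor may not always do.

```python
# Toy model only: class and method names are hypothetical,
# not kernel or hypervisor APIs.

class Hypervisor:
    """Holds the set of pages deposited by the root partition."""

    def __init__(self):
        self.deposited = set()


class Kernel:
    """A kernel instance that can only withdraw pages it has a record of."""

    def __init__(self, hv, known=()):
        self.hv = hv
        self.known = set(known)

    def deposit(self, pages):
        """Grow: hand pages to the hypervisor and remember them."""
        pages = set(pages)
        self.known |= pages
        self.hv.deposited |= pages

    def withdraw_all(self):
        """Shrink: take back every deposited page this kernel knows about."""
        returned = self.known & self.hv.deposited
        self.hv.deposited -= returned
        self.known -= returned
        return returned

    def kexec(self, handover=False):
        """Start a new kernel; its records are empty unless handed over."""
        return Kernel(self.hv, self.known if handover else ())


def stranded_after(shrink_before_kexec, handover):
    """Return how many deposited pages the new kernel cannot reclaim."""
    hv = Hypervisor()
    k1 = Kernel(hv)                  # 1. boot some kernel
    k1.deposit(range(100, 108))      # 2. grow the deposited memory
    if shrink_before_kexec:
        k1.withdraw_all()            # alternative: shrink before kexec
    k2 = k1.kexec(handover)          # 3. kexec
    k2.withdraw_all()                # 4. shrink the deposited memory
    return len(hv.deposited)


if __name__ == "__main__":
    print(stranded_after(shrink_before_kexec=False, handover=False))  # 8
    print(stranded_after(shrink_before_kexec=False, handover=True))   # 0
    print(stranded_after(shrink_before_kexec=True, handover=False))   # 0
```

Under these assumptions, only the grow, kexec, then shrink ordering
without a handover leaves pages stranded. Shrinking before kexec gives
the same result as handing metadata over, which is the question raised
above.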