From: Jason Gunthorpe <jgg@ziepe.ca>
To: Alexander Graf <graf@amazon.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
	Mike Rapoport <rppt@kernel.org>,
	David Rientjes <rientjes@google.com>,
	lsf-pc@lists.linux-foundation.org, "Gowans,
	James" <jgowans@amazon.com>,
	linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] memory persistence over kexec
Date: Tue, 28 Jan 2025 10:04:01 -0400
Message-ID: <20250128140401.GB1524382@ziepe.ca>
In-Reply-To: <c08f106e-f2d2-4a7e-a95c-e2a8a9c83ea0@amazon.com>

On Mon, Jan 27, 2025 at 08:12:37AM -0800, Alexander Graf wrote:

> I agree with the simplifications you're proposing; not using the purgatory
> would be a great property to have.
> 
> The reason why KHO doesn't do it yet is that I wanted to keep it simple from
> the other end. The big problem with going A/B is that if done the simple
> way, you only map B as MOVABLE while running in A. That means A could
> accidentally allocate persistent memory from A's memory region. When A then
> switches to B, B can no longer make all of A MOVABLE.

But you have this basic problem no matter what? kexec requires a
pretty big region of linear memory to boot a kernel into. Even with
purgatory and copying you still have to ensure a free linear
space that has no KHO pages in it.

This seems impossible to really guarantee unless you have a special
KHO allocator that happens to guarantee available linear memory, or
are doing tricks like we are discussing to use the normal allocator to
keep allocations out of some linear memory.
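
To make the constraint concrete, here is a rough userspace sketch. It
assumes a hypothetical per-page flag array marking KHO-preserved pages;
find_kexec_window() and kho_preserved are illustrative names, not
kernel APIs. It looks for a linear run of pages with no KHO page in it,
which is what kexec needs to boot the next kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: kho_preserved[pfn] is true when page frame pfn
 * holds memory that must survive kexec. Nothing here is a kernel API.
 *
 * Return the first pfn of a contiguous run of `need` pages that contains
 * no preserved page, or -1 if no such run exists.
 */
long find_kexec_window(const bool *kho_preserved, size_t npages, size_t need)
{
    size_t run = 0; /* length of the current run of non-preserved pages */

    assert(kho_preserved != NULL);

    if (need == 0 || need > npages)
        return -1;

    for (size_t pfn = 0; pfn < npages; pfn++) {
        /* A preserved page breaks the run; any other page extends it. */
        run = kho_preserved[pfn] ? 0 : run + 1;
        if (run == need)
            return (long)(pfn - need + 1);
    }
    return -1;
}
```

A single preserved page in the wrong place is enough to make the search
fail, which is why something has to keep KHO pages out of at least one
large linear region.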

> So we need to ensure that *both* regions are MOVABLE, and the system is
> always fully aware of both.

I imagined the kernel would boot with only the A or B area of memory
available during early boot, and then in later boot phases it would
set up the additional memory that has a mix of KHO and free pages.
This feels easier to do once the allocators are all fully started up,
i.e. you can deal with KHO pages by just allocating them. [*]

IOW each A/B area should be large enough to complete a lot of boot and
would end up naturally containing GFP_KERNEL allocations during this
process as it is the only memory available.

If you have a special KHO allocator (GFP_KHO?) then it can simply be
aware of this and avoid allocating from the A/B zone.

However, it would be much nicer to avoid having to mark possible KHO
allocations in code at the allocation point. This would be nicer:
  p = alloc_pages(GFP_KERNEL)
  // time passes
  to_kho(p)

So I agree there is an appeal to somehow using the existing allocators
to stop taking unmovable pages from the A/B region after some point so
that no to_kho() will ever get a page that is in A/B.
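
A toy userspace model of that idea, purely as a sketch. The struct,
the ab_sealed flag and the function names are all hypothetical. Early
boot may take unmovable pages from the A/B region; once the region is
"sealed", unmovable allocations skip it, so any page allocated after
that point is safe to hand to to_kho():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

/* Hypothetical model of physical memory with one boot (A/B) region. */
struct mem_model {
    size_t ab_start, ab_end; /* [ab_start, ab_end) is the A/B region */
    bool ab_sealed;          /* true: unmovable allocations must avoid A/B */
    bool used[NPAGES];       /* true: page is allocated */
};

static bool in_ab(const struct mem_model *m, size_t pfn)
{
    return pfn >= m->ab_start && pfn < m->ab_end;
}

/* Allocate one unmovable page; returns its pfn, or -1 if none is free. */
long alloc_unmovable(struct mem_model *m)
{
    for (size_t pfn = 0; pfn < NPAGES; pfn++) {
        if (m->used[pfn])
            continue;
        /* After sealing, never place unmovable memory in A/B. */
        if (m->ab_sealed && in_ab(m, pfn))
            continue;
        m->used[pfn] = true;
        return (long)pfn;
    }
    return -1;
}

/* Stand-in for the to_kho() check: only allocated pages outside A/B qualify. */
bool to_kho_ok(const struct mem_model *m, long pfn)
{
    assert(pfn >= 0 && (size_t)pfn < NPAGES);
    return m->used[pfn] && !in_ab(m, (size_t)pfn);
}
```

In this model a page allocated before sealing can still land in A/B and
be rejected later, which is the case the real design would have to rule
out or handle.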

Can you take a ZONE_NORMAL, use it for booting, and then switch it to
ZONE_MOVABLE, keeping all the unmovable memory? Something else?

* - For drivers I'm imagining that we can do:
     p = alloc_pages(GFP_KERNEL|GFP_KHO|__GFP_COMP, order);
     to_kho(p);
     // kexec
     from_kho(p);
     folio_put(p);
    Meaning KHO has to preserve the folio, keep the KVA the same,
    manage the refcount, and restore the __GFP_COMP.

    I think if you have this as the basic primitive you can build
    everything else on top of it.
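
    As a rough userspace sketch of what that primitive has to carry
    across kexec (every type and function below is a hypothetical
    model, not the KHO interface): the record keeps the pfn, which
    stands in for the KVA on the assumption that the direct map is
    unchanged, plus the order and the compound state, and from_kho()
    hands back a folio holding one reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KHO_MAX 8

/* Hypothetical stand-in for a folio. */
struct folio_model {
    unsigned long pfn;  /* fixes the KVA if the direct map is unchanged */
    unsigned int order;
    int refcount;
    bool compound;      /* models an allocation made with __GFP_COMP */
};

/* What survives kexec for each preserved folio in this model. */
struct kho_record {
    unsigned long pfn;
    unsigned int order;
    bool compound;
};

struct kho_table {
    struct kho_record rec[KHO_MAX];
    size_t n;
};

/* Record a folio for preservation; false if the table is full. */
bool to_kho(struct kho_table *t, const struct folio_model *f)
{
    if (t->n == KHO_MAX)
        return false;
    t->rec[t->n++] = (struct kho_record){ f->pfn, f->order, f->compound };
    return true;
}

/* After kexec: rebuild the folio at the same pfn with one reference. */
bool from_kho(const struct kho_table *t, unsigned long pfn,
              struct folio_model *out)
{
    for (size_t i = 0; i < t->n; i++) {
        if (t->rec[i].pfn != pfn)
            continue;
        *out = (struct folio_model){
            .pfn = pfn,
            .order = t->rec[i].order,
            .refcount = 1,
            .compound = t->rec[i].compound,
        };
        return true;
    }
    return false;
}

/* Drop one reference; returns true when the folio would be freed. */
bool folio_put_model(struct folio_model *f)
{
    assert(f->refcount > 0);
    return --f->refcount == 0;
}
```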

Jason

