From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pratyush Yadav <pratyush@kernel.org>,
akpm@linux-foundation.org, brauner@kernel.org, corbet@lwn.net,
graf@amazon.com, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
masahiroy@kernel.org, ojeda@kernel.org, rdunlap@infradead.org,
rppt@kernel.org, tj@kernel.org, jasonmiu@google.com,
dmatlack@google.com, skhawaja@google.com, glider@google.com,
elver@google.com
Subject: Re: [PATCH 2/2] liveupdate: kho: allocate metadata directly from the buddy allocator
Date: Fri, 24 Oct 2025 09:57:24 -0400
Message-ID: <CA+CK2bA_Qb9csWvEQb-zpxgMg7vy+gw9eh0z88QBEdiFdtopMQ@mail.gmail.com>
In-Reply-To: <20251024132509.GB760669@ziepe.ca>

On Fri, Oct 24, 2025 at 9:25 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 15, 2025 at 10:19:08AM -0400, Pasha Tatashin wrote:
> > On Wed, Oct 15, 2025 at 9:05 AM Pratyush Yadav <pratyush@kernel.org> wrote:
> > >
> > > +Cc Marco, Alexander
> > >
> > > On Wed, Oct 15 2025, Pasha Tatashin wrote:
> > >
> > > > KHO allocates metadata for its preserved memory map using the SLUB
> > > > allocator via kzalloc(). This metadata is temporary and is used by the
> > > > next kernel during early boot to find preserved memory.
> > > >
> > > > A problem arises when KFENCE is enabled. kzalloc() calls can be
> > > > randomly intercepted by kfence_alloc(), which services the allocation
> > > > from a dedicated KFENCE memory pool. This pool is allocated early in
> > > > boot via memblock.
> > >
> > > At some point, we'd probably want to add support for preserving slab
> > > objects using KHO. That wouldn't work if the objects can land in scratch
> > > memory. Right now, the kfence pools are allocated right before KHO goes
> > > out of scratch-only and memblock frees pages to buddy.
> >
> > If we do that, most likely we will add a GFP flag that goes with it,
> > so the slab can use a special pool of pages that are preservable.
> > Otherwise, we are going to be leaking memory from the old kernel in
> > the unpreserved parts of the pages.
>
> That isn't an issue. If we make slab preservable then we'd have to
> preserve the page and then somehow record what order is stored in that
> page and a bit map of which parts are allocated to restore the slab
> state on recovery.
>
> So long as the non-preserved memory comes back as freed on the
> successor kernel it doesn't matter what was in it in the preceding
> kernel. The new kernel will eventually zero it. So it isn't a 'leak'.

Hi Jason,

I agree, it's not a "leak" in the traditional sense, as we trust the
successor kernel to manage its own memory.

However, my concern is that without a dedicated GFP flag, this
partial-page preservation model becomes fragile and inefficient, and
creates a data exposure risk.

You're right that the new kernel will eventually zero the memory, but
KHO preserves at page granularity. If we preserve a single slab object,
the entire page is handed off. When the new kernel maps that page
(e.g., to userspace) to access the preserved object, it also exposes
the unpreserved portions of that same page. Those portions contain
stale data from the old kernel and won't have been zeroed yet,
creating an easy-to-miss data leak vector. It makes the API very
error-prone.
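
To make the exposure concrete, here is a minimal userspace sketch, not
kernel code and not part of the patch. It assumes a 4096-byte page and
uses hypothetical helpers (stale_bytes(), scrub_unpreserved()) that do
not exist in KHO. It counts the old-kernel bytes that sit next to a
preserved object, and shows the scrub step every consumer would have to
remember before mapping the page anywhere:

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Hypothetical helper: count the non-zero bytes on a page that lie
 * outside the preserved object at [off, off + len).
 */
static size_t stale_bytes(const unsigned char *page, size_t off, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++) {
		if (i >= off && i < off + len)
			continue;	/* inside the preserved object */
		if (page[i])
			n++;
	}
	return n;
}

/*
 * Hypothetical helper: zero everything on the page except the
 * preserved object. The caller must ensure off + len fits in the page.
 */
static void scrub_unpreserved(unsigned char *page, size_t off, size_t len)
{
	memset(page, 0, off);
	memset(page + off + len, 0, SKETCH_PAGE_SIZE - off - len);
}
```

With a 64-byte object on a page that the old kernel had otherwise
filled, stale_bytes() reports 4032 bytes of leftover data until
scrub_unpreserved() runs.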

There's also the inefficiency. The unpreserved parts of that page are
unusable by the new kernel until the preserved object is freed.
Depending on the use case, that object might live for the entire
kernel lifetime, effectively wasting that memory. This waste could
then accumulate with each subsequent live update.
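
As a rough back-of-the-envelope sketch of that waste (again userspace C,
with an assumed 4096-byte page and a hypothetical helper name), the
memory pinned but unused is simply the page size minus the live object
bytes, multiplied by the number of partially preserved pages:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Hypothetical helper: bytes that stay pinned but unused when nr_pages
 * pages each hold objs_per_page preserved objects of obj_size bytes.
 */
static size_t pinned_unused_bytes(size_t nr_pages, size_t obj_size,
				  size_t objs_per_page)
{
	size_t live = obj_size * objs_per_page;

	if (live > SKETCH_PAGE_SIZE)
		live = SKETCH_PAGE_SIZE;	/* a page cannot hold more than this */

	return nr_pages * (SKETCH_PAGE_SIZE - live);
}
```

For example, 1000 pages that each keep a single 64-byte object alive
would pin 1000 * (4096 - 64) = 4,032,000 bytes that the new kernel
cannot reuse.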

Trying to create a special KHO slab cache isn't a solution either,
since slab caches are often merged.

As I see it, the only robust solution is to use a special GFP flag.
This would force these allocations to come from a dedicated pool of
pages that are fully preserved, with no partial/mixed-use pages, and
that are also retrieved as slabs.

That said, I'm not sure preserving individual slab objects is a high
priority right now. It might be simpler to avoid it altogether.

Pasha