From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>, linux-mm@kvack.org
Subject: Re: A plan for supporting PageMovable in 2025
Date: Tue, 18 Mar 2025 11:12:24 +0100
Message-ID: <6bb178de-67d1-41fd-a033-72fb52ad0905@redhat.com>
In-Reply-To: <Z9hVE-SzM-IuS-6B@casper.infradead.org>
On 17.03.25 18:00, Matthew Wilcox wrote:
> With the upcoming shrink of struct page to 4 words, we need a plan for
> handling PageMovable. Ideally this does not involve memory allocation,
> and is a relatively simple change from what we have now. To shrink
> struct page beyond 4 words, we'll need a better plan, but I think this
> will do for the next few months.
Right, I've been focusing on grasping what we need in the long run with
frozen pages that don't even want any memdesc (PageOffline).
>
> The current proposed layout for struct page is:
>
> struct page {
> 	unsigned long flags;
> 	union {
> 		struct list_head buddy_list;
> 		struct list_head pcp_list;
> 		struct {
> 			unsigned long memdesc;
> 			union {
> 				unsigned long private;
> 				atomic_t _mapcount;
> 			};
> 		};
> 	};
> 	int _refcount;
> };
>
> My proposal for movable non-folio pages is:
>
> * memdesc is used to point to struct movable_operations (these will
> need to be aligned to 16 bytes, but I think that's fine)
Note that we don't want to allocate a memdesc for PageOffline pages in
the long run. For balloon compaction it might be fine as a first step.
How'd we handle PAGE_MAPPING_MOVABLE? See below on my idea to avoid what
you describe here.
> * private is used to point to the next page in the list
> * These pages are refcounted
> * We retain a "lock" bit in page->flags
Note that there is also PG_isolated, which I am hoping we can get rid of.
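To make the 16-byte alignment requirement from the proposal concrete: if memdesc is treated as a tagged pointer whose low 4 bits carry a type (an assumption here), then packing and unpacking could look roughly like the userspace sketch below. Everything except the name struct movable_operations is hypothetical, including the type value, the helper names, and the placeholder callbacks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low 4 bits of memdesc hold a type tag. */
#define MEMDESC_TYPE_BITS 4
#define MEMDESC_TYPE_MASK ((uintptr_t)((1u << MEMDESC_TYPE_BITS) - 1))

/* Hypothetical type value for movable non-folio pages. */
enum memdesc_type { MEMDESC_TYPE_MOVABLE = 5 };

/*
 * Userspace stand-in for the kernel structure. The 16-byte alignment
 * (GCC/Clang attribute) keeps the low 4 pointer bits zero.
 */
struct movable_operations {
	int (*isolate)(unsigned long pfn);
	void (*putback)(unsigned long pfn);
} __attribute__((aligned(16)));

/* Pack an ops pointer and a type tag into one word. */
static inline uintptr_t memdesc_encode(const struct movable_operations *mops,
				       enum memdesc_type type)
{
	uintptr_t ptr = (uintptr_t)mops;

	/* An unaligned pointer would collide with the tag bits. */
	assert((ptr & MEMDESC_TYPE_MASK) == 0);
	return ptr | (uintptr_t)type;
}

/* Extract the type tag from the low bits. */
static inline enum memdesc_type memdesc_get_type(uintptr_t memdesc)
{
	return (enum memdesc_type)(memdesc & MEMDESC_TYPE_MASK);
}

/* Mask off the tag to recover the ops pointer. */
static inline const struct movable_operations *memdesc_mops(uintptr_t memdesc)
{
	return (const struct movable_operations *)(memdesc & ~MEMDESC_TYPE_MASK);
}
```

This also shows why PageOffline is awkward for this scheme: the word is fully consumed by the pointer and tag, so a page that should have no memdesc at all has nothing to point at.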
My current bigger idea is something like this:
1) memdesc type (currently folio type) identifies "struct
movable_operations". We could think of a registration model for
migration handlers.
PG_offline -> call into balloon compaction
Calling the ->isolate callback will fail if the callback is not
responsible for migrating the page, or if somebody else already isolated it.
Ideally, we'd have two bits (per memdesc) to essentially indicate "this
is movable" and "this is isolated".
Not 100% sure if the latter is required. If already isolated, simply
calling the ->isolate callback will fail. I think most of the existing
PG_isolated users are irrelevant, but it's all complicated.
So a single per-memdesc bit + memdesc type might be sufficient to lookup
the
2) No dependency on the refcount: ->isolate / ->putback effectively move
the ownership ("reference") from the real owner to migration code (so
they can be frozen). We just have to make sure that, while a page is
isolated, it cannot be freed by the real owner (which is already
the case IIRC).
3) No lists: we simply use an array of PFNs in migration code?
4) Lock bit: not 100% sure yet, but likely not required if ->isolate /
->migrate / ->putback just handle this locking internally.
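Points 1) to 3) can be sketched together in a small userspace model: handlers are registered per memdesc type, ->isolate refuses pages it is not responsible for or that are already isolated, and migration code collects the isolated pages in a plain PFN array instead of a list. This is only an illustration under stated assumptions; the toy page array, the type names, the batch size, and all helper names are hypothetical, and there is no locking or refcounting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memdesc types; only the offline one gets a handler here. */
enum memdesc_type { MEMDESC_FOLIO, MEMDESC_OFFLINE, MEMDESC_NR_TYPES };

/* Toy per-page state: the memdesc type plus an "isolated" bit. */
struct toy_page {
	enum memdesc_type type;
	bool isolated;
};

#define NR_TOY_PAGES 8
static struct toy_page toy_pages[NR_TOY_PAGES];

struct movable_operations {
	int (*isolate)(unsigned long pfn);
	void (*putback)(unsigned long pfn);
};

/* Registration model: one handler per memdesc type. */
static const struct movable_operations *handlers[MEMDESC_NR_TYPES];

static int register_movable_ops(enum memdesc_type type,
				const struct movable_operations *ops)
{
	if ((unsigned int)type >= MEMDESC_NR_TYPES || !ops)
		return -EINVAL;
	if (handlers[type])
		return -EBUSY;
	handlers[type] = ops;
	return 0;
}

/* Balloon-style handler: fails if not responsible or already isolated. */
static int balloon_isolate(unsigned long pfn)
{
	if (pfn >= NR_TOY_PAGES || toy_pages[pfn].type != MEMDESC_OFFLINE)
		return -EINVAL;
	if (toy_pages[pfn].isolated)
		return -EBUSY;
	/* Ownership moves from the balloon to the migration code. */
	toy_pages[pfn].isolated = true;
	return 0;
}

/* Hand ownership back to the balloon. */
static void balloon_putback(unsigned long pfn)
{
	toy_pages[pfn].isolated = false;
}

static const struct movable_operations balloon_mops = {
	.isolate = balloon_isolate,
	.putback = balloon_putback,
};

/* No lists: migration code tracks isolated pages as an array of PFNs. */
#define MIGRATE_BATCH 4
struct pfn_batch {
	unsigned long pfns[MIGRATE_BATCH];
	size_t nr;
};

static size_t isolate_range(struct pfn_batch *batch, unsigned long start,
			    unsigned long end)
{
	batch->nr = 0;
	for (unsigned long pfn = start;
	     pfn < end && pfn < NR_TOY_PAGES && batch->nr < MIGRATE_BATCH;
	     pfn++) {
		const struct movable_operations *ops =
			handlers[toy_pages[pfn].type];

		if (ops && ops->isolate(pfn) == 0)
			batch->pfns[batch->nr++] = pfn;
	}
	return batch->nr;
}

static void putback_batch(struct pfn_batch *batch)
{
	for (size_t i = 0; i < batch->nr; i++) {
		unsigned long pfn = batch->pfns[i];

		handlers[toy_pages[pfn].type]->putback(pfn);
	}
	batch->nr = 0;
}
```

In this model the "isolated" bit is owned entirely by the handler, which is what would make a generic PG_isolated flag and a generic lock bit unnecessary.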
Lists are a problem for ballooning drivers with PageOffline pages. I had
the exact same thought as you regarding "private is used to point to the
next page in the list", but discarded it because it's inefficient for
ballooning purposes and not future proof.
So instead, my plan is to use an xarray in the ballooning drivers to
store the PFNs of inflated pages.
The only nasty thing is that "insert page in the balloon" can fail on
OOM (when inserting into the xarray). In general, that's just fine, except in
some Xen / Hyper-V code where PageOffline pages are not allocated from
the buddy (where we could put them back), but "come to life" with
memory that gets added.
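The failure mode can be illustrated with a userspace model. The growable PFN array below is only a stand-in for the xarray (in-kernel code would use the real xarray API), and the failure-injection flag, the counter, and all function names are hypothetical. It shows the easy case, where a page that came from the buddy is simply handed back, and the awkward case, where there is nowhere to return it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the xarray: a growable array of inflated PFNs. */
struct pfn_set {
	unsigned long *pfns;
	size_t nr;
	size_t cap;
};

/* Hypothetical hook to simulate an allocation failure (OOM). */
static bool inject_alloc_failure;

static int pfn_set_insert(struct pfn_set *set, unsigned long pfn)
{
	if (set->nr == set->cap) {
		size_t new_cap = set->cap ? set->cap * 2 : 4;
		unsigned long *p;

		if (inject_alloc_failure)
			return -ENOMEM;
		p = realloc(set->pfns, new_cap * sizeof(*p));
		if (!p)
			return -ENOMEM;
		set->pfns = p;
		set->cap = new_cap;
	}
	set->pfns[set->nr++] = pfn;
	return 0;
}

/* Stands in for freeing a page back to the buddy allocator. */
static unsigned long pages_returned_to_buddy;

static int balloon_insert(struct pfn_set *set, unsigned long pfn,
			  bool from_buddy)
{
	int ret = pfn_set_insert(set, pfn);

	/* Easy case: the page came from the buddy, so give it back. */
	if (ret == -ENOMEM && from_buddy)
		pages_returned_to_buddy++;
	/*
	 * If the page did not come from the buddy (hot-added memory),
	 * there is nothing to hand back; the caller needs another
	 * strategy, which this sketch deliberately leaves open.
	 */
	return ret;
}
```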
--
Cheers,
David / dhildenb