linux-mm.kvack.org archive mirror
* A plan for supporting PageMovable in 2025
@ 2025-03-17 17:00 Matthew Wilcox
  2025-03-17 17:37 ` Matthew Wilcox
  2025-03-18 10:12 ` David Hildenbrand
  0 siblings, 2 replies; 3+ messages in thread
From: Matthew Wilcox @ 2025-03-17 17:00 UTC (permalink / raw)
  To: linux-mm

With the upcoming shrink of struct page to 4 words, we need a plan for
handling PageMovable.  Ideally this does not involve memory allocation,
and is a relatively simple change from what we have now.  To shrink
struct page beyond 4 words, we'll need a better plan, but I think this
will do for the next few months.
 
The current proposed layout for struct page is:

struct page {
    unsigned long flags;
    union {
        struct list_head buddy_list;
        struct list_head pcp_list;
        struct {
            unsigned long memdesc;
            union {
                unsigned long private;
                atomic_t _mapcount;
            };
        };
    };
    int _refcount;
};

My proposal for movable non-folio pages is:

 * memdesc is used to point to struct movable_operations (these will
   need to be aligned to 16 bytes, but I think that's fine)
 * private is used to point to the next page in the list
 * These pages are refcounted
 * We retain a "lock" bit in page->flags
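A minimal userspace sketch of the memdesc part of this proposal. It assumes that the 16-byte alignment exists so that the low 4 bits of memdesc can carry a type tag; the tag value MEMDESC_TYPE_MOVABLE and both helper names are hypothetical, and the buddy/pcp list_head members of the union are left out of the mock struct.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for struct movable_operations. */
struct movable_operations {
	int (*isolate_page)(void *page, int mode);
};

/* Mock of the proposed 4-word struct page; like the kernel, this
 * assumes unsigned long can hold a pointer. */
struct page {
	unsigned long flags;
	unsigned long memdesc;   /* tagged pointer to the ops table */
	unsigned long private;   /* next page in a migration list */
	int _refcount;
};

/* Assumption: 16-byte alignment frees the low 4 bits for a type tag. */
#define MEMDESC_TYPE_MASK    0xfUL
#define MEMDESC_TYPE_MOVABLE 0x3UL   /* hypothetical tag value */

static void page_set_movable_ops(struct page *page,
				 const struct movable_operations *mops)
{
	uintptr_t p = (uintptr_t)mops;

	assert((p & MEMDESC_TYPE_MASK) == 0);   /* must be 16-byte aligned */
	page->memdesc = (unsigned long)p | MEMDESC_TYPE_MOVABLE;
}

/* Returns NULL unless memdesc is tagged as a movable-ops pointer. */
static const struct movable_operations *page_movable_ops(const struct page *page)
{
	if ((page->memdesc & MEMDESC_TYPE_MASK) != MEMDESC_TYPE_MOVABLE)
		return NULL;
	return (const struct movable_operations *)
		(uintptr_t)(page->memdesc & ~MEMDESC_TYPE_MASK);
}

/* Every ops table has to be declared with the required alignment. */
static alignas(16) const struct movable_operations example_mops = { 0 };
```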

The most disruptive part of this is that we can't use a list_head any
more.  I don't think a singly-linked list is prohibitive, but either
we need to switch folios to also use a singly-linked list, or we need
to handle folios & movable pages separately.  I think the former is
probably best.
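A rough sketch of what a singly-linked list threaded through page->private could look like. struct movable_list and its push/pop helpers are hypothetical names, and the struct page is the same simplified mock as above. The cost compared to list_head is visible here: push and pop are O(1), but unlinking an arbitrary page would need a walk from the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mock: only the fields this sketch touches. */
struct page {
	unsigned long flags;
	unsigned long memdesc;
	unsigned long private;   /* next page in the list, or 0 */
	int _refcount;
};

/* Hypothetical LIFO migration list linked through ->private. */
struct movable_list {
	struct page *head;
	size_t count;
};

static void movable_list_push(struct movable_list *list, struct page *page)
{
	page->private = (unsigned long)(uintptr_t)list->head;
	list->head = page;
	list->count++;
}

static struct page *movable_list_pop(struct movable_list *list)
{
	struct page *page = list->head;

	if (!page)
		return NULL;
	assert(list->count > 0);
	list->head = (struct page *)(uintptr_t)page->private;
	page->private = 0;   /* page is no longer on any list */
	list->count--;
	return page;
}
```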



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: A plan for supporting PageMovable in 2025
  2025-03-17 17:00 A plan for supporting PageMovable in 2025 Matthew Wilcox
@ 2025-03-17 17:37 ` Matthew Wilcox
  2025-03-18 10:12 ` David Hildenbrand
  1 sibling, 0 replies; 3+ messages in thread
From: Matthew Wilcox @ 2025-03-17 17:37 UTC (permalink / raw)
  To: linux-mm

On Mon, Mar 17, 2025 at 05:00:03PM +0000, Matthew Wilcox wrote:
> My proposal for movable non-folio pages is:
> 
>  * memdesc is used to point to struct movable_operations (these will
>    need to be aligned to 16 bytes, but I think that's fine)
>  * private is used to point to the next page in the list
>  * These pages are refcounted
>  * We retain a "lock" bit in page->flags
> 
> The most disruptive part of this is that we can't use a list_head any
> more.  I don't think a singly-linked list is prohibitive, but either
> we need to switch folios to also use a singly-linked list, or we need
> to handle folios & movable pages separately.  I think the former is
> probably best.

And of course, moments after sending this out I realise I didn't
include how to handle zpdesc pages.  zpdesc have their own memory
descriptor (like folios do), so they can't use page->memdesc to
point to their movable_ops.

Either we can make zsmalloc_mops visible outside zsmalloc.c
(which would prevent zsmalloc from being built as a module), or
we can make struct zpdesc look like this:

struct zpdesc {
	unsigned long flags;
	struct movable_operations *mops;
	struct zpdesc *next;
	atomic_t _refcount;
	...
};

so that the same code can handle both plain pages and zpdesc pages.
It's a bit wasteful in that every zpdesc would contain exactly the same
mops, but it saves playing games with symbol_get() or having zsmalloc
fill in a zpdesc_mops pointer exported to modules by the migration code.
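To illustrate the "same code can handle both" point, here is a sketch under a few assumptions: a hypothetical struct movable_head describes the common prefix that generic migration code would use, static_asserts check that struct zpdesc lines up with it, and the ops table stays static inside zsmalloc. The isolate callback and the ZPDESC_ISOLATED bit are made up for the example, and the cast in the usage relies on layout punning the way kernel code (built with -fno-strict-aliasing) would.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed ops table (not the kernel definition). */
struct movable_operations {
	bool (*isolate)(void *desc);
};

/* Hypothetical common prefix that generic migration code relies on. */
struct movable_head {
	unsigned long flags;
	const struct movable_operations *mops;
	void *next;
	atomic_int _refcount;
};

/* Layout from the message; zsmalloc-private fields are elided. */
struct zpdesc {
	unsigned long flags;
	const struct movable_operations *mops;
	struct zpdesc *next;
	atomic_int _refcount;
	/* ... */
};

/* Shared code only works if the prefixes line up exactly. */
static_assert(offsetof(struct zpdesc, mops) == offsetof(struct movable_head, mops), "mops");
static_assert(offsetof(struct zpdesc, next) == offsetof(struct movable_head, next), "next");
static_assert(offsetof(struct zpdesc, _refcount) == offsetof(struct movable_head, _refcount), "refcount");

/* Generic side: needs no zsmalloc symbol at all. */
static bool migrate_try_isolate(struct movable_head *head)
{
	return head->mops && head->mops->isolate && head->mops->isolate(head);
}

/* zsmalloc side: the ops table can stay static inside the module. */
#define ZPDESC_ISOLATED 1UL   /* hypothetical flag bit */

static bool zs_isolate(void *desc)
{
	struct zpdesc *zpdesc = desc;

	if (zpdesc->flags & ZPDESC_ISOLATED)
		return false;
	zpdesc->flags |= ZPDESC_ISOLATED;
	return true;
}

static const struct movable_operations zsmalloc_mops = { .isolate = zs_isolate };

static void zpdesc_init(struct zpdesc *zpdesc)
{
	zpdesc->flags = 0;
	zpdesc->mops = &zsmalloc_mops;   /* identical in every zpdesc */
	zpdesc->next = NULL;
	atomic_init(&zpdesc->_refcount, 1);
}
```

The duplicated mops pointer in every zpdesc is exactly the waste described above; in exchange, nothing outside zsmalloc.c needs to name zsmalloc_mops.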

Again, I'm not trying to define the "forever" code here, I'm looking for
something quick and unlikely to introduce bugs so that we can move
forward with splitting page & folio apart.



* Re: A plan for supporting PageMovable in 2025
  2025-03-17 17:00 A plan for supporting PageMovable in 2025 Matthew Wilcox
  2025-03-17 17:37 ` Matthew Wilcox
@ 2025-03-18 10:12 ` David Hildenbrand
  1 sibling, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2025-03-18 10:12 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm

On 17.03.25 18:00, Matthew Wilcox wrote:
> With the upcoming shrink of struct page to 4 words, we need a plan for
> handling PageMovable.  Ideally this does not involve memory allocation,
> and is a relatively simple change from what we have now.  To shrink
> struct page beyond 4 words, we'll need a better plan, but I think this
> will do for the next few months.

Right, I've been focusing on grasping what we need in the long run with 
frozen pages that don't even want any memdesc (PageOffline).

>   
> The current proposed layout for struct page is:
> 
> struct page {
>      unsigned long flags;
>      union {
>          struct list_head buddy_list;
>          struct list_head pcp_list;
>          struct {
>              unsigned long memdesc;
>              union {
>                  unsigned long private;
>                  atomic_t _mapcount;
>              };
>          };
>      };
>      int _refcount;
> };
> 
> My proposal for movable non-folio pages is:
> 
>   * memdesc is used to point to struct movable_operations (these will
>     need to be aligned to 16 bytes, but I think that's fine)

Note that we don't want to allocate a memdesc for PageOffline pages in 
the long run. For balloon compaction it might be fine as a first step.

How'd we handle PAGE_MAPPING_MOVABLE? See below on my idea to avoid what 
you describe here.

>   * private is used to point to the next page in the list
>   * These pages are refcounted
>   * We retain a "lock" bit in page->flags

Note that there is also PG_isolated, which I am hoping we can get rid of.


My current bigger idea is something like this:

1) memdesc type (currently folio type) identifies "struct 
movable_operations". We could think of a registration model for 
migration handlers.

PG_offline -> call into balloon compaction

Calling the ->isolate callback will fail if the callback is not 
responsible for migrating the page, or if somebody else already isolated it.

Ideally, we'd have two bits (per memdesc) to essentially indicate "this 
is movable" and "this is isolated".

Not 100% sure if the latter is required. If already isolated, simply 
calling the ->isolate callback will fail. I think most of the existing 
PG_isolated users are irrelevant, but it's all complicated.
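A minimal sketch of the registration idea in 1), assuming a small table of handlers indexed by memdesc type plus the two per-memdesc bits ("movable", "isolated"). The enum values, struct desc, register_movable_ops() and the callback signatures are all hypothetical; nothing here is an existing kernel API. The balloon handler stands in for balloon compaction on PG_offline pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memdesc types; the real encoding is not settled. */
enum memdesc_type { MEMDESC_FOLIO, MEMDESC_OFFLINE, MEMDESC_ZPDESC, NR_MEMDESC_TYPES };

#define DESC_MOVABLE  (1u << 0)   /* "this is movable" */
#define DESC_ISOLATED (1u << 1)   /* "this is isolated" */

struct desc {
	enum memdesc_type type;
	unsigned int bits;
};

struct movable_operations {
	bool (*isolate)(struct desc *desc);
	void (*putback)(struct desc *desc);
};

/* One migration handler per memdesc type. */
static const struct movable_operations *movable_ops[NR_MEMDESC_TYPES];

static void register_movable_ops(enum memdesc_type type,
				 const struct movable_operations *ops)
{
	assert(type < NR_MEMDESC_TYPES && !movable_ops[type]);
	movable_ops[type] = ops;
}

/* Migration core: dispatch purely on the memdesc type. */
static bool isolate_movable_desc(struct desc *desc)
{
	const struct movable_operations *ops = movable_ops[desc->type];

	if (!(desc->bits & DESC_MOVABLE) || !ops)
		return false;   /* no handler responsible for this page */
	return ops->isolate(desc);
}

static void putback_movable_desc(struct desc *desc)
{
	movable_ops[desc->type]->putback(desc);
}

/* Example handler: fails if somebody else already isolated the page. */
static bool balloon_isolate(struct desc *desc)
{
	if (desc->bits & DESC_ISOLATED)
		return false;
	desc->bits |= DESC_ISOLATED;
	return true;
}

static void balloon_putback(struct desc *desc)
{
	desc->bits &= ~DESC_ISOLATED;
}

static const struct movable_operations balloon_mops = {
	.isolate = balloon_isolate,
	.putback = balloon_putback,
};
```

This sketch is single-threaded and sets the isolated bit non-atomically; real code would need an atomic test-and-set or the handler's own lock.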

So a single per-memdesc bit + memdesc type might be sufficient to look up the


2) No dependency on the refcount: ->isolate / ->putback effectively move 
the ownership ("reference") from the real owner to migration code (so 
they can be frozen). We just have to make sure that, while a page is
isolated, it cannot be freed by the real owner (which is already
the case IIRC).
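One way to express 2) without touching the refcount, sketched under the assumption of a hypothetical per-descriptor ownership state: ->isolate and ->putback move ownership with a compare-and-swap, and the owner's free path only succeeds while the owner actually holds the page. The state names and helpers are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ownership states; no refcount is involved. */
enum owner_state { OWNED_BY_DRIVER, OWNED_BY_MIGRATION, FREED };

struct movable_desc {
	_Atomic int state;
};

static void desc_init(struct movable_desc *desc)
{
	atomic_init(&desc->state, OWNED_BY_DRIVER);
}

/* Move ownership from 'from' to 'to'; fails if someone else holds it. */
static bool transfer(struct movable_desc *desc, int from, int to)
{
	int expected = from;

	return atomic_compare_exchange_strong(&desc->state, &expected, to);
}

/* ->isolate: the driver hands its ownership to migration code. */
static bool desc_isolate(struct movable_desc *desc)
{
	return transfer(desc, OWNED_BY_DRIVER, OWNED_BY_MIGRATION);
}

/* ->putback: migration code hands ownership back to the driver. */
static void desc_putback(struct movable_desc *desc)
{
	bool ok = transfer(desc, OWNED_BY_MIGRATION, OWNED_BY_DRIVER);

	assert(ok);
	(void)ok;
}

/* The real owner may only free a page it currently owns. */
static bool desc_owner_free(struct movable_desc *desc)
{
	return transfer(desc, OWNED_BY_DRIVER, FREED);
}
```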


3) No lists: we simply use an array of PFNs in migration code?
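If migration code tracked isolated pages as PFNs instead of a list, a fixed-size batch would avoid both list linkage in struct page and memory allocation. A sketch, where PFN_BATCH_SIZE is an arbitrary assumption and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PFN_BATCH_SIZE 16   /* arbitrary; a real size is a tuning choice */

struct pfn_batch {
	unsigned long pfns[PFN_BATCH_SIZE];
	size_t nr;
};

/* Returns false when full: migrate this batch, then start a new one. */
static bool pfn_batch_add(struct pfn_batch *batch, unsigned long pfn)
{
	if (batch->nr == PFN_BATCH_SIZE)
		return false;
	batch->pfns[batch->nr++] = pfn;
	return true;
}

/* Walk the batch with a caller-supplied migrate step; count successes. */
static size_t pfn_batch_migrate(const struct pfn_batch *batch,
				bool (*migrate_one)(unsigned long pfn))
{
	size_t done = 0;

	assert(batch->nr <= PFN_BATCH_SIZE);
	for (size_t i = 0; i < batch->nr; i++)
		if (migrate_one(batch->pfns[i]))
			done++;
	return done;
}
```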


4) Lock bit: not 100% sure yet, but likely not required if ->isolate / 
->migrate / ->putback just handle this locking internally.



Lists are a problem for ballooning drivers with PageOffline pages. I had 
the exact same thought as you regarding "private is used to point to the 
next page in the list", but discarded it because it's inefficient for
ballooning purposes and not future-proof.

So instead, my plan is to use an xarray in the ballooning drivers to
store the PFNs of inflated pages.

The only nasty thing is that "insert page in the balloon" can fail on
OOM (inserting into the xarray). In general that's fine, because the
page can simply be put back into the buddy. The exception is some Xen /
Hyper-V code, where PageOffline pages are not allocated from the buddy
but instead "come to life" with memory that gets added, so there is
nowhere to put them back.
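A userspace sketch of the failure mode described above. In the kernel this would be an xarray (for example xa_insert(), which can return -ENOMEM); here a growable array stands in for it, so struct balloon and its helpers are hypothetical. The point is only that tracking an inflated PFN needs an allocation, which can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for an xarray of inflated PFNs. */
struct balloon {
	unsigned long *pfns;
	size_t nr, cap;
};

/* Tracking a PFN may need memory, so it can fail with -ENOMEM. */
static int balloon_insert_pfn(struct balloon *b, unsigned long pfn)
{
	if (b->nr == b->cap) {
		size_t new_cap = b->cap ? 2 * b->cap : 8;
		unsigned long *p = realloc(b->pfns, new_cap * sizeof(*p));

		if (!p)
			return -ENOMEM;   /* caller must give the page back */
		b->pfns = p;
		b->cap = new_cap;
	}
	assert(b->nr < b->cap);
	b->pfns[b->nr++] = pfn;
	return 0;
}

/* Deflate: hand back the most recently inflated PFN, if any. */
static bool balloon_deflate_one(struct balloon *b, unsigned long *pfn)
{
	if (!b->nr)
		return false;
	*pfn = b->pfns[--b->nr];
	return true;
}

static void balloon_destroy(struct balloon *b)
{
	free(b->pfns);
	b->pfns = NULL;
	b->nr = b->cap = 0;
}
```

For pages that came from the buddy, the -ENOMEM path can just free the page again; for memory that only "came to life" via hotplug, there is no such fallback, which is why those drivers are the awkward case.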

-- 
Cheers,

David / dhildenb


