[LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world

* [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
@ 2025-01-07 16:11 David Hildenbrand
  2025-01-07 16:48 ` Zi Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-01-07 16:11 UTC (permalink / raw)
  To: linux-mm; +Cc: lsf-pc

Hi,

one item on my todo list is making PageOffline pages stop using
"struct page" members except page->type and 1/2 flags, to prepare them
for the memdesc future, to avoid unnecessary atomics, and to resolve
some (so-far) theoretical issues with temporary speculative references.

For example, the page->_refcount will always be 0 (frozen) for 
PageOffline pages, and they will get allocated/freed similar to how we 
allocate/free frozen pages for slab already. Once we move the refcount 
into "struct folio", they will not have a refcount at all anymore.

One complication is balloon compaction: we allow for migrating 
PageOffline pages allocated in some memory ballooning implementations 
such as virtio-balloon.

For that, we use the "non-lru page migration" framework and in that
process we make use of ... way too many members of "struct page"/"struct
folio" and rely on the refcount not being 0. For example, we certainly
don't want to allocate memdescs for PageOffline pages just so some of
them can be migrated.

While we converted non-lru page migration to work on folios (i.e.,
folio_movable_ops()), these things will not actually be "folios" in the
future; they can have different memdescs.

So, how can we migrate non-lru things that are not folios while not 
relying on "struct folio" members, with minimal/no metadata overhead?

I have some ideas, but no complete solution yet; input about the 
requirements of other non-lru page migration use cases besides 
PageOffline will be interesting.

... and maybe, we have other non-folio things we'd want to migrate, and 
want to be prepared to handle them as well? (hint: leaf page tables?)

-- 
Cheers,

David / dhildenb


* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 16:11 [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world David Hildenbrand
@ 2025-01-07 16:48 ` Zi Yan
  2025-01-07 16:55   ` David Hildenbrand
  2025-01-07 16:49 ` Matthew Wilcox
  2025-03-24 18:56 ` David Hildenbrand
  2 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2025-01-07 16:48 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-mm, lsf-pc

On 7 Jan 2025, at 11:11, David Hildenbrand wrote:

> Hi,
>
> one item on my todo list is making PageOffline pages to stop using "struct page" members except page->type and 1/2 flags, to prepare them for the memdesc future, to avoid unnecessary atomics, and to resolve some (so-far) theoretical issues with temporary speculative references.
>
> For example, the page->_refcount will always be 0 (frozen) for PageOffline pages, and they will get allocated/freed similar to how we allocate/free frozen pages for slab already. Once we move the refcount into "struct folio", they will not have a refcount at all anymore.
>
> One complication is balloon compaction: we allow for migrating PageOffline pages allocated in some memory ballooning implementations such as virtio-balloon.
>
> For that, we use the "non-lru page migration" framework and in that process we make use of ... way too many members of "struct page"/"struct folio" and rely on the refcount not being 0. For example, we certainly don't want to allocate memdescs for PageOffline pages just so some of them can be migrated.

Then the first thing is to make all get_new_folio functions aware of
PageOffline pages and able to allocate a PageOffline page. IIUC, the current
process is: 1) allocate a page from the buddy allocator, 2) offline the new
page during mops->migrate_page() and online the old page. The inflation and
deflation in step 2 look redundant if migrate_pages() can get PageOffline
pages to begin with and put_page() can handle PageOffline pages too.

>
> While we converted non-lru page migration to work on folios (i.e., folio_movable_ops()) these things are not actually "folios" in the future, they can have different memdescs.
>
> So, how can we migrate non-lru things that are not folios while not relying on "struct folio" members, with minimal/no metadata overhead?

Like I said above, if migrate_pages() is aware of PageOffline pages by allocating
and putting them like normal folios, that could work.

Or you can do what hugetlb migration does, adding a separate migrate_offlinepages()
function to handle PageOffline pages. This probably can save you a lot of
LRU page checks like mapping and locks, but it adds a special function. So
tradeoffs.

>
> I have some ideas, but no complete solution yet; input about the requirements of other non-lru page migration use cases besides PageOffline will be interesting.
>
> ... and maybe, we have other non-folio things we'd want to migrate, and want to be prepared to handle them as well? (hint: leaf page tables?)

If we have a dedicated allocator for non-folio things and make migrate_pages()
aware of them, it should be doable.


Best Regards,
Yan, Zi



* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 16:11 [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world David Hildenbrand
  2025-01-07 16:48 ` Zi Yan
@ 2025-01-07 16:49 ` Matthew Wilcox
  2025-01-08  3:39   ` Zi Yan
  2025-03-24 18:56 ` David Hildenbrand
  2 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2025-01-07 16:49 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-mm, lsf-pc

On Tue, Jan 07, 2025 at 05:11:02PM +0100, David Hildenbrand wrote:
> one item on my todo list is making PageOffline pages to stop using "struct
> page" members except page->type and 1/2 flags, to prepare them for the
> memdesc future, to avoid unnecessary atomics, and to resolve some (so-far)
> theoretical issues with temporary speculative references.

Well, thank goodness someone's working on this!  Because I'm stumped.

> For that, we use the "non-lru page migration" framework and in that process
> we make use of ... way too many members of "struct page"/"struct folio" and
> rely on the refcount not being 0. For example, we certainly don't want to
> allocate memdescs for PageOffline pages just so some of them can be
> migrated.

I mean, let's start with how we migrate pages.

int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
                free_folio_t put_new_folio, unsigned long private,
                enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
...
        list_for_each_entry_safe(folio, folio2, from, lru) {

We identify every folio to be migrated and put them on a list.  But once
non-folio things need to be migrated, this code is wrong.

We could rename this to migrate_folios() and have a different function
for migrating non-folio memory.  But now the compaction code starts to
look distressingly complex [1].  So we need a way to pass in a list/array
of memory to be migrated that doesn't involve a list_head and magically
trying to deduce what the memory is.

I'm actually wondering about a bitmap.  Generally when we migrate memory
it's to create physical contiguity so perhaps passing in a base_pfn
and a bitmap that contains, say, PMD_ORDER bits; then it's the job of
the migration code to figure out what to do for each pfn indicated by
base_pfn and the set bits in the bitmap?

Although now I write this down, I guess NUMA migration doesn't behave
that way.  So perhaps compaction-migration and numa-migration end up
using different interfaces?  I think NUMA migration always migrates
folios, so it can keep using get_new_folio() and put_new_folio() while
the compaction-migration might need a different pair of callbacks to
allocate/free memory of many different memdesc types.

[1] OK, it is already distressingly complex.  But we're making it even
more complex.
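
To make the base_pfn + bitmap idea a bit more concrete, here is a minimal
userspace sketch (not kernel code). Everything in it is an assumption for
illustration: struct migrate_range, range_mark(), range_migrate(), the
migrate_pfn_fn callback, and the choice of 512 pages per range (a PMD-sized
range with 4 KiB pages). The point is only that the caller hands over a base
pfn plus set bits, and the migration side decides what each marked pfn is.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 512 base pages per range (PMD-sized with 4 KiB pages). */
#define RANGE_PAGES   512u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define RANGE_WORDS   ((RANGE_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical request: "migrate the pfns marked in this range". */
struct migrate_range {
	unsigned long base_pfn;            /* first pfn covered by the bitmap */
	unsigned long bitmap[RANGE_WORDS]; /* bit i set => base_pfn + i */
};

/* Hypothetical per-pfn handler; returns 0 when the pfn was migrated. */
typedef int (*migrate_pfn_fn)(unsigned long pfn, void *ctx);

/* Mark one pfn inside the range for migration. */
void range_mark(struct migrate_range *r, unsigned long pfn)
{
	unsigned long off = pfn - r->base_pfn;

	/* A pfn below base_pfn wraps to a huge offset and trips this too. */
	assert(off < RANGE_PAGES);
	r->bitmap[off / BITS_PER_WORD] |= 1UL << (off % BITS_PER_WORD);
}

/* Call fn for every marked pfn; return how many calls succeeded. */
unsigned int range_migrate(const struct migrate_range *r,
			   migrate_pfn_fn fn, void *ctx)
{
	unsigned int done = 0;

	for (unsigned long off = 0; off < RANGE_PAGES; off++) {
		unsigned long word = r->bitmap[off / BITS_PER_WORD];

		if (!(word & (1UL << (off % BITS_PER_WORD))))
			continue;
		if (fn(r->base_pfn + off, ctx) == 0)
			done++;
	}
	return done;
}

/* Demo handler: adds each visited pfn to the counter behind ctx. */
int sum_pfn(unsigned long pfn, void *ctx)
{
	*(unsigned long *)ctx += pfn;
	return 0;
}
```

Note that nothing here needs a list_head in the thing being migrated; the
cost is one bit per pfn in the range, independent of what memdesc type (if
any) sits behind each pfn.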


* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 16:48 ` Zi Yan
@ 2025-01-07 16:55   ` David Hildenbrand
  2025-01-07 17:27     ` Zi Yan
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-01-07 16:55 UTC (permalink / raw)
  To: Zi Yan; +Cc: linux-mm, lsf-pc

On 07.01.25 17:48, Zi Yan wrote:
> On 7 Jan 2025, at 11:11, David Hildenbrand wrote:
> 
>> Hi,
>>
>> one item on my todo list is making PageOffline pages to stop using "struct page" members except page->type and 1/2 flags, to prepare them for the memdesc future, to avoid unnecessary atomics, and to resolve some (so-far) theoretical issues with temporary speculative references.
>>
>> For example, the page->_refcount will always be 0 (frozen) for PageOffline pages, and they will get allocated/freed similar to how we allocate/free frozen pages for slab already. Once we move the refcount into "struct folio", they will not have a refcount at all anymore.
>>
>> One complication is balloon compaction: we allow for migrating PageOffline pages allocated in some memory ballooning implementations such as virtio-balloon.
>>
>> For that, we use the "non-lru page migration" framework and in that process we make use of ... way too many members of "struct page"/"struct folio" and rely on the refcount not being 0. For example, we certainly don't want to allocate memdescs for PageOffline pages just so some of them can be migrated.
> 
> Then first thing is to make all get_new_folio functions be aware of PageOffline
> pages and be able to allocate a PageOffline page. IIUC, the current process
> is: 1) allocate a page from buddy allocator, 2) offline the new page during
> mops->migrate_page() and online the old page. The inflation and deflation
> in step 2 looks redundant if migrate_pages() can get PageOffline pages to
> begin with and put_page() can handle PageOffline page too.

That might be one hacky way of handling offline pages, yes :)

(the isolation step is tricky: for example, with page->lru gone we 
cannot even put these things into a list! Also, there is page isolation ...)

I recall that the isolation step is required because we could have 
multiple parties trying to migrate the same page at the same time. So 
that must be handled as well.

> 
>>
>> While we converted non-lru page migration to work on folios (i.e., folio_movable_ops()) these things are not actually "folios" in the future, they can have different memdescs.
>>
>> So, how can we migrate non-lru things that are not folios while not relying on "struct folio" members, with minimal/no metadata overhead?
> 
> Like I said above, if migrate_pages() is aware of PageOffline pages by allocating
> and putting them like normal folios, that could work.
> 
> Or you can do what hugetlb migration does, adding a separate migrate_offlinepages()
> function to handle PageOffline pages. This probably can save you a lot of
> LRU page checks like mapping and locks, but it adds a special function. So
> tradeoffs.
> 
>>
>> I have some ideas, but no complete solution yet; input about the requirements of other non-lru page migration use cases besides PageOffline will be interesting.
>>
>> ... and maybe, we have other non-folio things we'd want to migrate, and want to be prepared to handle them as well? (hint: leaf page tables?)
> 
> If we have dedicated allocator for non-folio things and make migrate_pages()
> be aware of them, it should be doable.

Note that I thought about similar things as you describe above, but part 
of the exercise will not be focusing on PageOffline pages, but having 
something more generic that can handle pages with actual page content, 
and that have to be properly isolated :)

-- 
Cheers,

David / dhildenb




* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 16:55   ` David Hildenbrand
@ 2025-01-07 17:27     ` Zi Yan
  2025-01-13  4:18       ` Alistair Popple
  0 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2025-01-07 17:27 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-mm, lsf-pc

On 7 Jan 2025, at 11:55, David Hildenbrand wrote:

> On 07.01.25 17:48, Zi Yan wrote:
>> On 7 Jan 2025, at 11:11, David Hildenbrand wrote:
>>
>>> Hi,
>>>
>>> one item on my todo list is making PageOffline pages to stop using "struct page" members except page->type and 1/2 flags, to prepare them for the memdesc future, to avoid unnecessary atomics, and to resolve some (so-far) theoretical issues with temporary speculative references.
>>>
>>> For example, the page->_refcount will always be 0 (frozen) for PageOffline pages, and they will get allocated/freed similar to how we allocate/free frozen pages for slab already. Once we move the refcount into "struct folio", they will not have a refcount at all anymore.
>>>
>>> One complication is balloon compaction: we allow for migrating PageOffline pages allocated in some memory ballooning implementations such as virtio-balloon.
>>>
>>> For that, we use the "non-lru page migration" framework and in that process we make use of ... way too many members of "struct page"/"struct folio" and rely on the refcount not being 0. For example, we certainly don't want to allocate memdescs for PageOffline pages just so some of them can be migrated.
>>
>> Then first thing is to make all get_new_folio functions be aware of PageOffline
>> pages and be able to allocate a PageOffline page. IIUC, the current process
>> is: 1) allocate a page from buddy allocator, 2) offline the new page during
>> mops->migrate_page() and online the old page. The inflation and deflation
>> in step 2 looks redundant if migrate_pages() can get PageOffline pages to
>> begin with and put_page() can handle PageOffline page too.
>
> That might be one hacky way of handling offline pages, yes :)
>
> (the isolation step is tricky: for example, with page->lru gone we cannot even put these things into a list! Also, there is page isolation ...)
>
> I recall that the isolation step is required because we could have multiple parties trying to migrate the same page at the same time. So that must be handled as well.

OK, since page->lru is gone, migrate_pages() might not be suitable for these
pages, unless we want to rewrite migrate_pages(), which might be desirable. :)
Then, we could record PFNs instead, like what migrate_vma*() does, but I have
not checked migrate_vma*() in enough detail to judge the feasibility yet.

In terms of isolation, we can use a PageIsolated flag and make sure it is
one of the remaining 1/2 flags. This flag can be used for other non-folio
things too.

>
>>
>>>
>>> While we converted non-lru page migration to work on folios (i.e., folio_movable_ops()) these things are not actually "folios" in the future, they can have different memdescs.
>>>
>>> So, how can we migrate non-lru things that are not folios while not relying on "struct folio" members, with minimal/no metadata overhead?
>>
>> Like I said above, if migrate_pages() is aware of PageOffline pages by allocating
>> and putting them like normal folios, that could work.
>>
>> Or you can do what hugetlb migration does, adding a separate migrate_offlinepages()
>> function to handle PageOffline pages. This probably can save you a lot of
>> LRU page checks like mapping and locks, but it adds a special function. So
>> tradeoffs.
>>
>>>
>>> I have some ideas, but no complete solution yet; input about the requirements of other non-lru page migration use cases besides PageOffline will be interesting.
>>>
>>> ... and maybe, we have other non-folio things we'd want to migrate, and want to be prepared to handle them as well? (hint: leaf page tables?)
>>
>> If we have dedicated allocator for non-folio things and make migrate_pages()
>> be aware of them, it should be doable.
>
> Note that I thought about similar things as you describe above, but part of the exercise will not be focusing on PageOffline pages, but having something more generic that can handle pages with actual page content, and that have to be properly isolated :)

Sure. IMHO, we will need dedicated allocation and free functions for
these non-folio things, a PageIsolated flag for isolation, and a dedicated
code path in migrate_pages() or migrate_vma*().
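
As a rough illustration of the isolation part only, here is a minimal
userspace sketch using C11 atomics. struct mem_desc, DESC_ISOLATED,
desc_try_isolate() and desc_putback() are hypothetical names made up for
this sketch, not existing kernel interfaces. It shows how a single flag bit,
set with an atomic read-modify-write, lets exactly one party win when
several try to isolate the same thing for migration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, standing in for one of the "remaining" flags. */
#define DESC_ISOLATED 0x1u

/* Hypothetical descriptor that carries nothing but a flags word. */
struct mem_desc {
	atomic_uint flags;
};

/*
 * Try to isolate: atomically set the bit and look at the old value.
 * Only the caller that saw the bit clear owns the isolation.
 */
bool desc_try_isolate(struct mem_desc *d)
{
	unsigned int old = atomic_fetch_or(&d->flags, DESC_ISOLATED);

	return !(old & DESC_ISOLATED);
}

/* Undo isolation after the migration succeeded or was abandoned. */
void desc_putback(struct mem_desc *d)
{
	unsigned int old = atomic_fetch_and(&d->flags, ~DESC_ISOLATED);

	/* Putting back something that was never isolated is a caller bug. */
	assert(old & DESC_ISOLATED);
	(void)old;
}
```

A second caller racing with the first simply gets false back and skips the
page; no refcount and no list membership is involved.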


Best Regards,
Yan, Zi



* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 16:49 ` Matthew Wilcox
@ 2025-01-08  3:39   ` Zi Yan
  0 siblings, 0 replies; 9+ messages in thread
From: Zi Yan @ 2025-01-08  3:39 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: David Hildenbrand, linux-mm, lsf-pc

On 7 Jan 2025, at 11:49, Matthew Wilcox wrote:

> On Tue, Jan 07, 2025 at 05:11:02PM +0100, David Hildenbrand wrote:
>> one item on my todo list is making PageOffline pages to stop using "struct
>> page" members except page->type and 1/2 flags, to prepare them for the
>> memdesc future, to avoid unnecessary atomics, and to resolve some (so-far)
>> theoretical issues with temporary speculative references.
>
> Well, thank goodness someone's working on this!  Because I'm stumped.
>
>> For that, we use the "non-lru page migration" framework and in that process
>> we make use of ... way too many members of "struct page"/"struct folio" and
>> rely on the refcount not being 0. For example, we certainly don't want to
>> allocate memdescs for PageOffline pages just so some of them can be
>> migrated.
>
> I mean, let's start with how we migrate pages.
>
> int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
>                 free_folio_t put_new_folio, unsigned long private,
>                 enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
> ...
>         list_for_each_entry_safe(folio, folio2, from, lru) {
>
> We identify every folio to be migrated and put them on a list.  But once
> non-folio things need to be migrated, this code is wrong.
>
> We could rename this to migrate_folios() and have a different function
> for migrating non-folio memory.  But now the compaction code starts to
> look distressingly complex [1].  So we need a way to pass in a list/array
> of memory to be migrated that doesn't involve a list_head and magically
> trying to deduce what the memory is.

How about something like folio_batch carrying a list of pointers to the
to-be-migrated folios/non-folios? But it consumes memory if the number
of to-be-migrated items is large, and that is probably why ->lru is used.
Allocating memory during migration might not be desirable.

>
> I'm actually wondering about a bitmap.  Generally when we migrate memory
> it's to create physical contiguity so perhaps passing in a base_pfn
> and a bitmap that contains, say, PMD_ORDER bits; then it's the job of
> the migration code to figure out what to do for each pfn indicated by
> base_pfn and the set bits in the bitmap?
>
> Although now I write this down, I guess NUMA migration doesn't behave
> that way.  So perhaps compaction-migration and numa-migration end up
> using different interfaces?  I think NUMA migration always migrates

But both use the same backend to unmap old pages, move metadata, and
remap new pages for folios. It is actually non-folios that have a different
routine for migration. We probably want a dedicated interface for non-folios
when ->lru cannot be used, so that during compaction, when a non-folio is
encountered, the dedicated non-folio migration interface is called.
As I write this: how often do we see non-folios across the entire physical
address space? If not often, is it possible to just migrate one non-folio at
a time so that the list problem just goes away?

> folios, so it can keep using get_new_folio() and put_new_folio() while
> the compaction-migration might need a different pair of callbacks to
> allocate/free memory of many different memdesc types.
>
> [1] OK, it is already distressingly complex.  But we're making it even
> more complex.
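
To sketch the batch idea without allocating during migration: a small
fixed-size array that is drained whenever it fills up. This is a userspace
illustration only; struct pfn_batch, BATCH_SLOTS, pfn_batch_add(),
pfn_batch_flush() and the drain callback are hypothetical names, and the
capacity of 31 is an arbitrary choice for the example. With BATCH_SLOTS set
to 1 it degenerates into "migrate one non-folio at a time".

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: arbitrary small capacity so the batch can live on the stack. */
#define BATCH_SLOTS 31u

/* Hypothetical fixed-size batch of pfns awaiting migration. */
struct pfn_batch {
	unsigned int nr;
	unsigned long pfns[BATCH_SLOTS];
};

/* Hypothetical consumer: migrates (here: just inspects) nr pfns. */
typedef void (*batch_drain_fn)(const unsigned long *pfns, unsigned int nr,
			       void *ctx);

/* Hand whatever is queued to the consumer and empty the batch. */
void pfn_batch_flush(struct pfn_batch *b, batch_drain_fn drain, void *ctx)
{
	if (b->nr) {
		drain(b->pfns, b->nr, ctx);
		b->nr = 0;
	}
}

/* Queue one pfn; a full batch is drained first, so no allocation is needed. */
void pfn_batch_add(struct pfn_batch *b, unsigned long pfn,
		   batch_drain_fn drain, void *ctx)
{
	assert(b->nr <= BATCH_SLOTS);
	if (b->nr == BATCH_SLOTS)
		pfn_batch_flush(b, drain, ctx);
	b->pfns[b->nr++] = pfn;
}

/* Demo consumer: counts how often it ran and how many pfns it saw. */
struct drain_stats {
	unsigned int calls;
	unsigned int pfns;
};

void count_drain(const unsigned long *pfns, unsigned int nr, void *ctx)
{
	struct drain_stats *s = ctx;

	(void)pfns;
	s->calls++;
	s->pfns += nr;
}
```

Memory use stays bounded by BATCH_SLOTS regardless of how many non-folios a
scan encounters; the price is that the consumer runs more often.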


Best Regards,
Yan, Zi



* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 17:27     ` Zi Yan
@ 2025-01-13  4:18       ` Alistair Popple
  2025-01-13  4:56         ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Alistair Popple @ 2025-01-13  4:18 UTC (permalink / raw)
  To: Zi Yan; +Cc: David Hildenbrand, linux-mm, lsf-pc

On Tue, Jan 07, 2025 at 12:27:57PM -0500, Zi Yan wrote:
> On 7 Jan 2025, at 11:55, David Hildenbrand wrote:
> 
> > On 07.01.25 17:48, Zi Yan wrote:
> >> On 7 Jan 2025, at 11:11, David Hildenbrand wrote:
> >>
> >>> Hi,
> >>>
> >>> one item on my todo list is making PageOffline pages to stop using "struct page" members except page->type and 1/2 flags, to prepare them for the memdesc future, to avoid unnecessary atomics, and to resolve some (so-far) theoretical issues with temporary speculative references.
> >>>
> >>> For example, the page->_refcount will always be 0 (frozen) for PageOffline pages, and they will get allocated/freed similar to how we allocate/free frozen pages for slab already. Once we move the refcount into "struct folio", they will not have a refcount at all anymore.
> >>>
> >>> One complication is balloon compaction: we allow for migrating PageOffline pages allocated in some memory ballooning implementations such as virtio-balloon.
> >>>
> >>> For that, we use the "non-lru page migration" framework and in that process we make use of ... way too many members of "struct page"/"struct folio" and rely on the refcount not being 0. For example, we certainly don't want to allocate memdescs for PageOffline pages just so some of them can be migrated.
> >>
> >> Then first thing is to make all get_new_folio functions be aware of PageOffline
> >> pages and be able to allocate a PageOffline page. IIUC, the current process
> >> is: 1) allocate a page from buddy allocator, 2) offline the new page during
> >> mops->migrate_page() and online the old page. The inflation and deflation
> >> in step 2 looks redundant if migrate_pages() can get PageOffline pages to
> >> begin with and put_page() can handle PageOffline page too.
> >
> > That might be one hacky way of handling offline pages, yes :)
> >
> > (the isolation step is tricky: for example, with page->lru gone we cannot even put these things into a list! Also, there is page isolation ...)
> >
> > I recall that the isolation step is required because we could have multiple parties trying to migrate the same page at the same time. So that must be handled as well.
> 
> OK, since page->lru is gone, migrate_pages() might not be suitable for these
> pages, unless we want to rewrite migrate_pages(), which might be desirable. :)
> Then, we could record PFNs instead, like what migrate_vma*() does, but I have
> not checked migrate_vma*() in details to tell the feasibility yet.

migrate_vma_*() (and migrate_device_range) require folios, but not page->lru as
they are designed to work with both normal LRU pages and ZONE_DEVICE pages which
don't have page->lru because it is used for something else.

To me, the primary difference between migrate_vma_*() and migrate_pages()
is that the latter requires LRU pages. It has long annoyed me that much of the
migrate_pages() logic is duplicated in migrate_vma_*() simply because the former
requires list_heads whilst the latter needs an array of PFNs to deal with a lack
of page->lru. It seems like it should be possible to converge these two code
paths.

> In terms of isolation, we can use PageIsolated flag and make sure it is
> in the remaining 1/2 flags. This flag can be used for other non-folio things
> too.

My understanding of the isolation step was that it was required to ensure
page->lru could be reused by the caller of folio_isolate_lru(), not specifically
to deal directly with multiple parties trying to migrate the same page. Although
perhaps we are saying the same thing if migration is the only time the page->lru
list_heads are used for putting pages on non-LRU lists.

Multiple parties migrating the same page is dealt with by reference count
checks, i.e. if the reference count doesn't match the "expected" value we
assume some other party is migrating it, pinning it, etc. and fail the
migration.
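
A minimal userspace sketch of that check, using C11 atomics: the refcount is
swapped to 0 only if it still holds the expected value, which is similar in
spirit to what the kernel's folio_ref_freeze() does. struct obj, ref_freeze()
and ref_unfreeze() are hypothetical names for this illustration, not kernel
interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object that carries nothing but a reference count. */
struct obj {
	atomic_int refcount;
};

/*
 * Freeze: replace the refcount with 0, but only if it currently equals
 * the value the migrating party expects. Any extra reference (another
 * migration attempt, a pin, ...) makes the exchange fail.
 */
bool ref_freeze(struct obj *o, int expected)
{
	int old = expected;

	assert(expected > 0);
	return atomic_compare_exchange_strong(&o->refcount, &old, 0);
}

/* Unfreeze: publish the refcount again once migration is done or aborted. */
void ref_unfreeze(struct obj *o, int count)
{
	atomic_store(&o->refcount, count);
}
```

This is exactly the mechanism that stops working once PageOffline pages keep
their refcount permanently at 0, which is why a separate isolation step
matters for them.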

> >
> >>
> >>>
> >>> While we converted non-lru page migration to work on folios (i.e., folio_movable_ops()) these things are not actually "folios" in the future, they can have different memdescs.
> >>>
> >>> So, how can we migrate non-lru things that are not folios while not relying on "struct folio" members, with minimal/no metadata overhead?
> >>
> >> Like I said above, if migrate_pages() is aware of PageOffline pages by allocating
> >> and putting them like normal folios, that could work.
> >>
> >> Or you can do what hugetlb migration does, adding a separate migrate_offlinepages()
> >> function to handle PageOffline pages. This probably can save you a lot of
> >> LRU page checks like mapping and locks, but it adds a special function. So
> >> tradeoffs.
> >>
> >>>
> >>> I have some ideas, but no complete solution yet; input about the requirements of other non-lru page migration use cases besides PageOffline will be interesting.
> >>>
> >>> ... and maybe, we have other non-folio things we'd want to migrate, and want to be prepared to handle them as well? (hint: leaf page tables?)
> >>
> >> If we have dedicated allocator for non-folio things and make migrate_pages()
> >> be aware of them, it should be doable.
> >
> > Note that I thought about similar things as you describe above, but part of the exercise will not be focusing on PageOffline pages, but having something more generic that can handle pages with actual page content, and that have to be properly isolated :)
> 
> Sure. IMHO, we will need dedicated allocation and free functions for
> these non-folio things, PageIsolated flag for isolation, a dedicated
> code path in migrate_pages() or migrate_vma*().

If you don't have page->lru I think it would make more sense to extend
migrate_vma_*() as that already allows migration of non-LRU pages and allows
page content to be copied.

I'm hoping to extend that in the near(ish) future to support large non-LRU
folios (ie. for (m)THP and THP). Part of the difficulty there is figuring out
what the API should look like as an array of PAGE_SIZE PFNs does not really
scale.

 - Alistair

> Best Regards,
> Yan, Zi
> 



* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-13  4:18       ` Alistair Popple
@ 2025-01-13  4:56         ` Matthew Wilcox
  0 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2025-01-13  4:56 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Zi Yan, David Hildenbrand, linux-mm, lsf-pc

On Mon, Jan 13, 2025 at 03:18:23PM +1100, Alistair Popple wrote:
> I'm hoping to extend that in the near(ish) future to support large non-LRU
> folios (ie. for (m)THP and THP). Part of the difficulty there is figuring out
> what the API should look like as an array of PAGE_SIZE PFNs does not really
> scale.

I invite you to consider the encoding that I'm going to have to name
soon:

https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder

Not that I'm certain an array is the right answer.



* Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
  2025-01-07 16:11 [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world David Hildenbrand
  2025-01-07 16:48 ` Zi Yan
  2025-01-07 16:49 ` Matthew Wilcox
@ 2025-03-24 18:56 ` David Hildenbrand
  2 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-03-24 18:56 UTC (permalink / raw)
  To: linux-mm; +Cc: lsf-pc

On 07.01.25 17:11, David Hildenbrand wrote:
> Hi,
> 
> one item on my todo list is making PageOffline pages to stop using
> "struct page" members except page->type and 1/2 flags, to prepare them
> for the memdesc future, to avoid unnecessary atomics, and to resolve
> some (so-far) theoretical issues with temporary speculative references.
> 
> For example, the page->_refcount will always be 0 (frozen) for
> PageOffline pages, and they will get allocated/freed similar to how we
> allocate/free frozen pages for slab already. Once we move the refcount
> into "struct folio", they will not have a refcount at all anymore.
> 
> One complication is balloon compaction: we allow for migrating
> PageOffline pages allocated in some memory ballooning implementations
> such as virtio-balloon.
> 
> For that, we use the "non-lru page migration" framework and in that
> process we make use of ... way too many members of "struct page"/"struct
> folio" and rely on the refcount not being 0. For example, we certainly
> don't want to allocate memdescs for PageOffline pages just so some of
> them can be migrated.
> 
> While we converted non-lru page migration to work on folios (i.e.,
> folio_movable_ops()) these things are not actually "folios" in the
> future, they can have different memdescs.
> 
> So, how can we migrate non-lru things that are not folios while not
> relying on "struct folio" members, with minimal/no metadata overhead?
> 
> I have some ideas, but no complete solution yet; input about the
> requirements of other non-lru page migration use cases besides
> PageOffline will be interesting.
> 
> ... and maybe, we have other non-folio things we'd want to migrate, and
> want to be prepared to handle them as well? (hint: leaf page tables?)
> 

Slides from today:

https://drive.google.com/file/d/1NIjjwVAonz9WWIoJZ0nh71ovMfSjfqFc/view?usp=sharing

-- 
Cheers,

David / dhildenb


