Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world

From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world
Date: Tue, 7 Jan 2025 17:55:54 +0100
Message-ID: <e3bad2d0-ed98-4560-8a88-f42ebd9c78b6@redhat.com>
In-Reply-To: <5C9489ED-0B26-40EA-B47B-C034E7DACB2F@nvidia.com>

On 07.01.25 17:48, Zi Yan wrote:
> On 7 Jan 2025, at 11:11, David Hildenbrand wrote:
> 
>> Hi,
>>
>> one item on my todo list is making PageOffline pages stop using "struct page" members except page->type and one or two flags, to prepare them for the memdesc future, to avoid unnecessary atomics, and to resolve some (so far) theoretical issues with temporary speculative references.
>>
>> For example, the page->_refcount will always be 0 (frozen) for PageOffline pages, and they will get allocated/freed similar to how we allocate/free frozen pages for slab already. Once we move the refcount into "struct folio", they will not have a refcount at all anymore.
>>
>> One complication is balloon compaction: we allow for migrating PageOffline pages allocated in some memory ballooning implementations such as virtio-balloon.
>>
>> For that, we use the "non-lru page migration" framework and in that process we make use of ... way too many members of "struct page"/"struct folio" and rely on the refcount not being 0. For example, we certainly don't want to allocate memdescs for PageOffline pages just so some of them can be migrated.
> 
> Then the first thing is to make all get_new_folio functions aware of
> PageOffline pages and able to allocate a PageOffline page. IIUC, the current
> process is: 1) allocate a page from the buddy allocator, 2) offline the new
> page during mops->migrate_page() and online the old page. The inflation and
> deflation in step 2 look redundant if migrate_pages() can get PageOffline
> pages to begin with and put_page() can handle PageOffline pages too.

That might be one hacky way of handling offline pages, yes :)
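
For reference, the current two-step flow described above can be modeled
roughly as follows. This is only a userspace toy, not kernel code:
struct toy_page, enum toy_type, toy_alloc() and toy_migrate() are made-up
names for illustration, and the "buddy pool" is just an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page states; hypothetical, loosely mirroring buddy/in-use/PageOffline. */
enum toy_type { TOY_BUDDY, TOY_ALLOCATED, TOY_OFFLINE };

struct toy_page {
	enum toy_type type;
};

/* Step 1: take a free page from the toy buddy pool, or NULL if none is left. */
static struct toy_page *toy_alloc(struct toy_page *pool, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pool[i].type == TOY_BUDDY) {
			pool[i].type = TOY_ALLOCATED;
			return &pool[i];
		}
	}
	return NULL;
}

/*
 * Step 2: offline the new page (inflate) and online the old one (deflate),
 * handing the old page back to the pool. Returns false if the old page is
 * not offline or no destination page is available.
 */
static bool toy_migrate(struct toy_page *old, struct toy_page *pool, size_t n)
{
	struct toy_page *dst;

	if (old->type != TOY_OFFLINE)
		return false;
	dst = toy_alloc(pool, n);
	if (!dst)
		return false;
	dst->type = TOY_OFFLINE;	/* inflate the destination */
	old->type = TOY_BUDDY;		/* deflate and free the source */
	return true;
}
```

The intermediate TOY_ALLOCATED state is the part that would disappear if
the allocation side could hand out offline pages directly.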

(the isolation step is tricky: for example, with page->lru gone we 
cannot even put these things into a list! Also, there is page isolation ...)

I recall that the isolation step is required because we could have 
multiple parties trying to migrate the same page at the same time. So 
that must be handled as well.
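
A minimal userspace sketch of that exclusivity requirement, using C11
atomics. The names struct toy_page, toy_isolate() and toy_putback() are
hypothetical and not kernel APIs; the point is only that exactly one of
several concurrent callers may win the isolation step, without needing a
refcount or a list.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a movable non-lru page; NOT the kernel's struct page. */
struct toy_page {
	atomic_bool isolated;	/* hypothetical "isolated for migration" bit */
};

/* Returns true only for the single caller that wins the race. */
static bool toy_isolate(struct toy_page *p)
{
	/* atomic_exchange() returns the old value: false means we set it first. */
	return !atomic_exchange(&p->isolated, true);
}

/* Undo the isolation, e.g. after migration completed or was aborted. */
static void toy_putback(struct toy_page *p)
{
	atomic_store(&p->isolated, false);
}
```

A second toy_isolate() on the same page fails until toy_putback() has run,
which is the property any replacement for the current scheme would have to
keep.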

> 
>>
>> While we converted non-lru page migration to work on folios (i.e., folio_movable_ops()), these things will not actually be "folios" in the future; they can have different memdescs.
>>
>> So, how can we migrate non-lru things that are not folios while not relying on "struct folio" members, with minimal/no metadata overhead?
> 
> Like I said above, if migrate_pages() is aware of PageOffline pages by allocating
> and putting them like normal folios, that could work.
> 
> Or you can do what hugetlb migration does, adding a separate migrate_offlinepages()
> function to handle PageOffline pages. This probably can save you a lot of
> LRU page checks like mapping and locks, but it adds a special function. So
> tradeoffs.
> 
>>
>> I have some ideas, but no complete solution yet; input about the requirements of other non-lru page migration use cases besides PageOffline will be interesting.
>>
>> ... and maybe, we have other non-folio things we'd want to migrate, and want to be prepared to handle them as well? (hint: leaf page tables?)
> 
> If we have a dedicated allocator for non-folio things and make
> migrate_pages() aware of them, it should be doable.

Note that I thought about similar things as you describe above, but part
of the exercise will be to not focus only on PageOffline pages, and
instead have something more generic that can also handle pages with
actual page content that have to be properly isolated :)

-- 
Cheers,

David / dhildenb

Thread overview: 9+ messages
2025-01-07 16:11 David Hildenbrand
2025-01-07 16:48 ` Zi Yan
2025-01-07 16:55   ` David Hildenbrand [this message]
2025-01-07 17:27     ` Zi Yan
2025-01-13  4:18       ` Alistair Popple
2025-01-13  4:56         ` Matthew Wilcox
2025-01-07 16:49 ` Matthew Wilcox
2025-01-08  3:39   ` Zi Yan
2025-03-24 18:56 ` David Hildenbrand
