linux-mm.kvack.org archive mirror
* PageOffline: refcount, flags and memdesc
@ 2024-11-14 11:18 David Hildenbrand
  2024-11-14 15:23 ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2024-11-14 11:18 UTC
  To: linux-mm; +Cc: Matthew Wilcox

Hi,

I'm once again staring at PageOffline and wondering how we could
prepare it for the memdesc future, and whether we can remove the refcount handling.


Currently, we set PageOffline in the following cases (one nasty exception
below):

(a) Memory blocks get onlined, whereby we initialize all "struct pages"
     to PageOffline with a refcount of 1 in memmap_init_range(). These
     pages are expected to get onlined via generic_online_page() later.
     Drivers might decide to leave some of them offline, because they are
     not backed by actual memory in the hypervisor. Some drivers still use
     free_page() instead of generic_online_page().

(b) We allocate pages (alloc_page(), alloc_contig_pages(), ...) to
     logically offline them, whereby the buddy sets the refcount to 1 and
     the driver sets PageOffline manually afterwards.
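To make the two paths concrete, a minimal userspace sketch of the page
state they leave behind. struct toy_page and the toy_* helpers are
hypothetical stand-ins, not kernel APIs; only the refcount and the
PageOffline state described above are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of struct page: only the bits discussed here. */
struct toy_page {
	int refcount;	/* stands in for the page refcount */
	bool offline;	/* stands in for the PageOffline page type */
};

/*
 * Case (a): onlining a memory block initializes every page as
 * PageOffline with a refcount of 1 (cf. memmap_init_range()).
 */
static void toy_memmap_init_range(struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++) {
		pages[i].refcount = 1;
		pages[i].offline = true;
	}
}

/*
 * Case (b): a page handed out by the buddy already holds a refcount
 * of 1; the driver marks it PageOffline afterwards.
 */
static void toy_driver_offline_allocated(struct toy_page *page)
{
	assert(page->refcount == 1);	/* set by the (toy) buddy */
	page->offline = true;
}
```

Either way, the page ends up PageOffline with a refcount of 1, which is
exactly the state the rest of this mail would like to get rid of.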

We clear PageOffline in the following cases (one nasty exception below):

(a) We want to return a page to the buddy (free_page()/
     free_contig_range()). PageOffline is cleared by the driver, and
     freeing the page will decrement the refcount to 0.

(b) We want to expose a page to the buddy for the first time
     (generic_online_page()). We will force the refcount to 0.
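The difference between the two clearing paths, sketched in the same
hypothetical toy model (struct toy_page and the toy_* helpers are not
kernel APIs): one decrements the refcount from 1 to 0, the other simply
overwrites it with 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of struct page. */
struct toy_page {
	int refcount;
	bool offline;	/* PageOffline */
	bool in_buddy;	/* page is free in the (toy) buddy */
};

/*
 * Case (a): return a previously allocated page. The driver clears
 * PageOffline, then freeing drops the single reference to 0
 * (cf. free_page()).
 */
static void toy_free_page(struct toy_page *page)
{
	page->offline = false;		/* done by the driver */
	assert(page->refcount == 1);	/* freeing expects one reference */
	if (--page->refcount == 0)
		page->in_buddy = true;
}

/*
 * Case (b): first exposure to the buddy (cf. generic_online_page()).
 * The refcount is forced to 0 rather than decremented.
 */
static void toy_generic_online_page(struct toy_page *page)
{
	page->offline = false;
	page->refcount = 0;
	page->in_buddy = true;
}
```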

There are still subtle differences between onlining a page to the buddy
for the first time and freeing it, such as debug_pagealloc_map_pages() in
__free_pages_core(). I'm hoping we can get rid of them long-term, or
just abstract them internally.


I'd like to stop using the refcount for PageOffline pages, and always
keep the refcount at 0.

However, the refcount is currently used to detect whether we are allowed
to offline memory blocks that contain PageOffline pages, because only
selected drivers support re-onlining. It is also used when returning
the pages to the buddy, where free_page(), free_contig_range(), ...
expect a refcount of 1.
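Roughly how the refcount doubles as a "may be skipped when offlining"
marker today, as a toy sketch: a driver that supports re-onlining drops
its reference when notified that the block is going offline
(cf. MEM_GOING_OFFLINE), and offlining only skips PageOffline pages whose
refcount is then 0. struct toy_page and the toy_* helpers are
hypothetical; the real checks live in the page isolation code and are
more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of struct page. */
struct toy_page {
	int refcount;
	bool offline;	/* PageOffline */
};

/*
 * A driver that supports offlining + re-onlining drops its reference on
 * its PageOffline pages when the block is about to go offline.
 */
static void toy_driver_going_offline(struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++)
		if (pages[i].offline)
			pages[i].refcount--;
}

/*
 * Offlining may only skip PageOffline pages with a refcount of 0; any
 * PageOffline page still holding a reference blocks offlining.
 * Non-PageOffline pages (which would be migrated) are ignored here.
 */
static bool toy_can_offline_block(const struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++)
		if (pages[i].offline && pages[i].refcount > 0)
			return false;
	return true;
}
```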

Further, virtio-mem currently uses the PageDirty() bit to remember whether
a PageOffline page was already exposed to the buddy before, or whether we
must use generic_online_page().

For now, we would need the following information, which could be stored
in two flags, leaving the refcount at 0:

(1) Was the page obtained from the buddy, or was it never exposed to the
     buddy?

PageOffline() && PageOfflineNeverOnlined()

(2) Does the driver support actual memory offlining + re-onlining, such
     that the pages can be skipped when offlining?

PageOffline() && PageOfflineSkippable()
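A sketch of what the two flags could look like, again in a toy model.
The flag names, struct toy_page and the toy_* helpers are hypothetical
illustrations of the proposal, not existing kernel definitions. Note how
the offlining check consults a flag instead of the refcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags modeling PageOfflineNeverOnlined()/Skippable(). */
enum toy_offline_flags {
	TOY_OFFLINE_NEVER_ONLINED = 1u << 0,	/* never exposed to the buddy */
	TOY_OFFLINE_SKIPPABLE     = 1u << 1,	/* driver supports re-onlining */
};

struct toy_page {
	int refcount;		/* always 0 for PageOffline pages here */
	bool offline;		/* PageOffline */
	unsigned int flags;	/* enum toy_offline_flags */
};

/* PageOffline() && PageOfflineNeverOnlined() */
static bool toy_page_offline_never_onlined(const struct toy_page *p)
{
	return p->offline && (p->flags & TOY_OFFLINE_NEVER_ONLINED);
}

/* PageOffline() && PageOfflineSkippable() */
static bool toy_page_offline_skippable(const struct toy_page *p)
{
	return p->offline && (p->flags & TOY_OFFLINE_SKIPPABLE);
}

/*
 * Offlining consults the flag, not the refcount. Non-PageOffline pages
 * would be migrated instead (not modeled).
 */
static bool toy_can_offline_page(const struct toy_page *p)
{
	assert(!p->offline || p->refcount == 0);
	return !p->offline || toy_page_offline_skippable(p);
}
```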


But when allocating/freeing pages, we would still mess with the refcount,
which is bad.

We could have a dedicated interface for freeing them, which abstracts
the generic_online_page() bits and leaves the refcount at 0:

free_offline_page()
free_offline_page_range()

And

alloc_offline_page()
alloc_offline_page_range()
alloc_offline_pages()
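A toy sketch of the semantics such an interface might have: pages
leave and re-enter the buddy with the refcount untouched at 0, and the
"first time onlined" special case is handled inside the free path. The
names merely mirror the proposal; struct toy_page, the tiny array
"buddy" and toy_nr_first_onlined are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

enum { TOY_OFFLINE_NEVER_ONLINED = 1u << 0 };

struct toy_page {
	int refcount;		/* never touched by this interface */
	bool offline;		/* PageOffline */
	bool in_buddy;		/* free in the (toy) buddy */
	unsigned int flags;
};

static struct toy_page toy_mem[TOY_NR_PAGES];
static int toy_nr_first_onlined;	/* stand-in for first-online work */

/* Hypothetical alloc_offline_page(): PageOffline, refcount stays 0. */
static struct toy_page *toy_alloc_offline_page(void)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		struct toy_page *p = &toy_mem[i];

		if (p->in_buddy) {
			p->in_buddy = false;
			p->offline = true;
			p->flags = 0;	/* came from the buddy */
			return p;
		}
	}
	return NULL;
}

/*
 * Hypothetical free_offline_page(): return the page without touching
 * the refcount; the generic_online_page() bits are handled internally.
 */
static void toy_free_offline_page(struct toy_page *p)
{
	assert(p->offline && p->refcount == 0);
	if (p->flags & TOY_OFFLINE_NEVER_ONLINED)
		toy_nr_first_onlined++;
	p->offline = false;
	p->flags = 0;
	p->in_buddy = true;
}
```

Callers would no longer need to know whether a page was ever onlined;
that decision moves behind the free call.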

I'm not super happy about the "alloc/free" terminology, but nothing 
better came to mind.


There is one complication to sort out: balloon_compaction.h supports 
moving PageOffline pages, and seems to use the page lock, page refcount, 
page lru, page private... which is all rather nasty. I wonder if these 
should get their own page type, like PageMovableOffline, and we'd mostly 
leave them alone for now. This would mean that virtio-balloon, 
vmware-balloon and ppc CMM would keep doing the old refcount-based thing 
but with a new page type.


I assume this all goes in the direction of getting pages from the
buddy and returning them without refcounts ... thoughts?

-- 
Cheers,

David / dhildenb




end of thread, other threads:[~2024-11-14 20:45 UTC | newest]

Thread overview: 5+ messages
2024-11-14 11:18 PageOffline: refcount, flags and memdesc David Hildenbrand
2024-11-14 15:23 ` Matthew Wilcox
2024-11-14 15:55   ` David Hildenbrand
2024-11-14 20:29     ` Matthew Wilcox
2024-11-14 20:45       ` David Hildenbrand
