linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	virtualization@lists.linux.dev,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Brendan Jackman" <jackmanb@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: Re: [PATCH v1 1/2] mm/memory_hotplug: PG_offline_skippable for offlining memory blocks with PageOffline pages
Date: Mon, 19 May 2025 16:39:38 +0200	[thread overview]
Message-ID: <85fd0566-30c8-4e1e-8ce9-5b5bb4f1fa84@redhat.com> (raw)
In-Reply-To: <0F0BB46A-F7C2-40BE-A045-C65525956D52@nvidia.com>


>>> Some thoughts after reading the related code:
>>>
>>> offline_pages() is a little convoluted, since it has two steps to make
>>> sure a range of memory can be offlined:
>>> 1. start_isolate_page_range() checks unmovable (maybe not migratable
>>> is more precise) pages using has_unmovable_pages(), but leaves unmovable
>>> PageOffline() page handling to the driver;
>>
>> Right, it's best-effort. For ZONE_MOVABLE we're skipping the checks completely.
>>
>> I was wondering for a longer time that -- with the isolate flag being a separate bit soon :) -- we could simply isolate the whole range, and then fail if we stumble over
> 
> Talking about that, I will need your input on my change to move_pfn_range_to_zone()
> in online_pages()[1]. Making MIGRATE_ISOLATE a separate bit means if you isolate
> a pageblock without a migratetype, unisolating it will give an unpredictable
> migratetype.

Sorry for my late reply on that. I think, the rule should be that you 
initialize the migratetype once, before any isolation+un-isolation.

So that should only require care when adding new pageblocks (memory 
hotplug).

> 
> [1] https://lore.kernel.org/linux-mm/20250509200111.3372279-3-ziy@nvidia.com/
> 
>> an unmovable page during migration. That is, get rid of has_unmovable_pages() entirely. Un-doing the isolation would then properly preserve the migratetype -- no harm done?
>>
>> Certainly worth a look. What do you think about that?
> 
> In principle, the method should work and simplifies the code. But it suffers more
> penalty (pages are migrated) when an unmovable page is encountered after the
> isolation, since before nothing will be migrated. To mitigate this,
> we probably would want some retry mechanism. For example, register a callback
> to each unmovable page and once the unmovable page is deallocated,
> alloc_contig_range() can move forward progress.

I was wondering if we could make the isolation code a  bit simpler. For 
example, set all involved pageblocks isolated. If any is already 
isolated, we can easily back off.

Once all are isolated, we could do the has_unmovable_pages() on the 
whole range, not a single pageblock.

Then we could start migrating.

I think the "issue" is, once we drop the zone lock, pages that are 
getting freed end up on the MIGRATE_ISOLATE list -- and I think we also 
must have moved the free pages already to the proper MIGRATE_ISOLATE 
list before we drop the zone lock.

So the possible cleanups might be limited.

> 
>>
>>> 2. scan_movable_pages() and do_migrate_range() migrate pages and handle
>>> different types of PageOffline() pages.
>>
>> Right, migrate what we can, skip these special once, and bail out on any others (at least in this patch, patch #2 restores the pre-virtio-mem behavior).
>>
>>>
>>> It might make the logic cleaner if start_isolate_page_range() can
>>> have a callback to allow driver to handle PageOffline() pages.
>>
>> We have to identify them repeadetly, so start_isolate_page_range() would not be sufficient. Also, callbacks are rather tricky for the case where we cannot really stabilize the page.
> 
> During start_isolate_page_range(), all pageblocks are isolated, so
> free pages cannot be used by anyone else, meaning no new PageOffline()
> pages or any other unmovable pages, right?

Yes, that is correct. Pages (incl. PageOffline) pages could get freed 
back to the buddy.

But we don't want to store per-page callbacks either way.

What would work is a new notifier chain (protected by RCU etc), that we 
can ask for each PageOffline page.

Not sure I prefer that of the approach here that is significantly 
simpler -- and we do have the spare bit for PageOffline pages in the new 
memdec world (for PageOffline, we'll probably need 2/3 flags in the long 
run).

> 
>>
>>>
>>> Hmm, Skippable PageOffline() pages are virtio-mem specific, I wonder
>>> why offline_pages() takes care of them. Shouldn't virtio-mem driver
>>> handle them?
>>> I also realize that the two steps in offline_pages()> are very similar to alloc_contig_range() and virtio-mem is using
>>> alloc_contig_range(). I wonder if the two steps in offline_pages()
>>> can be abstracted out and shared with virtio-mem.Yes, offline_pages()> operates at memory section granularity, different from the granularity,
>>> pageblock size, of alloc_contig_range() used in virtio-mem, but
>>> both are trying to check unmovable pages and migrate movable pages.
>>
>> Not sure I get completely what you mean. virtio-mem really wants to use alloc_contig_range() to allocate ranges it wants to unplug, because it will fail fast and allows for smaller ranges.
>>
>> offline_pages() is primarily also about offlining the memory section, which virtio-mem must do in order to remove the Linux memory block.
> 
> To clarify, I mean the steps of start_isolate_page_range(), scan_movable_pages(),
> do_migrate_range(), dissolve_free_hugetlb_folios() and test_pages_isolated() in
> offline_pages() is very similar to the steps of start_isolate_page_range(),
> __alloc_contig_migrate_range(), replace_free_hugepage_folios(),
> and test_pages_isolate() in alloc_contig_range(). So I wonder if a common
> function routine can be shared.

Ah, yes, I had the same idea a couple of years back, but never got to it.

There are subtle differences (e.g., memory offlining keeps retrying and 
is allowed to dissolve free hugetlb folios), but these could likely be 
modified using a parameter (similar to how we already special-case 
offlining).


Thanks!

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2025-05-19 14:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-14 11:15 [PATCH v1 0/2] mm/memory_hotplug: introduce and use PG_offline_skippable David Hildenbrand
2025-05-14 11:15 ` [PATCH v1 1/2] mm/memory_hotplug: PG_offline_skippable for offlining memory blocks with PageOffline pages David Hildenbrand
2025-05-14 19:00   ` Zi Yan
2025-05-14 19:51     ` David Hildenbrand
2025-05-14 20:30       ` Zi Yan
2025-05-19 14:39         ` David Hildenbrand [this message]
2025-05-14 11:15 ` [PATCH v1 2/2] mm/memory_hotplug: remove -EBUSY handling from scan_movable_pages() David Hildenbrand
2025-05-14 19:57   ` David Hildenbrand
2025-05-14 13:45 ` [PATCH v1 0/2] mm/memory_hotplug: introduce and use PG_offline_skippable Zi Yan
2025-05-14 14:12   ` David Hildenbrand
2025-05-14 15:49     ` Zi Yan
2025-05-14 17:28       ` David Hildenbrand
2025-05-14 17:43         ` Zi Yan
2025-05-14 17:46           ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85fd0566-30c8-4e1e-8ce9-5b5bb4f1fa84@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=eperezma@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mst@redhat.com \
    --cc=osalvador@suse.de \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=virtualization@lists.linux.dev \
    --cc=willy@infradead.org \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox