From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
xen-devel@lists.xenproject.org, kasan-dev@googlegroups.com,
"Andrew Morton" <akpm@linux-foundation.org>,
"Mike Rapoport" <rppt@kernel.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Wei Liu" <wei.liu@kernel.org>,
"Dexuan Cui" <decui@microsoft.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Juergen Gross" <jgross@suse.com>,
"Stefano Stabellini" <sstabellini@kernel.org>,
"Oleksandr Tyshchenko" <oleksandr_tyshchenko@epam.com>,
"Alexander Potapenko" <glider@google.com>,
"Marco Elver" <elver@google.com>,
"Dmitry Vyukov" <dvyukov@google.com>
Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
Date: Mon, 10 Jun 2024 10:56:02 +0200
Message-ID: <5d9583e1-3374-437d-8eea-6ab1e1400a30@redhat.com>
In-Reply-To: <ZmZ_3Xc7fdrL1R15@localhost.localdomain>
On 10.06.24 06:23, Oscar Salvador wrote:
> On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
>> We currently initialize the memmap such that PG_reserved is set and the
>> refcount of the page is 1. In virtio-mem code, we have to manually clear
>> that PG_reserved flag to make memory offlining with partially hotplugged
>> memory blocks possible: has_unmovable_pages() would otherwise bail out on
>> such pages.
>>
>> We want to avoid PG_reserved where possible and move to typed pages
>> instead. Moreover, we want to further enlighten memory offlining code about
>> PG_offline: offline pages in an online memory section. One example is
>> handling managed page count adjustments in a cleaner way during memory
>> offlining.
>>
>> So let's initialize the pages with PG_offline instead of PG_reserved.
>> generic_online_page()->__free_pages_core() will now clear that flag before
>> handing that memory to the buddy.
>>
>> Note that the page refcount is still 1 and would forbid offlining of such
>> memory except when special care is taken during GOING_OFFLINE, as
>> currently only implemented by virtio-mem.
>>
>> With this change, we can now get non-PageReserved() pages in the XEN
>> balloon list. From what I can tell, that can already happen via
>> decrease_reservation(), so that should be fine.
>>
>> HV-balloon should not really observe a change: partial online memory
>> blocks still cannot get surprise-offlined, because the refcount of these
>> PageOffline() pages is 1.
>>
>> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
>> hotplugged pages are now PageOffline() instead of PageReserved() before
>> they are handed over to the buddy.
>>
>> We'll leave the ZONE_DEVICE case alone for now.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 27e3be75edcf7..0254059efcbe1 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
>> /*
>> * Associate the pfn range with the given zone, initializing the memmaps
>> * and resizing the pgdat/zone data to span the added pages. After this
>> - * call, all affected pages are PG_reserved.
>> + * call, all affected pages are PageOffline().
>> *
>> * All aligned pageblocks are initialized to the specified migratetype
>> * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
>> @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>>
>> move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>>
>> - for (i = 0; i < nr_pages; i++)
>> - SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
>> + for (i = 0; i < nr_pages; i++) {
>> + struct page *page = pfn_to_page(pfn + i);
>> +
>> + __ClearPageOffline(page);
>> + SetPageVmemmapSelfHosted(page);
>
> So, refresh my memory here please.
> AFAIR, those VmemmapSelfHosted pages were marked Reserved before, but now,
> memmap_init_range() will not mark them reserved anymore.
Correct.
> I do not think that is ok? I am worried about walkers getting this wrong.
>
> We usually skip PageReserved pages in walkers because they are pages we cannot
> deal with for those purposes, but with this change, we will leak
> PageVmemmapSelfHosted pages, and I am not sure whether we are ready for that.
There are fortunately not that many left.
I'd even say marking them (vmemmap) reserved is more wrong than right:
note that ordinary vmemmap pages after memory hotplug are not reserved!
Only bootmem should be reserved.
Let's take a look at the relevant core-mm ones (arch stuff is mostly just
for MMIO remapping):
fs/proc/task_mmu.c: if (PageReserved(page))
fs/proc/task_mmu.c: if (PageReserved(page))
-> If we find vmemmap pages mapped into user space we already messed up
seriously
kernel/power/snapshot.c: if (PageReserved(page) ||
kernel/power/snapshot.c: if (PageReserved(page)
-> There should be no change (saveable_page() would still allow saving
them, highmem does not apply)
mm/hugetlb_vmemmap.c: if (!PageReserved(head))
mm/hugetlb_vmemmap.c: if (PageReserved(page))
-> Wants to identify bootmem; we already properly exclude these
   PageVmemmapSelfHosted() pages on the splitting path
mm/page_alloc.c: VM_WARN_ON_ONCE(PageReserved(p));
mm/page_alloc.c: if (PageReserved(page))
-> pfn_range_valid_contig() would scan them, just like for ordinary
vmemmap pages during hotplug. We'll simply fail isolating/migrating
them similarly (like any unmovable allocations) later
mm/page_ext.c: BUG_ON(PageReserved(page));
-> free_page_ext handling, does not apply
mm/page_isolation.c: if (PageReserved(page))
-> has_unmovable_pages() should still detect them as unmovable (e.g.,
neither movable nor LRU).
mm/page_owner.c: if (PageReserved(page))
mm/page_owner.c: if (PageReserved(page))
-> page_ext_get() will simply return NULL instead, and we'll similarly
   skip them
mm/sparse.c: if (!PageReserved(virt_to_page(ms->usage))) {
-> Detecting boot memory for ms->usage allocation, does not apply to
vmemmap.
virt/kvm/kvm_main.c: if (!PageReserved(page))
virt/kvm/kvm_main.c: return !PageReserved(page);
-> For MMIO remapping purposes, does not apply to vmemmap
> Moreover, boot memmap pages are marked as PageReserved, which would be
> now inconsistent with those added during hotplug operations.
Just like the vmemmap pages allocated dynamically during ordinary memory
hotplug. Now really only the bootmem ones are PageReserved.
> All in all, I feel uneasy about this change.
I really don't want to mark these pages PageReserved here just for the
sake of it.
Any PageReserved user that I am missing, or why we should handle these
vmemmap pages differently than the ones allocated during ordinary memory
hotplug?
In the future, we might want to consider using a dedicated page type for
them, so we can stop relying on a bit that doesn't allow us to reliably
identify them. (We should then mark all vmemmap pages with that type.)
Thanks!
--
Cheers,
David / dhildenb