linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
	xen-devel@lists.xenproject.org, kasan-dev@googlegroups.com,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Oleksandr Tyshchenko" <oleksandr_tyshchenko@epam.com>,
	"Alexander Potapenko" <glider@google.com>,
	"Marco Elver" <elver@google.com>,
	"Dmitry Vyukov" <dvyukov@google.com>
Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
Date: Mon, 10 Jun 2024 10:56:02 +0200	[thread overview]
Message-ID: <5d9583e1-3374-437d-8eea-6ab1e1400a30@redhat.com> (raw)
In-Reply-To: <ZmZ_3Xc7fdrL1R15@localhost.localdomain>

On 10.06.24 06:23, Oscar Salvador wrote:
> On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
>> We currently initialize the memmap such that PG_reserved is set and the
>> refcount of the page is 1. In virtio-mem code, we have to manually clear
>> that PG_reserved flag to make memory offlining with partially hotplugged
>> memory blocks possible: has_unmovable_pages() would otherwise bail out on
>> such pages.
>>
>> We want to avoid PG_reserved where possible and move to typed pages
>> instead. Further, we want to further enlighten memory offlining code about
>> PG_offline: offline pages in an online memory section. One example is
>> handling managed page count adjustments in a cleaner way during memory
>> offlining.
>>
>> So let's initialize the pages with PG_offline instead of PG_reserved.
>> generic_online_page()->__free_pages_core() will now clear that flag before
>> handing that memory to the buddy.
>>
>> Note that the page refcount is still 1 and would forbid offlining of such
>> memory except when special care is take during GOING_OFFLINE as
>> currently only implemented by virtio-mem.
>>
>> With this change, we can now get non-PageReserved() pages in the XEN
>> balloon list. From what I can tell, that can already happen via
>> decrease_reservation(), so that should be fine.
>>
>> HV-balloon should not really observe a change: partial online memory
>> blocks still cannot get surprise-offlined, because the refcount of these
>> PageOffline() pages is 1.
>>
>> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
>> hotplugged pages are now PageOffline() instead of PageReserved() before
>> they are handed over to the buddy.
>>
>> We'll leave the ZONE_DEVICE case alone for now.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 27e3be75edcf7..0254059efcbe1 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
>>   /*
>>    * Associate the pfn range with the given zone, initializing the memmaps
>>    * and resizing the pgdat/zone data to span the added pages. After this
>> - * call, all affected pages are PG_reserved.
>> + * call, all affected pages are PageOffline().
>>    *
>>    * All aligned pageblocks are initialized to the specified migratetype
>>    * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
>> @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>>   
>>   	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>>   
>> -	for (i = 0; i < nr_pages; i++)
>> -		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
>> +	for (i = 0; i < nr_pages; i++) {
>> +		struct page *page = pfn_to_page(pfn + i);
>> +
>> +		__ClearPageOffline(page);
>> +		SetPageVmemmapSelfHosted(page);
> 
> So, refresh my memory here please.
> AFAIR, those VmemmapSelfHosted pages were marked Reserved before, but now,
> memmap_init_range() will not mark them reserved anymore.

Correct.

> I do not think that is ok? I am worried about walkers getting this wrong.
> 
> We usually skip PageReserved pages in walkers because are pages we cannot deal
> with for those purposes, but with this change, we will leak
> PageVmemmapSelfHosted, and I am not sure whether are ready for that.

There are fortunately not that many left.

I'd even say marking them (vmemmap) reserved is more wrong than right: 
note that ordinary vmemmap pages after memory hotplug are not reserved! 
Only bootmem should be reserved.

Let's take at the relevant core-mm ones (arch stuff is mostly just for 
MMIO remapping)

fs/proc/task_mmu.c:     if (PageReserved(page))
fs/proc/task_mmu.c:     if (PageReserved(page))

-> If we find vmemmap pages mapped into user space we already messed up
    seriously

kernel/power/snapshot.c:        if (PageReserved(page) ||
kernel/power/snapshot.c:        if (PageReserved(page)

-> There should be no change (saveable_page() would still allow saving
    them, highmem does not apply)

mm/hugetlb_vmemmap.c:           if (!PageReserved(head))
mm/hugetlb_vmemmap.c:   if (PageReserved(page))

-> Wants to identify bootmem, but we exclude these
    PageVmemmapSelfHosted() on the splitting part already properly


mm/page_alloc.c:                VM_WARN_ON_ONCE(PageReserved(p));
mm/page_alloc.c:                if (PageReserved(page))

-> pfn_range_valid_contig() would scan them, just like for ordinary
    vmemmap pages during hotplug. We'll simply fail isolating/migrating
    them similarly (like any unmovable allocations) later

mm/page_ext.c:          BUG_ON(PageReserved(page));

-> free_page_ext handling, does not apply

mm/page_isolation.c:            if (PageReserved(page))

-> has_unmovable_pages() should still detect them as unmovable (e.g.,
    neither movable nor LRU).

mm/page_owner.c:                        if (PageReserved(page))
mm/page_owner.c:                        if (PageReserved(page))

-> Simply page_ext_get() will return NULL instead and we'll similarly
    skip them

mm/sparse.c:            if (!PageReserved(virt_to_page(ms->usage))) {

-> Detecting boot memory for ms->usage allocation, does not apply to
    vmemmap.

virt/kvm/kvm_main.c:    if (!PageReserved(page))
virt/kvm/kvm_main.c:    return !PageReserved(page);

-> For MMIO remapping purposes, does not apply to vmemmap


> Moreover, boot memmap pages are marked as PageReserved, which would be
> now inconsistent with those added during hotplug operations.

Just like vmemmap pages allocated dynamically during memory hotplug. 
Now, really only bootmem-ones are PageReserved.

> All in all, I feel uneasy about this change.

I really don't want to mark these pages here PageReserved for the sake 
of it.

Any PageReserved user that I am missing, or why we should handle these 
vmemmap pages differently than the ones allocated during ordinary memory 
hotplug?

In the future, we might want to consider using a dedicated page type for 
them, so we can stop using a bit that doesn't allow to reliably identify 
them. (we should mark all vmemmap with that type then)

Thanks!

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-06-10  8:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-07  9:09 [PATCH v1 0/3] mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE David Hildenbrand
2024-06-07  9:09 ` [PATCH v1 1/3] mm: pass meminit_context to __free_pages_core() David Hildenbrand
2024-06-07 18:40   ` David Hildenbrand
2024-06-10  4:03   ` Oscar Salvador
     [not found]     ` <13070847-4129-490c-b228-2e52bd77566a@redhat.com>
2024-06-10 11:47       ` Oscar Salvador
2024-06-11 10:06   ` David Hildenbrand
2024-06-11 19:19     ` Andrew Morton
2024-06-11 19:20       ` David Hildenbrand
2024-06-12 18:38       ` David Hildenbrand
2024-06-11 19:41   ` Tim Chen
2024-06-11 19:50     ` David Hildenbrand
2024-06-07  9:09 ` [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved() David Hildenbrand
2024-06-10  4:23   ` Oscar Salvador
2024-06-10  8:56     ` David Hildenbrand [this message]
2024-06-11  7:45       ` Oscar Salvador
2024-06-11  8:04         ` David Hildenbrand
2024-06-11  8:01   ` Oscar Salvador
2024-06-11  9:42   ` David Hildenbrand
2024-06-11 19:36     ` Andrew Morton
2024-06-07  9:09 ` [PATCH v1 3/3] mm/memory_hotplug: skip adjust_managed_page_count() for PageOffline() pages when offlining David Hildenbrand
2024-06-10  4:29   ` Oscar Salvador
2024-06-10  8:56     ` David Hildenbrand
2024-06-25 22:43 ` [PATCH v1 0/3] mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE Andrew Morton
2024-06-26  5:01   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d9583e1-3374-437d-8eea-6ab1e1400a30@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=decui@microsoft.com \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=eperezma@redhat.com \
    --cc=glider@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=jasowang@redhat.com \
    --cc=jgross@suse.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mst@redhat.com \
    --cc=oleksandr_tyshchenko@epam.com \
    --cc=osalvador@suse.de \
    --cc=rppt@kernel.org \
    --cc=sstabellini@kernel.org \
    --cc=virtualization@lists.linux.dev \
    --cc=wei.liu@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox