From: David Rientjes <rientjes@google.com>
To: Aaron Thompson <dev@aaront.org>
Cc: linux-mm@kvack.org, Mike Rapoport <rppt@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Alexander Potapenko <glider@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Shevchenko <andy@infradead.org>,
Ard Biesheuvel <ardb@kernel.org>, Borislav Petkov <bp@alien8.de>,
Darren Hart <dvhart@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dmitry Vyukov <dvyukov@google.com>,
Ingo Molnar <mingo@redhat.com>, Marco Elver <elver@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
kasan-dev@googlegroups.com, linux-efi@vger.kernel.org,
linux-kernel@vger.kernel.org,
platform-driver-x86@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 0/1] Pages not released from memblock to the buddy allocator
Date: Wed, 4 Jan 2023 17:43:51 -0800 (PST) [thread overview]
Message-ID: <30478b4a-870b-bf48-76d0-a236a40e7674@google.com> (raw)
In-Reply-To: <010101857bbc3a41-173240b3-9064-42ef-93f3-482081126ec2-000000@us-west-2.amazonses.com>
On Wed, 4 Jan 2023, Aaron Thompson wrote:
> Hi all,
>
> (I've CC'ed the KMSAN and x86 EFI maintainers as an FYI; the only code change
> I'm proposing is in memblock.)
>
> I've run into a case where pages are not released from memblock to the buddy
> allocator. If deferred struct page init is enabled, and memblock_free_late() is
> called before page_alloc_init_late() has run, and the pages being freed are in
> the deferred init range, then the pages are never released. memblock_free_late()
> calls memblock_free_pages() which only releases the pages if they are not in the
> deferred range. That is correct for free pages because they will be initialized
> and released by page_alloc_init_late(), but memblock_free_late() is dealing with
> reserved pages. If memblock_free_late() doesn't release those pages, they will
> forever be reserved. All reserved pages were initialized by memblock_free_all(),
> so I believe the fix is to simply have memblock_free_late() call
> __free_pages_core() directly instead of memblock_free_pages().
>
> In addition, there was a recent change (3c20650982609 "init: kmsan: call KMSAN
> initialization routines") that added a call to kmsan_memblock_free_pages() in
> memblock_free_pages(). It looks to me like it would also be incorrect to make
> that call in the memblock_free_late() case, because the KMSAN metadata was
> already initialized for all reserved pages by kmsan_init_shadow(), which runs
> before memblock_free_all(). Having memblock_free_late() call __free_pages_core()
> directly also fixes this issue.
>
> I encountered this issue when I tried to switch some x86_64 VMs I was running
> from BIOS boot to EFI boot. The x86 EFI code reserves all EFI boot services
> ranges via memblock_reserve() (part of setup_arch()), and it frees them later
> via memblock_free_late() (part of efi_enter_virtual_mode()). The EFI
> implementation of the VM I was attempting this on, an Amazon EC2 t3.micro
> instance, maps north of 170 MB in boot services ranges that happen to fall in
> the deferred init range. I certainly noticed when that much memory went missing
> on a 1 GB VM.
>
> I've tested the patch on EC2 instances, qemu/KVM VMs with OVMF, and some real
> x86_64 EFI systems, and they all look good to me. However, the physical systems
> that I have don't actually trigger this issue because they all have more than 4
> GB of RAM, so their deferred init range starts above 4 GB (it's always in the
> highest zone and ZONE_DMA32 ends at 4 GB) while their EFI boot services mappings
> are below 4 GB.
>
> Deferred struct page init can't be enabled on x86_32 so those systems are
> unaffected. I haven't found any other code paths that would trigger this issue,
> though I can't promise that there aren't any. I did run with this patch on an
> arm64 VM as a sanity check, but memblock=debug didn't show any calls to
> memblock_free_late() so that system was unaffected as well.
>
> I am guessing that this change should also go the stable kernels but it may not
> apply cleanly (__free_pages_core() was __free_pages_boot_core() and
> memblock_free_pages() was __free_pages_bootmem() when this issue was first
> introduced). I haven't gone through that process before so please let me know if
> I can help with that.
>
> This is the end result on an EC2 t3.micro instance booting via EFI:
>
> v6.2-rc2:
> # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
> Node 0, zone DMA
> spanned 4095
> present 3999
> managed 3840
> Node 0, zone DMA32
> spanned 246652
> present 245868
> managed 178867
>
> v6.2-rc2 + patch:
> # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
> Node 0, zone DMA
> spanned 4095
> present 3999
> managed 3840
> Node 0, zone DMA32
> spanned 246652
> present 245868
> managed 222816
>
The above before + after seems useful information to include in the commit
description of the change.
next prev parent reply other threads:[~2023-01-05 1:43 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-04 7:43 Aaron Thompson
2023-01-05 1:43 ` David Rientjes [this message]
2023-01-05 4:17 ` [PATCH v2 " Aaron Thompson
[not found] ` <20230105041650.1485-1-dev@aaront.org>
2023-01-05 4:17 ` [PATCH v2 1/1] mm: Always release pages to the buddy allocator in memblock_free_late() Aaron Thompson
2023-01-05 10:48 ` Ingo Molnar
2023-01-06 2:02 ` Aaron Thompson
2023-01-06 3:12 ` Ingo Molnar
2023-01-30 7:40 ` Xu Yu
2023-01-30 7:47 ` Xu Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=30478b4a-870b-bf48-76d0-a236a40e7674@google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=andy@infradead.org \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dev@aaront.org \
--cc=dvhart@infradead.org \
--cc=dvyukov@google.com \
--cc=elver@google.com \
--cc=glider@google.com \
--cc=hpa@zytor.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=platform-driver-x86@vger.kernel.org \
--cc=rppt@kernel.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox