From: Aaron Thompson
To: Mike Rapoport, linux-mm@kvack.org
Cc: "H. Peter Anvin", Alexander Potapenko, Andrew Morton, Andy Shevchenko,
    Ard Biesheuvel, Borislav Petkov, Darren Hart, Dave Hansen,
    David Rientjes, Dmitry Vyukov, Ingo Molnar, Marco Elver,
    Thomas Gleixner, kasan-dev@googlegroups.com, linux-efi@vger.kernel.org,
    linux-kernel@vger.kernel.org, platform-driver-x86@vger.kernel.org,
    x86@kernel.org, Aaron Thompson
Subject: [PATCH v3 0/1] Pages not released from memblock to the buddy allocator
Date: Fri, 6 Jan 2023 22:22:39 +0000
Message-ID: <01010185892dd125-7738e4af-55c6-43b6-9cd9-d52dfea959d9-000000@us-west-2.amazonses.com>

Changelog:

v3:
  - Include the difference of managed pages in the commit message (suggested
    by Ingo Molnar)

v2:
  - Add comment in memblock_free_late() (suggested by Mike Rapoport)
  - Improve commit message, including an explanation of the x86_64 EFI boot
    issue (suggested by Mike Rapoport and David Rientjes)

Hi all,

(I've CC'ed the KMSAN and x86 EFI maintainers as an FYI; the only code change
I'm proposing is in memblock.)

I've run into a case where pages are not released from memblock to the buddy
allocator.
If deferred struct page init is enabled, and memblock_free_late() is called
before page_alloc_init_late() has run, and the pages being freed are in the
deferred init range, then the pages are never released. memblock_free_late()
calls memblock_free_pages() which only releases the pages if they are not in
the deferred range. That is correct for free pages because they will be
initialized and released by page_alloc_init_late(), but memblock_free_late()
is dealing with reserved pages. If memblock_free_late() doesn't release those
pages, they will forever be reserved. All reserved pages were initialized by
memblock_free_all(), so I believe the fix is to simply have
memblock_free_late() call __free_pages_core() directly instead of
memblock_free_pages().

In addition, there was a recent change (3c20650982609 "init: kmsan: call KMSAN
initialization routines") that added a call to kmsan_memblock_free_pages() in
memblock_free_pages(). It looks to me like it would also be incorrect to make
that call in the memblock_free_late() case, because the KMSAN metadata was
already initialized for all reserved pages by kmsan_init_shadow(), which runs
before memblock_free_all(). Having memblock_free_late() call
__free_pages_core() directly also fixes this issue.

I encountered this issue when I tried to switch some x86_64 VMs I was running
from BIOS boot to EFI boot. The x86 EFI code reserves all EFI boot services
ranges via memblock_reserve() (part of setup_arch()), and it frees them later
via memblock_free_late() (part of efi_enter_virtual_mode()). The EFI
implementation of the VM I was attempting this on, an Amazon EC2 t3.micro
instance, maps north of 170 MB in boot services ranges that happen to fall in
the deferred init range. I certainly noticed when that much memory went
missing on a 1 GB VM.

I've tested the patch on EC2 instances, qemu/KVM VMs with OVMF, and some real
x86_64 EFI systems, and they all look good to me.
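
To make the failure mode described above concrete, here is a small toy model
of the boot sequence. It is only a sketch and not kernel code: the boot()
function, its parameters, and the use of plain sets of page frame numbers are
all made up for illustration, and it assumes the late free happens before
page_alloc_init_late() and that deferred init only releases pages memblock
still considers free.

```python
def boot(total_pfns, first_deferred_pfn, reserved, late_freed, fixed):
    """Toy model: return the pfns that end up managed by the buddy allocator.

    All names here are hypothetical; this only mirrors the ordering of
    events described in the message, not the real kernel data structures.
    """
    managed = set()

    # memblock_free_all(): free (non-reserved) pages below the deferred
    # range are released now; deferred pages are left for later.
    for pfn in range(total_pfns):
        if pfn not in reserved and pfn < first_deferred_pfn:
            managed.add(pfn)

    # memblock_free_late(), called before page_alloc_init_late().
    for pfn in late_freed:
        if fixed:
            # Proposed behavior: call __free_pages_core() directly.
            managed.add(pfn)
        elif pfn < first_deferred_pfn:
            # Current behavior: memblock_free_pages() skips deferred pfns.
            managed.add(pfn)

    # page_alloc_init_late(): deferred pages are released only if they
    # were free (not reserved) in memblock.
    for pfn in range(first_deferred_pfn, total_pfns):
        if pfn not in reserved:
            managed.add(pfn)

    return managed


reserved = {2, 3, 10, 11}  # e.g. ranges reserved early and freed late
lost = set(range(16)) - boot(16, 8, reserved, sorted(reserved), fixed=False)
print(sorted(lost))  # the reserved pages that sat in the deferred range
```

In this model the pages at pfns 10 and 11 are reserved, fall in the deferred
range, and are never handed to the allocator unless fixed=True, which mirrors
the boot services ranges on the t3.micro instance.
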
However, the physical systems that I have don't actually trigger this issue
because they all have more than 4 GB of RAM, so their deferred init range
starts above 4 GB (it's always in the highest zone and ZONE_DMA32 ends at
4 GB) while their EFI boot services mappings are below 4 GB. Deferred struct
page init can't be enabled on x86_32 so those systems are unaffected. I
haven't found any other code paths that would trigger this issue, though I
can't promise that there aren't any. I did run with this patch on an arm64 VM
as a sanity check, but memblock=debug didn't show any calls to
memblock_free_late() so that system was unaffected as well.

I am guessing that this change should also go to the stable kernels, but it
may not apply cleanly (__free_pages_core() was __free_pages_boot_core() and
memblock_free_pages() was __free_pages_bootmem() when this issue was first
introduced). I haven't gone through that process before so please let me know
if I can help with that.

This is the end result on an EC2 t3.micro instance booting via EFI:

v6.2-rc2:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

v6.2-rc2 + patch:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816

Aaron Thompson (1):
  mm: Always release pages to the buddy allocator in
    memblock_free_late().

 mm/memblock.c                     | 8 +++++++-
 tools/testing/memblock/internal.h | 4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)

-- 
2.30.2
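
For reference, the two DMA32 "managed" counts above can be turned into an
amount of memory as follows. This is just a worked calculation, assuming the
usual 4 KiB base page size on x86_64; managed_delta() is a helper invented
for this note.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages on x86_64


def managed_delta(before_pages, after_pages, page_size=PAGE_SIZE):
    """Return (extra pages, extra MiB) between two 'managed' counts."""
    pages = after_pages - before_pages
    return pages, pages * page_size / 2**20


# DMA32 'managed' values from the two /proc/zoneinfo listings above.
pages, mib = managed_delta(178867, 222816)
print(f"{pages} pages, {mib:.1f} MiB")  # 43949 pages, 171.7 MiB
```

That is in line with the "north of 170 MB" of boot services ranges mentioned
earlier in this message.
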