From: Aaron Thompson
To: Mike Rapoport
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Aaron Thompson
Subject: [PATCH 1/1] mm: Defer freeing reserved pages in memblock_free_late()
Date: Mon, 6 Feb 2023 07:12:10 +0000
Message-Id: <20230206071211.3157-2-dev@aaront.org>
In-Reply-To: <20230206071211.3157-1-dev@aaront.org>
References: <20230206071211.3157-1-dev@aaront.org>

Commit 115d9d77bb0f ("mm: Always release pages to the buddy allocator in
memblock_free_late().") introduced a bug.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying to
coalesce buddies. This can, for example, trigger this BUG:

  BUG: unable to handle page fault for address: ffffe964c02580c8
  RIP: 0010:__list_del_entry_valid+0x3f/0x70
   __free_one_page+0x139/0x410
   __free_pages_ok+0x21d/0x450
   memblock_free_late+0x8c/0xb9
   efi_free_boot_services+0x16b/0x25c
   efi_enter_virtual_mode+0x403/0x446
   start_kernel+0x678/0x714
   secondary_startup_64_no_verify+0xd2/0xdb

Instead of freeing such pages immediately, remove the range from
memblock.reserved. This causes the deferred init process to treat it as
a range of free pages, which means they will be initialized and freed by
deferred_init_maxorder().

Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson
---
 mm/internal.h                     |  2 ++
 mm/memblock.c                     | 36 ++++++++++++++++++++-----------
 mm/page_alloc.c                   | 17 +++++++++++++++
 tools/testing/memblock/internal.h |  7 +++---
 4 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..48d87f334f8c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -358,6 +358,8 @@ extern void __putback_isolated_page(struct page *page, unsigned int order,
 					int mt);
 extern void memblock_free_pages(struct page *page, unsigned long pfn,
 					unsigned int order);
+extern void memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+					unsigned int order);
 extern void __free_pages_core(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern void post_alloc_hook(struct page *page, unsigned int order,
diff --git a/mm/memblock.c b/mm/memblock.c
index 685e30e6d27c..8f65ea3533c6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -160,6 +160,7 @@ static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;
 static int memblock_reserved_in_slab __initdata_memblock = 0;
+static bool memblock_discard_called __initdata = false;
 
 static enum memblock_flags __init_memblock choose_memblock_flags(void)
 {
@@ -366,6 +367,8 @@ void __init memblock_discard(void)
 {
 	phys_addr_t addr, size;
 
+	memblock_discard_called = true;
+
 	if (memblock.reserved.regions != memblock_reserved_init_regions) {
 		addr = __pa(memblock.reserved.regions);
 		size = PAGE_ALIGN(sizeof(struct memblock_region) *
@@ -1620,13 +1623,16 @@ void * __init memblock_alloc_try_nid(
 }
 
 /**
- * memblock_free_late - free pages directly to buddy allocator
- * @base: phys starting address of the  boot memory block
+ * memblock_free_late - free boot memory block after memblock_free_all() has run
+ * @base: phys starting address of the boot memory block
  * @size: size of the boot memory block in bytes
  *
- * This is only useful when the memblock allocator has already been torn
- * down, but we are still initializing the system. Pages are released directly
- * to the buddy allocator.
+ * Free boot memory block previously allocated or reserved via memblock APIs.
+ * This function is to be used after memblock_free_all() has run (prior to that,
+ * use memblock_free()/memblock_phys_free()). Pages will be released to the
+ * buddy allocator, either immediately or as part of deferred page
+ * initialization. The block will also be removed from the reserved regions if
+ * memblock_discard() has not yet run.
  */
 void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 {
@@ -1640,15 +1646,21 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		/*
-		 * Reserved pages are always initialized by the end of
-		 * memblock_free_all() (by memmap_init() and, if deferred
-		 * initialization is enabled, memmap_init_reserved_pages()), so
-		 * these pages can be released directly to the buddy allocator.
-		 */
-		__free_pages_core(pfn_to_page(cursor), 0);
+		memblock_free_reserved_pages(pfn_to_page(cursor), cursor, 0);
 		totalram_pages_inc();
 	}
+
+	if (!memblock_discard_called)
+		/*
+		 * Also remove the range from memblock.reserved. If deferred
+		 * page init is enabled, memblock_free_reserved_pages() does not
+		 * free pages that are in the deferred range, but because the
+		 * range is no longer reserved, deferred init will initialize
+		 * and free the pages. Note that such pages will be initialized
+		 * twice, first by memmap_init_reserved_pages() and again by
+		 * deferred_init_maxorder().
+		 */
+		memblock_remove_range(&memblock.reserved, base, size);
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..4583215bfe3a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1813,6 +1813,23 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
 	__free_pages_core(page, order);
 }
 
+void __init memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+					 unsigned int order)
+{
+	/*
+	 * All reserved pages have been initialized at this point by either
+	 * memmap_init() or memmap_init_reserved_pages(), but if the pages to
+	 * be freed are in the deferred init range (which is what
+	 * early_page_uninitialised() checks), freeing them now could result
+	 * in __free_one_page() accessing nearby uninitialized pages when it
+	 * tries to coalesce buddies. They will be freed as part of deferred
+	 * init instead.
+	 */
+	if (early_page_uninitialised(pfn))
+		return;
+	__free_pages_core(page, order);
+}
+
 /*
  * Check that the whole (or subset of) a pageblock given by the interval of
  * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
diff --git a/tools/testing/memblock/internal.h b/tools/testing/memblock/internal.h
index 85973e55489e..524d93e71bee 100644
--- a/tools/testing/memblock/internal.h
+++ b/tools/testing/memblock/internal.h
@@ -15,12 +15,13 @@ bool mirrored_kernelcore = false;
 
 struct page {};
 
-void __free_pages_core(struct page *page, unsigned int order)
+void memblock_free_pages(struct page *page, unsigned long pfn,
+			 unsigned int order)
 {
 }
 
-void memblock_free_pages(struct page *page, unsigned long pfn,
-			 unsigned int order)
+void memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+				  unsigned int order)
 {
 }
 
-- 
2.30.2
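
Illustrative note, not part of the patch: the control flow described in
the commit message can be modeled in user space. What follows is a
minimal sketch under loud assumptions. struct toy_mm and every toy_*
helper are hypothetical names invented here, not kernel APIs, and page
state is reduced to two boolean arrays. It only tries to show the
decision the patch makes: free a late-released page immediately when it
lies below the deferred range, otherwise just drop its reservation so
that the deferred pass frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Toy model only: nothing here is a kernel type or function. */
struct toy_mm {
	bool reserved[TOY_NPAGES]; /* stands in for memblock.reserved */
	bool freed[TOY_NPAGES];    /* page was handed to the "buddy allocator" */
	size_t first_deferred;     /* pfns at or above this are initialized late */
	bool discard_called;       /* plays the role of memblock_discard_called */
};

void toy_init(struct toy_mm *mm, size_t first_deferred)
{
	for (size_t pfn = 0; pfn < TOY_NPAGES; pfn++) {
		mm->reserved[pfn] = false;
		mm->freed[pfn] = false;
	}
	mm->first_deferred = first_deferred;
	mm->discard_called = false;
}

/* Mark [start, end) as reserved. */
void toy_reserve(struct toy_mm *mm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	for (size_t pfn = start; pfn < end; pfn++)
		mm->reserved[pfn] = true;
}

/* Rough analogue of the early_page_uninitialised() check. */
bool toy_in_deferred_range(const struct toy_mm *mm, size_t pfn)
{
	return pfn >= mm->first_deferred;
}

/* Rough analogue of the patched memblock_free_late() on [start, end). */
void toy_free_late(struct toy_mm *mm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);

	/* Free now only when the neighbours are known to be initialized. */
	for (size_t pfn = start; pfn < end; pfn++)
		if (!toy_in_deferred_range(mm, pfn))
			mm->freed[pfn] = true;

	/* Otherwise rely on the deferred pass: just drop the reservation. */
	if (!mm->discard_called)
		for (size_t pfn = start; pfn < end; pfn++)
			mm->reserved[pfn] = false;
}

/* Deferred pass: frees every page in the deferred range not reserved. */
void toy_deferred_init(struct toy_mm *mm)
{
	for (size_t pfn = mm->first_deferred; pfn < TOY_NPAGES; pfn++)
		if (!mm->reserved[pfn])
			mm->freed[pfn] = true;
	mm->first_deferred = TOY_NPAGES; /* nothing is deferred any more */
}
```

With first_deferred set to 8, calling toy_free_late() on a reserved
range below pfn 8 frees it at once, while a reserved range at or above
pfn 8 stays unfreed until toy_deferred_init() runs. The real code also
reinitializes the struct pages during deferred init, which this model
leaves out.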