From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3DD6FC61CE7 for ; Fri, 6 Jun 2025 16:23:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CA2B86B0089; Fri, 6 Jun 2025 12:23:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C52F86B008C; Fri, 6 Jun 2025 12:23:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B90156B0092; Fri, 6 Jun 2025 12:23:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 99B376B0089 for ; Fri, 6 Jun 2025 12:23:12 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 34BA01D0103 for ; Fri, 6 Jun 2025 16:23:12 +0000 (UTC) X-FDA: 83525495424.07.E6F02C5 Received: from sea.source.kernel.org (sea.source.kernel.org [172.234.252.31]) by imf11.hostedemail.com (Postfix) with ESMTP id 6A6BB40002 for ; Fri, 6 Jun 2025 16:23:10 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=sQPmZdLH; spf=pass (imf11.hostedemail.com: domain of pratyush@kernel.org designates 172.234.252.31 as permitted sender) smtp.mailfrom=pratyush@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749226990; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0+ln5P26Ak/j4ElmXSdvwPFgYvBe+Jc+lOmVACLTtk8=; b=xQ32Wh4YvIQSqJkkyMDO6JYgQyO1MVfX+LM8y+oBhNoCb05NRtMDTuRrmEyWCKXLt8aePw rnP0Lu1JYBh6h4zv9EFUQ/u4QsDcjJUlIu4GMBYh7KAyeudr/+uxloJ/Vl74DD/XpiNAwb i9XCRa1G0y2su0yODBIbjRwB3JRHsJ4= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=sQPmZdLH; spf=pass (imf11.hostedemail.com: domain of pratyush@kernel.org designates 172.234.252.31 as permitted sender) smtp.mailfrom=pratyush@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749226990; a=rsa-sha256; cv=none; b=W35n/nt2Pj5+Zz+TAd+EDKRJac3PsDa1p9Xt8Es66Ah+hkeJHLmnQ7OhSOEMVsiRdJuFuy p0EUPkdnOLISbZ5sleZCqBDEYmKzrFc4jSNNwwDzf8vrIixUnebSCwNVfdxYEjgTb9jm8/ qytW5Gy7yJyG3Orjh4KzcYIN2f7jV+A= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sea.source.kernel.org (Postfix) with ESMTP id 1BED943779; Fri, 6 Jun 2025 16:23:09 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3D92EC4CEEB; Fri, 6 Jun 2025 16:23:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1749226989; bh=6rhc/ZDVVnpQpTCIp4jX95hAhu4FDuDrKeo8BKnWZ3Q=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=sQPmZdLHLDPhq6jJ5XL+69EoRnhjMkxvVY02bXGdIzR8rAJFIDjMWNkcQ3mzkVyOw SWxGsxofSa1kUHpBTyAahAd6/+RXffUuxRDtPig2b8G8kbHEkUiR1Qr63ZZnKQFXWp VAj7YZHGU4wHItd2HmgNa1LocUE1ghegfrqy/rimkd/ZEXb8cOx6zolEvp0+LqGfZp vSc8M16uHz7Gmqq8D6pZ46l0S0t5aSJB6s4Rn0vDS6FwBEYP91i6zYcFTme3lt15P4 6AyWwRzzV2jZWdUmZKzBAD5ca/1SaODXVN5v6cWCo9cjjjGWCNqikn9KbpmDPQ6BHO gqFSAGnA4eMjA== From: Pratyush Yadav To: Mike Rapoport Cc: Pratyush Yadav , Alexander Graf , Changyuan Lyu , Pasha Tatashin , Andrew Morton , Baoquan He , kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] kho: initialize tail pages for higher order folios properly In-Reply-To: References: <20250605171143.76963-1-pratyush@kernel.org> Date: Fri, 06 Jun 2025 18:23:06 +0200 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain X-Rspam-User: X-Rspamd-Queue-Id: 6A6BB40002 X-Rspamd-Server: rspam09 X-Stat-Signature: b4jrhr8y5krush5gbuez3gjd6qaoqjjd X-HE-Tag: 1749226990-873391 X-HE-Meta: U2FsdGVkX1+TVUpfV/BXHZpVG6UC4Bynue+anJg9aCWRf23LRVT3F9XFVGnFyDDmpU3Vwg7+ahHD8ilXED+rny9YZeizbNI1pPIomcv8WTpwXS2y7LHg5/rSKXwgMzqpdNpX+wpYglWwATFQcwlLOZxrmwnJG/LDZnYCuJvVv3tUv5+2FwbyxomY3Ce1fxkiSpy+d5qcHSJGs+3tf6ga5E7LKxtyJSV2QPmRT1Jshj9QbmEx+Ka8WRIQuHjpmzf0LDC85ZngLv7zEoWvJYVD6MzLdEz4y0G3uoD4TuY2d+K7YRwSPgzSY2ULfb80H9d7TIBUPlTSG9fEi0AsWimraZvVysNlK27ibiOOaOeM1ZTDvdNNcpe3wUIRm4RDMam5HajWKfuDzaN0mnxmO0icpuLjFS+WnIxW9rwOa+tbyJE53pbHwEjHqYH+zmvweqUkriU/jnuWOscFlwUGBlyfCJVfqMR+p/QnPBC+ZkU50/AvqyVXikPDLJU3G8jpnAYt3UJV/BxdNMhZVLcGYUX9LazpjmvYFs/0fjg+eLezbTxTMf1PJiv1bJA77AFhe96oHexFPPZOFqFygwcKZy5X4FYcZGXYImj3gLLbkfcTvldSGvXakgbt39E1TeB36W1uTDMdFQ+jTfWfUBmfTuhHLOlo1wPHSp7A7va8M1Tifmyw/o7UFn819iHaNijKBVSZZfmag4HRM8V52NbpU7f5Kfp79if/fZKq83t+H40X6oHTYmrQRT8HBK6wyTgL5ftmpDLT1EVSnjrHN5Ofz/FPO10aUvP8qUPQZ5AC4qeR1cD/59Zzthc0iMoFjsiH2GXTDIYp9Ez7Sba9+BvL/aa0bDO2hz6NS8ZTavwR2EZpaTPuSDVQKDBUIS+C1iW7HVxmO2vapFq2NkOwRLe/juwhwZJKYgSVxkiCP4RrH3KwBQMQGiQnMKdst8uYhrpaXDqOsiC334iX05dJfPZEO2b VhPHloEZ xsuGGxZc9DzS5thFoQIy83a4OHzvYloMNelYbs3DeH9oXLVGimped2orSNNoJjX6/m3PzAwdFmjth8/sX+SzDkgUYCUQX7gNXDhUzZsmlKdAIS6bau71Z4FXj8kxiiAQVoUj2hJN3K585XqHnApyPJvE/bPU1HDmuTB/yezEsMKZlTPM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Mike, On Fri, Jun 06 2025, Mike Rapoport wrote: > On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote: >> From: Pratyush Yadav >> >> Currently, when restoring higher order folios, kho_restore_folio() only >> calls prep_compound_page() on all the pages. That is not enough to >> properly initialize the folios. The managed page count does not >> get updated, the reserved flag does not get dropped, and page count does >> not get initialized properly. >> >> Restoring a higher order folio with it results in the following BUG with >> CONFIG_DEBUG_VM when attempting to free the folio: >> >> BUG: Bad page state in process test pfn:104e2b >> page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b >> flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff) >> raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000 >> raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000 >> page dumped because: nonzero _refcount >> [...] >> Call Trace: >> >> dump_stack_lvl+0x4b/0x70 >> bad_page.cold+0x97/0xb2 >> __free_frozen_pages+0x616/0x850 >> [...] >> >> Combine the path for 0-order and higher order folios, initialize the >> tail pages with a count of zero, and call adjust_managed_page_count() to >> account for all the pages instead of just missing them. >> >> In addition, since all the KHO-preserved pages get marked with >> MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not >> actually set (as can also be seen from the flags of the dumped page in >> the logs above). So drop the ClearPageReserved() calls. >> >> Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation") >> Signed-off-by: Pratyush Yadav >> --- >> >> Side note: get_maintainers.pl for KHO only lists kexec@ as the mailing list. >> Since KHO has a bunch of MM bits as well, should we also add linux-mm@ to its >> MAINTAINERS entry? >> >> Adding linux-mm@ to this patch at least, in case MM people have an opinion on >> this. >> >> kernel/kexec_handover.c | 29 +++++++++++++++++------------ >> 1 file changed, 17 insertions(+), 12 deletions(-) >> >> diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c >> index eb305e7e61296..5214ab27d1f8d 100644 >> --- a/kernel/kexec_handover.c >> +++ b/kernel/kexec_handover.c >> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, >> } >> >> /* almost as free_reserved_page(), just don't free the page */ >> -static void kho_restore_page(struct page *page) >> +static void kho_restore_page(struct page *page, unsigned int order) >> { >> - ClearPageReserved(page); > > So now we don't clear PG_Reserved even on order-0 pages? ;-) We don't need to. As I mentioned in the commit message as well, PG_Reserved is never set for KHO pages since they are reserved with MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them. To double-check, I added some quick prints to kho_restore_page(): /* Head page gets refcount of 1. */ set_page_count(page, 1); printk("Head page flags: 0x%lx reserved: %d\n", page->flags, PageReserved(page)); /* For higher order folios, tail pages get a page count of zero. */ for (i = 1; i < nr_pages; i++) { printk("Tail page %u flags: 0x%lx reserved: %d\n", i, (page+i)->flags, PageReserved(page+i)); set_page_count(page + i, 0); } And this is what I get: [ 9.003187] Head page flags: 0x2fffff80000000 reserved: 0 [ 9.003730] Tail page 1 flags: 0x2fffff80000000 reserved: 0 [ 9.004229] Tail page 2 flags: 0x2fffff80000000 reserved: 0 [ 9.004740] Tail page 3 flags: 0x2fffff80000000 reserved: 0 [ 9.005265] Head page flags: 0x2fffff80000000 reserved: 0 [ 9.005759] Head page flags: 0x2fffff80000000 reserved: 0 [...] So PG_Reserved is never set. That said, while reading through some of the code, I noticed another bug: because KHO reserves the preserved pages as NOINIT, with CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized when memmap_init_range() is called from setup_arch (paging_init() on x86). This happens before kho_memory_init(), so the KHO-preserved pages are not marked as reserved to memblock yet. With deferred page init, some pages might not get initialized early, and get initialized after kho_memory_init(), by which time the KHO-preserved pages are marked as reserved. So, deferred_init_maxorder() will skip over those pages and leave them uninitialized. And sure enough, doing the same with CONFIG_DEFERRED_STRUCT_PAGE_INIT == y results in: [ 10.060842] Head page flags: 0x2fffff80000000 reserved: 0 [ 10.061387] Tail page 1 flags: 0x2fffff80000000 reserved: 0 [ 10.061902] Tail page 2 flags: 0x2fffff80000000 reserved: 0 [ 10.062400] Tail page 3 flags: 0x2fffff80000000 reserved: 0 [ 10.062924] page:00000000fb3dca54 is uninitialized and poisoned [ 10.063494] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page)) [ 10.064190] ------------[ cut here ]------------ [ 10.064636] kernel BUG at ./include/linux/page-flags.h:571! [ 10.065194] Oops: invalid opcode: 0000 [#1] SMP PTI [ 10.065661] CPU: 2 UID: 0 PID: 1954 Comm: test Not tainted 6.15.0+ #297 PREEMPT(undef) [ 10.066449] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 10.067353] RIP: 0010:kho_restore_folio+0x4e/0x70 [...] So we need to either also call init_deferred_page(), or remove the memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I am not sure why KHO pages even need to be marked noinit in the first place. Probably the only benefit would be if a large chunk of memory is KHO-preserved, the pages can be initialized later on-demand, reducing bootup time a bit. What do you think? Should we drop noinit or call init_deferred_page()? FWIW, my preference is to drop noinit, since init_deferred_page() is __meminit and we would have to make sure it doesn't go away after boot. > >> - init_page_count(page); >> - adjust_managed_page_count(page, 1); >> + unsigned int i, nr_pages = (1 << order); > > Can you please declare 'i' inside the loop, looks nicer IMHO. Ok, will do. > >> + >> + /* Head page gets refcount of 1. */ >> + set_page_count(page, 1); > > ClearPageReserved(page) here? > >> + >> + /* For higher order folios, tail pages get a page count of zero. */ >> + for (i = 1; i < nr_pages; i++) >> + set_page_count(page + i, 0); > > and here? > >> + >> + if (order > 0) >> + prep_compound_page(page, order); >> + >> + adjust_managed_page_count(page, nr_pages); >> } >> >> /** [...] -- Regards, Pratyush Yadav