Date: Tue, 22 Apr 2025 16:31:19 +0300
From: Mike Rapoport <rppt@kernel.org>
To: Changyuan Lyu <changyuanl@google.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
	benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
	corbet@lwn.net, dave.hansen@linux.intel.com, devicetree@vger.kernel.org,
	dwmw2@infradead.org, ebiederm@xmission.com, graf@amazon.com,
	hpa@zytor.com, jgowans@amazon.com, kexec@lists.infradead.org,
	krzk@kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, luto@kernel.org,
	mark.rutland@arm.com, mingo@redhat.com, pasha.tatashin@soleen.com,
	pbonzini@redhat.com, peterz@infradead.org, ptyadav@amazon.de,
	robh@kernel.org, rostedt@goodmis.org, saravanak@google.com,
	skinsburskii@linux.microsoft.com, tglx@linutronix.de,
	thomas.lendacky@amd.com, will@kernel.org, x86@kernel.org
Subject: Re: [PATCH v6 12/14] memblock: add KHO support for reserve_mem
References: <20250411053745.1817356-1-changyuanl@google.com>
 <20250411053745.1817356-13-changyuanl@google.com>
In-Reply-To: <20250411053745.1817356-13-changyuanl@google.com>

On Thu, Apr 10, 2025 at 10:37:43PM -0700, Changyuan Lyu wrote:
> From: Alexander Graf <graf@amazon.com>
> 
> Linux has recently gained support for "reserve_mem": a mechanism to
> allocate a region of memory early enough in boot that we can cross our
> fingers and hope it stays at the same location during most boots, so we
> can store for example ftrace buffers into it.
> 
> Thanks to KASLR, we can never be really sure that "reserve_mem"
> allocations are static across kexec. Let's teach it KHO awareness so
> that it serializes its reservations on kexec exit and deserializes them
> again on boot, preserving the exact same mapping across kexec.
> 
> This is an example user for KHO in the KHO patch set to ensure we have
> at least one (not very controversial) user in the tree before extending
> KHO's use to more subsystems.
> 
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Co-developed-by: Changyuan Lyu <changyuanl@google.com>
> Signed-off-by: Changyuan Lyu <changyuanl@google.com>
> ---
>  mm/memblock.c | 205 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 205 insertions(+)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 456689cb73e20..3571a859f2fe1 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -18,6 +18,11 @@
>  #include <linux/seq_file.h>
>  #include <linux/memblock.h>
>  
> +#ifdef CONFIG_KEXEC_HANDOVER
> +#include <linux/libfdt.h>
> +#include <linux/kexec_handover.h>
> +#endif /* CONFIG_KEXEC_HANDOVER */
> +
>  #include <asm/sections.h>
>  #include <linux/io.h>
>  
> @@ -2475,6 +2480,201 @@ int reserve_mem_release_by_name(const char *name)
>  	return 1;
>  }
>  
> +#ifdef CONFIG_KEXEC_HANDOVER
> +#define MEMBLOCK_KHO_FDT "memblock"
> +#define MEMBLOCK_KHO_NODE_COMPATIBLE "memblock-v1"
> +#define RESERVE_MEM_KHO_NODE_COMPATIBLE "reserve-mem-v1"
> +static struct page *kho_fdt;
> +
> +static int reserve_mem_kho_finalize(struct kho_serialization *ser)
> +{
> +	int err = 0, i;
> +
> +	if (!reserved_mem_count)
> +		return NOTIFY_DONE;
> +
> +	if (IS_ERR(kho_fdt)) {
> +		err = PTR_ERR(kho_fdt);
> +		pr_err("memblock FDT was not prepared successfully: %d\n", err);
> +		return notifier_from_errno(err);
> +	}
> +
> +	for (i = 0; i < reserved_mem_count; i++) {
> +		struct reserve_mem_table *map = &reserved_mem_table[i];
> +
> +		err |= kho_preserve_phys(ser, map->start, map->size);
> +	}
> +
> +	err |= kho_preserve_folio(ser, page_folio(kho_fdt));
> +	err |= kho_add_subtree(ser, MEMBLOCK_KHO_FDT, page_to_virt(kho_fdt));
> +
> +	return notifier_from_errno(err);
> +}
> +
> +static int reserve_mem_kho_notifier(struct notifier_block *self,
> +				    unsigned long cmd, void *v)
> +{
> +	switch (cmd) {
> +	case KEXEC_KHO_FINALIZE:
> +		return reserve_mem_kho_finalize((struct kho_serialization *)v);
> +	case KEXEC_KHO_ABORT:
> +		return NOTIFY_DONE;
> +	default:
> +		return NOTIFY_BAD;
> +	}
> +}
> +
> +static struct notifier_block reserve_mem_kho_nb = {
> +	.notifier_call = reserve_mem_kho_notifier,
> +};
> +
> +static void __init prepare_kho_fdt(void)
> +{
> +	int err = 0, i;
> +	void *fdt;
> +
> +	if (!reserved_mem_count)
> +		return;

It's better to have this check in reserve_mem_init() before registering
the kho notifier.

> +
> +	kho_fdt = alloc_page(GFP_KERNEL);
> +	if (!kho_fdt) {
> +		kho_fdt = ERR_PTR(-ENOMEM);

Do we really care about having errno in kho_fdt? I think NULL would work
just fine.

> +		return;

And actually, it makes sense to me to return -ENOMEM here and let
reserve_mem_init() bail out before registering the notifier if fdt
preparation failed. That will save the checks in
reserve_mem_kho_finalize(), because it would be called only if we have
reserve_mem areas and the fdt is ready.
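
Something like this untested sketch, just to illustrate the suggested
flow (prepare_kho_fdt() reports failure instead of stashing an error
pointer, and the notifier is only registered when the fdt is ready):

	static int __init prepare_kho_fdt(void)
	{
		int err = 0, i;
		void *fdt;

		kho_fdt = alloc_page(GFP_KERNEL);
		if (!kho_fdt)
			return -ENOMEM;

		fdt = page_to_virt(kho_fdt);

		err |= fdt_create(fdt, PAGE_SIZE);
		/* ... build the nodes exactly as in the patch ... */
		err |= fdt_finish(fdt);

		if (err) {
			pr_err("failed to prepare memblock FDT for KHO: %d\n", err);
			put_page(kho_fdt);
			kho_fdt = NULL;
		}

		return err;
	}

	static int __init reserve_mem_init(void)
	{
		int err;

		if (!kho_is_enabled() || !reserved_mem_count)
			return 0;

		err = prepare_kho_fdt();
		if (err)
			return err;

		return register_kho_notifier(&reserve_mem_kho_nb);
	}

With that, reserve_mem_kho_finalize() can drop both the
reserved_mem_count and the IS_ERR() checks.
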
> +	}
> +
> +	fdt = page_to_virt(kho_fdt);
> +
> +	err |= fdt_create(fdt, PAGE_SIZE);
> +	err |= fdt_finish_reservemap(fdt);
> +
> +	err |= fdt_begin_node(fdt, "");
> +	err |= fdt_property_string(fdt, "compatible", MEMBLOCK_KHO_NODE_COMPATIBLE);
> +	for (i = 0; i < reserved_mem_count; i++) {
> +		struct reserve_mem_table *map = &reserved_mem_table[i];
> +
> +		err |= fdt_begin_node(fdt, map->name);
> +		err |= fdt_property_string(fdt, "compatible", RESERVE_MEM_KHO_NODE_COMPATIBLE);
> +		err |= fdt_property(fdt, "start", &map->start, sizeof(map->start));
> +		err |= fdt_property(fdt, "size", &map->size, sizeof(map->size));
> +		err |= fdt_end_node(fdt);
> +	}
> +	err |= fdt_end_node(fdt);
> +
> +	err |= fdt_finish(fdt);
> +
> +	if (err) {
> +		pr_err("failed to prepare memblock FDT for KHO: %d\n", err);
> +		put_page(kho_fdt);
> +		kho_fdt = ERR_PTR(-EINVAL);
> +	}
> +}
> +
> +static int __init reserve_mem_init(void)
> +{
> +	if (!kho_is_enabled())
> +		return 0;
> +
> +	prepare_kho_fdt();
> +
> +	return register_kho_notifier(&reserve_mem_kho_nb);
> +}
> +late_initcall(reserve_mem_init);
> +
> +static void *kho_fdt_in __initdata;
> +
> +static void *__init reserve_mem_kho_retrieve_fdt(void)
> +{
> +	phys_addr_t fdt_phys;
> +	struct folio *fdt_folio;
> +	void *fdt;
> +	int err;
> +
> +	err = kho_retrieve_subtree(MEMBLOCK_KHO_FDT, &fdt_phys);
> +	if (err) {
> +		if (err != -ENOENT)
> +			pr_warn("failed to retrieve FDT '%s' from KHO: %d\n",
> +				MEMBLOCK_KHO_FDT, err);
> +		return ERR_PTR(err);

Wouldn't just 'return NULL' work here?

> +	}
> +
> +	fdt_folio = kho_restore_folio(fdt_phys);
> +	if (!fdt_folio) {
> +		pr_warn("failed to restore memblock KHO FDT (0x%llx)\n", fdt_phys);
> +		return ERR_PTR(-EFAULT);
> +	}
> +
> +	fdt = page_to_virt(folio_page(fdt_folio, 0));

	fdt = folio_address(fdt_folio);

> +
> +	err = fdt_node_check_compatible(fdt, 0, MEMBLOCK_KHO_NODE_COMPATIBLE);
> +	if (err) {
> +		pr_warn("FDT '%s' is incompatible with '%s': %d\n",
> +			MEMBLOCK_KHO_FDT, MEMBLOCK_KHO_NODE_COMPATIBLE, err);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	return fdt;
> +}
> +
> +static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size,
> +					  phys_addr_t align)
> +{
> +	int err, len_start, len_size, offset;
> +	const phys_addr_t *p_start, *p_size;
> +	const void *fdt;
> +
> +	if (!kho_fdt_in)
> +		kho_fdt_in = reserve_mem_kho_retrieve_fdt();

I'd invert this and move it to reserve_mem_kho_retrieve_fdt(), so there
it would be

	if (kho_fdt_in)
		return kho_fdt_in;

	/* actually retrieve the fdt */

	kho_fdt_in = fdt;
	return fdt;

and here

	fdt = reserve_mem_kho_retrieve_fdt();
	if (!fdt)
		return false;
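
Put together, the retrieve helper could look like this (untested
sketch, with NULL as the only failure value):

	static void *__init reserve_mem_kho_retrieve_fdt(void)
	{
		phys_addr_t fdt_phys;
		struct folio *fdt_folio;
		void *fdt;
		int err;

		if (kho_fdt_in)
			return kho_fdt_in;

		err = kho_retrieve_subtree(MEMBLOCK_KHO_FDT, &fdt_phys);
		if (err) {
			if (err != -ENOENT)
				pr_warn("failed to retrieve FDT '%s' from KHO: %d\n",
					MEMBLOCK_KHO_FDT, err);
			return NULL;
		}

		fdt_folio = kho_restore_folio(fdt_phys);
		if (!fdt_folio) {
			pr_warn("failed to restore memblock KHO FDT (0x%llx)\n",
				fdt_phys);
			return NULL;
		}

		fdt = folio_address(fdt_folio);

		err = fdt_node_check_compatible(fdt, 0, MEMBLOCK_KHO_NODE_COMPATIBLE);
		if (err) {
			pr_warn("FDT '%s' is incompatible with '%s': %d\n",
				MEMBLOCK_KHO_FDT, MEMBLOCK_KHO_NODE_COMPATIBLE, err);
			return NULL;
		}

		kho_fdt_in = fdt;
		return fdt;
	}

(One caveat: with NULL meaning both "no KHO data" and "retrieve failed",
a failed retrieve is retried for every reserve_mem= parameter, but that
seems harmless for an __init path.)
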
+ pr_warn("KHO reserve-mem '%s' has wrong size (0x%lx != 0x%lx)\n", > + name, (long)*p_size, (long)size); > + return false; > + } > + > + reserved_mem_add(*p_start, size, name); > + pr_info("Revived memory reservation '%s' from KHO\n", name); > + > + return true; > +} > +#else > +static bool __init reserve_mem_kho_revive(const char *name, phys_addr_t size, > + phys_addr_t align) > +{ > + return false; > +} > +#endif /* CONFIG_KEXEC_HANDOVER */ > + > /* > * Parse reserve_mem=nn:align:name > */ > @@ -2530,6 +2730,11 @@ static int __init reserve_mem(char *p) > if (reserve_mem_find_by_name(name, &start, &tmp)) > return -EBUSY; > > + /* Pick previous allocations up from KHO if available */ > + if (reserve_mem_kho_revive(name, size, align)) > + return 1; > + > + /* TODO: Allocation must be outside of scratch region */ > start = memblock_phys_alloc(size, align); > if (!start) > return -ENOMEM; > -- > 2.49.0.604.gff1f9ca942-goog > -- Sincerely yours, Mike.