Date: Thu, 1 May 2025 15:54:14 -0700
In-Reply-To: <20250501225425.635167-1-changyuanl@google.com>
Mime-Version: 1.0
References: <20250501225425.635167-1-changyuanl@google.com>
X-Mailer: git-send-email 2.49.0.906.g1f30a19c02-goog
Message-ID: <20250501225425.635167-8-changyuanl@google.com>
Subject: [PATCH v7 07/18] kexec: enable KHO support for memory preservation
From: Changyuan Lyu
To: linux-kernel@vger.kernel.org
Cc: changyuanl@google.com, akpm@linux-foundation.org, anthony.yznaga@oracle.com,
    arnd@arndb.de, ashish.kalra@amd.com, benh@kernel.crashing.org, bp@alien8.de,
    catalin.marinas@arm.com, corbet@lwn.net, dave.hansen@linux.intel.com,
    devicetree@vger.kernel.org, dwmw2@infradead.org, ebiederm@xmission.com,
    graf@amazon.com, hpa@zytor.com, jgowans@amazon.com, kexec@lists.infradead.org,
    krzk@kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, luto@kernel.org, mark.rutland@arm.com, mingo@redhat.com,
    pasha.tatashin@soleen.com, pbonzini@redhat.com, peterz@infradead.org,
    ptyadav@amazon.de, robh@kernel.org, rostedt@goodmis.org, rppt@kernel.org,
    saravanak@google.com, skinsburskii@linux.microsoft.com, tglx@linutronix.de,
    thomas.lendacky@amd.com, will@kernel.org, x86@kernel.org, Jason Gunthorpe
Content-Type: text/plain; charset="UTF-8"

From: "Mike Rapoport (Microsoft)"

Introduce APIs allowing KHO users to preserve memory across kexec and
get access to that memory after boot of the kexeced kernel:

  kho_preserve_folio() - record a folio to be preserved over kexec
  kho_restore_folio()  - recreate the folio from the preserved memory
  kho_preserve_phys()  - record a physically contiguous range to be
                         preserved over kexec

The memory preservations are tracked by two levels of xarrays that
manage chunks of per-order 512 byte bitmaps. For instance, if
PAGE_SIZE = 4096, the entire 1G order of a 1TB x86 system would fit
inside a single 512 byte bitmap. For order 0 allocations each bitmap
covers 16M of address space. Thus, for 16G of memory at most 512K of
bitmap memory is needed for order 0.

At serialization time all bitmaps are recorded in a linked list of
pages for the next kernel to process, and the physical address of the
list is recorded in the KHO FDT. The next kernel then processes that
list, reserves the memory ranges, and later, when a user requests a
folio or a physical range, KHO restores the corresponding memory map
entries.
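
To make the sizing concrete, a back-of-envelope check of the numbers
above (illustrative arithmetic only, assuming PAGE_SIZE = 4096):

  bits per 512 byte bitmap:  512 * 8 = 4096 bits
  order-0 coverage:          4096 bits * 4K per bit = 16M of address space
  order-0 cost for 16G:      16G / 16M = 1024 bitmaps * 512 bytes = 512K
  1G-order coverage:         4096 bits * 1G per bit = 4T, so the whole
                             1G order of a 1TB system fits in one bitmap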
Suggested-by: Jason Gunthorpe
Signed-off-by: Mike Rapoport (Microsoft)
Co-developed-by: Changyuan Lyu
Signed-off-by: Changyuan Lyu
---
 include/linux/kexec_handover.h |  36 +++
 kernel/kexec_handover.c        | 406 +++++++++++++++++++++++++++++++++
 2 files changed, 442 insertions(+)

diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index 02dcfc8c427e3..348844cffb136 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -16,13 +16,34 @@ enum kho_event {
 	KEXEC_KHO_ABORT = 1,
 };
 
+struct folio;
 struct notifier_block;
 
+#define DECLARE_KHOSER_PTR(name, type) \
+	union {                        \
+		phys_addr_t phys;      \
+		type ptr;              \
+	} name
+#define KHOSER_STORE_PTR(dest, val)               \
+	({                                        \
+		typeof(val) v = val;              \
+		typecheck(typeof((dest).ptr), v); \
+		(dest).phys = virt_to_phys(v);    \
+	})
+#define KHOSER_LOAD_PTR(src)                                                 \
+	({                                                                   \
+		typeof(src) s = src;                                         \
+		(typeof((s).ptr))((s).phys ? phys_to_virt((s).phys) : NULL); \
+	})
+
 struct kho_serialization;
 
 #ifdef CONFIG_KEXEC_HANDOVER
 bool kho_is_enabled(void);
 
+int kho_preserve_folio(struct folio *folio);
+int kho_preserve_phys(phys_addr_t phys, size_t size);
+struct folio *kho_restore_folio(phys_addr_t phys);
 int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt);
 int kho_retrieve_subtree(const char *name, phys_addr_t *phys);
 
@@ -39,6 +60,21 @@ static inline bool kho_is_enabled(void)
 	return false;
 }
 
+static inline int kho_preserve_folio(struct folio *folio)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int kho_preserve_phys(phys_addr_t phys, size_t size)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct folio *kho_restore_folio(phys_addr_t phys)
+{
+	return NULL;
+}
+
 static inline int kho_add_subtree(struct kho_serialization *ser,
 				  const char *name, void *fdt)
 {
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index 59f3cf9557f50..3bf74b4960f84 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -9,6 +9,7 @@
 #define pr_fmt(fmt) "KHO: " fmt
 
 #include <...>
+#include <linux/count_zeros.h>
 #include <...>
 #include <...>
 #include <...>
@@ -44,12 +45,307 @@ static int __init kho_parse_enable(char *p)
 }
 early_param("kho", kho_parse_enable);
 
+/*
+ * Keep track of memory that is to be preserved across KHO.
+ *
+ * The serializing side uses two levels of xarrays to manage chunks of
+ * per-order 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire
+ * 1G order of a 1TB system would fit inside a single 512 byte bitmap. For
+ * order 0 allocations each bitmap will cover 16M of address space. Thus,
+ * for 16G of memory at most 512K of bitmap memory will be needed for order 0.
+ *
+ * This approach is fully incremental: as the serialization progresses,
+ * folios can continue to be aggregated into the tracker. The final step,
+ * immediately prior to kexec, serializes the xarray information into a
+ * linked list for the successor kernel to parse.
+ */
+
+#define PRESERVE_BITS (512 * 8)
+
+struct kho_mem_phys_bits {
+	DECLARE_BITMAP(preserve, PRESERVE_BITS);
+};
+
+struct kho_mem_phys {
+	/*
+	 * Points to kho_mem_phys_bits, a sparse bitmap array. Each bit is sized
+	 * to order.
+	 */
+	struct xarray phys_bits;
+};
+
+struct kho_mem_track {
+	/* Points to kho_mem_phys, each order gets its own bitmap tree */
+	struct xarray orders;
+};
+
+struct khoser_mem_chunk;
+
 struct kho_serialization {
 	struct page *fdt;
 	struct list_head fdt_list;
 	struct dentry *sub_fdt_dir;
+	struct kho_mem_track track;
+	/* First chunk of serialized preserved memory map */
+	struct khoser_mem_chunk *preserved_mem_map;
 };
 
+static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
+{
+	void *elm, *res;
+
+	elm = xa_load(xa, index);
+	if (elm)
+		return elm;
+
+	elm = kzalloc(sz, GFP_KERNEL);
+	if (!elm)
+		return ERR_PTR(-ENOMEM);
+
+	res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
+	if (xa_is_err(res))
+		res = ERR_PTR(xa_err(res));
+
+	if (res) {
+		kfree(elm);
+		return res;
+	}
+
+	return elm;
+}
+
+static void __kho_unpreserve(struct kho_mem_track *track, unsigned long pfn,
+			     unsigned long end_pfn)
+{
+	struct kho_mem_phys_bits *bits;
+	struct kho_mem_phys *physxa;
+
+	while (pfn < end_pfn) {
+		const unsigned int order =
+			min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn));
+		const unsigned long pfn_high = pfn >> order;
+
+		/* Advance up front so a missing entry cannot loop forever */
+		pfn += 1 << order;
+
+		physxa = xa_load(&track->orders, order);
+		if (!physxa)
+			continue;
+
+		bits = xa_load(&physxa->phys_bits, pfn_high / PRESERVE_BITS);
+		if (!bits)
+			continue;
+
+		clear_bit(pfn_high % PRESERVE_BITS, bits->preserve);
+	}
+}
+
+static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
+				unsigned int order)
+{
+	struct kho_mem_phys_bits *bits;
+	struct kho_mem_phys *physxa;
+	const unsigned long pfn_high = pfn >> order;
+
+	might_sleep();
+
+	physxa = xa_load_or_alloc(&track->orders, order, sizeof(*physxa));
+	if (IS_ERR(physxa))
+		return PTR_ERR(physxa);
+
+	bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS,
+				sizeof(*bits));
+	if (IS_ERR(bits))
+		return PTR_ERR(bits);
+
+	set_bit(pfn_high % PRESERVE_BITS, bits->preserve);
+
+	return 0;
+}
+
+/* almost like free_reserved_page(), just don't free the page */
+static void kho_restore_page(struct page *page)
+{
+	ClearPageReserved(page);
+	init_page_count(page);
+	adjust_managed_page_count(page, 1);
+}
+
+/**
+ * kho_restore_folio - recreates the folio from the preserved memory.
+ * @phys: physical address of the folio.
+ *
+ * Return: pointer to the struct folio on success, NULL on failure.
+ */
+struct folio *kho_restore_folio(phys_addr_t phys)
+{
+	struct page *page = pfn_to_online_page(PHYS_PFN(phys));
+	unsigned long order;
+
+	if (!page)
+		return NULL;
+
+	order = page->private;
+	if (order) {
+		if (order > MAX_PAGE_ORDER)
+			return NULL;
+
+		prep_compound_page(page, order);
+	} else {
+		kho_restore_page(page);
+	}
+
+	return page_folio(page);
+}
+EXPORT_SYMBOL_GPL(kho_restore_folio);
+
+/* Serialize and deserialize struct kho_mem_phys across kexec
+ *
+ * Record all the bitmaps in a linked list of pages for the next kernel to
+ * process. Each chunk holds bitmaps of the same order and each block of
+ * bitmaps starts at a given physical address. This allows the bitmaps to be
+ * sparse. The xarray is used to store them in a tree while building up the
+ * data structure, but the KHO successor kernel only needs to process them
+ * once in order.
+ *
+ * All of this memory is normal kmalloc() memory and is not marked for
+ * preservation. The successor kernel will remain isolated to the scratch
+ * space until it completes processing this list. Once processed, all the
+ * memory storing these ranges will be marked as free.
+ */
+
+struct khoser_mem_bitmap_ptr {
+	phys_addr_t phys_start;
+	DECLARE_KHOSER_PTR(bitmap, struct kho_mem_phys_bits *);
+};
+
+struct khoser_mem_chunk_hdr {
+	DECLARE_KHOSER_PTR(next, struct khoser_mem_chunk *);
+	unsigned int order;
+	unsigned int num_elms;
+};
+
+#define KHOSER_BITMAP_SIZE                                   \
+	((PAGE_SIZE - sizeof(struct khoser_mem_chunk_hdr)) / \
+	 sizeof(struct khoser_mem_bitmap_ptr))
+
+struct khoser_mem_chunk {
+	struct khoser_mem_chunk_hdr hdr;
+	struct khoser_mem_bitmap_ptr bitmaps[KHOSER_BITMAP_SIZE];
+};
+
+static_assert(sizeof(struct khoser_mem_chunk) == PAGE_SIZE);
+
+static struct khoser_mem_chunk *new_chunk(struct khoser_mem_chunk *cur_chunk,
+					  unsigned long order)
+{
+	struct khoser_mem_chunk *chunk;
+
+	chunk = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!chunk)
+		return NULL;
+	chunk->hdr.order = order;
+	if (cur_chunk)
+		KHOSER_STORE_PTR(cur_chunk->hdr.next, chunk);
+	return chunk;
+}
+
+static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk)
+{
+	struct khoser_mem_chunk *chunk = first_chunk;
+
+	while (chunk) {
+		struct khoser_mem_chunk *tmp = chunk;
+
+		chunk = KHOSER_LOAD_PTR(chunk->hdr.next);
+		kfree(tmp);
+	}
+}
+
+static int kho_mem_serialize(struct kho_serialization *ser)
+{
+	struct khoser_mem_chunk *first_chunk = NULL;
+	struct khoser_mem_chunk *chunk = NULL;
+	struct kho_mem_phys *physxa;
+	unsigned long order;
+
+	xa_for_each(&ser->track.orders, order, physxa) {
+		struct kho_mem_phys_bits *bits;
+		unsigned long phys;
+
+		chunk = new_chunk(chunk, order);
+		if (!chunk)
+			goto err_free;
+
+		if (!first_chunk)
+			first_chunk = chunk;
+
+		xa_for_each(&physxa->phys_bits, phys, bits) {
+			struct khoser_mem_bitmap_ptr *elm;
+
+			if (chunk->hdr.num_elms == ARRAY_SIZE(chunk->bitmaps)) {
+				chunk = new_chunk(chunk, order);
+				if (!chunk)
+					goto err_free;
+			}
+
+			elm = &chunk->bitmaps[chunk->hdr.num_elms];
+			chunk->hdr.num_elms++;
+			elm->phys_start = (phys * PRESERVE_BITS)
+					  << (order + PAGE_SHIFT);
+			KHOSER_STORE_PTR(elm->bitmap, bits);
+		}
+	}
+
+	ser->preserved_mem_map = first_chunk;
+
+	return 0;
+
+err_free:
+	kho_mem_ser_free(first_chunk);
+	return -ENOMEM;
+}
+
+static void deserialize_bitmap(unsigned int order,
+			       struct khoser_mem_bitmap_ptr *elm)
+{
+	struct kho_mem_phys_bits *bitmap = KHOSER_LOAD_PTR(elm->bitmap);
+	unsigned long bit;
+
+	for_each_set_bit(bit, bitmap->preserve, PRESERVE_BITS) {
+		int sz = 1 << (order + PAGE_SHIFT);
+		phys_addr_t phys =
+			elm->phys_start + (bit << (order + PAGE_SHIFT));
+		struct page *page = phys_to_page(phys);
+
+		memblock_reserve(phys, sz);
+		memblock_reserved_mark_noinit(phys, sz);
+		page->private = order;
+	}
+}
+
+static void __init kho_mem_deserialize(const void *fdt)
+{
+	struct khoser_mem_chunk *chunk;
+	const phys_addr_t *mem;
+	int len;
+
+	mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len);
+
+	if (!mem || len != sizeof(*mem)) {
+		pr_err("failed to get preserved memory bitmaps\n");
+		return;
+	}
+
+	chunk = *mem ? phys_to_virt(*mem) : NULL;
+	while (chunk) {
+		unsigned int i;
+
+		for (i = 0; i != chunk->hdr.num_elms; i++)
+			deserialize_bitmap(chunk->hdr.order,
+					   &chunk->bitmaps[i]);
+		chunk = KHOSER_LOAD_PTR(chunk->hdr.next);
+	}
+}
+
 /*
  * With KHO enabled, memory can become fragmented because KHO regions may
  * be anywhere in physical address space. The scratch regions give us a
@@ -324,6 +620,9 @@ static struct kho_out kho_out = {
 	.lock = __MUTEX_INITIALIZER(kho_out.lock),
 	.ser = {
 		.fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list),
+		.track = {
+			.orders = XARRAY_INIT(kho_out.ser.track.orders, 0),
+		},
 	},
 	.finalized = false,
 };
@@ -340,6 +639,73 @@ int unregister_kho_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_kho_notifier);
 
+/**
+ * kho_preserve_folio - preserve a folio across kexec.
+ * @folio: folio to preserve.
+ *
+ * Instructs KHO to preserve the whole folio across kexec. The order
+ * will be preserved as well.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int kho_preserve_folio(struct folio *folio)
+{
+	const unsigned long pfn = folio_pfn(folio);
+	const unsigned int order = folio_order(folio);
+	struct kho_mem_track *track = &kho_out.ser.track;
+
+	if (kho_out.finalized)
+		return -EBUSY;
+
+	return __kho_preserve_order(track, pfn, order);
+}
+EXPORT_SYMBOL_GPL(kho_preserve_folio);
+
+/**
+ * kho_preserve_phys - preserve a physically contiguous range across kexec.
+ * @phys: physical address of the range.
+ * @size: size of the range.
+ *
+ * Instructs KHO to preserve the memory range from @phys to @phys + @size
+ * across kexec.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int kho_preserve_phys(phys_addr_t phys, size_t size)
+{
+	unsigned long pfn = PHYS_PFN(phys);
+	unsigned long failed_pfn = 0;
+	const unsigned long start_pfn = pfn;
+	const unsigned long end_pfn = PHYS_PFN(phys + size);
+	int err = 0;
+	struct kho_mem_track *track = &kho_out.ser.track;
+
+	if (kho_out.finalized)
+		return -EBUSY;
+
+	if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size))
+		return -EINVAL;
+
+	while (pfn < end_pfn) {
+		const unsigned int order =
+			min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn));
+
+		err = __kho_preserve_order(track, pfn, order);
+		if (err) {
+			failed_pfn = pfn;
+			break;
+		}
+
+		pfn += 1 << order;
+	}
+
+	if (err)
+		__kho_unpreserve(track, start_pfn, failed_pfn);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(kho_preserve_phys);
+
 /* Handling for debug/kho/out */
 
 static struct dentry *debugfs_root;
@@ -366,6 +732,25 @@ static int kho_out_update_debugfs_fdt(void)
 static int kho_abort(void)
 {
 	int err;
+	unsigned long order;
+	struct kho_mem_phys *physxa;
+
+	xa_for_each(&kho_out.ser.track.orders, order, physxa) {
+		struct kho_mem_phys_bits *bits;
+		unsigned long phys;
+
+		xa_for_each(&physxa->phys_bits, phys, bits)
+			kfree(bits);
+
+		xa_destroy(&physxa->phys_bits);
+		kfree(physxa);
+	}
+	xa_destroy(&kho_out.ser.track.orders);
+
+	if (kho_out.ser.preserved_mem_map) {
+		kho_mem_ser_free(kho_out.ser.preserved_mem_map);
+		kho_out.ser.preserved_mem_map = NULL;
+	}
 
 	err = blocking_notifier_call_chain(&kho_out.chain_head,
 					   KEXEC_KHO_ABORT, NULL);
@@ -380,12 +765,25 @@ static int kho_abort(void)
 static int kho_finalize(void)
 {
 	int err = 0;
+	u64 *preserved_mem_map;
 	void *fdt = page_to_virt(kho_out.ser.fdt);
 
 	err |= fdt_create(fdt, PAGE_SIZE);
 	err |= fdt_finish_reservemap(fdt);
 	err |= fdt_begin_node(fdt, "");
 	err |= fdt_property_string(fdt, "compatible", KHO_FDT_COMPATIBLE);
+	/*
+	 * Reserve the preserved-memory-map property in the root FDT, so
+	 * that all property definitions will precede subnodes created by
+	 * KHO callers.
+	 */
+	err |= fdt_property_placeholder(fdt, PROP_PRESERVED_MEMORY_MAP,
+					sizeof(*preserved_mem_map),
+					(void **)&preserved_mem_map);
+	if (err)
+		goto abort;
+
+	err = kho_preserve_folio(page_folio(kho_out.ser.fdt));
 	if (err)
 		goto abort;
 
@@ -395,6 +793,12 @@ static int kho_finalize(void)
 	if (err)
 		goto abort;
 
+	err = kho_mem_serialize(&kho_out.ser);
+	if (err)
+		goto abort;
+
+	*preserved_mem_map = (u64)virt_to_phys(kho_out.ser.preserved_mem_map);
+
 	err |= fdt_end_node(fdt);
 	err |= fdt_finish(fdt);
 
@@ -700,6 +1104,8 @@ void __init kho_memory_init(void)
 	if (kho_in.scratch_phys) {
 		kho_scratch = phys_to_virt(kho_in.scratch_phys);
 		kho_release_scratch();
+
+		kho_mem_deserialize(kho_get_fdt());
 	} else {
 		kho_reserve_scratch();
 	}
-- 
2.49.0.906.g1f30a19c02-goog
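
P.S. For anyone reviewing the API shape, below is a minimal sketch of how a
KHO user might drive these calls. The my_driver_* helpers, the order-2
folio, and the idea of stashing the folio's physical address in the caller's
own KHO FDT subtree are illustrative assumptions, not part of this patch:

	/* Before kexec: allocate a folio and ask KHO to preserve it. */
	static struct folio *my_folio;

	static int my_driver_kho_prepare(void)
	{
		int err;

		my_folio = folio_alloc(GFP_KERNEL, 2);	/* hypothetical order-2 folio */
		if (!my_folio)
			return -ENOMEM;

		err = kho_preserve_folio(my_folio);
		if (err) {
			folio_put(my_folio);
			return err;
		}

		/*
		 * The caller must hand the folio's physical address to the
		 * successor kernel itself, e.g. as a property in its own
		 * KHO FDT subtree (illustrative; not enforced by this API).
		 */
		return 0;
	}

	/* After kexec: recover the folio from the preserved physical address. */
	static struct folio *my_driver_kho_restore(phys_addr_t phys)
	{
		/* NULL if the page is not online or the recorded order is bogus. */
		return kho_restore_folio(phys);
	}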