linux-mm.kvack.org archive mirror
From: Pratyush Yadav <ptyadav@amazon.de>
To: Changyuan Lyu <changyuanl@google.com>
Cc: <linux-kernel@vger.kernel.org>, <graf@amazon.com>,
	<akpm@linux-foundation.org>, <luto@kernel.org>,
	<anthony.yznaga@oracle.com>, <arnd@arndb.de>,
	<ashish.kalra@amd.com>, <benh@kernel.crashing.org>,
	<bp@alien8.de>, <catalin.marinas@arm.com>,
	<dave.hansen@linux.intel.com>, <dwmw2@infradead.org>,
	<ebiederm@xmission.com>, <mingo@redhat.com>, <jgowans@amazon.com>,
	<corbet@lwn.net>, <krzk@kernel.org>, <rppt@kernel.org>,
	<mark.rutland@arm.com>, <pbonzini@redhat.com>,
	<pasha.tatashin@soleen.com>, <hpa@zytor.com>,
	<peterz@infradead.org>, <robh+dt@kernel.org>, <robh@kernel.org>,
	<saravanak@google.com>, <skinsburskii@linux.microsoft.com>,
	<rostedt@goodmis.org>, <tglx@linutronix.de>,
	<thomas.lendacky@amd.com>, <usama.arif@bytedance.com>,
	<will@kernel.org>, <devicetree@vger.kernel.org>,
	<kexec@lists.infradead.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-doc@vger.kernel.org>, <linux-mm@kvack.org>,
	<x86@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation
Date: Wed, 2 Apr 2025 19:16:27 +0000
Message-ID: <mafs05xjmqsqc.fsf@amazon.de>
In-Reply-To: <20250320015551.2157511-10-changyuanl@google.com>

Hi Changyuan,

On Wed, Mar 19 2025, Changyuan Lyu wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Introduce APIs allowing KHO users to preserve memory across kexec and
> get access to that memory after boot of the kexeced kernel
>
> kho_preserve_folio() - record a folio to be preserved over kexec
> kho_restore_folio() - recreates the folio from the preserved memory
> kho_preserve_phys() - record physically contiguous range to be
> preserved over kexec.
> kho_restore_phys() - recreates order-0 pages corresponding to the
> preserved physical range
>
> The memory preservations are tracked by two levels of xarrays to manage
> chunks of per-order 512 byte bitmaps. For instance the entire 1G order
> of a 1TB x86 system would fit inside a single 512 byte bitmap. For
> order 0 allocations each bitmap will cover 16M of address space. Thus,
> for 16G of memory at most 512K of bitmap memory will be needed for order 0.
>
> At serialization time all bitmaps are recorded in a linked list of pages
> for the next kernel to process and the physical address of the list is
> recorded in KHO FDT.
>
> The next kernel then processes that list, reserves the memory ranges and
> later, when a user requests a folio or a physical range, KHO restores
> corresponding memory map entries.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Co-developed-by: Changyuan Lyu <changyuanl@google.com>
> Signed-off-by: Changyuan Lyu <changyuanl@google.com>
> ---
>  include/linux/kexec_handover.h |  38 +++
>  kernel/kexec_handover.c        | 486 ++++++++++++++++++++++++++++++++-
>  2 files changed, 522 insertions(+), 2 deletions(-)
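The sizing figures in the commit message above can be checked with a
small userspace sketch. It assumes 4 KiB base pages and one bitmap bit
per (PAGE_SIZE << order) block. The SKETCH_* constants and both helpers
are hypothetical and are not code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the quoted commit message; 4 KiB base pages assumed. */
#define SKETCH_BITMAP_BYTES 512ULL
#define SKETCH_PAGE_SIZE    4096ULL

/* Bytes of address space covered by one bitmap at a given order:
 * one bit per (PAGE_SIZE << order) block. */
static uint64_t bitmap_coverage(unsigned int order)
{
	assert(order < 32); /* keep the shifts well inside 64 bits */
	return SKETCH_BITMAP_BYTES * 8 * (SKETCH_PAGE_SIZE << order);
}

/* Worst-case bitmap bytes needed to track mem_bytes at one order,
 * rounded up to whole bitmaps. */
static uint64_t bitmap_bytes_needed(uint64_t mem_bytes, unsigned int order)
{
	uint64_t cov = bitmap_coverage(order);

	return ((mem_bytes + cov - 1) / cov) * SKETCH_BITMAP_BYTES;
}
```

Under these assumptions, one order-0 bitmap covers 16M. Tracking 16G at
order 0 needs 512K of bitmaps in the worst case. 1T at the 1G order
(order 18) fits in a single bitmap. All three results match the commit
message.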
[...]
> +int kho_preserve_phys(phys_addr_t phys, size_t size)
> +{
> +	unsigned long pfn = PHYS_PFN(phys), end_pfn = PHYS_PFN(phys + size);
> +	unsigned int order = ilog2(end_pfn - pfn);

This caught my eye when playing around with the code. Nothing bounds the
order here, so for a large enough range it can exceed NR_PAGE_ORDERS.
Also, when the page is initialized after KHO (in kho_restore_folio()),
we pass the order directly to prep_compound_page() without sanity
checking it. The next kernel might not support all the orders the
current one supports. Perhaps something to fix?
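One possible shape for the preserve side is sketched below as
standalone userspace C. Each chunk's order is bounded three ways: by the
remaining length (as in the patch), by a maximum supported order, and by
the natural alignment of pfn, so that every chunk is a properly aligned
2^order block. SKETCH_NR_PAGE_ORDERS and the helpers are hypothetical
stand-ins, and the value 11 is only an example. The alignment bound is
an assumption of this sketch, not part of the comment above.

```c
#include <assert.h>

/* Hypothetical stand-in for NR_PAGE_ORDERS; valid orders are 0..10 here. */
#define SKETCH_NR_PAGE_ORDERS 11u

/* floor(log2(x)) for x > 0, like the kernel's ilog2(). */
static unsigned int sketch_ilog2(unsigned long x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Largest order that fits in [pfn, end_pfn), is aligned at pfn and is
 * below the supported maximum. */
static unsigned int sketch_chunk_order(unsigned long pfn, unsigned long end_pfn)
{
	unsigned int order;

	assert(pfn < end_pfn);
	order = sketch_ilog2(end_pfn - pfn);

	/* A 2^order block must start on a 2^order boundary. */
	while (order && (pfn & ((1UL << order) - 1)))
		order--;

	/* Never hand out an order the kernel cannot represent. */
	if (order >= SKETCH_NR_PAGE_ORDERS)
		order = SKETCH_NR_PAGE_ORDERS - 1;

	return order;
}

/* Split [pfn, end_pfn) into chunks, writing each chunk's order to
 * orders[]. Returns the number of chunks written (at most max). */
static int sketch_split_range(unsigned long pfn, unsigned long end_pfn,
			      unsigned int *orders, int max)
{
	int n = 0;

	while (pfn < end_pfn && n < max) {
		unsigned int order = sketch_chunk_order(pfn, end_pfn);

		orders[n++] = order;
		pfn += 1UL << order;
	}
	return n;
}
```

With the cap set to 11 orders, sketch_split_range(0, 4096, ...) yields
four order-10 chunks instead of one order-12 chunk. The unaligned range
[1, 5) becomes orders 0, 1, 0.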

> +	unsigned long failed_pfn;
> +	int err = 0;
> +
> +	if (!kho_enable)
> +		return -EOPNOTSUPP;
> +
> +	down_read(&kho_out.tree_lock);
> +	if (kho_out.fdt) {
> +		err = -EBUSY;
> +		goto unlock;
> +	}
> +
> +	for (; pfn < end_pfn;
> +	     pfn += (1 << order), order = ilog2(end_pfn - pfn)) {
> +		err = __kho_preserve(&kho_mem_track, pfn, order);
> +		if (err) {
> +			failed_pfn = pfn;
> +			break;
> +		}
> +	}
[...]
> +struct folio *kho_restore_folio(phys_addr_t phys)
> +{
> +	struct page *page = pfn_to_online_page(PHYS_PFN(phys));
> +	unsigned long order = page->private;
> +
> +	if (!page)
> +		return NULL;
> +
> +	order = page->private;
> +	if (order)
> +		prep_compound_page(page, order);
> +	else
> +		kho_restore_page(page);
> +
> +	return page_folio(page);
> +}
[...]
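A minimal sketch of the restore-side sanity check follows, again as
standalone userspace C. A hypothetical struct stands in for struct page,
and the order limit is also hypothetical. The sketch reads
page->private only after the NULL check. In the code quoted above, the
initializer of order dereferences page before the if (!page) test runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NR_PAGE_ORDERS in the *next* kernel. */
#define SKETCH_NR_PAGE_ORDERS 11u

/* Stand-in for struct page: only the field kho_restore_folio() reads. */
struct sketch_page {
	unsigned long private; /* order recorded by the previous kernel */
};

/* Return the order to restore, or -1 if there is no page or the recorded
 * order is one this kernel cannot handle. A caller would then fail the
 * restore instead of passing the order to prep_compound_page(). */
static long sketch_restore_order(const struct sketch_page *page)
{
	unsigned long order;

	if (!page)
		return -1;

	order = page->private; /* safe: page is known to be non-NULL */
	if (order >= SKETCH_NR_PAGE_ORDERS)
		return -1;

	return (long)order;
}
```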

-- 
Regards,
Pratyush Yadav


Thread overview: 103+ messages
2025-03-20  1:55 [PATCH v5 00/16] kexec: introduce Kexec HandOver (KHO) Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 01/16] kexec: define functions to map and unmap segments Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 02/16] mm/mm_init: rename init_reserved_page to init_deferred_page Changyuan Lyu
2025-03-20  7:10   ` Krzysztof Kozlowski
2025-03-20 17:15     ` Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 03/16] memblock: add MEMBLOCK_RSRV_KERN flag Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 04/16] memblock: Add support for scratch memory Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 05/16] memblock: introduce memmap_init_kho_scratch() Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 06/16] hashtable: add macro HASHTABLE_INIT Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers Changyuan Lyu
2025-03-21 13:34   ` Jason Gunthorpe
2025-03-23 19:02     ` Changyuan Lyu
2025-03-24 16:28       ` Jason Gunthorpe
2025-03-25  0:21         ` Changyuan Lyu
2025-03-25  2:20           ` Jason Gunthorpe
2025-03-24 18:40   ` Frank van der Linden
2025-03-25 19:19     ` Mike Rapoport
2025-03-25 21:56       ` Frank van der Linden
2025-03-26 11:59         ` Mike Rapoport
2025-03-26 16:25           ` Frank van der Linden
2025-03-20  1:55 ` [PATCH v5 08/16] kexec: add KHO parsing support Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 09/16] kexec: enable KHO support for memory preservation Changyuan Lyu
2025-03-21 13:46   ` Jason Gunthorpe
2025-03-22 19:12     ` Mike Rapoport
2025-03-23 18:55       ` Jason Gunthorpe
2025-03-24 18:18         ` Mike Rapoport
2025-03-24 20:07           ` Jason Gunthorpe
2025-03-26 12:07             ` Mike Rapoport
2025-03-23 19:07     ` Changyuan Lyu
2025-03-25  2:04       ` Jason Gunthorpe
2025-03-27 10:03   ` Pratyush Yadav
2025-03-27 13:31     ` Jason Gunthorpe
2025-03-27 17:28       ` Pratyush Yadav
2025-03-28 12:53         ` Jason Gunthorpe
2025-04-02 16:44         ` Changyuan Lyu
2025-04-02 16:47           ` Pratyush Yadav
2025-04-02 18:37             ` Pasha Tatashin
2025-04-02 18:49               ` Pratyush Yadav
2025-04-02 19:16   ` Pratyush Yadav [this message]
2025-04-03 11:42     ` Jason Gunthorpe
2025-04-03 13:58       ` Mike Rapoport
2025-04-03 14:24         ` Jason Gunthorpe
2025-04-04  9:54           ` Mike Rapoport
2025-04-04 12:47             ` Jason Gunthorpe
2025-04-04 13:53               ` Mike Rapoport
2025-04-04 14:30                 ` Jason Gunthorpe
2025-04-04 16:24                   ` Pratyush Yadav
2025-04-04 17:31                     ` Jason Gunthorpe
2025-04-06 16:13                     ` Mike Rapoport
2025-04-06 16:11                   ` Mike Rapoport
2025-04-07 14:16                     ` Jason Gunthorpe
2025-04-07 16:31                       ` Mike Rapoport
2025-04-07 17:03                         ` Jason Gunthorpe
2025-04-09  9:06                           ` Mike Rapoport
2025-04-09 12:56                             ` Jason Gunthorpe
2025-04-09 13:58                               ` Mike Rapoport
2025-04-09 15:37                                 ` Jason Gunthorpe
2025-04-09 16:19                                   ` Mike Rapoport
2025-04-09 16:28                                     ` Jason Gunthorpe
2025-04-10 16:51                                       ` Matthew Wilcox
2025-04-10 17:31                                         ` Jason Gunthorpe
2025-04-09 16:28                       ` Mike Rapoport
2025-04-09 18:32                         ` Jason Gunthorpe
2025-04-04 16:15                 ` Pratyush Yadav
2025-04-06 16:34                   ` Mike Rapoport
2025-04-07 14:23                     ` Jason Gunthorpe
2025-04-03 13:57     ` Mike Rapoport
2025-04-11  4:02     ` Changyuan Lyu
2025-04-03 15:50   ` Pratyush Yadav
2025-04-03 16:10     ` Jason Gunthorpe
2025-04-03 17:37       ` Pratyush Yadav
2025-04-04 12:54         ` Jason Gunthorpe
2025-04-04 15:39           ` Pratyush Yadav
2025-04-09  8:35       ` Mike Rapoport
2025-03-20  1:55 ` [PATCH v5 10/16] kexec: add KHO support to kexec file loads Changyuan Lyu
2025-03-21 13:48   ` Jason Gunthorpe
2025-03-20  1:55 ` [PATCH v5 11/16] kexec: add config option for KHO Changyuan Lyu
2025-03-20  7:10   ` Krzysztof Kozlowski
2025-03-20 17:18     ` Changyuan Lyu
2025-03-24  4:18   ` Dave Young
2025-03-24 19:26     ` Pasha Tatashin
2025-03-25  1:24       ` Dave Young
2025-03-25  3:07         ` Dave Young
2025-03-25  6:57     ` Baoquan He
2025-03-25  8:36       ` Dave Young
2025-03-26  9:17         ` Dave Young
2025-03-26 11:28           ` Mike Rapoport
2025-03-26 12:09             ` Dave Young
2025-03-25 14:04       ` Pasha Tatashin
2025-03-20  1:55 ` [PATCH v5 12/16] arm64: add KHO support Changyuan Lyu
2025-03-20  7:13   ` Krzysztof Kozlowski
2025-03-20  8:30     ` Krzysztof Kozlowski
2025-03-20 23:29     ` Changyuan Lyu
2025-04-11  3:47   ` Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 13/16] x86/setup: use memblock_reserve_kern for memory used by kernel Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 14/16] x86: add KHO support Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 15/16] memblock: add KHO support for reserve_mem Changyuan Lyu
2025-03-20  1:55 ` [PATCH v5 16/16] Documentation: add documentation for KHO Changyuan Lyu
2025-03-20 14:45   ` Jonathan Corbet
2025-03-21  6:33     ` Changyuan Lyu
2025-03-21 13:46       ` Jonathan Corbet
2025-03-25 14:19 ` [PATCH v5 00/16] kexec: introduce Kexec HandOver (KHO) Pasha Tatashin
2025-03-25 15:03   ` Mike Rapoport
