From: Jason Gunthorpe <jgg@nvidia.com>
To: Pratyush Yadav <ptyadav@amazon.de>
Cc: Changyuan Lyu <changyuanl@google.com>,
linux-kernel@vger.kernel.org, graf@amazon.com,
akpm@linux-foundation.org, luto@kernel.org,
anthony.yznaga@oracle.com, arnd@arndb.de, ashish.kalra@amd.com,
benh@kernel.crashing.org, bp@alien8.de, catalin.marinas@arm.com,
dave.hansen@linux.intel.com, dwmw2@infradead.org,
ebiederm@xmission.com, mingo@redhat.com, jgowans@amazon.com,
corbet@lwn.net, krzk@kernel.org, rppt@kernel.org,
mark.rutland@arm.com, pbonzini@redhat.com,
pasha.tatashin@soleen.com, hpa@zytor.com, peterz@infradead.org,
robh+dt@kernel.org, robh@kernel.org, saravanak@google.com,
skinsburskii@linux.microsoft.com, rostedt@goodmis.org,
tglx@linutronix.de, thomas.lendacky@amd.com,
usama.arif@bytedance.com, will@kernel.org,
devicetree@vger.kernel.org, kexec@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation
Date: Thu, 3 Apr 2025 13:10:01 -0300 [thread overview]
Message-ID: <20250403161001.GG342109@nvidia.com> (raw)
In-Reply-To: <mafs01pu9qm6r.fsf@amazon.de>
On Thu, Apr 03, 2025 at 03:50:04PM +0000, Pratyush Yadav wrote:
> The patch currently has a limitation where it does not free any of the
> empty tables after a unpreserve operation. But Changyuan's patch also
> doesn't do it so at least it is not any worse off.
We do we even have unpreserve? Just discard the entire KHO operation
in a bulk.
> When working on this patch, I realized that kho_mem_deserialize() is
> currently _very_ slow. It takes over 2 seconds to make memblock
> reservations for 48 GiB of 0-order pages. I suppose this can later be
> optimized by teaching memblock_free_all() to skip preserved pages
> instead of making memblock reservations.
Yes, this was my prior point of not having actual data to know what
the actual hot spots are.. This saves a few ms on an operation that
takes over 2 seconds :)
> +typedef unsigned long khomem_desc_t;
This should be more like:
union {
void *table;
phys_addr_t table_phys;
};
Since we are not using the low bits right now and it is alot cheaper
to convert from va to phys only once during the final step. __va is
not exactly fast.
> +#define PTRS_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long))
> +#define KHOMEM_L1_BITS (PAGE_SIZE * BITS_PER_BYTE)
> +#define KHOMEM_L1_MASK ((1 << ilog2(KHOMEM_L1_BITS)) - 1)
> +#define KHOMEM_L1_SHIFT (PAGE_SHIFT)
> +#define KHOMEM_L2_SHIFT (KHOMEM_L1_SHIFT + ilog2(KHOMEM_L1_BITS))
> +#define KHOMEM_L3_SHIFT (KHOMEM_L2_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_L4_SHIFT (KHOMEM_L3_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_PFN_MASK PAGE_MASK
This all works better if you just use GENMASK and FIELD_GET
> +static int __khomem_table_alloc(khomem_desc_t *desc)
> +{
> + if (khomem_desc_none(*desc)) {
Needs READ_ONCE
> +struct kho_mem_track {
> + /* Points to L4 KHOMEM descriptor, each order gets its own table. */
> + struct xarray orders;
> +};
I think it would be easy to add a 5th level and just use bits 63:57 as
a 6 bit order. Then you don't need all this stuff either.
> +int kho_preserve_folio(struct folio *folio)
> +{
> + unsigned long pfn = folio_pfn(folio);
> + unsigned int order = folio_order(folio);
> + int err;
> +
> + if (!kho_enable)
> + return -EOPNOTSUPP;
> +
> + down_read(&kho_out.tree_lock);
This lock still needs to go away
> +static void kho_mem_serialize(void)
> +{
> + struct kho_mem_track *tracker = &kho_mem_track;
> + khomem_desc_t *desc;
> + unsigned long order;
> +
> + xa_for_each(&tracker->orders, order, desc) {
> + if (WARN_ON(order >= NR_PAGE_ORDERS))
> + break;
> + kho_out.mem_tables[order] = *desc;
Missing the virt_to_phys?
> + nr_tables = min_t(unsigned int, len / sizeof(*tables), NR_PAGE_ORDERS);
> + for (order = 0; order < nr_tables; order++)
> + khomem_walk_preserved((khomem_desc_t *)&tables[order], order,
Missing phys_to_virt
Please dont' remove the KHOSER stuff, and do use it with proper
structs and types. It is part of keeping this stuff understandable.
Jason
next prev parent reply other threads:[~2025-04-03 16:10 UTC|newest]
Thread overview: 103+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-20 1:55 [PATCH v5 00/16] kexec: introduce Kexec HandOver (KHO) Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 01/16] kexec: define functions to map and unmap segments Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 02/16] mm/mm_init: rename init_reserved_page to init_deferred_page Changyuan Lyu
2025-03-20 7:10 ` Krzysztof Kozlowski
2025-03-20 17:15 ` Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 03/16] memblock: add MEMBLOCK_RSRV_KERN flag Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 04/16] memblock: Add support for scratch memory Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 05/16] memblock: introduce memmap_init_kho_scratch() Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 06/16] hashtable: add macro HASHTABLE_INIT Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers Changyuan Lyu
2025-03-21 13:34 ` Jason Gunthorpe
2025-03-23 19:02 ` Changyuan Lyu
2025-03-24 16:28 ` Jason Gunthorpe
2025-03-25 0:21 ` Changyuan Lyu
2025-03-25 2:20 ` Jason Gunthorpe
2025-03-24 18:40 ` Frank van der Linden
2025-03-25 19:19 ` Mike Rapoport
2025-03-25 21:56 ` Frank van der Linden
2025-03-26 11:59 ` Mike Rapoport
2025-03-26 16:25 ` Frank van der Linden
2025-03-20 1:55 ` [PATCH v5 08/16] kexec: add KHO parsing support Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 09/16] kexec: enable KHO support for memory preservation Changyuan Lyu
2025-03-21 13:46 ` Jason Gunthorpe
2025-03-22 19:12 ` Mike Rapoport
2025-03-23 18:55 ` Jason Gunthorpe
2025-03-24 18:18 ` Mike Rapoport
2025-03-24 20:07 ` Jason Gunthorpe
2025-03-26 12:07 ` Mike Rapoport
2025-03-23 19:07 ` Changyuan Lyu
2025-03-25 2:04 ` Jason Gunthorpe
2025-03-27 10:03 ` Pratyush Yadav
2025-03-27 13:31 ` Jason Gunthorpe
2025-03-27 17:28 ` Pratyush Yadav
2025-03-28 12:53 ` Jason Gunthorpe
2025-04-02 16:44 ` Changyuan Lyu
2025-04-02 16:47 ` Pratyush Yadav
2025-04-02 18:37 ` Pasha Tatashin
2025-04-02 18:49 ` Pratyush Yadav
2025-04-02 19:16 ` Pratyush Yadav
2025-04-03 11:42 ` Jason Gunthorpe
2025-04-03 13:58 ` Mike Rapoport
2025-04-03 14:24 ` Jason Gunthorpe
2025-04-04 9:54 ` Mike Rapoport
2025-04-04 12:47 ` Jason Gunthorpe
2025-04-04 13:53 ` Mike Rapoport
2025-04-04 14:30 ` Jason Gunthorpe
2025-04-04 16:24 ` Pratyush Yadav
2025-04-04 17:31 ` Jason Gunthorpe
2025-04-06 16:13 ` Mike Rapoport
2025-04-06 16:11 ` Mike Rapoport
2025-04-07 14:16 ` Jason Gunthorpe
2025-04-07 16:31 ` Mike Rapoport
2025-04-07 17:03 ` Jason Gunthorpe
2025-04-09 9:06 ` Mike Rapoport
2025-04-09 12:56 ` Jason Gunthorpe
2025-04-09 13:58 ` Mike Rapoport
2025-04-09 15:37 ` Jason Gunthorpe
2025-04-09 16:19 ` Mike Rapoport
2025-04-09 16:28 ` Jason Gunthorpe
2025-04-10 16:51 ` Matthew Wilcox
2025-04-10 17:31 ` Jason Gunthorpe
2025-04-09 16:28 ` Mike Rapoport
2025-04-09 18:32 ` Jason Gunthorpe
2025-04-04 16:15 ` Pratyush Yadav
2025-04-06 16:34 ` Mike Rapoport
2025-04-07 14:23 ` Jason Gunthorpe
2025-04-03 13:57 ` Mike Rapoport
2025-04-11 4:02 ` Changyuan Lyu
2025-04-03 15:50 ` Pratyush Yadav
2025-04-03 16:10 ` Jason Gunthorpe [this message]
2025-04-03 17:37 ` Pratyush Yadav
2025-04-04 12:54 ` Jason Gunthorpe
2025-04-04 15:39 ` Pratyush Yadav
2025-04-09 8:35 ` Mike Rapoport
2025-03-20 1:55 ` [PATCH v5 10/16] kexec: add KHO support to kexec file loads Changyuan Lyu
2025-03-21 13:48 ` Jason Gunthorpe
2025-03-20 1:55 ` [PATCH v5 11/16] kexec: add config option for KHO Changyuan Lyu
2025-03-20 7:10 ` Krzysztof Kozlowski
2025-03-20 17:18 ` Changyuan Lyu
2025-03-24 4:18 ` Dave Young
2025-03-24 19:26 ` Pasha Tatashin
2025-03-25 1:24 ` Dave Young
2025-03-25 3:07 ` Dave Young
2025-03-25 6:57 ` Baoquan He
2025-03-25 8:36 ` Dave Young
2025-03-26 9:17 ` Dave Young
2025-03-26 11:28 ` Mike Rapoport
2025-03-26 12:09 ` Dave Young
2025-03-25 14:04 ` Pasha Tatashin
2025-03-20 1:55 ` [PATCH v5 12/16] arm64: add KHO support Changyuan Lyu
2025-03-20 7:13 ` Krzysztof Kozlowski
2025-03-20 8:30 ` Krzysztof Kozlowski
2025-03-20 23:29 ` Changyuan Lyu
2025-04-11 3:47 ` Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 13/16] x86/setup: use memblock_reserve_kern for memory used by kernel Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 14/16] x86: add KHO support Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 15/16] memblock: add KHO support for reserve_mem Changyuan Lyu
2025-03-20 1:55 ` [PATCH v5 16/16] Documentation: add documentation for KHO Changyuan Lyu
2025-03-20 14:45 ` Jonathan Corbet
2025-03-21 6:33 ` Changyuan Lyu
2025-03-21 13:46 ` Jonathan Corbet
2025-03-25 14:19 ` [PATCH v5 00/16] kexec: introduce Kexec HandOver (KHO) Pasha Tatashin
2025-03-25 15:03 ` Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250403161001.GG342109@nvidia.com \
--to=jgg@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=anthony.yznaga@oracle.com \
--cc=arnd@arndb.de \
--cc=ashish.kalra@amd.com \
--cc=benh@kernel.crashing.org \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=changyuanl@google.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=devicetree@vger.kernel.org \
--cc=dwmw2@infradead.org \
--cc=ebiederm@xmission.com \
--cc=graf@amazon.com \
--cc=hpa@zytor.com \
--cc=jgowans@amazon.com \
--cc=kexec@lists.infradead.org \
--cc=krzk@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=pasha.tatashin@soleen.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=ptyadav@amazon.de \
--cc=robh+dt@kernel.org \
--cc=robh@kernel.org \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
--cc=saravanak@google.com \
--cc=skinsburskii@linux.microsoft.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=usama.arif@bytedance.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox