From: Oscar Salvador <osalvador@suse.de>
To: david@redhat.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Michal Hocko <mhocko@suse.com>, Hannes Reinecke <hare@kernel.org>
Subject: [RFC] Disable auto_movable_ratio for selfhosted memmap
Date: Mon, 28 Jul 2025 10:15:47 +0200
Message-ID: <aIcxs2nk3RNWWbD6@localhost.localdomain>

Hi,

Currently, we have several mechanisms to pick a zone for the new memory we are
onlining.
Eventually, we land in zone_for_pfn_range(), which picks the zone.

Two of these mechanisms are the 'movable_node' and 'auto-movable' policies.
The former will put all hotplugged memory in ZONE_MOVABLE
(unless we can keep zones contiguous by not doing so), while the latter
will put it in ZONE_MOVABLE IFF we are within the established ratio
MOVABLE:KERNEL.

It seems the latter doesn't play well with CXL memory, where CXL cards hold
really large amounts of memory, making the ratio check fail. Since CXL cards
must be removed as a unit, removal can't be done if any memory block ended up
in a !ZONE_MOVABLE zone.
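
To make the ratio problem concrete, here is a toy userspace model of a
MOVABLE:KERNEL check. It is only a sketch and not the kernel's code: the
function and parameter names are made up, and it assumes the ratio is
expressed in percent (300 meaning 3:1).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, hypothetical names. ratio_percent stands in for the
 * auto_movable_ratio knob, assumed to be in percent.
 */
static bool can_online_movable(uint64_t kernel_pages, uint64_t movable_pages,
			       uint64_t nr_new_pages, uint64_t ratio_percent)
{
	/* MOVABLE total if the new range were onlined to ZONE_MOVABLE. */
	uint64_t movable_after = movable_pages + nr_new_pages;

	/* Allowed only while MOVABLE stays within ratio * KERNEL. */
	return movable_after <= (ratio_percent * kernel_pages) / 100;
}
```

With 16G of kernel memory (4194304 4K pages) and a ratio of 300, the limit is
48G of movable memory: a 32G range still fits, while a single 512G card does
not, so in this model its blocks would not go to ZONE_MOVABLE.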

One way to tackle this would be to update the ratio every time a new CXL
card gets inserted, but this seems suboptimal.
Another way: since CXL memory works with selfhosted memmap, we could relax
the check under 'auto-movable' and only look at the ratio if we aren't
working with selfhosted memmap.
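
As a toy illustration of that decision order (userspace C, not kernel code;
every type and name below is a hypothetical stand-in, and policies other than
'auto-movable' are not modelled):

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's own types and enums differ. */
enum toy_zone { TOY_ZONE_KERNEL, TOY_ZONE_MOVABLE };
enum toy_policy { TOY_POLICY_CONTIG_ZONES, TOY_POLICY_AUTO_MOVABLE };

struct toy_range {
	bool selfhosted_memmap;	/* models a non-NULL altmap */
	bool within_ratio;	/* outcome of the MOVABLE:KERNEL ratio check */
};

static enum toy_zone toy_pick_zone(enum toy_policy policy,
				   const struct toy_range *r)
{
	if (policy != TOY_POLICY_AUTO_MOVABLE)
		return TOY_ZONE_KERNEL;	/* other policies are not modelled */

	/* Relaxed check: selfhosted memmap goes movable regardless of ratio. */
	if (r->selfhosted_memmap)
		return TOY_ZONE_MOVABLE;

	/* Everything else still has to pass the ratio check. */
	return r->within_ratio ? TOY_ZONE_MOVABLE : TOY_ZONE_KERNEL;
}
```

In this model a range with selfhosted memmap lands in the movable zone even
when the ratio check would have failed, while a range without it keeps the
existing behaviour.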

Something like the following (achtung: it's just a PoC).

Comments? Ideas?

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5c6c1d6bb59f..ff87cfb3881a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -234,7 +234,7 @@ static int memory_block_online(struct memory_block *mem)
 		return -EHWPOISON;
 
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
-				  start_pfn, nr_pages);
+				  start_pfn, nr_pages, mem->altmap);
 
 	/*
 	 * Although vmemmap pages have a different lifecycle than the pages
@@ -473,11 +473,11 @@ static ssize_t phys_device_show(struct device *dev,
 static int print_allowed_zone(char *buf, int len, int nid,
 			      struct memory_group *group,
 			      unsigned long start_pfn, unsigned long nr_pages,
-			      int online_type, struct zone *default_zone)
+			      int online_type, struct zone *default_zone, struct vmem_altmap *altmap)
 {
 	struct zone *zone;
 
-	zone = zone_for_pfn_range(online_type, nid, group, start_pfn, nr_pages);
+	zone = zone_for_pfn_range(online_type, nid, group, start_pfn, nr_pages, altmap);
 
 	if (zone == default_zone)
 		return 0;
@@ -509,13 +509,13 @@ static ssize_t valid_zones_show(struct device *dev,
 	}
 
 	default_zone = zone_for_pfn_range(MMOP_ONLINE, nid, group,
-					  start_pfn, nr_pages);
+					  start_pfn, nr_pages, mem->altmap);
 
 	len = sysfs_emit(buf, "%s", default_zone->name);
 	len += print_allowed_zone(buf, len, nid, group, start_pfn, nr_pages,
-				  MMOP_ONLINE_KERNEL, default_zone);
+				  MMOP_ONLINE_KERNEL, default_zone, mem->altmap);
 	len += print_allowed_zone(buf, len, nid, group, start_pfn, nr_pages,
-				  MMOP_ONLINE_MOVABLE, default_zone);
+				  MMOP_ONLINE_MOVABLE, default_zone, mem->altmap);
 	len += sysfs_emit_at(buf, len, "\n");
 	return len;
 }
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 23f038a16231..89f7b9c5d995 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -328,7 +328,7 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern struct zone *zone_for_pfn_range(int online_type, int nid,
 		struct memory_group *group, unsigned long start_pfn,
-		unsigned long nr_pages);
+		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
 				      struct mhp_params *params);
 void arch_remove_linear_mapping(u64 start, u64 size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 69a636e20f7b..6c6600a9c839 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1048,7 +1048,7 @@ static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn
 
 struct zone *zone_for_pfn_range(int online_type, int nid,
 		struct memory_group *group, unsigned long start_pfn,
-		unsigned long nr_pages)
+		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	if (online_type == MMOP_ONLINE_KERNEL)
 		return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
@@ -1056,6 +1056,10 @@ struct zone *zone_for_pfn_range(int online_type, int nid,
 	if (online_type == MMOP_ONLINE_MOVABLE)
 		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
 
+	/* Selfhosted memmap, skip ratio check */
+	if (online_policy == ONLINE_POLICY_AUTO_MOVABLE && altmap)
+		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
+
 	if (online_policy == ONLINE_POLICY_AUTO_MOVABLE)
 		return auto_movable_zone_for_pfn(nid, group, start_pfn, nr_pages);
 

--
Oscar Salvador
SUSE Labs