* [PATCH 1/8] mm/memory_hotplug: pass online_type to online_memory_block() via arg
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 9:46 ` David Hildenbrand (Red Hat)
2026-01-14 8:51 ` [PATCH 2/8] mm/memory_hotplug: extract __add_memory_resource() and __offline_memory() Gregory Price
` (6 subsequent siblings)
7 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
Modify online_memory_block() to accept the online type through its arg
parameter rather than calling mhp_get_default_online_type() internally.
This prepares for allowing callers to specify explicit online types.
Update the caller in add_memory_resource() to pass the default online
type via a local variable. No functional change.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
mm/memory_hotplug.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 389989a28abe..5718556121f0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1337,7 +1337,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
static int online_memory_block(struct memory_block *mem, void *arg)
{
- mem->online_type = mhp_get_default_online_type();
+ int *online_type = arg;
+
+ mem->online_type = *online_type;
return device_online(&mem->dev);
}
@@ -1578,8 +1580,12 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
merge_system_ram_resource(res);
/* online pages if requested */
- if (mhp_get_default_online_type() != MMOP_OFFLINE)
- walk_memory_blocks(start, size, NULL, online_memory_block);
+ if (mhp_get_default_online_type() != MMOP_OFFLINE) {
+ int online_type = mhp_get_default_online_type();
+
+ walk_memory_blocks(start, size, &online_type,
+ online_memory_block);
+ }
return ret;
error:
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/8] mm/memory_hotplug: pass online_type to online_memory_block() via arg
2026-01-14 8:51 ` [PATCH 1/8] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
@ 2026-01-14 9:46 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 9:46 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
On 1/14/26 09:51, Gregory Price wrote:
> Modify online_memory_block() to accept the online type through its arg
> parameter rather than calling mhp_get_default_online_type() internally.
> This prepares for allowing callers to specify explicit online types.
>
> Update the caller in add_memory_resource() to pass the default online
> type via a local variable. No functional change.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> mm/memory_hotplug.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 389989a28abe..5718556121f0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1337,7 +1337,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
>
> static int online_memory_block(struct memory_block *mem, void *arg)
> {
> - mem->online_type = mhp_get_default_online_type();
> + int *online_type = arg;
> +
> + mem->online_type = *online_type;
> return device_online(&mem->dev);
> }
>
> @@ -1578,8 +1580,12 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> merge_system_ram_resource(res);
>
> /* online pages if requested */
> - if (mhp_get_default_online_type() != MMOP_OFFLINE)
> - walk_memory_blocks(start, size, NULL, online_memory_block);
> + if (mhp_get_default_online_type() != MMOP_OFFLINE) {
> + int online_type = mhp_get_default_online_type();
> +
> + walk_memory_blocks(start, size, &online_type,
> + online_memory_block);
I think you could just pass the value by casting to uintptr_t and back.
Doesn't make a big difference here, though.
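For reference, a minimal userspace sketch of the cast-through-uintptr_t
idea mentioned above. This is not kernel code: struct memory_block_model,
walk_blocks_model() and online_all_model() are hypothetical stand-ins for
struct memory_block, walk_memory_blocks() and the caller, and the enum
values are assumed to start at MMOP_OFFLINE = 0.

```c
#include <stdint.h>

/* Assumed ordering of the online types; the model only needs small ints. */
enum { MMOP_OFFLINE = 0, MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE };

/* Hypothetical stand-in for struct memory_block. */
struct memory_block_model {
	int online_type;
};

/* Callback: recover the integer that was packed into the pointer value. */
static int online_block_by_value(struct memory_block_model *mem, void *arg)
{
	mem->online_type = (int)(uintptr_t)arg;
	return 0;
}

/* Simplified walker: call func on each block and stop on the first error. */
static int walk_blocks_model(struct memory_block_model *blocks, int count,
			     void *arg,
			     int (*func)(struct memory_block_model *, void *))
{
	int i, rc;

	for (i = 0; i < count; i++) {
		rc = func(&blocks[i], arg);
		if (rc)
			return rc;
	}
	return 0;
}

/* Pass the online type by value instead of by address of a local. */
static int online_all_model(struct memory_block_model *blocks, int count,
			    int online_type)
{
	return walk_blocks_model(blocks, count,
				 (void *)(uintptr_t)online_type,
				 online_block_by_value);
}
```

The trade-off is small either way: the cast avoids a local variable whose
address must stay valid for the duration of the walk, while the pointer
form keeps the callback free of casts.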
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
--
Cheers
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/8] mm/memory_hotplug: extract __add_memory_resource() and __offline_memory()
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
2026-01-14 8:51 ` [PATCH 1/8] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 10:14 ` David Hildenbrand (Red Hat)
2026-01-14 8:51 ` [PATCH 3/8] mm/memory_hotplug: add APIs for explicit online type control Gregory Price
` (5 subsequent siblings)
7 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
Extract internal helper functions with explicit parameters to prepare
for adding new APIs that allow explicit online type control:
- __add_memory_resource(): accepts an explicit online_type parameter.
Add MMOP_SYSTEM_DEFAULT as a new value that instructs the function
to use mhp_get_default_online_type() for the actual online type.
The existing add_memory_resource() becomes a thin wrapper that
passes MMOP_SYSTEM_DEFAULT to preserve existing behavior.
- __offline_memory(): extracted from offline_and_remove_memory() to
handle the offline operation with rollback support. The caller
now handles locking and the remove step separately.
This refactoring enables future callers to specify explicit online
types (MMOP_OFFLINE, MMOP_ONLINE, MMOP_ONLINE_MOVABLE) or use
MMOP_SYSTEM_DEFAULT for the system default policy. The offline logic
can also be used independently of the remove step.
Mild functional change: if try_remove_memory() failed after successfully
offlining, we would re-online the memory. We no longer do this, and in
practice removal doesn't fail if offline succeeds.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
include/linux/memory_hotplug.h | 2 +
mm/memory_hotplug.c | 69 ++++++++++++++++++++++------------
2 files changed, 48 insertions(+), 23 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f2f16cdd73ee..d5407264d72a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -29,6 +29,8 @@ enum {
MMOP_ONLINE_KERNEL,
/* Online the memory to ZONE_MOVABLE. */
MMOP_ONLINE_MOVABLE,
+ /* Use system default online type from mhp_get_default_online_type(). */
+ MMOP_SYSTEM_DEFAULT,
};
/* Flags for add_memory() and friends to specify memory hotplug details. */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5718556121f0..ab73c8fcc0f1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1490,7 +1490,8 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
*
* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
*/
-int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+static int __add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags,
+ int online_type)
{
struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
@@ -1499,6 +1500,10 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
bool new_node = false;
int ret;
+ /* Convert system default to actual online type */
+ if (online_type == MMOP_SYSTEM_DEFAULT)
+ online_type = mhp_get_default_online_type();
+
start = res->start;
size = resource_size(res);
@@ -1580,12 +1585,9 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
merge_system_ram_resource(res);
/* online pages if requested */
- if (mhp_get_default_online_type() != MMOP_OFFLINE) {
- int online_type = mhp_get_default_online_type();
-
+ if (online_type != MMOP_OFFLINE)
walk_memory_blocks(start, size, &online_type,
online_memory_block);
- }
return ret;
error:
@@ -1601,7 +1603,12 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
return ret;
}
-/* requires device_hotplug_lock, see add_memory_resource() */
+int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+{
+ return __add_memory_resource(nid, res, mhp_flags, MMOP_SYSTEM_DEFAULT);
+}
+
+/* requires device_hotplug_lock, see __add_memory_resource() */
int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
{
struct resource *res;
@@ -2357,12 +2364,12 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg)
}
/*
- * Try to offline and remove memory. Might take a long time to finish in case
- * memory is still in use. Primarily useful for memory devices that logically
- * unplugged all memory (so it's no longer in use) and want to offline + remove
- * that memory.
+ * Offline a memory range. In case of failure, already offlined memory blocks
+ * will be re-onlined.
+ *
+ * Caller must hold device hotplug lock.
*/
-int offline_and_remove_memory(u64 start, u64 size)
+static int __offline_memory(u64 start, u64 size)
{
const unsigned long mb_count = size / memory_block_size_bytes();
uint8_t *online_types, *tmp;
@@ -2388,11 +2395,37 @@ int offline_and_remove_memory(u64 start, u64 size)
*/
memset(online_types, MMOP_OFFLINE, mb_count);
- lock_device_hotplug();
-
tmp = online_types;
rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
+ /*
+ * Rollback what we did. While memory onlining might theoretically fail
+ * (nacked by a notifier), it barely ever happens.
+ */
+ if (rc) {
+ tmp = online_types;
+ walk_memory_blocks(start, size, &tmp,
+ try_reonline_memory_block);
+ }
+
+ kfree(online_types);
+ return rc;
+}
+
+/*
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
+ */
+int offline_and_remove_memory(u64 start, u64 size)
+{
+ int rc;
+
+ lock_device_hotplug();
+
+ rc = __offline_memory(start, size);
+
/*
* In case we succeeded to offline all memory, remove it.
* This cannot fail as it cannot get onlined in the meantime.
@@ -2403,18 +2436,8 @@ int offline_and_remove_memory(u64 start, u64 size)
pr_err("%s: Failed to remove memory: %d", __func__, rc);
}
- /*
- * Rollback what we did. While memory onlining might theoretically fail
- * (nacked by a notifier), it barely ever happens.
- */
- if (rc) {
- tmp = online_types;
- walk_memory_blocks(start, size, &tmp,
- try_reonline_memory_block);
- }
unlock_device_hotplug();
- kfree(online_types);
return rc;
}
EXPORT_SYMBOL_GPL(offline_and_remove_memory);
--
2.52.0
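The offline-with-rollback pattern that __offline_memory() isolates can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: struct block_model and its busy flag are hypothetical,
the enum values are assumed to start at MMOP_OFFLINE = 0, and rejecting an
empty range with -EINVAL is a choice of the model, not something taken
from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { MMOP_OFFLINE = 0, MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE };

/* Hypothetical stand-in for a memory block. */
struct block_model {
	int online_type;	/* current state of the block */
	int busy;		/* non-zero: offlining this block fails */
};

/*
 * Offline every block in the range. Remember the previous online type of
 * each block that was actually offlined, and restore it if a later block
 * fails, so the range is left as it was found.
 */
static int offline_range_model(struct block_model *blocks, size_t count)
{
	uint8_t *saved;
	size_t i;
	int rc = 0;

	if (!count)
		return -EINVAL;	/* model choice for an empty range */

	saved = malloc(count);
	if (!saved)
		return -ENOMEM;

	/* MMOP_OFFLINE marks "nothing to restore for this block". */
	memset(saved, MMOP_OFFLINE, count);

	for (i = 0; i < count; i++) {
		if (blocks[i].online_type == MMOP_OFFLINE)
			continue;
		if (blocks[i].busy) {
			rc = -EBUSY;
			break;
		}
		saved[i] = (uint8_t)blocks[i].online_type;
		blocks[i].online_type = MMOP_OFFLINE;
	}

	/* Roll back only the blocks this call offlined. */
	if (rc) {
		for (i = 0; i < count; i++) {
			if (saved[i] != MMOP_OFFLINE)
				blocks[i].online_type = saved[i];
		}
	}

	free(saved);
	return rc;
}
```

Keeping the rollback inside the helper means a caller such as
offline_and_remove_memory() only has to handle locking and the remove
step, which matches the split described in the commit message.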
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/8] mm/memory_hotplug: extract __add_memory_resource() and __offline_memory()
2026-01-14 8:51 ` [PATCH 2/8] mm/memory_hotplug: extract __add_memory_resource() and __offline_memory() Gregory Price
@ 2026-01-14 10:14 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 10:14 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
On 1/14/26 09:51, Gregory Price wrote:
> Extract internal helper functions with explicit parameters to prepare
> for adding new APIs that allow explicit online type control:
>
> - __add_memory_resource(): accepts an explicit online_type parameter.
> Add MMOP_SYSTEM_DEFAULT as a new value that instructs the function
> to use mhp_get_default_online_type() for the actual online type.
> The existing add_memory_resource() becomes a thin wrapper that
> passes MMOP_SYSTEM_DEFAULT to preserve existing behavior.
>
> - __offline_memory(): extracted from offline_and_remove_memory() to
> handle the offline operation with rollback support. The caller
> now handles locking and the remove step separately.
I don't understand why this change is even part of this patch, can you
elaborate? You don't add any "explicit parameters to prepare for adding
new APIs that allow explicit online type control" there.
So likely you squeezed two independent things into a single patch? :)
Likely you should pair the __add_memory_resource() change with the
add_memory_driver_managed() changes and vice versa.
>
> This refactoring enables future callers to specify explicit online
> types (MMOP_OFFLINE, MMOP_ONLINE, MMOP_ONLINE_MOVABLE) or use
> MMOP_SYSTEM_DEFAULT for the system default policy. The offline logic
> can also be used independently of the remove step.
>
> Mild functional change: if try_remove_memory() failed after successfully
> offlining, we would re-online the memory. We no longer do this, and in
> practice removal doesn't fail if offline succeeds.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> include/linux/memory_hotplug.h | 2 +
> mm/memory_hotplug.c | 69 ++++++++++++++++++++++------------
> 2 files changed, 48 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index f2f16cdd73ee..d5407264d72a 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -29,6 +29,8 @@ enum {
> MMOP_ONLINE_KERNEL,
> /* Online the memory to ZONE_MOVABLE. */
> MMOP_ONLINE_MOVABLE,
> + /* Use system default online type from mhp_get_default_online_type(). */
> + MMOP_SYSTEM_DEFAULT,
I don't like having fake options as part of this interface.
Why can't we let selected users use mhp_get_default_online_type()
instead? Like add_memory_resource(). We can export that function.
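A rough userspace sketch of that alternative, where the core helper only
ever sees real online types and a caller that wants the system policy
resolves it first. All names ending in _model are hypothetical, the
policy variable stands in for whatever mhp_get_default_online_type()
reports, and the enum values are assumed to start at MMOP_OFFLINE = 0.

```c
#include <errno.h>

enum { MMOP_OFFLINE = 0, MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE };

/* Stand-in for the system-wide default online policy. */
static int default_online_type_model = MMOP_ONLINE_MOVABLE;

static int get_default_online_type_model(void)
{
	return default_online_type_model;
}

/*
 * Core helper: accepts only real online types, no sentinel value.
 * *applied records which type was used so the effect can be observed.
 */
static int add_resource_model(int online_type, int *applied)
{
	if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
		return -EINVAL;

	*applied = online_type;
	return 0;
}

/* A caller that wants the system default resolves it before calling in. */
static int add_resource_default_model(int *applied)
{
	return add_resource_model(get_default_online_type_model(), applied);
}
```

With this shape the enum needs no extra member, and the set of values
the helper has to validate stays the same as before.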
--
Cheers
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/8] mm/memory_hotplug: add APIs for explicit online type control
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
2026-01-14 8:51 ` [PATCH 1/8] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
2026-01-14 8:51 ` [PATCH 2/8] mm/memory_hotplug: extract __add_memory_resource() and __offline_memory() Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 10:21 ` David Hildenbrand (Red Hat)
2026-01-14 8:51 ` [PATCH 4/8] mm/memory_hotplug: return online type from add_memory_driver_managed() Gregory Price
` (4 subsequent siblings)
7 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
Add new memory hotplug APIs that allow callers to explicitly control
the online type when adding or managing memory:
- Extend add_memory_driver_managed() with an online_type parameter:
Callers can now specify MMOP_ONLINE, MMOP_ONLINE_KERNEL, or
MMOP_ONLINE_MOVABLE to online with that type, MMOP_OFFLINE to leave
memory offline, or MMOP_SYSTEM_DEFAULT to use the system default
policy. Update virtio_mem to pass MMOP_SYSTEM_DEFAULT to maintain
existing behavior.
- online_memory_range(): online a previously-added memory range with
a specified online type (MMOP_ONLINE, MMOP_ONLINE_KERNEL, or
MMOP_ONLINE_MOVABLE). Validates that the type is valid for onlining.
- offline_memory(): offline a memory range without removing it. This
is a wrapper around the internal __offline_memory() that handles
locking. Useful for drivers that want to offline memory blocks
before performing other operations.
These APIs enable drivers like dax_kmem to implement sophisticated
memory management policies, such as adding memory offline and deferring
the online decision to userspace.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 3 +-
drivers/virtio/virtio_mem.c | 3 +-
include/linux/memory_hotplug.h | 4 ++-
mm/memory_hotplug.c | 63 ++++++++++++++++++++++++++++++++--
4 files changed, 68 insertions(+), 5 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index c036e4d0b610..5e0cf94a9620 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -175,7 +175,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
* this as RAM automatically.
*/
rc = add_memory_driver_managed(data->mgid, range.start,
- range_len(&range), kmem_name, mhp_flags);
+ range_len(&range), kmem_name, mhp_flags,
+ MMOP_SYSTEM_DEFAULT);
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 1688ecd69a04..b1ec8f2b9e31 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -654,7 +654,8 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, uint64_t addr,
/* Memory might get onlined immediately. */
atomic64_add(size, &vm->offline_size);
rc = add_memory_driver_managed(vm->mgid, addr, size, vm->resource_name,
- MHP_MERGE_RESOURCE | MHP_NID_IS_MGID);
+ MHP_MERGE_RESOURCE | MHP_NID_IS_MGID,
+ MMOP_SYSTEM_DEFAULT);
if (rc) {
atomic64_sub(size, &vm->offline_size);
dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d5407264d72a..0f98bea6da65 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -265,6 +265,7 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
extern void try_offline_node(int nid);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group);
+extern int offline_memory(u64 start, u64 size);
extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
extern int offline_and_remove_memory(u64 start, u64 size);
@@ -297,7 +298,8 @@ extern int add_memory_resource(int nid, struct resource *resource,
mhp_t mhp_flags);
extern int add_memory_driver_managed(int nid, u64 start, u64 size,
const char *resource_name,
- mhp_t mhp_flags);
+ mhp_t mhp_flags, int online_type);
+extern int online_memory_range(u64 start, u64 size, int online_type);
extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages,
struct vmem_altmap *altmap, int migratetype,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ab73c8fcc0f1..515ff9d18039 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1343,6 +1343,34 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}
+/**
+ * online_memory_range - online memory blocks in a range
+ * @start: physical start address of memory region
+ * @size: size of memory region
+ * @online_type: MMOP_ONLINE, MMOP_ONLINE_KERNEL, or MMOP_ONLINE_MOVABLE
+ *
+ * Online all memory blocks in the specified range with the given online type.
+ * The memory must have already been added to the system.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+int online_memory_range(u64 start, u64 size, int online_type)
+{
+ int rc;
+
+ if (online_type == MMOP_OFFLINE ||
+ online_type > MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+
+ lock_device_hotplug();
+ rc = walk_memory_blocks(start, size, &online_type,
+ online_memory_block);
+ unlock_device_hotplug();
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(online_memory_range);
+
#ifndef arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
{
@@ -1656,9 +1684,16 @@ EXPORT_SYMBOL_GPL(add_memory);
*
* The resource_name (visible via /proc/iomem) has to have the format
* "System RAM ($DRIVER)".
+ *
+ * @online_type specifies the online behavior: MMOP_ONLINE, MMOP_ONLINE_KERNEL,
+ * MMOP_ONLINE_MOVABLE to online with that type, MMOP_OFFLINE to leave offline,
+ * or MMOP_SYSTEM_DEFAULT to use the system default policy.
+ *
+ * Returns 0 on success, negative error code on failure.
*/
int add_memory_driver_managed(int nid, u64 start, u64 size,
- const char *resource_name, mhp_t mhp_flags)
+ const char *resource_name, mhp_t mhp_flags,
+ int online_type)
{
struct resource *res;
int rc;
@@ -1668,6 +1703,13 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
resource_name[strlen(resource_name) - 1] != ')')
return -EINVAL;
+ /* Convert system default to actual online type */
+ if (online_type == MMOP_SYSTEM_DEFAULT)
+ online_type = mhp_get_default_online_type();
+
+ if (online_type < 0 || online_type > MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+
lock_device_hotplug();
res = register_memory_resource(start, size, resource_name);
@@ -1676,7 +1718,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
goto out_unlock;
}
- rc = add_memory_resource(nid, res, mhp_flags);
+ rc = __add_memory_resource(nid, res, mhp_flags, online_type);
if (rc < 0)
release_memory_resource(res);
@@ -2412,6 +2454,23 @@ static int __offline_memory(u64 start, u64 size)
return rc;
}
+/*
+ * Try to offline a memory range. Might take a long time to finish in case
+ * memory is still in use. In case of failure, already offlined memory blocks
+ * will be re-onlined.
+ */
+int offline_memory(u64 start, u64 size)
+{
+ int rc;
+
+ lock_device_hotplug();
+ rc = __offline_memory(start, size);
+ unlock_device_hotplug();
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(offline_memory);
+
/*
* Try to offline and remove memory. Might take a long time to finish in case
* memory is still in use. Primarily useful for memory devices that logically
--
2.52.0
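The two validity rules this patch applies to online_type can be written
as a pair of predicates. This is a userspace sketch, not the patch's
code: the _model names are hypothetical, the enum values are assumed to
start at MMOP_OFFLINE = 0 with MMOP_SYSTEM_DEFAULT last, and the explicit
lower bound in valid_for_online_model() is a choice of this sketch.

```c
#include <stdbool.h>

enum {
	MMOP_OFFLINE = 0,
	MMOP_ONLINE,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
	MMOP_SYSTEM_DEFAULT,
};

/* Replace the sentinel with the concrete system policy, if requested. */
static int resolve_online_type_model(int online_type, int system_default)
{
	if (online_type == MMOP_SYSTEM_DEFAULT)
		return system_default;
	return online_type;
}

/* Adding memory: leaving it offline is a valid request. */
static bool valid_for_add_model(int online_type)
{
	return online_type >= MMOP_OFFLINE &&
	       online_type <= MMOP_ONLINE_MOVABLE;
}

/* Onlining an existing range: a real online type is required. */
static bool valid_for_online_model(int online_type)
{
	return online_type > MMOP_OFFLINE &&
	       online_type <= MMOP_ONLINE_MOVABLE;
}
```

Resolving the sentinel before validating keeps both predicates free of
any knowledge of MMOP_SYSTEM_DEFAULT.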
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/8] mm/memory_hotplug: add APIs for explicit online type control
2026-01-14 8:51 ` [PATCH 3/8] mm/memory_hotplug: add APIs for explicit online type control Gregory Price
@ 2026-01-14 10:21 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 10:21 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
On 1/14/26 09:51, Gregory Price wrote:
> Add new memory hotplug APIs that allow callers to explicitly control
> the online type when adding or managing memory:
>
> - Extend add_memory_driver_managed() with an online_type parameter:
> Callers can now specify MMOP_ONLINE, MMOP_ONLINE_KERNEL, or
> MMOP_ONLINE_MOVABLE to online with that type, MMOP_OFFLINE to leave
> memory offline, or MMOP_SYSTEM_DEFAULT to use the system default
> policy. Update virtio_mem to pass MMOP_SYSTEM_DEFAULT to maintain
> existing behavior.
I wonder if we rather want to add a new interface
(add_and_online_memory_driver_managed()) where we can restrict it to
known kernel modules that do not violate user-space onlining policies.
For dax we know that user space will define the policy.
>
> - online_memory_range(): online a previously-added memory range with
> a specified online type (MMOP_ONLINE, MMOP_ONLINE_KERNEL, or
> MMOP_ONLINE_MOVABLE). Validates that the type is valid for onlining.
Why not simply online_memory() and offline_memory() ?
>
> - offline_memory(): offline a memory range without removing it. This
> is a wrapper around the internal __offline_memory() that handles
> locking. Useful for drivers that want to offline memory blocks
> before performing other operations.
>
These two should not be exported to arbitrary kernel modules. Use
EXPORT_SYMBOL_FOR_MODULES() if required, or do not export them at all.
> These APIs enable drivers like dax_kmem to implement sophisticated
> memory management policies, such as adding memory offline and deferring
> the online decision to userspace.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> drivers/dax/kmem.c | 3 +-
> drivers/virtio/virtio_mem.c | 3 +-
> include/linux/memory_hotplug.h | 4 ++-
> mm/memory_hotplug.c | 63 ++++++++++++++++++++++++++++++++--
> 4 files changed, 68 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index c036e4d0b610..5e0cf94a9620 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -175,7 +175,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> * this as RAM automatically.
> */
> rc = add_memory_driver_managed(data->mgid, range.start,
> - range_len(&range), kmem_name, mhp_flags);
> + range_len(&range), kmem_name, mhp_flags,
> + MMOP_SYSTEM_DEFAULT);
>
> if (rc) {
> dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 1688ecd69a04..b1ec8f2b9e31 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -654,7 +654,8 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, uint64_t addr,
> /* Memory might get onlined immediately. */
> atomic64_add(size, &vm->offline_size);
> rc = add_memory_driver_managed(vm->mgid, addr, size, vm->resource_name,
> - MHP_MERGE_RESOURCE | MHP_NID_IS_MGID);
> + MHP_MERGE_RESOURCE | MHP_NID_IS_MGID,
> + MMOP_SYSTEM_DEFAULT);
> if (rc) {
> atomic64_sub(size, &vm->offline_size);
> dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index d5407264d72a..0f98bea6da65 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -265,6 +265,7 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
> extern void try_offline_node(int nid);
> extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> struct zone *zone, struct memory_group *group);
> +extern int offline_memory(u64 start, u64 size);
No new "extern" for functions.
> extern int remove_memory(u64 start, u64 size);
> extern void __remove_memory(u64 start, u64 size);
> extern int offline_and_remove_memory(u64 start, u64 size);
> @@ -297,7 +298,8 @@ extern int add_memory_resource(int nid, struct resource *resource,
> mhp_t mhp_flags);
> extern int add_memory_driver_managed(int nid, u64 start, u64 size,
> const char *resource_name,
> - mhp_t mhp_flags);
> + mhp_t mhp_flags, int online_type);
> +extern int online_memory_range(u64 start, u64 size, int online_type);
> extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> unsigned long nr_pages,
> struct vmem_altmap *altmap, int migratetype,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ab73c8fcc0f1..515ff9d18039 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1343,6 +1343,34 @@ static int online_memory_block(struct memory_block *mem, void *arg)
> return device_online(&mem->dev);
> }
>
> +/**
> + * online_memory_range - online memory blocks in a range
> + * @start: physical start address of memory region
> + * @size: size of memory region
> + * @online_type: MMOP_ONLINE, MMOP_ONLINE_KERNEL, or MMOP_ONLINE_MOVABLE
I wonder if we instead want something that consumes all parameters like
int online_or_offline_memory(int online_type)
Then it's easier to use and we don't really have to document the
"online_type" that much to hand-select some values.
(I'm sure there are better naming suggestions :) )
Should we document what happens if the memory is already online, but was
onlined to a different zone?
> + *
> + * Online all memory blocks in the specified range with the given online type.
> + * The memory must have already been added to the system.
> + *
> + * Returns 0 on success, negative error code on failure.
> + */
> +int online_memory_range(u64 start, u64 size, int online_type)
> +{
> + int rc;
> +
> + if (online_type == MMOP_OFFLINE ||
> + online_type > MMOP_ONLINE_MOVABLE)
> + return -EINVAL;
> +
> + lock_device_hotplug();
> + rc = walk_memory_blocks(start, size, &online_type,
> + online_memory_block);
> + unlock_device_hotplug();
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(online_memory_range);
> +
> #ifndef arch_supports_memmap_on_memory
> static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
> {
> @@ -1656,9 +1684,16 @@ EXPORT_SYMBOL_GPL(add_memory);
> *
> * The resource_name (visible via /proc/iomem) has to have the format
> * "System RAM ($DRIVER)".
> + *
> + * @online_type specifies the online behavior: MMOP_ONLINE, MMOP_ONLINE_KERNEL,
> + * MMOP_ONLINE_MOVABLE to online with that type, MMOP_OFFLINE to leave offline,
> + * or MMOP_SYSTEM_DEFAULT to use the system default policy.
> + *
I think we can simplify this documentation, especially once
MMOP_SYSTEM_DEFAULT is gone.
> + * Returns 0 on success, negative error code on failure.
> */
> int add_memory_driver_managed(int nid, u64 start, u64 size,
> - const char *resource_name, mhp_t mhp_flags)
> + const char *resource_name, mhp_t mhp_flags,
> + int online_type)
> {
> struct resource *res;
> int rc;
> @@ -1668,6 +1703,13 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
> resource_name[strlen(resource_name) - 1] != ')')
> return -EINVAL;
>
> + /* Convert system default to actual online type */
> + if (online_type == MMOP_SYSTEM_DEFAULT)
> + online_type = mhp_get_default_online_type();
> +
> + if (online_type < 0 || online_type > MMOP_ONLINE_MOVABLE)
> + return -EINVAL;
> +
> lock_device_hotplug();
>
> res = register_memory_resource(start, size, resource_name);
> @@ -1676,7 +1718,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
> goto out_unlock;
> }
>
> - rc = add_memory_resource(nid, res, mhp_flags);
> + rc = __add_memory_resource(nid, res, mhp_flags, online_type);
> if (rc < 0)
> release_memory_resource(res);
>
> @@ -2412,6 +2454,23 @@ static int __offline_memory(u64 start, u64 size)
> return rc;
> }
>
> +/*
> + * Try to offline a memory range. Might take a long time to finish in case
> + * memory is still in use. In case of failure, already offlined memory blocks
> + * will be re-onlined.
> + */
Proper kerneldoc? :)
> +int offline_memory(u64 start, u64 size)
> +{
> + int rc;
> +
> + lock_device_hotplug();
> + rc = __offline_memory(start, size);
> + unlock_device_hotplug();
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(offline_memory);
> +
> /*
> * Try to offline and remove memory. Might take a long time to finish in case
> * memory is still in use. Primarily useful for memory devices that logically
--
Cheers
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/8] mm/memory_hotplug: return online type from add_memory_driver_managed()
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
` (2 preceding siblings ...)
2026-01-14 8:51 ` [PATCH 3/8] mm/memory_hotplug: add APIs for explicit online type control Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 10:49 ` David Hildenbrand (Red Hat)
2026-01-14 8:51 ` [PATCH 5/8] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
` (3 subsequent siblings)
7 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
Change add_memory_driver_managed() to return the online type (MMOP_*)
on success instead of 0. This allows callers to determine the actual
online state of the memory after addition, which is important when
MMOP_SYSTEM_DEFAULT is used and the actual online type depends on the
system default policy.
Update virtio_mem to handle the new return value semantics by checking
for rc < 0 to detect errors.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 2 +-
drivers/virtio/virtio_mem.c | 5 +++--
mm/memory_hotplug.c | 4 +++-
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 5e0cf94a9620..d0dd36c536a0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -178,7 +178,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
range_len(&range), kmem_name, mhp_flags,
MMOP_SYSTEM_DEFAULT);
- if (rc) {
+ if (rc < 0) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
i, range.start, range.end);
remove_resource(res);
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index b1ec8f2b9e31..4decb44f5a43 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -656,15 +656,16 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, uint64_t addr,
rc = add_memory_driver_managed(vm->mgid, addr, size, vm->resource_name,
MHP_MERGE_RESOURCE | MHP_NID_IS_MGID,
MMOP_SYSTEM_DEFAULT);
- if (rc) {
+ if (rc < 0) {
atomic64_sub(size, &vm->offline_size);
dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
/*
* TODO: Linux MM does not properly clean up yet in all cases
* where adding of memory failed - especially on -ENOMEM.
*/
+ return rc;
}
- return rc;
+ return 0;
}
/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 515ff9d18039..41974a1ccb91 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1689,7 +1689,7 @@ EXPORT_SYMBOL_GPL(add_memory);
* MMOP_ONLINE_MOVABLE to online with that type, MMOP_OFFLINE to leave offline,
* or MMOP_SYSTEM_DEFAULT to use the system default policy.
*
- * Returns 0 on success, negative error code on failure.
+ * Returns the online type (MMOP_*) on success, negative error code on failure.
*/
int add_memory_driver_managed(int nid, u64 start, u64 size,
const char *resource_name, mhp_t mhp_flags,
@@ -1721,6 +1721,8 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
rc = __add_memory_resource(nid, res, mhp_flags, online_type);
if (rc < 0)
release_memory_resource(res);
+ else
+ rc = online_type;
out_unlock:
unlock_device_hotplug();
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/8] mm/memory_hotplug: return online type from add_memory_driver_managed()
2026-01-14 8:51 ` [PATCH 4/8] mm/memory_hotplug: return online type from add_memory_driver_managed() Gregory Price
@ 2026-01-14 10:49 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 10:49 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
On 1/14/26 09:51, Gregory Price wrote:
> Change add_memory_driver_managed() to return the online type (MMOP_*)
> on success instead of 0. This allows callers to determine the actual
> online state of the memory after addition, which is important when
> MMOP_SYSTEM_DEFAULT is used and the actual online type depends on the
> system default policy.
Another reason to just let the caller handle MMOP_SYSTEM_DEFAULT itself
by calling mhp_get_default_online_type() :)
--
Cheers
David
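
Editor's sketch of the alternative suggested in this reply: the caller resolves the system default itself, passes an explicit type down, and records what it asked for, so the add function can keep returning 0 on success. This is a userspace model under stated assumptions, not kernel code. Every `model_*` name and the `MODEL_MMOP_*` values are hypothetical stand-ins; only the idea of calling mhp_get_default_online_type() in the caller comes from the thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's MMOP_* online types. */
enum model_mmop {
	MODEL_MMOP_OFFLINE = 0,
	MODEL_MMOP_ONLINE,
	MODEL_MMOP_ONLINE_KERNEL,
	MODEL_MMOP_ONLINE_MOVABLE,
};

/* Models the system-wide default policy. */
static int model_default_online_type = MODEL_MMOP_ONLINE_MOVABLE;

/* Models mhp_get_default_online_type(). */
static int model_get_default_online_type(void)
{
	return model_default_online_type;
}

/*
 * Models an add-memory call that takes an explicit online type and keeps
 * the conventional "0 on success, negative errno on failure" contract.
 */
static int model_add_memory(unsigned long long start,
			    unsigned long long size, int online_type)
{
	/* The caller must have resolved the default before calling. */
	assert(online_type >= MODEL_MMOP_OFFLINE &&
	       online_type <= MODEL_MMOP_ONLINE_MOVABLE);
	(void)start;
	if (size == 0)
		return -EINVAL;
	return 0;
}

struct model_dev {
	int state;	/* online type the device ended up with */
};

/* The caller resolves the default, so no return-value overloading is needed. */
static int model_probe(struct model_dev *dev, unsigned long long start,
		       unsigned long long size)
{
	int online_type = model_get_default_online_type();
	int rc = model_add_memory(start, size, online_type);

	if (rc < 0)
		return rc;
	dev->state = online_type;
	return 0;
}
```

With this shape, existing `if (rc)` error checks in callers would not need to change to `if (rc < 0)`.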
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 5/8] dax/kmem: extract hotplug/hotremove helper functions
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
` (3 preceding siblings ...)
2026-01-14 8:51 ` [PATCH 4/8] mm/memory_hotplug: return online type from add_memory_driver_managed() Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 8:51 ` [PATCH 6/8] dax/kmem: add online/offline " Gregory Price
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
Refactor dev_dax_kmem_probe() and dev_dax_kmem_remove() by extracting
the memory hotplug and hot-remove logic into separate helper functions:
- dax_kmem_do_hotplug(): handles memory region reservation and adding
- dax_kmem_do_hotremove(): handles memory removal and resource cleanup
Update to use the new add_memory_driver_managed() signature with
explicit online_type parameter, passing MMOP_SYSTEM_DEFAULT to
maintain existing behavior.
This is a pure refactoring with no functional change. The helpers will
enable future extensions to support more granular control over memory
hotplug operations.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 244 +++++++++++++++++++++++++++------------------
1 file changed, 149 insertions(+), 95 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index d0dd36c536a0..5225f2bf0b2a 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,14 +65,138 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/**
+ * dax_kmem_do_hotplug - hotplug memory for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Hotplugs all ranges in the dev_dax region as system memory.
+ *
+ * Returns the number of successfully mapped ranges, or negative error.
+ */
+static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, rc, mapped = 0;
+ mhp_t mhp_flags;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct resource *res;
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ /* Skip ranges already added */
+ if (data->res[i])
+ continue;
+
+ /* Region is permanently reserved if hotremove fails. */
+ res = request_mem_region(range.start, range_len(&range),
+ data->res_name);
+ if (!res) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
+ i, range.start, range.end);
+ /*
+ * Once some memory has been onlined we can't
+ * assume that it can be un-onlined safely.
+ */
+ if (mapped)
+ continue;
+ return -EBUSY;
+ }
+ data->res[i] = res;
+
+ /*
+ * Set flags appropriate for System RAM. Leave ..._BUSY clear
+ * so that add_memory() can add a child resource. Do not
+ * inherit flags from the parent since it may set new flags
+ * unknown to us that will break add_memory() below.
+ */
+ res->flags = IORESOURCE_SYSTEM_RAM;
+
+ mhp_flags = MHP_NID_IS_MGID;
+ if (dev_dax->memmap_on_memory)
+ mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+
+ /*
+ * Ensure that future kexec'd kernels will not treat
+ * this as RAM automatically.
+ */
+ rc = add_memory_driver_managed(data->mgid, range.start,
+ range_len(&range), kmem_name,
+ mhp_flags, MMOP_SYSTEM_DEFAULT);
+
+ if (rc < 0) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
+ i, range.start, range.end);
+ remove_resource(res);
+ kfree(res);
+ data->res[i] = NULL;
+ if (mapped)
+ continue;
+ return rc;
+ }
+ mapped++;
+ }
+
+ return mapped;
+}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/**
+ * dax_kmem_do_hotremove - hot-remove memory for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Removes all ranges in the dev_dax region.
+ *
+ * Returns the number of successfully removed ranges.
+ */
+static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, success = 0;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+ int rc;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ /* Skip ranges not currently added */
+ if (!data->res[i])
+ continue;
+
+ rc = remove_memory(range.start, range_len(&range));
+ if (rc == 0) {
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ success++;
+ continue;
+ }
+ any_hotremove_failed = true;
+ dev_err(dev, "mapping%d: %#llx-%#llx offline failed\n",
+ i, range.start, range.end);
+ }
+
+ return success;
+}
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
unsigned long total_len = 0, orig_len = 0;
struct dax_kmem_data *data;
struct memory_dev_type *mtype;
- int i, rc, mapped = 0;
- mhp_t mhp_flags;
+ int i, rc;
int numa_node;
int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
@@ -134,68 +258,16 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_reg_mgid;
data->mgid = rc;
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct resource *res;
- struct range range;
-
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
- continue;
-
- /* Region is permanently reserved if hotremove fails. */
- res = request_mem_region(range.start, range_len(&range), data->res_name);
- if (!res) {
- dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
- i, range.start, range.end);
- /*
- * Once some memory has been onlined we can't
- * assume that it can be un-onlined safely.
- */
- if (mapped)
- continue;
- rc = -EBUSY;
- goto err_request_mem;
- }
- data->res[i] = res;
-
- /*
- * Set flags appropriate for System RAM. Leave ..._BUSY clear
- * so that add_memory() can add a child resource. Do not
- * inherit flags from the parent since it may set new flags
- * unknown to us that will break add_memory() below.
- */
- res->flags = IORESOURCE_SYSTEM_RAM;
-
- mhp_flags = MHP_NID_IS_MGID;
- if (dev_dax->memmap_on_memory)
- mhp_flags |= MHP_MEMMAP_ON_MEMORY;
-
- /*
- * Ensure that future kexec'd kernels will not treat
- * this as RAM automatically.
- */
- rc = add_memory_driver_managed(data->mgid, range.start,
- range_len(&range), kmem_name, mhp_flags,
- MMOP_SYSTEM_DEFAULT);
-
- if (rc < 0) {
- dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
- i, range.start, range.end);
- remove_resource(res);
- kfree(res);
- data->res[i] = NULL;
- if (mapped)
- continue;
- goto err_request_mem;
- }
- mapped++;
- }
-
dev_set_drvdata(dev, data);
+ rc = dax_kmem_do_hotplug(dev_dax, data);
+ if (rc < 0)
+ goto err_hotplug;
+
return 0;
-err_request_mem:
+err_hotplug:
+ dev_set_drvdata(dev, NULL);
memory_group_unregister(data->mgid);
err_reg_mgid:
kfree(data->res_name);
@@ -209,7 +281,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
#ifdef CONFIG_MEMORY_HOTREMOVE
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
- int i, success = 0;
+ int success;
int node = dev_dax->target_node;
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
@@ -220,42 +292,24 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
* there is no way to hotremove this memory until reboot because device
* unbind will succeed even if we return failure.
*/
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct range range;
- int rc;
-
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
- continue;
-
- rc = remove_memory(range.start, range_len(&range));
- if (rc == 0) {
- remove_resource(data->res[i]);
- kfree(data->res[i]);
- data->res[i] = NULL;
- success++;
- continue;
- }
- any_hotremove_failed = true;
- dev_err(dev,
- "mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n",
- i, range.start, range.end);
+ success = dax_kmem_do_hotremove(dev_dax, data);
+ if (success < dev_dax->nr_range) {
+ dev_err(dev, "Hotplug regions stuck online until reboot\n");
+ return;
}
- if (success >= dev_dax->nr_range) {
- memory_group_unregister(data->mgid);
- kfree(data->res_name);
- kfree(data);
- dev_set_drvdata(dev, NULL);
- /*
- * Clear the memtype association on successful unplug.
- * If not, we have memory blocks left which can be
- * offlined/onlined later. We need to keep memory_dev_type
- * for that. This implies this reference will be around
- * till next reboot.
- */
- clear_node_memory_type(node, NULL);
- }
+ memory_group_unregister(data->mgid);
+ kfree(data->res_name);
+ kfree(data);
+ dev_set_drvdata(dev, NULL);
+ /*
+ * Clear the memtype association on successful unplug.
+ * If not, we have memory blocks left which can be
+ * offlined/onlined later. We need to keep memory_dev_type
+ * for that. This implies this reference will be around
+ * till next reboot.
+ */
+ clear_node_memory_type(node, NULL);
}
#else
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 6/8] dax/kmem: add online/offline helper functions
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
` (4 preceding siblings ...)
2026-01-14 8:51 ` [PATCH 5/8] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 8:51 ` [PATCH 7/8] dax/kmem: add sysfs interface for runtime hotplug state control Gregory Price
2026-01-14 8:52 ` [PATCH 8/8] dax/kmem: add memory notifier to block external state changes Gregory Price
7 siblings, 0 replies; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
Add helper functions for onlining and offlining memory ranges:
- dax_kmem_do_online(): online memory with specified type (MMOP_ONLINE
or MMOP_ONLINE_MOVABLE) using online_memory_range()
- dax_kmem_do_offline(): offline memory using offline_memory()
These helpers use the memory hotplug APIs from the memory_hotplug
refactoring and will be used by the upcoming sysfs interface to allow
userspace control over memory state transitions.
No functional change as these helpers are not called yet.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 103 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 5225f2bf0b2a..30429f2d5a67 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -190,6 +190,109 @@ static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
+/**
+ * dax_kmem_do_online - online memory blocks for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ * @online_type: MMOP_ONLINE or MMOP_ONLINE_MOVABLE
+ *
+ * Onlines all ranges in the dev_dax region with the specified online type.
+ * On partial failure, previously onlined ranges are rolled back to offline.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int dax_kmem_do_online(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data, int online_type)
+{
+ int i, j, rc;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ if (!data->res[i])
+ continue;
+
+ rc = online_memory_range(range.start, range_len(&range),
+ online_type);
+ if (rc)
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ /* Rollback previously onlined ranges */
+ for (j = 0; j < i; j++) {
+ struct range range;
+
+ if (dax_kmem_range(dev_dax, j, &range))
+ continue;
+
+ if (!data->res[j])
+ continue;
+
+ /* Best effort rollback - ignore failures */
+ offline_memory(range.start, range_len(&range));
+ }
+ return rc;
+}
+
+/**
+ * dax_kmem_do_offline - offline memory blocks for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Offlines all ranges in the dev_dax region.
+ * On partial failure, previously offlined ranges are rolled back to online.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int dax_kmem_do_offline(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ int i, j, rc;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ if (!data->res[i])
+ continue;
+
+ rc = offline_memory(range.start, range_len(&range));
+ if (rc)
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ /*
+ * Rollback previously offlined ranges. Use MMOP_ONLINE as a safe
+ * default - the original online type is not tracked per-range.
+ */
+ for (j = 0; j < i; j++) {
+ struct range range;
+
+ if (dax_kmem_range(dev_dax, j, &range))
+ continue;
+
+ if (!data->res[j])
+ continue;
+
+ /* Best effort rollback - ignore failures */
+ online_memory_range(range.start, range_len(&range), MMOP_ONLINE);
+ }
+ return rc;
+}
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
--
2.52.0
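
Editor's sketch of the rollback pattern used by dax_kmem_do_online() and dax_kmem_do_offline() in the patch above: walk the ranges, stop at the first failure, then undo only the ranges processed before the failing index. This is a userspace model; `struct model_range` and the `model_*` helpers are hypothetical and stand in for the per-range resource tracking and the online/offline calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-range bookkeeping. */
struct model_range {
	bool present;		/* range was added (models data->res[i] != NULL) */
	bool online;		/* current state */
	bool fail_online;	/* test knob: force the online step to fail */
};

static int model_online(struct model_range *r)
{
	if (r->fail_online)
		return -EBUSY;
	r->online = true;
	return 0;
}

static void model_offline(struct model_range *r)
{
	r->online = false;
}

/* Online every present range, or leave all of them offline on failure. */
static int model_online_all(struct model_range *ranges, size_t n)
{
	size_t i, j;
	int rc = 0;

	assert(ranges != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!ranges[i].present)
			continue;
		rc = model_online(&ranges[i]);
		if (rc)
			goto rollback;
	}
	return 0;

rollback:
	/* Best-effort undo of ranges [0, i); the failing range is untouched. */
	for (j = 0; j < i; j++) {
		if (!ranges[j].present)
			continue;
		model_offline(&ranges[j]);
	}
	return rc;
}
```

The model's offline step cannot fail. In the patch the rollback calls can fail and their results are ignored, so a failed rollback may leave ranges in mixed states.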
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 7/8] dax/kmem: add sysfs interface for runtime hotplug state control
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
` (5 preceding siblings ...)
2026-01-14 8:51 ` [PATCH 6/8] dax/kmem: add online/offline " Gregory Price
@ 2026-01-14 8:51 ` Gregory Price
2026-01-14 10:55 ` David Hildenbrand (Red Hat)
2026-01-14 8:52 ` [PATCH 8/8] dax/kmem: add memory notifier to block external state changes Gregory Price
7 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:51 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
The dax kmem driver currently onlines memory automatically during
probe using the system's default online policy but provides no way
to control or query the memory state at runtime. Users cannot change
the online type after probe, and there's no atomic way to offline and
remove memory blocks together.
Add a new 'hotplug' sysfs attribute that allows userspace to control
and query the memory state. The interface supports the following states:
- "offline": memory is added but not online
- "online": memory is online as normal system RAM
- "online_movable": memory is online in ZONE_MOVABLE
- "unplug": memory is offlined and removed
The initial state after probe uses MMOP_SYSTEM_DEFAULT to preserve
backwards compatibility - existing systems with auto-online policies
will continue to work as before.
The state machine enforces valid transitions:
- From offline: can transition to online, online_movable, or unplug
- From online/online_movable: can transition to offline or unplug
- Cannot switch directly between online and online_movable
Implementation changes:
- Add state tracking to struct dax_kmem_data
- Extend dax_kmem_do_hotplug() to accept online_type parameter
- Use add_memory_driver_managed() with explicit online_type parameter
- Use MMOP_SYSTEM_DEFAULT at probe for backwards compatibility
- Use offline_and_remove_memory() for atomic offline+remove
- Add stub for dax_kmem_do_hotremove() when !CONFIG_MEMORY_HOTREMOVE
This enables userspace memory managers to implement sophisticated
policies such as changing CXL memory zone type based on workload
characteristics, or atomically unplugging memory without races against
auto-online policies.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 167 +++++++++++++++++++++++++++++++++++++++++---
mm/memory_hotplug.c | 1 +
2 files changed, 158 insertions(+), 10 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 30429f2d5a67..6d73c44e4e08 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -44,9 +44,15 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
return 0;
}
+#define DAX_KMEM_UNPLUGGED (-1)
+
struct dax_kmem_data {
const char *res_name;
int mgid;
+ int numa_node;
+ struct dev_dax *dev_dax;
+ int state;
+ struct mutex lock; /* protects hotplug state transitions */
struct resource *res[];
};
@@ -69,13 +75,15 @@ static void kmem_put_memory_types(void)
* dax_kmem_do_hotplug - hotplug memory for dax kmem device
* @dev_dax: the dev_dax instance
* @data: the dax_kmem_data structure with resource tracking
+ * @online_type: MMOP_OFFLINE, MMOP_ONLINE, or MMOP_ONLINE_MOVABLE
*
- * Hotplugs all ranges in the dev_dax region as system memory.
+ * Hotplugs all ranges in the dev_dax region as system memory using
+ * the specified online type.
*
* Returns the number of successfully mapped ranges, or negative error.
*/
static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
- struct dax_kmem_data *data)
+ struct dax_kmem_data *data, int online_type)
{
struct device *dev = &dev_dax->dev;
int i, rc, mapped = 0;
@@ -124,10 +132,14 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
/*
* Ensure that future kexec'd kernels will not treat
* this as RAM automatically.
+ *
+ * Use add_memory_driver_managed() with explicit online_type
+ * to control the online state and avoid surprises from
+ * system auto-online policy.
*/
rc = add_memory_driver_managed(data->mgid, range.start,
range_len(&range), kmem_name,
- mhp_flags, MMOP_SYSTEM_DEFAULT);
+ mhp_flags, online_type);
if (rc < 0) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
@@ -151,14 +163,13 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
* @dev_dax: the dev_dax instance
* @data: the dax_kmem_data structure with resource tracking
*
- * Removes all ranges in the dev_dax region.
+ * Offlines and removes all ranges in the dev_dax region.
*
- * Returns the number of successfully removed ranges.
+ * Returns the number of successfully removed ranges, or negative error.
*/
static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
struct dax_kmem_data *data)
{
- struct device *dev = &dev_dax->dev;
int i, success = 0;
for (i = 0; i < dev_dax->nr_range; i++) {
@@ -173,7 +184,7 @@ static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
if (!data->res[i])
continue;
- rc = remove_memory(range.start, range_len(&range));
+ rc = offline_and_remove_memory(range.start, range_len(&range));
if (rc == 0) {
remove_resource(data->res[i]);
kfree(data->res[i]);
@@ -182,12 +193,19 @@ static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
continue;
}
any_hotremove_failed = true;
- dev_err(dev, "mapping%d: %#llx-%#llx offline failed\n",
+ dev_err(&dev_dax->dev,
+ "mapping%d: %#llx-%#llx offline and remove failed\n",
i, range.start, range.end);
}
return success;
}
+#else
+static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ return -ENODEV;
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
/**
@@ -288,11 +306,117 @@ static int dax_kmem_do_offline(struct dev_dax *dev_dax,
continue;
/* Best effort rollback - ignore failures */
- online_memory_range(range.start, range_len(&range), MMOP_ONLINE);
+ online_memory_range(range.start, range_len(&range), data->state);
}
return rc;
}
+static int dax_kmem_parse_state(const char *buf)
+{
+ if (sysfs_streq(buf, "unplug"))
+ return DAX_KMEM_UNPLUGGED;
+ if (sysfs_streq(buf, "offline"))
+ return MMOP_OFFLINE;
+ if (sysfs_streq(buf, "online"))
+ return MMOP_ONLINE;
+ if (sysfs_streq(buf, "online_movable"))
+ return MMOP_ONLINE_MOVABLE;
+ return -EINVAL;
+}
+
+static ssize_t hotplug_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
+ const char *state_str;
+
+ if (!data)
+ return -ENXIO;
+
+ switch (data->state) {
+ case DAX_KMEM_UNPLUGGED:
+ state_str = "unplugged";
+ break;
+ case MMOP_OFFLINE:
+ state_str = "offline";
+ break;
+ case MMOP_ONLINE:
+ state_str = "online";
+ break;
+ case MMOP_ONLINE_MOVABLE:
+ state_str = "online_movable";
+ break;
+ default:
+ state_str = "unknown";
+ break;
+ }
+
+ return sysfs_emit(buf, "%s\n", state_str);
+}
+
+static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
+ int online_type;
+ int rc;
+
+ if (!data)
+ return -ENXIO;
+
+ online_type = dax_kmem_parse_state(buf);
+ if (online_type < DAX_KMEM_UNPLUGGED)
+ return online_type;
+
+ guard(mutex)(&data->lock);
+
+ /* Already in requested state */
+ if (data->state == online_type)
+ return len;
+
+ if (online_type == DAX_KMEM_UNPLUGGED) {
+ rc = dax_kmem_do_hotremove(dev_dax, data);
+ if (rc < 0) {
+ dev_warn(dev, "hotplug state is inconsistent\n");
+ return rc;
+ }
+ data->state = DAX_KMEM_UNPLUGGED;
+ return len;
+ }
+
+ if (online_type == MMOP_OFFLINE) {
+ /* Can only offline from an online state */
+ if (data->state != MMOP_ONLINE && data->state != MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+ rc = dax_kmem_do_offline(dev_dax, data);
+ if (rc < 0) {
+ dev_warn(dev, "hotplug state is inconsistent\n");
+ return rc;
+ }
+ data->state = MMOP_OFFLINE;
+ return len;
+ }
+
+ /* online_type is MMOP_ONLINE or MMOP_ONLINE_MOVABLE */
+
+ /* Cannot switch between online types without offlining first */
+ if (data->state == MMOP_ONLINE || data->state == MMOP_ONLINE_MOVABLE)
+ return -EBUSY;
+
+ if (data->state == MMOP_OFFLINE)
+ rc = dax_kmem_do_online(dev_dax, data, online_type);
+ else
+ rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+
+ if (rc < 0)
+ return rc;
+
+ data->state = online_type;
+ return len;
+}
+static DEVICE_ATTR_RW(hotplug);
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
@@ -360,12 +484,29 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc < 0)
goto err_reg_mgid;
data->mgid = rc;
+ data->numa_node = numa_node;
+ data->dev_dax = dev_dax;
+ mutex_init(&data->lock);
dev_set_drvdata(dev, data);
- rc = dax_kmem_do_hotplug(dev_dax, data);
+ /*
+ * Hotplug the memory using the system default online policy.
+ * This preserves backwards compatibility for existing users who
+ * rely on auto-online behavior.
+ */
+ rc = dax_kmem_do_hotplug(dev_dax, data, MMOP_SYSTEM_DEFAULT);
if (rc < 0)
goto err_hotplug;
+ /*
+ * dax_kmem_do_hotplug returns the count of mapped ranges on success.
+ * Query the system default to determine the actual memory state.
+ */
+ data->state = mhp_get_default_online_type();
+
+ rc = device_create_file(dev, &dev_attr_hotplug);
+ if (rc)
+ dev_warn(dev, "failed to create hotplug sysfs entry\n");
return 0;
@@ -389,6 +530,8 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
+ device_remove_file(dev, &dev_attr_hotplug);
+
/*
* We have one shot for removing memory, if some memory blocks were not
* offline prior to calling this function remove_memory() will fail, and
@@ -417,6 +560,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
#else
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
+ struct device *dev = &dev_dax->dev;
+
+ device_remove_file(dev, &dev_attr_hotplug);
+
/*
* Without hotremove purposely leak the request_mem_region() for the
* device-dax range and return '0' to ->remove() attempts. The removal
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 41974a1ccb91..3adc05d2df52 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -239,6 +239,7 @@ int mhp_get_default_online_type(void)
return mhp_default_online_type;
}
+EXPORT_SYMBOL_GPL(mhp_get_default_online_type);
void mhp_set_default_online_type(int online_type)
{
--
2.52.0
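
Editor's sketch of the transition rules that hotplug_store() in the patch above appears to enforce, written as a pure function so the rules can be read and tested in isolation. This is a userspace model; the `MODEL_*` values and `model_check_transition()` are hypothetical names, and the error codes mirror the ones visible in the diff (-EINVAL for offlining from a non-online state, -EBUSY for switching between online types).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for DAX_KMEM_UNPLUGGED and the MMOP_* values. */
enum model_state {
	MODEL_UNPLUGGED = -1,
	MODEL_OFFLINE = 0,
	MODEL_ONLINE = 1,
	MODEL_ONLINE_MOVABLE = 2,
};

static bool model_state_valid(int s)
{
	return s >= MODEL_UNPLUGGED && s <= MODEL_ONLINE_MOVABLE;
}

static bool model_is_online(int s)
{
	return s == MODEL_ONLINE || s == MODEL_ONLINE_MOVABLE;
}

/* Returns 0 if cur -> target is allowed, or a negative errno otherwise. */
static int model_check_transition(int cur, int target)
{
	assert(model_state_valid(cur));

	if (!model_state_valid(target))
		return -EINVAL;

	/* Already in the requested state: treated as success. */
	if (cur == target)
		return 0;

	/* Unplug is accepted from any other state. */
	if (target == MODEL_UNPLUGGED)
		return 0;

	/* Offline is only reachable from one of the online states. */
	if (target == MODEL_OFFLINE)
		return model_is_online(cur) ? 0 : -EINVAL;

	/* Target is an online type: no direct switch between online types. */
	if (model_is_online(cur))
		return -EBUSY;

	/* From offline (online in place) or unplugged (hotplug again). */
	return 0;
}
```

One consequence of the rules as modeled: a write of "offline" while unplugged is rejected, so memory can only be re-added in one of the online states.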
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 7/8] dax/kmem: add sysfs interface for runtime hotplug state control
2026-01-14 8:51 ` [PATCH 7/8] dax/kmem: add sysfs interface for runtime hotplug state control Gregory Price
@ 2026-01-14 10:55 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 10:55 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm
On 1/14/26 09:51, Gregory Price wrote:
> The dax kmem driver currently onlines memory automatically during
> probe using the system's default online policy but provides no way
> to control or query the memory state at runtime. Users cannot change
> the online type after probe, and there's no atomic way to offline and
> remove memory blocks together.
>
> Add a new 'hotplug' sysfs attribute that allows userspace to control
> and query the memory state. The interface supports the following states:
>
> - "offline": memory is added but not online
> - "online": memory is online as normal system RAM
> - "online_movable": memory is online in ZONE_MOVABLE
> - "unplug": memory is offlined and removed
>
> The initial state after probe uses MMOP_SYSTEM_DEFAULT to preserve
> backwards compatibility - existing systems with auto-online policies
> will continue to work as before.
>
> The state machine enforces valid transitions:
> - From offline: can transition to online, online_movable, or unplug
> - From online/online_movable: can transition to offline or unplug
> - Cannot switch directly between online and online_movable
Do we have to support these transitions right from the start?
What are the use cases for adding memory as offline and then onlining
it, and why do we have to support that through this interface?
It would be a lot simpler if we would only allow
> - "offline": memory is added but not online
> - "online": memory is online as normal system RAM
> - "online_movable": memory is online in ZONE_MOVABLE
> - "unplug": memory is offlined and removed
That is, transitioning from offline to online or vice versa fails with
-EOPNOTSUPP. User space can do that itself through sysfs and if there is
ever a good use case we can extend this interface here to allow it.
Or is there a good use case that really requires this?
--
Cheers
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 8/8] dax/kmem: add memory notifier to block external state changes
2026-01-14 8:51 Subject: [PATCH 0/8] dax/kmem: add runtime hotplug state control Gregory Price
` (6 preceding siblings ...)
2026-01-14 8:51 ` [PATCH 7/8] dax/kmem: add sysfs interface for runtime hotplug state control Gregory Price
@ 2026-01-14 8:52 ` Gregory Price
2026-01-14 9:44 ` David Hildenbrand (Red Hat)
7 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-14 8:52 UTC (permalink / raw)
To: linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, david, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm, Hannes Reinecke
Add a memory notifier to prevent external operations from changing the
online/offline state of memory blocks managed by dax_kmem. This ensures
state changes only occur through the driver's hotplug sysfs interface,
providing consistent state tracking and preventing races with auto-online
policies or direct memory block sysfs manipulation.
The notifier uses a transition protocol with memory barriers:
- Before initiating a state change, set target_state then in_transition
- Use a barrier to ensure target_state is visible before in_transition
- The notifier checks in_transition, then uses barrier before reading
target_state to ensure proper ordering on weakly-ordered architectures
The notifier callback:
- Returns NOTIFY_DONE for non-overlapping memory (not our concern)
- Returns NOTIFY_BAD if in_transition is false (block external ops)
- Validates the memory event matches target_state (MEM_GOING_ONLINE
for online operations, MEM_GOING_OFFLINE for offline/unplug)
- Returns NOTIFY_OK only for driver-initiated operations with matching
target_state
This prevents scenarios where:
- Auto-online policies re-online memory the driver is trying to offline
- Users manually change memory state via /sys/devices/system/memory/
- Other kernel subsystems interfere with driver-managed memory state
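
Editor's sketch of the transition protocol described above, modeled in userspace with C11 atomics in place of the kernel's smp_store_release() and WRITE_ONCE(). The writer publishes target_state and then sets in_transition with release semantics. The notifier reads in_transition with acquire semantics before reading target_state. All `model_*` and `MODEL_*` names are hypothetical, the notifier body is an assumption based on the commit message rather than the patch code, and the range-overlap check (the NOTIFY_DONE case) is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum model_target { MODEL_OFFLINE, MODEL_ONLINE, MODEL_ONLINE_MOVABLE };
enum model_event { MODEL_GOING_ONLINE, MODEL_GOING_OFFLINE };
enum model_verdict { MODEL_NOTIFY_OK, MODEL_NOTIFY_BAD };

struct model_dev {
	int target_state;		/* written before in_transition is set */
	atomic_bool in_transition;	/* publication flag */
};

/* Publish the target, then the flag; release orders the two stores. */
static void model_start_transition(struct model_dev *d, int target)
{
	assert(target >= MODEL_OFFLINE && target <= MODEL_ONLINE_MOVABLE);
	d->target_state = target;
	atomic_store_explicit(&d->in_transition, true, memory_order_release);
}

/* Clearing the flag needs no ordering against target_state. */
static void model_end_transition(struct model_dev *d)
{
	atomic_store_explicit(&d->in_transition, false, memory_order_relaxed);
}

/* Veto anything that is not a driver-initiated, matching operation. */
static enum model_verdict model_notifier(struct model_dev *d,
					 enum model_event ev)
{
	/* Acquire pairs with the release store in model_start_transition(). */
	if (!atomic_load_explicit(&d->in_transition, memory_order_acquire))
		return MODEL_NOTIFY_BAD;

	if (ev == MODEL_GOING_ONLINE)
		return (d->target_state == MODEL_ONLINE ||
			d->target_state == MODEL_ONLINE_MOVABLE) ?
			MODEL_NOTIFY_OK : MODEL_NOTIFY_BAD;

	return d->target_state == MODEL_OFFLINE ?
		MODEL_NOTIFY_OK : MODEL_NOTIFY_BAD;
}
```

The acquire/release pairing is what guarantees that a notifier observing in_transition as true also observes the target_state written before it.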
Suggested-by: Hannes Reinecke <hare@suse.de>
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 164 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 160 insertions(+), 4 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 6d73c44e4e08..b604da8b3fe1 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -53,6 +53,9 @@ struct dax_kmem_data {
struct dev_dax *dev_dax;
int state;
struct mutex lock; /* protects hotplug state transitions */
+ bool in_transition;
+ int target_state;
+ struct notifier_block mem_nb;
struct resource *res[];
};
@@ -71,6 +74,116 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/**
+ * dax_kmem_start_transition - begin a driver-initiated state transition
+ * @data: the dax_kmem_data structure
+ * @target: the target state (MMOP_ONLINE, MMOP_ONLINE_MOVABLE, or MMOP_OFFLINE)
+ *
+ * Sets up state for a driver-initiated memory operation. The memory notifier
+ * will only allow operations that match this target state while in transition.
+ * Uses store-release to ensure target_state is visible before in_transition.
+ */
+static void dax_kmem_start_transition(struct dax_kmem_data *data, int target)
+{
+ data->target_state = target;
+ smp_store_release(&data->in_transition, true);
+}
+
+/**
+ * dax_kmem_end_transition - end a driver-initiated state transition
+ * @data: the dax_kmem_data structure
+ *
+ * Clears the in_transition flag after a state change completes or aborts.
+ */
+static void dax_kmem_end_transition(struct dax_kmem_data *data)
+{
+ WRITE_ONCE(data->in_transition, false);
+}
+
+/**
+ * dax_kmem_overlaps_range - check if a memory range overlaps with this device
+ * @data: the dax_kmem_data structure
+ * @start: start physical address of the range to check
+ * @size: size of the range to check
+ *
+ * Returns true if the range overlaps with any of the device's memory ranges.
+ */
+static bool dax_kmem_overlaps_range(struct dax_kmem_data *data,
+ u64 start, u64 size)
+{
+ struct dev_dax *dev_dax = data->dev_dax;
+ int i;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+ struct range check = DEFINE_RANGE(start, start + size - 1);
+
+ if (dax_kmem_range(dev_dax, i, &range))
+ continue;
+
+ if (!data->res[i])
+ continue;
+
+ if (range_overlaps(&range, &check))
+ return true;
+ }
+ return false;
+}
+
+/**
+ * dax_kmem_memory_notifier_cb - memory notifier callback for dax kmem
+ * @nb: the notifier block (embedded in dax_kmem_data)
+ * @action: the memory event (MEM_GOING_ONLINE, MEM_GOING_OFFLINE, etc.)
+ * @arg: pointer to memory_notify structure
+ *
+ * This callback prevents external operations (e.g., from sysfs or auto-online
+ * policies) on memory blocks managed by dax_kmem. Only operations initiated
+ * by the driver itself (via the hotplug sysfs interface) are allowed.
+ *
+ * Returns NOTIFY_OK to allow the operation, NOTIFY_BAD to block it,
+ * or NOTIFY_DONE if the memory doesn't belong to this device.
+ */
+static int dax_kmem_memory_notifier_cb(struct notifier_block *nb,
+ unsigned long action, void *arg)
+{
+ struct dax_kmem_data *data = container_of(nb, struct dax_kmem_data,
+ mem_nb);
+ struct memory_notify *mhp = arg;
+ const u64 start = PFN_PHYS(mhp->start_pfn);
+ const u64 size = PFN_PHYS(mhp->nr_pages);
+
+ /* Only interested in going online/offline events */
+ if (action != MEM_GOING_ONLINE && action != MEM_GOING_OFFLINE)
+ return NOTIFY_DONE;
+
+ /* Check if this memory belongs to our device */
+ if (!dax_kmem_overlaps_range(data, start, size))
+ return NOTIFY_DONE;
+
+ /*
+ * Block all operations unless we're in a driver-initiated transition.
+ * When in_transition is set, only allow operations that match our
+ * target_state to prevent races with external operations.
+ *
+ * Use load-acquire to pair with the store-release in
+ * dax_kmem_start_transition(), ensuring target_state is visible.
+ */
+ if (!smp_load_acquire(&data->in_transition))
+ return NOTIFY_BAD;
+
+ /* Online operations expect MEM_GOING_ONLINE */
+ if (action == MEM_GOING_ONLINE &&
+ (data->target_state == MMOP_ONLINE ||
+ data->target_state == MMOP_ONLINE_MOVABLE))
+ return NOTIFY_OK;
+
+ /* Offline/hotremove operations expect MEM_GOING_OFFLINE */
+ if (action == MEM_GOING_OFFLINE && data->target_state == MMOP_OFFLINE)
+ return NOTIFY_OK;
+
+ return NOTIFY_BAD;
+}
+
/**
* dax_kmem_do_hotplug - hotplug memory for dax kmem device
* @dev_dax: the dev_dax instance
@@ -375,11 +488,27 @@ static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
if (data->state == online_type)
return len;
+ /*
+ * Start transition with target_state for the notifier.
+ * For unplug, use MMOP_OFFLINE since memory goes offline before removal.
+ */
+ if (online_type == DAX_KMEM_UNPLUGGED || online_type == MMOP_OFFLINE)
+ dax_kmem_start_transition(data, MMOP_OFFLINE);
+ else
+ dax_kmem_start_transition(data, online_type);
+
if (online_type == DAX_KMEM_UNPLUGGED) {
+ int expected = 0;
+
+ for (rc = 0; rc < dev_dax->nr_range; rc++)
+ if (data->res[rc])
+ expected++;
+
rc = dax_kmem_do_hotremove(dev_dax, data);
- if (rc < 0) {
+ dax_kmem_end_transition(data);
+ if (rc < expected) {
dev_warn(dev, "hotplug state is inconsistent\n");
- return rc;
+ return rc == 0 ? -EBUSY : -EIO;
}
data->state = DAX_KMEM_UNPLUGGED;
return len;
@@ -387,9 +516,12 @@ static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
if (online_type == MMOP_OFFLINE) {
/* Can only offline from an online state */
- if (data->state != MMOP_ONLINE && data->state != MMOP_ONLINE_MOVABLE)
+ if (data->state != MMOP_ONLINE && data->state != MMOP_ONLINE_MOVABLE) {
+ dax_kmem_end_transition(data);
return -EINVAL;
+ }
rc = dax_kmem_do_offline(dev_dax, data);
+ dax_kmem_end_transition(data);
if (rc < 0) {
dev_warn(dev, "hotplug state is inconsistent\n");
return rc;
@@ -401,14 +533,18 @@ static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
/* online_type is MMOP_ONLINE or MMOP_ONLINE_MOVABLE */
/* Cannot switch between online types without offlining first */
- if (data->state == MMOP_ONLINE || data->state == MMOP_ONLINE_MOVABLE)
+ if (data->state == MMOP_ONLINE || data->state == MMOP_ONLINE_MOVABLE) {
+ dax_kmem_end_transition(data);
return -EBUSY;
+ }
if (data->state == MMOP_OFFLINE)
rc = dax_kmem_do_online(dev_dax, data, online_type);
else
rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+ dax_kmem_end_transition(data);
+
if (rc < 0)
return rc;
@@ -490,12 +626,25 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
dev_set_drvdata(dev, data);
+ /* Register memory notifier to block external operations */
+ data->mem_nb.notifier_call = dax_kmem_memory_notifier_cb;
+ rc = register_memory_notifier(&data->mem_nb);
+ if (rc) {
+ dev_warn(dev, "failed to register memory notifier\n");
+ goto err_notifier;
+ }
+
/*
* Hotplug the memory using the system default online policy.
* This preserves backwards compatibility for existing users who
* rely on auto-online behavior.
+ *
+ * Start transition with resolved system default since the notifier
+ * validates the operation type matches.
*/
+ dax_kmem_start_transition(data, mhp_get_default_online_type());
rc = dax_kmem_do_hotplug(dev_dax, data, MMOP_SYSTEM_DEFAULT);
+ dax_kmem_end_transition(data);
if (rc < 0)
goto err_hotplug;
/*
@@ -511,6 +660,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return 0;
err_hotplug:
+ unregister_memory_notifier(&data->mem_nb);
+err_notifier:
dev_set_drvdata(dev, NULL);
memory_group_unregister(data->mgid);
err_reg_mgid:
@@ -538,12 +689,15 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
* there is no way to hotremove this memory until reboot because device
* unbind will succeed even if we return failure.
*/
+ dax_kmem_start_transition(data, MMOP_OFFLINE);
success = dax_kmem_do_hotremove(dev_dax, data);
+ dax_kmem_end_transition(data);
if (success < dev_dax->nr_range) {
dev_err(dev, "Hotplug regions stuck online until reboot\n");
return;
}
+ unregister_memory_notifier(&data->mem_nb);
memory_group_unregister(data->mgid);
kfree(data->res_name);
kfree(data);
@@ -561,8 +715,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
device_remove_file(dev, &dev_attr_hotplug);
+ unregister_memory_notifier(&data->mem_nb);
/*
* Without hotremove purposely leak the request_mem_region() for the
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 8/8] dax/kmem: add memory notifier to block external state changes
2026-01-14 8:52 ` [PATCH 8/8] dax/kmem: add memory notifier to block external state changes Gregory Price
@ 2026-01-14 9:44 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 9:44 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: linux-cxl, nvdimm, linux-kernel, virtualization, kernel-team,
dan.j.williams, vishal.l.verma, dave.jiang, mst, jasowang,
xuanzhuo, eperezma, osalvador, akpm, Hannes Reinecke
On 1/14/26 09:52, Gregory Price wrote:
> Add a memory notifier to prevent external operations from changing the
> online/offline state of memory blocks managed by dax_kmem. This ensures
> state changes only occur through the driver's hotplug sysfs interface,
> providing consistent state tracking and preventing races with auto-online
> policies or direct memory block sysfs manipulation.
>
> The notifier uses a transition protocol with memory barriers:
> - Before initiating a state change, set target_state then in_transition
> - Use a barrier to ensure target_state is visible before in_transition
> - The notifier checks in_transition, then uses barrier before reading
> target_state to ensure proper ordering on weakly-ordered architectures
>
> The notifier callback:
> - Returns NOTIFY_DONE for non-overlapping memory (not our concern)
> - Returns NOTIFY_BAD if in_transition is false (block external ops)
> - Validates the memory event matches target_state (MEM_GOING_ONLINE
> for online operations, MEM_GOING_OFFLINE for offline/unplug)
> - Returns NOTIFY_OK only for driver-initiated operations with matching
> target_state
>
> This prevents scenarios where:
> - Auto-online policies re-online memory the driver is trying to offline

Is this still a problem when using offline_and_remove_memory() ?

> - Users manually change memory state via /sys/devices/system/memory/

I don't see why we would want to care about that :)

> - Other kernel subsystems interfere with driver-managed memory state

What do you have in mind?

Not sure if this functionality here is really needed when the driver
does add+online and offline+remove in a single operation. So please
elaborate :)

--
Cheers
David
^ permalink raw reply [flat|nested] 15+ messages in thread