* [PATCH v2 0/4] Add managed SOFT RESERVE resource handling
@ 2025-01-16 17:42 Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources Nathan Fontenot
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Nathan Fontenot @ 2025-01-16 17:42 UTC (permalink / raw)
To: linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
Add the ability to manage SOFT RESERVE iomem resources prior to them
being added to the iomem resource tree. This allows drivers, such as
CXL, to remove any pieces of the SOFT RESERVE resource that intersect
with created CXL regions.
The current approach of leaving the SOFT RESERVE resources as-is can
cause failures during hotplug of devices, such as CXL, because the
resource is not available for reuse after teardown of the device.
The approach is to add SOFT RESERVE resources to a separate tree
during boot. This allows any drivers to update the SOFT RESERVE
resources before they are merged into the iomem resource tree. In
addition, a notifier chain is added so that drivers can be notified
when these SOFT RESERVE resources are added to the iomem resource
tree.
The CXL driver is modified to use a worker thread that waits for
the CXL PCI and CXL mem drivers to be loaded and for their probe
routines to complete. It then walks through any created CXL regions
and removes the pieces of the SOFT RESERVE resources that intersect
with those regions before adding the remaining SOFT RESERVES to the
iomem tree.
The dax driver uses the new soft reserve notifier chain so it can
consume any remaining SOFT RESERVES once they're added to the
iomem tree.
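To make the end-to-end flow concrete, below is a rough sketch of how a
producing driver and a consumer are expected to use the new interfaces.
Only release_srmem_region_adjustable(), merge_srmem_resources() and
register_srmem_notifier() come from this series; the example_* names and
the init wiring are illustrative only.

#include <linux/ioport.h>
#include <linux/notifier.h>
#include <linux/printk.h>

/*
 * Producer side (e.g. a region driver): once its region is programmed,
 * trim the intersecting piece out of the boot-time SOFT RESERVE tree and
 * merge whatever is left into the iomem resource tree.
 */
static void example_trim_soft_reserves(struct resource *region_res)
{
        release_srmem_region_adjustable(region_res->start,
                                        resource_size(region_res));
        merge_srmem_resources();
}

/*
 * Consumer side (e.g. dax hmem): get notified as each leftover SOFT
 * RESERVE range is inserted into the iomem tree.
 */
static int example_srmem_cb(struct notifier_block *nb, unsigned long action,
                            void *data)
{
        struct resource *res = data;

        pr_info("soft reserved %pr now in iomem\n", res);
        return NOTIFY_OK;
}

static struct notifier_block example_srmem_nb = {
        .notifier_call = example_srmem_cb,
};

static int __init example_srmem_init(void)
{
        return register_srmem_notifier(&example_srmem_nb);
}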
v2 updates:
- Add config option SOFT_RESERVED_MANAGED to control use of the
separate srmem resource tree at boot.
- Only add SOFT RESERVE resources to the soft reserve tree during
boot; they go to the iomem resource tree after boot.
- Remove the resource trimming code from the previous patch to re-use
the existing code in kernel/resource.c
- Add functionality for the cxl acpi driver to wait for the cxl PCI
and mem drivers to load.
Nathan Fontenot (4):
kernel/resource: Introduce managed SOFT RESERVED resources
cxl: Update Soft Reserve resources upon region creation
dax: Update hmem resource/device registration
Add SOFT RESERVE resource notification chain
drivers/acpi/numa/hmat.c | 7 +--
drivers/cxl/Kconfig | 1 +
drivers/cxl/acpi.c | 26 ++++++++++
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/region.c | 25 +++++++++-
drivers/cxl/core/suspend.c | 41 +++++++++++++++
drivers/cxl/cxl.h | 3 ++
drivers/cxl/cxlmem.h | 9 ----
drivers/cxl/cxlpci.h | 2 +
drivers/cxl/pci.c | 1 +
drivers/dax/hmem/device.c | 14 +++---
drivers/dax/hmem/hmem.c | 34 +++++++++++--
include/linux/dax.h | 9 ++--
include/linux/ioport.h | 20 ++++++++
kernel/resource.c | 100 +++++++++++++++++++++++++++++++++++--
lib/Kconfig | 4 ++
16 files changed, 260 insertions(+), 38 deletions(-)
--
2.43.0
* [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-16 17:42 [PATCH v2 0/4] Add managed SOFT RESERVE resource handling Nathan Fontenot
@ 2025-01-16 17:42 ` Nathan Fontenot
2025-01-21 8:19 ` David Hildenbrand
2025-01-22 5:52 ` Fan Ni
2025-01-16 17:42 ` [PATCH v2 2/4] cxl: Update Soft Reserve resources upon region creation Nathan Fontenot
` (2 subsequent siblings)
3 siblings, 2 replies; 25+ messages in thread
From: Nathan Fontenot @ 2025-01-16 17:42 UTC (permalink / raw)
To: linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
Introduce the ability to manage SOFT RESERVED kernel resources prior to
these resources being placed in the iomem resource tree. This provides
the ability for drivers to update SOFT RESERVED resources that intersect
with their memory resources.
During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
on the soft reserve resource tree. Once boot completes all resources
are placed on the iomem resource tree. This behavior is gated by a new
kernel option CONFIG_SOFT_RESERVED_MANAGED.
As part of this update two new interfaces are added for management of
the SOFT RESERVED resources. The release_srmem_region_adjustable()
routine allows for removing pieces of SOFT RESERVED resources. The
merge_srmem_resources() routine allows drivers to merge any remaining SOFT
RESERVED resources into the iomem resource tree once updates are complete.
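For illustration, with CONFIG_SOFT_RESERVED_MANAGED enabled the gating in
__insert_resource() means an early-boot insertion such as the sketch below
is parented to the srmem tree rather than iomem_resource. The address
range and function name here are made up for the example.

#include <linux/init.h>
#include <linux/ioport.h>
#include <linux/printk.h>

static struct resource example_soft_reserved = {
        .name  = "Soft Reserved",
        .start = 0x1050000000ULL,
        .end   = 0x107fffffffULL,
        .flags = IORESOURCE_MEM,
        .desc  = IORES_DESC_SOFT_RESERVED,
};

static void __init example_insert_soft_reserved(void)
{
        /*
         * Before SYSTEM_RUNNING this lands in the srmem tree and stays
         * invisible to iomem_resource walkers until merge_srmem_resources()
         * runs; after boot it goes straight to iomem_resource as before.
         */
        if (insert_resource(&iomem_resource, &example_soft_reserved))
                pr_warn("example Soft Reserved range conflicts\n");
}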
Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
---
include/linux/ioport.h | 9 +++++
kernel/resource.c | 79 +++++++++++++++++++++++++++++++++++++++---
lib/Kconfig | 4 +++
3 files changed, 87 insertions(+), 5 deletions(-)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6e9fb667a1c5..2c95cf0be45e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -249,6 +249,15 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start);
int adjust_resource(struct resource *res, resource_size_t start,
resource_size_t size);
resource_size_t resource_alignment(struct resource *res);
+
+#ifdef CONFIG_SOFT_RESERVED_MANAGED
+void merge_srmem_resources(void);
+extern void release_srmem_region_adjustable(resource_size_t start,
+ resource_size_t size);
+#else
+static inline void merge_srmem_resources(void) { }
+#endif
+
static inline resource_size_t resource_size(const struct resource *res)
{
return res->end - res->start + 1;
diff --git a/kernel/resource.c b/kernel/resource.c
index a83040fde236..9db420078a3f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -48,6 +48,14 @@ struct resource iomem_resource = {
};
EXPORT_SYMBOL(iomem_resource);
+static struct resource srmem_resource = {
+ .name = "Soft Reserved mem",
+ .start = 0,
+ .end = -1,
+ .flags = IORESOURCE_MEM,
+ .desc = IORES_DESC_SOFT_RESERVED,
+};
+
static DEFINE_RWLOCK(resource_lock);
static struct resource *next_resource(struct resource *p, bool skip_children)
@@ -818,6 +826,19 @@ static struct resource * __insert_resource(struct resource *parent, struct resou
{
struct resource *first, *next;
+ if (IS_ENABLED(CONFIG_SOFT_RESERVED_MANAGED)) {
+ /*
+ * During boot SOFT RESERVED resources are placed on the srmem
+ * resource tree. These resources may be updated later in boot,
+ * for example see the CXL driver, prior to being merged into
+ * the iomem resource tree.
+ */
+ if (system_state < SYSTEM_RUNNING &&
+ parent == &iomem_resource &&
+ new->desc == IORES_DESC_SOFT_RESERVED)
+ parent = &srmem_resource;
+ }
+
for (;; parent = first) {
first = __request_resource(parent, new);
if (!first)
@@ -1336,11 +1357,12 @@ void __release_region(struct resource *parent, resource_size_t start,
}
EXPORT_SYMBOL(__release_region);
-#ifdef CONFIG_MEMORY_HOTREMOVE
/**
- * release_mem_region_adjustable - release a previously reserved memory region
+ * release_region_adjustable - release a previously reserved memory region
+ * @parent: resource tree to release resource from
* @start: resource start address
* @size: resource region size
+ * @busy_check: check for IORESOURCE_BUSY
*
* This interface is intended for memory hot-delete. The requested region
* is released from a currently busy memory resource. The requested region
@@ -1356,9 +1378,11 @@ EXPORT_SYMBOL(__release_region);
* assumes that all children remain in the lower address entry for
* simplicity. Enhance this logic when necessary.
*/
-void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
+static void release_region_adjustable(struct resource *parent,
+ resource_size_t start,
+ resource_size_t size,
+ bool busy_check)
{
- struct resource *parent = &iomem_resource;
struct resource *new_res = NULL;
bool alloc_nofail = false;
struct resource **p;
@@ -1395,7 +1419,7 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
if (!(res->flags & IORESOURCE_MEM))
break;
- if (!(res->flags & IORESOURCE_BUSY)) {
+ if (busy_check && !(res->flags & IORESOURCE_BUSY)) {
p = &res->child;
continue;
}
@@ -1445,6 +1469,51 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
write_unlock(&resource_lock);
free_resource(new_res);
}
+
+#ifdef CONFIG_SOFT_RESERVED_MANAGED
+/**
+ * merge_srmem_resources - merge srmem resources into the iomem resource tree
+ *
+ * This is intended to allow kernel drivers that manage the SOFT RESERVED
+ * resources to merge any remaining resources into the iomem resource tree
+ * once any updates have been made.
+ */
+void merge_srmem_resources(void)
+{
+ struct resource *res, *next;
+ int rc;
+
+ for (res = srmem_resource.child; res; res = next) {
+ next = next_resource(res, true);
+
+ write_lock(&resource_lock);
+
+ if (WARN_ON(__release_resource(res, true))) {
+ write_unlock(&resource_lock);
+ continue;
+ }
+
+ if (WARN_ON(__insert_resource(&iomem_resource, res)))
+ __insert_resource(&srmem_resource, res);
+
+ write_unlock(&resource_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(merge_srmem_resources);
+
+void release_srmem_region_adjustable(resource_size_t start,
+ resource_size_t size)
+{
+ release_region_adjustable(&srmem_resource, start, size, false);
+}
+EXPORT_SYMBOL(release_srmem_region_adjustable);
+#endif
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
+{
+ release_region_adjustable(&iomem_resource, start, size, true);
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
#ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/lib/Kconfig b/lib/Kconfig
index b38849af6f13..4f4011334051 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -777,3 +777,7 @@ config POLYNOMIAL
config FIRMWARE_TABLE
bool
+
+config SOFT_RESERVED_MANAGED
+ bool
+ default n
--
2.43.0
* [PATCH v2 2/4] cxl: Update Soft Reserve resources upon region creation
2025-01-16 17:42 [PATCH v2 0/4] Add managed SOFT RESERVE resource handling Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources Nathan Fontenot
@ 2025-01-16 17:42 ` Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 3/4] dax: Update hmem resource/device registration Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 4/4] Add SOFT RESERVE resource notification chain Nathan Fontenot
3 siblings, 0 replies; 25+ messages in thread
From: Nathan Fontenot @ 2025-01-16 17:42 UTC (permalink / raw)
To: linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
Update handling of SOFT RESERVE iomem resources that intersect with
CXL region resources to remove intersections from the SOFT RESERVE
resources. The current approach of leaving SOFT RESERVE resources as
is can cause failures during hotplug replace of CXL devices because
the resource is not available for reuse after teardown of the CXL device.
The approach is to use the CONFIG_SOFT_RESERVED_MANAGED config option
to have SOFT RESERVE resources set aside during boot. After the CXL
drivers complete their probe and CXL regions are created, any
intersections between CXL regions and SOFT RESERVE resources are
removed from the SOFT RESERVE resources. The remaining SOFT RESERVE
resources, if any,
are then added to the iomem resource tree.
To accomplish this, the cxl acpi driver creates a worker thread at the
end of cxl_acpi_probe(). This worker thread first waits for the CXL PCI
and CXL mem drivers to load. The cxl core/suspend.c code is updated to
add a pci_loaded variable, in addition to the mem_active variable, that
is updated when the pci driver loads. A new cxl_wait_for_pci_mem() routine
uses a waitqueue to wait for both of these drivers to be loaded. The
additional waitqueue is needed to ensure the CXL PCI and CXL mem drivers
have loaded before we wait for their probe; without it the cxl acpi probe
worker thread could call wait_for_device_probe() before these drivers
are loaded.
After the CXL PCI and CXL mem drivers load the cxl acpi worker thread
uses wait_for_device_probe() to ensure device probe routines have
completed.
After probe completes, the worker finds all cxl regions that have been
created, removes any intersections with SOFT RESERVE resources, and adds
the remaining SOFT RESERVES to the iomem resource tree.
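Condensed, the ordering the worker enforces looks like the sketch below.
This is only a restatement of the code in this patch with the reasoning
above attached as comments; cxl_wait_for_pci_mem() and
cxl_region_srmem_update() are the helpers introduced here, and the
function name is illustrative.

static void example_srmem_work_fn(struct work_struct *work)
{
        /*
         * Block (with a timeout) until cxl_pci and cxl_mem have at least
         * registered; without this, wait_for_device_probe() below could
         * return before those drivers exist, i.e. before any CXL regions
         * have been created.
         */
        cxl_wait_for_pci_mem();

        /* now wait for their probe routines to finish */
        wait_for_device_probe();

        /* trim CXL region intersections, then merge leftovers into iomem */
        cxl_region_srmem_update();
}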
Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
---
drivers/cxl/Kconfig | 1 +
drivers/cxl/acpi.c | 26 ++++++++++++++++++++++++
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/region.c | 25 ++++++++++++++++++++++-
drivers/cxl/core/suspend.c | 41 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 3 +++
drivers/cxl/cxlmem.h | 9 ---------
drivers/cxl/cxlpci.h | 2 ++
drivers/cxl/pci.c | 1 +
9 files changed, 99 insertions(+), 11 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 99b5c25be079..5d9dce2fb282 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select SOFT_RESERVED_MANAGED
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 82b78e331d8e..a95004230f85 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -7,6 +7,8 @@
#include <linux/acpi.h>
#include <linux/pci.h>
#include <linux/node.h>
+#include <linux/pm.h>
+#include <linux/workqueue.h>
#include <asm/div64.h>
#include "cxlpci.h"
#include "cxl.h"
@@ -813,6 +815,27 @@ static int pair_cxl_resource(struct device *dev, void *data)
return 0;
}
+static void cxl_srmem_work_fn(struct work_struct *work)
+{
+ /* Wait for CXL PCI and mem drivers to load */
+ cxl_wait_for_pci_mem();
+
+ /*
+ * Once the CXL PCI and mem drivers have loaded wait
+ * for the driver probe routines to complete.
+ */
+ wait_for_device_probe();
+
+ cxl_region_srmem_update();
+}
+
+DECLARE_WORK(cxl_sr_work, cxl_srmem_work_fn);
+
+static void cxl_srmem_update(void)
+{
+ schedule_work(&cxl_sr_work);
+}
+
static int cxl_acpi_probe(struct platform_device *pdev)
{
int rc;
@@ -887,6 +910,9 @@ static int cxl_acpi_probe(struct platform_device *pdev)
/* In case PCI is scanned before ACPI re-trigger memdev attach */
cxl_bus_rescan();
+
+ /* Update SOFT RESERVED resources that intersect with CXL regions */
+ cxl_srmem_update();
return 0;
}
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..01587ba1dcdb 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_CXL_BUS) += cxl_core.o
-obj-$(CONFIG_CXL_SUSPEND) += suspend.o
+obj-y += suspend.o
ccflags-y += -I$(srctree)/drivers/cxl
CFLAGS_trace.o = -DTRACE_INCLUDE_PATH=. -I$(src)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21ad5f242875..3f4a7cc4539b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -10,6 +10,7 @@
#include <linux/sort.h>
#include <linux/idr.h>
#include <linux/memory-tiers.h>
+#include <linux/ioport.h>
#include <cxlmem.h>
#include <cxl.h>
#include "core.h"
@@ -2294,7 +2295,7 @@ const struct device_type cxl_region_type = {
bool is_cxl_region(struct device *dev)
{
- return dev->type == &cxl_region_type;
+ return dev && dev->type == &cxl_region_type;
}
EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL);
@@ -3377,6 +3378,28 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
}
EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL);
+int cxl_region_srmem_update(void)
+{
+ struct device *dev = NULL;
+ struct cxl_region *cxlr;
+ struct resource *res;
+
+ do {
+ dev = bus_find_next_device(&cxl_bus_type, dev);
+ if (is_cxl_region(dev)) {
+ cxlr = to_cxl_region(dev);
+ res = cxlr->params.res;
+ release_srmem_region_adjustable(res->start,
+ resource_size(res));
+ put_device(dev);
+ }
+ } while (dev);
+
+ merge_srmem_resources();
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_region_srmem_update, CXL);
+
static int is_system_ram(struct resource *res, void *arg)
{
struct cxl_region *cxlr = arg;
diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
index a5984d96ea1d..589f7fc931ee 100644
--- a/drivers/cxl/core/suspend.c
+++ b/drivers/cxl/core/suspend.c
@@ -2,18 +2,27 @@
/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
#include <linux/atomic.h>
#include <linux/export.h>
+#include <linux/wait.h>
#include "cxlmem.h"
+#include "cxlpci.h"
static atomic_t mem_active;
+static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
+
+static atomic_t pci_loaded;
+
+#ifdef CONFIG_CXL_SUSPEND
bool cxl_mem_active(void)
{
return atomic_read(&mem_active) != 0;
}
+#endif
void cxl_mem_active_inc(void)
{
atomic_inc(&mem_active);
+ wake_up(&cxl_wait_queue);
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, CXL);
@@ -22,3 +31,35 @@ void cxl_mem_active_dec(void)
atomic_dec(&mem_active);
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, CXL);
+
+void mark_cxl_pci_loaded(void)
+{
+ atomic_inc(&pci_loaded);
+ wake_up(&cxl_wait_queue);
+}
+EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, CXL);
+
+static bool cxl_pci_loaded(void)
+{
+ if (IS_ENABLED(CONFIG_CXL_PCI))
+ return atomic_read(&pci_loaded) != 0;
+
+ return true;
+}
+
+static bool cxl_mem_probed(void)
+{
+ if (IS_ENABLED(CONFIG_CXL_MEM))
+ return atomic_read(&mem_active) != 0;
+
+ return true;
+}
+
+void cxl_wait_for_pci_mem(void)
+{
+ if (IS_ENABLED(CONFIG_CXL_PCI) || IS_ENABLED(CONFIG_CXL_MEM))
+ wait_event_timeout(cxl_wait_queue,
+ cxl_pci_loaded() && cxl_mem_probed(),
+ 30 * HZ);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_wait_for_pci_mem, CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9afb407d438f..65e425e72970 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -861,6 +861,7 @@ bool is_cxl_pmem_region(struct device *dev);
struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled);
+int cxl_region_srmem_update(void);
struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
#else
static inline bool is_cxl_pmem_region(struct device *dev)
@@ -898,6 +899,8 @@ void cxl_coordinates_combine(struct access_coordinate *out,
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
+void cxl_wait_for_pci_mem(void);
+
/*
* Unit test builds overrides this to __weak, find the 'strong' version
* of these symbols in tools/testing/cxl/.
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d058d62..df6bf7778321 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -838,17 +838,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
-#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
-#else
-static inline void cxl_mem_active_inc(void)
-{
-}
-static inline void cxl_mem_active_dec(void)
-{
-}
-#endif
int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 4da07727ab9c..42d9423ba4c0 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -129,4 +129,6 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+
+void mark_cxl_pci_loaded(void);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4be35dc22202..d90505e3605c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1054,6 +1054,7 @@ static int __init cxl_pci_driver_init(void)
if (rc)
pci_unregister_driver(&cxl_pci_driver);
+ mark_cxl_pci_loaded();
return rc;
}
--
2.43.0
* [PATCH v2 3/4] dax: Update hmem resource/device registration
2025-01-16 17:42 [PATCH v2 0/4] Add managed SOFT RESERVE resource handling Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 2/4] cxl: Update Soft Reserve resources upon region creation Nathan Fontenot
@ 2025-01-16 17:42 ` Nathan Fontenot
2025-01-16 22:28 ` Ira Weiny
2025-01-16 17:42 ` [PATCH v2 4/4] Add SOFT RESERVE resource notification chain Nathan Fontenot
3 siblings, 1 reply; 25+ messages in thread
From: Nathan Fontenot @ 2025-01-16 17:42 UTC (permalink / raw)
To: linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
In order to handle registering hmem devices for SOFT RESERVE reources
that are added late in boot, update the hmem_register_resource(),
hmem_register_device(), and walk_hmem_resources() interfaces.
Remove the target_nid arg to hmem_register_resource(). The target nid
value is calculated from the resource start address and not used until
registering a device for the resource. Move the target nid calculation
to hmem_register_device().
To allow for registering hmem devices outside of the hmem dax driver
probe routine, save the dax hmem platform device during probe. The
hmem_register_device() interface can then drop the host and target
nid parameters.
There should be no functional changes.
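With the reworked signatures, a walker callback reduces to something like
the sketch below. The callback and function names are hypothetical;
walk_hmem_resources() and the derived target node behavior are as changed
by this patch.

#include <linux/dax.h>
#include <linux/ioport.h>
#include <linux/printk.h>

/* no host device or target_nid arguments any more */
static int example_hmem_walk_cb(const struct resource *res)
{
        /*
         * The target node is now derived from res->start inside
         * hmem_register_device() via phys_to_target_node().
         */
        pr_info("hmem range %pr\n", res);
        return 0;
}

static void example_walk(void)
{
        walk_hmem_resources(example_hmem_walk_cb);
}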
Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
---
drivers/acpi/numa/hmat.c | 7 ++-----
drivers/dax/hmem/device.c | 14 ++++++--------
drivers/dax/hmem/hmem.c | 12 ++++++++----
include/linux/dax.h | 9 ++++-----
4 files changed, 20 insertions(+), 22 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 1a902a02390f..23d4b3ad6d88 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -857,11 +857,8 @@ static void hmat_register_target_devices(struct memory_target *target)
if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
return;
- for (res = target->memregions.child; res; res = res->sibling) {
- int target_nid = pxm_to_node(target->memory_pxm);
-
- hmem_register_resource(target_nid, res);
- }
+ for (res = target->memregions.child; res; res = res->sibling)
+ hmem_register_resource(res);
}
static void hmat_register_target(struct memory_target *target)
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index f9e1a76a04a9..ae25e08a636f 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -17,14 +17,14 @@ static struct resource hmem_active = {
.flags = IORESOURCE_MEM,
};
-int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
+int walk_hmem_resources(walk_hmem_fn fn)
{
struct resource *res;
int rc = 0;
mutex_lock(&hmem_resource_lock);
for (res = hmem_active.child; res; res = res->sibling) {
- rc = fn(host, (int) res->desc, res);
+ rc = fn(res);
if (rc)
break;
}
@@ -33,7 +33,7 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
}
EXPORT_SYMBOL_GPL(walk_hmem_resources);
-static void __hmem_register_resource(int target_nid, struct resource *res)
+static void __hmem_register_resource(struct resource *res)
{
struct platform_device *pdev;
struct resource *new;
@@ -46,8 +46,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
return;
}
- new->desc = target_nid;
-
if (platform_initialized)
return;
@@ -64,19 +62,19 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
platform_initialized = true;
}
-void hmem_register_resource(int target_nid, struct resource *res)
+void hmem_register_resource(struct resource *res)
{
if (nohmem)
return;
mutex_lock(&hmem_resource_lock);
- __hmem_register_resource(target_nid, res);
+ __hmem_register_resource(res);
mutex_unlock(&hmem_resource_lock);
}
static __init int hmem_register_one(struct resource *res, void *data)
{
- hmem_register_resource(phys_to_target_node(res->start), res);
+ hmem_register_resource(res);
return 0;
}
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 5e7c53f18491..088f4060d4d5 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -9,6 +9,8 @@
static bool region_idle;
module_param_named(region_idle, region_idle, bool, 0644);
+static struct platform_device *dax_hmem_pdev;
+
static int dax_hmem_probe(struct platform_device *pdev)
{
unsigned long flags = IORESOURCE_DAX_KMEM;
@@ -59,13 +61,13 @@ static void release_hmem(void *pdev)
platform_device_unregister(pdev);
}
-static int hmem_register_device(struct device *host, int target_nid,
- const struct resource *res)
+static int hmem_register_device(const struct resource *res)
{
+ struct device *host = &dax_hmem_pdev->dev;
struct platform_device *pdev;
struct memregion_info info;
+ int target_nid, rc;
long id;
- int rc;
if (IS_ENABLED(CONFIG_CXL_REGION) &&
region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
@@ -94,6 +96,7 @@ static int hmem_register_device(struct device *host, int target_nid,
return -ENOMEM;
}
+ target_nid = phys_to_target_node(res->start);
pdev->dev.numa_node = numa_map_to_online_node(target_nid);
info = (struct memregion_info) {
.target_node = target_nid,
@@ -125,7 +128,8 @@ static int hmem_register_device(struct device *host, int target_nid,
static int dax_hmem_platform_probe(struct platform_device *pdev)
{
- return walk_hmem_resources(&pdev->dev, hmem_register_device);
+ dax_hmem_pdev = pdev;
+ return walk_hmem_resources(hmem_register_device);
}
static struct platform_driver dax_hmem_platform_driver = {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d3e3327af4c..beaa4bcb515c 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -276,14 +276,13 @@ static inline int dax_mem2blk_err(int err)
}
#ifdef CONFIG_DEV_DAX_HMEM_DEVICES
-void hmem_register_resource(int target_nid, struct resource *r);
+void hmem_register_resource(struct resource *r);
#else
-static inline void hmem_register_resource(int target_nid, struct resource *r)
+static inline void hmem_register_resource(struct resource *r)
{
}
#endif
-typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
- const struct resource *res);
-int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
+typedef int (*walk_hmem_fn)(const struct resource *res);
+int walk_hmem_resources(walk_hmem_fn fn);
#endif
--
2.43.0
* [PATCH v2 4/4] Add SOFT RESERVE resource notification chain
2025-01-16 17:42 [PATCH v2 0/4] Add managed SOFT RESERVE resource handling Nathan Fontenot
` (2 preceding siblings ...)
2025-01-16 17:42 ` [PATCH v2 3/4] dax: Update hmem resource/device registration Nathan Fontenot
@ 2025-01-16 17:42 ` Nathan Fontenot
3 siblings, 0 replies; 25+ messages in thread
From: Nathan Fontenot @ 2025-01-16 17:42 UTC (permalink / raw)
To: linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
Add a notification chain for SOFT RESERVE resources that are added
to the iomem resource tree when the SOFT_RESERVED_MANAGED config
option is specified.
Update the dax driver to register a notification handler for SOFT
RESERVE resources so that any late added SOFT RESERVES can be
consumed by the driver.
Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
---
drivers/dax/hmem/hmem.c | 24 +++++++++++++++++++++++-
include/linux/ioport.h | 11 +++++++++++
kernel/resource.c | 21 +++++++++++++++++++++
3 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 088f4060d4d5..0e6b7558ca3d 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -126,14 +126,36 @@ static int hmem_register_device(const struct resource *res)
return rc;
}
+static int dax_hmem_cb(struct notifier_block *nb, unsigned long action,
+ void *arg)
+{
+ return hmem_register_device((struct resource *)arg);
+}
+
+static struct notifier_block dax_hmem_nb = {
+ .notifier_call = dax_hmem_cb,
+};
+
static int dax_hmem_platform_probe(struct platform_device *pdev)
{
+ int rc;
+
dax_hmem_pdev = pdev;
- return walk_hmem_resources(hmem_register_device);
+ rc = walk_hmem_resources(hmem_register_device);
+
+ register_srmem_notifier(&dax_hmem_nb);
+ return rc;
+}
+
+static void dax_hmem_platform_remove(struct platform_device *pdev)
+{
+ dax_hmem_pdev = NULL;
+ unregister_srmem_notifier(&dax_hmem_nb);
}
static struct platform_driver dax_hmem_platform_driver = {
.probe = dax_hmem_platform_probe,
+ .remove = dax_hmem_platform_remove,
.driver = {
.name = "hmem_platform",
},
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 2c95cf0be45e..c173cdd5ab87 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -13,6 +13,7 @@
#include <linux/bits.h>
#include <linux/compiler.h>
#include <linux/minmax.h>
+#include <linux/notifier.h>
#include <linux/types.h>
/*
* Resources are tree-like, allowing
@@ -254,8 +255,18 @@ resource_size_t resource_alignment(struct resource *res);
void merge_srmem_resources(void);
extern void release_srmem_region_adjustable(resource_size_t start,
resource_size_t size);
+int register_srmem_notifier(struct notifier_block *nb);
+int unregister_srmem_notifier(struct notifier_block *nb);
#else
static inline void merge_srmem_resources(void) { }
+static int register_srmem_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
+static int unregister_srmem_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
#endif
static inline resource_size_t resource_size(const struct resource *res)
diff --git a/kernel/resource.c b/kernel/resource.c
index 9db420078a3f..3e117e3ba2a5 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1471,6 +1471,26 @@ static void release_region_adjustable(struct resource *parent,
}
#ifdef CONFIG_SOFT_RESERVED_MANAGED
+
+static RAW_NOTIFIER_HEAD(srmem_chain);
+
+int register_srmem_notifier(struct notifier_block *nb)
+{
+ return raw_notifier_chain_register(&srmem_chain, nb);
+}
+EXPORT_SYMBOL(register_srmem_notifier);
+
+int unregister_srmem_notifier(struct notifier_block *nb)
+{
+ return raw_notifier_chain_unregister(&srmem_chain, nb);
+}
+EXPORT_SYMBOL(unregister_srmem_notifier);
+
+static int srmem_notify(void *v)
+{
+ return raw_notifier_call_chain(&srmem_chain, 0, v);
+}
+
/**
* merge_srmem_resources - merge srmem resources into the iomem resource tree
*
@@ -1497,6 +1517,7 @@ void merge_srmem_resources(void)
__insert_resource(&srmem_resource, res);
write_unlock(&resource_lock);
+ srmem_notify(res);
}
}
EXPORT_SYMBOL_GPL(merge_srmem_resources);
--
2.43.0
* Re: [PATCH v2 3/4] dax: Update hmem resource/device registration
2025-01-16 17:42 ` [PATCH v2 3/4] dax: Update hmem resource/device registration Nathan Fontenot
@ 2025-01-16 22:28 ` Ira Weiny
2025-01-21 18:49 ` Fontenot, Nathan
0 siblings, 1 reply; 25+ messages in thread
From: Ira Weiny @ 2025-01-16 22:28 UTC (permalink / raw)
To: Nathan Fontenot, linux-cxl
Cc: dan.j.williams, alison.schofield, linux-mm, gourry
Nathan Fontenot wrote:
> In order to handle registering hmem devices for SOFT RESERVE reources
^^^^^^^^^
resources
> that are added late in boot update the hmem_register_resource(),
> hmem_register_device(), and walk_hmem_resources() interfaces.
>
> Remove the target_nid arg to hmem_register_resource(). The target nid
> value is calculated from the resource start address and not used until
> registering a device for the resource. Move the target nid calculation
> to hmem_register_device().
>
> To allow for registering hmem devices outside of the hmem dax driver
> probe routine save the dax hmem platform driver during probe. The
> hmem_register_device() interface can then drop the host and target
> nid parameters.
>
> There should be no functional changes.
>
> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
> ---
> drivers/acpi/numa/hmat.c | 7 ++-----
> drivers/dax/hmem/device.c | 14 ++++++--------
> drivers/dax/hmem/hmem.c | 12 ++++++++----
> include/linux/dax.h | 9 ++++-----
> 4 files changed, 20 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 1a902a02390f..23d4b3ad6d88 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -857,11 +857,8 @@ static void hmat_register_target_devices(struct memory_target *target)
> if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
> return;
>
> - for (res = target->memregions.child; res; res = res->sibling) {
> - int target_nid = pxm_to_node(target->memory_pxm);
> -
> - hmem_register_resource(target_nid, res);
> - }
> + for (res = target->memregions.child; res; res = res->sibling)
> + hmem_register_resource(res);
> }
>
> static void hmat_register_target(struct memory_target *target)
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index f9e1a76a04a9..ae25e08a636f 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -17,14 +17,14 @@ static struct resource hmem_active = {
> .flags = IORESOURCE_MEM,
> };
>
> -int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
> +int walk_hmem_resources(walk_hmem_fn fn)
> {
> struct resource *res;
> int rc = 0;
>
> mutex_lock(&hmem_resource_lock);
> for (res = hmem_active.child; res; res = res->sibling) {
> - rc = fn(host, (int) res->desc, res);
> + rc = fn(res);
> if (rc)
> break;
> }
> @@ -33,7 +33,7 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
> }
> EXPORT_SYMBOL_GPL(walk_hmem_resources);
>
> -static void __hmem_register_resource(int target_nid, struct resource *res)
> +static void __hmem_register_resource(struct resource *res)
> {
> struct platform_device *pdev;
> struct resource *new;
> @@ -46,8 +46,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
> return;
> }
>
> - new->desc = target_nid;
> -
> if (platform_initialized)
> return;
>
> @@ -64,19 +62,19 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
> platform_initialized = true;
> }
>
> -void hmem_register_resource(int target_nid, struct resource *res)
> +void hmem_register_resource(struct resource *res)
> {
> if (nohmem)
> return;
>
> mutex_lock(&hmem_resource_lock);
> - __hmem_register_resource(target_nid, res);
> + __hmem_register_resource(res);
> mutex_unlock(&hmem_resource_lock);
> }
>
> static __init int hmem_register_one(struct resource *res, void *data)
> {
> - hmem_register_resource(phys_to_target_node(res->start), res);
> + hmem_register_resource(res);
>
> return 0;
> }
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 5e7c53f18491..088f4060d4d5 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -9,6 +9,8 @@
> static bool region_idle;
> module_param_named(region_idle, region_idle, bool, 0644);
>
> +static struct platform_device *dax_hmem_pdev;
I don't think you can assume there is only ever 1 hmem platform device.
hmat_register_target_devices() in particular iterates multiple memory
regions and will create more than one.
What am I missing?
Ira
> +
> static int dax_hmem_probe(struct platform_device *pdev)
> {
> unsigned long flags = IORESOURCE_DAX_KMEM;
> @@ -59,13 +61,13 @@ static void release_hmem(void *pdev)
> platform_device_unregister(pdev);
> }
>
> -static int hmem_register_device(struct device *host, int target_nid,
> - const struct resource *res)
> +static int hmem_register_device(const struct resource *res)
> {
> + struct device *host = &dax_hmem_pdev->dev;
> struct platform_device *pdev;
> struct memregion_info info;
> + int target_nid, rc;
> long id;
> - int rc;
>
> if (IS_ENABLED(CONFIG_CXL_REGION) &&
> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> @@ -94,6 +96,7 @@ static int hmem_register_device(struct device *host, int target_nid,
> return -ENOMEM;
> }
>
> + target_nid = phys_to_target_node(res->start);
> pdev->dev.numa_node = numa_map_to_online_node(target_nid);
> info = (struct memregion_info) {
> .target_node = target_nid,
> @@ -125,7 +128,8 @@ static int hmem_register_device(struct device *host, int target_nid,
>
> static int dax_hmem_platform_probe(struct platform_device *pdev)
> {
> - return walk_hmem_resources(&pdev->dev, hmem_register_device);
> + dax_hmem_pdev = pdev;
> + return walk_hmem_resources(hmem_register_device);
> }
>
> static struct platform_driver dax_hmem_platform_driver = {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 9d3e3327af4c..beaa4bcb515c 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -276,14 +276,13 @@ static inline int dax_mem2blk_err(int err)
> }
>
> #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
> -void hmem_register_resource(int target_nid, struct resource *r);
> +void hmem_register_resource(struct resource *r);
> #else
> -static inline void hmem_register_resource(int target_nid, struct resource *r)
> +static inline void hmem_register_resource(struct resource *r)
> {
> }
> #endif
>
> -typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
> - const struct resource *res);
> -int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
> +typedef int (*walk_hmem_fn)(const struct resource *res);
> +int walk_hmem_resources(walk_hmem_fn fn);
> #endif
> --
> 2.43.0
>
>
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-16 17:42 ` [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources Nathan Fontenot
@ 2025-01-21 8:19 ` David Hildenbrand
2025-01-21 18:57 ` Fontenot, Nathan
2025-01-22 5:52 ` Fan Ni
1 sibling, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2025-01-21 8:19 UTC (permalink / raw)
To: Nathan Fontenot, linux-cxl
Cc: dan.j.williams, alison.schofield, linux-mm, gourry
On 16.01.25 18:42, Nathan Fontenot wrote:
Hi,
> Introduce the ability to manage SOFT RESERVED kernel resources prior to
> these resources being placed in the iomem resource tree. This provides
> the ability for drivers to update SOFT RESERVED resources that intersect
> with their memory resources.
>
> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
> on the soft reserve resource tree. Once boot completes all resources
> are placed on the iomem resource tree. This behavior is gated by a new
> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>
I'm missing a bit of context here.
Why can't we flag these regions in the existing iomem tree, where they
can be fixed up (even after boot?)?
Especially, what about deferred driver loading after boot? Why is that
not a concern or why can we reliably handle everything "during boot" ?
> As part of this update two new interfaces are added for management of
> the SOFT RESERVED resources. The release_srmem_region_adjustable()
> routine allows for removing pieces of SOFT RESERVED resources. The
> the merge_srmem_resources() allows drivers to merge any remaining SOFT
> RESERVED resources into the iomem resource tree once updatea are complete.
>
> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v2 3/4] dax: Update hmem resource/device registration
2025-01-16 22:28 ` Ira Weiny
@ 2025-01-21 18:49 ` Fontenot, Nathan
2025-01-21 23:14 ` Ira Weiny
0 siblings, 1 reply; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-21 18:49 UTC (permalink / raw)
To: Ira Weiny, linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
On 1/16/2025 4:28 PM, Ira Weiny wrote:
> Nathan Fontenot wrote:
>> In order to handle registering hmem devices for SOFT RESERVE reources
> ^^^^^^^^^
> resources
>
>> that are added late in boot update the hmem_register_resource(),
>> hmem_register_device(), and walk_hmem_resources() interfaces.
>>
>> Remove the target_nid arg to hmem_register_resource(). The target nid
>> value is calculated from the resource start address and not used until
>> registering a device for the resource. Move the target nid calculation
>> to hmem_register_device().
>>
>> To allow for registering hmem devices outside of the hmem dax driver
>> probe routine save the dax hmem platform driver during probe. The
>> hmem_register_device() interface can then drop the host and target
>> nid parameters.
>>
>> There should be no functional changes.
>>
>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
>> ---
>> drivers/acpi/numa/hmat.c | 7 ++-----
>> drivers/dax/hmem/device.c | 14 ++++++--------
>> drivers/dax/hmem/hmem.c | 12 ++++++++----
>> include/linux/dax.h | 9 ++++-----
>> 4 files changed, 20 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>> index 1a902a02390f..23d4b3ad6d88 100644
>> --- a/drivers/acpi/numa/hmat.c
>> +++ b/drivers/acpi/numa/hmat.c
>> @@ -857,11 +857,8 @@ static void hmat_register_target_devices(struct memory_target *target)
>> if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
>> return;
>>
>> - for (res = target->memregions.child; res; res = res->sibling) {
>> - int target_nid = pxm_to_node(target->memory_pxm);
>> -
>> - hmem_register_resource(target_nid, res);
>> - }
>> + for (res = target->memregions.child; res; res = res->sibling)
>> + hmem_register_resource(res);
>> }
>>
>> static void hmat_register_target(struct memory_target *target)
>> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
>> index f9e1a76a04a9..ae25e08a636f 100644
>> --- a/drivers/dax/hmem/device.c
>> +++ b/drivers/dax/hmem/device.c
>> @@ -17,14 +17,14 @@ static struct resource hmem_active = {
>> .flags = IORESOURCE_MEM,
>> };
>>
>> -int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>> +int walk_hmem_resources(walk_hmem_fn fn)
>> {
>> struct resource *res;
>> int rc = 0;
>>
>> mutex_lock(&hmem_resource_lock);
>> for (res = hmem_active.child; res; res = res->sibling) {
>> - rc = fn(host, (int) res->desc, res);
>> + rc = fn(res);
>> if (rc)
>> break;
>> }
>> @@ -33,7 +33,7 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>> }
>> EXPORT_SYMBOL_GPL(walk_hmem_resources);
>>
>> -static void __hmem_register_resource(int target_nid, struct resource *res)
>> +static void __hmem_register_resource(struct resource *res)
>> {
>> struct platform_device *pdev;
>> struct resource *new;
>> @@ -46,8 +46,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>> return;
>> }
>>
>> - new->desc = target_nid;
>> -
>> if (platform_initialized)
>> return;
>>
>> @@ -64,19 +62,19 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>> platform_initialized = true;
>> }
>>
>> -void hmem_register_resource(int target_nid, struct resource *res)
>> +void hmem_register_resource(struct resource *res)
>> {
>> if (nohmem)
>> return;
>>
>> mutex_lock(&hmem_resource_lock);
>> - __hmem_register_resource(target_nid, res);
>> + __hmem_register_resource(res);
>> mutex_unlock(&hmem_resource_lock);
>> }
>>
>> static __init int hmem_register_one(struct resource *res, void *data)
>> {
>> - hmem_register_resource(phys_to_target_node(res->start), res);
>> + hmem_register_resource(res);
>>
>> return 0;
>> }
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index 5e7c53f18491..088f4060d4d5 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -9,6 +9,8 @@
>> static bool region_idle;
>> module_param_named(region_idle, region_idle, bool, 0644);
>>
>> +static struct platform_device *dax_hmem_pdev;
>
> I don't think you can assume there is only ever 1 hmem platform device.
>
> hmat_register_target_devices() in particular iterates multiple memory
> regions and will create more than one.
>
> What am I missing?
You may be correct that there can be more than one hmem platform device.
I was making this change based on a comment from Dan that it may not matter
which platform device these are created against.
I could be wrong in that assumption. If so we'll need to figure out how to
determine which platform device a soft reserve resource would be created
against when they are added later in boot from a notification by the
srmem notification chain.
-Nathan
> Ira
>
>> +
>> static int dax_hmem_probe(struct platform_device *pdev)
>> {
>> unsigned long flags = IORESOURCE_DAX_KMEM;
>> @@ -59,13 +61,13 @@ static void release_hmem(void *pdev)
>> platform_device_unregister(pdev);
>> }
>>
>> -static int hmem_register_device(struct device *host, int target_nid,
>> - const struct resource *res)
>> +static int hmem_register_device(const struct resource *res)
>> {
>> + struct device *host = &dax_hmem_pdev->dev;
>> struct platform_device *pdev;
>> struct memregion_info info;
>> + int target_nid, rc;
>> long id;
>> - int rc;
>>
>> if (IS_ENABLED(CONFIG_CXL_REGION) &&
>> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>> @@ -94,6 +96,7 @@ static int hmem_register_device(struct device *host, int target_nid,
>> return -ENOMEM;
>> }
>>
>> + target_nid = phys_to_target_node(res->start);
>> pdev->dev.numa_node = numa_map_to_online_node(target_nid);
>> info = (struct memregion_info) {
>> .target_node = target_nid,
>> @@ -125,7 +128,8 @@ static int hmem_register_device(struct device *host, int target_nid,
>>
>> static int dax_hmem_platform_probe(struct platform_device *pdev)
>> {
>> - return walk_hmem_resources(&pdev->dev, hmem_register_device);
>> + dax_hmem_pdev = pdev;
>> + return walk_hmem_resources(hmem_register_device);
>> }
>>
>> static struct platform_driver dax_hmem_platform_driver = {
>> diff --git a/include/linux/dax.h b/include/linux/dax.h
>> index 9d3e3327af4c..beaa4bcb515c 100644
>> --- a/include/linux/dax.h
>> +++ b/include/linux/dax.h
>> @@ -276,14 +276,13 @@ static inline int dax_mem2blk_err(int err)
>> }
>>
>> #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
>> -void hmem_register_resource(int target_nid, struct resource *r);
>> +void hmem_register_resource(struct resource *r);
>> #else
>> -static inline void hmem_register_resource(int target_nid, struct resource *r)
>> +static inline void hmem_register_resource(struct resource *r)
>> {
>> }
>> #endif
>>
>> -typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
>> - const struct resource *res);
>> -int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
>> +typedef int (*walk_hmem_fn)(const struct resource *res);
>> +int walk_hmem_resources(walk_hmem_fn fn);
>> #endif
>> --
>> 2.43.0
>>
>>
>
>
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-21 8:19 ` David Hildenbrand
@ 2025-01-21 18:57 ` Fontenot, Nathan
2025-01-22 6:03 ` Fan Ni
0 siblings, 1 reply; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-21 18:57 UTC (permalink / raw)
To: David Hildenbrand, linux-cxl
Cc: dan.j.williams, alison.schofield, linux-mm, gourry
On 1/21/2025 2:19 AM, David Hildenbrand wrote:
> On 16.01.25 18:42, Nathan Fontenot wrote:
>
> Hi,
>
>> Introduce the ability to manage SOFT RESERVED kernel resources prior to
>> these resources being placed in the iomem resource tree. This provides
>> the ability for drivers to update SOFT RESERVED resources that intersect
>> with their memory resources.
>>
>> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
>> on the soft reserve resource tree. Once boot completes all resources
>> are placed on the iomem resource tree. This behavior is gated by a new
>> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>>
>
> I'm missing a bit of context here.
>
> Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
>
> Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
That's a good question and one I should have addressed.
The goal is to prevent the dax driver from creating dax devices for soft reserve
resources prior to the soft reserve resources being updated for any intersecting
cxl regions.
During boot the dax hmem driver walks the iomem tree to save off a copy of all
soft reserve resources. The dax driver then later walks this copy to create
dax devices for the soft reserve regions. This occurs before the cxl drivers
load, create cxl regions, and update any intersecting soft reserve resources.
To prevent this the soft reserves are set aside on a separate list during boot
so that they can be updated (if needed) and later added to the iomem resource tree.
The dax driver is then notified of any soft reserves added to the iomem tree
so that it may consume them.
Hopefully that answers your question. I'll include this in the next version.
-Nathan
>
>> As part of this update two new interfaces are added for management of
>> the SOFT RESERVED resources. The release_srmem_region_adjustable()
>> routine allows for removing pieces of SOFT RESERVED resources. The
>> the merge_srmem_resources() allows drivers to merge any remaining SOFT
>> RESERVED resources into the iomem resource tree once updatea are complete.
>>
>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
>
* Re: [PATCH v2 3/4] dax: Update hmem resource/device registration
2025-01-21 18:49 ` Fontenot, Nathan
@ 2025-01-21 23:14 ` Ira Weiny
2025-01-23 16:01 ` Fontenot, Nathan
0 siblings, 1 reply; 25+ messages in thread
From: Ira Weiny @ 2025-01-21 23:14 UTC (permalink / raw)
To: Fontenot, Nathan, Ira Weiny, linux-cxl
Cc: dan.j.williams, alison.schofield, linux-mm, gourry
Fontenot, Nathan wrote:
> On 1/16/2025 4:28 PM, Ira Weiny wrote:
> > Nathan Fontenot wrote:
> >> In order to handle registering hmem devices for SOFT RESERVE reources
> > ^^^^^^^^^
> > resources
> >
> >> that are added late in boot update the hmem_register_resource(),
> >> hmem_register_device(), and walk_hmem_resources() interfaces.
> >>
> >> Remove the target_nid arg to hmem_register_resource(). The target nid
> >> value is calculated from the resource start address and not used until
> >> registering a device for the resource. Move the target nid calculation
> >> to hmem_register_device().
> >>
> >> To allow for registering hmem devices outside of the hmem dax driver
> >> probe routine save the dax hmem platform driver during probe. The
> >> hmem_register_device() interface can then drop the host and target
> >> nid parameters.
> >>
> >> There should be no functional changes.
> >>
> >> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
> >> ---
> >> drivers/acpi/numa/hmat.c | 7 ++-----
> >> drivers/dax/hmem/device.c | 14 ++++++--------
> >> drivers/dax/hmem/hmem.c | 12 ++++++++----
> >> include/linux/dax.h | 9 ++++-----
> >> 4 files changed, 20 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> >> index 1a902a02390f..23d4b3ad6d88 100644
> >> --- a/drivers/acpi/numa/hmat.c
> >> +++ b/drivers/acpi/numa/hmat.c
> >> @@ -857,11 +857,8 @@ static void hmat_register_target_devices(struct memory_target *target)
> >> if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
> >> return;
> >>
> >> - for (res = target->memregions.child; res; res = res->sibling) {
> >> - int target_nid = pxm_to_node(target->memory_pxm);
> >> -
> >> - hmem_register_resource(target_nid, res);
> >> - }
> >> + for (res = target->memregions.child; res; res = res->sibling)
> >> + hmem_register_resource(res);
> >> }
> >>
> >> static void hmat_register_target(struct memory_target *target)
> >> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> >> index f9e1a76a04a9..ae25e08a636f 100644
> >> --- a/drivers/dax/hmem/device.c
> >> +++ b/drivers/dax/hmem/device.c
> >> @@ -17,14 +17,14 @@ static struct resource hmem_active = {
> >> .flags = IORESOURCE_MEM,
> >> };
> >>
> >> -int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
> >> +int walk_hmem_resources(walk_hmem_fn fn)
> >> {
> >> struct resource *res;
> >> int rc = 0;
> >>
> >> mutex_lock(&hmem_resource_lock);
> >> for (res = hmem_active.child; res; res = res->sibling) {
> >> - rc = fn(host, (int) res->desc, res);
> >> + rc = fn(res);
> >> if (rc)
> >> break;
> >> }
> >> @@ -33,7 +33,7 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
> >> }
> >> EXPORT_SYMBOL_GPL(walk_hmem_resources);
> >>
> >> -static void __hmem_register_resource(int target_nid, struct resource *res)
> >> +static void __hmem_register_resource(struct resource *res)
> >> {
> >> struct platform_device *pdev;
> >> struct resource *new;
> >> @@ -46,8 +46,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
> >> return;
> >> }
> >>
> >> - new->desc = target_nid;
> >> -
> >> if (platform_initialized)
> >> return;
> >>
> >> @@ -64,19 +62,19 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
> >> platform_initialized = true;
> >> }
> >>
> >> -void hmem_register_resource(int target_nid, struct resource *res)
> >> +void hmem_register_resource(struct resource *res)
> >> {
> >> if (nohmem)
> >> return;
> >>
> >> mutex_lock(&hmem_resource_lock);
> >> - __hmem_register_resource(target_nid, res);
> >> + __hmem_register_resource(res);
> >> mutex_unlock(&hmem_resource_lock);
> >> }
> >>
> >> static __init int hmem_register_one(struct resource *res, void *data)
> >> {
> >> - hmem_register_resource(phys_to_target_node(res->start), res);
> >> + hmem_register_resource(res);
> >>
> >> return 0;
> >> }
> >> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> >> index 5e7c53f18491..088f4060d4d5 100644
> >> --- a/drivers/dax/hmem/hmem.c
> >> +++ b/drivers/dax/hmem/hmem.c
> >> @@ -9,6 +9,8 @@
> >> static bool region_idle;
> >> module_param_named(region_idle, region_idle, bool, 0644);
> >>
> >> +static struct platform_device *dax_hmem_pdev;
> >
> > I don't think you can assume there is only ever 1 hmem platform device.
> >
> > hmat_register_target_devices() in particular iterates multiple memory
> > regions and will create more than one.
> >
> > What am I missing?
>
> You may be correct that there can be more than one hmem platform device.
> I was making this change based on a comment from Dan that it may not matter
> which platform device these are created against.
If that is true I think there should be a big comment around this code
explaining why it is ok to have the platform device being allocated in
this call unregistered when a different platform device (host) is
released.
IOW hmem_register_device() calls two devm_*() functions using host as the
device used to trigger an action. It is not entirely clear to me why that
change is safe here.
>
> I could be wrong in that assumption. If so we'll need to figure lout how to
> determine which platform device a soft reserve resource would be created
> against when they are added later in boot from a notification by the
> srmem notification chain.
I see that it would be more difficult to track. And I'm ok if it really
does work. But just looking at the commit message and code I don't see
how this does not at least introduce a functional change.
Ira
>
> -Nathan
>
> > Ira
> >
> >> +
> >> static int dax_hmem_probe(struct platform_device *pdev)
> >> {
> >> unsigned long flags = IORESOURCE_DAX_KMEM;
> >> @@ -59,13 +61,13 @@ static void release_hmem(void *pdev)
> >> platform_device_unregister(pdev);
> >> }
> >>
> >> -static int hmem_register_device(struct device *host, int target_nid,
> >> - const struct resource *res)
> >> +static int hmem_register_device(const struct resource *res)
> >> {
> >> + struct device *host = &dax_hmem_pdev->dev;
> >> struct platform_device *pdev;
> >> struct memregion_info info;
> >> + int target_nid, rc;
> >> long id;
> >> - int rc;
> >>
> >> if (IS_ENABLED(CONFIG_CXL_REGION) &&
> >> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> >> @@ -94,6 +96,7 @@ static int hmem_register_device(struct device *host, int target_nid,
> >> return -ENOMEM;
> >> }
> >>
> >> + target_nid = phys_to_target_node(res->start);
> >> pdev->dev.numa_node = numa_map_to_online_node(target_nid);
> >> info = (struct memregion_info) {
> >> .target_node = target_nid,
> >> @@ -125,7 +128,8 @@ static int hmem_register_device(struct device *host, int target_nid,
> >>
> >> static int dax_hmem_platform_probe(struct platform_device *pdev)
> >> {
> >> - return walk_hmem_resources(&pdev->dev, hmem_register_device);
> >> + dax_hmem_pdev = pdev;
> >> + return walk_hmem_resources(hmem_register_device);
> >> }
> >>
> >> static struct platform_driver dax_hmem_platform_driver = {
> >> diff --git a/include/linux/dax.h b/include/linux/dax.h
> >> index 9d3e3327af4c..beaa4bcb515c 100644
> >> --- a/include/linux/dax.h
> >> +++ b/include/linux/dax.h
> >> @@ -276,14 +276,13 @@ static inline int dax_mem2blk_err(int err)
> >> }
> >>
> >> #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
> >> -void hmem_register_resource(int target_nid, struct resource *r);
> >> +void hmem_register_resource(struct resource *r);
> >> #else
> >> -static inline void hmem_register_resource(int target_nid, struct resource *r)
> >> +static inline void hmem_register_resource(struct resource *r)
> >> {
> >> }
> >> #endif
> >>
> >> -typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
> >> - const struct resource *res);
> >> -int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
> >> +typedef int (*walk_hmem_fn)(const struct resource *res);
> >> +int walk_hmem_resources(walk_hmem_fn fn);
> >> #endif
> >> --
> >> 2.43.0
> >>
> >>
> >
> >
>
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-16 17:42 ` [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources Nathan Fontenot
2025-01-21 8:19 ` David Hildenbrand
@ 2025-01-22 5:52 ` Fan Ni
2025-01-23 15:55 ` Fontenot, Nathan
1 sibling, 1 reply; 25+ messages in thread
From: Fan Ni @ 2025-01-22 5:52 UTC (permalink / raw)
To: Nathan Fontenot
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry
On Thu, Jan 16, 2025 at 11:42:05AM -0600, Nathan Fontenot wrote:
> Introduce the ability to manage SOFT RESERVED kernel resources prior to
> these resources being placed in the iomem resource tree. This provides
> the ability for drivers to update SOFT RESERVED resources that intersect
> with their memory resources.
>
> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
> on the soft reserve resource tree. Once boot completes all resources
> are placed on the iomem resource tree. This behavior is gated by a new
> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>
> As part of this update two new interfaces are added for management of
> the SOFT RESERVED resources. The release_srmem_region_adjustable()
> routine allows for removing pieces of SOFT RESERVED resources. The
> merge_srmem_resources() routine allows drivers to merge any remaining SOFT
> RESERVED resources into the iomem resource tree once updates are complete.
>
> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
> ---
> include/linux/ioport.h | 9 +++++
> kernel/resource.c | 79 +++++++++++++++++++++++++++++++++++++++---
> lib/Kconfig | 4 +++
> 3 files changed, 87 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 6e9fb667a1c5..2c95cf0be45e 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -249,6 +249,15 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start);
> int adjust_resource(struct resource *res, resource_size_t start,
> resource_size_t size);
> resource_size_t resource_alignment(struct resource *res);
> +
> +#ifdef CONFIG_SOFT_RESERVED_MANAGED
> +void merge_srmem_resources(void);
> +extern void release_srmem_region_adjustable(resource_size_t start,
> + resource_size_t size);
> +#else
> +static inline void merge_srmem_resources(void) { }
> +#endif
> +
> static inline resource_size_t resource_size(const struct resource *res)
> {
> return res->end - res->start + 1;
> diff --git a/kernel/resource.c b/kernel/resource.c
> index a83040fde236..9db420078a3f 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -48,6 +48,14 @@ struct resource iomem_resource = {
> };
> EXPORT_SYMBOL(iomem_resource);
>
> +static struct resource srmem_resource = {
> + .name = "Soft Reserved mem",
> + .start = 0,
> + .end = -1,
> + .flags = IORESOURCE_MEM,
> + .desc = IORES_DESC_SOFT_RESERVED,
> +};
> +
> static DEFINE_RWLOCK(resource_lock);
>
> static struct resource *next_resource(struct resource *p, bool skip_children)
> @@ -818,6 +826,19 @@ static struct resource * __insert_resource(struct resource *parent, struct resou
> {
> struct resource *first, *next;
>
> + if (IS_ENABLED(CONFIG_SOFT_RESERVED_MANAGED)) {
> + /*
> + * During boot SOFT RESERVED resources are placed on the srmem
> + * resource tree. These resources may be updated later in boot,
> + * for example see the CXL driver, prior to being merged into
> + * the iomem resource tree.
> + */
> + if (system_state < SYSTEM_RUNNING &&
> + parent == &iomem_resource &&
> + new->desc == IORES_DESC_SOFT_RESERVED)
> + parent = &srmem_resource;
> + }
> +
> for (;; parent = first) {
> first = __request_resource(parent, new);
> if (!first)
> @@ -1336,11 +1357,12 @@ void __release_region(struct resource *parent, resource_size_t start,
> }
> EXPORT_SYMBOL(__release_region);
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
If CONFIG_MEMORY_HOTREMOVE is not defined, it seems we do not have a
user for release_region_adjustable(), as
release_mem_region_adjustable() will not exist.
Fan
> /**
> - * release_mem_region_adjustable - release a previously reserved memory region
> + * release_region_adjustable - release a previously reserved memory region
> + * @parent: resource tree to release resource from
> * @start: resource start address
> * @size: resource region size
> + * @busy_check: check for IORESOURCE_BUSY
> *
> * This interface is intended for memory hot-delete. The requested region
> * is released from a currently busy memory resource. The requested region
> @@ -1356,9 +1378,11 @@ EXPORT_SYMBOL(__release_region);
> * assumes that all children remain in the lower address entry for
> * simplicity. Enhance this logic when necessary.
> */
> -void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> +static void release_region_adjustable(struct resource *parent,
> + resource_size_t start,
> + resource_size_t size,
> + bool busy_check)
> {
> - struct resource *parent = &iomem_resource;
> struct resource *new_res = NULL;
> bool alloc_nofail = false;
> struct resource **p;
> @@ -1395,7 +1419,7 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> if (!(res->flags & IORESOURCE_MEM))
> break;
>
> - if (!(res->flags & IORESOURCE_BUSY)) {
> + if (busy_check && !(res->flags & IORESOURCE_BUSY)) {
> p = &res->child;
> continue;
> }
> @@ -1445,6 +1469,51 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> write_unlock(&resource_lock);
> free_resource(new_res);
> }
> +
> +#ifdef CONFIG_SOFT_RESERVED_MANAGED
> +/**
> + * merge_srmem_resources - merge srmem resources into the iomem resource tree
> + *
> + * This is intended to allow kernel drivers that manage the SOFT RESERVED
> + * resources to merge any remaining resources into the iomem resource tree
> + * once any updates have been made.
> + */
> +void merge_srmem_resources(void)
> +{
> + struct resource *res, *next;
> + int rc;
> +
> + for (res = srmem_resource.child; res; res = next) {
> + next = next_resource(res, true);
> +
> + write_lock(&resource_lock);
> +
> + if (WARN_ON(__release_resource(res, true))) {
> + write_unlock(&resource_lock);
> + continue;
> + }
> +
> + if (WARN_ON(__insert_resource(&iomem_resource, res)))
> + __insert_resource(&srmem_resource, res);
> +
> + write_unlock(&resource_lock);
> + }
> +}
> +EXPORT_SYMBOL_GPL(merge_srmem_resources);
> +
> +void release_srmem_region_adjustable(resource_size_t start,
> + resource_size_t size)
> +{
> + release_region_adjustable(&srmem_resource, start, size, false);
> +}
> +EXPORT_SYMBOL(release_srmem_region_adjustable);
> +#endif
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
> +{
> + release_region_adjustable(&iomem_resource, start, size, true);
> +}
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
> #ifdef CONFIG_MEMORY_HOTPLUG
> diff --git a/lib/Kconfig b/lib/Kconfig
> index b38849af6f13..4f4011334051 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -777,3 +777,7 @@ config POLYNOMIAL
>
> config FIRMWARE_TABLE
> bool
> +
> +config SOFT_RESERVED_MANAGED
> + bool
> + default n
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-21 18:57 ` Fontenot, Nathan
@ 2025-01-22 6:03 ` Fan Ni
2025-01-23 15:49 ` Fontenot, Nathan
0 siblings, 1 reply; 25+ messages in thread
From: Fan Ni @ 2025-01-22 6:03 UTC (permalink / raw)
To: Fontenot, Nathan
Cc: David Hildenbrand, linux-cxl, dan.j.williams, alison.schofield,
linux-mm, gourry
On Tue, Jan 21, 2025 at 12:57:19PM -0600, Fontenot, Nathan wrote:
>
>
> On 1/21/2025 2:19 AM, David Hildenbrand wrote:
> > On 16.01.25 18:42, Nathan Fontenot wrote:
> >
> > Hi,
> >
> >> Introduce the ability to manage SOFT RESERVED kernel resources prior to
> >> these resources being placed in the iomem resource tree. This provides
> >> the ability for drivers to update SOFT RESERVED resources that intersect
> >> with their memory resources.
> >>
> >> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
> >> on the soft reserve resource tree. Once boot completes all resources
> >> are placed on the iomem resource tree. This behavior is gated by a new
> >> kernel option CONFIG_SOFT_RESERVED_MANAGED.
> >>
> >
> > I'm missing a bit of context here.
> >
> > Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
> >
> > Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
>
> That's a good question and one I should have addressed.
>
> The goal is to prevent the dax driver from creating dax devices for soft reserve
> resources prior to the soft reserve resources being updated for any intersecting
> cxl regions.
Not an expert. Can you explain a little more here?
What is the problem if we only flag the resources as "soft
reserved" in the iomem tree without creating a separate tree, and
process the "soft reserved" resources only when needed?
Fan
>
> During boot the dax hmem driver walks the iomem tree to save off a copy of all
> soft reserve resources. The dax driver then later walks this copy to create
> dax devices for the soft reserve regions. This occurs before the cxl drivers
> load, create cxl regions, and update any intersecting soft reserve resources.
>
> To prevent this the soft reserves are set aside on a separate list during boot
> so that they can be updated (if needed) and later added to the iomem resource tree.
> The dax driver is then notified of any soft reserves added to the iomem tree
> so that it may consume them.
>
> Hopefully that answers your question. I'll include this in the next version.
>
> -Nathan
>
> >
> >> As part of this update two new interfaces are added for management of
> >> the SOFT RESERVED resources. The release_srmem_region_adjustable()
>>> routine allows for removing pieces of SOFT RESERVED resources. The
>>> merge_srmem_resources() routine allows drivers to merge any remaining SOFT
>>> RESERVED resources into the iomem resource tree once updates are complete.
> >>
> >> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
> >
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-22 6:03 ` Fan Ni
@ 2025-01-23 15:49 ` Fontenot, Nathan
2025-01-27 14:40 ` David Hildenbrand
0 siblings, 1 reply; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-23 15:49 UTC (permalink / raw)
To: Fan Ni
Cc: David Hildenbrand, linux-cxl, dan.j.williams, alison.schofield,
linux-mm, gourry
On 1/22/2025 12:03 AM, Fan Ni wrote:
> On Tue, Jan 21, 2025 at 12:57:19PM -0600, Fontenot, Nathan wrote:
>>
>>
>> On 1/21/2025 2:19 AM, David Hildenbrand wrote:
>>> On 16.01.25 18:42, Nathan Fontenot wrote:
>>>
>>> Hi,
>>>
>>>> Introduce the ability to manage SOFT RESERVED kernel resources prior to
>>>> these resources being placed in the iomem resource tree. This provides
>>>> the ability for drivers to update SOFT RESERVED resources that intersect
>>>> with their memory resources.
>>>>
>>>> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
>>>> on the soft reserve resource tree. Once boot completes all resources
>>>> are placed on the iomem resource tree. This behavior is gated by a new
>>>> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>>>>
>>>
>>> I'm missing a bit of context here.
>>>
>>> Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
>>>
>>> Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
>>
>> That's a good question and one I should have addressed.
>>
>> The goal is to prevent the dax driver from creating dax devices for soft reserve
>> resources prior to the soft reserve resources being updated for any intersecting
>> cxl regions.
>
> Not an expert. Can you explain a little more here?
> What is the problem if we only flag the resources as "soft
> reserved" in the iomem tree without creating a separate tree, and
> process the "soft reserved" resources only when needed?
The issue we currently encounter is that the dax driver consumes these soft reserve
resources and creates dax devices for the soft reserve resources before the cxl driver
completes device probe and can update the soft reserve resources to remove any
intersections with cxl regions. We do not want these soft reserves consumed prior
to them being updated.
If we were to put the soft reserves on the iomem tree we would need to have the
cxl driver provide a notification that it has completed updates and others (i.e. dax)
can then go process the soft reserve resources.
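For illustration only, here is a minimal sketch of what that "CXL is done"
notification could look like, using the kernel's standard notifier-chain API.
The names soft_reserved_chain, soft_reserved_register_notifier() and
soft_reserved_updates_done() are hypothetical and not part of the posted
patches:

#include <linux/notifier.h>

static BLOCKING_NOTIFIER_HEAD(soft_reserved_chain);

/* dax side: hook a callback that consumes updated Soft Reserved ranges */
int soft_reserved_register_notifier(struct notifier_block *nb)
{
	return blocking_notifier_chain_register(&soft_reserved_chain, nb);
}

/* cxl side: fire once region creation and resource trimming are complete */
void soft_reserved_updates_done(void)
{
	blocking_notifier_call_chain(&soft_reserved_chain, 0, NULL);
}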
-Nathan
>
> Fan
>>
>> During boot the dax hmem driver walks the iomem tree to save off a copy of all
>> soft reserve resources. The dax driver then later walks this copy to create
>> dax devices for the soft reserve regions. This occurs before the cxl drivers
>> load, create cxl regions, and update any intersecting soft reserve resources.
>>
>> To prevent this the soft reserves are set aside on a separate list during boot
>> so that they can be updated (if needed) and later added to the iomem resource tree.
>> The dax driver is then notified of any soft reserves added to the iomem tree
>> so that it may consume them.
>>
>> Hopefully that answers your question. I'll include this in the next version.
>>
>> -Nathan
>>
>>>
>>>> As part of this update two new interfaces are added for management of
>>>> the SOFT RESERVED resources. The release_srmem_region_adjustable()
>>>> routine allows for removing pieces of SOFT RESERVED resources. The
>>>> merge_srmem_resources() routine allows drivers to merge any remaining SOFT
>>>> RESERVED resources into the iomem resource tree once updates are complete.
>>>>
>>>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
>>>
>>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-22 5:52 ` Fan Ni
@ 2025-01-23 15:55 ` Fontenot, Nathan
0 siblings, 0 replies; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-23 15:55 UTC (permalink / raw)
To: Fan Ni, Nathan Fontenot
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry
On 1/21/2025 11:52 PM, Fan Ni wrote:
> On Thu, Jan 16, 2025 at 11:42:05AM -0600, Nathan Fontenot wrote:
>> Introduce the ability to manage SOFT RESERVED kernel resources prior to
>> these resources being placed in the iomem resource tree. This provides
>> the ability for drivers to update SOFT RESERVED resources that intersect
>> with their memory resources.
>>
>> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
>> on the soft reserve resource tree. Once boot completes all resources
>> are placed on the iomem resource tree. This behavior is gated by a new
>> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>>
>> As part of this update two new interfaces are added for management of
>> the SOFT RESERVED resources. The release_srmem_region_adjustable()
>> routine allows for removing pieces of SOFT RESERVED resources. The
>> merge_srmem_resources() routine allows drivers to merge any remaining SOFT
>> RESERVED resources into the iomem resource tree once updates are complete.
>>
>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
>> ---
>> include/linux/ioport.h | 9 +++++
>> kernel/resource.c | 79 +++++++++++++++++++++++++++++++++++++++---
>> lib/Kconfig | 4 +++
>> 3 files changed, 87 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
>> index 6e9fb667a1c5..2c95cf0be45e 100644
>> --- a/include/linux/ioport.h
>> +++ b/include/linux/ioport.h
>> @@ -249,6 +249,15 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start);
>> int adjust_resource(struct resource *res, resource_size_t start,
>> resource_size_t size);
>> resource_size_t resource_alignment(struct resource *res);
>> +
>> +#ifdef CONFIG_SOFT_RESERVED_MANAGED
>> +void merge_srmem_resources(void);
>> +extern void release_srmem_region_adjustable(resource_size_t start,
>> + resource_size_t size);
>> +#else
>> +static inline void merge_srmem_resources(void) { }
>> +#endif
>> +
>> static inline resource_size_t resource_size(const struct resource *res)
>> {
>> return res->end - res->start + 1;
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index a83040fde236..9db420078a3f 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -48,6 +48,14 @@ struct resource iomem_resource = {
>> };
>> EXPORT_SYMBOL(iomem_resource);
>>
>> +static struct resource srmem_resource = {
>> + .name = "Soft Reserved mem",
>> + .start = 0,
>> + .end = -1,
>> + .flags = IORESOURCE_MEM,
>> + .desc = IORES_DESC_SOFT_RESERVED,
>> +};
>> +
>> static DEFINE_RWLOCK(resource_lock);
>>
>> static struct resource *next_resource(struct resource *p, bool skip_children)
>> @@ -818,6 +826,19 @@ static struct resource * __insert_resource(struct resource *parent, struct resou
>> {
>> struct resource *first, *next;
>>
>> + if (IS_ENABLED(CONFIG_SOFT_RESERVED_MANAGED)) {
>> + /*
>> + * During boot SOFT RESERVED resources are placed on the srmem
>> + * resource tree. These resources may be updated later in boot,
>> + * for example see the CXL driver, prior to being merged into
>> + * the iomem resource tree.
>> + */
>> + if (system_state < SYSTEM_RUNNING &&
>> + parent == &iomem_resource &&
>> + new->desc == IORES_DESC_SOFT_RESERVED)
>> + parent = &srmem_resource;
>> + }
>> +
>> for (;; parent = first) {
>> first = __request_resource(parent, new);
>> if (!first)
>> @@ -1336,11 +1357,12 @@ void __release_region(struct resource *parent, resource_size_t start,
>> }
>> EXPORT_SYMBOL(__release_region);
>>
>> -#ifdef CONFIG_MEMORY_HOTREMOVE
>
> If CONFIG_MEMORY_HOTREMOVE is not defined, it seems we do not have a
> user for release_region_adjustable(), as
> release_mem_region_adjustable() will not exist.
The release_region_adjustable() routine is used by release_mem_region_adjustable()
and release_srmem_region_adjustable(). We could put the following around
release_region_adjustable() to prevent it from being present when not used.
#if defined(CONFIG_MEMORY_HOTREMOVE) || defined(CONFIG_SOFT_RESERVED_MANAGED)
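Roughly, that guard would end up looking like the below (the exact placement in
kernel/resource.c is my assumption; the helper body stays as in the patch above):

#if defined(CONFIG_MEMORY_HOTREMOVE) || defined(CONFIG_SOFT_RESERVED_MANAGED)
static void release_region_adjustable(struct resource *parent,
				      resource_size_t start,
				      resource_size_t size,
				      bool busy_check)
{
	/* body unchanged from the patch above */
}
#endif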
-Nathan
>
> Fan
>> /**
>> - * release_mem_region_adjustable - release a previously reserved memory region
>> + * release_region_adjustable - release a previously reserved memory region
>> + * @parent: resource tree to release resource from
>> * @start: resource start address
>> * @size: resource region size
>> + * @busy_check: check for IORESOURCE_BUSY
>> *
>> * This interface is intended for memory hot-delete. The requested region
>> * is released from a currently busy memory resource. The requested region
>> @@ -1356,9 +1378,11 @@ EXPORT_SYMBOL(__release_region);
>> * assumes that all children remain in the lower address entry for
>> * simplicity. Enhance this logic when necessary.
>> */
>> -void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
>> +static void release_region_adjustable(struct resource *parent,
>> + resource_size_t start,
>> + resource_size_t size,
>> + bool busy_check)
>> {
>> - struct resource *parent = &iomem_resource;
>> struct resource *new_res = NULL;
>> bool alloc_nofail = false;
>> struct resource **p;
>> @@ -1395,7 +1419,7 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
>> if (!(res->flags & IORESOURCE_MEM))
>> break;
>>
>> - if (!(res->flags & IORESOURCE_BUSY)) {
>> + if (busy_check && !(res->flags & IORESOURCE_BUSY)) {
>> p = &res->child;
>> continue;
>> }
>> @@ -1445,6 +1469,51 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
>> write_unlock(&resource_lock);
>> free_resource(new_res);
>> }
>> +
>> +#ifdef CONFIG_SOFT_RESERVED_MANAGED
>> +/**
>> + * merge_srmem_resources - merge srmem resources into the iomem resource tree
>> + *
>> + * This is intended to allow kernel drivers that manage the SOFT RESERVED
>> + * resources to merge any remaining resources into the iomem resource tree
>> + * once any updates have been made.
>> + */
>> +void merge_srmem_resources(void)
>> +{
>> + struct resource *res, *next;
>> + int rc;
>> +
>> + for (res = srmem_resource.child; res; res = next) {
>> + next = next_resource(res, true);
>> +
>> + write_lock(&resource_lock);
>> +
>> + if (WARN_ON(__release_resource(res, true))) {
>> + write_unlock(&resource_lock);
>> + continue;
>> + }
>> +
>> + if (WARN_ON(__insert_resource(&iomem_resource, res)))
>> + __insert_resource(&srmem_resource, res);
>> +
>> + write_unlock(&resource_lock);
>> + }
>> +}
>> +EXPORT_SYMBOL_GPL(merge_srmem_resources);
>> +
>> +void release_srmem_region_adjustable(resource_size_t start,
>> + resource_size_t size)
>> +{
>> + release_region_adjustable(&srmem_resource, start, size, false);
>> +}
>> +EXPORT_SYMBOL(release_srmem_region_adjustable);
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
>> +{
>> + release_region_adjustable(&iomem_resource, start, size, true);
>> +}
>> #endif /* CONFIG_MEMORY_HOTREMOVE */
>>
>> #ifdef CONFIG_MEMORY_HOTPLUG
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index b38849af6f13..4f4011334051 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -777,3 +777,7 @@ config POLYNOMIAL
>>
>> config FIRMWARE_TABLE
>> bool
>> +
>> +config SOFT_RESERVED_MANAGED
>> + bool
>> + default n
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 3/4] dax: Update hmem resource/device registration
2025-01-21 23:14 ` Ira Weiny
@ 2025-01-23 16:01 ` Fontenot, Nathan
2025-01-27 18:44 ` Fontenot, Nathan
0 siblings, 1 reply; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-23 16:01 UTC (permalink / raw)
To: Ira Weiny, linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
On 1/21/2025 5:14 PM, Ira Weiny wrote:
> Fontenot, Nathan wrote:
>> On 1/16/2025 4:28 PM, Ira Weiny wrote:
>>> Nathan Fontenot wrote:
>>>> In order to handle registering hmem devices for SOFT RESERVE reources
>>> ^^^^^^^^^
>>> resources
>>>
>>>> that are added late in boot update the hmem_register_resource(),
>>>> hmem_register_device(), and walk_hmem_resources() interfaces.
>>>>
>>>> Remove the target_nid arg to hmem_register_resource(). The target nid
>>>> value is calculated from the resource start address and not used until
>>>> registering a device for the resource. Move the target nid calculation
>>>> to hmem_register_device().
>>>>
>>>> To allow for registering hmem devices outside of the hmem dax driver
>>>>> probe routine, save the dax hmem platform device during probe. The
>>>> hmem_register_device() interface can then drop the host and target
>>>> nid parameters.
>>>>
>>>> There should be no functional changes.
>>>>
>>>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
>>>> ---
>>>> drivers/acpi/numa/hmat.c | 7 ++-----
>>>> drivers/dax/hmem/device.c | 14 ++++++--------
>>>> drivers/dax/hmem/hmem.c | 12 ++++++++----
>>>> include/linux/dax.h | 9 ++++-----
>>>> 4 files changed, 20 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
>>>> index 1a902a02390f..23d4b3ad6d88 100644
>>>> --- a/drivers/acpi/numa/hmat.c
>>>> +++ b/drivers/acpi/numa/hmat.c
>>>> @@ -857,11 +857,8 @@ static void hmat_register_target_devices(struct memory_target *target)
>>>> if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
>>>> return;
>>>>
>>>> - for (res = target->memregions.child; res; res = res->sibling) {
>>>> - int target_nid = pxm_to_node(target->memory_pxm);
>>>> -
>>>> - hmem_register_resource(target_nid, res);
>>>> - }
>>>> + for (res = target->memregions.child; res; res = res->sibling)
>>>> + hmem_register_resource(res);
>>>> }
>>>>
>>>> static void hmat_register_target(struct memory_target *target)
>>>> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
>>>> index f9e1a76a04a9..ae25e08a636f 100644
>>>> --- a/drivers/dax/hmem/device.c
>>>> +++ b/drivers/dax/hmem/device.c
>>>> @@ -17,14 +17,14 @@ static struct resource hmem_active = {
>>>> .flags = IORESOURCE_MEM,
>>>> };
>>>>
>>>> -int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>>>> +int walk_hmem_resources(walk_hmem_fn fn)
>>>> {
>>>> struct resource *res;
>>>> int rc = 0;
>>>>
>>>> mutex_lock(&hmem_resource_lock);
>>>> for (res = hmem_active.child; res; res = res->sibling) {
>>>> - rc = fn(host, (int) res->desc, res);
>>>> + rc = fn(res);
>>>> if (rc)
>>>> break;
>>>> }
>>>> @@ -33,7 +33,7 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>>>> }
>>>> EXPORT_SYMBOL_GPL(walk_hmem_resources);
>>>>
>>>> -static void __hmem_register_resource(int target_nid, struct resource *res)
>>>> +static void __hmem_register_resource(struct resource *res)
>>>> {
>>>> struct platform_device *pdev;
>>>> struct resource *new;
>>>> @@ -46,8 +46,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>>>> return;
>>>> }
>>>>
>>>> - new->desc = target_nid;
>>>> -
>>>> if (platform_initialized)
>>>> return;
>>>>
>>>> @@ -64,19 +62,19 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
>>>> platform_initialized = true;
>>>> }
>>>>
>>>> -void hmem_register_resource(int target_nid, struct resource *res)
>>>> +void hmem_register_resource(struct resource *res)
>>>> {
>>>> if (nohmem)
>>>> return;
>>>>
>>>> mutex_lock(&hmem_resource_lock);
>>>> - __hmem_register_resource(target_nid, res);
>>>> + __hmem_register_resource(res);
>>>> mutex_unlock(&hmem_resource_lock);
>>>> }
>>>>
>>>> static __init int hmem_register_one(struct resource *res, void *data)
>>>> {
>>>> - hmem_register_resource(phys_to_target_node(res->start), res);
>>>> + hmem_register_resource(res);
>>>>
>>>> return 0;
>>>> }
>>>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>>>> index 5e7c53f18491..088f4060d4d5 100644
>>>> --- a/drivers/dax/hmem/hmem.c
>>>> +++ b/drivers/dax/hmem/hmem.c
>>>> @@ -9,6 +9,8 @@
>>>> static bool region_idle;
>>>> module_param_named(region_idle, region_idle, bool, 0644);
>>>>
>>>> +static struct platform_device *dax_hmem_pdev;
>>>
>>> I don't think you can assume there is only ever 1 hmem platform device.
>>>
>>> hmat_register_target_devices() in particular iterates multiple memory
>>> regions and will create more than one.
>>>
>>> What am I missing?
>>
>> You may be correct that there can be more than one hmem platform device.
>> I was making this change based on a comment from Dan that it may not matter
>> which platform device these are created against.
>
> If that is true I think there should be a big comment around this code
> explaining why it is ok to have the platform device being allocated in
> this call unregistered when a different platform device (host) is
> released.
>
> IOW hmem_register_device() calls two devm_*() functions using host as the
> device used to trigger an action. It is not entirely clear to me why that
> change is safe here.
>
>>
>>> I could be wrong in that assumption. If so we'll need to figure out how to
>> determine which platform device a soft reserve resource would be created
>> against when they are added later in boot from a notification by the
>> srmem notification chain.
>
> I see that it would be more difficult to track. And I'm ok if it really
> does work. But just looking at the commit message and code I don't see
> how this does not at least introduce a functional change.
I'm going to go back and take a look at this again. I went this direction
using the approach of having the srmem notification chain. The dax driver
then adds soft reserves outside of a probe routine, so they don't have a
platform device associated with them.
-Nathan
>
> Ira
>
>>
>> -Nathan
>>
>>> Ira
>>>
>>>> +
>>>> static int dax_hmem_probe(struct platform_device *pdev)
>>>> {
>>>> unsigned long flags = IORESOURCE_DAX_KMEM;
>>>> @@ -59,13 +61,13 @@ static void release_hmem(void *pdev)
>>>> platform_device_unregister(pdev);
>>>> }
>>>>
>>>> -static int hmem_register_device(struct device *host, int target_nid,
>>>> - const struct resource *res)
>>>> +static int hmem_register_device(const struct resource *res)
>>>> {
>>>> + struct device *host = &dax_hmem_pdev->dev;
>>>> struct platform_device *pdev;
>>>> struct memregion_info info;
>>>> + int target_nid, rc;
>>>> long id;
>>>> - int rc;
>>>>
>>>> if (IS_ENABLED(CONFIG_CXL_REGION) &&
>>>> region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>>>> @@ -94,6 +96,7 @@ static int hmem_register_device(struct device *host, int target_nid,
>>>> return -ENOMEM;
>>>> }
>>>>
>>>> + target_nid = phys_to_target_node(res->start);
>>>> pdev->dev.numa_node = numa_map_to_online_node(target_nid);
>>>> info = (struct memregion_info) {
>>>> .target_node = target_nid,
>>>> @@ -125,7 +128,8 @@ static int hmem_register_device(struct device *host, int target_nid,
>>>>
>>>> static int dax_hmem_platform_probe(struct platform_device *pdev)
>>>> {
>>>> - return walk_hmem_resources(&pdev->dev, hmem_register_device);
>>>> + dax_hmem_pdev = pdev;
>>>> + return walk_hmem_resources(hmem_register_device);
>>>> }
>>>>
>>>> static struct platform_driver dax_hmem_platform_driver = {
>>>> diff --git a/include/linux/dax.h b/include/linux/dax.h
>>>> index 9d3e3327af4c..beaa4bcb515c 100644
>>>> --- a/include/linux/dax.h
>>>> +++ b/include/linux/dax.h
>>>> @@ -276,14 +276,13 @@ static inline int dax_mem2blk_err(int err)
>>>> }
>>>>
>>>> #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
>>>> -void hmem_register_resource(int target_nid, struct resource *r);
>>>> +void hmem_register_resource(struct resource *r);
>>>> #else
>>>> -static inline void hmem_register_resource(int target_nid, struct resource *r)
>>>> +static inline void hmem_register_resource(struct resource *r)
>>>> {
>>>> }
>>>> #endif
>>>>
>>>> -typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
>>>> - const struct resource *res);
>>>> -int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
>>>> +typedef int (*walk_hmem_fn)(const struct resource *res);
>>>> +int walk_hmem_resources(walk_hmem_fn fn);
>>>> #endif
>>>> --
>>>> 2.43.0
>>>>
>>>>
>>>
>>>
>>
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-23 15:49 ` Fontenot, Nathan
@ 2025-01-27 14:40 ` David Hildenbrand
2025-01-27 18:46 ` Fontenot, Nathan
0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2025-01-27 14:40 UTC (permalink / raw)
To: Fontenot, Nathan, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry
On 23.01.25 16:49, Fontenot, Nathan wrote:
> On 1/22/2025 12:03 AM, Fan Ni wrote:
>> On Tue, Jan 21, 2025 at 12:57:19PM -0600, Fontenot, Nathan wrote:
>>>
>>>
>>> On 1/21/2025 2:19 AM, David Hildenbrand wrote:
>>>> On 16.01.25 18:42, Nathan Fontenot wrote:
>>>>
>>>> Hi,
>>>>
>>>>> Introduce the ability to manage SOFT RESERVED kernel resources prior to
>>>>> these resources being placed in the iomem resource tree. This provides
>>>>> the ability for drivers to update SOFT RESERVED resources that intersect
>>>>> with their memory resources.
>>>>>
>>>>> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
>>>>> on the soft reserve resource tree. Once boot completes all resources
>>>>> are placed on the iomem resource tree. This behavior is gated by a new
>>>>> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>>>>>
>>>>
>>>> I'm missing a bit of context here.
>>>>
>>>> Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
>>>>
>>>> Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
>>>
>>> That's a good question and one I should have addressed.
>>>
Sorry for the late reply.
>>> The goal is to prevent the dax driver from creating dax devices for soft reserve
>>> resources prior to the soft reserve resources being updated for any intersecting
>>> cxl regions.
>>
>> Not an export. Can you explain a little more here?
>> What is the problem if we only flag the resources as "soft
>> reserved" in the iomem tree without creating a separate tree, and
>> process the "soft reserved" resources only when needed?
>
> The issue we currently encounter is that the dax driver consumes these soft reserve
> resources and creates dax devices for the soft reserve resources before the cxl driver
> completes device probe and can update the soft reserve resources to remove any
> intersections with cxl regions. We do not want these soft reserves consumed prior
> to them being updated.
>
> If we were to put the soft reserves on the iomem tree we would need to have the
> cxl driver provide a notification that it has completed updates and others (i.e. dax)
> can then go process the soft reserve resources.
Would there be any blocker to that approach?
Adding them all to the resource tree and flagging them as soft-reserved,
to then have a signal that allows DAX to work on these, sounds cleaner
to me.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 3/4] dax: Update hmem resource/device registration
2025-01-23 16:01 ` Fontenot, Nathan
@ 2025-01-27 18:44 ` Fontenot, Nathan
0 siblings, 0 replies; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-27 18:44 UTC (permalink / raw)
To: Ira Weiny, linux-cxl; +Cc: dan.j.williams, alison.schofield, linux-mm, gourry
On 1/23/2025 10:01 AM, Fontenot, Nathan wrote:
> On 1/21/2025 5:14 PM, Ira Weiny wrote:
>> Fontenot, Nathan wrote:
>>> On 1/16/2025 4:28 PM, Ira Weiny wrote:
>>>> Nathan Fontenot wrote:
>>>>> In order to handle registering hmem devices for SOFT RESERVE reources
>>>> ^^^^^^^^^
>>>> resources
>>>>
>>>>> that are added late in boot update the hmem_register_resource(),
>>>>> hmem_register_device(), and walk_hmem_resources() interfaces.
>>>>>
>>>>> Remove the target_nid arg to hmem_register_resource(). The target nid
>>>>> value is calculated from the resource start address and not used until
>>>>> registering a device for the resource. Move the target nid calculation
>>>>> to hmem_register_device().
>>>>>
>>>>> To allow for registering hmem devices outside of the hmem dax driver
>>>>> probe routine, save the dax hmem platform device during probe. The
>>>>> hmem_register_device() interface can then drop the host and target
>>>>> nid parameters.
>>>>>
>>>>> There should be no functional changes.
>>>>>
>>>>> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
[ snip ]
>>>>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>>>>> index 5e7c53f18491..088f4060d4d5 100644
>>>>> --- a/drivers/dax/hmem/hmem.c
>>>>> +++ b/drivers/dax/hmem/hmem.c
>>>>> @@ -9,6 +9,8 @@
>>>>> static bool region_idle;
>>>>> module_param_named(region_idle, region_idle, bool, 0644);
>>>>>
>>>>> +static struct platform_device *dax_hmem_pdev;
>>>>
>>>> I don't think you can assume there is only ever 1 hmem platform device.
>>>>
>>>> hmat_register_target_devices() in particular iterates multiple memory
>>>> regions and will create more than one.
>>>>
>>>> What am I missing?
>>>
>>> You may be correct that there can be more than one hmem platform device.
>>> I was making this change based on a comment from Dan that it may not matter
>>> which platform device these are created against.
>>
>> If that is true I think there should be a big comment around this code
>> explaining why it is ok to have the platform device being allocated in
>> this call unregistered when a different platform device (host) is
>> released.
>>
>> IOW hmem_register_device() calls two devm_*() functions using host as the
>> device used to trigger an action. It is not entirely clear to me why that
>> change is safe here.
>>
>>>
>>> I could be wrong in that assumption. If so we'll need to figure lout how to
>>> determine which platform device a soft reserve resource would be created
>>> against when they are added later in boot from a notification by the
>>> srmem notification chain.
>>
>> I see that it would be more difficult to track. And I'm ok if it really
>> does work. But just looking at the commit message and code I don't see
>> how this does not at least introduce a functional change.
>
> I'm going to go back and take a look at this again. I went this direction
> using the approach of having the srmem notification chain. The dax driver
> then adds soft reserves outside of a probe routine, so they don't have a
> platform device associated with them.
>
Digging back into this, the dax driver only creates one platform device.
During hmem_register_resource (which is what hmat_register_target_devices()
calls for each resource) the dax hmem driver will only create a platform
device when the first resource is registered.
Each resource that is passed to hmem_register_resource() is added to a
resource tree internal to the dax/hmem driver that is eventually walked
by the dax hmem driver probe routine.
Now that I understand this better I am confident that saving a pointer
to the dax hmem platform device is safe. I'll include this information
in the commit log for the next version of the patch.
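For anyone following along, the behaviour described above boils down to a
pattern like the sketch below. This is a simplified paraphrase of
drivers/dax/hmem/device.c, not the literal code (the real driver uses
platform_device_alloc()/platform_device_add() and checks for errors):

#include <linux/ioport.h>
#include <linux/platform_device.h>

static bool platform_initialized;
static struct resource hmem_active = {
	.name = "HMEM devices",
	.start = 0,
	.end = -1,
	.flags = IORESOURCE_MEM,
};

static void __hmem_register_resource(struct resource *res)
{
	/* every registered range is recorded in the driver-internal tree */
	__request_region(&hmem_active, res->start, resource_size(res),
			 res->name, 0);

	/* only the very first registration creates the single platform device */
	if (platform_initialized)
		return;
	platform_device_register_simple("hmem_platform", 0, NULL, 0);
	platform_initialized = true;
}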
-Nathan
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-27 14:40 ` David Hildenbrand
@ 2025-01-27 18:46 ` Fontenot, Nathan
2025-03-07 5:56 ` Zhijian Li (Fujitsu)
0 siblings, 1 reply; 25+ messages in thread
From: Fontenot, Nathan @ 2025-01-27 18:46 UTC (permalink / raw)
To: David Hildenbrand, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry
On 1/27/2025 8:40 AM, David Hildenbrand wrote:
> On 23.01.25 16:49, Fontenot, Nathan wrote:
>> On 1/22/2025 12:03 AM, Fan Ni wrote:
>>> On Tue, Jan 21, 2025 at 12:57:19PM -0600, Fontenot, Nathan wrote:
>>>>
>>>>
>>>> On 1/21/2025 2:19 AM, David Hildenbrand wrote:
>>>>> On 16.01.25 18:42, Nathan Fontenot wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> Introduce the ability to manage SOFT RESERVED kernel resources prior to
>>>>>> these resources being placed in the iomem resource tree. This provides
>>>>>> the ability for drivers to update SOFT RESERVED resources that intersect
>>>>>> with their memory resources.
>>>>>>
>>>>>> During boot, any resources marked as IORES_DESC_SOFT_RESERVED are placed
>>>>>> on the soft reserve resource tree. Once boot completes all resources
>>>>>> are placed on the iomem resource tree. This behavior is gated by a new
>>>>>> kernel option CONFIG_SOFT_RESERVED_MANAGED.
>>>>>>
>>>>>
>>>>> I'm missing a bit of context here.
>>>>>
>>>>> Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
>>>>>
>>>>> Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
>>>>
>>>> That's a good question and one I should have addressed.
>>>>
>
> Sorry for the late reply.
>
>>>> The goal is to prevent the dax driver from creating dax devices for soft reserve
>>>> resources prior to the soft reserve resources being updated for any intersecting
>>>> cxl regions.
>>>
>>> Not an export. Can you explain a little more here?
>>> What is the problem if we only flag the resources as "soft
>>> reserved" in the iomem tree without creating a separate tree, and
>>> process the "soft reserved" resources only when needed?
>>
>> The issue we currently encounter is that the dax driver consumes these soft reserve
>> resources and creates dax devices for the soft reserve resources before the cxl driver
>>> completes device probe and can update the soft reserve resources to remove any
>> intersections with cxl regions. We do not want these soft reserves consumed prior
>> to them being updated.
>>
>> If we were to put the soft reserves on the iomem tree we would need to have the
>> cxl driver provide a notification that it has completed updates and others (i.e. dax)
>>> can then go process the soft reserve resources.
>
> Would there be any blocker to that approach?
>
> Adding them all to the resource tree and flagging them as soft-reserved, to then have a signal that allows DAX to work on these, sounds cleaner to me.
>
You're correct that this does sound cleaner. I've been thinking about how this could be done
and have started working on a version of the patch that takes this approach. If this works
I'll make it part of the next version of the patch set.
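If it helps, a rough sketch of what the consumer side could do under that
approach (my assumption, not the actual v3 code; dax_consume_one() and
dax_consume_soft_reserved() are made-up names):

#include <linux/ioport.h>

/* called for each remaining Soft Reserved range left in the iomem tree */
static int dax_consume_one(struct resource *res, void *arg)
{
	/* e.g. hand the range to the dax/hmem registration path */
	return 0;
}

/* run after the CXL driver signals that its updates are complete */
static void dax_consume_soft_reserved(void)
{
	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED, IORESOURCE_MEM,
			    0, -1, NULL, dax_consume_one);
}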
-Nathan
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-01-27 18:46 ` Fontenot, Nathan
@ 2025-03-07 5:56 ` Zhijian Li (Fujitsu)
2025-03-07 16:47 ` Alison Schofield
2025-03-07 23:05 ` Bowman, Terry
0 siblings, 2 replies; 25+ messages in thread
From: Zhijian Li (Fujitsu) @ 2025-03-07 5:56 UTC (permalink / raw)
To: Fontenot, Nathan, David Hildenbrand, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry
Hello Fontenot,
I hope this email finds you well.
Thank you very much for this patch. We've encountered the same issue in our product,
and your patch works.
We do hope this issue will be resolved in the upstream kernel soon.
On 28/01/2025 02:46, Fontenot, Nathan wrote:
>>>>>> I'm missing a bit of context here.
>>>>>>
>>>>>> Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
>>>>>>
>>>>>> Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
>>>>> That's a good question and one I should have addressed.
>>>>>
>> Sorry for the late reply.
>>
>>>>> The goal is to prevent the dax driver from creating dax devices for soft reserve
>>>>> resources prior to the soft reserve resources being updated for any intersecting
>>>>> cxl regions.
>>>> Not an expert. Can you explain a little more here?
>>>> What is the problem if we only flag the resources as "soft
>>>> reserved" in the iomem tree without creating a separate tree, and
>>>> process the "soft reserved" resources only when needed?
>>> The issue we currently encounter is that the dax driver consumes these soft reserve
>>> resources and creates dax devices for the soft reserve resources before the cxl driver
>>> completes device probe and can update the soft reserve resources to remove any
>>> intersections with cxl regions. We do not want these soft reserves consumed prior
>>> to them being updated.
>>>
>>> If we were to put the soft reserves on the iomem tree we would need to have the
>>> cxl driver provide a notification that it has completed updates and others (i.e. dax)
>>> can then go process the soft reserve resources.
>> Would there be any blocker to that approach?
>>
>> Adding them all to the resource tree and flagging them as soft-reserved, to then have a signal that allows DAX to work on these, sounds cleaner to me.
>>
> You're correct that this does sound cleaner. I've been thinking about how this could be done
> and have started working on a version of the patch that takes this approach. If this works
> I'll make it part of the next version of the patch set.
I noticed your earlier discussions about designing a new approach to solve this problem,
which I'm pretty excited about. Do you have any idea when you might post the updated
version? We'd love to help out with reviewing and testing.
If you run into any roadblocks and need a hand, just let us know. We'd be delighted to help.
As far as I know, this issue usually arises on Real CXL machines, but for ease of testing
and validation, we modified QEMU to simulate the intersection of 'Soft Reserved' and
the CXL region, which would aid in verification.
Thanks
Zhijian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-03-07 5:56 ` Zhijian Li (Fujitsu)
@ 2025-03-07 16:47 ` Alison Schofield
2025-03-10 5:52 ` Li Zhijian
2025-03-07 23:05 ` Bowman, Terry
1 sibling, 1 reply; 25+ messages in thread
From: Alison Schofield @ 2025-03-07 16:47 UTC (permalink / raw)
To: Zhijian Li (Fujitsu)
Cc: Fontenot, Nathan, David Hildenbrand, Fan Ni, linux-cxl,
dan.j.williams, linux-mm, gourry
On Fri, Mar 07, 2025 at 05:56:25AM +0000, Zhijian Li (Fujitsu) wrote:
> Hello Fontenot,
>
> I hope this email finds you well.
>
> Thank you very much for this patch. We've encountered the same issue in our product,
> and your patch works.
>
> We do hope this issue will be resolved in the upstream kernel soon.
>
snip
>
>
> I noticed your earlier discussions about designing a new approach to solve this problem,
> which I'm pretty excited about. Do you have any idea when you might post the updated
> version? We'd love to help out with reviewing and testing.
>
> If you run into any roadblocks and need a hand, just let us know. We'd be delighted to help.
>
> As far as I know, this issue usually arises on Real CXL machines, but for ease of testing
> and validation, we modified QEMU to simulate the intersection of 'Soft Reserved' and
> the CXL region, which would aid in verification.
Hi Zhijian,
Wow - I want that! Can you share that QEMU branch? With QEMU cmdline too.
Thanks :)
Alison
>
>
> Thanks
> Zhijian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-03-07 5:56 ` Zhijian Li (Fujitsu)
2025-03-07 16:47 ` Alison Schofield
@ 2025-03-07 23:05 ` Bowman, Terry
2025-03-10 6:00 ` Zhijian Li (Fujitsu)
2025-03-23 8:24 ` Zhijian Li (Fujitsu)
1 sibling, 2 replies; 25+ messages in thread
From: Bowman, Terry @ 2025-03-07 23:05 UTC (permalink / raw)
To: Zhijian Li (Fujitsu), Fontenot, Nathan, David Hildenbrand, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry,
PradeepVineshReddy.Kodamati
On 3/6/2025 11:56 PM, Zhijian Li (Fujitsu) wrote:
> Hello Fontenot,
>
> I hope this email finds you well.
>
> Thank you very much for this patch. We've encountered the same issue in our product,
> and your patch works.
>
> We do hope this issue will be resolved in the upstream kernel soon.
>
>
> On 28/01/2025 02:46, Fontenot, Nathan wrote:
>>>>>>> I'm missing a bit of context here.
>>>>>>>
>>>>>>> Why can't we flag these regions in the existing iomem tree, where they can be fixed up (even after boot?)?
>>>>>>>
>>>>>>> Especially, what about deferred driver loading after boot? Why is that not a concern or why can we reliably handle everything "during boot" ?
>>>>>> That's a good question and one I should have addressed.
>>>>>>
>>> Sorry for the late reply.
>>>
>>>>>> The goal is to prevent the dax driver from creating dax devices for soft reserve
>>>>>> resources prior to the soft reserve resources being updated for any intersecting
>>>>>> cxl regions.
>>>>> Not an expert. Can you explain a little more here?
>>>>> What is the problem if we only flag the resources as "soft
>>>>> reserved" in the iomem tree without creating a separate tree, and
>>>>> process the "soft reserved" resources only when needed?
>>>> The issue we currently encounter is that the dax driver consumes these soft reserve
>>>> resources and creates dax devices for the soft reserve resources before the cxl driver
>>>> completes device probe and can update the soft reserve resources to remove any
>>>> intersections with cxl regions. We do not want these soft reserves consumed prior
>>>> to them being updated.
>>>>
>>>> If we were to put the soft reserves on the iomem tree we would need to have the
>>>> cxl driver provide a notification that it has completed updates and others (i.e. dax)
>>>> can then go process the soft reserve resources.
>>> Would there be any blocker to that approach?
>>>
>>> Adding them all to the resource tree and flagging them as soft-reserved, to then have a signal that allows DAX to work on these, sounds cleaner to me.
>>>
>> You're correct that this does sound cleaner. I've been thinking about how this could be done
>> and have started working on a version of the patch that takes this approach. If this works
>> I'll make it part of the next version of the patch set.
>
>
> I noticed your earlier discussions about designing a new approach to solve this problem,
> which I'm pretty excited about. Do you have any idea when you might post the updated
> version? We'd love to help out with reviewing and testing.
>
> If you run into any roadblocks and need a hand, just let us know. We'd be delighted to help.
>
> As far as I know, this issue usually arises on Real CXL machines, but for ease of testing
> and validation, we modified QEMU to simulate the intersection of 'Soft Reserved' and
> the CXL region, which would aid in verification.
>
>
> Thanks
> Zhijian
Hi Zhijian,
Nathan asked me to finish the patchset submission in his place. He has the
v3 iteration ready and I plan to send this for review next week.
Can you share the QEMU changes for simulating the situation? Using QEMU
would be very helpful.
Thanks for offering to help test and review.
Regards,
Terry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-03-07 16:47 ` Alison Schofield
@ 2025-03-10 5:52 ` Li Zhijian
0 siblings, 0 replies; 25+ messages in thread
From: Li Zhijian @ 2025-03-10 5:52 UTC (permalink / raw)
To: Alison Schofield, terry.bowman
Cc: y-goto, nafonten, david, nifan.cxl, linux-cxl, dan.j.williams,
linux-mm, gourry
>On Fri, Mar 07, 2025 at 05:56:25AM +0000, Zhijian Li (Fujitsu) wrote:
>> Hello Fontenot,
>>
>> I hope this email finds you well.
>>
>> Thank you very much for this patch. We've encountered the same issue in our product,
>> and your patch works.
>>
>> We do hope this issue will be resolved in the upstream kernel soon.
>>
>
>snip
>
>>
>>
>> I noticed your earlier discussions about designing a new approach to solve this problem,
>> which I'm pretty excited about. Do you have any idea when you might post the updated
>> version? We'd love to help out with reviewing and testing.
>>
>> If you run into any roadblocks and need a hand, just let us know. We'd be delighted to help.
>>
>> As far as I know, this issue usually arises on Real CXL machines, but for ease of testing
>> and validation, we modified QEMU to simulate the intersection of 'Soft Reserved' and
>> the CXL region, which would aid in verification.
>
>Hi Zhijian,
>
>Wow - I want that! Can you share that QEMU branch? With QEMU cmdline too.
>
Hi Terry and Alison
Well, it's a simple hack within the QEMU and seabios projects. The
modifications are below:
Note: the QEMU modification exposes the CFMW region as Soft Reserved, rather
than the CXL memory region as a real CXL machine does.
After the guest starts, you can see in the iomem tree:
a90000000-1a8fffffff : CXL Window 0
a90000000-1a8fffffff : Soft Reserved
QEMU:
=====================
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f199a8c7ad19..484ad7e5e632 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -963,6 +963,9 @@ void pc_memory_init(PCMachineState *pcms,
memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
"cxl-fixed-memory-region", fw->size);
memory_region_add_subregion(system_memory, fw->base, &fw->mr);
+#define E820_TYPE_SOFT_RESERVED 0xefffffff
+ /* add special purpose memory */
+ e820_add_entry(fw->mr.addr, fw->mr.size , E820_TYPE_SOFT_RESERVED);
cxl_fmw_base += fw->size;
cxl_resv_end = cxl_fmw_base;
}
seabios
=====================
diff --git a/src/e820map.c b/src/e820map.c
index c761e5e98a75..9440039541a6 100644
--- a/src/e820map.c
+++ b/src/e820map.c
@@ -54,6 +54,7 @@ e820_type_name(u32 type)
case E820_ACPI: return "ACPI";
case E820_NVS: return "NVS";
case E820_UNUSABLE: return "UNUSABLE";
+ case E820_TYPE_SOFT_RESERVED: return "Soft Reserved";
default: return "UNKNOWN";
}
}
diff --git a/src/e820map.h b/src/e820map.h
index 07ce16ec213f..dd416d5ba3df 100644
--- a/src/e820map.h
+++ b/src/e820map.h
@@ -8,6 +8,7 @@
#define E820_ACPI 3
#define E820_NVS 4
#define E820_UNUSABLE 5
+#define E820_TYPE_SOFT_RESERVED 0xefffffff
struct e820entry {
u64 start;
diff --git a/src/fw/paravirt.c b/src/fw/paravirt.c
index e5d4eca0cb5a..38a9bfed04df 100644
--- a/src/fw/paravirt.c
+++ b/src/fw/paravirt.c
@@ -781,6 +781,12 @@ static int qemu_early_e820(void)
if (RamSizeOver4G < table.address + table.length - 0x100000000LL)
RamSizeOver4G = table.address + table.length - 0x100000000LL;
}
+ break;
+ case E820_TYPE_SOFT_RESERVED:
+ e820_add(table.address, table.length, table.type);
+ dprintf(1, "qemu/e820: addr 0x%016llx len 0x%016llx [Soft Reserved]\n",
+ table.address, table.length);
+ break;
}
}
====================
QEMU command line (nothing special to QEMU except specifying your own bios):
/path/to/qemu <...args...> \
-machine type=q35,cxl=on \
-bios /path/to/seabios/out/bios.bin \
-nographic \
-object memory-backend-ram,id=vmem0,share=on,size=4G \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G
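To confirm the overlap from inside the guest, something like the following
should work (run as root, since /proc/iomem hides addresses otherwise; the
exact output depends on the configured window):

cat /proc/iomem | grep -i -A1 "cxl window"

It should show the "Soft Reserved" child underneath "CXL Window 0", matching
the iomem tree shown above.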
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-03-07 23:05 ` Bowman, Terry
@ 2025-03-10 6:00 ` Zhijian Li (Fujitsu)
2025-03-23 8:24 ` Zhijian Li (Fujitsu)
1 sibling, 0 replies; 25+ messages in thread
From: Zhijian Li (Fujitsu) @ 2025-03-10 6:00 UTC (permalink / raw)
To: Bowman, Terry, Fontenot, Nathan, David Hildenbrand, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry,
PradeepVineshReddy.Kodamati
On 08/03/2025 07:05, Bowman, Terry wrote:
> Hi Zhijian,
>
> Nathan asked me to finish the patchset submission in his place. He has the
> v3 iteration ready and I plan to send this for review next week.
Glad to know this, that's cool.
>
> Can you share the QEMU changes for simulating the situation? Using QEMU
> would be very helpful.
Sure, please take a look at the modification in
https://lore.kernel.org/all/20250310055234.3704571-1-lizhijian@fujitsu.com/
Thanks
Zhijian
>
> Thanks for offering to help test and review.
>
> Regards,
> Terry
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-03-07 23:05 ` Bowman, Terry
2025-03-10 6:00 ` Zhijian Li (Fujitsu)
@ 2025-03-23 8:24 ` Zhijian Li (Fujitsu)
2025-03-23 8:33 ` Zhijian Li (Fujitsu)
1 sibling, 1 reply; 25+ messages in thread
From: Zhijian Li (Fujitsu) @ 2025-03-23 8:24 UTC (permalink / raw)
To: Bowman, Terry, Fontenot, Nathan, David Hildenbrand, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry,
PradeepVineshReddy.Kodamati, Yasunori Gotou (Fujitsu),
Alison Schofield
Hi Nathan
> Nathan asked me to finish the patchset submission in his place. He has the
> v3 iteration ready and I plan to send this for review next week.
I have a new update in QEMU which allows QEMU to automatically program the device HDM decoders
and the host-bridge HDM decoder.
With this update, after the QEMU guest boots, we can see that the kernel will automatically construct a
CXL region for the emulated memory device according to the programmed HDM decoders.
If you are interested in it, please check the following branch:
https://github.com/zhijianli88/qemu/tree/program-decoder
Note:
Only one host-bridge + one memdev topo is tested
A CXL QEMU command line example:
-device pcie-root-port,id=pci-root,slot=4,bus=pcie.0,chassis=0 \
-device pxb-cxl,id=pxb-cxl.0,bus=pcie.0,bus_nr=0x35,hdm_for_passthrough=true \
-device cxl-rp,id=cxl-rp-hb0rp0,bus=pxb-cxl.0,chassis=0,slot=0,port=0 \
-device cxl-type3,bus=cxl-rp-hb0rp0,volatile-memdev=cxl-mem0,id=cxl-type3-cxl-pmem0,program-hdm-decoder=true \
-object memory-backend-file,id=cxl-mem0,share=on,mem-path=/home/lizhijian/images/cxltest0.raw,size=4G \
-M cxl=on,cxl-fmw.0.targets.0=pxb-cxl.0,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=8k \
-bios /home/lizhijian/seabios/out/bios.bin
I would also like to express my gratitude to Goto-san (Cc'd) for his contributions to the code regarding
soft-reserved emulation, which include work on both QEMU and SeaBIOS.
Finally, it would be greatly appreciated if you could share the status of your V3 patch.
Thanks
Zhijian
> -----Original Message-----
> From: Bowman, Terry <terry.bowman@amd.com>
> Sent: Saturday, March 8, 2025 7:05 AM
> To: Li, Zhijian/李 智坚 <lizhijian@fujitsu.com>; Fontenot, Nathan
> <nafonten@amd.com>; David Hildenbrand <david@redhat.com>; Fan Ni
> <nifan.cxl@gmail.com>
> Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com;
> alison.schofield@intel.com; linux-mm@kvack.org; gourry@gourry.net;
> PradeepVineshReddy.Kodamati@amd.com
> Subject: Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT
> RESERVED resources
>
>
> Hi Zhijian,
>
> Nathan asked me to finish the patchset submission in his place. He has the
> v3 iteration ready and I plan to send this for review next week.
>
> Can you share the QEMU changes for simulating the situation? Using QEMU
> would be very helpful.
>
> Thanks for offering to help test and review.
>
> Regards,
> Terry
>
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources
2025-03-23 8:24 ` Zhijian Li (Fujitsu)
@ 2025-03-23 8:33 ` Zhijian Li (Fujitsu)
0 siblings, 0 replies; 25+ messages in thread
From: Zhijian Li (Fujitsu) @ 2025-03-23 8:33 UTC (permalink / raw)
To: Zhijian Li (Fujitsu),
Bowman, Terry, Fontenot, Nathan, David Hildenbrand, Fan Ni
Cc: linux-cxl, dan.j.williams, alison.schofield, linux-mm, gourry,
PradeepVineshReddy.Kodamati, Yasunori Gotou (Fujitsu),
Alison Schofield
Terry,
My apologies, this reply should have been addressed to you. I hope you don't mind. :)
> -----Original Message-----
> From: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
> Sent: Sunday, March 23, 2025 4:25 PM
> To: Bowman, Terry <terry.bowman@amd.com>; Fontenot, Nathan
> <nafonten@amd.com>; David Hildenbrand <david@redhat.com>; Fan Ni
> <nifan.cxl@gmail.com>
> Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com;
> alison.schofield@intel.com; linux-mm@kvack.org; gourry@gourry.net;
> PradeepVineshReddy.Kodamati@amd.com; Gotou, Yasunori/五島 康文 <y-
> goto@fujitsu.com>; Alison Schofield <alison.schofield@intel.com>
> Subject: RE: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT
> RESERVED resources
>
> Hi Nathan
>
> > Nathan asked me to finish the patchset submission in his place. He has
> > the
> > v3 iteration ready and I plan to send this for review next week.
>
> I have a new update in QEMU which allows QEMU to automatically
> program the device HDM decoders and the host-bridge HDM decoder.
>
> With this update, once the QEMU guest has booted, the kernel
> automatically constructs a CXL region for the emulated memory device
> according to the programmed HDM decoders.
>
> If you are interested, please check the following branch:
> https://github.com/zhijianli88/qemu/tree/program-decoder
>
> Note:
> Only a single host-bridge + single memdev topology has been tested.
>
> A CXL QEMU command-line example:
> -device pcie-root-port,id=pci-root,slot=4,bus=pcie.0,chassis=0 \
> -device pxb-cxl,id=pxb-cxl.0,bus=pcie.0,bus_nr=0x35,hdm_for_passthrough=true \
> -device cxl-rp,id=cxl-rp-hb0rp0,bus=pxb-cxl.0,chassis=0,slot=0,port=0 \
> -device cxl-type3,bus=cxl-rp-hb0rp0,volatile-memdev=cxl-mem0,id=cxl-type3-cxl-pmem0,program-hdm-decoder=true \
> -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/home/lizhijian/images/cxltest0.raw,size=4G \
> -M cxl=on,cxl-fmw.0.targets.0=pxb-cxl.0,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=8k \
> -bios /home/lizhijian/seabios/out/bios.bin
>
> I would also like to express my gratitude to Goto-san (Cc'd) for his
> contributions to the soft-reserved emulation code, which includes
> work on both QEMU and SeaBIOS.
>
> Finally, it would be greatly appreciated if you could share the status of your
> v3 patchset.
>
>
> Thanks
> Zhijian
>
> > -----Original Message-----
> > From: Bowman, Terry <terry.bowman@amd.com>
> > Sent: Saturday, March 8, 2025 7:05 AM
> > To: Li, Zhijian/李 智坚 <lizhijian@fujitsu.com>; Fontenot, Nathan
> > <nafonten@amd.com>; David Hildenbrand <david@redhat.com>; Fan Ni
> > <nifan.cxl@gmail.com>
> > Cc: linux-cxl@vger.kernel.org; dan.j.williams@intel.com;
> > alison.schofield@intel.com; linux-mm@kvack.org; gourry@gourry.net;
> > PradeepVineshReddy.Kodamati@amd.com
> > Subject: Re: [PATCH v2 1/4] kernel/resource: Introduce managed SOFT
> > RESERVED resources
> >
> >
> > Hi Zhijian,
> >
> > Nathan asked me to finish the patchset submission in his place. He has
> > the
> > v3 iteration ready and I plan to send this for review next week.
> >
> > Can you share the QEMU changes for simulating the situation? Using
> > QEMU would be very helpful.
> >
> > Thanks for offering to help test and review.
> >
> > Regards,
> > Terry
> >
> >
> >
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2025-03-23 8:33 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-16 17:42 [PATCH v2 0/4] Add managed SOFT RESERVE resource handling Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 1/4] kernel/resource: Introduce managed SOFT RESERVED resources Nathan Fontenot
2025-01-21 8:19 ` David Hildenbrand
2025-01-21 18:57 ` Fontenot, Nathan
2025-01-22 6:03 ` Fan Ni
2025-01-23 15:49 ` Fontenot, Nathan
2025-01-27 14:40 ` David Hildenbrand
2025-01-27 18:46 ` Fontenot, Nathan
2025-03-07 5:56 ` Zhijian Li (Fujitsu)
2025-03-07 16:47 ` Alison Schofield
2025-03-10 5:52 ` Li Zhijian
2025-03-07 23:05 ` Bowman, Terry
2025-03-10 6:00 ` Zhijian Li (Fujitsu)
2025-03-23 8:24 ` Zhijian Li (Fujitsu)
2025-03-23 8:33 ` Zhijian Li (Fujitsu)
2025-01-22 5:52 ` Fan Ni
2025-01-23 15:55 ` Fontenot, Nathan
2025-01-16 17:42 ` [PATCH v2 2/4] cxl: Update Soft Reserve resources upon region creation Nathan Fontenot
2025-01-16 17:42 ` [PATCH v2 3/4] dax: Update hmem resource/device registration Nathan Fontenot
2025-01-16 22:28 ` Ira Weiny
2025-01-21 18:49 ` Fontenot, Nathan
2025-01-21 23:14 ` Ira Weiny
2025-01-23 16:01 ` Fontenot, Nathan
2025-01-27 18:44 ` Fontenot, Nathan
2025-01-16 17:42 ` [PATCH v2 4/4] Add SOFT RESERVE resource notification chain Nathan Fontenot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox