* [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
@ 2025-03-07 6:35 Rakie Kim
2025-03-07 6:35 ` [PATCH 1/4] mm/mempolicy: Support memory hotplug " Rakie Kim
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-07 6:35 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
This patch series enhances the weighted interleave policy in mempolicy
to support memory hotplug, ensuring that newly added memory nodes are
properly recognized and integrated into the weighted interleave mechanism.
The weighted interleave policy distributes page allocations across
multiple NUMA nodes based on their performance weight, optimizing memory
bandwidth utilization. The weight values for each node are configured
through sysfs. However, the existing implementation only created sysfs
entries at initialization, leading to the following issues:
Unnecessary sysfs entries: Nodes without memory were included in sysfs
at boot.
Missing hotplug support: Nodes that became online after initialization
were not recognized, causing incomplete interleave configurations.
To resolve these issues, the first patch introduces two key changes:
Filtered sysfs creation at initialization: Only nodes that are online
and have memory are registered.
Dynamic sysfs updates for hotplugged nodes: New memory nodes are
recognized and integrated via the memory hotplug mechanism.
Subsequent patches refine this functionality:
Patch 2: Enables sysfs registration for memory nodes added via hotplug.
Patch 3: Fixes a race condition that caused duplicate sysfs entries when
registering interleave settings.
Patch 4: Ensures proper deallocation of kobjects and memory, preventing
resource leaks in mempolicy_sysfs_init().
With these changes, the weighted interleave policy can dynamically adapt
to memory hotplug events, improving NUMA memory management and system
stability.
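As a rough illustration of the two changes described above, here is a minimal userspace sketch. It is not the kernel code: `struct node_model`, `MAX_NODES`, `enum mem_event`, `add_entry()`, `init_entries()`, and `on_mem_event()` are all hypothetical names, and a boolean flag stands in for the real sysfs attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical stand-in for nr_node_ids */

/* Hypothetical events; only the online case is acted on, as in patch 1. */
enum mem_event { EV_MEM_ONLINE, EV_MEM_OFFLINE };

struct node_model {
	bool online;		/* node is online */
	bool has_memory;	/* models the N_MEMORY node state */
	bool has_entry;		/* a "sysfs entry" exists for this node */
};

/* Create the entry for one node; repeating the call is a harmless no-op. */
static int add_entry(struct node_model *nodes, int nid)
{
	assert(nodes != NULL);
	if (nid < 0 || nid >= MAX_NODES)
		return -1;
	nodes[nid].has_entry = true;
	return 0;
}

/* Change 1: at init, register only nodes that are online and have memory. */
static int init_entries(struct node_model *nodes)
{
	int count = 0;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		if (!nodes[nid].online || !nodes[nid].has_memory)
			continue;
		if (add_entry(nodes, nid) == 0)
			count++;
	}
	return count;	/* number of entries created */
}

/* Change 2: when memory comes online on a node later, add its entry then. */
static int on_mem_event(struct node_model *nodes, enum mem_event ev, int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return 0;	/* no node changed state; nothing to do */

	if (ev == EV_MEM_ONLINE) {
		nodes[nid].online = true;
		nodes[nid].has_memory = true;
		return add_entry(nodes, nid);
	}
	return 0;		/* other events are ignored in this model */
}
```

In this model a node that is online but memoryless gets no entry from `init_entries()`, and gains one only when `on_mem_event()` sees `EV_MEM_ONLINE` for it.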
Patch Summary
[PATCH 1/4] mm/mempolicy: Support memory hotplug in weighted interleave
Adds dynamic sysfs integration for memory hotplug in weighted interleave.
[PATCH 2/4] mm/mempolicy: Enable sysfs support for memory hotplug in
weighted interleave
Implements sysfs attribute registration for newly detected memory nodes.
[PATCH 3/4] mm/mempolicy: Fix duplicate node addition in sysfs for
weighted interleave
Prevents redundant sysfs entries when configuring interleave settings.
[PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
Ensures proper kobject and memory deallocation to prevent resource leaks.
These patches have been tested to ensure correct memory node detection,
proper sysfs updates, and stability improvements in memory hotplug scenarios.
mm/mempolicy.c | 172 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 122 insertions(+), 50 deletions(-)
base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
--
2.34.1
* [PATCH 1/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-07 6:35 [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Rakie Kim
@ 2025-03-07 6:35 ` Rakie Kim
2025-03-07 6:35 ` [PATCH 2/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-07 6:35 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values for each node are configured
through sysfs.
Previously, the sysfs entries for configuring weighted interleave were only
created during initialization. This approach had several limitations:
- Sysfs entries were generated for all possible nodes at boot time,
including nodes without memory, leading to unnecessary sysfs creation.
- Some memory devices transition to an online state after initialization,
but the existing implementation failed to create sysfs entries for
these dynamically added nodes. As a result, memory hotplugged nodes
were not properly recognized by the weighted interleave mechanism.
To resolve these issues, this patch introduces two key improvements:
1) At initialization, only nodes that are online and have memory are
recognized, preventing the creation of unnecessary sysfs entries.
2) Nodes that become available after initialization are dynamically
detected and integrated through the memory hotplug mechanism.
With this enhancement, the weighted interleave policy now properly supports
memory hotplug, ensuring that newly added nodes are recognized and sysfs
entries are created accordingly.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 44 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 39 insertions(+), 5 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bbaadbeeb291..385607179ebd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include <linux/uaccess.h>
+#include <linux/memory.h>
#include "internal.h"
@@ -3489,9 +3490,35 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
return 0;
}
+struct kobject *wi_kobj;
+
+static int wi_node_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ int err;
+ struct memory_notify *arg = data;
+ int nid = arg->status_change_nid;
+
+ if (nid < 0)
+ goto notifier_end;
+
+ switch(action) {
+ case MEM_ONLINE:
+ err = add_weight_node(nid, wi_kobj);
+ if (err) {
+ pr_err("failed to add sysfs [node%d]\n", nid);
+ kobject_put(wi_kobj);
+ return NOTIFY_BAD;
+ }
+ break;
+ }
+
+notifier_end:
+ return NOTIFY_OK;
+}
+
static int add_weighted_interleave_group(struct kobject *root_kobj)
{
- struct kobject *wi_kobj;
int nid, err;
wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
@@ -3505,16 +3532,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
return err;
}
- for_each_node_state(nid, N_POSSIBLE) {
+ for_each_online_node(nid) {
+ if (!node_state(nid, N_MEMORY))
+ continue;
+
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- break;
+ goto err_out;
}
}
- if (err)
- kobject_put(wi_kobj);
+
+ hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
return 0;
+
+err_out:
+ kobject_put(wi_kobj);
+ return err;
}
static void mempolicy_kobj_release(struct kobject *kobj)
--
2.34.1
* [PATCH 2/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-07 6:35 [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Rakie Kim
2025-03-07 6:35 ` [PATCH 1/4] mm/mempolicy: Support memory hotplug " Rakie Kim
@ 2025-03-07 6:35 ` Rakie Kim
2025-03-07 18:19 ` Joshua Hahn
2025-03-07 6:35 ` [PATCH 3/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Rakie Kim @ 2025-03-07 6:35 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
Previously, sysfs entries for weighted interleave were only created during
initialization, preventing dynamically added memory nodes from being recognized.
This patch enables sysfs registration for nodes added via memory hotplug,
allowing weighted interleave settings to be updated as the system memory
configuration changes.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 51 +++++++++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 19 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 385607179ebd..fc10a9a4be86 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3389,6 +3389,13 @@ struct iw_node_attr {
int nid;
};
+struct iw_node_group {
+ struct kobject *wi_kobj;
+ struct iw_node_attr **nattrs;
+};
+
+static struct iw_node_group *ngrp;
+
static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
@@ -3431,8 +3438,6 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
return count;
}
-static struct iw_node_attr **node_attrs;
-
static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
struct kobject *parent)
{
@@ -3448,7 +3453,7 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
int i;
for (i = 0; i < nr_node_ids; i++)
- sysfs_wi_node_release(node_attrs[i], wi_kobj);
+ sysfs_wi_node_release(ngrp->nattrs[i], wi_kobj);
kobject_put(wi_kobj);
}
@@ -3486,12 +3491,10 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
return -ENOMEM;
}
- node_attrs[nid] = node_attr;
+ ngrp->nattrs[nid] = node_attr;
return 0;
}
-struct kobject *wi_kobj;
-
static int wi_node_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -3504,10 +3507,10 @@ static int wi_node_notifier(struct notifier_block *nb,
switch(action) {
case MEM_ONLINE:
- err = add_weight_node(nid, wi_kobj);
+ err = add_weight_node(nid, ngrp->wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- kobject_put(wi_kobj);
+ kobject_put(ngrp->wi_kobj);
return NOTIFY_BAD;
}
break;
@@ -3521,14 +3524,14 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
{
int nid, err;
- wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj)
+ ngrp->wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+ if (!ngrp->wi_kobj)
return -ENOMEM;
- err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+ err = kobject_init_and_add(ngrp->wi_kobj, &wi_ktype, root_kobj,
"weighted_interleave");
if (err) {
- kfree(wi_kobj);
+ kfree(ngrp->wi_kobj);
return err;
}
@@ -3536,7 +3539,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
if (!node_state(nid, N_MEMORY))
continue;
- err = add_weight_node(nid, wi_kobj);
+ err = add_weight_node(nid, ngrp->wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
goto err_out;
@@ -3547,7 +3550,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
return 0;
err_out:
- kobject_put(wi_kobj);
+ kobject_put(ngrp->wi_kobj);
return err;
}
@@ -3562,7 +3565,9 @@ static void mempolicy_kobj_release(struct kobject *kobj)
mutex_unlock(&iw_table_lock);
synchronize_rcu();
kfree(old);
- kfree(node_attrs);
+
+ kfree(ngrp->nattrs);
+ kfree(ngrp);
kfree(kobj);
}
@@ -3581,13 +3586,19 @@ static int __init mempolicy_sysfs_init(void)
goto err_out;
}
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs) {
+ ngrp = kzalloc(sizeof(*ngrp), GFP_KERNEL);
+ if (!ngrp) {
err = -ENOMEM;
goto mempol_out;
}
+ ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+ GFP_KERNEL);
+ if (!ngrp->nattrs) {
+ err = -ENOMEM;
+ goto ngrp_out;
+ }
+
err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
"mempolicy");
if (err)
@@ -3602,7 +3613,9 @@ static int __init mempolicy_sysfs_init(void)
return err;
node_out:
- kfree(node_attrs);
+ kfree(ngrp->nattrs);
+ngrp_out:
+ kfree(ngrp);
mempol_out:
kfree(mempolicy_kobj);
err_out:
--
2.34.1
* [PATCH 3/4] mm/mempolicy: Fix duplicate node addition in sysfs for weighted interleave
2025-03-07 6:35 [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Rakie Kim
2025-03-07 6:35 ` [PATCH 1/4] mm/mempolicy: Support memory hotplug " Rakie Kim
2025-03-07 6:35 ` [PATCH 2/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
@ 2025-03-07 6:35 ` Rakie Kim
2025-03-07 6:35 ` [PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
2025-03-07 15:56 ` [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Gregory Price
4 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-07 6:35 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
Sysfs attributes for interleave control were registered both at initialization
and when new nodes were detected via hotplug, leading to potential duplicates.
This patch ensures that each node is registered only once, preventing conflicts
and redundant sysfs entries.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 65 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 45 insertions(+), 20 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fc10a9a4be86..2d19434c61ed 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3391,6 +3391,7 @@ struct iw_node_attr {
struct iw_node_group {
struct kobject *wi_kobj;
+ struct mutex kobj_lock;
struct iw_node_attr **nattrs;
};
@@ -3441,11 +3442,15 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
struct kobject *parent)
{
- if (!node_attr)
+ mutex_lock(&ngrp->kobj_lock);
+ if (!node_attr) {
+ mutex_unlock(&ngrp->kobj_lock);
return;
+ }
sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
kfree(node_attr->kobj_attr.attr.name);
kfree(node_attr);
+ mutex_unlock(&ngrp->kobj_lock);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
@@ -3464,35 +3469,54 @@ static const struct kobj_type wi_ktype = {
static int add_weight_node(int nid, struct kobject *wi_kobj)
{
- struct iw_node_attr *node_attr;
+ int ret = 0;
char *name;
- node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
- if (!node_attr)
- return -ENOMEM;
+ if (nid < 0 || nid >= nr_node_ids) {
+ pr_err("Invalid node id: %d\n", nid);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ mutex_lock(&ngrp->kobj_lock);
+ if (!ngrp->nattrs[nid]) {
+ ngrp->nattrs[nid] = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
+ } else {
+ mutex_unlock(&ngrp->kobj_lock);
+ pr_info("Node [%d] is already existed\n", nid);
+ goto out;
+ }
+ mutex_unlock(&ngrp->kobj_lock);
+
+ if (!ngrp->nattrs[nid]) {
+ ret = -ENOMEM;
+ goto out;
+ }
name = kasprintf(GFP_KERNEL, "node%d", nid);
if (!name) {
- kfree(node_attr);
- return -ENOMEM;
+ kfree(ngrp->nattrs[nid]);
+ ret = -ENOMEM;
+ goto out;
}
- sysfs_attr_init(&node_attr->kobj_attr.attr);
- node_attr->kobj_attr.attr.name = name;
- node_attr->kobj_attr.attr.mode = 0644;
- node_attr->kobj_attr.show = node_show;
- node_attr->kobj_attr.store = node_store;
- node_attr->nid = nid;
+ sysfs_attr_init(&ngrp->nattrs[nid]->kobj_attr.attr);
+ ngrp->nattrs[nid]->kobj_attr.attr.name = name;
+ ngrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
+ ngrp->nattrs[nid]->kobj_attr.show = node_show;
+ ngrp->nattrs[nid]->kobj_attr.store = node_store;
+ ngrp->nattrs[nid]->nid = nid;
- if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
- pr_err("failed to add attribute to weighted_interleave\n");
- return -ENOMEM;
+ ret = sysfs_create_file(wi_kobj, &ngrp->nattrs[nid]->kobj_attr.attr);
+ if (ret) {
+ kfree(ngrp->nattrs[nid]->kobj_attr.attr.name);
+ kfree(ngrp->nattrs[nid]);
+ pr_err("failed to add attribute to weighted_interleave: %d\n", ret);
+ goto out;
}
- ngrp->nattrs[nid] = node_attr;
- return 0;
+out:
+ return ret;
}
static int wi_node_notifier(struct notifier_block *nb,
@@ -3591,6 +3615,7 @@ static int __init mempolicy_sysfs_init(void)
err = -ENOMEM;
goto mempol_out;
}
+ mutex_init(&ngrp->kobj_lock);
ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
GFP_KERNEL);
--
2.34.1
* [PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-07 6:35 [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Rakie Kim
` (2 preceding siblings ...)
2025-03-07 6:35 ` [PATCH 3/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
@ 2025-03-07 6:35 ` Rakie Kim
2025-03-07 15:23 ` Gregory Price
2025-03-07 15:56 ` [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Gregory Price
4 siblings, 1 reply; 15+ messages in thread
From: Rakie Kim @ 2025-03-07 6:35 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
Improper cleanup of sysfs attributes caused kobject and memory leaks when
initialization failed or nodes were removed.
This patch ensures proper deallocation of kobjects and memory, preventing
resource leaks and improving stability.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2d19434c61ed..441a0635e81d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3604,47 +3604,47 @@ static int __init mempolicy_sysfs_init(void)
int err;
static struct kobject *mempolicy_kobj;
- mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
- if (!mempolicy_kobj) {
- err = -ENOMEM;
- goto err_out;
- }
-
ngrp = kzalloc(sizeof(*ngrp), GFP_KERNEL);
if (!ngrp) {
err = -ENOMEM;
- goto mempol_out;
+ goto err_out;
}
mutex_init(&ngrp->kobj_lock);
ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
+ GFP_KERNEL);
if (!ngrp->nattrs) {
err = -ENOMEM;
goto ngrp_out;
}
+ mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
+ if (!mempolicy_kobj) {
+ err = -ENOMEM;
+ goto nattr_out;
+ }
+
err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
"mempolicy");
- if (err)
- goto node_out;
+ if (err) {
+ kobject_put(mempolicy_kobj);
+ goto err_out;
+ }
err = add_weighted_interleave_group(mempolicy_kobj);
if (err) {
- pr_err("mempolicy sysfs structure failed to initialize\n");
kobject_put(mempolicy_kobj);
- return err;
+ goto err_out;
}
- return err;
-node_out:
+ return 0;
+
+nattr_out:
kfree(ngrp->nattrs);
ngrp_out:
kfree(ngrp);
-mempol_out:
- kfree(mempolicy_kobj);
err_out:
- pr_err("failed to add mempolicy kobject to the system\n");
+ pr_err("mempolicy sysfs structure failed to initialize\n");
return err;
}
--
2.34.1
* Re: [PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-07 6:35 ` [PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
@ 2025-03-07 15:23 ` Gregory Price
2025-03-10 8:23 ` Rakie Kim
0 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2025-03-07 15:23 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Fri, Mar 07, 2025 at 03:35:33PM +0900, Rakie Kim wrote:
> Improper cleanup of sysfs attributes caused kobject and memory leaks when
> initialization failed or nodes were removed.
>
Is this fixing something in your patch set or fixing something in the
current upstream code? If in the current patch set, roll this into the
patch that causes it.
If this is fixing something upstream, I recommend submitting this
separately to stable and rebasing on top of it.
~Gregory
* Re: [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
2025-03-07 6:35 [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Rakie Kim
` (3 preceding siblings ...)
2025-03-07 6:35 ` [PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
@ 2025-03-07 15:56 ` Gregory Price
2025-03-07 21:55 ` Gregory Price
2025-03-10 9:03 ` Rakie Kim
4 siblings, 2 replies; 15+ messages in thread
From: Gregory Price @ 2025-03-07 15:56 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Fri, Mar 07, 2025 at 03:35:29PM +0900, Rakie Kim wrote:
> Unnecessary sysfs entries: Nodes without memory were included in sysfs
> at boot.
> Missing hotplug support: Nodes that became online after initialization
> were not recognized, causing incomplete interleave configurations.
This comment is misleading. Nodes can "come online" but they are
absolutely detected during init - as nodes cannot be "hotplugged"
themselves. Resources can be added *to* nodes, but nodes themselves
cannot be added or removed.
I think what you're trying to say here is:
1) The current system creates 1 entry per possible node (explicitly)
2) Not all nodes may have memory at all times (memory can be hotplugged)
3) It would be nice to let sysfs and weighted interleave omit memoryless
nodes until those nodes had memory hotplugged into them.
> Dynamic sysfs updates for hotplugged nodes: New memory nodes are
> recognized and integrated via the memory hotplug mechanism.
> Subsequent patches refine this functionality:
>
Just going to reiterate that there's no such thing as a hotplug node
or "new nodes" - only nodes that have their attributes changed (i.e.
!N_MEMORY -> N_MEMORY). The node exists, it may just not have anything
associated with it.
Maybe semantic nits, but it matters. The nodes are present and can be
operated on before memory comes online, and that has implications for
users. Depending on how that hardware comes online, it may or may not
report its performance data prior to memory hotplug.
If it doesn't report its performance data, then hiding the node before
it hotplugs memory means a user can't pre-configure the system for when
the memory is added (which could be used immediately).
Hiding the node until hotplug also means we have hidden state. We need
to capture pre-hotplug reported performance data so that if it comes
online the auto-calculation of weights is correct. But if the user has
already switched from auto to manual mode, then a node suddenly
appearing will have an unknown state.
This is why I initially chose to just expose N_POSSIBLE entries in
sysfs, because the transition state causes hidden information - and that
felt worse than extra entries. I suppose I should add some
documentation somewhere that discusses this issue.
I think the underlying issue you're dealing with is that the system is
creating more nodes for you than it should.
~Gregory
* Re: [PATCH 2/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-07 6:35 ` [PATCH 2/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
@ 2025-03-07 18:19 ` Joshua Hahn
2025-03-10 8:28 ` Rakie Kim
0 siblings, 1 reply; 15+ messages in thread
From: Joshua Hahn @ 2025-03-07 18:19 UTC (permalink / raw)
To: Rakie Kim
Cc: gourry, akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Fri, 7 Mar 2025 15:35:31 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
Hi Rakie, thank you for your work on this patch! I think it makes a lot of
sense, given the discussion between Gregory & Honggyu in the weighted
interleave auto-tuning patch.
I have a few small nits and questions that I wanted to raise, but none that
should change the behavior at all : -)
> Previously, sysfs entries for weighted interleave were only created during
> initialization, preventing dynamically added memory nodes from being recognized.
>
> This patch enables sysfs registration for nodes added via memory hotplug,
> allowing weighted interleave settings to be updated as the system memory
> configuration changes.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
> mm/mempolicy.c | 51 +++++++++++++++++++++++++++++++-------------------
> 1 file changed, 32 insertions(+), 19 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 385607179ebd..fc10a9a4be86 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3389,6 +3389,13 @@ struct iw_node_attr {
> int nid;
> };
>
> +struct iw_node_group {
> + struct kobject *wi_kobj;
> + struct iw_node_attr **nattrs;
> +};
> +
> +static struct iw_node_group *ngrp;
> +
> static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> char *buf)
> {
> @@ -3431,8 +3438,6 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> return count;
> }
>
> -static struct iw_node_attr **node_attrs;
> -
> static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> struct kobject *parent)
> {
> @@ -3448,7 +3453,7 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
> int i;
>
> for (i = 0; i < nr_node_ids; i++)
> - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> + sysfs_wi_node_release(ngrp->nattrs[i], wi_kobj);
Nit: I think it is slightly awkward to have a global struct ngrp, and then have
its members passed individually like this. Of course there's nothing that
we can do for sysfs_wi_release's argument, but I think we can make the
arguments for sysfs_wi_node_release a bit cleaner. An idea would just be to
pass an integer (nid) instead of the nattrs[i] pointer. We also don't need
to pass wi_kobj, since it is accessible from within sysfs_wi_node_release.
Once we make both these changes, patch 3 becomes a little bit cleaner (IMHO),
where we acquire the lock for the ngrp struct, then access its contents,
and we don't have to pass two pointers as arguments when they are already
accessible via the global struct anyways.
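A minimal userspace sketch of the interface shape suggested above, where the release helper takes only a node id and reaches everything else through the global group. The names `struct node_attr`, `struct node_group`, `grp`, `node_add()`, `node_release()`, and `group_release()` are hypothetical simplifications, not the kernel definitions, and `malloc()`/`free()` stand in for the kernel allocators and sysfs calls.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NODES 4	/* hypothetical stand-in for nr_node_ids */

/* Simplified stand-in for struct iw_node_attr. */
struct node_attr {
	int nid;
};

/* Simplified stand-in for struct iw_node_group. */
struct node_group {
	struct node_attr *nattrs[MAX_NODES];
};

static struct node_group grp;	/* single global, like ngrp in the patch */

/* Add the slot for one node; a second add for the same node is a no-op. */
static int node_add(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return -1;
	if (grp.nattrs[nid])
		return 0;	/* already present */

	grp.nattrs[nid] = malloc(sizeof(*grp.nattrs[nid]));
	if (!grp.nattrs[nid])
		return -1;
	grp.nattrs[nid]->nid = nid;
	return 0;
}

/*
 * Takes only the node id: the slot is looked up through the global group,
 * so no attribute pointer or parent object has to be passed in. A kernel
 * version would presumably take the group lock around this body.
 */
static void node_release(int nid)
{
	if (nid < 0 || nid >= MAX_NODES || !grp.nattrs[nid])
		return;

	assert(grp.nattrs[nid]->nid == nid);	/* slot index matches its node */
	free(grp.nattrs[nid]);
	grp.nattrs[nid] = NULL;	/* cleared so a later add or release is safe */
}

/* Release every slot, mirroring the loop in sysfs_wi_release(). */
static void group_release(void)
{
	for (int i = 0; i < MAX_NODES; i++)
		node_release(i);
}
```

Clearing the slot on release is a choice made for this sketch; it keeps a repeated release or a later re-add from touching freed memory.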
> kobject_put(wi_kobj);
> }
>
> @@ -3486,12 +3491,10 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> return -ENOMEM;
> }
>
> - node_attrs[nid] = node_attr;
> + ngrp->nattrs[nid] = node_attr;
> return 0;
> }
>
> -struct kobject *wi_kobj;
> -
> static int wi_node_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> {
> @@ -3504,10 +3507,10 @@ static int wi_node_notifier(struct notifier_block *nb,
>
> switch(action) {
> case MEM_ONLINE:
> - err = add_weight_node(nid, wi_kobj);
> + err = add_weight_node(nid, ngrp->wi_kobj);
Same idea here, we probably don't need to pass wi_kobj into add_weight_node.
With that said, I can also see the argument for passing the struct itself,
since it saves a line of variable declaration & definition.
[...snip...]
Please let me know what you think! I hope you have a great day, thank you
again for this patch!
Joshua
Sent using hkml (https://github.com/sjp38/hackermail)
* Re: [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
2025-03-07 15:56 ` [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Gregory Price
@ 2025-03-07 21:55 ` Gregory Price
2025-03-10 9:03 ` Rakie Kim
2025-03-10 9:03 ` Rakie Kim
1 sibling, 1 reply; 15+ messages in thread
From: Gregory Price @ 2025-03-07 21:55 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Fri, Mar 07, 2025 at 10:56:04AM -0500, Gregory Price wrote:
>
> I think the underlying issue you're dealing with is that the system is
> creating more nodes for you than it should.
>
Looking into this for other reasons, I think you are right that multiple
numa nodes can exist that cover the same memory - just different
regions.
I can see why you would want to hide the nodes that don't actively have
memory online, but I still have concerns about nodes that may come and
go, and about hiding this configuration from the user until memory arrives.
An example would be a DCD device where a node could add or remove memory
at any time. If you removed the last block of memory, the node would
disappear - but the block could come back at any time. That seems
problematic, as you might want to manage that node while no memory is
present.
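To make the concern concrete, here is a small userspace model of the two visibility policies under discussion. It is only a sketch under stated assumptions: `enum visibility`, `struct node_model`, `entry_visible()`, and `set_weight()` are hypothetical names, and rejecting a weight of 0 is an assumption of this model rather than a statement about the kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies: expose every possible node, or only nodes with memory. */
enum visibility { SHOW_POSSIBLE, SHOW_WITH_MEMORY };

struct node_model {
	unsigned int blocks;	/* online memory blocks on this node */
	unsigned char weight;	/* configured weight; 0 means unset */
};

/* Does the per-node entry exist under the given policy? */
static bool entry_visible(const struct node_model *n, enum visibility v)
{
	return v == SHOW_POSSIBLE || n->blocks > 0;
}

/* A weight can be written only while the node's entry exists. */
static int set_weight(struct node_model *n, enum visibility v, unsigned char w)
{
	assert(n != NULL);
	if (!entry_visible(n, v) || w == 0)
		return -1;
	n->weight = w;
	return 0;
}
```

With `SHOW_WITH_MEMORY`, a node whose last block was removed rejects `set_weight()` until a block returns, which is the case where a user cannot manage the node ahead of time. With `SHOW_POSSIBLE`, the same write succeeds while the node is empty.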
~Gregory
* Re: [PATCH 4/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-07 15:23 ` Gregory Price
@ 2025-03-10 8:23 ` Rakie Kim
0 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-10 8:23 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Fri, 7 Mar 2025 10:23:57 -0500 Gregory Price <gourry@gourry.net> wrote:
Hi Gregory
> On Fri, Mar 07, 2025 at 03:35:33PM +0900, Rakie Kim wrote:
> > Improper cleanup of sysfs attributes caused kobject and memory leaks when
> > initialization failed or nodes were removed.
> >
>
> Is this fixing something in your patch set or fixing something in the
> current upstream code? If in the current patch set, roll this into the
> patch that causes it.
>
> If this is fixing something upstream, I recommend submitting this
> separately to stable and rebasing on top of it.
Thank you for your response regarding this patch.
This patch isn't a modification of my hotplug-related patch but rather
a fix addressing issues in the existing implementation of the commit
listed below.
I will proceed to update it as a separate patch based on the mentioned
commit.
mm/mempolicy: implement the sysfs-based weighted_interleave interface
(dce41f5ae2539d1c20ae8de4e039630aec3c3f3c)
Rakie
>
>
> ~Gregory
* Re: [PATCH 2/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-07 18:19 ` Joshua Hahn
@ 2025-03-10 8:28 ` Rakie Kim
0 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-10 8:28 UTC (permalink / raw)
To: Joshua Hahn
Cc: gourry, akpm, linux-mm, linux-kernel, linux-cxl, dan.j.williams,
ying.huang, kernel_team, honggyu.kim, yunjeong.mun, Rakie Kim
On Fri, 7 Mar 2025 10:19:59 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
Hi Joshua
Thank you for your response regarding this patch.
> On Fri, 7 Mar 2025 15:35:31 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
>
> Hi Rakie, thank you for your work on this patch! I think it makes a lot of
> sense, given the discussion between Gregory & Honggyu in the weighted
> interleave auto-tuning patch.
>
> I have a few small nits and questions that I wanted to raise, but none that
> should change the behavior at all : -)
>
> > Previously, sysfs entries for weighted interleave were only created during
> > initialization, preventing dynamically added memory nodes from being recognized.
> >
> > This patch enables sysfs registration for nodes added via memory hotplug,
> > allowing weighted interleave settings to be updated as the system memory
> > configuration changes.
> >
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> > mm/mempolicy.c | 51 +++++++++++++++++++++++++++++++-------------------
> > 1 file changed, 32 insertions(+), 19 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 385607179ebd..fc10a9a4be86 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3389,6 +3389,13 @@ struct iw_node_attr {
> > int nid;
> > };
> >
> > +struct iw_node_group {
> > + struct kobject *wi_kobj;
> > + struct iw_node_attr **nattrs;
> > +};
> > +
> > +static struct iw_node_group *ngrp;
> > +
> > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > char *buf)
> > {
> > @@ -3431,8 +3438,6 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > return count;
> > }
> >
> > -static struct iw_node_attr **node_attrs;
> > -
> > static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > struct kobject *parent)
> > {
> > @@ -3448,7 +3453,7 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
> > int i;
> >
> > for (i = 0; i < nr_node_ids; i++)
> > - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> > + sysfs_wi_node_release(ngrp->nattrs[i], wi_kobj);
>
> Nit: I think it is slightly awkward to have a global struct ngrp, and then have
> its members passed individually like this. Of course there's nothing that
> we can do for sysfs_wi_release's argument, but I think we can make the
> arguments for sysfs_wi_node_release a bit cleaner. An idea would just be to
> pass an integer (nid) instead of the nattrs[i] pointer. We also don't need
> to pass wi_kobj, since it is accessible from within sysfs_wi_node_release.
>
> Once we make both these changes, patch 3 becomes a little bit cleaner (IMHO),
> where we acquire the lock for the ngrp struct, then access its contents,
> and we don't have to pass two pointers as arguments when they are already
> accessible via the global struct anyways.
>
I completely agree with your observations about the use of
ngrp and wi_kobj.
When I was working on this patch, I aimed to minimize changes to the
existing code. As you pointed out, that left similar code being used in
inconsistent ways. I'll make the necessary adjustments in version 2 of
the patch.
> > kobject_put(wi_kobj);
> > }
> >
> > @@ -3486,12 +3491,10 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> > return -ENOMEM;
> > }
> >
> > - node_attrs[nid] = node_attr;
> > + ngrp->nattrs[nid] = node_attr;
> > return 0;
> > }
> >
> > -struct kobject *wi_kobj;
> > -
> > static int wi_node_notifier(struct notifier_block *nb,
> > unsigned long action, void *data)
> > {
> > @@ -3504,10 +3507,10 @@ static int wi_node_notifier(struct notifier_block *nb,
> >
> > switch(action) {
> > case MEM_ONLINE:
> > - err = add_weight_node(nid, wi_kobj);
> > + err = add_weight_node(nid, ngrp->wi_kobj);
>
> Same idea here, we probably don't need to pass wi_kobj into add_weight_node.
> With that said, I can also see the argument for passing the struct itself,
> since it saves a line of variable declaration & definition.
>
> [...snip...]
>
> Please let me know what you think! I hope you have a great day, thank you
> again for this patch!
> Joshua
I will also address this in version 2.
>
> Sent using hkml (https://github.com/sjp38/hackermail)
>
Rakie
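The cleanup Joshua suggests above (pass only the node id and let the helpers reach the kobject and attribute table through the global ngrp) could look roughly like the following. This is a minimal userspace sketch, not the code from version 2 of the series: the struct kobject stand-in, the nr_nodes field, wi_group_alloc(), and the plain -1 error returns are all assumptions made so the example compiles outside the kernel.

```c
#include <stdlib.h>

/* Userspace stand-ins for the kernel types (assumptions, not kernel code). */
struct kobject { int unused; };
struct iw_node_attr { int nid; };

struct iw_node_group {
	struct kobject *wi_kobj;
	struct iw_node_attr **nattrs;
	int nr_nodes;	/* hypothetical: stands in for nr_node_ids */
};

struct iw_node_group *ngrp;

/* Hypothetical setup helper: allocate the group and an empty attr table. */
int wi_group_alloc(int nr_nodes)
{
	if (nr_nodes <= 0)
		return -1;
	ngrp = calloc(1, sizeof(*ngrp));
	if (!ngrp)
		return -1;
	ngrp->nattrs = calloc((size_t)nr_nodes, sizeof(*ngrp->nattrs));
	if (!ngrp->nattrs) {
		free(ngrp);
		ngrp = NULL;
		return -1;
	}
	ngrp->nr_nodes = nr_nodes;
	return 0;
}

/* Takes only nid; the parent kobject would come from ngrp->wi_kobj. */
int add_weight_node(int nid)
{
	struct iw_node_attr *attr;

	if (!ngrp || nid < 0 || nid >= ngrp->nr_nodes)
		return -1;
	if (ngrp->nattrs[nid])
		return -1;	/* this sketch simply refuses duplicates */

	attr = calloc(1, sizeof(*attr));
	if (!attr)
		return -1;
	attr->nid = nid;
	ngrp->nattrs[nid] = attr;
	return 0;
}

/* Takes only nid instead of an attr pointer plus a parent kobject. */
void sysfs_wi_node_release(int nid)
{
	if (!ngrp || nid < 0 || nid >= ngrp->nr_nodes)
		return;
	/* The kernel version would remove the sysfs file from ngrp->wi_kobj here. */
	free(ngrp->nattrs[nid]);	/* free(NULL) is a no-op */
	ngrp->nattrs[nid] = NULL;
}
```

With this shape, a caller that later takes a lock around ngrp only has to hold it while calling add_weight_node(nid) or sysfs_wi_node_release(nid), and no pointers into the group escape through the argument list.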
* Re: [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
2025-03-07 15:56 ` [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave Gregory Price
2025-03-07 21:55 ` Gregory Price
@ 2025-03-10 9:03 ` Rakie Kim
1 sibling, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-10 9:03 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Fri, 7 Mar 2025 10:56:04 -0500 Gregory Price <gourry@gourry.net> wrote:
Hi Gregory
Thank you for your response regarding this patch.
> On Fri, Mar 07, 2025 at 03:35:29PM +0900, Rakie Kim wrote:
> > Unnecessary sysfs entries: Nodes without memory were included in sysfs
> > at boot.
> > Missing hotplug support: Nodes that became online after initialization
> > were not recognized, causing incomplete interleave configurations.
>
> This comment is misleading. Nodes can "come online" but they are
> absolutely detected during init - as nodes cannot be "hotplugged"
> themselves. Resources can be added *to* nodes, but nodes themselves
> cannot be added or removed.
>
> I think what you're trying to say here is:
>
> 1) The current system creates 1 entry per possible node (explicitly)
> 2) Not all nodes may have memory at all times (memory can be hotplugged)
> 3) It would be nice to let sysfs and weighted interleave omit memoryless
> nodes until those nodes had memory hotplugged into them.
>
> > Dynamic sysfs updates for hotplugged nodes New memory nodes are
> > recognized and integrated via the memory hotplug mechanism.
> > Subsequent patches refine this functionality:
> >
>
> Just going to reiterate that there's no such thing as a hotplug node
> or "new nodes" - only nodes that have their attributes changed (i.e.
> !N_MEMORY -> N_MEMORY). The node exists, it may just not have anything
> associated with it.
>
> Maybe semantic nits, but it matters. The nodes are present and can be
> operated on before memory comes online, and that has implications for
> users. Depending on how that hardware comes online, it may or may not
> report its performance data prior to memory hotplug.
I agree with your assessment. The existing comments, as you pointed out,
might indeed be confusing or misleading. I'll make sure this issue is
addressed in version 2.
>
> If it doesn't report its performance data, then hiding the node before
> it hotplugs memory means a user can't pre-configure the system for when
> the memory is added (which could be used immediately).
>
> Hiding the node until hotplug also means we have hidden state. We need
> to capture pre-hotplug reported performance data so that if it comes
> online the auto-calculation of weights is correct. But if the user has
> already switched from auto to manual mode, then a node suddenly
> appearing will have an unknown state.
>
> This is why I initially chose to just expose N_POSSIBLE entries in
> sysfs, because the transition state causes hidden information - and that
> felt worse than extra entries. I suppose I should add some
> documentation somewhere that discusses this issue.
>
> I think the underlying issue you're dealing with is that the system is
> creating more nodes for you than it should.
I will reply to your next comment on this issue soon.
>
> ~Gregory
Rakie
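The distinction Gregory draws above, between nodes that exist (possible) and nodes that currently have memory, can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct node_model, enum expose_policy, and both functions are hypothetical names, and the real kernel tracks these states in node masks such as N_POSSIBLE and N_MEMORY rather than in per-node booleans.

```c
#include <stdbool.h>

/* Hypothetical model of a NUMA node's state; not a kernel structure. */
struct node_model {
	bool possible;		/* node exists (models N_POSSIBLE) */
	bool has_memory;	/* node currently has memory (models N_MEMORY) */
};

/* The two exposure policies discussed in the thread. */
enum expose_policy {
	EXPOSE_POSSIBLE,	/* one sysfs entry per possible node */
	EXPOSE_MEMORY,		/* entries only for nodes that have memory */
};

/* Would this node get a weighted interleave sysfs entry under the policy? */
bool node_visible(const struct node_model *n, enum expose_policy policy)
{
	if (!n->possible)
		return false;
	if (policy == EXPOSE_POSSIBLE)
		return true;
	return n->has_memory;
}

/* Count the entries a user would see for a set of nodes. */
int count_visible(const struct node_model *nodes, int nr,
		  enum expose_policy policy)
{
	int i, count = 0;

	for (i = 0; i < nr; i++)
		if (node_visible(&nodes[i], policy))
			count++;
	return count;
}
```

In this model a memory hotplug event never creates a node; it only flips has_memory on a node that was already possible, which is the "!N_MEMORY -> N_MEMORY" transition described above.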
* Re: [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
2025-03-07 21:55 ` Gregory Price
@ 2025-03-10 9:03 ` Rakie Kim
2025-03-10 14:13 ` Gregory Price
0 siblings, 1 reply; 15+ messages in thread
From: Rakie Kim @ 2025-03-10 9:03 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Fri, 7 Mar 2025 16:55:40 -0500 Gregory Price <gourry@gourry.net> wrote:
> On Fri, Mar 07, 2025 at 10:56:04AM -0500, Gregory Price wrote:
> >
> > I think the underlying issue you're dealing with is that the system is
> > creating more nodes for you than it should.
> >
>
> Looking into this for other reasons, I think you are right that multiple
> numa nodes can exist that cover the same memory - just different
> regions.
>
I understand your concerns, and I agree that the most critical issue at the
moment is that the system is generating more nodes than necessary.
We need to conduct a more thorough analysis of this problem, but a detailed
investigation will require a significant amount of time. In this context,
these patches might offer a quick solution to address the issue.
Additionally, it's important to note that not many CXL devices have been
developed yet, and their operations are not entirely optimized. Therefore,
we might encounter behaviors from CXL devices and servers that differ from
our expectations. I hope these patches can serve as a solution for
unforeseen issues.
> I can see why you would want to hide the nodes that don't actively have
> memory online, but I still have concerns for nodes that may come and
> go and hiding this configuration from the user until memory arrives.
>
> An example would be a DCD device where a node could add or remove memory
> at any time. If you removed the last block of memory, the node would
> disappear - but the block could come back at any time. That seems
> problematic, as you might want to manage that node while no memory is
> present.
>
> ~Gregory
Of course, the patches may need further refinements. Therefore, I plan to
simplify the patches and remove any unnecessary modifications in the upcoming
version 2 update. Once it's ready, I would be very grateful if you could take
the time to review version 2 and share any further feedback you might have.
Rakie
* Re: [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
2025-03-10 9:03 ` Rakie Kim
@ 2025-03-10 14:13 ` Gregory Price
2025-03-12 8:18 ` Rakie Kim
0 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2025-03-10 14:13 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Mon, Mar 10, 2025 at 06:03:59PM +0900, Rakie Kim wrote:
> On Fri, 7 Mar 2025 16:55:40 -0500 Gregory Price <gourry@gourry.net> wrote:
> > On Fri, Mar 07, 2025 at 10:56:04AM -0500, Gregory Price wrote:
> > >
> > > I think the underlying issue you're dealing with is that the system is
> > > creating more nodes for you than it should.
> > >
> >
> > Looking into this for other reasons, I think you are right that multiple
> > numa nodes can exist that cover the same memory - just different
> > regions.
> >
>
> I understand your concerns, and I agree that the most critical issue at the
> moment is that the system is generating more nodes than necessary.
> We need to conduct a more thorough analysis of this problem, but a detailed
> investigation will require a significant amount of time. In this context,
> these patches might offer a quick solution to address the issue.
>
I dug into the expected CEDT / CFMWS behaviors and had some discussions
with Dan and Jonathan - assuming your CEDT has multiple CFMWS to cover
the same set of devices, this is the expected behavior.
https://lore.kernel.org/linux-mm/Z226PG9t-Ih7fJDL@gourry-fedora-PF4VCD3F/T/#m2780e47df7f0962a79182502afc99843bb046205
Basically your BIOS is likely creating one per device and likely one
per host bridge (to allow intra-host-bridge interleave).
This puts us in an awkward state, and I need some time to consider
whether we should expose N_POSSIBLE nodes or N_MEMORY nodes.
Probably it makes sense to expose N_MEMORY nodes and allow for hidden
state, as the annoying corner condition of a DCD coming and going
most likely means a user wouldn't be using weighted interleave anyway.
So if you can confirm what your CEDT says compared to the notes above, I
think we can move forward with this.
~Gregory
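The "hidden state" that comes with exposing only N_MEMORY nodes can be sketched as follows. This shows one possible semantic, not what the series implements: the weight table covers every node slot, the sysfs entry appears and disappears with memory, and a weight set in manual mode survives a memory -> no memory -> memory cycle. All names here (struct wi_state, WI_MAX_NODES, WI_DEFAULT_WEIGHT, and the wi_* functions) are hypothetical.

```c
#include <stdbool.h>

#define WI_MAX_NODES		8	/* hypothetical table size */
#define WI_DEFAULT_WEIGHT	1	/* hypothetical default weight */

struct wi_state {
	unsigned char weight[WI_MAX_NODES];	/* kept for every node slot */
	bool visible[WI_MAX_NODES];		/* sysfs entry currently present */
};

static bool wi_nid_valid(int nid)
{
	return nid >= 0 && nid < WI_MAX_NODES;
}

/* Start with default weights and no visible entries. */
void wi_init(struct wi_state *s)
{
	int i;

	for (i = 0; i < WI_MAX_NODES; i++) {
		s->weight[i] = WI_DEFAULT_WEIGHT;
		s->visible[i] = false;
	}
}

/* Memory arrived on the node: expose its entry, keep the stored weight. */
void wi_mem_online(struct wi_state *s, int nid)
{
	if (wi_nid_valid(nid))
		s->visible[nid] = true;
}

/* Last memory block left: hide the entry, but do not reset the weight. */
void wi_mem_offline(struct wi_state *s, int nid)
{
	if (wi_nid_valid(nid))
		s->visible[nid] = false;
}

/* Models a sysfs write: only possible while the entry is visible. */
int wi_set_weight(struct wi_state *s, int nid, unsigned char w)
{
	if (!wi_nid_valid(nid) || !s->visible[nid])
		return -1;
	s->weight[nid] = w;
	return 0;
}
```

The trade-off raised in the thread shows up directly in this model: while a node is hidden, wi_set_weight() fails, so a user cannot pre-configure the node, yet its last weight is still stored and takes effect again as soon as memory returns.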
* Re: [PATCH 0/4] mm/mempolicy: Add memory hotplug support in weighted interleave
2025-03-10 14:13 ` Gregory Price
@ 2025-03-12 8:18 ` Rakie Kim
0 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-03-12 8:18 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Mon, 10 Mar 2025 10:13:58 -0400 Gregory Price <gourry@gourry.net> wrote:
Hi Gregory,
I have updated version 2 of the patch series, incorporating the feedback from
you and Joshua.
However, this version does not yet include updates to the commit messages
regarding the points you previously mentioned.
Your detailed explanations have been incredibly valuable in helping us analyze
the system, and I sincerely appreciate your insights.
> 2) We need to clearly define what the weight of a node will be when
> in manual mode and a node goes (memory -> no memory -> memory)
Additionally, I will soon provide an updated document addressing this and
other points you raised in your emails.
Thank you again for your guidance and support.
Rakie
> On Mon, Mar 10, 2025 at 06:03:59PM +0900, Rakie Kim wrote:
> > On Fri, 7 Mar 2025 16:55:40 -0500 Gregory Price <gourry@gourry.net> wrote:
> > > On Fri, Mar 07, 2025 at 10:56:04AM -0500, Gregory Price wrote:
> > > >
> > > > I think the underlying issue you're dealing with is that the system is
> > > > creating more nodes for you than it should.
> > > >
> > >
> > > Looking into this for other reasons, I think you are right that multiple
> > > numa nodes can exist that cover the same memory - just different
> > > regions.
> > >
> >
> > I understand your concerns, and I agree that the most critical issue at the
> > moment is that the system is generating more nodes than necessary.
> > We need to conduct a more thorough analysis of this problem, but a detailed
> > investigation will require a significant amount of time. In this context,
> > these patches might offer a quick solution to address the issue.
> >
>
> I dug into the expected CEDT / CFMWS behaviors and had some discussions
> with Dan and Jonathan - assuming your CEDT has multiple CFMWS to cover
> the same set of devices, this is the expected behavior.
>
> https://lore.kernel.org/linux-mm/Z226PG9t-Ih7fJDL@gourry-fedora-PF4VCD3F/T/#m2780e47df7f0962a79182502afc99843bb046205
>
> Basically your BIOS is likely creating one per device and likely one
> per host bridge (to allow intra-host-bridge interleave).
>
> This puts us in an awkward state, and I need some time to consider
> whether we should expose N_POSSIBLE nodes or N_MEMORY nodes.
>
> Probably it makes sense to expose N_MEMORY nodes and allow for hidden
> state, as the annoying corner condition of a DCD coming and going
> most likely means a user wouldn't be using weighted interleave anyway.
>
> So if you can confirm what your CEDT says compared to the notes above, I
> think we can move forward with this.
>
> ~Gregory