[PATCH v3 0/3] Enhance sysfs handling for memory hotplug in weighted interleave

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/3] Enhance sysfs handling for memory hotplug in weighted interleave
@ 2025-03-20  4:17 Rakie Kim
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-20  4:17 UTC (permalink / raw)
  To: gourry
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim

The following patch series enhances the weighted interleave policy in the
memory management subsystem by improving sysfs handling, fixing memory leaks,
and introducing dynamic sysfs updates for memory hotplug support.

### Background
The weighted interleave policy distributes memory allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values are configured through sysfs.

Previously, sysfs entries for weighted interleave were managed statically
at initialization. This led to several issues:
- Memory Leaks: Improper `kobject` deallocation caused memory leaks
  when initialization failed or when nodes were removed.
- Lack of Dynamic Updates: Sysfs attributes were created only during
  initialization, preventing nodes added at runtime from being recognized.
- Handling of Unusable Nodes: Sysfs entries were generated for all
  possible nodes (`N_POSSIBLE`), including memoryless or unavailable nodes,
  leading to unnecessary sysfs attributes and misconfiguration issues.

### Patch Overview
1. [PATCH 1/3] Fix memory leaks in weighted interleave sysfs
   - Ensures proper cleanup of `kobject` allocations.
   - Replaces unnecessary `kfree()` calls with `kobject_put()`, preventing
     memory leaks and improving system stability.

2. [PATCH 2/3] Enable dynamic updates for weighted interleave sysfs
   - Restructures sysfs handling to allow runtime updates.
   - The sysfs attributes are now globally accessible, enabling external
     modules to manage interleave settings dynamically.

3. [PATCH 3/3] Support memory hotplug in weighted interleave
   - Modifies sysfs creation logic to restrict entries to nodes that are
     online and have memory, excluding unusable nodes.
   - Introduces a memory hotplug mechanism to dynamically add and remove
     sysfs attributes when nodes transition into or out of the `N_MEMORY` set.
   - Ensures that sysfs attributes are properly removed when nodes go offline,
     preventing stale or redundant entries from persisting.

These patches have been tested under CXL-based memory configurations,
including hotplug scenarios, to ensure proper behavior and stability.

 mm/mempolicy.c | 191 +++++++++++++++++++++++++++++++------------------
 1 file changed, 123 insertions(+), 68 deletions(-)


base-commit: 4701f33a10702d5fc577c32434eb62adde0a1ae1
-- 
2.34.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20  4:17 [PATCH v3 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
@ 2025-03-20  4:17 ` Rakie Kim
  2025-03-20  5:40   ` Rakie Kim
                     ` (3 more replies)
  2025-03-20  4:17 ` [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave Rakie Kim
  2025-03-20  4:17 ` [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in " Rakie Kim
  2 siblings, 4 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-20  4:17 UTC (permalink / raw)
  To: gourry
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim

Memory leaks occurred when removing sysfs attributes for weighted
interleave. Improper kobject deallocation led to unreleased memory
when initialization failed or when nodes were removed.

This patch resolves the issue by replacing unnecessary `kfree()`
calls with `kobject_put()`, ensuring proper cleanup and preventing
memory leaks.

By correctly using `kobject_put()`, the release function now
properly deallocates memory without causing resource leaks,
thereby improving system stability.

Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
 mm/mempolicy.c | 61 +++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bbaadbeeb291..5950d5d5b85e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
 
 	for (i = 0; i < nr_node_ids; i++)
 		sysfs_wi_node_release(node_attrs[i], wi_kobj);
-	kobject_put(wi_kobj);
+
+	kfree(node_attrs);
+	kfree(wi_kobj);
 }
 
 static const struct kobj_type wi_ktype = {
@@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
 	struct kobject *wi_kobj;
 	int nid, err;
 
-	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
-	if (!wi_kobj)
+	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+			     GFP_KERNEL);
+	if (!node_attrs)
 		return -ENOMEM;
 
+	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+	if (!wi_kobj) {
+		err = -ENOMEM;
+		goto node_out;
+	}
+
 	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
 				   "weighted_interleave");
 	if (err) {
-		kfree(wi_kobj);
-		return err;
+		kobject_put(wi_kobj);
+		goto err_out;
 	}
 
 	for_each_node_state(nid, N_POSSIBLE) {
@@ -3512,9 +3521,17 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
 			break;
 		}
 	}
-	if (err)
+	if (err) {
 		kobject_put(wi_kobj);
+		goto err_out;
+	}
+
 	return 0;
+
+node_out:
+	kfree(node_attrs);
+err_out:
+	return err;
 }
 
 static void mempolicy_kobj_release(struct kobject *kobj)
@@ -3528,7 +3545,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
 	mutex_unlock(&iw_table_lock);
 	synchronize_rcu();
 	kfree(old);
-	kfree(node_attrs);
 	kfree(kobj);
 }
 
@@ -3542,37 +3558,22 @@ static int __init mempolicy_sysfs_init(void)
 	static struct kobject *mempolicy_kobj;
 
 	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
-	if (!mempolicy_kobj) {
-		err = -ENOMEM;
-		goto err_out;
-	}
-
-	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
-			     GFP_KERNEL);
-	if (!node_attrs) {
-		err = -ENOMEM;
-		goto mempol_out;
-	}
+	if (!mempolicy_kobj)
+		return -ENOMEM;
 
 	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
 				   "mempolicy");
 	if (err)
-		goto node_out;
+		goto err_out;
 
 	err = add_weighted_interleave_group(mempolicy_kobj);
-	if (err) {
-		pr_err("mempolicy sysfs structure failed to initialize\n");
-		kobject_put(mempolicy_kobj);
-		return err;
-	}
+	if (err)
+		goto err_out;
+
+	return 0;
 
-	return err;
-node_out:
-	kfree(node_attrs);
-mempol_out:
-	kfree(mempolicy_kobj);
 err_out:
-	pr_err("failed to add mempolicy kobject to the system\n");
+	kobject_put(mempolicy_kobj);
 	return err;
 }
 

base-commit: 4701f33a10702d5fc577c32434eb62adde0a1ae1
-- 
2.34.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave
  2025-03-20  4:17 [PATCH v3 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-03-20  4:17 ` Rakie Kim
  2025-03-21 14:09   ` Gregory Price
  2025-04-02 16:33   ` Dan Williams
  2025-03-20  4:17 ` [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in " Rakie Kim
  2 siblings, 2 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-20  4:17 UTC (permalink / raw)
  To: gourry
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim

Previously, the weighted interleave sysfs structure was statically
managed, preventing dynamic updates when nodes were added or removed.

This patch restructures the weighted interleave sysfs to support
dynamic insertion and deletion. The sysfs that was part of
the 'weighted_interleave_group' is now globally accessible,
allowing external access to that sysfs.

With this change, sysfs management for weighted interleave is
more flexible, supporting hotplug events and runtime updates
more effectively.

Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
 mm/mempolicy.c | 70 ++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 40 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5950d5d5b85e..6c8843114afd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3388,6 +3388,13 @@ struct iw_node_attr {
 	int nid;
 };
 
+struct sysfs_wi_group {
+	struct kobject wi_kobj;
+	struct iw_node_attr *nattrs[];
+};
+
+static struct sysfs_wi_group *sgrp;
+
 static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
 			 char *buf)
 {
@@ -3430,27 +3437,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
 	return count;
 }
 
-static struct iw_node_attr **node_attrs;
-
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
-				  struct kobject *parent)
+static void sysfs_wi_node_release(int nid)
 {
-	if (!node_attr)
+	if (!sgrp->nattrs[nid])
 		return;
-	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
-	kfree(node_attr->kobj_attr.attr.name);
-	kfree(node_attr);
+
+	sysfs_remove_file(&sgrp->wi_kobj, &sgrp->nattrs[nid]->kobj_attr.attr);
+	kfree(sgrp->nattrs[nid]->kobj_attr.attr.name);
+	kfree(sgrp->nattrs[nid]);
 }
 
 static void sysfs_wi_release(struct kobject *wi_kobj)
 {
-	int i;
-
-	for (i = 0; i < nr_node_ids; i++)
-		sysfs_wi_node_release(node_attrs[i], wi_kobj);
+	int nid;
 
-	kfree(node_attrs);
-	kfree(wi_kobj);
+	for (nid = 0; nid < nr_node_ids; nid++)
+		sysfs_wi_node_release(nid);
+	kfree(sgrp);
 }
 
 static const struct kobj_type wi_ktype = {
@@ -3458,7 +3461,7 @@ static const struct kobj_type wi_ktype = {
 	.release = sysfs_wi_release,
 };
 
-static int add_weight_node(int nid, struct kobject *wi_kobj)
+static int sysfs_wi_node_add(int nid)
 {
 	struct iw_node_attr *node_attr;
 	char *name;
@@ -3480,57 +3483,44 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
 	node_attr->kobj_attr.store = node_store;
 	node_attr->nid = nid;
 
-	if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
+	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
 		kfree(node_attr->kobj_attr.attr.name);
 		kfree(node_attr);
 		pr_err("failed to add attribute to weighted_interleave\n");
 		return -ENOMEM;
 	}
 
-	node_attrs[nid] = node_attr;
+	sgrp->nattrs[nid] = node_attr;
 	return 0;
 }
 
-static int add_weighted_interleave_group(struct kobject *root_kobj)
+static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 {
-	struct kobject *wi_kobj;
 	int nid, err;
 
-	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
-			     GFP_KERNEL);
-	if (!node_attrs)
+	sgrp = kzalloc(sizeof(struct sysfs_wi_group) + 			\
+		       nr_node_ids * sizeof(struct iw_node_attr *),	\
+		       GFP_KERNEL);
+	if (!sgrp)
 		return -ENOMEM;
 
-	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
-	if (!wi_kobj) {
-		err = -ENOMEM;
-		goto node_out;
-	}
-
-	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+	err = kobject_init_and_add(&sgrp->wi_kobj, &wi_ktype, mempolicy_kobj,
 				   "weighted_interleave");
-	if (err) {
-		kobject_put(wi_kobj);
+	if (err)
 		goto err_out;
-	}
 
 	for_each_node_state(nid, N_POSSIBLE) {
-		err = add_weight_node(nid, wi_kobj);
+		err = sysfs_wi_node_add(nid);
 		if (err) {
 			pr_err("failed to add sysfs [node%d]\n", nid);
-			break;
+			goto err_out;
 		}
 	}
-	if (err) {
-		kobject_put(wi_kobj);
-		goto err_out;
-	}
 
 	return 0;
 
-node_out:
-	kfree(node_attrs);
 err_out:
+	kobject_put(&sgrp->wi_kobj);
 	return err;
 }
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-03-20  4:17 [PATCH v3 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
  2025-03-20  4:17 ` [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave Rakie Kim
@ 2025-03-20  4:17 ` Rakie Kim
  2025-03-21 14:24   ` Gregory Price
  2 siblings, 1 reply; 25+ messages in thread
From: Rakie Kim @ 2025-03-20  4:17 UTC (permalink / raw)
  To: gourry
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim

The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby improving memory
bandwidth utilization. The weight values for each node are configured
through sysfs.

Previously, sysfs entries for configuring weighted interleave were created
for all possible nodes (N_POSSIBLE) at initialization, including nodes that
might not have memory. However, not all nodes in N_POSSIBLE are usable at
runtime, as some may remain memoryless or offline.
This led to sysfs entries being created for unusable nodes, causing
potential misconfiguration issues.

To address this issue, this patch modifies the sysfs creation logic to:
1) Limit sysfs entries to nodes that are online and have memory, reducing
   the creation of sysfs attributes for unusable nodes.
2) Support memory hotplug by dynamically adding and removing sysfs entries
   based on whether a node transitions into or out of the N_MEMORY state.

Additionally, the patch ensures that sysfs attributes are properly managed
when nodes go offline, preventing stale or redundant entries from persisting
in the system.

By making these changes, the weighted interleave policy now manages its
sysfs entries more efficiently, ensuring that only relevant nodes are
considered for interleaving, and dynamically adapting to memory hotplug
events.

Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
 mm/mempolicy.c | 108 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 86 insertions(+), 22 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 6c8843114afd..91cdc1d9d43e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <linux/uaccess.h>
+#include <linux/memory.h>
 
 #include "internal.h"
 
@@ -3390,6 +3391,7 @@ struct iw_node_attr {
 
 struct sysfs_wi_group {
 	struct kobject wi_kobj;
+	struct mutex kobj_lock;
 	struct iw_node_attr *nattrs[];
 };
 
@@ -3439,12 +3441,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 static void sysfs_wi_node_release(int nid)
 {
-	if (!sgrp->nattrs[nid])
+	struct iw_node_attr *attr;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return;
+
+	mutex_lock(&sgrp->kobj_lock);
+	attr = sgrp->nattrs[nid];
+	if (!attr) {
+		mutex_unlock(&sgrp->kobj_lock);
 		return;
+	}
+
+	sgrp->nattrs[nid] = NULL;
+	mutex_unlock(&sgrp->kobj_lock);
 
-	sysfs_remove_file(&sgrp->wi_kobj, &sgrp->nattrs[nid]->kobj_attr.attr);
-	kfree(sgrp->nattrs[nid]->kobj_attr.attr.name);
-	kfree(sgrp->nattrs[nid]);
+	sysfs_remove_file(&sgrp->wi_kobj, &attr->kobj_attr.attr);
+	kfree(attr->kobj_attr.attr.name);
+	kfree(attr);
 }
 
 static void sysfs_wi_release(struct kobject *wi_kobj)
@@ -3463,35 +3477,80 @@ static const struct kobj_type wi_ktype = {
 
 static int sysfs_wi_node_add(int nid)
 {
-	struct iw_node_attr *node_attr;
+	int ret = 0;
 	char *name;
+	struct iw_node_attr *new_attr = NULL;
+
+	if (nid < 0 || nid >= nr_node_ids) {
+		pr_err("Invalid node id: %d\n", nid);
+		return -EINVAL;
+	}
 
-	node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
-	if (!node_attr)
+	new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
+	if (!new_attr)
 		return -ENOMEM;
 
 	name = kasprintf(GFP_KERNEL, "node%d", nid);
 	if (!name) {
-		kfree(node_attr);
+		kfree(new_attr);
 		return -ENOMEM;
 	}
 
-	sysfs_attr_init(&node_attr->kobj_attr.attr);
-	node_attr->kobj_attr.attr.name = name;
-	node_attr->kobj_attr.attr.mode = 0644;
-	node_attr->kobj_attr.show = node_show;
-	node_attr->kobj_attr.store = node_store;
-	node_attr->nid = nid;
+	mutex_lock(&sgrp->kobj_lock);
+	if (sgrp->nattrs[nid]) {
+		mutex_unlock(&sgrp->kobj_lock);
+		pr_info("Node [%d] already exists\n", nid);
+		kfree(new_attr);
+		kfree(name);
+		return 0;
+	}
 
-	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
-		kfree(node_attr->kobj_attr.attr.name);
-		kfree(node_attr);
-		pr_err("failed to add attribute to weighted_interleave\n");
-		return -ENOMEM;
+	sgrp->nattrs[nid] = new_attr;
+	mutex_unlock(&sgrp->kobj_lock);
+
+	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
+	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
+	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
+	sgrp->nattrs[nid]->kobj_attr.show = node_show;
+	sgrp->nattrs[nid]->kobj_attr.store = node_store;
+	sgrp->nattrs[nid]->nid = nid;
+
+	ret = sysfs_create_file(&sgrp->wi_kobj, &sgrp->nattrs[nid]->kobj_attr.attr);
+	if (ret) {
+		kfree(sgrp->nattrs[nid]->kobj_attr.attr.name);
+		kfree(sgrp->nattrs[nid]);
+		sgrp->nattrs[nid] = NULL;
+		pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
 	}
 
-	sgrp->nattrs[nid] = node_attr;
-	return 0;
+	return ret;
+}
+
+static int wi_node_notifier(struct notifier_block *nb,
+			       unsigned long action, void *data)
+{
+	int err;
+	struct memory_notify *arg = data;
+	int nid = arg->status_change_nid;
+
+	if (nid < 0)
+		goto notifier_end;
+
+	switch(action) {
+	case MEM_ONLINE:
+		err = sysfs_wi_node_add(nid);
+		if (err) {
+			pr_err("failed to add sysfs [node%d]\n", nid);
+			return NOTIFY_BAD;
+		}
+		break;
+	case MEM_OFFLINE:
+		sysfs_wi_node_release(nid);
+		break;
+	}
+
+notifier_end:
+	return NOTIFY_OK;
 }
 
 static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
@@ -3503,13 +3562,17 @@ static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 		       GFP_KERNEL);
 	if (!sgrp)
 		return -ENOMEM;
+	mutex_init(&sgrp->kobj_lock);
 
 	err = kobject_init_and_add(&sgrp->wi_kobj, &wi_ktype, mempolicy_kobj,
 				   "weighted_interleave");
 	if (err)
 		goto err_out;
 
-	for_each_node_state(nid, N_POSSIBLE) {
+	for_each_online_node(nid) {
+		if (!node_state(nid, N_MEMORY))
+			continue;
+
 		err = sysfs_wi_node_add(nid);
 		if (err) {
 			pr_err("failed to add sysfs [node%d]\n", nid);
@@ -3517,6 +3580,7 @@ static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 		}
 	}
 
+	hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
 	return 0;
 
 err_out:
-- 
2.34.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-03-20  5:40   ` Rakie Kim
  2025-03-20 16:59     ` Gregory Price
  2025-03-20 16:45   ` Joshua Hahn
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Rakie Kim @ 2025-03-20  5:40 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, gourry

On Thu, 20 Mar 2025 13:17:46 +0900 Rakie Kim <rakie.kim@sk.com> wrote:

Hi Gregory

I initially planned to separate this patch from the hotplug-related patch
series as an independent update. However, after reviewing the code following
Jonathan's suggestion to consolidate `kobject` and `node_attrs` into a single
struct, I realized that most of the intended functionality for Patch 2 was
already incorporated.

As a result, Patch 1 now only contains the `kobject_put` fix, while the
struct consolidation work has been included in Patch 2. Given the current
design, it seems more natural to keep Patch 1 and Patch 2 together as part
of the same patch series rather than separating them.

Rakie

> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
> 
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with `kobject_put()`, ensuring proper cleanup and preventing
> memory leaks.
> 
> By correctly using `kobject_put()`, the release function now
> properly deallocates memory without causing resource leaks,
> thereby improving system stability.
> 
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
>  mm/mempolicy.c | 61 +++++++++++++++++++++++++-------------------------
>  1 file changed, 31 insertions(+), 30 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index bbaadbeeb291..5950d5d5b85e 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
>  
>  	for (i = 0; i < nr_node_ids; i++)
>  		sysfs_wi_node_release(node_attrs[i], wi_kobj);
> -	kobject_put(wi_kobj);
> +
> +	kfree(node_attrs);
> +	kfree(wi_kobj);
>  }
>  
>  static const struct kobj_type wi_ktype = {
> @@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
>  	struct kobject *wi_kobj;
>  	int nid, err;
>  
> -	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> -	if (!wi_kobj)
> +	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> +			     GFP_KERNEL);
> +	if (!node_attrs)
>  		return -ENOMEM;
>  
> +	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +	if (!wi_kobj) {
> +		err = -ENOMEM;
> +		goto node_out;
> +	}
> +
>  	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
>  				   "weighted_interleave");
>  	if (err) {
> -		kfree(wi_kobj);
> -		return err;
> +		kobject_put(wi_kobj);
> +		goto err_out;
>  	}
>  
>  	for_each_node_state(nid, N_POSSIBLE) {
> @@ -3512,9 +3521,17 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
>  			break;
>  		}
>  	}
> -	if (err)
> +	if (err) {
>  		kobject_put(wi_kobj);
> +		goto err_out;
> +	}
> +
>  	return 0;
> +
> +node_out:
> +	kfree(node_attrs);
> +err_out:
> +	return err;
>  }
>  
>  static void mempolicy_kobj_release(struct kobject *kobj)
> @@ -3528,7 +3545,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
>  	mutex_unlock(&iw_table_lock);
>  	synchronize_rcu();
>  	kfree(old);
> -	kfree(node_attrs);
>  	kfree(kobj);
>  }
>  
> @@ -3542,37 +3558,22 @@ static int __init mempolicy_sysfs_init(void)
>  	static struct kobject *mempolicy_kobj;
>  
>  	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> -	if (!mempolicy_kobj) {
> -		err = -ENOMEM;
> -		goto err_out;
> -	}
> -
> -	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> -			     GFP_KERNEL);
> -	if (!node_attrs) {
> -		err = -ENOMEM;
> -		goto mempol_out;
> -	}
> +	if (!mempolicy_kobj)
> +		return -ENOMEM;
>  
>  	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
>  				   "mempolicy");
>  	if (err)
> -		goto node_out;
> +		goto err_out;
>  
>  	err = add_weighted_interleave_group(mempolicy_kobj);
> -	if (err) {
> -		pr_err("mempolicy sysfs structure failed to initialize\n");
> -		kobject_put(mempolicy_kobj);
> -		return err;
> -	}
> +	if (err)
> +		goto err_out;
> +
> +	return 0;
>  
> -	return err;
> -node_out:
> -	kfree(node_attrs);
> -mempol_out:
> -	kfree(mempolicy_kobj);
>  err_out:
> -	pr_err("failed to add mempolicy kobject to the system\n");
> +	kobject_put(mempolicy_kobj);
>  	return err;
>  }
>  
> 
> base-commit: 4701f33a10702d5fc577c32434eb62adde0a1ae1
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
  2025-03-20  5:40   ` Rakie Kim
@ 2025-03-20 16:45   ` Joshua Hahn
  2025-03-21  4:37     ` Rakie Kim
  2025-03-21 13:59   ` Gregory Price
  2025-03-24 16:40   ` Markus Elfring
  3 siblings, 1 reply; 25+ messages in thread
From: Joshua Hahn @ 2025-03-20 16:45 UTC (permalink / raw)
  To: Rakie Kim
  Cc: gourry, akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

Hi Rakie, thank you for the new version! I have just a few questions / nits
about this patch.

On Thu, 20 Mar 2025 13:17:46 +0900 Rakie Kim <rakie.kim@sk.com> wrote:

> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
> 
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with `kobject_put()`, ensuring proper cleanup and preventing
> memory leaks.
> 
> By correctly using `kobject_put()`, the release function now
> properly deallocates memory without causing resource leaks,
> thereby improving system stability.
> 
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
>  mm/mempolicy.c | 61 +++++++++++++++++++++++++-------------------------
>  1 file changed, 31 insertions(+), 30 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index bbaadbeeb291..5950d5d5b85e 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
>  
>  	for (i = 0; i < nr_node_ids; i++)
>  		sysfs_wi_node_release(node_attrs[i], wi_kobj);
> -	kobject_put(wi_kobj);
> +
> +	kfree(node_attrs);
> +	kfree(wi_kobj);
>  }

I think the intent here is to make mempolicy_sysfs_init call kobject_put, which
will then call sysfs_wi_release when the refcount is 0. So I think replacing
kobject_put with kfree makes a lot of sense here. However, I think it is a bit
confusing based on the commit message, which states that you are doing the
opposite (replacing kfree with kobject_put). Perhaps it makes more sense to
say that you are moving kfree() from sysfs_init to the release function, so
that the struct and the node_attrs struct is freed together by the last
reference holder.

>  static const struct kobj_type wi_ktype = {
> @@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
>  	struct kobject *wi_kobj;
>  	int nid, err;
>  
> -	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> -	if (!wi_kobj)
> +	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> +			     GFP_KERNEL);
> +	if (!node_attrs)
>  		return -ENOMEM;

It's also not obvious to me why the allocation for node_attrs was moved to
add_weighted_interleave_group. Maybe this refactoring belongs in patch 2,
whose described intent is to consolidate the two objects into one (I expand
on this idea below)

> +	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +	if (!wi_kobj) {
> +		err = -ENOMEM;
> +		goto node_out;
> +	}
> +
>  	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
>  				   "weighted_interleave");
>  	if (err) {
> -		kfree(wi_kobj);
> -		return err;
> +		kobject_put(wi_kobj);
> +		goto err_out;
>  	}
>  
>  	for_each_node_state(nid, N_POSSIBLE) {
> @@ -3512,9 +3521,17 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
>  			break;
>  		}
>  	}
> -	if (err)
> +	if (err) {
>  		kobject_put(wi_kobj);
> +		goto err_out;
> +	}
> +
>  	return 0;
> +
> +node_out:
> +	kfree(node_attrs);
> +err_out:

NIT: Is there a reason why we have a single line goto statement? Maybe it
is more readable to replace all `goto err_out` with `return err` and save
a few jumps : -)

> +	return err;
>  }
>  
>  static void mempolicy_kobj_release(struct kobject *kobj)
> @@ -3528,7 +3545,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
>  	mutex_unlock(&iw_table_lock);
>  	synchronize_rcu();
>  	kfree(old);
> -	kfree(node_attrs);

I think the intent of this patch is slightly confusing. Viewing this patch
alone, it is not entirely obvious why the kfree for node_attrs is now being
moved from the release of mempolicy_kobj to wi_kobj. Of course, we know that
it is actually because this patch serves a secondary purpose of moving
the allocations / freeing of nattrs and wi_kobj together, so that in the
next patch they can be combined into a single struct.

I think one way to make this patch more readable and maintainable is to
separate it into (1) fixes, (as the Fixes: tag in your commit message suggests)
and (2) refactoring that prepares for the next patch.

Please let me know what you think -- these were just some thoughts that I had
while I was reading the patch. Thank you again for this new version!

Have a great day : -)
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20  5:40   ` Rakie Kim
@ 2025-03-20 16:59     ` Gregory Price
  2025-03-21  4:36       ` Rakie Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Gregory Price @ 2025-03-20 16:59 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Thu, Mar 20, 2025 at 02:40:01PM +0900, Rakie Kim wrote:
> On Thu, 20 Mar 2025 13:17:46 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> 
> Hi Gregory
> 
> I initially planned to separate this patch from the hotplug-related patch
> series as an independent update. However, after reviewing the code following
> Jonathan's suggestion to consolidate `kobject` and `node_attrs` into a single
> struct, I realized that most of the intended functionality for Patch 2 was
> already incorporated.
> 
> As a result, Patch 1 now only contains the `kobject_put` fix, while the
> struct consolidation work has been included in Patch 2. Given the current
> design, it seems more natural to keep Patch 1 and Patch 2 together as part
> of the same patch series rather than separating them.
> 
> Rakie
> 

The point of submitting separately was to backport this to LTS via
-stable.  We probably still want this since it ostensibly solves a
memory leak - even if the design is to support this work.

~Gregory



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20 16:59     ` Gregory Price
@ 2025-03-21  4:36       ` Rakie Kim
  2025-03-21  4:53         ` Gregory Price
  0 siblings, 1 reply; 25+ messages in thread
From: Rakie Kim @ 2025-03-21  4:36 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Rakie Kim

On Thu, 20 Mar 2025 12:59:32 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Thu, Mar 20, 2025 at 02:40:01PM +0900, Rakie Kim wrote:
> > On Thu, 20 Mar 2025 13:17:46 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> > 
> > Hi Gregory
> > 
> > I initially planned to separate this patch from the hotplug-related patch
> > series as an independent update. However, after reviewing the code following
> > Jonathan's suggestion to consolidate `kobject` and `node_attrs` into a single
> > struct, I realized that most of the intended functionality for Patch 2 was
> > already incorporated.
> > 
> > As a result, Patch 1 now only contains the `kobject_put` fix, while the
> > struct consolidation work has been included in Patch 2. Given the current
> > design, it seems more natural to keep Patch 1 and Patch 2 together as part
> > of the same patch series rather than separating them.
> > 
> > Rakie
> > 
> 
> The point of submitting separately was to backport this to LTS via
> -stable.  We probably still want this since it ostensibly solves a
> memory leak - even if the design is to support this work.
> 
> ~Gregory
> 

Patch 1 and Patch 2 are closely related, and I believe that both patches
need to be combined to fully support the functionality.

Initially, I thought that Patch 1 was the fix for the original issue and
considered it the candidate for a backport.
However, upon further reflection, I believe that all changes in Patch 1
through Patch 3 are necessary to fully address the underlying problem.

Therefore, I now think it makes more sense to merge Patch 1 and Patch 2
into a single patch, then renumber the current Patch 3 as Patch 2,
and treat the entire set as a proper -stable backport candidate.

I'd appreciate your thoughts on this suggestion.

Rakie



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20 16:45   ` Joshua Hahn
@ 2025-03-21  4:37     ` Rakie Kim
  2025-03-21 14:03       ` Gregory Price
  0 siblings, 1 reply; 25+ messages in thread
From: Rakie Kim @ 2025-03-21  4:37 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: gourry, akpm, linux-mm, linux-kernel, linux-cxl, dan.j.williams,
	ying.huang, david, Jonathan.Cameron, kernel_team, honggyu.kim,
	yunjeong.mun, Rakie Kim

On Thu, 20 Mar 2025 09:45:31 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

Hi Joshua
Thank you for your response regarding this patch.

> Hi Rakie, thank you for the new version! I have just a few questions / nits
> about this patch.
> 
> On Thu, 20 Mar 2025 13:17:46 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> 
> > Memory leaks occurred when removing sysfs attributes for weighted
> > interleave. Improper kobject deallocation led to unreleased memory
> > when initialization failed or when nodes were removed.
> > 
> > This patch resolves the issue by replacing unnecessary `kfree()`
> > calls with `kobject_put()`, ensuring proper cleanup and preventing
> > memory leaks.
> > 
> > By correctly using `kobject_put()`, the release function now
> > properly deallocates memory without causing resource leaks,
> > thereby improving system stability.
> > 
> > Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> >  mm/mempolicy.c | 61 +++++++++++++++++++++++++-------------------------
> >  1 file changed, 31 insertions(+), 30 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index bbaadbeeb291..5950d5d5b85e 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
> >  
> >  	for (i = 0; i < nr_node_ids; i++)
> >  		sysfs_wi_node_release(node_attrs[i], wi_kobj);
> > -	kobject_put(wi_kobj);
> > +
> > +	kfree(node_attrs);
> > +	kfree(wi_kobj);
> >  }
> 
> I think the intent here is to make mempolicy_sysfs_init call kobject_put, which
> will then call sysfs_wi_release when the refcount is 0. So I think replacing
> kobject_put with kfree makes a lot of sense here. However, I think it is a bit
> confusing based on the commit message, which states that you are doing the
> opposite (replacing kfree with kobject_put). Perhaps it makes more sense to
> say that you are moving kfree() from sysfs_init to the release function, so
> that the struct and the node_attrs struct is freed together by the last
> reference holder.

Yes, this patch does both: it replaces kfree with kobject_put and, as you 
also mentioned, it moves the actual kfree logic into the release function.
I agree the original explanation may have been unclear, so I will review
and revise the commit message accordingly.

> 
> >  static const struct kobj_type wi_ktype = {
> > @@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> >  	struct kobject *wi_kobj;
> >  	int nid, err;
> >  
> > -	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > -	if (!wi_kobj)
> > +	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > +			     GFP_KERNEL);
> > +	if (!node_attrs)
> >  		return -ENOMEM;
> 
> It's also not obvious to me why the allocation for node_attrs was moved to
> add_weighted_interleave_group. Maybe this refactoring belongs in patch 2,
> whose described intent is to consolidate the two objects into one (I expand
> on this idea below)

The reason for moving node_attrs is that it should be tied to wi_kobj
rather than mempolicy_kobj. Since node_attrs must be freed together
with wi_kobj, the allocation was relocated. I believe this behavior
is more clearly expressed in Patch 2.

> 
> > +	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > +	if (!wi_kobj) {
> > +		err = -ENOMEM;
> > +		goto node_out;
> > +	}
> > +
> >  	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> >  				   "weighted_interleave");
> >  	if (err) {
> > -		kfree(wi_kobj);
> > -		return err;
> > +		kobject_put(wi_kobj);
> > +		goto err_out;
> >  	}
> >  
> >  	for_each_node_state(nid, N_POSSIBLE) {
> > @@ -3512,9 +3521,17 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> >  			break;
> >  		}
> >  	}
> > -	if (err)
> > +	if (err) {
> >  		kobject_put(wi_kobj);
> > +		goto err_out;
> > +	}
> > +
> >  	return 0;
> > +
> > +node_out:
> > +	kfree(node_attrs);
> > +err_out:
> 
> NIT: Is there a reason why we have a single line goto statement? Maybe it
> is more readable to replace all `goto err_out` with `return err` and save
> a few jumps : -)

This is also being cleaned up in Patch 2. In fact, once Patch 2 is
applied, the node_out label becomes unnecessary.

> 
> > +	return err;
> >  }
> >  
> >  static void mempolicy_kobj_release(struct kobject *kobj)
> > @@ -3528,7 +3545,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
> >  	mutex_unlock(&iw_table_lock);
> >  	synchronize_rcu();
> >  	kfree(old);
> > -	kfree(node_attrs);
> 
> I think the intent of this patch is slightly confusing. Viewing this patch
> alone, it is not entirely obvious why the kfree for node_attrs is now being
> moved from the release of mempolicy_kobj to wi_kobj. Of course, we know that
> it is actually because this patch serves a secondary purpose of moving
> the allocations / freeing of nattrs and wi_kobj together, so that in the
> next patch they can be combined into a single struct.
> 
> I think one way to make this patch more readable and maintainable is to
> separate it into (1) fixes, (as the Fixes: tag in your commit message suggests)
> and (2) refactoring that prepares for the next patch.
> 
> Please let me know what you think -- these were just some thoughts that I had
> while I was reading the patch. Thank you again for this new version!
> 
> Have a great day : -)
> Joshua

As you mentioned, I agree that Patch 1 may be a bit unclear.
In fact, Patch 1 and Patch 2 share similar goals, and in my view,
they only provide complete functionality when applied together.

Initially, I thought that Patch 1 was the fix for the original issue and
considered it the candidate for a backport.
However, upon further reflection, I believe that all changes in Patch 1
through Patch 3 are necessary to fully address the underlying problem.

Therefore, I now think it makes more sense to merge Patch 1 and Patch 2
into a single patch, then renumber the current Patch 3 as Patch 2,
and treat the entire set as a proper -stable backport candidate.

I'd appreciate your thoughts on this suggestion.

Rakie

> 
> Sent using hkml (https://github.com/sjp38/hackermail)
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-21  4:36       ` Rakie Kim
@ 2025-03-21  4:53         ` Gregory Price
  2025-03-21  5:06           ` Rakie Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Gregory Price @ 2025-03-21  4:53 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Fri, Mar 21, 2025 at 01:36:55PM +0900, Rakie Kim wrote:
> 
> Patch 1 and Patch 2 are closely related, and I believe that both patches
> need to be combined to fully support the functionality.
> 
> Initially, I thought that Patch 1 was the fix for the original issue and
> considered it the candidate for a backport.
> However, upon further reflection, I believe that all changes in Patch 1
> through Patch 3 are necessary to fully address the underlying problem.
> 
> Therefore, I now think it makes more sense to merge Patch 1 and Patch 2
> into a single patch, then renumber the current Patch 3 as Patch 2,
> and treat the entire set as a proper -stable backport candidate.
> 
> I'd appreciate your thoughts on this suggestion.
> 
> Rakie
> 

All of this is fine, but it doesn't fix the bug in LTS kernels :]

It would be nice to do that too, since you identified it.
~Gregory


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-21  4:53         ` Gregory Price
@ 2025-03-21  5:06           ` Rakie Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-21  5:06 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Rakie Kim

On Fri, 21 Mar 2025 00:53:23 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Fri, Mar 21, 2025 at 01:36:55PM +0900, Rakie Kim wrote:
> > 
> > Patch 1 and Patch 2 are closely related, and I believe that both patches
> > need to be combined to fully support the functionality.
> > 
> > Initially, I thought that Patch 1 was the fix for the original issue and
> > considered it the candidate for a backport.
> > However, upon further reflection, I believe that all changes in Patch 1
> > through Patch 3 are necessary to fully address the underlying problem.
> > 
> > Therefore, I now think it makes more sense to merge Patch 1 and Patch 2
> > into a single patch, then renumber the current Patch 3 as Patch 2,
> > and treat the entire set as a proper -stable backport candidate.
> > 
> > I'd appreciate your thoughts on this suggestion.
> > 
> > Rakie
> > 
> 
> All of this is fine, but it doesn't fix the bug in LTS kernels :]
> 
> It would be nice to do that too, since you identified it.
> ~Gregory

Following the previously discussed patch restructuring, are there any
other bugs that remain unresolved?
If you have any additional insights on these unresolved issues,
I would appreciate your input.

Rakie.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
  2025-03-20  5:40   ` Rakie Kim
  2025-03-20 16:45   ` Joshua Hahn
@ 2025-03-21 13:59   ` Gregory Price
  2025-03-24 16:40   ` Markus Elfring
  3 siblings, 0 replies; 25+ messages in thread
From: Gregory Price @ 2025-03-21 13:59 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Thu, Mar 20, 2025 at 01:17:46PM +0900, Rakie Kim wrote:
> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
> 
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with `kobject_put()`, ensuring proper cleanup and preventing
> memory leaks.
> 
> By correctly using `kobject_put()`, the release function now
> properly deallocates memory without causing resource leaks,
> thereby improving system stability.
> 
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>

Reviewed-by: Gregory Price <gourry@gourry.net>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-21  4:37     ` Rakie Kim
@ 2025-03-21 14:03       ` Gregory Price
  2025-03-24  8:47         ` Rakie Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Gregory Price @ 2025-03-21 14:03 UTC (permalink / raw)
  To: Rakie Kim
  Cc: Joshua Hahn, akpm, linux-mm, linux-kernel, linux-cxl,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Fri, Mar 21, 2025 at 01:37:22PM +0900, Rakie Kim wrote:
> As you mentioned, I agree that Patch 1 may be a bit unclear.
> In fact, Patch 1 and Patch 2 share similar goals, and in my view,
> they only provide complete functionality when applied together.
> 
> Initially, I thought that Patch 1 was the fix for the original issue and
> considered it the candidate for a backport.
> However, upon further reflection, I believe that all changes in Patch 1
> through Patch 3 are necessary to fully address the underlying problem.
> 

Patch 1 does address the immediate issue of calling kfree instead of the
appropriate put() routine, so it is fine to keep this separate.

> Therefore, I now think it makes more sense to merge Patch 1 and Patch 2
> into a single patch, then renumber the current Patch 3 as Patch 2,
> and treat the entire set as a proper -stable backport candidate.
>

The set adds functionality and changes the original behavior of the
interface - I'm not clear on the rules on backports in this way.

Would need input from another maintainer on that.

Either way, I would keep it separate for now in case just the first
patch is desired for backport.  Maintainers can always pick up the set
if that's desired.

(It also makes these changes easier to review)
~Gregory


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave
  2025-03-20  4:17 ` [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave Rakie Kim
@ 2025-03-21 14:09   ` Gregory Price
  2025-03-24  8:48     ` Rakie Kim
  2025-04-02 16:33   ` Dan Williams
  1 sibling, 1 reply; 25+ messages in thread
From: Gregory Price @ 2025-03-21 14:09 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Thu, Mar 20, 2025 at 01:17:47PM +0900, Rakie Kim wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed, preventing dynamic updates when nodes were added or removed.
> 
> This patch restructures the weighted interleave sysfs to support
> dynamic insertion and deletion. The sysfs that was part of
> the 'weighted_interleave_group' is now globally accessible,
> allowing external access to that sysfs.
> 
> With this change, sysfs management for weighted interleave is
> more flexible, supporting hotplug events and runtime updates
> more effectively.
> 
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>

Reviewed-by: Gregory Price <gourry@gourry.net>

1 nit

> ---
>  mm/mempolicy.c | 70 ++++++++++++++++++++++----------------------------
>  1 file changed, 30 insertions(+), 40 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 5950d5d5b85e..6c8843114afd 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3388,6 +3388,13 @@ struct iw_node_attr {
>  	int nid;
>  };
>  
> +struct sysfs_wi_group {
> +	struct kobject wi_kobj;
> +	struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *sgrp;
> +

sgrp -> wi_group?  Or something similar, sgrp is not very descriptive
for a global.

~Gregory


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-03-20  4:17 ` [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in " Rakie Kim
@ 2025-03-21 14:24   ` Gregory Price
  2025-03-24  8:48     ` Rakie Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Gregory Price @ 2025-03-21 14:24 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
... snip ...
> +	mutex_lock(&sgrp->kobj_lock);
> +	if (sgrp->nattrs[nid]) {
> +		mutex_unlock(&sgrp->kobj_lock);
> +		pr_info("Node [%d] already exists\n", nid);
> +		kfree(new_attr);
> +		kfree(name);
> +		return 0;
> +	}
>  
> -	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> -		kfree(node_attr->kobj_attr.attr.name);
> -		kfree(node_attr);
> -		pr_err("failed to add attribute to weighted_interleave\n");
> -		return -ENOMEM;
> +	sgrp->nattrs[nid] = new_attr;
> +	mutex_unlock(&sgrp->kobj_lock);
> +
> +	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> +	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> +	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> +	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> +	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> +	sgrp->nattrs[nid]->nid = nid;

These accesses need to be inside the lock as well.  Probably we can't
get here concurrently, but I can't so so definitively that I'm
comfortable blind-accessing it outside the lock.

> +static int wi_node_notifier(struct notifier_block *nb,
> +			       unsigned long action, void *data)
> +{
... snip ...
> +	case MEM_OFFLINE:
> +		sysfs_wi_node_release(nid);

I'm still not convinced this is correct.  `offline_pages()` says this:

/*
 * {on,off}lining is constrained to full memory sections (or more
 * precisely to memory blocks from the user space POV).
 */

And that is the function calling:
	memory_notify(MEM_OFFLINE, &arg);

David pointed out that this should be called when offlining each memory
block.  This is not the same as simply doing `echo 0 > online`, you need
to remove the dax device associated with the memory.

For example:

      node1
    /       \
 dax0.0    dax1.0
   |          |
  mb1        mb2

With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
remove the first memory block, causing MEM_OFFLINE event to fire and
removing the node - despite the fact that dax1.0 is still present.

This matters for systems with memory holes in CXL hotplug memory and
also for systems with Dynamic Capacity Devices surfacing capacity as
separate dax devices.

~Gregory

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-21 14:03       ` Gregory Price
@ 2025-03-24  8:47         ` Rakie Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-24  8:47 UTC (permalink / raw)
  To: Gregory Price
  Cc: Joshua Hahn, akpm, linux-mm, linux-kernel, linux-cxl,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Rakie Kim

On Fri, 21 Mar 2025 10:03:29 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Fri, Mar 21, 2025 at 01:37:22PM +0900, Rakie Kim wrote:
> > As you mentioned, I agree that Patch 1 may be a bit unclear.
> > In fact, Patch 1 and Patch 2 share similar goals, and in my view,
> > they only provide complete functionality when applied together.
> > 
> > Initially, I thought that Patch 1 was the fix for the original issue and
> > considered it the candidate for a backport.
> > However, upon further reflection, I believe that all changes in Patch 1
> > through Patch 3 are necessary to fully address the underlying problem.
> > 
> 
> Patch 1 does address the immediate issue of calling kfree instead of the
> appropriate put() routine, so it is fine to keep this separate.

Understood. I will keep this patch as-is for now, as you suggested.

> 
> > Therefore, I now think it makes more sense to merge Patch 1 and Patch 2
> > into a single patch, then renumber the current Patch 3 as Patch 2,
> > and treat the entire set as a proper -stable backport candidate.
> >
> 
> The set adds functionality and changes the original behavior of the
> interface - I'm not clear on the rules on backports in this way.
> 
> Would need input from another maintainer on that.
> 
> Either way, I would keep it separate for now in case just the first
> patch is desired for backport.  Maintainers can always pick up the set
> if that's desired.
> 
> (It also makes these changes easier to review)
> ~Gregory

In that case, I agree it's better to treat only Patch 1 as a backport
candidate for now. As for the remaining patches, it would be more appropriate
to discuss their inclusion with other maintainers at a later point.

Rakie



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave
  2025-03-21 14:09   ` Gregory Price
@ 2025-03-24  8:48     ` Rakie Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-24  8:48 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Rakie Kim

On Fri, 21 Mar 2025 10:09:12 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Thu, Mar 20, 2025 at 01:17:47PM +0900, Rakie Kim wrote:
> > Previously, the weighted interleave sysfs structure was statically
> > managed, preventing dynamic updates when nodes were added or removed.
> > 
> > This patch restructures the weighted interleave sysfs to support
> > dynamic insertion and deletion. The sysfs that was part of
> > the 'weighted_interleave_group' is now globally accessible,
> > allowing external access to that sysfs.
> > 
> > With this change, sysfs management for weighted interleave is
> > more flexible, supporting hotplug events and runtime updates
> > more effectively.
> > 
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> 
> Reviewed-by: Gregory Price <gourry@gourry.net>
> 
> 1 nit
> 
> > ---
> >  mm/mempolicy.c | 70 ++++++++++++++++++++++----------------------------
> >  1 file changed, 30 insertions(+), 40 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 5950d5d5b85e..6c8843114afd 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3388,6 +3388,13 @@ struct iw_node_attr {
> >  	int nid;
> >  };
> >  
> > +struct sysfs_wi_group {
> > +	struct kobject wi_kobj;
> > +	struct iw_node_attr *nattrs[];
> > +};
> > +
> > +static struct sysfs_wi_group *sgrp;
> > +
> 
> sgrp -> wi_group?  Or something similar, sgrp is not very descriptive
> for a global.
> 
> ~Gregory

Yes, I agree. `wi_group` is more descriptive than `sgrp`. I will rename the
structure to `wi_group` as suggested.

Rakie



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-03-21 14:24   ` Gregory Price
@ 2025-03-24  8:48     ` Rakie Kim
  2025-03-24  8:54       ` Rakie Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Rakie Kim @ 2025-03-24  8:48 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Rakie Kim

On Fri, 21 Mar 2025 10:24:46 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
> ... snip ...
> > +	mutex_lock(&sgrp->kobj_lock);
> > +	if (sgrp->nattrs[nid]) {
> > +		mutex_unlock(&sgrp->kobj_lock);
> > +		pr_info("Node [%d] already exists\n", nid);
> > +		kfree(new_attr);
> > +		kfree(name);
> > +		return 0;
> > +	}
> >  
> > -	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> > -		kfree(node_attr->kobj_attr.attr.name);
> > -		kfree(node_attr);
> > -		pr_err("failed to add attribute to weighted_interleave\n");
> > -		return -ENOMEM;
> > +	sgrp->nattrs[nid] = new_attr;
> > +	mutex_unlock(&sgrp->kobj_lock);
> > +
> > +	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> > +	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> > +	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> > +	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> > +	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> > +	sgrp->nattrs[nid]->nid = nid;
> 
> These accesses need to be inside the lock as well.  Probably we can't
> get here concurrently, but I can't so so definitively that I'm
> comfortable blind-accessing it outside the lock.

You're right, and I appreciate your point. It's not difficult to apply your
suggestion, so I plan to update the code as follows:

    sgrp->nattrs[nid] = new_attr;

    sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
    sgrp->nattrs[nid]->kobj_attr.attr.name = name;
    sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
    sgrp->nattrs[nid]->kobj_attr.show = node_show;
    sgrp->nattrs[nid]->kobj_attr.store = node_store;
    sgrp->nattrs[nid]->nid = nid;

    ret = sysfs_create_file(&sgrp->wi_kobj,
           &sgrp->nattrs[nid]->kobj_attr.attr);
    if (ret) {
        mutex_unlock(&sgrp->kobj_lock);
        ...
    }
    mutex_unlock(&sgrp->kobj_lock);

> 
> > +static int wi_node_notifier(struct notifier_block *nb,
> > +			       unsigned long action, void *data)
> > +{
> ... snip ...
> > +	case MEM_OFFLINE:
> > +		sysfs_wi_node_release(nid);
> 
> I'm still not convinced this is correct.  `offline_pages()` says this:
> 
> /*
>  * {on,off}lining is constrained to full memory sections (or more
>  * precisely to memory blocks from the user space POV).
>  */
> 
> And that is the function calling:
> 	memory_notify(MEM_OFFLINE, &arg);
> 
> David pointed out that this should be called when offlining each memory
> block.  This is not the same as simply doing `echo 0 > online`, you need
> to remove the dax device associated with the memory.
> 
> For example:
> 
>       node1
>     /       \
>  dax0.0    dax1.0
>    |          |
>   mb1        mb2
> 
> 
> With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
> remove the first memory block, causing MEM_OFFLINE event to fire and
> removing the node - despite the fact that dax1.0 is still present.
> 
> This matters for systems with memory holes in CXL hotplug memory and
> also for systems with Dynamic Capacity Devices surfacing capacity as
> separate dax devices.
> 
> ~Gregory

If all memory blocks belonging to a node are offlined, the node will lose its
`N_MEMORY` state before the notifier callback is invoked. This should help avoid
the issue you mentioned.
Please let me know your thoughts on this approach.

Rakie



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-03-24  8:48     ` Rakie Kim
@ 2025-03-24  8:54       ` Rakie Kim
  2025-03-24 13:32         ` Gregory Price
  0 siblings, 1 reply; 25+ messages in thread
From: Rakie Kim @ 2025-03-24  8:54 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Gregory Price

On Mon, 24 Mar 2025 17:48:39 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> On Fri, 21 Mar 2025 10:24:46 -0400 Gregory Price <gourry@gourry.net> wrote:
> > On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
> > ... snip ...
> > > +	mutex_lock(&sgrp->kobj_lock);
> > > +	if (sgrp->nattrs[nid]) {
> > > +		mutex_unlock(&sgrp->kobj_lock);
> > > +		pr_info("Node [%d] already exists\n", nid);
> > > +		kfree(new_attr);
> > > +		kfree(name);
> > > +		return 0;
> > > +	}
> > >  
> > > -	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> > > -		kfree(node_attr->kobj_attr.attr.name);
> > > -		kfree(node_attr);
> > > -		pr_err("failed to add attribute to weighted_interleave\n");
> > > -		return -ENOMEM;
> > > +	sgrp->nattrs[nid] = new_attr;
> > > +	mutex_unlock(&sgrp->kobj_lock);
> > > +
> > > +	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> > > +	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> > > +	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> > > +	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> > > +	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> > > +	sgrp->nattrs[nid]->nid = nid;
> > 
> > These accesses need to be inside the lock as well.  Probably we can't
> > get here concurrently, but I can't so so definitively that I'm
> > comfortable blind-accessing it outside the lock.
> 
> You're right, and I appreciate your point. It's not difficult to apply your
> suggestion, so I plan to update the code as follows:
> 
>     sgrp->nattrs[nid] = new_attr;
> 
>     sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
>     sgrp->nattrs[nid]->kobj_attr.attr.name = name;
>     sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
>     sgrp->nattrs[nid]->kobj_attr.show = node_show;
>     sgrp->nattrs[nid]->kobj_attr.store = node_store;
>     sgrp->nattrs[nid]->nid = nid;
> 
>     ret = sysfs_create_file(&sgrp->wi_kobj,
>            &sgrp->nattrs[nid]->kobj_attr.attr);
>     if (ret) {
>         mutex_unlock(&sgrp->kobj_lock);
>         ...
>     }
>     mutex_unlock(&sgrp->kobj_lock);
> 
> > 
> > > +static int wi_node_notifier(struct notifier_block *nb,
> > > +			       unsigned long action, void *data)
> > > +{
> > ... snip ...
> > > +	case MEM_OFFLINE:
> > > +		sysfs_wi_node_release(nid);
> > 
> > I'm still not convinced this is correct.  `offline_pages()` says this:
> > 
> > /*
> >  * {on,off}lining is constrained to full memory sections (or more
> >  * precisely to memory blocks from the user space POV).
> >  */
> > 
> > And that is the function calling:
> > 	memory_notify(MEM_OFFLINE, &arg);
> > 
> > David pointed out that this should be called when offlining each memory
> > block.  This is not the same as simply doing `echo 0 > online`, you need
> > to remove the dax device associated with the memory.
> > 
> > For example:
> > 
> >       node1
> >     /       \
> >  dax0.0    dax1.0
> >    |          |
> >   mb1        mb2
> > 
> > 
> > With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
> > remove the first memory block, causing MEM_OFFLINE event to fire and
> > removing the node - despite the fact that dax1.0 is still present.
> > 
> > This matters for systems with memory holes in CXL hotplug memory and
> > also for systems with Dynamic Capacity Devices surfacing capacity as
> > separate dax devices.
> > 
> > ~Gregory
> 
> If all memory blocks belonging to a node are offlined, the node will lose its
> `N_MEMORY` state before the notifier callback is invoked. This should help avoid
> the issue you mentioned.
> Please let me know your thoughts on this approach.
> 
> Rakie
> 

I'm sorry, the code is missing.
I may not fully understand the scenario you described, but I think your concern
can be addressed by adding a simple check like the following:

    case MEM_OFFLINE:
        if (!node_state(nid, N_MEMORY)) --> this point
            sysfs_wi_node_release(nid);

If all memory blocks belonging to a node are offlined, the node will lose its
`N_MEMORY` state before the notifier callback is invoked. This should help avoid
the issue you mentioned.
Please let me know your thoughts on this approach.

Rakie.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-03-24  8:54       ` Rakie Kim
@ 2025-03-24 13:32         ` Gregory Price
  2025-03-25 10:27           ` Rakie Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Gregory Price @ 2025-03-24 13:32 UTC (permalink / raw)
  To: Rakie Kim
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun

On Mon, Mar 24, 2025 at 05:54:27PM +0900, Rakie Kim wrote:
> 
> I'm sorry, the code is missing.
> I may not fully understand the scenario you described, but I think your concern
> can be addressed by adding a simple check like the following:
> 
>     case MEM_OFFLINE:
>         if (!node_state(nid, N_MEMORY)) --> this point
>             sysfs_wi_node_release(nid);
>

This should work.  I have some questions about whether there might be
some subtle race conditions with this implementation, but I can take a
look after LSFMM.  (Example: Two blocks being offlined/onlined at the
same time, is state(nid, N_MEMORY) a raced value?)

~Gregory


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
                     ` (2 preceding siblings ...)
  2025-03-21 13:59   ` Gregory Price
@ 2025-03-24 16:40   ` Markus Elfring
  2025-03-25 10:27     ` Rakie Kim
  3 siblings, 1 reply; 25+ messages in thread
From: Markus Elfring @ 2025-03-24 16:40 UTC (permalink / raw)
  To: Rakie Kim, linux-mm, linux-cxl
  Cc: LKML, kernel_team, Andrew Morton, Dan Williams,
	David Hildenbrand, Gregory Price, Honggyu Kim, Jonathan Cameron,
	Joshua Hahn, Ying Huang, Yunjeong Mun

…
> This patch resolves the issue …

See also:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.14#n94

Regards,
Markus


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-03-24 16:40   ` Markus Elfring
@ 2025-03-25 10:27     ` Rakie Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-25 10:27 UTC (permalink / raw)
  To: Markus Elfring
  Cc: LKML, kernel_team, Andrew Morton, Dan Williams,
	David Hildenbrand, Gregory Price, Honggyu Kim, Jonathan Cameron,
	Joshua Hahn, Ying Huang, Yunjeong Mun, Rakie Kim, linux-mm,
	linux-cxl

On Mon, 24 Mar 2025 17:40:10 +0100 Markus Elfring <Markus.Elfring@web.de> wrote:
> =E2=80=A6
> > This patch resolves the issue =E2=80=A6
> 
> See also:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre=
> e/Documentation/process/submitting-patches.rst?h=3Dv6.14#n94
> 
> Regards,
> Markus

Hi Markus

Thank you for your response regarding this patch.
I plan to submit this once the related patches are completed.

Rakie



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-03-24 13:32         ` Gregory Price
@ 2025-03-25 10:27           ` Rakie Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Rakie Kim @ 2025-03-25 10:27 UTC (permalink / raw)
  To: Gregory Price
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, Rakie Kim

On Mon, 24 Mar 2025 09:32:49 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Mon, Mar 24, 2025 at 05:54:27PM +0900, Rakie Kim wrote:
> > 
> > I'm sorry, the code is missing.
> > I may not fully understand the scenario you described, but I think your concern
> > can be addressed by adding a simple check like the following:
> > 
> >     case MEM_OFFLINE:
> >         if (!node_state(nid, N_MEMORY)) --> this point
> >             sysfs_wi_node_release(nid);
> >
> 
> This should work.  I have some questions about whether there might be
> some subtle race conditions with this implementation, but I can take a
> look after LSFMM.  (Example: Two blocks being offlined/onlined at the
> same time, is state(nid, N_MEMORY) a raced value?)
> 
> ~Gregory

I will also further review the code for any race condition
issues. I intend to update v4 to incorporate the discussions
from this patch series. Your feedback and review of v4 after
LSFMM would be greatly appreciated.

Rakie



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave
  2025-03-20  4:17 ` [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave Rakie Kim
  2025-03-21 14:09   ` Gregory Price
@ 2025-04-02 16:33   ` Dan Williams
  2025-04-03  4:25     ` Rakie Kim
  1 sibling, 1 reply; 25+ messages in thread
From: Dan Williams @ 2025-04-02 16:33 UTC (permalink / raw)
  To: Rakie Kim, gourry
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim

Rakie Kim wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed, preventing dynamic updates when nodes were added or removed.
> 
> This patch restructures the weighted interleave sysfs to support
> dynamic insertion and deletion. The sysfs that was part of
> the 'weighted_interleave_group' is now globally accessible,
> allowing external access to that sysfs.
> 
> With this change, sysfs management for weighted interleave is
> more flexible, supporting hotplug events and runtime updates
> more effectively.

I understand the urge to try to make a general case for a patch, but it
is better to state the explicit reason especially when someone is later
reading the history and may not realize that this is part of a series.

So instead of making claims like "this is more flexible / more effective
for runtime updates", state that motivation explicitly.  Something like:

"In preparation for enabling weighted-interleave sysfs attributes to
react to node-online/offline events, introduce sysfs_wi_node_add() and
sysfs_wi_node_delete() helpers to dynamically manage the
weighted-interleave attributes.

A follow-on patch registers a memory-hotplug notifier to use these
helpers, for now just refactor the current "publish all possible node"
approach to use sysfs_wi_node_{add,delete}()."

> 
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
>  mm/mempolicy.c | 70 ++++++++++++++++++++++----------------------------
>  1 file changed, 30 insertions(+), 40 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 5950d5d5b85e..6c8843114afd 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3388,6 +3388,13 @@ struct iw_node_attr {
>  	int nid;
>  };
>  
> +struct sysfs_wi_group {
> +	struct kobject wi_kobj;
> +	struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *sgrp;
> +
>  static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
>  			 char *buf)
>  {
> @@ -3430,27 +3437,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>  	return count;
>  }
>  
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> -				  struct kobject *parent)
> +static void sysfs_wi_node_release(int nid)

I called this sysfs_wi_node_delete() above because _release() is
typically callback invoked on last put of a kobject.

>  {
> -	if (!node_attr)
> +	if (!sgrp->nattrs[nid])
>  		return;
> -	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> -	kfree(node_attr->kobj_attr.attr.name);
> -	kfree(node_attr);
> +
> +	sysfs_remove_file(&sgrp->wi_kobj, &sgrp->nattrs[nid]->kobj_attr.attr);
> +	kfree(sgrp->nattrs[nid]->kobj_attr.attr.name);
> +	kfree(sgrp->nattrs[nid]);
>  }
>  
>  static void sysfs_wi_release(struct kobject *wi_kobj)
>  {
> -	int i;
> -
> -	for (i = 0; i < nr_node_ids; i++)
> -		sysfs_wi_node_release(node_attrs[i], wi_kobj);
> +	int nid;
>  
> -	kfree(node_attrs);
> -	kfree(wi_kobj);
> +	for (nid = 0; nid < nr_node_ids; nid++)
> +		sysfs_wi_node_release(nid);
> +	kfree(sgrp);

This looks broken, are you sure that a kobject with a zero reference can
still host child attributes?

The teardown flow I would expect is:

sysfs_remove_file(node_attrs[i],
kobject_del(wi_kobj)
...that does final kobject_put()...
kfree(container_of(wi_kobj))

However, now I do not think patch1 is actually fixing anything because
there is never a kobject_del() of the mempolicy_kobj. Just like there is
never a kobject_del() of the mm_kobj.

So patch1 seems to potentially be addressing a bug introduced by this
dynamic work which is caused by the original code being confused about
the kobject shutdown path.

The original problems are that sysfs_wi_release() has a kobject_put()
which, yes, is broken, but equally problematic is that there is no
kobject_del() in sight for either of these kobjects(), even with the new
changes. mempolicy_kobj_release() seems to confuse the activities that I
would expect to be near a kobject_del() call with the minimal kfree() on
final put.

>  }
>  
>  static const struct kobj_type wi_ktype = {
> @@ -3458,7 +3461,7 @@ static const struct kobj_type wi_ktype = {
>  	.release = sysfs_wi_release,
>  };
>  
> -static int add_weight_node(int nid, struct kobject *wi_kobj)
> +static int sysfs_wi_node_add(int nid)
>  {
>  	struct iw_node_attr *node_attr;
>  	char *name;
> @@ -3480,57 +3483,44 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
>  	node_attr->kobj_attr.store = node_store;
>  	node_attr->nid = nid;
>  
> -	if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
> +	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
>  		kfree(node_attr->kobj_attr.attr.name);
>  		kfree(node_attr);
>  		pr_err("failed to add attribute to weighted_interleave\n");
>  		return -ENOMEM;
>  	}
>  
> -	node_attrs[nid] = node_attr;
> +	sgrp->nattrs[nid] = node_attr;
>  	return 0;
>  }
>  
> -static int add_weighted_interleave_group(struct kobject *root_kobj)
> +static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
>  {
> -	struct kobject *wi_kobj;
>  	int nid, err;
>  
> -	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> -			     GFP_KERNEL);
> -	if (!node_attrs)
> +	sgrp = kzalloc(sizeof(struct sysfs_wi_group) + 			\
> +		       nr_node_ids * sizeof(struct iw_node_attr *),	\
> +		       GFP_KERNEL);

The recommended way to allocate a struct with a flexible array is using
the struct_size() helper.

    kzalloc(struct_size(sgrp, nattrs, nr_node_ids), GFP_KERNEL)

...but overall I think the original code needs a cleanup and to be clear
that I think there is no memory leak risk exposed to existing users
given the shutdown path is never invoked.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave
  2025-04-02 16:33   ` Dan Williams
@ 2025-04-03  4:25     ` Rakie Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Rakie Kim @ 2025-04-03  4:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	ying.huang, david, Jonathan.Cameron, kernel_team, honggyu.kim,
	yunjeong.mun, rakie.kim, gourry

On Wed, 2 Apr 2025 09:33:51 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Rakie Kim wrote:
> > Previously, the weighted interleave sysfs structure was statically
> > managed, preventing dynamic updates when nodes were added or removed.
> > 
> > This patch restructures the weighted interleave sysfs to support
> > dynamic insertion and deletion. The sysfs that was part of
> > the 'weighted_interleave_group' is now globally accessible,
> > allowing external access to that sysfs.
> > 
> > With this change, sysfs management for weighted interleave is
> > more flexible, supporting hotplug events and runtime updates
> > more effectively.
> 
> I understand the urge to try to make a general case for a patch, but it
> is better to state the explicit reason especially when someone is later
> reading the history and may not realize that this is part of a series.
> 
> So instead of making claims like "this is more flexible / more effective
> for runtime updates", state that motivation explicitly.  Something like:
> 
> "In preparation for enabling weighted-interleave sysfs attributes to
> react to node-online/offline events, introduce sysfs_wi_node_add() and
> sysfs_wi_node_delete() helpers to dynamically manage the
> weighted-interleave attributes.
> 
> A follow-on patch registers a memory-hotplug notifier to use these
> helpers, for now just refactor the current "publish all possible node"
> approach to use sysfs_wi_node_{add,delete}()."
> 

Hi Dan Williams,

Thank you for your response regarding this patch.

First, I would like to mention that this version is v3, and the latest
version is v5:
https://lore.kernel.org/lkml/20250402014906.1086-1-rakie.kim@sk.com/

However, I believe that all of your suggestions should also apply to v5,
and I completely agree with your feedback.

I will incorporate the improvements you proposed and prepare v6
accordingly.

Rakie


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-04-03  4:26 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-20  4:17 [PATCH v3 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-03-20  5:40   ` Rakie Kim
2025-03-20 16:59     ` Gregory Price
2025-03-21  4:36       ` Rakie Kim
2025-03-21  4:53         ` Gregory Price
2025-03-21  5:06           ` Rakie Kim
2025-03-20 16:45   ` Joshua Hahn
2025-03-21  4:37     ` Rakie Kim
2025-03-21 14:03       ` Gregory Price
2025-03-24  8:47         ` Rakie Kim
2025-03-21 13:59   ` Gregory Price
2025-03-24 16:40   ` Markus Elfring
2025-03-25 10:27     ` Rakie Kim
2025-03-20  4:17 ` [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave Rakie Kim
2025-03-21 14:09   ` Gregory Price
2025-03-24  8:48     ` Rakie Kim
2025-04-02 16:33   ` Dan Williams
2025-04-03  4:25     ` Rakie Kim
2025-03-20  4:17 ` [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in " Rakie Kim
2025-03-21 14:24   ` Gregory Price
2025-03-24  8:48     ` Rakie Kim
2025-03-24  8:54       ` Rakie Kim
2025-03-24 13:32         ` Gregory Price
2025-03-25 10:27           ` Rakie Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox