[PATCH v8 0/3] Enhance sysfs handling for memory hotplug in weighted interleave

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v8 0/3] Enhance sysfs handling for memory hotplug in weighted interleave
@ 2025-04-16 11:31 Rakie Kim
  2025-04-16 11:31 ` [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Rakie Kim @ 2025-04-16 11:31 UTC (permalink / raw)
  To: akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

The following patch series enhances the weighted interleave policy in the
memory management subsystem by improving sysfs handling, fixing memory leaks,
and introducing dynamic sysfs updates for memory hotplug support.

Changes from v7:
https://lore.kernel.org/all/20250408073243.488-1-rakie.kim@sk.com/
- Refactored error handling paths to remove unnecessary `goto` usage
- Renamed unclear variables and functions for better readability

Changes from v6:
https://lore.kernel.org/all/20250404074623.1179-1-rakie.kim@sk.com/
- Removed redundant error handling in MEM_OFFLINE case

Changes from v5:
https://lore.kernel.org/all/20250402014906.1086-1-rakie.kim@sk.com/
- Added lock protection to sensitive sections

Changes from v4:
https://lore.kernel.org/all/20250401090901.1050-1-rakie.kim@sk.com/
- Added missing `kobject_del()` when removing a kobject
- Extended locking coverage to protect shared state

Changes from v3:
https://lore.kernel.org/all/20250320041749.881-1-rakie.kim@sk.com/
- Added error handling for allocation and cleanup paths
- Replaced static node attribute list with flexible array
- Reorganized four patches into three based on their functional scope

Changes from v2:
https://lore.kernel.org/all/20250312075628.648-1-rakie.kim@sk.com/
- Clarified commit messages
- Refactored to avoid passing the global structure as a parameter

Changes from v1:
https://lore.kernel.org/all/20250307063534.540-1-rakie.kim@sk.com/
- Fixed memory leaks related to `kobject` lifecycle
- Refactored sysfs layout for hotplug readiness
- Introduced initial memory hotplug support for weighted interleave

### Introduction
The weighted interleave policy distributes memory allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values are configured through sysfs.

Previously, sysfs entries for weighted interleave were managed statically
at initialization. This led to several issues:
- Memory Leaks: Improper `kobject` teardown caused memory leaks
  when initialization failed or when nodes were removed.
- Lack of Dynamic Updates: Sysfs attributes were created only during
  initialization, preventing nodes added at runtime from being recognized.
- Handling of Unusable Nodes: Sysfs entries were generated for all
  possible nodes (`N_POSSIBLE`), including memoryless or unavailable nodes,
  leading to sysfs entries for unusable nodes and potential
  misconfigurations.

### Patch Overview
1. [PATCH 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
   - Ensures proper cleanup of `kobject` allocations.
   - Adds `kobject_del()` before `kobject_put()` to clean up sysfs state correctly.
   - Prevents memory/resource leaks and improves teardown behavior.
   - Ensures that `sysfs_remove_file()` is not called from the release path
     after `kobject_del()` has cleared sysfs state, to avoid potential
     inconsistencies and warnings in the kernfs subsystem.

2. [PATCH 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
   - Refactors static sysfs layout into a new `sysfs_wi_group` structure.
   - Makes per-node sysfs attributes accessible to external modules.
   - Lays groundwork for future hotplug support by enabling runtime modification.

3. [PATCH 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
   - Dynamically adds/removes sysfs entries when nodes are online/offline.
   - Limits sysfs creation to nodes with memory, avoiding unusable node entries.
   - Hooks into memory hotplug notifier for runtime updates.

These patches have been tested under CXL-based memory configurations,
including hotplug scenarios, to ensure proper behavior and stability.

 mm/mempolicy.c | 230 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 150 insertions(+), 80 deletions(-)


base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
-- 
2.34.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-04-16 11:31 [PATCH v8 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
@ 2025-04-16 11:31 ` Rakie Kim
  2025-04-16 21:54   ` Dan Williams
  2025-04-16 11:31 ` [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
  2025-04-16 11:31 ` [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
  2 siblings, 1 reply; 10+ messages in thread
From: Rakie Kim @ 2025-04-16 11:31 UTC (permalink / raw)
  To: akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

Memory leaks occurred when removing sysfs attributes for weighted
interleave. Improper kobject deallocation led to unreleased memory
when initialization failed or when nodes were removed.

This patch resolves the issue by replacing unnecessary `kfree()`
calls with proper `kobject_del()` and `kobject_put()` sequences,
ensuring correct teardown and preventing memory leaks.

By explicitly calling `kobject_del()` before `kobject_put()`, the
release function is now invoked safely, and internal sysfs state is
correctly cleaned up. This guarantees that the memory associated with
the kobject is fully released and avoids resource leaks, thereby
improving system stability.

Additionally, sysfs_remove_file() is no longer called from the release
function to avoid accessing invalid sysfs state after kobject_del().
All attribute removals are now done before kobject_del(), preventing
WARN_ON() in kernfs and ensuring safe and consistent cleanup of sysfs
entries.

Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 mm/mempolicy.c | 111 +++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 50 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b28a1e6ae096..dcf03c389b51 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3463,8 +3463,8 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 static struct iw_node_attr **node_attrs;
 
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
-				  struct kobject *parent)
+static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
+				 struct kobject *parent)
 {
 	if (!node_attr)
 		return;
@@ -3473,18 +3473,41 @@ static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
 	kfree(node_attr);
 }
 
-static void sysfs_wi_release(struct kobject *wi_kobj)
+static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
 {
-	int i;
+	int nid;
 
-	for (i = 0; i < nr_node_ids; i++)
-		sysfs_wi_node_release(node_attrs[i], wi_kobj);
-	kobject_put(wi_kobj);
+	for (nid = 0; nid < nr_node_ids; nid++)
+		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
+}
+
+static void iw_table_free(void)
+{
+	u8 *old;
+
+	mutex_lock(&iw_table_lock);
+	old = rcu_dereference_protected(iw_table,
+					lockdep_is_held(&iw_table_lock));
+	if (old) {
+		rcu_assign_pointer(iw_table, NULL);
+		mutex_unlock(&iw_table_lock);
+
+		synchronize_rcu();
+		kfree(old);
+	} else
+		mutex_unlock(&iw_table_lock);
+}
+
+static void wi_kobj_release(struct kobject *wi_kobj)
+{
+	iw_table_free();
+	kfree(node_attrs);
+	kfree(wi_kobj);
 }
 
 static const struct kobj_type wi_ktype = {
 	.sysfs_ops = &kobj_sysfs_ops,
-	.release = sysfs_wi_release,
+	.release = wi_kobj_release,
 };
 
 static int add_weight_node(int nid, struct kobject *wi_kobj)
@@ -3525,41 +3548,42 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
 	struct kobject *wi_kobj;
 	int nid, err;
 
+	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+			     GFP_KERNEL);
+	if (!node_attrs)
+		return -ENOMEM;
+
 	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
-	if (!wi_kobj)
+	if (!wi_kobj) {
+		kfree(node_attrs);
 		return -ENOMEM;
+	}
 
 	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
 				   "weighted_interleave");
-	if (err) {
-		kfree(wi_kobj);
-		return err;
-	}
+	if (err)
+		goto err_put_kobj;
 
 	for_each_node_state(nid, N_POSSIBLE) {
 		err = add_weight_node(nid, wi_kobj);
 		if (err) {
 			pr_err("failed to add sysfs [node%d]\n", nid);
-			break;
+			goto err_cleanup_kobj;
 		}
 	}
-	if (err)
-		kobject_put(wi_kobj);
+
 	return 0;
+
+err_cleanup_kobj:
+	sysfs_wi_node_delete_all(wi_kobj);
+	kobject_del(wi_kobj);
+err_put_kobj:
+	kobject_put(wi_kobj);
+	return err;
 }
 
 static void mempolicy_kobj_release(struct kobject *kobj)
 {
-	u8 *old;
-
-	mutex_lock(&iw_table_lock);
-	old = rcu_dereference_protected(iw_table,
-					lockdep_is_held(&iw_table_lock));
-	rcu_assign_pointer(iw_table, NULL);
-	mutex_unlock(&iw_table_lock);
-	synchronize_rcu();
-	kfree(old);
-	kfree(node_attrs);
 	kfree(kobj);
 }
 
@@ -3573,37 +3597,24 @@ static int __init mempolicy_sysfs_init(void)
 	static struct kobject *mempolicy_kobj;
 
 	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
-	if (!mempolicy_kobj) {
-		err = -ENOMEM;
-		goto err_out;
-	}
-
-	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
-			     GFP_KERNEL);
-	if (!node_attrs) {
-		err = -ENOMEM;
-		goto mempol_out;
-	}
+	if (!mempolicy_kobj)
+		return -ENOMEM;
 
 	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
 				   "mempolicy");
 	if (err)
-		goto node_out;
+		goto err_put_kobj;
 
 	err = add_weighted_interleave_group(mempolicy_kobj);
-	if (err) {
-		pr_err("mempolicy sysfs structure failed to initialize\n");
-		kobject_put(mempolicy_kobj);
-		return err;
-	}
+	if (err)
+		goto err_del_kobj;
 
-	return err;
-node_out:
-	kfree(node_attrs);
-mempol_out:
-	kfree(mempolicy_kobj);
-err_out:
-	pr_err("failed to add mempolicy kobject to the system\n");
+	return 0;
+
+err_del_kobj:
+	kobject_del(mempolicy_kobj);
+err_put_kobj:
+	kobject_put(mempolicy_kobj);
 	return err;
 }
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
  2025-04-16 11:31 [PATCH v8 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
  2025-04-16 11:31 ` [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-04-16 11:31 ` Rakie Kim
  2025-04-16 23:07   ` Dan Williams
  2025-04-16 11:31 ` [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
  2 siblings, 1 reply; 10+ messages in thread
From: Rakie Kim @ 2025-04-16 11:31 UTC (permalink / raw)
  To: akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

Previously, the weighted interleave sysfs structure was statically
managed during initialization. This prevented new nodes from being
recognized when memory hotplug events occurred, limiting the ability
to update or extend sysfs entries dynamically at runtime.

To address this, this patch refactors the sysfs infrastructure and
encapsulates it within a new structure, `sysfs_wi_group`, which holds
both the kobject and an array of node attribute pointers.

By allocating this group structure globally, the per-node sysfs
attributes can be managed beyond initialization time, enabling
external modules to insert or remove node entries in response to
events such as memory hotplug or node online/offline transitions.

Instead of allocating all per-node sysfs attributes at once, the
initialization path now uses the existing sysfs_wi_node_add() and
sysfs_wi_node_delete() helpers. This refactoring makes it possible
to modularly manage per-node sysfs entries and ensures the
infrastructure is ready for runtime extension.

Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/mempolicy.c | 60 ++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index dcf03c389b51..998635127e9d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3419,6 +3419,13 @@ struct iw_node_attr {
 	int nid;
 };
 
+struct sysfs_wi_group {
+	struct kobject wi_kobj;
+	struct iw_node_attr *nattrs[];
+};
+
+static struct sysfs_wi_group *wi_group;
+
 static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
 			 char *buf)
 {
@@ -3461,24 +3468,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
 	return count;
 }
 
-static struct iw_node_attr **node_attrs;
-
-static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
-				 struct kobject *parent)
+static void sysfs_wi_node_delete(int nid)
 {
-	if (!node_attr)
+	if (!wi_group->nattrs[nid])
 		return;
-	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
-	kfree(node_attr->kobj_attr.attr.name);
-	kfree(node_attr);
+
+	sysfs_remove_file(&wi_group->wi_kobj,
+			  &wi_group->nattrs[nid]->kobj_attr.attr);
+	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
+	kfree(wi_group->nattrs[nid]);
 }
 
-static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
+static void sysfs_wi_node_delete_all(void)
 {
 	int nid;
 
 	for (nid = 0; nid < nr_node_ids; nid++)
-		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
+		sysfs_wi_node_delete(nid);
 }
 
 static void iw_table_free(void)
@@ -3501,8 +3507,7 @@ static void iw_table_free(void)
 static void wi_kobj_release(struct kobject *wi_kobj)
 {
 	iw_table_free();
-	kfree(node_attrs);
-	kfree(wi_kobj);
+	kfree(wi_group);
 }
 
 static const struct kobj_type wi_ktype = {
@@ -3510,7 +3515,7 @@ static const struct kobj_type wi_ktype = {
 	.release = wi_kobj_release,
 };
 
-static int add_weight_node(int nid, struct kobject *wi_kobj)
+static int sysfs_wi_node_add(int nid)
 {
 	struct iw_node_attr *node_attr;
 	char *name;
@@ -3532,40 +3537,33 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
 	node_attr->kobj_attr.store = node_store;
 	node_attr->nid = nid;
 
-	if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
+	if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
 		kfree(node_attr->kobj_attr.attr.name);
 		kfree(node_attr);
 		pr_err("failed to add attribute to weighted_interleave\n");
 		return -ENOMEM;
 	}
 
-	node_attrs[nid] = node_attr;
+	wi_group->nattrs[nid] = node_attr;
 	return 0;
 }
 
-static int add_weighted_interleave_group(struct kobject *root_kobj)
+static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 {
-	struct kobject *wi_kobj;
 	int nid, err;
 
-	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
-			     GFP_KERNEL);
-	if (!node_attrs)
-		return -ENOMEM;
-
-	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
-	if (!wi_kobj) {
-		kfree(node_attrs);
+	wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
+			   GFP_KERNEL);
+	if (!wi_group)
 		return -ENOMEM;
-	}
 
-	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+	err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
 				   "weighted_interleave");
 	if (err)
 		goto err_put_kobj;
 
 	for_each_node_state(nid, N_POSSIBLE) {
-		err = add_weight_node(nid, wi_kobj);
+		err = sysfs_wi_node_add(nid);
 		if (err) {
 			pr_err("failed to add sysfs [node%d]\n", nid);
 			goto err_cleanup_kobj;
@@ -3575,10 +3573,10 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
 	return 0;
 
 err_cleanup_kobj:
-	sysfs_wi_node_delete_all(wi_kobj);
-	kobject_del(wi_kobj);
+	sysfs_wi_node_delete_all();
+	kobject_del(&wi_group->wi_kobj);
 err_put_kobj:
-	kobject_put(wi_kobj);
+	kobject_put(&wi_group->wi_kobj);
 	return err;
 }
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-04-16 11:31 [PATCH v8 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
  2025-04-16 11:31 ` [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
  2025-04-16 11:31 ` [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
@ 2025-04-16 11:31 ` Rakie Kim
  2025-04-16 23:08   ` Dan Williams
  2 siblings, 1 reply; 10+ messages in thread
From: Rakie Kim @ 2025-04-16 11:31 UTC (permalink / raw)
  To: akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby improving memory
bandwidth utilization. The weight values for each node are configured
through sysfs.

Previously, sysfs entries for configuring weighted interleave were created
for all possible nodes (N_POSSIBLE) at initialization, including nodes that
might not have memory. However, not all nodes in N_POSSIBLE are usable at
runtime, as some may remain memoryless or offline.
This led to sysfs entries being created for unusable nodes, causing
potential misconfiguration issues.

To address this issue, this patch modifies the sysfs creation logic to:
1) Limit sysfs entries to nodes that are online and have memory, avoiding
   the creation of sysfs entries for nodes that cannot be used.
2) Support memory hotplug by dynamically adding and removing sysfs entries
   based on whether a node transitions into or out of the N_MEMORY state.

Additionally, the patch ensures that sysfs attributes are properly managed
when nodes go offline, preventing stale or redundant entries from persisting
in the system.

By making these changes, the weighted interleave policy now manages its
sysfs entries more efficiently, ensuring that only relevant nodes are
considered for interleaving, and dynamically adapting to memory hotplug
events.

Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Co-developed-by: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/mempolicy.c | 107 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 84 insertions(+), 23 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 998635127e9d..646fc9e8c8ac 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <linux/uaccess.h>
+#include <linux/memory.h>
 
 #include "internal.h"
 
@@ -3421,6 +3422,7 @@ struct iw_node_attr {
 
 struct sysfs_wi_group {
 	struct kobject wi_kobj;
+	struct mutex kobj_lock;
 	struct iw_node_attr *nattrs[];
 };
 
@@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 static void sysfs_wi_node_delete(int nid)
 {
-	if (!wi_group->nattrs[nid])
+	struct iw_node_attr *attr;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return;
+
+	mutex_lock(&wi_group->kobj_lock);
+	attr = wi_group->nattrs[nid];
+	if (!attr) {
+		mutex_unlock(&wi_group->kobj_lock);
 		return;
+	}
+
+	wi_group->nattrs[nid] = NULL;
+	mutex_unlock(&wi_group->kobj_lock);
 
-	sysfs_remove_file(&wi_group->wi_kobj,
-			  &wi_group->nattrs[nid]->kobj_attr.attr);
-	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
-	kfree(wi_group->nattrs[nid]);
+	sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
+	kfree(attr->kobj_attr.attr.name);
+	kfree(attr);
 }
 
 static void sysfs_wi_node_delete_all(void)
@@ -3517,35 +3530,77 @@ static const struct kobj_type wi_ktype = {
 
 static int sysfs_wi_node_add(int nid)
 {
-	struct iw_node_attr *node_attr;
+	int ret = 0;
 	char *name;
+	struct iw_node_attr *new_attr;
 
-	node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
-	if (!node_attr)
+	if (nid < 0 || nid >= nr_node_ids) {
+		pr_err("invalid node id: %d\n", nid);
+		return -EINVAL;
+	}
+
+	new_attr = kzalloc(sizeof(*new_attr), GFP_KERNEL);
+	if (!new_attr)
 		return -ENOMEM;
 
 	name = kasprintf(GFP_KERNEL, "node%d", nid);
 	if (!name) {
-		kfree(node_attr);
+		kfree(new_attr);
 		return -ENOMEM;
 	}
 
-	sysfs_attr_init(&node_attr->kobj_attr.attr);
-	node_attr->kobj_attr.attr.name = name;
-	node_attr->kobj_attr.attr.mode = 0644;
-	node_attr->kobj_attr.show = node_show;
-	node_attr->kobj_attr.store = node_store;
-	node_attr->nid = nid;
+	sysfs_attr_init(&new_attr->kobj_attr.attr);
+	new_attr->kobj_attr.attr.name = name;
+	new_attr->kobj_attr.attr.mode = 0644;
+	new_attr->kobj_attr.show = node_show;
+	new_attr->kobj_attr.store = node_store;
+	new_attr->nid = nid;
 
-	if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
-		kfree(node_attr->kobj_attr.attr.name);
-		kfree(node_attr);
-		pr_err("failed to add attribute to weighted_interleave\n");
-		return -ENOMEM;
+	mutex_lock(&wi_group->kobj_lock);
+	if (wi_group->nattrs[nid]) {
+		mutex_unlock(&wi_group->kobj_lock);
+		pr_info("node%d already exists\n", nid);
+		goto out;
 	}
 
-	wi_group->nattrs[nid] = node_attr;
+	ret = sysfs_create_file(&wi_group->wi_kobj, &new_attr->kobj_attr.attr);
+	if (ret) {
+		mutex_unlock(&wi_group->kobj_lock);
+		goto out;
+	}
+	wi_group->nattrs[nid] = new_attr;
+	mutex_unlock(&wi_group->kobj_lock);
 	return 0;
+
+out:
+	kfree(new_attr->kobj_attr.attr.name);
+	kfree(new_attr);
+	return ret;
+}
+
+static int wi_node_notifier(struct notifier_block *nb,
+			       unsigned long action, void *data)
+{
+	int err;
+	struct memory_notify *arg = data;
+	int nid = arg->status_change_nid;
+
+	if (nid < 0)
+		return NOTIFY_OK;
+
+	switch (action) {
+	case MEM_ONLINE:
+		err = sysfs_wi_node_add(nid);
+		if (err)
+			pr_err("failed to add sysfs for node%d during hotplug: %d\n",
+			       nid, err);
+		break;
+	case MEM_OFFLINE:
+		sysfs_wi_node_delete(nid);
+		break;
+	}
+
+	return NOTIFY_OK;
 }
 
 static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
@@ -3556,20 +3611,26 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 			   GFP_KERNEL);
 	if (!wi_group)
 		return -ENOMEM;
+	mutex_init(&wi_group->kobj_lock);
 
 	err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
 				   "weighted_interleave");
 	if (err)
 		goto err_put_kobj;
 
-	for_each_node_state(nid, N_POSSIBLE) {
+	for_each_online_node(nid) {
+		if (!node_state(nid, N_MEMORY))
+			continue;
+
 		err = sysfs_wi_node_add(nid);
 		if (err) {
-			pr_err("failed to add sysfs [node%d]\n", nid);
+			pr_err("failed to add sysfs for node%d during init: %d\n",
+			       nid, err);
 			goto err_cleanup_kobj;
 		}
 	}
 
+	hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
 	return 0;
 
 err_cleanup_kobj:
-- 
2.34.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-04-16 11:31 ` [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-04-16 21:54   ` Dan Williams
  2025-04-17  1:49     ` Rakie Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2025-04-16 21:54 UTC (permalink / raw)
  To: Rakie Kim, akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

Rakie Kim wrote:
> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
> 
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with proper `kobject_del()` and `kobject_put()` sequences,
> ensuring correct teardown and preventing memory leaks.
> 
> By explicitly calling `kobject_del()` before `kobject_put()`, the
> release function is now invoked safely, and internal sysfs state is
> correctly cleaned up. This guarantees that the memory associated with
> the kobject is fully released and avoids resource leaks, thereby
> improving system stability.
> 
> Additionally, sysfs_remove_file() is no longer called from the release
> function to avoid accessing invalid sysfs state after kobject_del().
> All attribute removals are now done before kobject_del(), preventing
> WARN_ON() in kernfs and ensuring safe and consistent cleanup of sysfs
> entries.
> 
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  mm/mempolicy.c | 111 +++++++++++++++++++++++++++----------------------
>  1 file changed, 61 insertions(+), 50 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index b28a1e6ae096..dcf03c389b51 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3463,8 +3463,8 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>  
>  static struct iw_node_attr **node_attrs;
>  
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> -				  struct kobject *parent)
> +static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
> +				 struct kobject *parent)
>  {
>  	if (!node_attr)
>  		return;
> @@ -3473,18 +3473,41 @@ static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
>  	kfree(node_attr);
>  }
>  
> -static void sysfs_wi_release(struct kobject *wi_kobj)
> +static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
>  {
> -	int i;
> +	int nid;
>  
> -	for (i = 0; i < nr_node_ids; i++)
> -		sysfs_wi_node_release(node_attrs[i], wi_kobj);
> -	kobject_put(wi_kobj);
> +	for (nid = 0; nid < nr_node_ids; nid++)
> +		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
> +}
> +
> +static void iw_table_free(void)
> +{
> +	u8 *old;
> +
> +	mutex_lock(&iw_table_lock);
> +	old = rcu_dereference_protected(iw_table,
> +					lockdep_is_held(&iw_table_lock));
> +	if (old) {
> +		rcu_assign_pointer(iw_table, NULL);
> +		mutex_unlock(&iw_table_lock);
> +
> +		synchronize_rcu();
> +		kfree(old);
> +	} else
> +		mutex_unlock(&iw_table_lock);

This looks correct. I personally would not have spent the effort to
avoid the synchronize_rcu() because this is an error path that rarely
gets triggered, and kfree(NULL) is already a nop, so no pressing need to be
careful there either:

	mutex_lock(&iw_table_lock);
	old = rcu_dereference_protected(iw_table,
					lockdep_is_held(&iw_table_lock));
	rcu_assign_pointer(iw_table, NULL);
	mutex_unlock(&iw_table_lock);
	synchronize_rcu();
	kfree(old);

> +}
> +
> +static void wi_kobj_release(struct kobject *wi_kobj)
> +{
> +	iw_table_free();

This memory can be freed as soon as node_attrs have been deleted. By
waiting until final wi_kobj release it confuses the lifetime rules.

> +	kfree(node_attrs);

This memory too can be freed as soon as the attributes are deleted.

...the rationale for considering these additional cleanups below:

> +	kfree(wi_kobj);
>  }
>  
>  static const struct kobj_type wi_ktype = {
>  	.sysfs_ops = &kobj_sysfs_ops,
> -	.release = sysfs_wi_release,
> +	.release = wi_kobj_release,
>  };
>  
>  static int add_weight_node(int nid, struct kobject *wi_kobj)
> @@ -3525,41 +3548,42 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
>  	struct kobject *wi_kobj;
>  	int nid, err;
>  
> +	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> +			     GFP_KERNEL);
> +	if (!node_attrs)
> +		return -ENOMEM;
> +
>  	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> -	if (!wi_kobj)
> +	if (!wi_kobj) {
> +		kfree(node_attrs);
>  		return -ENOMEM;
> +	}
>  
>  	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
>  				   "weighted_interleave");

If you fix wi_kobj_release() to stop being responsible to free memory
that should have been handled in the delete path (@node_attrs,
iw_table_free()), then you can also drop the wi_ktype and
wi_kobj_release() callback altogether.

I.e. once releasing @wi_kobj is just "kfree(wi_kobj)", then this
sequence:

	wi_kobj = kzalloc(...)
	kobject_init_and_add(wi_kob, &wi_ktype, ...)

Can simply become:

	wi_kobj = kobject_create_and_add("weighted_interleave", root_kobj);

> -	if (err) {
> -		kfree(wi_kobj);
> -		return err;
> -	}
> +	if (err)
> +		goto err_put_kobj;
>  
>  	for_each_node_state(nid, N_POSSIBLE) {
>  		err = add_weight_node(nid, wi_kobj);
>  		if (err) {
>  			pr_err("failed to add sysfs [node%d]\n", nid);
> -			break;
> +			goto err_cleanup_kobj;
>  		}
>  	}
> -	if (err)
> -		kobject_put(wi_kobj);
> +
>  	return 0;
> +
> +err_cleanup_kobj:
> +	sysfs_wi_node_delete_all(wi_kobj);
> +	kobject_del(wi_kobj);
> +err_put_kobj:
> +	kobject_put(wi_kobj);
> +	return err;
>  }
>  
>  static void mempolicy_kobj_release(struct kobject *kobj)
>  {
> -	u8 *old;
> -
> -	mutex_lock(&iw_table_lock);
> -	old = rcu_dereference_protected(iw_table,
> -					lockdep_is_held(&iw_table_lock));
> -	rcu_assign_pointer(iw_table, NULL);
> -	mutex_unlock(&iw_table_lock);
> -	synchronize_rcu();
> -	kfree(old);
> -	kfree(node_attrs);
>  	kfree(kobj);
>  }
>  
> @@ -3573,37 +3597,24 @@ static int __init mempolicy_sysfs_init(void)
>  	static struct kobject *mempolicy_kobj;
>  
>  	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> -	if (!mempolicy_kobj) {
> -		err = -ENOMEM;
> -		goto err_out;
> -	}
> -
> -	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> -			     GFP_KERNEL);
> -	if (!node_attrs) {
> -		err = -ENOMEM;
> -		goto mempol_out;
> -	}
> +	if (!mempolicy_kobj)
> +		return -ENOMEM;
>  
>  	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
>  				   "mempolicy");

Similar comment as above, now that mempolicy_kobj_release() is simply
kfree(@kobj), you can use kobject_create_and_add() and make this all
that much simpler.

So the patch looks technically correct as is, but if you make those
final cleanups I will add my review tag.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
  2025-04-16 11:31 ` [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
@ 2025-04-16 23:07   ` Dan Williams
  2025-04-17  1:55     ` Rakie Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2025-04-16 23:07 UTC (permalink / raw)
  To: Rakie Kim, akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

Rakie Kim wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed during initialization. This prevented new nodes from being
> recognized when memory hotplug events occurred, limiting the ability
> to update or extend sysfs entries dynamically at runtime.
> 
> To address this, this patch refactors the sysfs infrastructure and
> encapsulates it within a new structure, `sysfs_wi_group`, which holds
> both the kobject and an array of node attribute pointers.
> 
> By allocating this group structure globally, the per-node sysfs
> attributes can be managed beyond initialization time, enabling
> external modules to insert or remove node entries in response to
> events such as memory hotplug or node online/offline transitions.
> 
> Instead of allocating all per-node sysfs attributes at once, the
> initialization path now uses the existing sysfs_wi_node_add() and
> sysfs_wi_node_delete() helpers. This refactoring makes it possible
> to modularly manage per-node sysfs entries and ensures the
> infrastructure is ready for runtime extension.
> 
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> ---
>  mm/mempolicy.c | 60 ++++++++++++++++++++++++--------------------------
>  1 file changed, 29 insertions(+), 31 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index dcf03c389b51..998635127e9d 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3419,6 +3419,13 @@ struct iw_node_attr {
>  	int nid;
>  };
>  
> +struct sysfs_wi_group {
> +	struct kobject wi_kobj;
> +	struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *wi_group;
> +
>  static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
>  			 char *buf)
>  {
> @@ -3461,24 +3468,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>  	return count;
>  }
>  
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
> -				 struct kobject *parent)
> +static void sysfs_wi_node_delete(int nid)
>  {
> -	if (!node_attr)
> +	if (!wi_group->nattrs[nid])
>  		return;
> -	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> -	kfree(node_attr->kobj_attr.attr.name);
> -	kfree(node_attr);
> +
> +	sysfs_remove_file(&wi_group->wi_kobj,
> +			  &wi_group->nattrs[nid]->kobj_attr.attr);
> +	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> +	kfree(wi_group->nattrs[nid]);
>  }
>  
> -static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
> +static void sysfs_wi_node_delete_all(void)
>  {
>  	int nid;
>  
>  	for (nid = 0; nid < nr_node_ids; nid++)
> -		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
> +		sysfs_wi_node_delete(nid);
>  }
>  
>  static void iw_table_free(void)
> @@ -3501,8 +3507,7 @@ static void iw_table_free(void)
>  static void wi_kobj_release(struct kobject *wi_kobj)
>  {
>  	iw_table_free();
> -	kfree(node_attrs);
> -	kfree(wi_kobj);
> +	kfree(wi_group);

Ah, look just one more iw_table_free() deletion to be able to switch to
kobject_create_and_add() flow.

For what this patch is though you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
  2025-04-16 11:31 ` [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
@ 2025-04-16 23:08   ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2025-04-16 23:08 UTC (permalink / raw)
  To: Rakie Kim, akpm
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
	kernel_team, honggyu.kim, yunjeong.mun, rakie.kim

Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
> 
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
> 
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>    the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>    based on whether a node transitions into or out of the N_MEMORY state.
> 
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
> 
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
> 
> Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Co-developed-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> Acked-by: David Hildenbrand <david@redhat.com>

Looks good.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-04-16 21:54   ` Dan Williams
@ 2025-04-17  1:49     ` Rakie Kim
  2025-04-17  3:23       ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Rakie Kim @ 2025-04-17  1:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim, akpm

On Wed, 16 Apr 2025 14:54:16 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Rakie Kim wrote:
> > +
> > +static void iw_table_free(void)
> > +{
> > +	u8 *old;
> > +
> > +	mutex_lock(&iw_table_lock);
> > +	old = rcu_dereference_protected(iw_table,
> > +					lockdep_is_held(&iw_table_lock));
> > +	if (old) {
> > +		rcu_assign_pointer(iw_table, NULL);
> > +		mutex_unlock(&iw_table_lock);
> > +
> > +		synchronize_rcu();
> > +		kfree(old);
> > +	} else
> > +		mutex_unlock(&iw_table_lock);
> 
> This looks correct. I personally would not have spent the effort to
> avoid the synchronize_rcu() because this is an error path that rarely
> gets triggered, and kfree(NULL) is already a nop, so no pressing need to be
> careful there either:
> 
> 	mutex_lock(&iw_table_lock);
> 	old = rcu_dereference_protected(iw_table,
> 					lockdep_is_held(&iw_table_lock));
> 	rcu_assign_pointer(iw_table, NULL);
> 	mutex_unlock(&iw_table_lock);
> 	synchronize_rcu();
> 	kfree(old);

I will modify the structure as you suggested.

> 
> > +}
> > +
> > +static void wi_kobj_release(struct kobject *wi_kobj)
> > +{
> > +	iw_table_free();
> 
> This memory can be freed as soon as node_attrs have been deleted. By
> waiting until final wi_kobj release it confuses the lifetime rules.
> 
> > +	kfree(node_attrs);
> 
> This memory too can be freed as soon as the attributes are deleted.
> 
> ...the rationale for considering these additional cleanups below:
> 

I created a new function named wi_cleanup(), as you proposed.
static void wi_cleanup(struct kobject *wi_kobj) {
	sysfs_wi_node_delete_all(wi_kobj);
	iw_table_free();
	kfree(node_attrs);
}
And I changed the error handling code to call this function.
static int add_weighted_interleave_group(struct kobject *root_kobj)
{
...
err_cleanup_kobj:
	wi_cleanup(wi_kobj);
	kobject_del(wi_kobj);
err_put_kobj:
	kobject_put(wi_kobj);
	return err;
}

However, I have one question.
With this change, iw_table and node_attrs will not be freed at system
shutdown. Is it acceptable to leave this memory unfreed on shutdown?
(As you previously noted, the sysfs files in this patch are also
not removed during system shutdown.)

> > +	kfree(wi_kobj);
> >  }
> >  
> >  static const struct kobj_type wi_ktype = {
> >  	.sysfs_ops = &kobj_sysfs_ops,
> > -	.release = sysfs_wi_release,
> > +	.release = wi_kobj_release,
> >  };
> >  
> >  static int add_weight_node(int nid, struct kobject *wi_kobj)
> > @@ -3525,41 +3548,42 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> >  	struct kobject *wi_kobj;
> >  	int nid, err;
> >  
> > +	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > +			     GFP_KERNEL);
> > +	if (!node_attrs)
> > +		return -ENOMEM;
> > +
> >  	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > -	if (!wi_kobj)
> > +	if (!wi_kobj) {
> > +		kfree(node_attrs);
> >  		return -ENOMEM;
> > +	}
> >  
> >  	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> >  				   "weighted_interleave");
> 
> If you fix wi_kobj_release() to stop being responsible to free memory
> that should have been handled in the delete path (@node_attrs,
> iw_table_free()), then you can also drop the wi_ktype and
> wi_kobj_release() callback altogether.

I understand your suggestion about simplifying the kobject
handling.
If we only consider Patch1, then replacing kobject_init_and_add
with kobject_create_and_add would be the right choice.

However, in Patch2, the code changes as follows:
struct sysfs_wi_group {
	struct kobject wi_kobj;
	struct iw_node_attr *nattrs[];
};
static struct sysfs_wi_group *wi_group;
...
static void wi_kobj_release(struct kobject *wi_kobj)
{
	kfree(wi_group);
}
...
static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
{
	int nid, err;

	wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
			   GFP_KERNEL);

In this case, wi_kobj_release() is responsible for freeing the
container struct wi_group that includes the kobject.
Therefore, it seems more appropriate to use kobject_init_and_add
in this context.
I would appreciate your thoughts on this.

> 
> I.e. once releasing @wi_kobj is just "kfree(wi_kobj)", then this
> sequence:
> 
> 	wi_kobj = kzalloc(...)
> 	kobject_init_and_add(wi_kob, &wi_ktype, ...)
> 
> Can simply become:
> 
> 	wi_kobj = kobject_create_and_add("weighted_interleave", root_kobj);
> 
> > -	if (err) {
> > -		kfree(wi_kobj);
> > -		return err;
> > -	}
> > +	if (err)
> > +		goto err_put_kobj;
> >  
> >  	for_each_node_state(nid, N_POSSIBLE) {
> >  		err = add_weight_node(nid, wi_kobj);
> >  		if (err) {
> >  			pr_err("failed to add sysfs [node%d]\n", nid);
> > -			break;
> > +			goto err_cleanup_kobj;
> >  		}
> >  	}
> > -	if (err)
> > -		kobject_put(wi_kobj);
> > +
> >  	return 0;
> > +
> > +err_cleanup_kobj:
> > +	sysfs_wi_node_delete_all(wi_kobj);
> > +	kobject_del(wi_kobj);
> > +err_put_kobj:
> > +	kobject_put(wi_kobj);
> > +	return err;
> >  }
> >  
> >  static void mempolicy_kobj_release(struct kobject *kobj)
> >  {
> > -	u8 *old;
> > -
> > -	mutex_lock(&iw_table_lock);
> > -	old = rcu_dereference_protected(iw_table,
> > -					lockdep_is_held(&iw_table_lock));
> > -	rcu_assign_pointer(iw_table, NULL);
> > -	mutex_unlock(&iw_table_lock);
> > -	synchronize_rcu();
> > -	kfree(old);
> > -	kfree(node_attrs);
> >  	kfree(kobj);
> >  }
> >  
> > @@ -3573,37 +3597,24 @@ static int __init mempolicy_sysfs_init(void)
> >  	static struct kobject *mempolicy_kobj;
> >  
> >  	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > -	if (!mempolicy_kobj) {
> > -		err = -ENOMEM;
> > -		goto err_out;
> > -	}
> > -
> > -	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > -			     GFP_KERNEL);
> > -	if (!node_attrs) {
> > -		err = -ENOMEM;
> > -		goto mempol_out;
> > -	}
> > +	if (!mempolicy_kobj)
> > +		return -ENOMEM;
> >  
> >  	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> >  				   "mempolicy");
> 
> Similar comment as above, now that mempolicy_kobj_release() is simply
> kfree(@kobj), you can use kobject_create_and_add() and make this all
> that much simpler.

For the mempolicy_kobj, I will update the code as you suggested
and use kobject_create_and_add().

With all your recommendations applied, Patch1 would now look like this:
@@ -3488,20 +3488,21 @@ static void iw_table_free(void)
        mutex_lock(&iw_table_lock);
        old = rcu_dereference_protected(iw_table,
                                        lockdep_is_held(&iw_table_lock));
-       if (old) {
-               rcu_assign_pointer(iw_table, NULL);
-               mutex_unlock(&iw_table_lock);
+       rcu_assign_pointer(iw_table, NULL);
+       mutex_unlock(&iw_table_lock);

-               synchronize_rcu();
-               kfree(old);
-       } else
-               mutex_unlock(&iw_table_lock);
+       synchronize_rcu();
+       kfree(old);
 }

-static void wi_kobj_release(struct kobject *wi_kobj)
-{
+static void wi_cleanup(struct kobject *wi_kobj) {
+       sysfs_wi_node_delete_all(wi_kobj);
        iw_table_free();
        kfree(node_attrs);
+}
+
+static void wi_kobj_release(struct kobject *wi_kobj)
+{
        kfree(wi_kobj);
 }

@@ -3575,45 +3576,30 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
        return 0;

 err_cleanup_kobj:
-       sysfs_wi_node_delete_all(wi_kobj);
+       wi_cleanup(wi_kobj);
        kobject_del(wi_kobj);
 err_put_kobj:
        kobject_put(wi_kobj);
        return err;
 }

-static void mempolicy_kobj_release(struct kobject *kobj)
-{
-       kfree(kobj);
-}
-
-static const struct kobj_type mempolicy_ktype = {
-       .release = mempolicy_kobj_release
-};
-
 static int __init mempolicy_sysfs_init(void)
 {
        int err;
        static struct kobject *mempolicy_kobj;

-       mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
+       mempolicy_kobj = kobject_create_and_add("mempolicy", mm_kobj);
        if (!mempolicy_kobj)
                return -ENOMEM;

-       err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
-                                  "mempolicy");
-       if (err)
-               goto err_put_kobj;
-
        err = add_weighted_interleave_group(mempolicy_kobj);
        if (err)
-               goto err_del_kobj;
+               goto err_kobj;

        return 0;

-err_del_kobj:
+err_kobj:
        kobject_del(mempolicy_kobj);
-err_put_kobj:
        kobject_put(mempolicy_kobj);
        return err;
 }

Rakie

> 
> So the patch looks technically correct as is, but if you make those
> final cleanups I will add my review tag.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
  2025-04-16 23:07   ` Dan Williams
@ 2025-04-17  1:55     ` Rakie Kim
  0 siblings, 0 replies; 10+ messages in thread
From: Rakie Kim @ 2025-04-17  1:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim, akpm

On Wed, 16 Apr 2025 16:07:18 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Rakie Kim wrote:
> > Previously, the weighted interleave sysfs structure was statically
> > managed during initialization. This prevented new nodes from being
> > recognized when memory hotplug events occurred, limiting the ability
> > to update or extend sysfs entries dynamically at runtime.
> > 
> > To address this, this patch refactors the sysfs infrastructure and
> > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > both the kobject and an array of node attribute pointers.
> > 
> > By allocating this group structure globally, the per-node sysfs
> > attributes can be managed beyond initialization time, enabling
> > external modules to insert or remove node entries in response to
> > events such as memory hotplug or node online/offline transitions.
> > 
> > Instead of allocating all per-node sysfs attributes at once, the
> > initialization path now uses the existing sysfs_wi_node_add() and
> > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > to modularly manage per-node sysfs entries and ensures the
> > infrastructure is ready for runtime extension.
> > 
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > Reviewed-by: Gregory Price <gourry@gourry.net>
> > Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > ---
> >  mm/mempolicy.c | 60 ++++++++++++++++++++++++--------------------------
> >  1 file changed, 29 insertions(+), 31 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index dcf03c389b51..998635127e9d 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> >  	int nid;
> >  };
> >  
> > +struct sysfs_wi_group {
> > +	struct kobject wi_kobj;
> > +	struct iw_node_attr *nattrs[];
> > +};
> > +
> > +static struct sysfs_wi_group *wi_group;
> > +
> >  static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> >  			 char *buf)
> >  {
> > @@ -3461,24 +3468,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> >  	return count;
> >  }
> >  
> > -static struct iw_node_attr **node_attrs;
> > -
> > -static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
> > -				 struct kobject *parent)
> > +static void sysfs_wi_node_delete(int nid)
> >  {
> > -	if (!node_attr)
> > +	if (!wi_group->nattrs[nid])
> >  		return;
> > -	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > -	kfree(node_attr->kobj_attr.attr.name);
> > -	kfree(node_attr);
> > +
> > +	sysfs_remove_file(&wi_group->wi_kobj,
> > +			  &wi_group->nattrs[nid]->kobj_attr.attr);
> > +	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> > +	kfree(wi_group->nattrs[nid]);
> >  }
> >  
> > -static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
> > +static void sysfs_wi_node_delete_all(void)
> >  {
> >  	int nid;
> >  
> >  	for (nid = 0; nid < nr_node_ids; nid++)
> > -		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
> > +		sysfs_wi_node_delete(nid);
> >  }
> >  
> >  static void iw_table_free(void)
> > @@ -3501,8 +3507,7 @@ static void iw_table_free(void)
> >  static void wi_kobj_release(struct kobject *wi_kobj)
> >  {
> >  	iw_table_free();
> > -	kfree(node_attrs);
> > -	kfree(wi_kobj);
> > +	kfree(wi_group);
> 
> Ah, look just one more iw_table_free() deletion to be able to switch to
> kobject_create_and_add() flow.
> 
> For what this patch is though you can add:
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Thank you for the suggestion.
I have already removed the iw_table_free() and kfree(node_attrs) from
the release path in Patch1 as part of wi_cleanup().

Rakie



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
  2025-04-17  1:49     ` Rakie Kim
@ 2025-04-17  3:23       ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2025-04-17  3:23 UTC (permalink / raw)
  To: Rakie Kim, Dan Williams
  Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
	ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
	honggyu.kim, yunjeong.mun, rakie.kim, akpm

Rakie Kim wrote:
> On Wed, 16 Apr 2025 14:54:16 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> > Rakie Kim wrote:
> > > +
> > > +static void iw_table_free(void)
> > > +{
> > > +	u8 *old;
> > > +
> > > +	mutex_lock(&iw_table_lock);
> > > +	old = rcu_dereference_protected(iw_table,
> > > +					lockdep_is_held(&iw_table_lock));
> > > +	if (old) {
> > > +		rcu_assign_pointer(iw_table, NULL);
> > > +		mutex_unlock(&iw_table_lock);
> > > +
> > > +		synchronize_rcu();
> > > +		kfree(old);
> > > +	} else
> > > +		mutex_unlock(&iw_table_lock);
> > 
> > This looks correct. I personally would not have spent the effort to
> > avoid the synchronize_rcu() because this is an error path that rarely
> > gets triggered, and kfree(NULL) is already a nop, so no pressing need to be
> > careful there either:
> > 
> > 	mutex_lock(&iw_table_lock);
> > 	old = rcu_dereference_protected(iw_table,
> > 					lockdep_is_held(&iw_table_lock));
> > 	rcu_assign_pointer(iw_table, NULL);
> > 	mutex_unlock(&iw_table_lock);
> > 	synchronize_rcu();
> > 	kfree(old);
> 
> I will modify the structure as you suggested.
> 
> > 
> > > +}
> > > +
> > > +static void wi_kobj_release(struct kobject *wi_kobj)
> > > +{
> > > +	iw_table_free();
> > 
> > This memory can be freed as soon as node_attrs have been deleted. By
> > waiting until final wi_kobj release it confuses the lifetime rules.
> > 
> > > +	kfree(node_attrs);
> > 
> > This memory too can be freed as soon as the attributes are deleted.
> > 
> > ...the rationale for considering these additional cleanups below:
> > 
> 
> I created a new function named wi_cleanup(), as you proposed.
> static void wi_cleanup(struct kobject *wi_kobj) {
> 	sysfs_wi_node_delete_all(wi_kobj);
> 	iw_table_free();
> 	kfree(node_attrs);

Looks good.

> }
> And I changed the error handling code to call this function.
> static int add_weighted_interleave_group(struct kobject *root_kobj)
> {
> ...
> err_cleanup_kobj:
> 	wi_cleanup(wi_kobj);
> 	kobject_del(wi_kobj);
> err_put_kobj:
> 	kobject_put(wi_kobj);
> 	return err;
> }
> 
> However, I have one question.
> With this change, iw_table and node_attrs will not be freed at system
> shutdown. Is it acceptable to leave this memory unfreed on shutdown?
> (As you previously noted, the sysfs files in this patch are also
> not removed during system shutdown.)

Yes, and note that most drivers do not implement a ->shutdown() handler
which means most drivers leak allocations from ->probe() when the system
is shut down.

> > > +	kfree(wi_kobj);
> > >  }
> > >  
> > >  static const struct kobj_type wi_ktype = {
> > >  	.sysfs_ops = &kobj_sysfs_ops,
> > > -	.release = sysfs_wi_release,
> > > +	.release = wi_kobj_release,
> > >  };
> > >  
> > >  static int add_weight_node(int nid, struct kobject *wi_kobj)
> > > @@ -3525,41 +3548,42 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > >  	struct kobject *wi_kobj;
> > >  	int nid, err;
> > >  
> > > +	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > > +			     GFP_KERNEL);
> > > +	if (!node_attrs)
> > > +		return -ENOMEM;
> > > +
> > >  	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > > -	if (!wi_kobj)
> > > +	if (!wi_kobj) {
> > > +		kfree(node_attrs);
> > >  		return -ENOMEM;
> > > +	}
> > >  
> > >  	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> > >  				   "weighted_interleave");
> > 
> > If you fix wi_kobj_release() to stop being responsible to free memory
> > that should have been handled in the delete path (@node_attrs,
> > iw_table_free()), then you can also drop the wi_ktype and
> > wi_kobj_release() callback altogether.
> 
> I understand your suggestion about simplifying the kobject
> handling.
> If we only consider Patch1, then replacing kobject_init_and_add
> with kobject_create_and_add would be the right choice.
> 
> However, in Patch2, the code changes as follows:
> struct sysfs_wi_group {
> 	struct kobject wi_kobj;
> 	struct iw_node_attr *nattrs[];
> };
> static struct sysfs_wi_group *wi_group;
> ...
> static void wi_kobj_release(struct kobject *wi_kobj)
> {
> 	kfree(wi_group);
> }
> ...
> static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> {
> 	int nid, err;
> 
> 	wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
> 			   GFP_KERNEL);
> 
> In this case, wi_kobj_release() is responsible for freeing the
> container struct wi_group that includes the kobject.
> Therefore, it seems more appropriate to use kobject_init_and_add
> in this context.

Ah, ok, I agree with you.

> I would appreciate your thoughts on this.
> 
> > 
> > I.e. once releasing @wi_kobj is just "kfree(wi_kobj)", then this
> > sequence:
> > 
> > 	wi_kobj = kzalloc(...)
> > 	kobject_init_and_add(wi_kob, &wi_ktype, ...)
> > 
> > Can simply become:
> > 
> > 	wi_kobj = kobject_create_and_add("weighted_interleave", root_kobj);
> > 
> > > -	if (err) {
> > > -		kfree(wi_kobj);
> > > -		return err;
> > > -	}
> > > +	if (err)
> > > +		goto err_put_kobj;
> > >  
> > >  	for_each_node_state(nid, N_POSSIBLE) {
> > >  		err = add_weight_node(nid, wi_kobj);
> > >  		if (err) {
> > >  			pr_err("failed to add sysfs [node%d]\n", nid);
> > > -			break;
> > > +			goto err_cleanup_kobj;
> > >  		}
> > >  	}
> > > -	if (err)
> > > -		kobject_put(wi_kobj);
> > > +
> > >  	return 0;
> > > +
> > > +err_cleanup_kobj:
> > > +	sysfs_wi_node_delete_all(wi_kobj);
> > > +	kobject_del(wi_kobj);
> > > +err_put_kobj:
> > > +	kobject_put(wi_kobj);
> > > +	return err;
> > >  }
> > >  
> > >  static void mempolicy_kobj_release(struct kobject *kobj)
> > >  {
> > > -	u8 *old;
> > > -
> > > -	mutex_lock(&iw_table_lock);
> > > -	old = rcu_dereference_protected(iw_table,
> > > -					lockdep_is_held(&iw_table_lock));
> > > -	rcu_assign_pointer(iw_table, NULL);
> > > -	mutex_unlock(&iw_table_lock);
> > > -	synchronize_rcu();
> > > -	kfree(old);
> > > -	kfree(node_attrs);
> > >  	kfree(kobj);
> > >  }
> > >  
> > > @@ -3573,37 +3597,24 @@ static int __init mempolicy_sysfs_init(void)
> > >  	static struct kobject *mempolicy_kobj;
> > >  
> > >  	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > > -	if (!mempolicy_kobj) {
> > > -		err = -ENOMEM;
> > > -		goto err_out;
> > > -	}
> > > -
> > > -	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > > -			     GFP_KERNEL);
> > > -	if (!node_attrs) {
> > > -		err = -ENOMEM;
> > > -		goto mempol_out;
> > > -	}
> > > +	if (!mempolicy_kobj)
> > > +		return -ENOMEM;
> > >  
> > >  	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> > >  				   "mempolicy");
> > 
> > Similar comment as above, now that mempolicy_kobj_release() is simply
> > kfree(@kobj), you can use kobject_create_and_add() and make this all
> > that much simpler.
> 
> For the mempolicy_kobj, I will update the code as you suggested
> and use kobject_create_and_add().
> 
> With all your recommendations applied, Patch1 would now look like this:
[..]

Changes look good.

With those changes you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Thanks for all the patience with this!


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-04-17  3:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-16 11:31 [PATCH v8 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-04-16 11:31 ` [PATCH v8 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-04-16 21:54   ` Dan Williams
2025-04-17  1:49     ` Rakie Kim
2025-04-17  3:23       ` Dan Williams
2025-04-16 11:31 ` [PATCH v8 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
2025-04-16 23:07   ` Dan Williams
2025-04-17  1:55     ` Rakie Kim
2025-04-16 11:31 ` [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-04-16 23:08   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox