* [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-12 7:56 [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
@ 2025-03-12 7:56 ` Rakie Kim
2025-03-12 16:03 ` Gregory Price
2025-03-12 7:56 ` [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
` (2 subsequent siblings)
3 siblings, 1 reply; 27+ messages in thread
From: Rakie Kim @ 2025-03-12 7:56 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values for each node are configured
through sysfs.
Previously, the sysfs entries for configuring weighted interleave were only
created during initialization. This approach had several limitations:
- Sysfs entries were generated for all possible nodes at boot time,
including nodes without memory, leading to unnecessary sysfs creation.
- Some memory devices transition to an online state after initialization,
but the existing implementation failed to create sysfs entries for
these dynamically added nodes. As a result, memory hotplugged nodes
were not properly recognized by the weighed interleave mechanism.
To resolve these issues, this patch introduces two key improvements:
1) At initialization, only nodes that are online and have memory are
recognized, preventing the creation of unnecessary sysfs entries.
2) Nodes that become available after initialization are dynamically
detected and integrated through the memory hotplug mechanism.
With this enhancement, the weighted interleave policy now properly supports
memory hotplug, ensuring that newly added nodes are recognized and sysfs
entries are created accordingly.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 5 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1691748badb2..94efff89e0be 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include <linux/uaccess.h>
+#include <linux/memory.h>
#include "internal.h"
@@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
return 0;
}
+struct kobject *wi_kobj;
+
+static int wi_node_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ int err;
+ struct memory_notify *arg = data;
+ int nid = arg->status_change_nid;
+
+ if (nid < 0)
+ goto notifier_end;
+
+ switch(action) {
+ case MEM_ONLINE:
+ err = add_weight_node(nid, wi_kobj);
+ if (err) {
+ pr_err("failed to add sysfs [node%d]\n", nid);
+ kobject_put(wi_kobj);
+ return NOTIFY_BAD;
+ }
+ break;
+ case MEM_OFFLINE:
+ sysfs_wi_node_release(node_attrs[nid], wi_kobj);
+ break;
+ }
+
+notifier_end:
+ return NOTIFY_OK;
+}
+
static int add_weighted_interleave_group(struct kobject *root_kobj)
{
- struct kobject *wi_kobj;
int nid, err;
wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
@@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
return err;
}
- for_each_node_state(nid, N_POSSIBLE) {
+ for_each_online_node(nid) {
+ if (!node_state(nid, N_MEMORY))
+ continue;
+
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- break;
+ goto err_out;
}
}
- if (err)
- kobject_put(wi_kobj);
+
+ hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
return 0;
+
+err_out:
+ kobject_put(wi_kobj);
+ return err;
}
static void mempolicy_kobj_release(struct kobject *kobj)
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-12 7:56 ` [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
@ 2025-03-12 16:03 ` Gregory Price
2025-03-13 6:33 ` Rakie Kim
0 siblings, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-12 16:03 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby optimizing memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, the sysfs entries for configuring weighted interleave were only
> created during initialization. This approach had several limitations:
> - Sysfs entries were generated for all possible nodes at boot time,
> including nodes without memory, leading to unnecessary sysfs creation.
It's not that it's unnecessary, it's that it allowed for configuration
of nodes which may not have memory now but may have memory in the
future. This was not well documented.
> - Some memory devices transition to an online state after initialization,
> but the existing implementation failed to create sysfs entries for
> these dynamically added nodes. As a result, memory hotplugged nodes
> were not properly recognized by the weighed interleave mechanism.
>
The current system creates 1 node per N_POSSIBLE nodes, and since nodes
can't transition between possible and not-possible your claims here are
contradictory.
I think you mean that simply switching from N_POSSIBLE to N_MEMORY is
insufficient since nodes may transition in and out of the N_MEMORY
state. Therefore this patch utilizes a hotplug callback to add and
remove sysfs entries based on whether a node is in the N_MEMORY set.
> To resolve these issues, this patch introduces two key improvements:
> 1) At initialization, only nodes that are online and have memory are
> recognized, preventing the creation of unnecessary sysfs entries.
> 2) Nodes that become available after initialization are dynamically
> detected and integrated through the memory hotplug mechanism.
>
> With this enhancement, the weighted interleave policy now properly supports
> memory hotplug, ensuring that newly added nodes are recognized and sysfs
> entries are created accordingly.
>
It doesn't "support memory hotplug" so much as it "Minimizes weighted
interleave to exclude memoryless nodes".
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
> mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1691748badb2..94efff89e0be 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -113,6 +113,7 @@
> #include <asm/tlbflush.h>
> #include <asm/tlb.h>
> #include <linux/uaccess.h>
> +#include <linux/memory.h>
>
> #include "internal.h"
>
> @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> return 0;
> }
>
> +struct kobject *wi_kobj;
> +
> +static int wi_node_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + int err;
> + struct memory_notify *arg = data;
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
> +
> + switch(action) {
> + case MEM_ONLINE:
> + err = add_weight_node(nid, wi_kobj);
> + if (err) {
> + pr_err("failed to add sysfs [node%d]\n", nid);
> + kobject_put(wi_kobj);
> + return NOTIFY_BAD;
> + }
> + break;
> + case MEM_OFFLINE:
> + sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> + break;
> + }
I'm fairly certain this logic is wrong. If I add two memory blocks and
then remove one, would this logic not remove the sysfs entries despite
there being a block remaining?
> +
> +notifier_end:
> + return NOTIFY_OK;
> +}
> +
> static int add_weighted_interleave_group(struct kobject *root_kobj)
> {
> - struct kobject *wi_kobj;
> int nid, err;
>
> wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> return err;
> }
>
> - for_each_node_state(nid, N_POSSIBLE) {
> + for_each_online_node(nid) {
> + if (!node_state(nid, N_MEMORY))
Rather than online node, why not just add for each N_MEMORY node -
regardless of if its memory is online or not? If the memory is offline,
then it will be excluded from the weighted interleave mechanism by
nature of the node being invalid for allocations anyway.
> + continue;
> +
> err = add_weight_node(nid, wi_kobj);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> - break;
> + goto err_out;
> }
> }
> - if (err)
> - kobject_put(wi_kobj);
> +
> + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
> return 0;
> +
> +err_out:
> + kobject_put(wi_kobj);
> + return err;
> }
>
> static void mempolicy_kobj_release(struct kobject *kobj)
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-12 16:03 ` Gregory Price
@ 2025-03-13 6:33 ` Rakie Kim
2025-03-13 16:23 ` Gregory Price
0 siblings, 1 reply; 27+ messages in thread
From: Rakie Kim @ 2025-03-13 6:33 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Wed, 12 Mar 2025 12:03:01 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote:
> > The weighted interleave policy distributes page allocations across multiple
> > NUMA nodes based on their performance weight, thereby optimizing memory
> > bandwidth utilization. The weight values for each node are configured
> > through sysfs.
> >
> > Previously, the sysfs entries for configuring weighted interleave were only
> > created during initialization. This approach had several limitations:
> > - Sysfs entries were generated for all possible nodes at boot time,
> > including nodes without memory, leading to unnecessary sysfs creation.
>
> It's not that it's unnecessary, it's that it allowed for configuration
> of nodes which may not have memory now but may have memory in the
> future. This was not well documented.
I will update the commit message to reflect your feedback.
>
> > - Some memory devices transition to an online state after initialization,
> > but the existing implementation failed to create sysfs entries for
> > these dynamically added nodes. As a result, memory hotplugged nodes
> > were not properly recognized by the weighed interleave mechanism.
> >
>
> The current system creates 1 node per N_POSSIBLE nodes, and since nodes
> can't transition between possible and not-possible your claims here are
> contradictory.
>
> I think you mean that simply switching from N_POSSIBLE to N_MEMORY is
> insufficient since nodes may transition in and out of the N_MEMORY
> state. Therefore this patch utilizes a hotplug callback to add and
> remove sysfs entries based on whether a node is in the N_MEMORY set.
I will update the commit message to reflect your feedback.
>
> > To resolve these issues, this patch introduces two key improvements:
> > 1) At initialization, only nodes that are online and have memory are
> > recognized, preventing the creation of unnecessary sysfs entries.
> > 2) Nodes that become available after initialization are dynamically
> > detected and integrated through the memory hotplug mechanism.
> >
> > With this enhancement, the weighted interleave policy now properly supports
> > memory hotplug, ensuring that newly added nodes are recognized and sysfs
> > entries are created accordingly.
> >
>
> It doesn't "support memory hotplug" so much as it "Minimizes weighted
> interleave to exclude memoryless nodes".
I will update the commit message to reflect your feedback.
>
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> > mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 42 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 1691748badb2..94efff89e0be 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -113,6 +113,7 @@
> > #include <asm/tlbflush.h>
> > #include <asm/tlb.h>
> > #include <linux/uaccess.h>
> > +#include <linux/memory.h>
> >
> > #include "internal.h"
> >
> > @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> > return 0;
> > }
> >
> > +struct kobject *wi_kobj;
> > +
> > +static int wi_node_notifier(struct notifier_block *nb,
> > + unsigned long action, void *data)
> > +{
> > + int err;
> > + struct memory_notify *arg = data;
> > + int nid = arg->status_change_nid;
> > +
> > + if (nid < 0)
> > + goto notifier_end;
> > +
> > + switch(action) {
> > + case MEM_ONLINE:
> > + err = add_weight_node(nid, wi_kobj);
> > + if (err) {
> > + pr_err("failed to add sysfs [node%d]\n", nid);
> > + kobject_put(wi_kobj);
> > + return NOTIFY_BAD;
> > + }
> > + break;
> > + case MEM_OFFLINE:
> > + sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> > + break;
> > + }
>
> I'm fairly certain this logic is wrong. If I add two memory blocks and
> then remove one, would this logic not remove the sysfs entries despite
> there being a block remaining?
Regarding the assumption about node configuration:
Are you assuming that a node has two memory blocks and that
MEM_OFFLINE is triggered when one of them is offlined? If so, then
you are correct that this logic would need modification.
I performed a simple test by offlining a single memory block:
# echo 0 > /sys/devices/system/node/node2/memory100/online
In this case, MEM_OFFLINE was not triggered. However, I need to
conduct further analysis to confirm this behavior under different
conditions. I will review this in more detail and share my
findings, including the test methodology and results.
>
> > +
> > +notifier_end:
> > + return NOTIFY_OK;
> > +}
> > +
> > static int add_weighted_interleave_group(struct kobject *root_kobj)
> > {
> > - struct kobject *wi_kobj;
> > int nid, err;
> >
> > wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > return err;
> > }
> >
> > - for_each_node_state(nid, N_POSSIBLE) {
> > + for_each_online_node(nid) {
> > + if (!node_state(nid, N_MEMORY))
>
> Rather than online node, why not just add for each N_MEMORY node -
> regardless of if its memory is online or not? If the memory is offline,
> then it will be excluded from the weighted interleave mechanism by
> nature of the node being invalid for allocations anyway.
Regarding the decision to check both N_MEMORY and N_ONLINE:
This was done to ensure consistency with the conditions under which
`wi_node_notifier` is triggered. Specifically, `MEM_ONLINE` is called
only when a node is in both the N_MEMORY and N_ONLINE states.
I will review this logic further. If my understanding is correct,
keeping the current implementation is the appropriate approach.
However, I will conduct additional testing to validate this and
provide further updates accordingly.
>
> > + continue;
> > +
> > err = add_weight_node(nid, wi_kobj);
> > if (err) {
> > pr_err("failed to add sysfs [node%d]\n", nid);
> > - break;
> > + goto err_out;
> > }
> > }
> > - if (err)
> > - kobject_put(wi_kobj);
> > +
> > + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
> > return 0;
> > +
> > +err_out:
> > + kobject_put(wi_kobj);
> > + return err;
> > }
> >
> > static void mempolicy_kobj_release(struct kobject *kobj)
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-13 6:33 ` Rakie Kim
@ 2025-03-13 16:23 ` Gregory Price
2025-03-13 22:36 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-13 16:23 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, david
On Thu, Mar 13, 2025 at 03:33:37PM +0900, Rakie Kim wrote:
> > I'm fairly certain this logic is wrong. If I add two memory blocks and
> > then remove one, would this logic not remove the sysfs entries despite
> > there being a block remaining?
>
> Regarding the assumption about node configuration:
> Are you assuming that a node has two memory blocks and that
> MEM_OFFLINE is triggered when one of them is offlined? If so, then
> you are correct that this logic would need modification.
>
> I performed a simple test by offlining a single memory block:
> # echo 0 > /sys/devices/system/node/node2/memory100/online
>
> In this case, MEM_OFFLINE was not triggered. However, I need to
> conduct further analysis to confirm this behavior under different
> conditions. I will review this in more detail and share my
> findings, including the test methodology and results.
>
+David - might have a quick answer to this. I would have expected a
single memory block going offline to cause a notification.
I think the logic we care about is here:
static void node_states_check_changes_online(unsigned long nr_pages,
struct zone *zone, struct memory_notify *arg)
{
int nid = zone_to_nid(zone);
arg->status_change_nid = NUMA_NO_NODE;
arg->status_change_nid_normal = NUMA_NO_NODE;
if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
if (zone_idx(zone) <= ZONE_NORMAL && !node_state(nid, N_NORMAL_MEMORY))
arg->status_change_nid_normal = nid;
}
static void node_states_set_node(int node, struct memory_notify *arg)
{
if (arg->status_change_nid_normal >= 0)
node_set_state(node, N_NORMAL_MEMORY);
if (arg->status_change_nid >= 0)
node_set_state(node, N_MEMORY);
}
int online_pages(unsigned long pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
...
node_states_check_changes_online(nr_pages, zone, &arg);
...
node_states_set_node(nid, &arg);
...
memory_notify(MEM_ONLINE, &arg);
}
In the callback i think you want to check whether N_MEMORY is set
+ case MEM_OFFLINE:
++ if (node is !N_MEMORY)
++ sysfs_wi_node_release(node_attrs[nid], wi_kobj);
+ break;
+ }
Similar with online (don't want to double-add).
also from what I can tell, N_MEMORY implies N_ONLINE because N_ONLINE
occurs when memory blocks are added (regardless of state).
~Gregory
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-13 16:23 ` Gregory Price
@ 2025-03-13 22:36 ` David Hildenbrand
2025-03-14 6:00 ` Rakie Kim
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-03-13 22:36 UTC (permalink / raw)
To: Gregory Price, Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On 13.03.25 17:23, Gregory Price wrote:
> On Thu, Mar 13, 2025 at 03:33:37PM +0900, Rakie Kim wrote:
>>> I'm fairly certain this logic is wrong. If I add two memory blocks and
>>> then remove one, would this logic not remove the sysfs entries despite
>>> there being a block remaining?
>>
>> Regarding the assumption about node configuration:
>> Are you assuming that a node has two memory blocks and that
>> MEM_OFFLINE is triggered when one of them is offlined? If so, then
>> you are correct that this logic would need modification.
>>
>> I performed a simple test by offlining a single memory block:
>> # echo 0 > /sys/devices/system/node/node2/memory100/online
>>
>> In this case, MEM_OFFLINE was not triggered. However, I need to
>> conduct further analysis to confirm this behavior under different
>> conditions. I will review this in more detail and share my
>> findings, including the test methodology and results.
>>
>
> +David - might have a quick answer to this. I would have expected a
> single memory block going offline to cause a notification.
Yes. Unless offlining failed, or the block was already offline :)
If it doesn't happen for an actual online memory block that is offline
after the call, we would have a bug.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-13 22:36 ` David Hildenbrand
@ 2025-03-14 6:00 ` Rakie Kim
2025-03-14 9:17 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Rakie Kim @ 2025-03-14 6:00 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Gregory Price, Rakie Kim
On Thu, 13 Mar 2025 23:36:47 +0100 David Hildenbrand <david@redhat.com> wrote:
> On 13.03.25 17:23, Gregory Price wrote:
> > On Thu, Mar 13, 2025 at 03:33:37PM +0900, Rakie Kim wrote:
> >>> I'm fairly certain this logic is wrong. If I add two memory blocks and
> >>> then remove one, would this logic not remove the sysfs entries despite
> >>> there being a block remaining?
> >>
> >> Regarding the assumption about node configuration:
> >> Are you assuming that a node has two memory blocks and that
> >> MEM_OFFLINE is triggered when one of them is offlined? If so, then
> >> you are correct that this logic would need modification.
> >>
> >> I performed a simple test by offlining a single memory block:
> >> # echo 0 > /sys/devices/system/node/node2/memory100/online
> >>
> >> In this case, MEM_OFFLINE was not triggered. However, I need to
> >> conduct further analysis to confirm this behavior under different
> >> conditions. I will review this in more detail and share my
> >> findings, including the test methodology and results.
> >>
> >
> > +David - might have a quick answer to this. I would have expected a
> > single memory block going offline to cause a notification.
>
> Yes. Unless offlining failed, or the block was already offline :)
>
> If it doesn't happen for an actual online memory block that is offline
> after the call, we would have a bug.
>
>
> --
> Cheers,
>
> David / dhildenb
>
Hi David,
I am currently working on adding memory hotplug-related functionality
to the weighted interleave feature. While discussing this with Gregory,
a question came up. If you happen to know the answer to the following,
I would greatly appreciate your input.
I have added the following logic to call add_weight_node when a node
transitions to the N_MEMORY state to create a sysfs entry. Conversely,
when all memory blocks of a node go offline (!N_MEMORY),
I call sysfs_wi_node_release to remove the corresponding sysfs entry.
+static int wi_node_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ int err;
+ struct memory_notify *arg = data;
+ int nid = arg->status_change_nid;
+
+ if (nid < 0)
+ goto notifier_end;
+
+ switch(action) {
+ case MEM_ONLINE:
+ err = add_weight_node(nid, wi_kobj);
+ if (err) {
+ pr_err("failed to add sysfs [node%d]\n", nid);
+ kobject_put(wi_kobj);
+ return NOTIFY_BAD;
+ }
+ break;
+ case MEM_OFFLINE:
+ sysfs_wi_node_release(node_attrs[nid], wi_kobj);
+ break;
+ }
+
+notifier_end:
+ return NOTIFY_OK;
+}
One question I have is whether the MEM_OFFLINE action in the code
below will be triggered when a node that consists of multiple memory
blocks has only one of its memory blocks transitioning to the offline state.
+ int nid = arg->status_change_nid;
+
+ if (nid < 0)
+ goto notifier_end;
Based on my analysis, wi_node_notifier should function as expected
because arg->status_change_nid only holds a non-negative value
under the following conditions:
1) !N_MEMORY -> N_MEMORY
When the first memory block of a node transitions to the online state,
it holds a non-negative value.
In all other cases, it remains -1 (NUMA_NO_NODE).
2) N_MEMORY -> !N_MEMORY
When all memory blocks of a node transition to the offline state,
it holds a non-negative value.
In all other cases, it remains -1 (NUMA_NO_NODE).
I would truly appreciate it if you could confirm whether this analysis is correct.
Below is a more detailed explanation of my findings.
<memory block online>
- The callback function registered in hotplug_memory_notifier
receives the MEM_ONLINE action in online_pages.
int online_pages(unsigned long pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
struct memory_notify arg;
...
node_states_check_changes_online(nr_pages, zone, &arg);
ret = memory_notify(MEM_GOING_ONLINE, &arg);
...
node_states_set_node(nid, &arg);
...
memory_notify(MEM_ONLINE, &arg);
}
- If the node is in the !N_MEMORY state,
arg->status_change_nid is set to the node ID.
static void node_states_check_changes_online(unsigned long nr_pages,
struct zone *zone, struct memory_notify *arg)
{
int nid = zone_to_nid(zone);
arg->status_change_nid = NUMA_NO_NODE;
arg->status_change_nid_normal = NUMA_NO_NODE;
if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
...
}
- If arg->status_change_nid >= 0, the node transitions to the N_MEMORY state.
static void node_states_set_node(int node, struct memory_notify *arg)
{
...
if (arg->status_change_nid >= 0)
node_set_state(node, N_MEMORY);
}
<memory block offline>
- The callback function registered in hotplug_memory_notifier
receives the MEM_OFFLINE action in offline_pages.
int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
struct memory_notify arg;
...
node_states_check_changes_offline(nr_pages, zone, &arg);
ret = memory_notify(MEM_GOING_OFFLINE, &arg);
...
node_states_clear_node(node, &arg);
...
memory_notify(MEM_OFFLINE, &arg);
}
- If the node becomes empty,
arg->status_change_nid is set to the node ID.
static void node_states_check_changes_offline(unsigned long nr_pages,
struct zone *zone, struct memory_notify *arg)
{
...
/*
* We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
* does not apply as we don't support 32bit.
* Here we count the possible pages from ZONE_MOVABLE.
* If after having accounted all the pages, we see that the nr_pages
* to be offlined is over or equal to the accounted pages,
* we know that the node will become empty, and so, we can clear
* it for N_MEMORY as well.
*/
present_pages += pgdat->node_zones[ZONE_MOVABLE].present_pages;
if (nr_pages >= present_pages)
arg->status_change_nid = zone_to_nid(zone);
}
- If arg->status_change_nid >= 0,
the node transitions to the !N_MEMORY state.
static void node_states_clear_node(int node, struct memory_notify *arg)
{
...
if (arg->status_change_nid >= 0)
node_clear_state(node, N_MEMORY);
}
Thank you very much for taking the time to read this lengthy email.
Rakie
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-14 6:00 ` Rakie Kim
@ 2025-03-14 9:17 ` David Hildenbrand
2025-03-17 8:23 ` Rakie Kim
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-03-14 9:17 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Gregory Price
> Hi David,
Hi :)
>
> I am currently working on adding memory hotplug-related functionality
> to the weighted interleave feature. While discussing this with Gregory,
> a question came up. If you happen to know the answer to the following,
> I would greatly appreciate your input.
>
> I have added the following logic to call add_weight_node when a node
> transitions to the N_MEMORY state to create a sysfs entry. Conversely,
> when all memory blocks of a node go offline (!N_MEMORY),
> I call sysfs_wi_node_release to remove the corresponding sysfs entry.
>
As a spoiler: I don't like how we squeezed the status_change_nid /
status_change_nid_normal stuff into the memory notifier that works on a
single memory block -> single zone. But decoupling it is not as easy,
because we have this status_change_nid vs. status_change_nid_normal thing.
For the basic "node going offline / node going online", a separate
notifier (or separate callbacks) would make at least your use case much
clearer.
The whole "status_change_nid_normal" is only used by slab. I wonder if
we could get rid of it, and simply let slab check the relevant
zone->present pages when notified about onlining/offlining of a singe
memory block.
Then, we would only have status_change_nid and could just convert that
to a NODE_MEM_ONLINE / NODE_MEM_OFFLINE notifier or sth like that.
Hmmm, if I wouldn't be on PTO, I would prototype that real quick :)
> +static int wi_node_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + int err;
> + struct memory_notify *arg = data;
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
> +
> + switch(action) {
> + case MEM_ONLINE:
> + err = add_weight_node(nid, wi_kobj);
> + if (err) {
> + pr_err("failed to add sysfs [node%d]\n", nid);
> + kobject_put(wi_kobj);
> + return NOTIFY_BAD;
> + }
> + break;
> + case MEM_OFFLINE:
> + sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> + break;
> + }
> +
> +notifier_end:
> + return NOTIFY_OK;
> +}
>
> One question I have is whether the MEM_OFFLINE action in the code
> below will be triggered when a node that consists of multiple memory
> blocks has only one of its memory blocks transitioning to the offline state.
>
node_states_check_changes_offline() should be making sure that that is
the case.
Only if offlining the current block will make the node (all zones) have
no present pages any more, then we'll set status_change_nid to >= 0.
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
>
> Based on my analysis, wi_node_notifier should function as expected
> because arg->status_change_nid only holds a non-negative value
> under the following conditions:
>
> 1) !N_MEMORY -> N_MEMORY
> When the first memory block of a node transitions to the online state,
> it holds a non-negative value.
> In all other cases, it remains -1 (NUMA_NO_NODE).
>
> 2) N_MEMORY -> !N_MEMORY
> When all memory blocks of a node transition to the offline state,
> it holds a non-negative value.
> In all other cases, it remains -1 (NUMA_NO_NODE).
>
> I would truly appreciate it if you could confirm whether this analysis is correct.
> Below is a more detailed explanation of my findings.
>
Yes, that's at least how it is supposed to work (-bugs, but I am not
aware of any) :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
2025-03-14 9:17 ` David Hildenbrand
@ 2025-03-17 8:23 ` Rakie Kim
0 siblings, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-17 8:23 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Gregory Price, Rakie Kim
On Fri, 14 Mar 2025 10:17:26 +0100 David Hildenbrand <david@redhat.com> wrote:
> > Hi David,
>
> Hi :)
>
> >
> > I am currently working on adding memory hotplug-related functionality
> > to the weighted interleave feature. While discussing this with Gregory,
> > a question came up. If you happen to know the answer to the following,
> > I would greatly appreciate your input.
> >
> > I have added the following logic to call add_weight_node when a node
> > transitions to the N_MEMORY state to create a sysfs entry. Conversely,
> > when all memory blocks of a node go offline (!N_MEMORY),
> > I call sysfs_wi_node_release to remove the corresponding sysfs entry.
> >
>
> As a spoiler: I don't like how we squeezed the status_change_nid /
> status_change_nid_normal stuff into the memory notifier that works on a
> single memory block -> single zone. But decoupling it is not as easy,
> because we have this status_change_nid vs. status_change_nid_normal thing.
>
> For the basic "node going offline / node going online", a separate
> notifier (or separate callbacks) would make at least your use case much
> clearer.
>
> The whole "status_change_nid_normal" is only used by slab. I wonder if
> we could get rid of it, and simply let slab check the relevant
> zone->present pages when notified about onlining/offlining of a singe
> memory block.
>
> Then, we would only have status_change_nid and could just convert that
> to a NODE_MEM_ONLINE / NODE_MEM_OFFLINE notifier or sth like that.
>
> Hmmm, if I wouldn't be on PTO, I would prototype that real quick :)
Hi David :)
I completely agree with your perspective on this. Having separate callbacks
for "node going offline/node going online" would certainly lead to clearer
code. For now, I shall proceed with developing the code based on the current
structure. I will also continue monitoring updates related to "node online/
offline" and plan on revising the code once those are integrated.
Thank you for your valuable input on this matter.
>
> > +static int wi_node_notifier(struct notifier_block *nb,
> > + unsigned long action, void *data)
> > +{
> > + int err;
> > + struct memory_notify *arg = data;
> > + int nid = arg->status_change_nid;
> > +
> > + if (nid < 0)
> > + goto notifier_end;
> > +
> > + switch(action) {
> > + case MEM_ONLINE:
> > + err = add_weight_node(nid, wi_kobj);
> > + if (err) {
> > + pr_err("failed to add sysfs [node%d]\n", nid);
> > + kobject_put(wi_kobj);
> > + return NOTIFY_BAD;
> > + }
> > + break;
> > + case MEM_OFFLINE:
> > + sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> > + break;
> > + }
> > +
> > +notifier_end:
> > + return NOTIFY_OK;
> > +}
> >
> > One question I have is whether the MEM_OFFLINE action in the code
> > below will be triggered when a node that consists of multiple memory
> > blocks has only one of its memory blocks transitioning to the offline state.
> >
>
> node_states_check_changes_offline() should be making sure that that is
> the case.
>
> Only if offlining the current block will make the node (all zones) have
> no present pages any more, then we'll set status_change_nid to >= 0.
>
Thank you for reviewing this matter.
>
> > + int nid = arg->status_change_nid;
> > +
> > + if (nid < 0)
> > + goto notifier_end;
> >
> > Based on my analysis, wi_node_notifier should function as expected
> > because arg->status_change_nid only holds a non-negative value
> > under the following conditions:
> >
> > 1) !N_MEMORY -> N_MEMORY
> > When the first memory block of a node transitions to the online state,
> > it holds a non-negative value.
> > In all other cases, it remains -1 (NUMA_NO_NODE).
> >
> > 2) N_MEMORY -> !N_MEMORY
> > When all memory blocks of a node transition to the offline state,
> > it holds a non-negative value.
> > In all other cases, it remains -1 (NUMA_NO_NODE).
> >
> > I would truly appreciate it if you could confirm whether this analysis is correct.
> > Below is a more detailed explanation of my findings.
> >
>
> Yes, that's at least how it is supposed to work (-bugs, but I am not
> aware of any) :)
>
Thank you once again for reviewing this matter. Your insightful feedback has been
instrumental in crafting a more robust structure.
Rakie
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-12 7:56 [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
2025-03-12 7:56 ` [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
@ 2025-03-12 7:56 ` Rakie Kim
2025-03-12 16:14 ` Gregory Price
2025-03-12 7:56 ` [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
2025-03-12 15:49 ` [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Gregory Price
3 siblings, 1 reply; 27+ messages in thread
From: Rakie Kim @ 2025-03-12 7:56 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
Previously, sysfs entries for weighted interleave were only created during
initialization, preventing dynamically added memory nodes from being recognized.
This patch enables sysfs registration for nodes added via memory hotplug,
allowing weighted interleave settings to be updated as the system memory
configuration changes.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 78 +++++++++++++++++++++++++++++---------------------
1 file changed, 45 insertions(+), 33 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 94efff89e0be..71aff1276d4d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3389,6 +3389,13 @@ struct iw_node_attr {
int nid;
};
+struct iw_node_group {
+ struct kobject *wi_kobj;
+ struct iw_node_attr **nattrs;
+};
+
+static struct iw_node_group *ngrp;
+
static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
@@ -3431,24 +3438,22 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
return count;
}
-static struct iw_node_attr **node_attrs;
-
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
- struct kobject *parent)
+static void sysfs_wi_node_release(int nid)
{
- if (!node_attr)
+ if (!ngrp->nattrs[nid])
return;
- sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
+
+ sysfs_remove_file(ngrp->wi_kobj, &ngrp->nattrs[nid]->kobj_attr.attr);
+ kfree(ngrp->nattrs[nid]->kobj_attr.attr.name);
+ kfree(ngrp->nattrs[nid]);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
{
- int i;
+ int nid;
- for (i = 0; i < nr_node_ids; i++)
- sysfs_wi_node_release(node_attrs[i], wi_kobj);
+ for (nid = 0; nid < nr_node_ids; nid++)
+ sysfs_wi_node_release(nid);
kobject_put(wi_kobj);
}
@@ -3457,7 +3462,7 @@ static const struct kobj_type wi_ktype = {
.release = sysfs_wi_release,
};
-static int add_weight_node(int nid, struct kobject *wi_kobj)
+static int sysfs_wi_node_add(int nid)
{
struct iw_node_attr *node_attr;
char *name;
@@ -3479,19 +3484,17 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
node_attr->kobj_attr.store = node_store;
node_attr->nid = nid;
- if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
+ if (sysfs_create_file(ngrp->wi_kobj, &node_attr->kobj_attr.attr)) {
kfree(node_attr->kobj_attr.attr.name);
kfree(node_attr);
pr_err("failed to add attribute to weighted_interleave\n");
return -ENOMEM;
}
- node_attrs[nid] = node_attr;
+ ngrp->nattrs[nid] = node_attr;
return 0;
}
-struct kobject *wi_kobj;
-
static int wi_node_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -3504,15 +3507,15 @@ static int wi_node_notifier(struct notifier_block *nb,
switch(action) {
case MEM_ONLINE:
- err = add_weight_node(nid, wi_kobj);
+ err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- kobject_put(wi_kobj);
+ kobject_put(ngrp->wi_kobj);
return NOTIFY_BAD;
}
break;
case MEM_OFFLINE:
- sysfs_wi_node_release(node_attrs[nid], wi_kobj);
+ sysfs_wi_node_release(nid);
break;
}
@@ -3524,14 +3527,14 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
{
int nid, err;
- wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj)
+ ngrp->wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+ if (!ngrp->wi_kobj)
return -ENOMEM;
- err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+ err = kobject_init_and_add(ngrp->wi_kobj, &wi_ktype, root_kobj,
"weighted_interleave");
if (err) {
- kfree(wi_kobj);
+ kfree(ngrp->wi_kobj);
return err;
}
@@ -3539,7 +3542,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
if (!node_state(nid, N_MEMORY))
continue;
- err = add_weight_node(nid, wi_kobj);
+ err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
goto err_out;
@@ -3550,7 +3553,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
return 0;
err_out:
- kobject_put(wi_kobj);
+ kobject_put(ngrp->wi_kobj);
return err;
}
@@ -3565,7 +3568,9 @@ static void mempolicy_kobj_release(struct kobject *kobj)
mutex_unlock(&iw_table_lock);
synchronize_rcu();
kfree(old);
- kfree(node_attrs);
+
+ kfree(ngrp->nattrs);
+ kfree(ngrp);
kfree(kobj);
}
@@ -3578,17 +3583,23 @@ static int __init mempolicy_sysfs_init(void)
int err;
static struct kobject *mempolicy_kobj;
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs) {
+ ngrp = kzalloc(sizeof(*ngrp), GFP_KERNEL);
+ if (!ngrp) {
err = -ENOMEM;
goto err_out;
}
+ ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+ GFP_KERNEL);
+ if (!ngrp->nattrs) {
+ err = -ENOMEM;
+ goto ngrp_out;
+ }
+
mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
if (!mempolicy_kobj) {
err = -ENOMEM;
- goto node_out;
+ goto nattr_out;
}
err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
@@ -3606,12 +3617,13 @@ static int __init mempolicy_sysfs_init(void)
return 0;
-node_out:
- kfree(node_attrs);
+nattr_out:
+ kfree(ngrp->nattrs);
+ngrp_out:
+ kfree(ngrp);
err_out:
pr_err("mempolicy sysfs structure failed to initialize\n");
return err;
-
}
late_initcall(mempolicy_sysfs_init);
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-12 7:56 ` [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
@ 2025-03-12 16:14 ` Gregory Price
2025-03-13 6:34 ` Rakie Kim
0 siblings, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-12 16:14 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Wed, Mar 12, 2025 at 04:56:26PM +0900, Rakie Kim wrote:
> Previously, sysfs entries for weighted interleave were only created during
> initialization, preventing dynamically added memory nodes from being recognized.
>
> This patch enables sysfs registration for nodes added via memory hotplug,
> allowing weighted interleave settings to be updated as the system memory
> configuration changes.
>
In patch 2 you said:
```
With this enhancement, the weighted interleave policy now properly supports
memory hotplug, ensuring that newly added nodes are recognized and sysfs
entries are created accordingly.
```
By description, this claims to accomplish functionally the same thing
patch 2 claim.
The code below actually does two things:
1) Refactors the sysfs code to break out the weighted_interleave group
into a global that can be referenced by the hotplug callback.
2) The change the the memory hotplug callback to add/remove nodes.
Move the refactor work out ahead in a separate patch to make it easier
to review the changes individually please.
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
> mm/mempolicy.c | 78 +++++++++++++++++++++++++++++---------------------
> 1 file changed, 45 insertions(+), 33 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 94efff89e0be..71aff1276d4d 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3389,6 +3389,13 @@ struct iw_node_attr {
> int nid;
> };
>
> +struct iw_node_group {
> + struct kobject *wi_kobj;
> + struct iw_node_attr **nattrs;
> +};
> +
> +static struct iw_node_group *ngrp;
> +
for a global, this should have a more descriptive name.
And since this actually represents the weighted_interleave sysfs entry,
it should maybe be `sysfs_wi_group`? Since it will include more than
just nodes (e.g. the upcoming `auto`)
> static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> char *buf)
> {
> @@ -3431,24 +3438,22 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> return count;
> }
>
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> - struct kobject *parent)
> +static void sysfs_wi_node_release(int nid)
> {
> - if (!node_attr)
> + if (!ngrp->nattrs[nid])
> return;
> - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> +
> + sysfs_remove_file(ngrp->wi_kobj, &ngrp->nattrs[nid]->kobj_attr.attr);
> + kfree(ngrp->nattrs[nid]->kobj_attr.attr.name);
> + kfree(ngrp->nattrs[nid]);
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> {
> - int i;
> + int nid;
>
> - for (i = 0; i < nr_node_ids; i++)
> - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> + for (nid = 0; nid < nr_node_ids; nid++)
> + sysfs_wi_node_release(nid);
> kobject_put(wi_kobj);
> }
>
> @@ -3457,7 +3462,7 @@ static const struct kobj_type wi_ktype = {
> .release = sysfs_wi_release,
> };
>
> -static int add_weight_node(int nid, struct kobject *wi_kobj)
> +static int sysfs_wi_node_add(int nid)
> {
> struct iw_node_attr *node_attr;
> char *name;
> @@ -3479,19 +3484,17 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> node_attr->kobj_attr.store = node_store;
> node_attr->nid = nid;
>
> - if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
> + if (sysfs_create_file(ngrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> kfree(node_attr->kobj_attr.attr.name);
> kfree(node_attr);
> pr_err("failed to add attribute to weighted_interleave\n");
> return -ENOMEM;
> }
>
> - node_attrs[nid] = node_attr;
> + ngrp->nattrs[nid] = node_attr;
> return 0;
> }
>
> -struct kobject *wi_kobj;
> -
> static int wi_node_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> {
> @@ -3504,15 +3507,15 @@ static int wi_node_notifier(struct notifier_block *nb,
>
> switch(action) {
> case MEM_ONLINE:
> - err = add_weight_node(nid, wi_kobj);
> + err = sysfs_wi_node_add(nid);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> - kobject_put(wi_kobj);
> + kobject_put(ngrp->wi_kobj);
> return NOTIFY_BAD;
> }
> break;
> case MEM_OFFLINE:
> - sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> + sysfs_wi_node_release(nid);
> break;
> }
>
> @@ -3524,14 +3527,14 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> {
> int nid, err;
>
> - wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> - if (!wi_kobj)
> + ngrp->wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> + if (!ngrp->wi_kobj)
> return -ENOMEM;
>
> - err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> + err = kobject_init_and_add(ngrp->wi_kobj, &wi_ktype, root_kobj,
> "weighted_interleave");
> if (err) {
> - kfree(wi_kobj);
> + kfree(ngrp->wi_kobj);
> return err;
> }
>
> @@ -3539,7 +3542,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> if (!node_state(nid, N_MEMORY))
> continue;
>
> - err = add_weight_node(nid, wi_kobj);
> + err = sysfs_wi_node_add(nid);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> goto err_out;
> @@ -3550,7 +3553,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> return 0;
>
> err_out:
> - kobject_put(wi_kobj);
> + kobject_put(ngrp->wi_kobj);
> return err;
> }
>
> @@ -3565,7 +3568,9 @@ static void mempolicy_kobj_release(struct kobject *kobj)
> mutex_unlock(&iw_table_lock);
> synchronize_rcu();
> kfree(old);
> - kfree(node_attrs);
> +
> + kfree(ngrp->nattrs);
> + kfree(ngrp);
> kfree(kobj);
> }
>
> @@ -3578,17 +3583,23 @@ static int __init mempolicy_sysfs_init(void)
> int err;
> static struct kobject *mempolicy_kobj;
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs) {
> + ngrp = kzalloc(sizeof(*ngrp), GFP_KERNEL);
> + if (!ngrp) {
> err = -ENOMEM;
> goto err_out;
> }
>
> + ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> + GFP_KERNEL);
> + if (!ngrp->nattrs) {
> + err = -ENOMEM;
> + goto ngrp_out;
> + }
> +
> mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> if (!mempolicy_kobj) {
> err = -ENOMEM;
> - goto node_out;
> + goto nattr_out;
> }
>
> err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> @@ -3606,12 +3617,13 @@ static int __init mempolicy_sysfs_init(void)
>
> return 0;
>
> -node_out:
> - kfree(node_attrs);
> +nattr_out:
> + kfree(ngrp->nattrs);
> +ngrp_out:
> + kfree(ngrp);
> err_out:
> pr_err("mempolicy sysfs structure failed to initialize\n");
> return err;
> -
> }
>
> late_initcall(mempolicy_sysfs_init);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-12 16:14 ` Gregory Price
@ 2025-03-13 6:34 ` Rakie Kim
2025-03-13 16:40 ` Gregory Price
0 siblings, 1 reply; 27+ messages in thread
From: Rakie Kim @ 2025-03-13 6:34 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Wed, 12 Mar 2025 12:14:48 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Wed, Mar 12, 2025 at 04:56:26PM +0900, Rakie Kim wrote:
> > Previously, sysfs entries for weighted interleave were only created during
> > initialization, preventing dynamically added memory nodes from being recognized.
> >
> > This patch enables sysfs registration for nodes added via memory hotplug,
> > allowing weighted interleave settings to be updated as the system memory
> > configuration changes.
> >
>
> In patch 2 you said:
>
> ```
> With this enhancement, the weighted interleave policy now properly supports
> memory hotplug, ensuring that newly added nodes are recognized and sysfs
> entries are created accordingly.
> ```
>
> By description, this claims to accomplish functionally the same thing
> patch 2 claim.
>
> The code below actually does two things:
>
> 1) Refactors the sysfs code to break out the weighted_interleave group
> into a global that can be referenced by the hotplug callback.
>
> 2) The change the the memory hotplug callback to add/remove nodes.
>
> Move the refactor work out ahead in a separate patch to make it easier
> to review the changes individually please.
This change primarily addresses 1) from your feedback. The
modification to the memory hotplug callback was necessary to adapt
to the new `struct iw_node_group`.
Given that this adjustment is part of integrating the refactored
structure, I believe this patch does not need to be split into two.
However, I would appreciate any further input you may have on this.
>
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> > mm/mempolicy.c | 78 +++++++++++++++++++++++++++++---------------------
> > 1 file changed, 45 insertions(+), 33 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 94efff89e0be..71aff1276d4d 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3389,6 +3389,13 @@ struct iw_node_attr {
> > int nid;
> > };
> >
> > +struct iw_node_group {
> > + struct kobject *wi_kobj;
> > + struct iw_node_attr **nattrs;
> > +};
> > +
> > +static struct iw_node_group *ngrp;
> > +
>
> for a global, this should have a more descriptive name.
>
> And since this actually represents the weighted_interleave sysfs entry,
> it should maybe be `sysfs_wi_group`? Since it will include more than
> just nodes (e.g. the upcoming `auto`)
Regarding your naming suggestion, I agree that `sysfs_wi_group` is
more descriptive and aligns well with its role in managing the
weighted_interleave sysfs entry. I will update the patch accordingly.
>
> > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > char *buf)
> > {
> > @@ -3431,24 +3438,22 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > return count;
> > }
> >
> > -static struct iw_node_attr **node_attrs;
> > -
> > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > - struct kobject *parent)
> > +static void sysfs_wi_node_release(int nid)
> > {
> > - if (!node_attr)
> > + if (!ngrp->nattrs[nid])
> > return;
> > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > - kfree(node_attr->kobj_attr.attr.name);
> > - kfree(node_attr);
> > +
> > + sysfs_remove_file(ngrp->wi_kobj, &ngrp->nattrs[nid]->kobj_attr.attr);
> > + kfree(ngrp->nattrs[nid]->kobj_attr.attr.name);
> > + kfree(ngrp->nattrs[nid]);
> > }
> >
> > static void sysfs_wi_release(struct kobject *wi_kobj)
> > {
> > - int i;
> > + int nid;
> >
> > - for (i = 0; i < nr_node_ids; i++)
> > - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> > + for (nid = 0; nid < nr_node_ids; nid++)
> > + sysfs_wi_node_release(nid);
> > kobject_put(wi_kobj);
> > }
> >
> > @@ -3457,7 +3462,7 @@ static const struct kobj_type wi_ktype = {
> > .release = sysfs_wi_release,
> > };
> >
> > -static int add_weight_node(int nid, struct kobject *wi_kobj)
> > +static int sysfs_wi_node_add(int nid)
> > {
> > struct iw_node_attr *node_attr;
> > char *name;
> > @@ -3479,19 +3484,17 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> > node_attr->kobj_attr.store = node_store;
> > node_attr->nid = nid;
> >
> > - if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
> > + if (sysfs_create_file(ngrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> > kfree(node_attr->kobj_attr.attr.name);
> > kfree(node_attr);
> > pr_err("failed to add attribute to weighted_interleave\n");
> > return -ENOMEM;
> > }
> >
> > - node_attrs[nid] = node_attr;
> > + ngrp->nattrs[nid] = node_attr;
> > return 0;
> > }
> >
> > -struct kobject *wi_kobj;
> > -
> > static int wi_node_notifier(struct notifier_block *nb,
> > unsigned long action, void *data)
> > {
> > @@ -3504,15 +3507,15 @@ static int wi_node_notifier(struct notifier_block *nb,
> >
> > switch(action) {
> > case MEM_ONLINE:
> > - err = add_weight_node(nid, wi_kobj);
> > + err = sysfs_wi_node_add(nid);
> > if (err) {
> > pr_err("failed to add sysfs [node%d]\n", nid);
> > - kobject_put(wi_kobj);
> > + kobject_put(ngrp->wi_kobj);
> > return NOTIFY_BAD;
> > }
> > break;
> > case MEM_OFFLINE:
> > - sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> > + sysfs_wi_node_release(nid);
> > break;
> > }
> >
> > @@ -3524,14 +3527,14 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > {
> > int nid, err;
> >
> > - wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > - if (!wi_kobj)
> > + ngrp->wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > + if (!ngrp->wi_kobj)
> > return -ENOMEM;
> >
> > - err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> > + err = kobject_init_and_add(ngrp->wi_kobj, &wi_ktype, root_kobj,
> > "weighted_interleave");
> > if (err) {
> > - kfree(wi_kobj);
> > + kfree(ngrp->wi_kobj);
> > return err;
> > }
> >
> > @@ -3539,7 +3542,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > if (!node_state(nid, N_MEMORY))
> > continue;
> >
> > - err = add_weight_node(nid, wi_kobj);
> > + err = sysfs_wi_node_add(nid);
> > if (err) {
> > pr_err("failed to add sysfs [node%d]\n", nid);
> > goto err_out;
> > @@ -3550,7 +3553,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > return 0;
> >
> > err_out:
> > - kobject_put(wi_kobj);
> > + kobject_put(ngrp->wi_kobj);
> > return err;
> > }
> >
> > @@ -3565,7 +3568,9 @@ static void mempolicy_kobj_release(struct kobject *kobj)
> > mutex_unlock(&iw_table_lock);
> > synchronize_rcu();
> > kfree(old);
> > - kfree(node_attrs);
> > +
> > + kfree(ngrp->nattrs);
> > + kfree(ngrp);
> > kfree(kobj);
> > }
> >
> > @@ -3578,17 +3583,23 @@ static int __init mempolicy_sysfs_init(void)
> > int err;
> > static struct kobject *mempolicy_kobj;
> >
> > - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > - GFP_KERNEL);
> > - if (!node_attrs) {
> > + ngrp = kzalloc(sizeof(*ngrp), GFP_KERNEL);
> > + if (!ngrp) {
> > err = -ENOMEM;
> > goto err_out;
> > }
> >
> > + ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > + GFP_KERNEL);
> > + if (!ngrp->nattrs) {
> > + err = -ENOMEM;
> > + goto ngrp_out;
> > + }
> > +
> > mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > if (!mempolicy_kobj) {
> > err = -ENOMEM;
> > - goto node_out;
> > + goto nattr_out;
> > }
> >
> > err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> > @@ -3606,12 +3617,13 @@ static int __init mempolicy_sysfs_init(void)
> >
> > return 0;
> >
> > -node_out:
> > - kfree(node_attrs);
> > +nattr_out:
> > + kfree(ngrp->nattrs);
> > +ngrp_out:
> > + kfree(ngrp);
> > err_out:
> > pr_err("mempolicy sysfs structure failed to initialize\n");
> > return err;
> > -
> > }
> >
> > late_initcall(mempolicy_sysfs_init);
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-13 6:34 ` Rakie Kim
@ 2025-03-13 16:40 ` Gregory Price
2025-03-14 6:35 ` Rakie Kim
0 siblings, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-13 16:40 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Thu, Mar 13, 2025 at 03:34:10PM +0900, Rakie Kim wrote:
> On Wed, 12 Mar 2025 12:14:48 -0400 Gregory Price <gourry@gourry.net> wrote:
>
> Given that this adjustment is part of integrating the refactored
> structure, I believe this patch does not need to be split into two.
> However, I would appreciate any further input you may have on this.
>
Another way of saying this is: can you please change the ordering of
patch 2 and 3 and place the functional changes into "make mempolicy
support memory hotplug" patch.
It's a little odd to "make mempolicy support memory hotplug" and then
follow that up with a patch that says "now make it REALLY support it".
Patch 2 should read:
"Refactor weighted_interleave sysfs to allow node structure to be
dynamic"
Patch 3 should read:
"Make weighted interleave sysfs structure support memory hotplug"
I think you'll find the patches end up looking much cleaner this way as
well.
~Gregory
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for memory hotplug in weighted interleave
2025-03-13 16:40 ` Gregory Price
@ 2025-03-14 6:35 ` Rakie Kim
0 siblings, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-14 6:35 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Thu, 13 Mar 2025 12:40:02 -0400 Gregory Price <gourry@gourry.net> wrote:
Hi Gregory
> On Thu, Mar 13, 2025 at 03:34:10PM +0900, Rakie Kim wrote:
> > On Wed, 12 Mar 2025 12:14:48 -0400 Gregory Price <gourry@gourry.net> wrote:
> >
> > Given that this adjustment is part of integrating the refactored
> > structure, I believe this patch does not need to be split into two.
> > However, I would appreciate any further input you may have on this.
> >
>
> Another way of saying this is: can you please change the ordering of
> patch 2 and 3 and place the functional changes into "make mempolicy
> support memory hotplug" patch.
>
> It's a little odd to "make mempolicy support memory hotplug" and then
> follow that up with a patch that says "now make it REALLY support it".
>
> Patch 2 should read:
> "Refactor weighted_interleave sysfs to allow node structure to be
> dynamic"
> Patch 3 should read:
> "Make weighted interleave sysfs structure support memory hotplug"
>
> I think you'll find the patches end up looking much cleaner this way as
> well.
>
> ~Gregory
Your suggestion seems appropriate. I will rearrange the order as you
suggested when creating version 3.
Rakie
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for weighted interleave
2025-03-12 7:56 [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
2025-03-12 7:56 ` [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-03-12 7:56 ` [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
@ 2025-03-12 7:56 ` Rakie Kim
2025-03-12 15:04 ` Joshua Hahn
2025-03-13 16:42 ` Gregory Price
2025-03-12 15:49 ` [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Gregory Price
3 siblings, 2 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-12 7:56 UTC (permalink / raw)
To: gourry
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, rakie.kim
Sysfs attributes for interleave control were registered both at initialization
and when new nodes were detected via hotplug, leading to potential duplicates.
This patch ensures that each node is registered only once, preventing conflicts
and redundant sysfs entries.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 66 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 46 insertions(+), 20 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 71aff1276d4d..5f20521036ec 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3391,6 +3391,7 @@ struct iw_node_attr {
struct iw_node_group {
struct kobject *wi_kobj;
+ struct mutex kobj_lock;
struct iw_node_attr **nattrs;
};
@@ -3440,12 +3441,17 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
static void sysfs_wi_node_release(int nid)
{
- if (!ngrp->nattrs[nid])
+ mutex_lock(&ngrp->kobj_lock);
+ if (!ngrp->nattrs[nid]) {
+ mutex_unlock(&ngrp->kobj_lock);
return;
+ }
sysfs_remove_file(ngrp->wi_kobj, &ngrp->nattrs[nid]->kobj_attr.attr);
kfree(ngrp->nattrs[nid]->kobj_attr.attr.name);
kfree(ngrp->nattrs[nid]);
+ ngrp->nattrs[nid] = NULL;
+ mutex_unlock(&ngrp->kobj_lock);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
@@ -3464,35 +3470,54 @@ static const struct kobj_type wi_ktype = {
static int sysfs_wi_node_add(int nid)
{
- struct iw_node_attr *node_attr;
+ int ret = 0;
char *name;
- node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
- if (!node_attr)
- return -ENOMEM;
+ if (nid < 0 || nid >= nr_node_ids) {
+ pr_err("Invalid node id: %d\n", nid);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ mutex_lock(&ngrp->kobj_lock);
+ if (!ngrp->nattrs[nid]) {
+ ngrp->nattrs[nid] = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
+ } else {
+ mutex_unlock(&ngrp->kobj_lock);
+ pr_info("Node [%d] is already existed\n", nid);
+ goto out;
+ }
+ mutex_unlock(&ngrp->kobj_lock);
+
+ if (!ngrp->nattrs[nid]) {
+ ret = -ENOMEM;
+ goto out;
+ }
name = kasprintf(GFP_KERNEL, "node%d", nid);
if (!name) {
- kfree(node_attr);
- return -ENOMEM;
+ kfree(ngrp->nattrs[nid]);
+ ret = -ENOMEM;
+ goto out;
}
- sysfs_attr_init(&node_attr->kobj_attr.attr);
- node_attr->kobj_attr.attr.name = name;
- node_attr->kobj_attr.attr.mode = 0644;
- node_attr->kobj_attr.show = node_show;
- node_attr->kobj_attr.store = node_store;
- node_attr->nid = nid;
+ sysfs_attr_init(&ngrp->nattrs[nid]->kobj_attr.attr);
+ ngrp->nattrs[nid]->kobj_attr.attr.name = name;
+ ngrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
+ ngrp->nattrs[nid]->kobj_attr.show = node_show;
+ ngrp->nattrs[nid]->kobj_attr.store = node_store;
+ ngrp->nattrs[nid]->nid = nid;
- if (sysfs_create_file(ngrp->wi_kobj, &node_attr->kobj_attr.attr)) {
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
- pr_err("failed to add attribute to weighted_interleave\n");
- return -ENOMEM;
+ ret = sysfs_create_file(ngrp->wi_kobj, &ngrp->nattrs[nid]->kobj_attr.attr);
+ if (ret) {
+ kfree(ngrp->nattrs[nid]->kobj_attr.attr.name);
+ kfree(ngrp->nattrs[nid]);
+ pr_err("failed to add attribute to weighted_interleave: %d\n", ret);
+ goto out;
}
- ngrp->nattrs[nid] = node_attr;
- return 0;
+out:
+ return ret;
}
static int wi_node_notifier(struct notifier_block *nb,
@@ -3588,6 +3613,7 @@ static int __init mempolicy_sysfs_init(void)
err = -ENOMEM;
goto err_out;
}
+ mutex_init(&ngrp->kobj_lock);
ngrp->nattrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
GFP_KERNEL);
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for weighted interleave
2025-03-12 7:56 ` [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
@ 2025-03-12 15:04 ` Joshua Hahn
2025-03-13 6:34 ` Rakie Kim
2025-03-13 16:42 ` Gregory Price
1 sibling, 1 reply; 27+ messages in thread
From: Joshua Hahn @ 2025-03-12 15:04 UTC (permalink / raw)
To: Rakie Kim
Cc: gourry, akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
Hi Rakie, thank your new revision!
I think this new ordering of the series makes more sense, since the bug exists
even before your patch is applied! IMHO, it might also make sense to take
patch 1 out of this series, and send it separately (and make patches 2-4
their own series).
I have a nit and a few thoughts about this patch and the way we order locking
and allocating:
> static void sysfs_wi_release(struct kobject *wi_kobj)
> @@ -3464,35 +3470,54 @@ static const struct kobj_type wi_ktype = {
>
> static int sysfs_wi_node_add(int nid)
> {
> - struct iw_node_attr *node_attr;
> + int ret = 0;
> char *name;
>
> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> - if (!node_attr)
> - return -ENOMEM;
> + if (nid < 0 || nid >= nr_node_ids) {
> + pr_err("Invalid node id: %d\n", nid);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + mutex_lock(&ngrp->kobj_lock);
> + if (!ngrp->nattrs[nid]) {
> + ngrp->nattrs[nid] = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
I am unsure if kzallocing with the mutex_lock held is best practice. Even though
two threads won't reach this place simultaneously since *most* calls to this
function are sequential, we should try to keep the code safe since future
patches might overlook this, and later make non-sequential calls : -)
It also doesn't seem wise to directly assign the result of an allocation
without checking for its success (as I explain below).
IMHO it is best to allocate first, then acquire the lock and check for
existence, then assign with the lock still held. We can also reduce this code
section into a single if statement for clarity (but if you think it looks
cleaner with the if-else, please keep it!)
struct iw_node_attr *node_attr;
...
node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
if (!node_attr) {
ret = -ENOMEM;
goto out;
}
mutex_lock(&ngrp->kobj_lock);
if (ngrp->nattrs[nid]) {
mutex_unlock(&ngrp->kobj_lock);
kfree(node_attr);
pr_info("Node [%d] already exists\n");
goto out;
}
ngrp->attrs[nid] = node_attr;
mutex_unlock(&ngrp->kobj_lock):
> + } else {
> + mutex_unlock(&ngrp->kobj_lock);
> + pr_info("Node [%d] is already existed\n", nid);
NIT: To keep consistency with other parts of the kernel, maybe this can be
rephrased to "Node [%d] already exists\n"
> + goto out;
> + }
> + mutex_unlock(&ngrp->kobj_lock);
> +
> + if (!ngrp->nattrs[nid]) {
> + ret = -ENOMEM;
> + goto out;
> + }
If we make the changes above, we don't have to check for allocation success
*after* already having locked & unlocked and making the unnecessary assignment.
>
> name = kasprintf(GFP_KERNEL, "node%d", nid);
> if (!name) {
> - kfree(node_attr);
> - return -ENOMEM;
> + kfree(ngrp->nattrs[nid]);
> + ret = -ENOMEM;
> + goto out;
> }
For the same reasons above, I think it makes sense to make this allocation
together with the allocation of node_attr above, and free / return -ENOMEM
as early as possible if we can.
[...snip...]
Thank you again for this patch! Please let me know what you think : -)
Have a great day!
Joshua
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for weighted interleave
2025-03-12 15:04 ` Joshua Hahn
@ 2025-03-13 6:34 ` Rakie Kim
0 siblings, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-13 6:34 UTC (permalink / raw)
To: Joshua Hahn
Cc: gourry, akpm, linux-mm, linux-kernel, linux-cxl, dan.j.williams,
ying.huang, kernel_team, honggyu.kim, yunjeong.mun, Rakie Kim
On Wed, 12 Mar 2025 08:04:39 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
Hi Joshua
Thank you for your response regarding this patch.
> Hi Rakie, thank your new revision!
>
> I think this new ordering of the series makes more sense, since the bug exists
> even before your patch is applied! IMHO, it might also make sense to take
> patch 1 out of this series, and send it separately (and make patches 2-4
> their own series).
>
> I have a nit and a few thoughts about this patch and the way we order locking
> and allocating:
>
> > static void sysfs_wi_release(struct kobject *wi_kobj)
> > @@ -3464,35 +3470,54 @@ static const struct kobj_type wi_ktype = {
> >
> > static int sysfs_wi_node_add(int nid)
> > {
> > - struct iw_node_attr *node_attr;
> > + int ret = 0;
> > char *name;
> >
> > - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> > - if (!node_attr)
> > - return -ENOMEM;
> > + if (nid < 0 || nid >= nr_node_ids) {
> > + pr_err("Invalid node id: %d\n", nid);
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + mutex_lock(&ngrp->kobj_lock);
> > + if (!ngrp->nattrs[nid]) {
> > + ngrp->nattrs[nid] = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
>
> I am unsure if kzallocing with the mutex_lock held is best practice. Even though
> two threads won't reach this place simultaneously since *most* calls to this
> function are sequential, we should try to keep the code safe since future
> patches might overlook this, and later make non-sequential calls : -)
>
> It also doesn't seem wise to directly assign the result of an allocation
> without checking for its success (as I explain below).
>
> IMHO it is best to allocate first, then acquire the lock and check for
> existence, then assign with the lock still held. We can also reduce this code
> section into a single if statement for clarity (but if you think it looks
> cleaner with the if-else, please keep it!)
>
> struct iw_node_attr *node_attr;
>
> ...
>
> node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> if (!node_attr) {
> ret = -ENOMEM;
> goto out;
> }
>
> mutex_lock(&ngrp->kobj_lock);
> if (ngrp->nattrs[nid]) {
> mutex_unlock(&ngrp->kobj_lock);
> kfree(node_attr);
> pr_info("Node [%d] already exists\n");
> goto out;
> }
> ngrp->attrs[nid] = node_attr;
> mutex_unlock(&ngrp->kobj_lock):
>
Your suggestion makes sense, and I will update this part accordingly
to reflect your feedback.
>
> > + } else {
> > + mutex_unlock(&ngrp->kobj_lock);
> > + pr_info("Node [%d] is already existed\n", nid);
>
> NIT: To keep consistency with other parts of the kernel, maybe this can be
> rephrased to "Node [%d] already exists\n"
I also agree that modifying the wording would improve clarity.
I will make the necessary adjustments in the next version.
>
> > + goto out;
> > + }
> > + mutex_unlock(&ngrp->kobj_lock);
> > +
> > + if (!ngrp->nattrs[nid]) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
>
> If we make the changes above, we don't have to check for allocation success
> *after* already having locked & unlocked and making the unnecessary assignment.
>
> >
> > name = kasprintf(GFP_KERNEL, "node%d", nid);
> > if (!name) {
> > - kfree(node_attr);
> > - return -ENOMEM;
> > + kfree(ngrp->nattrs[nid]);
> > + ret = -ENOMEM;
> > + goto out;
> > }
>
> For the same reasons above, I think it makes sense to make this allocation
> together with the allocation of node_attr above, and free / return -ENOMEM
> as early as possible if we can.
>
> [...snip...]
>
> Thank you again for this patch! Please let me know what you think : -)
> Have a great day!
> Joshua
Thank you for your detailed and thoughtful review. I will incorporate
your suggestions and update the next version accordingly.
>
> Sent using hkml (https://github.com/sjp38/hackermail)
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for weighted interleave
2025-03-12 7:56 ` [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
2025-03-12 15:04 ` Joshua Hahn
@ 2025-03-13 16:42 ` Gregory Price
2025-03-14 6:35 ` Rakie Kim
1 sibling, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-13 16:42 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Wed, Mar 12, 2025 at 04:56:27PM +0900, Rakie Kim wrote:
> Sysfs attributes for interleave control were registered both at initialization
> and when new nodes were detected via hotplug, leading to potential duplicates.
>
> This patch ensures that each node is registered only once, preventing conflicts
> and redundant sysfs entries.
>
After looking more closely at patch 2, this seems to suggest we're not
understanding the OFFLINE/ONLINE events well enough to use for this
purpose. I think this patch won't be needed once we address the
concerns in patch 2 - and more generally if we discover it is needed
this should just be rolled into patch 2 rather than kept separate.
~Gregory
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for weighted interleave
2025-03-13 16:42 ` Gregory Price
@ 2025-03-14 6:35 ` Rakie Kim
0 siblings, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-14 6:35 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Thu, 13 Mar 2025 12:42:03 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Wed, Mar 12, 2025 at 04:56:27PM +0900, Rakie Kim wrote:
> > Sysfs attributes for interleave control were registered both at initialization
> > and when new nodes were detected via hotplug, leading to potential duplicates.
> >
> > This patch ensures that each node is registered only once, preventing conflicts
> > and redundant sysfs entries.
> >
>
> After looking more closely at patch 2, this seems to suggest we're not
> understanding the OFFLINE/ONLINE events well enough to use for this
> purpose. I think this patch won't be needed once we address the
> concerns in patch 2 - and more generally if we discover it is needed
> this should just be rolled into patch 2 rather than kept separate.
>
> ~Gregory
For this patch, it might be beneficial to merge it with another patch
or change its form. I will consider the changes, including the feedback
from Joshua.
Rakie
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-12 7:56 [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
` (2 preceding siblings ...)
2025-03-12 7:56 ` [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
@ 2025-03-12 15:49 ` Gregory Price
2025-03-13 6:31 ` Rakie Kim
3 siblings, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-12 15:49 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Wed, Mar 12, 2025 at 04:56:24PM +0900, Rakie Kim wrote:
> Improper cleanup of sysfs attributes caused kobject and memory leaks when
> initialization failed or nodes were removed.
>
> This patch ensures proper deallocation of kobjects and memory, preventing
> resource leaks and improving stability.
>
This patch does multiple things, please split these changes into
multiple patches.
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> ---
> mm/mempolicy.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index bbaadbeeb291..1691748badb2 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3541,39 +3541,40 @@ static int __init mempolicy_sysfs_init(void)
> int err;
> static struct kobject *mempolicy_kobj;
>
> - mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> - if (!mempolicy_kobj) {
> + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> + GFP_KERNEL);
> + if (!node_attrs) {
> err = -ENOMEM;
> goto err_out;
> }
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs) {
> + mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> + if (!mempolicy_kobj) {
> err = -ENOMEM;
> - goto mempol_out;
> + goto node_out;
> }
It's not clear to me why you re-ordered these allocations, it seems
superfluous.
>
> err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> "mempolicy");
> - if (err)
> - goto node_out;
> + if (err) {
> + kobject_put(mempolicy_kobj);
Is this correct? If kobject_init_and_add fails, from other examples we
need only free the mempolicy_kobj - because it failed to initialize and
therefore should not have any references. I think this causes an
underflow.
> + goto err_out;
> + }
>
> err = add_weighted_interleave_group(mempolicy_kobj);
> if (err) {
> - pr_err("mempolicy sysfs structure failed to initialize\n");
> kobject_put(mempolicy_kobj);
> - return err;
> + goto err_out;
> }
>
> - return err;
> + return 0;
> +
Please keep functional changes and refactors separate. There's more
churn in this patch than is needed to accomplish what the changelog
states is the intent.
> node_out:
> kfree(node_attrs);
> -mempol_out:
> - kfree(mempolicy_kobj);
> err_out:
> - pr_err("failed to add mempolicy kobject to the system\n");
> + pr_err("mempolicy sysfs structure failed to initialize\n");
> return err;
> +
> }
>
> late_initcall(mempolicy_sysfs_init);
>
> base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-12 15:49 ` [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Gregory Price
@ 2025-03-13 6:31 ` Rakie Kim
2025-03-13 15:52 ` Gregory Price
0 siblings, 1 reply; 27+ messages in thread
From: Rakie Kim @ 2025-03-13 6:31 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Wed, 12 Mar 2025 11:49:06 -0400 Gregory Price <gourry@gourry.net> wrote:
Hi Gregory
Thank you for your response regarding this patch.
> On Wed, Mar 12, 2025 at 04:56:24PM +0900, Rakie Kim wrote:
> > Improper cleanup of sysfs attributes caused kobject and memory leaks when
> > initialization failed or nodes were removed.
> >
> > This patch ensures proper deallocation of kobjects and memory, preventing
> > resource leaks and improving stability.
> >
>
> This patch does multiple things, please split these changes into
> multiple patches.
This patch should remain as a single patch since all changes address
kobject-related memory issues in mempolicy_sysfs_init(). If you still
believe it should be split, I would appreciate your suggestion on
which parts should be separated.
>
> > Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> > mm/mempolicy.c | 29 +++++++++++++++--------------
> > 1 file changed, 15 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index bbaadbeeb291..1691748badb2 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3541,39 +3541,40 @@ static int __init mempolicy_sysfs_init(void)
> > int err;
> > static struct kobject *mempolicy_kobj;
> >
> > - mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > - if (!mempolicy_kobj) {
> > + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > + GFP_KERNEL);
> > + if (!node_attrs) {
> > err = -ENOMEM;
> > goto err_out;
> > }
> >
> > - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > - GFP_KERNEL);
> > - if (!node_attrs) {
> > + mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > + if (!mempolicy_kobj) {
> > err = -ENOMEM;
> > - goto mempol_out;
> > + goto node_out;
> > }
>
> It's not clear to me why you re-ordered these allocations, it seems
> superfluous.
>
> >
> > err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> > "mempolicy");
> > - if (err)
> > - goto node_out;
> > + if (err) {
> > + kobject_put(mempolicy_kobj);
>
> Is this correct? If kobject_init_and_add fails, from other examples we
> need only free the mempolicy_kobj - because it failed to initialize and
> therefore should not have any references. I think this causes an
> underflow.
Regarding the reordering of mempolicy_kobj allocation:
1) In kobject_init_and_add(), kobject_init() is always called, which
increments the kobject's refcount. Therefore, even if
kobject_init_and_add() fails, kobject_put() must be called to ensure
proper memory cleanup.
int kobject_init_and_add(struct kobject *kobj, const struct kobj_type *ktype,
struct kobject *parent, const char *fmt, ...)
{
...
kobject_init(kobj, ktype);
retval = kobject_add_varg(kobj, parent, fmt, args);
...
return retval;
}
2) The release function for mempolicy_kobj is responsible for freeing
associated memory:
static void mempolicy_kobj_release(struct kobject *kobj)
{
...
kfree(ngrp->nattrs);
kfree(ngrp);
kfree(kobj);
}
Once mempolicy_kobj is passed to kobject_init_and_add(), the memory
for ngrp->attrs and ngrp should be released via mempolicy_kobj_release().
The allocation order was changed to ensure that kobject_put() properly
invokes mempolicy_kobj_release() when required.
>
> > + goto err_out;
> > + }
> >
> > err = add_weighted_interleave_group(mempolicy_kobj);
> > if (err) {
> > - pr_err("mempolicy sysfs structure failed to initialize\n");
> > kobject_put(mempolicy_kobj);
> > - return err;
> > + goto err_out;
> > }
> >
> > - return err;
> > + return 0;
> > +
>
> Please keep functional changes and refactors separate. There's more
> churn in this patch than is needed to accomplish what the changelog
> states is the intent.
As mentioned earlier, I believe this patch does not need to be split.
However, if you have further concerns or suggestions, I would
appreciate your input.
>
> > node_out:
> > kfree(node_attrs);
> > -mempol_out:
> > - kfree(mempolicy_kobj);
> > err_out:
> > - pr_err("failed to add mempolicy kobject to the system\n");
> > + pr_err("mempolicy sysfs structure failed to initialize\n");
> > return err;
> > +
> > }
> >
> > late_initcall(mempolicy_sysfs_init);
> >
> > base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
> > --
> > 2.34.1
> >
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-13 6:31 ` Rakie Kim
@ 2025-03-13 15:52 ` Gregory Price
2025-03-14 7:44 ` Rakie Kim
2025-03-14 10:55 ` Jonathan Cameron
0 siblings, 2 replies; 27+ messages in thread
From: Gregory Price @ 2025-03-13 15:52 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun
On Thu, Mar 13, 2025 at 03:31:38PM +0900, Rakie Kim wrote:
> > Is this correct? If kobject_init_and_add fails, from other examples we
> > need only free the mempolicy_kobj - because it failed to initialize and
> > therefore should not have any references. I think this causes an
> > underflow.
>
> Regarding the reordering of mempolicy_kobj allocation:
> 1) In kobject_init_and_add(), kobject_init() is always called, which
Quite right, mea culpa.
>
> 2) The release function for mempolicy_kobj is responsible for freeing
> associated memory:
>
> static void mempolicy_kobj_release(struct kobject *kobj)
> {
> ...
> kfree(ngrp->nattrs);
> kfree(ngrp);
> kfree(kobj);
> }
>
I see what you're trying to do now after looking at the free-ordering
at little closer.
Lets do the following:
1) allocate node_attrs and mempolicy_kobj up front and keep your
reordering, this lets us clean up allocations on failure before
kobject_init is called
2) after this remove all the other code and just let
mempolicy_kobj_release clean up node_attrs
3) Add a (%d) to the error message to differentiate failures
This is a little bit cleaner and is a bit less code. (Not built or
tested, just a recommendation).
I'd recommend submitting this patch by itself to mm-stable, since the
remainder of the patch line changes functionality and this fixes a bug
in LTS kernels.
~Gregory
---
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 530e71fe9147..05a410db08b4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3541,38 +3541,34 @@ static int __init mempolicy_sysfs_init(void)
int err;
static struct kobject *mempolicy_kobj;
- mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
- if (!mempolicy_kobj) {
+ node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+ GFP_KERNEL);
+ if (!node_attrs) {
err = -ENOMEM;
goto err_out;
}
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs) {
+ mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
+ if (!mempolicy_kobj) {
err = -ENOMEM;
- goto mempol_out;
+ kfree(node_attrs);
+ goto err_out;
}
err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
"mempolicy");
if (err)
- goto node_out;
+ goto mempol_out;
err = add_weighted_interleave_group(mempolicy_kobj);
- if (err) {
- pr_err("mempolicy sysfs structure failed to initialize\n");
- kobject_put(mempolicy_kobj);
- return err;
- }
+ if (err)
+ goto mempol_out;
- return err;
-node_out:
- kfree(node_attrs);
+ return 0;
mempol_out:
- kfree(mempolicy_kobj);
+ kobject_put(mempolicy_kobj);
err_out:
- pr_err("failed to add mempolicy kobject to the system\n");
+ pr_err("mempolicy sysfs structure failed to initialize (%d)\n", err);
return err;
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-13 15:52 ` Gregory Price
@ 2025-03-14 7:44 ` Rakie Kim
2025-03-14 10:55 ` Jonathan Cameron
1 sibling, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-14 7:44 UTC (permalink / raw)
To: Gregory Price
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, kernel_team, honggyu.kim,
yunjeong.mun, Rakie Kim
On Thu, 13 Mar 2025 11:52:18 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Thu, Mar 13, 2025 at 03:31:38PM +0900, Rakie Kim wrote:
> > > Is this correct? If kobject_init_and_add fails, from other examples we
> > > need only free the mempolicy_kobj - because it failed to initialize and
> > > therefore should not have any references. I think this causes an
> > > underflow.
> >
> > Regarding the reordering of mempolicy_kobj allocation:
> > 1) In kobject_init_and_add(), kobject_init() is always called, which
>
> Quite right, mea culpa.
>
> >
> > 2) The release function for mempolicy_kobj is responsible for freeing
> > associated memory:
> >
> > static void mempolicy_kobj_release(struct kobject *kobj)
> > {
> > ...
> > kfree(ngrp->nattrs);
> > kfree(ngrp);
> > kfree(kobj);
> > }
> >
>
> I see what you're trying to do now after looking at the free-ordering
> at little closer.
>
> Lets do the following:
>
> 1) allocate node_attrs and mempolicy_kobj up front and keep your
> reordering, this lets us clean up allocations on failure before
> kobject_init is called
>
> 2) after this remove all the other code and just let
> mempolicy_kobj_release clean up node_attrs
>
> 3) Add a (%d) to the error message to differentiate failures
>
> This is a little bit cleaner and is a bit less code. (Not built or
> tested, just a recommendation).
>
> I'd recommend submitting this patch by itself to mm-stable, since the
> remainder of the patch line changes functionality and this fixes a bug
> in LTS kernels.
>
> ~Gregory
>
I will make every effort to incorporate your suggestions.
Additionally, I will separate this patch from the current patch series
and create it as an independent patch.
Rakie
>
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 530e71fe9147..05a410db08b4 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3541,38 +3541,34 @@ static int __init mempolicy_sysfs_init(void)
> int err;
> static struct kobject *mempolicy_kobj;
>
> - mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> - if (!mempolicy_kobj) {
> + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> + GFP_KERNEL);
> + if (!node_attrs) {
> err = -ENOMEM;
> goto err_out;
> }
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs) {
> + mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> + if (!mempolicy_kobj) {
> err = -ENOMEM;
> - goto mempol_out;
> + kfree(node_attrs);
> + goto err_out;
> }
>
> err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> "mempolicy");
> if (err)
> - goto node_out;
> + goto mempol_out;
>
> err = add_weighted_interleave_group(mempolicy_kobj);
> - if (err) {
> - pr_err("mempolicy sysfs structure failed to initialize\n");
> - kobject_put(mempolicy_kobj);
> - return err;
> - }
> + if (err)
> + goto mempol_out;
>
> - return err;
> -node_out:
> - kfree(node_attrs);
> + return 0;
> mempol_out:
> - kfree(mempolicy_kobj);
> + kobject_put(mempolicy_kobj);
> err_out:
> - pr_err("failed to add mempolicy kobject to the system\n");
> + pr_err("mempolicy sysfs structure failed to initialize (%d)\n", err);
> return err;
> }
>
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-13 15:52 ` Gregory Price
2025-03-14 7:44 ` Rakie Kim
@ 2025-03-14 10:55 ` Jonathan Cameron
2025-03-14 13:42 ` Gregory Price
2025-03-17 8:24 ` Rakie Kim
1 sibling, 2 replies; 27+ messages in thread
From: Jonathan Cameron @ 2025-03-14 10:55 UTC (permalink / raw)
To: Gregory Price
Cc: Rakie Kim, akpm, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, kernel_team,
honggyu.kim, yunjeong.mun
On Thu, 13 Mar 2025 11:52:18 -0400
Gregory Price <gourry@gourry.net> wrote:
> On Thu, Mar 13, 2025 at 03:31:38PM +0900, Rakie Kim wrote:
> > > Is this correct? If kobject_init_and_add fails, from other examples we
> > > need only free the mempolicy_kobj - because it failed to initialize and
> > > therefore should not have any references. I think this causes an
> > > underflow.
> >
> > Regarding the reordering of mempolicy_kobj allocation:
> > 1) In kobject_init_and_add(), kobject_init() is always called, which
>
> Quite right, mea culpa.
>
> >
> > 2) The release function for mempolicy_kobj is responsible for freeing
> > associated memory:
> >
> > static void mempolicy_kobj_release(struct kobject *kobj)
> > {
> > ...
> > kfree(ngrp->nattrs);
> > kfree(ngrp);
> > kfree(kobj);
> > }
> >
>
> I see what you're trying to do now after looking at the free-ordering
> at little closer.
>
> Lets do the following:
>
> 1) allocate node_attrs and mempolicy_kobj up front and keep your
> reordering, this lets us clean up allocations on failure before
> kobject_init is called
>
> 2) after this remove all the other code and just let
> mempolicy_kobj_release clean up node_attrs
>
> 3) Add a (%d) to the error message to differentiate failures
Given how unlikely (and noisy) a memory allocation failure is,
maybe just drop the printing at all in those paths - allowing
early returns.
The lifetime rules around node_attrs in here are making readability
poor. It is implicitly owned by the mempolicy_kobj, but no direct association.
Maybe just encapsulating the kobject in a structure that contains
this as a [] array at the end. Then we end up with single allocation of
stuff that is effectively one thing.
>
> This is a little bit cleaner and is a bit less code. (Not built or
> tested, just a recommendation).
>
> I'd recommend submitting this patch by itself to mm-stable, since the
> remainder of the patch line changes functionality and this fixes a bug
> in LTS kernels.
>
> ~Gregory
>
> ---
>
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 530e71fe9147..05a410db08b4 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3541,38 +3541,34 @@ static int __init mempolicy_sysfs_init(void)
> int err;
> static struct kobject *mempolicy_kobj;
>
> - mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> - if (!mempolicy_kobj) {
> + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> + GFP_KERNEL);
> + if (!node_attrs) {
> err = -ENOMEM;
> goto err_out;
> }
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs) {
> + mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> + if (!mempolicy_kobj) {
> err = -ENOMEM;
> - goto mempol_out;
> + kfree(node_attrs);
> + goto err_out;
> }
>
> err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> "mempolicy");
> if (err)
> - goto node_out;
> + goto mempol_out;
>
> err = add_weighted_interleave_group(mempolicy_kobj);
> - if (err) {
> - pr_err("mempolicy sysfs structure failed to initialize\n");
> - kobject_put(mempolicy_kobj);
> - return err;
> - }
> + if (err)
> + goto mempol_out;
>
> - return err;
> -node_out:
> - kfree(node_attrs);
> + return 0;
> mempol_out:
> - kfree(mempolicy_kobj);
> + kobject_put(mempolicy_kobj);
> err_out:
> - pr_err("failed to add mempolicy kobject to the system\n");
> + pr_err("mempolicy sysfs structure failed to initialize (%d)\n", err);
> return err;
> }
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-14 10:55 ` Jonathan Cameron
@ 2025-03-14 13:42 ` Gregory Price
2025-03-17 8:24 ` Rakie Kim
2025-03-17 8:24 ` Rakie Kim
1 sibling, 1 reply; 27+ messages in thread
From: Gregory Price @ 2025-03-14 13:42 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Rakie Kim, akpm, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, kernel_team,
honggyu.kim, yunjeong.mun
On Fri, Mar 14, 2025 at 10:55:00AM +0000, Jonathan Cameron wrote:
> >
> > 1) allocate node_attrs and mempolicy_kobj up front and keep your
> > reordering, this lets us clean up allocations on failure before
> > kobject_init is called
> >
> > 2) after this remove all the other code and just let
> > mempolicy_kobj_release clean up node_attrs
> >
> > 3) Add a (%d) to the error message to differentiate failures
>
> Given how unlikely (and noisy) a memory allocation failure is,
> maybe just drop the printing at all in those paths - allowing
> early returns.
>
> The lifetime rules around node_attrs in here are making readability
> poor. It is implicitly owned by the mempolicy_kobj, but no direct association.
> Maybe just encapsulating the kobject in a structure that contains
> this as a [] array at the end. Then we end up with single allocation of
> stuff that is effectively one thing.
>
Even better recommendation, lets do as Jonathan suggests. <3
~Gregory
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-14 13:42 ` Gregory Price
@ 2025-03-17 8:24 ` Rakie Kim
0 siblings, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-17 8:24 UTC (permalink / raw)
To: Gregory Price
Cc: Rakie Kim, akpm, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, kernel_team,
honggyu.kim, yunjeong.mun, Jonathan Cameron
On Fri, 14 Mar 2025 09:42:31 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Fri, Mar 14, 2025 at 10:55:00AM +0000, Jonathan Cameron wrote:
> > >
> > > 1) allocate node_attrs and mempolicy_kobj up front and keep your
> > > reordering, this lets us clean up allocations on failure before
> > > kobject_init is called
> > >
> > > 2) after this remove all the other code and just let
> > > mempolicy_kobj_release clean up node_attrs
> > >
> > > 3) Add a (%d) to the error message to differentiate failures
> >
> > Given how unlikely (and noisy) a memory allocation failure is,
> > maybe just drop the printing at all in those paths - allowing
> > early returns.
> >
> > The lifetime rules around node_attrs in here are making readability
> > poor. It is implicitly owned by the mempolicy_kobj, but no direct association.
> > Maybe just encapsulating the kobject in a structure that contains
> > this as a [] array at the end. Then we end up with single allocation of
> > stuff that is effectively one thing.
> >
>
> Even better recommendation, lets do as Jonathan suggests. <3
>
> ~Gregory
Hi Gregory
I will revise the next version based on Jonathan's feedback.
Moreover, I'll separate this patch from the hotplug series and make it an
independent patch.
Rakie
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init()
2025-03-14 10:55 ` Jonathan Cameron
2025-03-14 13:42 ` Gregory Price
@ 2025-03-17 8:24 ` Rakie Kim
1 sibling, 0 replies; 27+ messages in thread
From: Rakie Kim @ 2025-03-17 8:24 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Rakie Kim, akpm, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, kernel_team,
honggyu.kim, yunjeong.mun, Gregory Price
On Fri, 14 Mar 2025 10:55:00 +0000 Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> On Thu, 13 Mar 2025 11:52:18 -0400
> Gregory Price <gourry@gourry.net> wrote:
>
> > On Thu, Mar 13, 2025 at 03:31:38PM +0900, Rakie Kim wrote:
> > > > Is this correct? If kobject_init_and_add fails, from other examples we
> > > > need only free the mempolicy_kobj - because it failed to initialize and
> > > > therefore should not have any references. I think this causes an
> > > > underflow.
> > >
> > > Regarding the reordering of mempolicy_kobj allocation:
> > > 1) In kobject_init_and_add(), kobject_init() is always called, which
> >
> > Quite right, mea culpa.
> >
> > >
> > > 2) The release function for mempolicy_kobj is responsible for freeing
> > > associated memory:
> > >
> > > static void mempolicy_kobj_release(struct kobject *kobj)
> > > {
> > > ...
> > > kfree(ngrp->nattrs);
> > > kfree(ngrp);
> > > kfree(kobj);
> > > }
> > >
> >
> > I see what you're trying to do now after looking at the free-ordering
> > at little closer.
> >
> > Lets do the following:
> >
> > 1) allocate node_attrs and mempolicy_kobj up front and keep your
> > reordering, this lets us clean up allocations on failure before
> > kobject_init is called
> >
> > 2) after this remove all the other code and just let
> > mempolicy_kobj_release clean up node_attrs
> >
> > 3) Add a (%d) to the error message to differentiate failures
>
> Given how unlikely (and noisy) a memory allocation failure is,
> maybe just drop the printing at all in those paths - allowing
> early returns.
>
> The lifetime rules around node_attrs in here are making readability
> poor. It is implicitly owned by the mempolicy_kobj, but no direct association.
> Maybe just encapsulating the kobject in a structure that contains
> this as a [] array at the end. Then we end up with single allocation of
> stuff that is effectively one thing.
>
Hi Jonathan
Thank you for your response regarding this patch.
Your suggestions seem very appropriate. As you recommended, I will proceed to
encapsulate node_attrs and mempolicy_kobj into a single structure.
Rakie
>
> >
> > This is a little bit cleaner and is a bit less code. (Not built or
> > tested, just a recommendation).
> >
> > I'd recommend submitting this patch by itself to mm-stable, since the
> > remainder of the patch line changes functionality and this fixes a bug
> > in LTS kernels.
> >
> > ~Gregory
> >
> > ---
> >
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 530e71fe9147..05a410db08b4 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3541,38 +3541,34 @@ static int __init mempolicy_sysfs_init(void)
> > int err;
> > static struct kobject *mempolicy_kobj;
> >
> > - mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > - if (!mempolicy_kobj) {
> > + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > + GFP_KERNEL);
> > + if (!node_attrs) {
> > err = -ENOMEM;
> > goto err_out;
> > }
> >
> > - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > - GFP_KERNEL);
> > - if (!node_attrs) {
> > + mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > + if (!mempolicy_kobj) {
> > err = -ENOMEM;
> > - goto mempol_out;
> > + kfree(node_attrs);
> > + goto err_out;
> > }
> >
> > err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> > "mempolicy");
> > if (err)
> > - goto node_out;
> > + goto mempol_out;
> >
> > err = add_weighted_interleave_group(mempolicy_kobj);
> > - if (err) {
> > - pr_err("mempolicy sysfs structure failed to initialize\n");
> > - kobject_put(mempolicy_kobj);
> > - return err;
> > - }
> > + if (err)
> > + goto mempol_out;
> >
> > - return err;
> > -node_out:
> > - kfree(node_attrs);
> > + return 0;
> > mempol_out:
> > - kfree(mempolicy_kobj);
> > + kobject_put(mempolicy_kobj);
> > err_out:
> > - pr_err("failed to add mempolicy kobject to the system\n");
> > + pr_err("mempolicy sysfs structure failed to initialize (%d)\n", err);
> > return err;
> > }
> >
> >
>
^ permalink raw reply [flat|nested] 27+ messages in thread