From: Honggyu Kim <honggyu.kim@sk.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Rakie Kim <rakie.kim@sk.com>
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org,
gourry@gourry.net, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
joshua.hahnjy@gmail.com, dan.j.williams@intel.com,
ying.huang@linux.alibaba.com, david@redhat.com,
osalvador@suse.de, yunjeong.mun@sk.com
Subject: Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Wed, 16 Apr 2025 13:04:32 +0900
Message-ID: <6a651c16-7ffc-42a5-8c98-95949073c804@sk.com>
In-Reply-To: <20250415170031.0000372b@huawei.com>
Hi Jonathan,
Thanks for reviewing our patches.
I have a few comments below; Rakie will address the rest.
On 4/16/2025 1:00 AM, Jonathan Cameron wrote:
> On Tue, 8 Apr 2025 16:32:42 +0900
> Rakie Kim <rakie.kim@sk.com> wrote:
>
>> The weighted interleave policy distributes page allocations across multiple
>> NUMA nodes based on their performance weight, thereby improving memory
>> bandwidth utilization. The weight values for each node are configured
>> through sysfs.
>>
>> Previously, sysfs entries for configuring weighted interleave were created
>> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
>> might not have memory. However, not all nodes in N_POSSIBLE are usable at
>> runtime, as some may remain memoryless or offline.
>> This led to sysfs entries being created for unusable nodes, causing
>> potential misconfiguration issues.
>>
>> To address this issue, this patch modifies the sysfs creation logic to:
>> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>> the creation of sysfs entries for nodes that cannot be used.
>> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>> based on whether a node transitions into or out of the N_MEMORY state.
>>
>> Additionally, the patch ensures that sysfs attributes are properly managed
>> when nodes go offline, preventing stale or redundant entries from persisting
>> in the system.
>>
>> By making these changes, the weighted interleave policy now manages its
>> sysfs entries more efficiently, ensuring that only relevant nodes are
>> considered for interleaving, and dynamically adapting to memory hotplug
>> events.
>>
>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>> ---
>> mm/mempolicy.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 83 insertions(+), 23 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 988575f29c53..9aa884107f4c 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -113,6 +113,7 @@
>> #include <asm/tlbflush.h>
>> #include <asm/tlb.h>
>> #include <linux/uaccess.h>
>> +#include <linux/memory.h>
>>
>> #include "internal.h"
>>
>> @@ -3421,6 +3422,7 @@ struct iw_node_attr {
>>
>> struct sysfs_wi_group {
>> struct kobject wi_kobj;
>> + struct mutex kobj_lock;
>> struct iw_node_attr *nattrs[];
>> };
>>
>> @@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>>
>> static void sysfs_wi_node_delete(int nid)
>> {
>> - if (!wi_group->nattrs[nid])
>> + struct iw_node_attr *attr;
>> +
>> + if (nid < 0 || nid >= nr_node_ids)
>> + return;
>> +
>> + mutex_lock(&wi_group->kobj_lock);
>> + attr = wi_group->nattrs[nid];
>> + if (!attr) {
>> + mutex_unlock(&wi_group->kobj_lock);
>> return;
>> + }
>> +
>> + wi_group->nattrs[nid] = NULL;
>> + mutex_unlock(&wi_group->kobj_lock);
>>
>> - sysfs_remove_file(&wi_group->wi_kobj,
>> - &wi_group->nattrs[nid]->kobj_attr.attr);
>> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
>> - kfree(wi_group->nattrs[nid]);
>> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
>> + kfree(attr->kobj_attr.attr.name);
>> + kfree(attr);
> Here you go through a careful dance to not touch wi_group->nattrs[nid]
> except under the lock, but later you are happy to do so in the
> error handling paths. Maybe better to do similar to here and
> set it to NULL under the lock but do the freeing on a copy taken
> under that lock.
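For reference, here is a minimal userspace sketch of the pattern Jonathan describes: take a private copy of the pointer and clear the slot while holding the lock, then tear the entry down on the copy without the lock. It assumes a pthread mutex in place of kobj_lock and plain free() in place of sysfs_remove_file()/kfree(). The names node_attr, table, NR_NODES and node_delete are hypothetical and are not code from the patch:

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_NODES 8	/* hypothetical stand-in for nr_node_ids */

/* Hypothetical stand-in for struct iw_node_attr. */
struct node_attr {
	char *name;
	int nid;
};

/* Hypothetical stand-ins for wi_group->kobj_lock and wi_group->nattrs[]. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node_attr *table[NR_NODES];

static void node_delete(int nid)
{
	struct node_attr *attr;

	if (nid < 0 || nid >= NR_NODES)
		return;

	/* Take a private copy and unpublish the slot while holding the lock. */
	pthread_mutex_lock(&table_lock);
	attr = table[nid];
	table[nid] = NULL;
	pthread_mutex_unlock(&table_lock);

	if (!attr)
		return;		/* nothing was registered for this node */

	/*
	 * No other thread can reach 'attr' through the table any more, so the
	 * teardown (sysfs_remove_file() + kfree() in the patch) runs unlocked.
	 */
	free(attr->name);
	free(attr);
}
```

The same "copy under the lock, free outside it" shape could also be applied to the error path in sysfs_wi_node_add().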
>> }
>>
>> static void sysfs_wi_release(struct kobject *wi_kobj)
>> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
>>
>> static int sysfs_wi_node_add(int nid)
>> {
>> - struct iw_node_attr *node_attr;
>> + int ret = 0;
>
> Trivial but isn't ret always set when it is used? So no need to initialize
> here.
If we don't initialize it, a trivial fixup like the one below might be needed later,
so I see no reason not to initialize it.
https://lore.kernel.org/mm-commits/20240705010631.46743C4AF07@smtp.kernel.org
>
>> char *name;
>> + struct iw_node_attr *new_attr = NULL;
>
> This is also always set before use so I'm not seeing a
> reason to initialize it to NULL.
Ditto.
>
>
>>
>> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
>> - if (!node_attr)
>> + if (nid < 0 || nid >= nr_node_ids) {
>> + pr_err("Invalid node id: %d\n", nid);
>> + return -EINVAL;
>> + }
>> +
>> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
>
> I'd prefer sizeof(*new_attr) because I'm lazy and don't like checking
> types for allocation sizes :) Local style seems to be a bit
> of a mix though.
Agreed.
>
>> + if (!new_attr)
>> return -ENOMEM;
>>
>> name = kasprintf(GFP_KERNEL, "node%d", nid);
>> if (!name) {
>> - kfree(node_attr);
>> + kfree(new_attr);
>> return -ENOMEM;
>> }
>>
>> - sysfs_attr_init(&node_attr->kobj_attr.attr);
>> - node_attr->kobj_attr.attr.name = name;
>> - node_attr->kobj_attr.attr.mode = 0644;
>> - node_attr->kobj_attr.show = node_show;
>> - node_attr->kobj_attr.store = node_store;
>> - node_attr->nid = nid;
>> + mutex_lock(&wi_group->kobj_lock);
>> + if (wi_group->nattrs[nid]) {
>> + mutex_unlock(&wi_group->kobj_lock);
>> + pr_info("Node [%d] already exists\n", nid);
>> + kfree(new_attr);
>> + kfree(name);
>> + return 0;
>> + }
>> + wi_group->nattrs[nid] = new_attr;
This assignment can be moved after all the "wi_group->nattrs[nid]" fields below are initialized.
>>
>> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
>> - kfree(node_attr->kobj_attr.attr.name);
>> - kfree(node_attr);
>> - pr_err("failed to add attribute to weighted_interleave\n");
>> - return -ENOMEM;
>> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
>
> I'd have been tempted to use the new_attr pointer but perhaps
> this brings some documentation like advantages.
+1
>
>> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
>> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
>> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
>> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
>> + wi_group->nattrs[nid]->nid = nid;
As Jonathan mentioned, it would be simpler to use "new_attr" in place of all the
"wi_group->nattrs[nid]" references here.
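A minimal userspace sketch of the add path restructured along these lines: fill in the entry through new_attr, then take the lock only to check for an existing entry, register the entry and publish it, with a single goto-based unwind path as Jonathan suggests further down. It assumes a pthread mutex in place of kobj_lock and a hypothetical register_attr() stub standing in for sysfs_create_file(). The names node_attr, table, node_add and fail_register are illustrative only and are not code from the patch:

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_NODES 8	/* hypothetical stand-in for nr_node_ids */
#define NAME_LEN 16

/* Hypothetical stand-in for struct iw_node_attr. */
struct node_attr {
	char *name;
	int nid;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node_attr *table[NR_NODES];
static int fail_register;	/* test knob: make register_attr() fail */

/* Hypothetical stand-in for sysfs_create_file(). */
static int register_attr(struct node_attr *attr)
{
	(void)attr;
	return fail_register ? -ENOMEM : 0;
}

static int node_add(int nid)
{
	struct node_attr *new_attr;
	char *name;
	int ret = 0;

	if (nid < 0 || nid >= NR_NODES)
		return -EINVAL;

	new_attr = calloc(1, sizeof(*new_attr));
	if (!new_attr)
		return -ENOMEM;

	name = malloc(NAME_LEN);
	if (!name) {
		ret = -ENOMEM;
		goto out_free_attr;
	}
	snprintf(name, NAME_LEN, "node%d", nid);

	/* Initialize through the local pointer before anyone can see it. */
	new_attr->name = name;
	new_attr->nid = nid;

	pthread_mutex_lock(&table_lock);
	if (table[nid])
		goto out_unlock;	/* already present: ret stays 0 */
	ret = register_attr(new_attr);
	if (ret)
		goto out_unlock;
	table[nid] = new_attr;		/* publish only after success */
	pthread_mutex_unlock(&table_lock);
	return 0;

out_unlock:
	pthread_mutex_unlock(&table_lock);
	free(name);
out_free_attr:
	free(new_attr);
	return ret;
}
```

Because the pointer is published only after registration succeeds, the failure path never has to clear table[nid] again; it only frees objects that no other thread has seen.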
Thanks,
Honggyu
>> +
>> + ret = sysfs_create_file(&wi_group->wi_kobj,
>> + &wi_group->nattrs[nid]->kobj_attr.attr);
>> + if (ret) {
>> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
>
> See comment above on the rather different handling here compared to
> sysfs_wi_node_delete(), where you set it to NULL first, release the lock and tidy up.
> new_attr and name are still set so you could even combine the handling with the
> if (wi_group->nattrs[nid]) above via appropriate gotos.
>
>> + kfree(wi_group->nattrs[nid]);
>> + wi_group->nattrs[nid] = NULL;
>> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
>> }
>> + mutex_unlock(&wi_group->kobj_lock);
>>
>> - wi_group->nattrs[nid] = node_attr;
>> - return 0;
>> + return ret;
>> +}
>
>