From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Apr 2025 17:00:31 +0100
From: Jonathan Cameron
To: Rakie Kim
Subject: Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
Message-ID: <20250415170031.0000372b@huawei.com>
In-Reply-To: <20250408073243.488-4-rakie.kim@sk.com>
References: <20250408073243.488-1-rakie.kim@sk.com> <20250408073243.488-4-rakie.kim@sk.com>

On Tue, 8 Apr 2025 16:32:42 +0900
Rakie Kim wrote:

> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>    the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>    based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim
> Signed-off-by: Honggyu Kim
> Signed-off-by: Yunjeong Mun
> Reviewed-by: Oscar Salvador
> ---
>  mm/mempolicy.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 83 insertions(+), 23 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 988575f29c53..9aa884107f4c 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -113,6 +113,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "internal.h"
>
> @@ -3421,6 +3422,7 @@ struct iw_node_attr {
>
>  struct sysfs_wi_group {
>  	struct kobject wi_kobj;
> +	struct mutex kobj_lock;
>  	struct iw_node_attr *nattrs[];
>  };
>
> @@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>
>  static void sysfs_wi_node_delete(int nid)
>  {
> -	if (!wi_group->nattrs[nid])
> +	struct iw_node_attr *attr;
> +
> +	if (nid < 0 || nid >= nr_node_ids)
> +		return;
> +
> +	mutex_lock(&wi_group->kobj_lock);
> +	attr = wi_group->nattrs[nid];
> +	if (!attr) {
> +		mutex_unlock(&wi_group->kobj_lock);
>  		return;
> +	}
> +
> +	wi_group->nattrs[nid] = NULL;
> +	mutex_unlock(&wi_group->kobj_lock);
>
> -	sysfs_remove_file(&wi_group->wi_kobj,
> -			  &wi_group->nattrs[nid]->kobj_attr.attr);
> -	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> -	kfree(wi_group->nattrs[nid]);
> +	sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
> +	kfree(attr->kobj_attr.attr.name);
> +	kfree(attr);

Here you go through a careful dance to not touch wi_group->nattrs[nid]
except under the lock, but later you are happy to do so in the error
handling paths. Maybe better to do something similar to here: set it to
NULL under the lock, but do the freeing on a copy taken under that lock.

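To make the pattern concrete, here is a minimal userspace sketch of
"detach under the lock, free on the private copy". It is an illustration
only, not kernel code: a pthread mutex stands in for struct mutex, free()
for kfree(), and every demo_* name (including the int return value of
demo_delete()) is hypothetical and does not appear in the patch.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_NR_NODES 4	/* hypothetical stand-in for nr_node_ids */

/* Hypothetical userspace stand-in for struct iw_node_attr. */
struct demo_attr {
	char *name;
	int nid;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_attr *demo_slots[DEMO_NR_NODES];

/* Free an entry that is no longer reachable through demo_slots[]. */
static void demo_attr_free(struct demo_attr *attr)
{
	if (!attr)
		return;
	free(attr->name);
	free(attr);
}

/* Helper for the example: publish "node<nid>" if the slot is empty. */
static int demo_install(int nid)
{
	struct demo_attr *attr;
	size_t len;

	if (nid < 0 || nid >= DEMO_NR_NODES)
		return -1;

	attr = calloc(1, sizeof(*attr));
	if (!attr)
		return -1;

	len = (size_t)snprintf(NULL, 0, "node%d", nid) + 1;
	attr->name = malloc(len);
	if (!attr->name) {
		free(attr);
		return -1;
	}
	snprintf(attr->name, len, "node%d", nid);
	attr->nid = nid;

	pthread_mutex_lock(&demo_lock);
	if (demo_slots[nid]) {
		pthread_mutex_unlock(&demo_lock);
		demo_attr_free(attr);
		return -1;
	}
	demo_slots[nid] = attr;
	pthread_mutex_unlock(&demo_lock);
	return 0;
}

/* Returns 1 if an entry was removed, 0 if there was nothing to remove. */
static int demo_delete(int nid)
{
	struct demo_attr *attr;
	int found;

	if (nid < 0 || nid >= DEMO_NR_NODES)
		return 0;

	/* The shared slot is only read and cleared while holding the lock. */
	pthread_mutex_lock(&demo_lock);
	attr = demo_slots[nid];
	demo_slots[nid] = NULL;
	pthread_mutex_unlock(&demo_lock);

	/* From here on only the private copy is touched. */
	found = attr != NULL;
	demo_attr_free(attr);
	return found;
}
```

Once the slot has been cleared under the lock, no other path can reach the
entry through the array, so the teardown can run on the local pointer
without holding the lock.
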
>  }
>
>  static void sysfs_wi_release(struct kobject *wi_kobj)
> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
>
>  static int sysfs_wi_node_add(int nid)
>  {
> -	struct iw_node_attr *node_attr;
> +	int ret = 0;

Trivial, but isn't ret always set before it is used? If so, there is no
need to initialize it here.

>  	char *name;
> +	struct iw_node_attr *new_attr = NULL;

This is also always set before use, so I'm not seeing a reason to
initialize it to NULL.

>
> -	node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> -	if (!node_attr)
> +	if (nid < 0 || nid >= nr_node_ids) {
> +		pr_err("Invalid node id: %d\n", nid);
> +		return -EINVAL;
> +	}
> +
> +	new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);

I'd prefer sizeof(*new_attr) because I'm lazy and don't like checking
types for allocation sizes :) Local style seems to be a bit of a mix though.

> +	if (!new_attr)
>  		return -ENOMEM;
>
>  	name = kasprintf(GFP_KERNEL, "node%d", nid);
>  	if (!name) {
> -		kfree(node_attr);
> +		kfree(new_attr);
>  		return -ENOMEM;
>  	}
>
> -	sysfs_attr_init(&node_attr->kobj_attr.attr);
> -	node_attr->kobj_attr.attr.name = name;
> -	node_attr->kobj_attr.attr.mode = 0644;
> -	node_attr->kobj_attr.show = node_show;
> -	node_attr->kobj_attr.store = node_store;
> -	node_attr->nid = nid;
> +	mutex_lock(&wi_group->kobj_lock);
> +	if (wi_group->nattrs[nid]) {
> +		mutex_unlock(&wi_group->kobj_lock);
> +		pr_info("Node [%d] already exists\n", nid);
> +		kfree(new_attr);
> +		kfree(name);
> +		return 0;
> +	}
> +	wi_group->nattrs[nid] = new_attr;
>
> -	if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> -		kfree(node_attr->kobj_attr.attr.name);
> -		kfree(node_attr);
> -		pr_err("failed to add attribute to weighted_interleave\n");
> -		return -ENOMEM;
> +	sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);

I'd have been tempted to use the new_attr pointer, but perhaps this brings
some documentation-like advantages.

> +	wi_group->nattrs[nid]->kobj_attr.attr.name = name;
> +	wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
> +	wi_group->nattrs[nid]->kobj_attr.show = node_show;
> +	wi_group->nattrs[nid]->kobj_attr.store = node_store;
> +	wi_group->nattrs[nid]->nid = nid;
> +
> +	ret = sysfs_create_file(&wi_group->wi_kobj,
> +				&wi_group->nattrs[nid]->kobj_attr.attr);
> +	if (ret) {
> +		kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);

See the comment above on the rather different handling here compared to
sysfs_wi_node_delete(), where you set it to NULL first, release the lock
and then tidy up. new_attr and name are still set, so you could even
combine the handling with the if (wi_group->nattrs[nid]) above via
appropriate gotos.

> +		kfree(wi_group->nattrs[nid]);
> +		wi_group->nattrs[nid] = NULL;
> +		pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
>  	}
> +	mutex_unlock(&wi_group->kobj_lock);
>
> -	wi_group->nattrs[nid] = node_attr;
> -	return 0;
> +	return ret;
> +}
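
As a rough illustration of the goto suggestion above, the "already exists"
and "create failed" paths could share one exit that unlocks first and then
frees through the local pointer. This is a userspace sketch under stated
assumptions, not a proposed replacement for the patch: pthread mutexes and
free() stand in for the kernel primitives, the create_fn callback stands in
for sysfs_create_file(), and all sketch_* names are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NR_NODES 4	/* hypothetical stand-in for nr_node_ids */

/* Hypothetical userspace stand-in for struct iw_node_attr. */
struct sketch_attr {
	char *name;
	int nid;
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_attr *sketch_slots[SKETCH_NR_NODES];

/* Stubs standing in for sysfs_create_file(): one succeeds, one fails. */
static int sketch_create_ok(const struct sketch_attr *attr)
{
	(void)attr;
	return 0;
}

static int sketch_create_fail(const struct sketch_attr *attr)
{
	(void)attr;
	return -EIO;
}

static int sketch_add(int nid, int (*create_fn)(const struct sketch_attr *))
{
	struct sketch_attr *new_attr;
	size_t len;
	int ret;

	if (nid < 0 || nid >= SKETCH_NR_NODES)
		return -EINVAL;

	new_attr = calloc(1, sizeof(*new_attr));
	if (!new_attr)
		return -ENOMEM;

	len = (size_t)snprintf(NULL, 0, "node%d", nid) + 1;
	new_attr->name = malloc(len);
	if (!new_attr->name) {
		ret = -ENOMEM;
		goto out_free;
	}
	snprintf(new_attr->name, len, "node%d", nid);
	new_attr->nid = nid;

	pthread_mutex_lock(&sketch_lock);
	if (sketch_slots[nid]) {
		ret = 0;	/* already present: not treated as an error */
		goto out_unlock;
	}

	sketch_slots[nid] = new_attr;
	ret = create_fn(new_attr);
	if (ret) {
		/* Unpublish under the lock; the free happens after unlock. */
		sketch_slots[nid] = NULL;
		goto out_unlock;
	}
	pthread_mutex_unlock(&sketch_lock);
	return 0;

out_unlock:
	pthread_mutex_unlock(&sketch_lock);
out_free:
	/* Only the local pointer is used here, never sketch_slots[nid]. */
	free(new_attr->name);
	free(new_attr);
	return ret;
}
```

With this shape, the shared array is only read or written between lock and
unlock, and both cleanup paths free the same two allocations in one place.
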