From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rakie Kim
To: akpm@linux-foundation.org
Cc: gourry@gourry.net, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org, joshua.hahnjy@gmail.com,
	dan.j.williams@intel.com, ying.huang@linux.alibaba.com,
	david@redhat.com, Jonathan.Cameron@huawei.com, osalvador@suse.de,
	kernel_team@skhynix.com, honggyu.kim@sk.com, yunjeong.mun@sk.com,
	rakie.kim@sk.com
Subject: [PATCH v8 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Wed, 16 Apr 2025 20:31:21 +0900
Message-ID: <20250416113123.629-4-rakie.kim@sk.com>
X-Mailer: git-send-email 2.48.1.windows.1
In-Reply-To: <20250416113123.629-1-rakie.kim@sk.com>
References: <20250416113123.629-1-rakie.kim@sk.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The weighted interleave policy distributes page allocations across
multiple NUMA nodes based on their performance weight, thereby improving
memory bandwidth utilization. The weight values for each node are
configured through sysfs.

Previously, sysfs entries for configuring weighted interleave were
created for all possible nodes (N_POSSIBLE) at initialization, including
nodes that might not have memory. However, not all nodes in N_POSSIBLE
are usable at runtime, as some may remain memoryless or offline. This
led to sysfs entries being created for unusable nodes, causing potential
misconfiguration issues.

To address this issue, this patch modifies the sysfs creation logic to:
1) Limit sysfs entries to nodes that are online and have memory,
   avoiding the creation of sysfs entries for nodes that cannot be used.
2) Support memory hotplug by dynamically adding and removing sysfs
   entries based on whether a node transitions into or out of the
   N_MEMORY state.
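
For readers who want to experiment with the two points above outside the kernel, the following is a minimal userspace sketch, not the kernel code. It assumes a fixed-size table in place of wi_group->nattrs[], a pthread mutex in place of kobj_lock, and heap objects in place of sysfs attributes; MAX_NODES, struct node_entry, and the entry_*() and entries_init() helpers are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 8	/* hypothetical stand-in for nr_node_ids */

/* Hypothetical stand-in for one per-node sysfs attribute. */
struct node_entry {
	int nid;
	char name[16];
};

static struct node_entry *entries[MAX_NODES];
static pthread_mutex_t entries_lock = PTHREAD_MUTEX_INITIALIZER;

/* Create the entry for nid; an entry that already exists is left alone. */
static int entry_add(int nid)
{
	struct node_entry *e;

	if (nid < 0 || nid >= MAX_NODES)
		return -EINVAL;

	/* Allocate before taking the lock to keep the critical section short. */
	e = calloc(1, sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->nid = nid;
	snprintf(e->name, sizeof(e->name), "node%d", nid);

	pthread_mutex_lock(&entries_lock);
	if (entries[nid]) {
		pthread_mutex_unlock(&entries_lock);
		free(e);	/* duplicate add: drop the new copy */
		return 0;
	}
	entries[nid] = e;
	pthread_mutex_unlock(&entries_lock);
	return 0;
}

/* Unpublish the entry under the lock, then free it outside the lock. */
static void entry_delete(int nid)
{
	struct node_entry *e;

	if (nid < 0 || nid >= MAX_NODES)
		return;

	pthread_mutex_lock(&entries_lock);
	e = entries[nid];
	entries[nid] = NULL;
	pthread_mutex_unlock(&entries_lock);

	free(e);	/* free(NULL) is a no-op */
}

/* Initial population: only nodes that currently have memory get an entry. */
static int entries_init(const bool has_memory[MAX_NODES])
{
	for (int nid = 0; nid < MAX_NODES; nid++) {
		int err;

		if (!has_memory[nid])
			continue;
		err = entry_add(nid);
		if (err)
			return err;
	}
	return 0;
}

/* Report whether an entry is currently published for nid. */
static bool entry_exists(int nid)
{
	bool ret;

	assert(nid >= 0 && nid < MAX_NODES);
	pthread_mutex_lock(&entries_lock);
	ret = entries[nid] != NULL;
	pthread_mutex_unlock(&entries_lock);
	return ret;
}
```

In this model, calling entries_init() with memory on nodes 0 and 2 leaves node 1 without an entry until entry_add(1) is called, which corresponds to a hotplug online event. Repeated adds or deletes for the same node are harmless.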

Additionally, the patch ensures that sysfs attributes are properly
managed when nodes go offline, preventing stale or redundant entries
from persisting in the system.

By making these changes, the weighted interleave policy now manages its
sysfs entries more efficiently, ensuring that only relevant nodes are
considered for interleaving, and dynamically adapting to memory hotplug
events.

Co-developed-by: Honggyu Kim
Signed-off-by: Honggyu Kim
Co-developed-by: Yunjeong Mun
Signed-off-by: Yunjeong Mun
Signed-off-by: Rakie Kim
Reviewed-by: Oscar Salvador
Reviewed-by: Joshua Hahn
Reviewed-by: Gregory Price
Acked-by: David Hildenbrand
---
 mm/mempolicy.c | 107 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 84 insertions(+), 23 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 998635127e9d..646fc9e8c8ac 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -3421,6 +3422,7 @@ struct iw_node_attr {
 
 struct sysfs_wi_group {
 	struct kobject wi_kobj;
+	struct mutex kobj_lock;
 	struct iw_node_attr *nattrs[];
 };
 
@@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 static void sysfs_wi_node_delete(int nid)
 {
-	if (!wi_group->nattrs[nid])
+	struct iw_node_attr *attr;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return;
+
+	mutex_lock(&wi_group->kobj_lock);
+	attr = wi_group->nattrs[nid];
+	if (!attr) {
+		mutex_unlock(&wi_group->kobj_lock);
 		return;
+	}
+
+	wi_group->nattrs[nid] = NULL;
+	mutex_unlock(&wi_group->kobj_lock);
 
-	sysfs_remove_file(&wi_group->wi_kobj,
-			  &wi_group->nattrs[nid]->kobj_attr.attr);
-	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
-	kfree(wi_group->nattrs[nid]);
+	sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
+	kfree(attr->kobj_attr.attr.name);
+	kfree(attr);
 }
 
 static void sysfs_wi_node_delete_all(void)
@@ -3517,35 +3530,77 @@ static const struct kobj_type wi_ktype = {
 
 static int sysfs_wi_node_add(int nid)
 {
-	struct iw_node_attr *node_attr;
+	int ret = 0;
 	char *name;
+	struct iw_node_attr *new_attr;
 
-	node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
-	if (!node_attr)
+	if (nid < 0 || nid >= nr_node_ids) {
+		pr_err("invalid node id: %d\n", nid);
+		return -EINVAL;
+	}
+
+	new_attr = kzalloc(sizeof(*new_attr), GFP_KERNEL);
+	if (!new_attr)
 		return -ENOMEM;
 
 	name = kasprintf(GFP_KERNEL, "node%d", nid);
 	if (!name) {
-		kfree(node_attr);
+		kfree(new_attr);
 		return -ENOMEM;
 	}
 
-	sysfs_attr_init(&node_attr->kobj_attr.attr);
-	node_attr->kobj_attr.attr.name = name;
-	node_attr->kobj_attr.attr.mode = 0644;
-	node_attr->kobj_attr.show = node_show;
-	node_attr->kobj_attr.store = node_store;
-	node_attr->nid = nid;
+	sysfs_attr_init(&new_attr->kobj_attr.attr);
+	new_attr->kobj_attr.attr.name = name;
+	new_attr->kobj_attr.attr.mode = 0644;
+	new_attr->kobj_attr.show = node_show;
+	new_attr->kobj_attr.store = node_store;
+	new_attr->nid = nid;
 
-	if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
-		kfree(node_attr->kobj_attr.attr.name);
-		kfree(node_attr);
-		pr_err("failed to add attribute to weighted_interleave\n");
-		return -ENOMEM;
+	mutex_lock(&wi_group->kobj_lock);
+	if (wi_group->nattrs[nid]) {
+		mutex_unlock(&wi_group->kobj_lock);
+		pr_info("node%d already exists\n", nid);
+		goto out;
 	}
 
-	wi_group->nattrs[nid] = node_attr;
+	ret = sysfs_create_file(&wi_group->wi_kobj, &new_attr->kobj_attr.attr);
+	if (ret) {
+		mutex_unlock(&wi_group->kobj_lock);
+		goto out;
+	}
+	wi_group->nattrs[nid] = new_attr;
+	mutex_unlock(&wi_group->kobj_lock);
 	return 0;
+
+out:
+	kfree(new_attr->kobj_attr.attr.name);
+	kfree(new_attr);
+	return ret;
+}
+
+static int wi_node_notifier(struct notifier_block *nb,
+			    unsigned long action, void *data)
+{
+	int err;
+	struct memory_notify *arg = data;
+	int nid = arg->status_change_nid;
+
+	if (nid < 0)
+		return NOTIFY_OK;
+
+	switch (action) {
+	case MEM_ONLINE:
+		err = sysfs_wi_node_add(nid);
+		if (err)
+			pr_err("failed to add sysfs for node%d during hotplug: %d\n",
+			       nid, err);
+		break;
+	case MEM_OFFLINE:
+		sysfs_wi_node_delete(nid);
+		break;
+	}
+
+	return NOTIFY_OK;
 }
 
 static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
@@ -3556,20 +3611,26 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 			   GFP_KERNEL);
 	if (!wi_group)
 		return -ENOMEM;
+	mutex_init(&wi_group->kobj_lock);
 
 	err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
 				   "weighted_interleave");
 	if (err)
 		goto err_put_kobj;
 
-	for_each_node_state(nid, N_POSSIBLE) {
+	for_each_online_node(nid) {
+		if (!node_state(nid, N_MEMORY))
+			continue;
+
 		err = sysfs_wi_node_add(nid);
 		if (err) {
-			pr_err("failed to add sysfs [node%d]\n", nid);
+			pr_err("failed to add sysfs for node%d during init: %d\n",
+			       nid, err);
 			goto err_cleanup_kobj;
 		}
 	}
 
+	hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
 	return 0;
 
 err_cleanup_kobj:
-- 
2.34.1
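
The dispatch done by wi_node_notifier() can be tried in isolation with a small userspace model. This is only a sketch under stated assumptions: enum mem_event stands in for the kernel's MEM_ONLINE and MEM_OFFLINE actions, a bool array stands in for the per-node sysfs entries, and handle_mem_event(), entry_present(), and MAX_NODES are hypothetical names that do not exist in the kernel. It follows the patch's `nid < 0` check by ignoring events that carry a negative node id.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical stand-in for nr_node_ids */

/* Hypothetical event codes standing in for MEM_ONLINE and MEM_OFFLINE. */
enum mem_event { EV_ONLINE, EV_OFFLINE, EV_OTHER };

/* present[nid] models "a sysfs entry exists for node nid". */
static bool present[MAX_NODES];

/*
 * changed_nid plays the role of status_change_nid in the patch: a
 * negative value means the event carries no node to act on, so the
 * handler returns without touching any entry.
 */
static void handle_mem_event(enum mem_event ev, int changed_nid)
{
	if (changed_nid < 0 || changed_nid >= MAX_NODES)
		return;

	switch (ev) {
	case EV_ONLINE:
		present[changed_nid] = true;	/* models sysfs_wi_node_add() */
		break;
	case EV_OFFLINE:
		present[changed_nid] = false;	/* models sysfs_wi_node_delete() */
		break;
	default:
		break;				/* other actions are ignored */
	}
}

/* Report whether the model currently has an entry for nid. */
static bool entry_present(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return present[nid];
}
```

Feeding the model an online event for node 3 followed by an offline event for the same node leaves no entry behind, while events with a negative node id or an unrelated action change nothing.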