From: Rakie Kim
To: akpm@linux-foundation.org
Cc: gourry@gourry.net, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-cxl@vger.kernel.org, joshua.hahnjy@gmail.com,
 dan.j.williams@intel.com, ying.huang@linux.alibaba.com, david@redhat.com,
 Jonathan.Cameron@huawei.com, osalvador@suse.de, kernel_team@skhynix.com,
 honggyu.kim@sk.com, yunjeong.mun@sk.com, rakie.kim@sk.com
Subject: [PATCH v9 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Thu, 17 Apr 2025 16:28:37 +0900
Message-ID: <20250417072839.711-4-rakie.kim@sk.com>
X-Mailer: git-send-email 2.48.1.windows.1
In-Reply-To: <20250417072839.711-1-rakie.kim@sk.com>
References: <20250417072839.711-1-rakie.kim@sk.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby improving memory
bandwidth utilization. The weight values for each node are configured
through sysfs.

Previously, sysfs entries for configuring weighted interleave were created
for all possible nodes (N_POSSIBLE) at initialization, including nodes that
might not have memory. However, not all nodes in N_POSSIBLE are usable at
runtime, as some may remain memoryless or offline.
This led to sysfs entries being created for unusable nodes, causing
potential misconfiguration issues.

To address this issue, this patch modifies the sysfs creation logic to:
1) Limit sysfs entries to nodes that are online and have memory, avoiding
   the creation of sysfs entries for nodes that cannot be used.
2) Support memory hotplug by dynamically adding and removing sysfs entries
   based on whether a node transitions into or out of the N_MEMORY state.

Additionally, the patch ensures that sysfs attributes are properly managed
when nodes go offline, preventing stale or redundant entries from persisting
in the system.

By making these changes, the weighted interleave policy now manages its
sysfs entries more efficiently, ensuring that only relevant nodes are
considered for interleaving, and dynamically adapting to memory hotplug
events.

Co-developed-by: Honggyu Kim
Signed-off-by: Honggyu Kim
Co-developed-by: Yunjeong Mun
Signed-off-by: Yunjeong Mun
Signed-off-by: Rakie Kim
Reviewed-by: Oscar Salvador
Reviewed-by: Joshua Hahn
Reviewed-by: Gregory Price
Reviewed-by: Dan Williams
Acked-by: David Hildenbrand
---
 mm/mempolicy.c | 107 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 84 insertions(+), 23 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 97b52d65b3ba..74b4e2a6c786 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
 #include
 #include
 #include
+#include

 #include "internal.h"

@@ -3421,6 +3422,7 @@ struct iw_node_attr {

 struct sysfs_wi_group {
 	struct kobject wi_kobj;
+	struct mutex kobj_lock;
 	struct iw_node_attr *nattrs[];
 };

@@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,

 static void sysfs_wi_node_delete(int nid)
 {
-	if (!wi_group->nattrs[nid])
+	struct iw_node_attr *attr;
+
+	if (nid < 0 || nid >= nr_node_ids)
+		return;
+
+	mutex_lock(&wi_group->kobj_lock);
+	attr = wi_group->nattrs[nid];
+	if (!attr) {
+		mutex_unlock(&wi_group->kobj_lock);
 		return;
+	}
+
+	wi_group->nattrs[nid] = NULL;
+	mutex_unlock(&wi_group->kobj_lock);

-	sysfs_remove_file(&wi_group->wi_kobj,
-			  &wi_group->nattrs[nid]->kobj_attr.attr);
-	kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
-	kfree(wi_group->nattrs[nid]);
+	sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
+	kfree(attr->kobj_attr.attr.name);
+	kfree(attr);
 }

 static void sysfs_wi_node_delete_all(void)
@@ -3518,35 +3531,77 @@ static const struct kobj_type wi_ktype = {

 static int sysfs_wi_node_add(int nid)
 {
-	struct iw_node_attr *node_attr;
+	int ret = 0;
 	char *name;
+	struct iw_node_attr *new_attr;

-	node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
-	if (!node_attr)
+	if (nid < 0 || nid >= nr_node_ids) {
+		pr_err("invalid node id: %d\n", nid);
+		return -EINVAL;
+	}
+
+	new_attr = kzalloc(sizeof(*new_attr), GFP_KERNEL);
+	if (!new_attr)
 		return -ENOMEM;

 	name = kasprintf(GFP_KERNEL, "node%d", nid);
 	if (!name) {
-		kfree(node_attr);
+		kfree(new_attr);
 		return -ENOMEM;
 	}

-	sysfs_attr_init(&node_attr->kobj_attr.attr);
-	node_attr->kobj_attr.attr.name = name;
-	node_attr->kobj_attr.attr.mode = 0644;
-	node_attr->kobj_attr.show = node_show;
-	node_attr->kobj_attr.store = node_store;
-	node_attr->nid = nid;
+	sysfs_attr_init(&new_attr->kobj_attr.attr);
+	new_attr->kobj_attr.attr.name = name;
+	new_attr->kobj_attr.attr.mode = 0644;
+	new_attr->kobj_attr.show = node_show;
+	new_attr->kobj_attr.store = node_store;
+	new_attr->nid = nid;

-	if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
-		kfree(node_attr->kobj_attr.attr.name);
-		kfree(node_attr);
-		pr_err("failed to add attribute to weighted_interleave\n");
-		return -ENOMEM;
+	mutex_lock(&wi_group->kobj_lock);
+	if (wi_group->nattrs[nid]) {
+		mutex_unlock(&wi_group->kobj_lock);
+		pr_info("node%d already exists\n", nid);
+		goto out;
 	}

-	wi_group->nattrs[nid] = node_attr;
+	ret = sysfs_create_file(&wi_group->wi_kobj, &new_attr->kobj_attr.attr);
+	if (ret) {
+		mutex_unlock(&wi_group->kobj_lock);
+		goto out;
+	}
+	wi_group->nattrs[nid] = new_attr;
+	mutex_unlock(&wi_group->kobj_lock);
 	return 0;
+
+out:
+	kfree(new_attr->kobj_attr.attr.name);
+	kfree(new_attr);
+	return ret;
+}
+
+static int wi_node_notifier(struct notifier_block *nb,
+			    unsigned long action, void *data)
+{
+	int err;
+	struct memory_notify *arg = data;
+	int nid = arg->status_change_nid;
+
+	if (nid < 0)
+		return NOTIFY_OK;
+
+	switch (action) {
+	case MEM_ONLINE:
+		err = sysfs_wi_node_add(nid);
+		if (err)
+			pr_err("failed to add sysfs for node%d during hotplug: %d\n",
+			       nid, err);
+		break;
+	case MEM_OFFLINE:
+		sysfs_wi_node_delete(nid);
+		break;
+	}
+
+	return NOTIFY_OK;
 }

 static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
@@ -3557,20 +3612,26 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 			   GFP_KERNEL);
 	if (!wi_group)
 		return -ENOMEM;
+	mutex_init(&wi_group->kobj_lock);

 	err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
 				   "weighted_interleave");
 	if (err)
 		goto err_put_kobj;

-	for_each_node_state(nid, N_POSSIBLE) {
+	for_each_online_node(nid) {
+		if (!node_state(nid, N_MEMORY))
+			continue;
+
 		err = sysfs_wi_node_add(nid);
 		if (err) {
-			pr_err("failed to add sysfs [node%d]\n", nid);
+			pr_err("failed to add sysfs for node%d during init: %d\n",
+			       nid, err);
 			goto err_cleanup_kobj;
 		}
 	}

+	hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
 	return 0;

 err_cleanup_kobj:
-- 
2.34.1
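
The locking pattern in sysfs_wi_node_add() and sysfs_wi_node_delete() above
can be illustrated outside the kernel. The following is only a rough
user-space sketch, not code from this patch: it assumes a pthread mutex in
place of the kernel mutex, and node_table, node_entry, MAX_NODES,
node_table_add() and node_table_delete() are hypothetical names chosen for
illustration. It shows the two ideas the patch relies on: the add path
allocates before taking the lock and discards its allocation if the slot is
already populated, and the delete path detaches the pointer under the lock
and frees it only after unlocking.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 8	/* hypothetical stand-in for nr_node_ids */

struct node_entry {
	int nid;
	char *name;
};

struct node_table {
	pthread_mutex_t lock;	/* plays the role of kobj_lock */
	struct node_entry *entries[MAX_NODES];
};

static struct node_table table = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Returns 0 on success or if the slot already exists, -errno otherwise. */
static int node_table_add(int nid)
{
	struct node_entry *e;
	int len;

	if (nid < 0 || nid >= MAX_NODES)
		return -EINVAL;

	/* Allocate outside the lock to keep the critical section short. */
	e = calloc(1, sizeof(*e));
	if (!e)
		return -ENOMEM;

	len = snprintf(NULL, 0, "node%d", nid);
	assert(len > 0);
	e->name = malloc((size_t)len + 1);
	if (!e->name) {
		free(e);
		return -ENOMEM;
	}
	snprintf(e->name, (size_t)len + 1, "node%d", nid);
	e->nid = nid;

	pthread_mutex_lock(&table.lock);
	if (table.entries[nid]) {
		/* Slot already populated: drop the new allocation. */
		pthread_mutex_unlock(&table.lock);
		free(e->name);
		free(e);
		return 0;
	}
	table.entries[nid] = e;
	pthread_mutex_unlock(&table.lock);
	return 0;
}

static void node_table_delete(int nid)
{
	struct node_entry *e;

	if (nid < 0 || nid >= MAX_NODES)
		return;

	/* Detach under the lock so no other caller can see the entry... */
	pthread_mutex_lock(&table.lock);
	e = table.entries[nid];
	table.entries[nid] = NULL;
	pthread_mutex_unlock(&table.lock);

	/* ...then free it without holding the lock. */
	if (!e)
		return;
	free(e->name);
	free(e);
}
```

With this sketch, calling node_table_add() twice for the same node leaves
the first entry in place, and calling node_table_delete() twice is harmless
because the second call finds an empty slot.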