From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46834C36010 for ; Mon, 7 Apr 2025 09:39:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F0A4C6B000C; Mon, 7 Apr 2025 05:39:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EBA766B000D; Mon, 7 Apr 2025 05:39:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D5A1B6B000E; Mon, 7 Apr 2025 05:39:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id B86716B000C for ; Mon, 7 Apr 2025 05:39:35 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 49D73AE75B for ; Mon, 7 Apr 2025 09:39:36 +0000 (UTC) X-FDA: 83306750352.13.04612BF Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf12.hostedemail.com (Postfix) with ESMTP id 216C040004 for ; Mon, 7 Apr 2025 09:39:33 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf12.hostedemail.com: domain of rakie.kim@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=rakie.kim@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744018774; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pdlv0ANWrCzkt8QZK+lAi9id33W7zfvLng5wRJObQwM=; b=t6yt2DwRA8IkhsBCIsiQ7bXkjU+eJffmILUvf3n9QGayU6yfGtHbEJ7tLTJJPCeeUmz6d9 3W7oyaXeqJzT4H0cqgG4FSrKgow5vLHQbZ/WlNv2w06Hp16DhAsMpptQDX5RJgt+o0Xi3R 6fykdh28rCWOaWCKQc8yapxo11FARpM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744018774; a=rsa-sha256; cv=none; b=epP1FipUbBFkNMPLaOKvskfjmyr9gXhwM9LJdZVIZH21SExp6Ly65+Gzx55TTzxV1HIpK/ 2hVeUhfN31R15c1d71I81X1VPPCwOSkKzszAIynIPWWz9TBIPnkMxLU1ZNX5PlUa1OPxEt zl7cHKUKeMO537NeKm/gF3SljdCOb+8= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf12.hostedemail.com: domain of rakie.kim@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=rakie.kim@sk.com X-AuditID: a67dfc5b-681ff7000002311f-91-67f39d523d72 From: Rakie Kim To: David Hildenbrand Cc: gourry@gourry.net, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, joshua.hahnjy@gmail.com, dan.j.williams@intel.com, ying.huang@linux.alibaba.com, Jonathan.Cameron@huawei.com, osalvador@suse.de, kernel_team@skhynix.com, honggyu.kim@sk.com, yunjeong.mun@sk.com, Rakie Kim , akpm@linux-foundation.org Subject: Re: [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Date: Mon, 7 Apr 2025 18:39:19 +0900 Message-ID: <20250407093926.450-1-rakie.kim@sk.com> X-Mailer: git-send-email 2.48.1.windows.1 In-Reply-To: <198f2cbe-b1cb-4239-833e-9aac33d978fa@redhat.com> References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrOLMWRmVeSWpSXmKPExsXC9ZZnkW7Q3M/pBk/6hSzmrF/DZjF96gVG i6/rfzFb/Lx7nN1i1cJrbBbHt85jtzg/6xSLxeVdc9gs7q35z2pxZlqRxeo1GQ7cHjtn3WX3 6G67zO7RcuQtq8fiPS+ZPDZ9msTucWLGbxaPnQ8tPd7vu8rmsfl0tcfnTXIBXFFcNimpOZll qUX6dglcGRvm1BSsdqzY9Pg/YwPjReMuRk4OCQETifufv7N2MXKA2bu3xoKYbAJKEsf2xoBU iAhoSGxq28DcxcjFwSzwg0li64VDbCAJYYFwie/3n7CA2CwCqhI/b85lBLF5BYwlLnbNYIIY rynRcOkemM0pYCfxs2EHWI2QAI/Eqw37oeoFJU7OhJjDLCAv0bx1NtgyCYHPbBLNk6+xQAyS lDi44gbLBEb+WUh6ZiHpWcDItIpRKDOvLDcxM8dEL6MyL7NCLzk/dxMjMPiX1f6J3sH46ULw IUYBDkYlHt4dbp/ThVgTy4orcw8xSnAwK4nwWp76lC7Em5JYWZValB9fVJqTWnyIUZqDRUmc 1+hbeYqQQHpiSWp2ampBahFMlomDU6qB0WdazdXPaYHixl9nJvKeU12x5fl+noex/IdfPL9x P//kq9Tuk7Wupyd0VRTmVs89lOOsf6n+k1TEhiM2vw8fkmNbf7TSJ/HoVAal+QeENvi+5Hse c8ePu4MjOW7/nX+598IDLd/lO+cfdJnf/+VXhzp/dFBIgvsCFZU7yzxmf1+aOlEo595VcSWW 4oxEQy3mouJEALAtHNt6AgAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrILMWRmVeSWpSXmKPExsXCNUNNSzdo7ud0g19fOC3mrF/DZjF96gVG i6/rfzFb/Lx7nN3i87PXzBarFl5jszi+dR67xeG5J1ktzs86xWJxedccNot7a/6zWpyZVmRx 6NpzVovVazIsfm9bwebA77Fz1l12j+62y+weLUfesnos3vOSyWPTp0nsHidm/Gbx2PnQ0uP9 vqtsHt9ue3gsfvGByWPz6WqPz5vkAniiuGxSUnMyy1KL9O0SuDI2zKkpWO1Ysenxf8YGxovG XYwcHBICJhK7t8aCmGwCShLH9sZ0MXJyiAhoSGxq28DcxcjFwSzwg0li64VDbCAJYYFwie/3 n7CA2CwCqhI/b85lBLF5BYwlLnbNYAKxJQQ0JRou3QOzOQXsJH427ACrERLgkXi1YT9UvaDE yZkQc5gF5CWat85mnsDIMwtJahaS1AJGplWMIpl5ZbmJmTmmesXZGZV5mRV6yfm5mxiBAb+s 9s/EHYxfLrsfYhTgYFTi4b3R+CldiDWxrLgy9xCjBAezkgiv5SmgEG9KYmVValF+fFFpTmrx IUZpDhYlcV6v8NQEIYH0xJLU7NTUgtQimCwTB6dUA2P761WJJyRkDv2ecDUkKHgSi+ai559b PD9w/Xd11rj78k4hs0hitN9H2WdvznQo1ga++PYjn+UX55aNNx+eUjFcsNpNNUz5qtYmF4vH k1a85BYKnc98zeP1zOtZCUm7Zr9y7G7bZNN789yFwiClaPnY41s0+A4khO/6zXXy+NGOwwy2 Xlkv1tQosRRnJBpqMRcVJwIArJaB4XQCAAA= X-CFilter-Loop: Reflected X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 216C040004 X-Stat-Signature: psqyarf8jztmjwcwhzsy63pr4netn1i4 X-Rspam-User: X-HE-Tag: 1744018773-751738 X-HE-Meta: U2FsdGVkX18/te+S/TdWXAj+DGCeujr+3XZwkcB/wzHUGIVzFTXVrJq8giW8avc1aYxOZ+EzwkB0tQGlHbIPQzhwraFrDU8YRWCcLREkS30IQfw3KkdgdKCGbJQ3MLzLbMrk55ZjYJvfw5jAnVaos8yTU1ujgNz2peI0BWGJOXqD2lI9YsUlOEY2w2ZGUEp+4+sTjk8vN1MyjzIXNUzC2cYa9aI+dcDgyd5jOXMY+9/O5WQUodDzPEhqyLg0iFKU9IgHnwBxyI72aeMHpHkhDi36mlvol5U49tx9bncx2OjaldIbZjj88jDDNrAQPEO1kF8xiL3Q0vMPfHumw5st26HDk1hcahZJGFeyLSzC9JNjhKJwvFtKEeagMc2ddc5rInJajtXUDLBgyGTAvIhwvYSTCrBgf2/bWJ+C5op+8uZIV5zzpqjVNgOCQYknmHFs1baxuwMKqiDgdEEHsTdSVzHDU/vXm2AjsINgYErbkquaoaxMFrV/JpfvbrqfhFgSm/DGV4Zuku7a4+3VPXW/z9vOTzjy4uUzCSKERM905OYekhdBGdy67O0SJDX0eYBCJwUVjpEtMDBKGJufCdJyIsntVq40xjsHqL+rbEGg4NgfagNw2c3ixnmkRGBbywr7E3+HBsY/5sOnllDI2P7stc6HrTALKjhT+ggx8fqtCrCNGXQsG2wJDuDqgUZJCDlIFg3O2dk0lnTO53mdc/oFYE8c+dewFgSZ5nvDXXPEoIX1Z5z1MC9dVtyM8wXE4pn3MIKu3elCwwSOuKY3I5Qt03hOVNvMrT7UEYmoEymw8yU/7XYc7VZMY3rrI0xyyznPHckJkCGzkp31AnBzEYG41gmG8T7Gd8r2oacJoljyoiqLSx8RjZvG5DuNGzYAVEa31CVRk5KGtz3cvVa4YhiPqiX0JkL2iTQw2WZ6mlEl9LAsZYctWYZwqKkdLUhA5F6usgvtitUMB0mWiE9avh9 iorcvMBh JnnB2LDjZ5pFlTj0AHKU+1sZx9o2lXdx3AGG9QF5LtdLjCzjOse7dfTCXfXNZgBRlz5udXvYGYvFI4ldBmnFjQBpaMO0v8ErqsrZndbKVjiPG6I5GyfOyU92wSVbUIl890+Gr X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, 4 Apr 2025 22:45:00 +0200 David Hildenbrand wrote: > On 04.04.25 09:46, Rakie Kim wrote: > > The weighted interleave policy distributes page allocations across multiple > > NUMA nodes based on their performance weight, thereby improving memory > > bandwidth utilization. The weight values for each node are configured > > through sysfs. > > > > Previously, sysfs entries for configuring weighted interleave were created > > for all possible nodes (N_POSSIBLE) at initialization, including nodes that > > might not have memory. However, not all nodes in N_POSSIBLE are usable at > > runtime, as some may remain memoryless or offline. > > This led to sysfs entries being created for unusable nodes, causing > > potential misconfiguration issues. > > > > To address this issue, this patch modifies the sysfs creation logic to: > > 1) Limit sysfs entries to nodes that are online and have memory, avoiding > > the creation of sysfs entries for nodes that cannot be used. > > 2) Support memory hotplug by dynamically adding and removing sysfs entries > > based on whether a node transitions into or out of the N_MEMORY state. > > > > Additionally, the patch ensures that sysfs attributes are properly managed > > when nodes go offline, preventing stale or redundant entries from persisting > > in the system. > > > > By making these changes, the weighted interleave policy now manages its > > sysfs entries more efficiently, ensuring that only relevant nodes are > > considered for interleaving, and dynamically adapting to memory hotplug > > events. > > > > Signed-off-by: Rakie Kim > > Signed-off-by: Honggyu Kim > > Signed-off-by: Yunjeong Mun > > --- > > mm/mempolicy.c | 109 ++++++++++++++++++++++++++++++++++++++----------- > > 1 file changed, 86 insertions(+), 23 deletions(-) > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index 73a9405ff352..f25c2c7f8fcf 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -113,6 +113,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "internal.h" > > > > @@ -3390,6 +3391,7 @@ struct iw_node_attr { > > > > struct sysfs_wi_group { > > struct kobject wi_kobj; > > + struct mutex kobj_lock; > > struct iw_node_attr *nattrs[]; > > }; > > > > @@ -3439,13 +3441,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr, > > > > static void sysfs_wi_node_delete(int nid) > > { > > - if (!wi_group->nattrs[nid]) > > + struct iw_node_attr *attr; > > + > > + if (nid < 0 || nid >= nr_node_ids) > > + return; > > + > > + mutex_lock(&wi_group->kobj_lock); > > + attr = wi_group->nattrs[nid]; > > + if (!attr) { > > + mutex_unlock(&wi_group->kobj_lock); > > return; > > + } > > + > > + wi_group->nattrs[nid] = NULL; > > + mutex_unlock(&wi_group->kobj_lock); > > > > - sysfs_remove_file(&wi_group->wi_kobj, > > - &wi_group->nattrs[nid]->kobj_attr.attr); > > - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name); > > - kfree(wi_group->nattrs[nid]); > > + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr); > > + kfree(attr->kobj_attr.attr.name); > > + kfree(attr); > > } > > > > static void sysfs_wi_release(struct kobject *wi_kobj) > > @@ -3464,35 +3477,80 @@ static const struct kobj_type wi_ktype = { > > > > static int sysfs_wi_node_add(int nid) > > { > > - struct iw_node_attr *node_attr; > > + int ret = 0; > > char *name; > > + struct iw_node_attr *new_attr = NULL; > > > > - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL); > > - if (!node_attr) > > + if (nid < 0 || nid >= nr_node_ids) { > > + pr_err("Invalid node id: %d\n", nid); > > + return -EINVAL; > > + } > > + > > + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL); > > + if (!new_attr) > > return -ENOMEM; > > > > name = kasprintf(GFP_KERNEL, "node%d", nid); > > if (!name) { > > - kfree(node_attr); > > + kfree(new_attr); > > return -ENOMEM; > > } > > > > - sysfs_attr_init(&node_attr->kobj_attr.attr); > > - node_attr->kobj_attr.attr.name = name; > > - node_attr->kobj_attr.attr.mode = 0644; > > - node_attr->kobj_attr.show = node_show; > > - node_attr->kobj_attr.store = node_store; > > - node_attr->nid = nid; > > + mutex_lock(&wi_group->kobj_lock); > > + if (wi_group->nattrs[nid]) { > > + mutex_unlock(&wi_group->kobj_lock); > > + pr_info("Node [%d] already exists\n", nid); > > + kfree(new_attr); > > + kfree(name); > > + return 0; > > + } > > + wi_group->nattrs[nid] = new_attr; > > > > - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) { > > - kfree(node_attr->kobj_attr.attr.name); > > - kfree(node_attr); > > - pr_err("failed to add attribute to weighted_interleave\n"); > > - return -ENOMEM; > > + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr); > > + wi_group->nattrs[nid]->kobj_attr.attr.name = name; > > + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644; > > + wi_group->nattrs[nid]->kobj_attr.show = node_show; > > + wi_group->nattrs[nid]->kobj_attr.store = node_store; > > + wi_group->nattrs[nid]->nid = nid; > > + > > + ret = sysfs_create_file(&wi_group->wi_kobj, > > + &wi_group->nattrs[nid]->kobj_attr.attr); > > + if (ret) { > > + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name); > > + kfree(wi_group->nattrs[nid]); > > + wi_group->nattrs[nid] = NULL; > > + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret); > > } > > + mutex_unlock(&wi_group->kobj_lock); > > > > - wi_group->nattrs[nid] = node_attr; > > - return 0; > > + return ret; > > +} > > + > > +static int wi_node_notifier(struct notifier_block *nb, > > + unsigned long action, void *data) > > +{ > > + int err; > > + struct memory_notify *arg = data; > > + int nid = arg->status_change_nid; > > + > > + if (nid < 0) > > + goto notifier_end; > > + > > + switch(action) { > > + case MEM_ONLINE: > > MEM_ONLINE is too late, we cannot fail hotplug at that point. > > Would MEM_GOING_ONLINE / MEM_CANCEL_ONLINE be better? Hi David, Thank you for raising these points. I would appreciate your clarification on the following: Issue1: I want to invoke sysfs_wi_node_add() after a node with memory has been fully transitioned to the online state. Does replacing MEM_ONLINE with MEM_GOING_ONLINE or MEM_CANCEL_ONLINE still ensure that the node is considered online and usable by that point? > > > > + err = sysfs_wi_node_add(nid); > > + if (err) { > > + pr_err("failed to add sysfs [node%d]\n", nid); > > + return NOTIFY_BAD; > > Note that NOTIFY_BAD includes NOTIFY_STOP_MASK. So you wouldn't call > other notifiers, but the overall memory onlining would succeed, which is > bad. > > If we don't care about the error (not prevent hotplug) we could only > pr_warn() and continue. Maybe this (unlikely) case is not a good reason > to stop memory from getting onlined. OTOH, it will barely ever trigger > in practice ... > Issue2: Regarding your note about NOTIFY_BAD ? are you saying that if sysfs_wi_node_add() returns NOTIFY_BAD, it will trigger NOTIFY_STOP_MASK, preventing other notifiers from running, while still allowing the memory hotplug operation to complete? If so, then I'm thinking of resolving both issues as follows: - For Issue1: I keep using MEM_ONLINE, assuming it is safe and sufficient to ensure the node is fully online. - For Issue2: I avoid returning NOTIFY_BAD from the notifier. Instead, I log the error using pr_err() and continue the operation. This would result in the following code: if (nid < 0) return NOTIFY_OK; switch (action) { case MEM_ONLINE: // Issue1: keeping this unchanged err = sysfs_wi_node_add(nid); if (err) { pr_err("failed to add sysfs [node%d]\n", nid); // Issue2: Do not return NOTIFY_BAD } break; case MEM_OFFLINE: sysfs_wi_node_delete(nid); break; } // Always return NOTIFY_OK return NOTIFY_OK; Please let me know if this approach is acceptable. Rakie > -- > Cheers, > > David / dhildenb >