From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1085BC282DE for ; Thu, 13 Mar 2025 06:34:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D053C280003; Thu, 13 Mar 2025 02:33:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CB661280001; Thu, 13 Mar 2025 02:33:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B7DA5280003; Thu, 13 Mar 2025 02:33:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 9E389280001 for ; Thu, 13 Mar 2025 02:33:57 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id D8A18C1C7A for ; Thu, 13 Mar 2025 06:33:58 +0000 (UTC) X-FDA: 83215562556.12.A5C4CBE Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf28.hostedemail.com (Postfix) with ESMTP id A9028C0006 for ; Thu, 13 Mar 2025 06:33:56 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf28.hostedemail.com: domain of rakie.kim@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=rakie.kim@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741847637; a=rsa-sha256; cv=none; b=TSo2LJrInKT+g51Mh+8Ow4DxR6ryyuk8EJTRHEUtDCAvRHa5mQllpQGCqyLtNyG7VQulvc rjFYubW8/byNsLMsZ22B7jhiWJPhPFcdoP2wIACCspJX9qKgxXNhFAqqgPrJ8uky7g4uEk TGBP6D3CjrT12pI5B0gIFCdsC2N1ERc= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf28.hostedemail.com: domain of rakie.kim@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=rakie.kim@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741847637; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HKqBps6G36x3KfkEKa9COE3I6PLfEA/CJ3vQjEDb1Kg=; b=C1AwNl9JumgVynBpJQWuF8icTjiDBsLyQpr+7sZ5bnSAbnsdQmKiFLhMzmABg5divMQPSv WhZ/VGVNQodZ6fVZnahWFA7hqvVfJYhClv05YwI84oB9a9EnHRdbdeDNXdhyTsGf1+j4yV u3CijMCb6CAEwcqtvtVNVd88dhZlR2A= X-AuditID: a67dfc5b-681ff7000002311f-98-67d27c53c7e1 From: Rakie Kim To: Gregory Price Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, joshua.hahnjy@gmail.com, dan.j.williams@intel.com, ying.huang@linux.alibaba.com, kernel_team@skhynix.com, honggyu.kim@sk.com, yunjeong.mun@sk.com, Rakie Kim Subject: Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Date: Thu, 13 Mar 2025 15:33:37 +0900 Message-ID: <20250313063351.692-1-rakie.kim@sk.com> X-Mailer: git-send-email 2.48.1.windows.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrBLMWRmVeSWpSXmKPExsXC9ZZnoW5wzaV0gwt3JC3mrF/DZjF96gVG i593j7NbHN86j93i/KxTLBaXd81hs7i35j+rxeo1GQ4cHjtn3WX36G67zO6xeM9LJo9Nnyax e5yY8ZvFY+dDS4/Pm+QC2KO4bFJSczLLUov07RK4Mlp2aBU0GVcce/GbvYFxqkYXIyeHhICJ xJbH35hg7FNv1zB2MXJwsAkoSRzbGwNiigioSrRdce9i5OJgFljPJPF60yw2kLiwQLjE8bdV IJ0sQCW/z79mBLF5BYwlWpv/sEBM1JRouHSPCaScU8BMYuPyPJCwkACPxKsN+6HKBSVOznwC Vs4sIC/RvHU2M0TrCTaJLacCIWxJiYMrbrBMYOSfhaRlFpKWBYxMqxiFMvPKchMzc0z0Mirz Miv0kvNzNzECw3dZ7Z/oHYyfLgQfYhTgYFTi4U2YfDFdiDWxrLgy9xCjBAezkgjvatsL6UK8 KYmVValF+fFFpTmpxYcYpTlYlMR5jb6VpwgJpCeWpGanphakFsFkmTg4pRoYmw0kGTPsFuw+ FVc9c8L0uFqJP6tVzfk9NjRMPnPTVVzbgmNfZcqN4H0Bp1cHrz3StzKmpWCm8synES6rGPVd +nJPs28r75HV+7NI9iSrttnngve9Ouu+rVzzP8mf9XNu1zl/q6PCEv5Mrn0zV/hvZHnuO2t/ pGb9+/j5asIK+Qf41abEu/UpsRRnJBpqMRcVJwIAOG103lsCAAA= X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrMLMWRmVeSWpSXmKPExsXCNUNNSze45lK6wbd1ghZz1q9hs5g+9QKj xc+7x9ktPj97zWxxfOs8dovDc0+yWpyfdYrF4vKuOWwW99b8Z7U4dO05q8XqNRkWv7etYHPg 8dg56y67R3fbZXaPxXteMnls+jSJ3ePEjN8sHjsfWnp8u+3hsfjFByaPz5vkAjijuGxSUnMy y1KL9O0SuDJadmgVNBlXHHvxm72BcapGFyMnh4SAicSpt2sYuxg5ONgElCSO7Y0BMUUEVCXa rrh3MXJxMAusZ5J4vWkWG0hcWCBc4vjbKpBOFqCS3+dfM4LYvALGEq3Nf1ggJmpKNFy6xwRS zilgJrFxeR5IWEiAR+LVhv1Q5YISJ2c+AStnFpCXaN46m3kCI88sJKlZSFILGJlWMYpk5pXl JmbmmOoVZ2dU5mVW6CXn525iBAbtsto/E3cwfrnsfohRgINRiYc3YfLFdCHWxLLiytxDjBIc zEoivKttL6QL8aYkVlalFuXHF5XmpBYfYpTmYFES5/UKT00QEkhPLEnNTk0tSC2CyTJxcEo1 ME6UupNreChILlN9s+/psHVhhi+Ur75/Yf3z9rvZq/Zma2sf+rlCermF97XixHXvf8Rc5HOu 8Ug97aEp4veNRbTm7J3mzbPST+7+m7ivZun8e1N71eQj4piYvuiGK+3+nLmT4/OMZ63rC1U5 proF+Fferd1WZeiz465ZFlvD8lzm16tO/up/F6fEUpyRaKjFXFScCACF+M0DVgIAAA== X-CFilter-Loop: Reflected X-Rspamd-Server: rspam07 X-Rspam-User: X-Stat-Signature: z4og8yd8w56ykjzi4t1j3s8zmz4zfi7a X-Rspamd-Queue-Id: A9028C0006 X-HE-Tag: 1741847636-515621 X-HE-Meta: U2FsdGVkX1/wxtB7/OYVFIdbmQkpXIPlFG/qCpR4VjjC7TT7g7XzMrOxxQfeXvlNWilFzXrph81l3WwSh4k6U9tghaEa9e9aJx6BeSqw/MUIYFyWT7waMpQqWCvEAegi84RtkG//43y8RF6ggiKFMWIAAFtQtHHVOsCIg0DOJh0z48JRhn7QSUEBQbZU5Lm+LCcm+j9kkLAE35QJAN6e3CYbahXOktBWeonoavsp6c2nGrUxVTheWdtwGPD0Tw172cWpmwv+utc2Kc39UXRSC0/p4wBPD7/mZv2LFk9dXHndKtVLyRnz1WFho7AKZVahQ5cCVYoLzeP8+aCbsV5jIIerK5g39hmAvhmZ5EdLDPGpcyo64SYi2oz8PwTmOXwRw9SSEhGyDhEtYSSgR+Cculzpojl+Ap35NUj1AhW4X4L+2A4LrvroGKueSwN5T1XGP2UGIyvBsOFoOxg76T0Fi2CDwdiZ0fI9a44d62k3QIPTGfdhZIAXgZpOL1mP/AypkpU4sIfo43oaz1eWtwFcbIAI9HBGpT2HqoRYGXqqRFuuhCTihDb8slJMqYRNWm2kMqsMKfF00d7qLTDsibT3RWooFKwJCixp148Vnl12kHqirKPz0HYq6CcokUguS431uZjaBg2OV5BOsAMLtZ/WEv+ye227MTzDz6HQ6KNPnh2zPVaSvnjctd2QoTDuP44CKR0fm1ti00HUu47DnuaKLsP9d4ZWm6yhix9J1PwC9nPyNq7JP44IewVvq/WEV91RftYanLc/fmWsx/5Z4/cWTwN4vtf8SnXF90TF8CYDgTuuKMgRW/GLQb6TILpp93rWdKYLf+HF4TyhUhs6UNQg81AwdeYeZaciFSdvdoyptmo5K7JWtUJAKs82prDT+BU2YAc6bUMrQ/kyC4pHzB0z3TZcUyl/+kUBuUUuxJQgjSVFN65m76VW9fjoOi5XnquDpt00exsn0Uto76jNqii nWzMWcTp jI1ePhsz0WnU7VeRDb8f6JgzCtsyQGleYKyyydro4ioxVc8VHora/+K7j4ouzYSXtrh2DrcLa3kzQYQPkFOO1XOrZzCyif0sdoCIEaFHrjzmOmHoOvRjFZM7HbMmlTRxxQw1dM/vl3nViRI7TRZHs7BRyXwthep+81J1h X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, 12 Mar 2025 12:03:01 -0400 Gregory Price wrote: > On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote: > > The weighted interleave policy distributes page allocations across multiple > > NUMA nodes based on their performance weight, thereby optimizing memory > > bandwidth utilization. The weight values for each node are configured > > through sysfs. > > > > Previously, the sysfs entries for configuring weighted interleave were only > > created during initialization. This approach had several limitations: > > - Sysfs entries were generated for all possible nodes at boot time, > > including nodes without memory, leading to unnecessary sysfs creation. > > It's not that it's unnecessary, it's that it allowed for configuration > of nodes which may not have memory now but may have memory in the > future. This was not well documented. I will update the commit message to reflect your feedback. > > > - Some memory devices transition to an online state after initialization, > > but the existing implementation failed to create sysfs entries for > > these dynamically added nodes. As a result, memory hotplugged nodes > > were not properly recognized by the weighed interleave mechanism. > > > > The current system creates 1 node per N_POSSIBLE nodes, and since nodes > can't transition between possible and not-possible your claims here are > contradictory. > > I think you mean that simply switching from N_POSSIBLE to N_MEMORY is > insufficient since nodes may transition in and out of the N_MEMORY > state. Therefore this patch utilizes a hotplug callback to add and > remove sysfs entries based on whether a node is in the N_MEMORY set. I will update the commit message to reflect your feedback. > > > To resolve these issues, this patch introduces two key improvements: > > 1) At initialization, only nodes that are online and have memory are > > recognized, preventing the creation of unnecessary sysfs entries. > > 2) Nodes that become available after initialization are dynamically > > detected and integrated through the memory hotplug mechanism. > > > > With this enhancement, the weighted interleave policy now properly supports > > memory hotplug, ensuring that newly added nodes are recognized and sysfs > > entries are created accordingly. > > > > It doesn't "support memory hotplug" so much as it "Minimizes weighted > interleave to exclude memoryless nodes". I will update the commit message to reflect your feedback. > > > Signed-off-by: Rakie Kim > > --- > > mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 42 insertions(+), 5 deletions(-) > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index 1691748badb2..94efff89e0be 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -113,6 +113,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "internal.h" > > > > @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj) > > return 0; > > } > > > > +struct kobject *wi_kobj; > > + > > +static int wi_node_notifier(struct notifier_block *nb, > > + unsigned long action, void *data) > > +{ > > + int err; > > + struct memory_notify *arg = data; > > + int nid = arg->status_change_nid; > > + > > + if (nid < 0) > > + goto notifier_end; > > + > > + switch(action) { > > + case MEM_ONLINE: > > + err = add_weight_node(nid, wi_kobj); > > + if (err) { > > + pr_err("failed to add sysfs [node%d]\n", nid); > > + kobject_put(wi_kobj); > > + return NOTIFY_BAD; > > + } > > + break; > > + case MEM_OFFLINE: > > + sysfs_wi_node_release(node_attrs[nid], wi_kobj); > > + break; > > + } > > I'm fairly certain this logic is wrong. If I add two memory blocks and > then remove one, would this logic not remove the sysfs entries despite > there being a block remaining? Regarding the assumption about node configuration: Are you assuming that a node has two memory blocks and that MEM_OFFLINE is triggered when one of them is offlined? If so, then you are correct that this logic would need modification. I performed a simple test by offlining a single memory block: # echo 0 > /sys/devices/system/node/node2/memory100/online In this case, MEM_OFFLINE was not triggered. However, I need to conduct further analysis to confirm this behavior under different conditions. I will review this in more detail and share my findings, including the test methodology and results. > > > + > > +notifier_end: > > + return NOTIFY_OK; > > +} > > + > > static int add_weighted_interleave_group(struct kobject *root_kobj) > > { > > - struct kobject *wi_kobj; > > int nid, err; > > > > wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL); > > @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj) > > return err; > > } > > > > - for_each_node_state(nid, N_POSSIBLE) { > > + for_each_online_node(nid) { > > + if (!node_state(nid, N_MEMORY)) > > Rather than online node, why not just add for each N_MEMORY node - > regardless of if its memory is online or not? If the memory is offline, > then it will be excluded from the weighted interleave mechanism by > nature of the node being invalid for allocations anyway. Regarding the decision to check both N_MEMORY and N_ONLINE: This was done to ensure consistency with the conditions under which `wi_node_notifier` is triggered. Specifically, `MEM_ONLINE` is called only when a node is in both the N_MEMORY and N_ONLINE states. I will review this logic further. If my understanding is correct, keeping the current implementation is the appropriate approach. However, I will conduct additional testing to validate this and provide further updates accordingly. > > > + continue; > > + > > err = add_weight_node(nid, wi_kobj); > > if (err) { > > pr_err("failed to add sysfs [node%d]\n", nid); > > - break; > > + goto err_out; > > } > > } > > - if (err) > > - kobject_put(wi_kobj); > > + > > + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI); > > return 0; > > + > > +err_out: > > + kobject_put(wi_kobj); > > + return err; > > } > > > > static void mempolicy_kobj_release(struct kobject *kobj) > > -- > > 2.34.1 > >