From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C098AC28B2F for ; Wed, 12 Mar 2025 16:03:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D6E23280002; Wed, 12 Mar 2025 12:03:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D1DBB280001; Wed, 12 Mar 2025 12:03:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BBEF5280002; Wed, 12 Mar 2025 12:03:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 9DCB2280001 for ; Wed, 12 Mar 2025 12:03:10 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A9CE0B8DFF for ; Wed, 12 Mar 2025 16:03:10 +0000 (UTC) X-FDA: 83213368140.01.B209E6D Received: from mail-qv1-f51.google.com (mail-qv1-f51.google.com [209.85.219.51]) by imf17.hostedemail.com (Postfix) with ESMTP id 721B440024 for ; Wed, 12 Mar 2025 16:03:05 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=gourry.net header.s=google header.b="g/1AjQWu"; dmarc=none; spf=pass (imf17.hostedemail.com: domain of gourry@gourry.net designates 209.85.219.51 as permitted sender) smtp.mailfrom=gourry@gourry.net ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741795385; a=rsa-sha256; cv=none; b=1WIzHVjP3WW6imOiAMdSeWX93pLCcLV5EAXqW+IOhKkLPMBmMHujWWa1uZOmitUcJ0kvg6 R1pkKIprAWduNN6AextUGHdDaYAh5Fb7fSNSqa3DXHIgOF+YpQ1cIy+R2Fru7wEoTXEEZT T9xzrJurGVMPFZcJRtDhLEK3n4CTvO4= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=gourry.net header.s=google header.b="g/1AjQWu"; dmarc=none; spf=pass (imf17.hostedemail.com: domain of gourry@gourry.net designates 209.85.219.51 as permitted sender) smtp.mailfrom=gourry@gourry.net ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741795385; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=PgQPX6to3ruRXTMnOsQSfBylTa8PLJ1nm8nO6/ovThY=; b=69naqf/fX7lPlp0Tk6reYbxPyMIGgCrOYpcH+mH2DcTsg7/Cy/a04ElHNI5vCJcayA7t2r 8u7ReuvMNK18Lz7GfhfCsOrHW2/Cx1i3Muxl7cLujNb5xvUfYJW7r16BpvaW0i86O2MFt2 K5uvAY3J/OIpl4T8w9WuGa9Ef0V4Tic= Received: by mail-qv1-f51.google.com with SMTP id 6a1803df08f44-6e8ffa00555so524106d6.0 for ; Wed, 12 Mar 2025 09:03:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1741795384; x=1742400184; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=PgQPX6to3ruRXTMnOsQSfBylTa8PLJ1nm8nO6/ovThY=; b=g/1AjQWu9M6fXBNmKwguhCv/FMAa/dDsF0wfYPO0TKsU3se3A9isgwT5pEuPVhFAgD md1szuBVDb+yw2Jphs5yNFh+fZQ8iHwQL8SGiYvmhFYKsH347V0p4wrZz6ph7HIfykwh ExR/nXhstbEuh4x7UrKLlGKrjFGKezqMhVygEXLUadHG+LI320sK79ugpQ6u6+l1NUkx BWrM5+mLOO9yQZKGT/kGB5gdHkt4Xb6aLttPFl6Fn5yTXYQCNL4b4/UOLyJAGdbXBG6U DXpKkKZQjwQS5M1TLStpy4jLdcIGE9OodUstJI9CNe40XhyIa2pDNk8sHkLB+VskEciY ZksA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741795384; x=1742400184; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=PgQPX6to3ruRXTMnOsQSfBylTa8PLJ1nm8nO6/ovThY=; b=XyZXopXEUtyjaNHw6/DEqz5rvVCkL/PU9VdWUapIdeIO4qw9YvLhqdc48b9BC268Gx gIh4227Ut6E7Ye9+iwqohNYCpCOCNe9Zhyn69Eq0YvGxJP6QFvXEiEOtrRet1BLDj2zc ISWQIu6bl/XE8ns5FhqcJwik35S3adr7BXrLlKMghwiqkT2p9X0/mx89rcdMgkVQlcj8 53myejdnRfFbhd4Bl8BDyEPWMjB5ZGjg61C0UoQE5IZwtOCdxMtV2Gi3zTW79KtdxdMs 43CW96k2c9Xuf+zjXD3h3HBXXFW3MshaZ0iO50IDwjUGvc2gNlsqoFb6anJo2sabB/nM NEww== X-Forwarded-Encrypted: i=1; AJvYcCX4q8PjKfGKbbDG9o6MGDDfGiYen0TEiP64KJ/n+hZbdn/1ALyiG5+Jz7weQx1l3Aa53luhjWY2FA==@kvack.org X-Gm-Message-State: AOJu0YwqUPQV0DRZjacJ67Yitu4CTHs8OQsUNt06/gh4HOUXcUIu9OPU fL//iL5uTNbQnTNFmQjZUXa9hA6e2gvMefAt5O4Vasu+zN+MNs/fnnPD988Ey9Q= X-Gm-Gg: ASbGncuJso3N8Tt+aOiM39Yl18W/1lnjqbLVNa2QZBEZUVNG9A6xrKIfZ5JJFY44Kze UnnuE4MPw2i+4KF7BizDRSLu6XNJaL2/JjuDOxHPEE+kP1f21CTU0kX2oH3xnwHaXecNzmdKqsE P5M8CgVeeUAIoCB0F2qlQL29cL+Zq/CZs8dJyDPBLMx4H3icoMT0zgrv+RjwS0JSXyN3XhYiyRe i2dzdakQGRagh1jhNtsmeVBytbIdqhUkKNOL6Cz5HiEbz8UQlqfIDo/Vmecc5nmKtJW8zgXpQj5 iAKhb2sG8wRWxEXgRcXisQgY+4lq7ntwRUDoNwdOrS34Paxii3NvXIE0dcdN9kFVt3Wq5kMO1C+ oQ2J6uOlQvL4d1gc8J0A7S3TLKOEAQ2/0Kqm3rg== X-Google-Smtp-Source: AGHT+IFOzNBmM6sNXdBPAvZOChEL8BWYIxX5n5ObZl7b9qMQ+gMxQRQ4ExDjUCkWm/IyG4QGgX5V6Q== X-Received: by 2002:a05:6214:76c:b0:6e8:fde9:5d07 with SMTP id 6a1803df08f44-6e90066a2e7mr270478546d6.26.1741795383936; Wed, 12 Mar 2025 09:03:03 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6e8f707c61csm86339506d6.4.2025.03.12.09.03.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Mar 2025 09:03:03 -0700 (PDT) Date: Wed, 12 Mar 2025 12:03:01 -0400 From: Gregory Price To: Rakie Kim Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, joshua.hahnjy@gmail.com, dan.j.williams@intel.com, ying.huang@linux.alibaba.com, kernel_team@skhynix.com, honggyu.kim@sk.com, yunjeong.mun@sk.com Subject: Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Message-ID: References: <20250312075628.648-1-rakie.kim@sk.com> <20250312075628.648-2-rakie.kim@sk.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250312075628.648-2-rakie.kim@sk.com> X-Rspam-User: X-Rspamd-Queue-Id: 721B440024 X-Stat-Signature: fpwr1smo1xxckhd1k3x1oru6u4tywm8z X-Rspamd-Server: rspam06 X-HE-Tag: 1741795385-222302 X-HE-Meta: U2FsdGVkX1/3EB57EJ4CQG/kOiN6TnfSkLL4MUAMQseoU1C9CBrTiJWLPFGAnYhbTVsPq621kKkvZbjCAIlUe3FsUGOtCrGbMPDpJvX5v+bsAKSHh93rsbTbxCcqTj6dqm9H5wIP8YYtIE/+flvbAyX/3UNWOQBD4RiOt3nAiVI77lNlRxxlSMJaVtFPkamRaoWbl54Q/JxPjQXZluFRzIuSVg/KdGagDt5HmxyDGtfBxi6nJj9zA9338vHzd3WxqdmLBa1p2LzV+VI8uptZnD0DuT71BD+rKNfhZGBwGDBEACidjbTJzgah4AvNw8uceVg2rUDssxkpdCEEFCSVY7gPJBnhDAlStPRMna83wJbveQpLxBTPRGBE7RaBvUVkobtVgoDA9MOoo6yJXr/Wc9RHgHs8KziMTjovnRUvZIiX2iH1Hl1kxU+v0N47Aogh79/HXKRrc8hLtRRBldqYhnh3jQE/4pYx7YC+2dlCLt02/twmfwOt6Qs0wruV8DczjGrBvU6nsL3jFvmAgor77A5hbL+qw0CLhslKSOHVKcJEF2NDNyqP7La1oDP633OORQb095BmAUbRyiJZ8ZEjsVUPvEYiOeEIJxChMRErBrA4Ywu0X4gpokeB/qQEpqmWoauFdQaHfgtp7Q6wX/ZMLjlz4uzM5gk4GFnSnFTJL6W38UT9AHV0PEv+mTeCvGRx9zYi9kX6yqrpZU8U3JUiqpFA4L19TsMzI+fHiuGHm/bAe4osiipidxE+5MukWFzqEexsYnid6X5p+cRHXKbSJ9waitgMK2vH6MwtJVXbr0xfzuJQxwPrqbVRf2zX7HwfqwL6aMqwlkjW1Xc9PzcZEcdJGx9EM8vxmm5/vItGM37Ko16W/XTFIEVUln+bqsHh/s6GxPrCT5BHBzKlgxMO95lizH1zkUbwfCE8TX3IiZXwOPvLm+qsxMoSyOdXOKQtAmphKvLIk4mMVZhyUkF 6rsqHVjk LqeXmdRco0/9IyB/7CfFqxAmaqLKLkOHP+8JHcUs428gwZqYeSd5AUAVqmP0/mft6IjmaeakPR9wI/16S1A4jywq8nYRHLdg74dShsOt9R2HRqfXna76IUfuD1iocsVMJj82y/eUFeYF7flCo7SLC0lxnr8Q9VOacp+gW1Y9FQaZOcRoVBsk9Y1mcIWpSlx19TqMM4fAC1/YHtsLgejvASIFwzYO6eW42acSDAhGa7KIpeoaieOuvzIkmJ1kk9ImptZ3QUckOv+UsFMCwnqKOSWQ0DpeXO9i9lWYL+jMwSkKVoOyeEi6yZXjCWFTIdIOPlTL+J+DLAlmJrptJDr3uzfZsKkd+lcsimNRQsI7f1XUviw64R1xR6OZPDvkNRCUvGnC7bi7/lzf3lcvPnL0+RjyUeFa3vFflEwqyEwtPAne706idFUuICZs3TVnpUg25EGWU1ku5Z9k5MI8IVcvPCd5X6W9M8mEvopbuPRrWege3qzxL5PEJ4sghgQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote: > The weighted interleave policy distributes page allocations across multiple > NUMA nodes based on their performance weight, thereby optimizing memory > bandwidth utilization. The weight values for each node are configured > through sysfs. > > Previously, the sysfs entries for configuring weighted interleave were only > created during initialization. This approach had several limitations: > - Sysfs entries were generated for all possible nodes at boot time, > including nodes without memory, leading to unnecessary sysfs creation. It's not that it's unnecessary, it's that it allowed for configuration of nodes which may not have memory now but may have memory in the future. This was not well documented. > - Some memory devices transition to an online state after initialization, > but the existing implementation failed to create sysfs entries for > these dynamically added nodes. As a result, memory hotplugged nodes > were not properly recognized by the weighed interleave mechanism. > The current system creates 1 node per N_POSSIBLE nodes, and since nodes can't transition between possible and not-possible your claims here are contradictory. I think you mean that simply switching from N_POSSIBLE to N_MEMORY is insufficient since nodes may transition in and out of the N_MEMORY state. Therefore this patch utilizes a hotplug callback to add and remove sysfs entries based on whether a node is in the N_MEMORY set. > To resolve these issues, this patch introduces two key improvements: > 1) At initialization, only nodes that are online and have memory are > recognized, preventing the creation of unnecessary sysfs entries. > 2) Nodes that become available after initialization are dynamically > detected and integrated through the memory hotplug mechanism. > > With this enhancement, the weighted interleave policy now properly supports > memory hotplug, ensuring that newly added nodes are recognized and sysfs > entries are created accordingly. > It doesn't "support memory hotplug" so much as it "Minimizes weighted interleave to exclude memoryless nodes". > Signed-off-by: Rakie Kim > --- > mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 42 insertions(+), 5 deletions(-) > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index 1691748badb2..94efff89e0be 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -113,6 +113,7 @@ > #include > #include > #include > +#include > > #include "internal.h" > > @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj) > return 0; > } > > +struct kobject *wi_kobj; > + > +static int wi_node_notifier(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + int err; > + struct memory_notify *arg = data; > + int nid = arg->status_change_nid; > + > + if (nid < 0) > + goto notifier_end; > + > + switch(action) { > + case MEM_ONLINE: > + err = add_weight_node(nid, wi_kobj); > + if (err) { > + pr_err("failed to add sysfs [node%d]\n", nid); > + kobject_put(wi_kobj); > + return NOTIFY_BAD; > + } > + break; > + case MEM_OFFLINE: > + sysfs_wi_node_release(node_attrs[nid], wi_kobj); > + break; > + } I'm fairly certain this logic is wrong. If I add two memory blocks and then remove one, would this logic not remove the sysfs entries despite there being a block remaining? > + > +notifier_end: > + return NOTIFY_OK; > +} > + > static int add_weighted_interleave_group(struct kobject *root_kobj) > { > - struct kobject *wi_kobj; > int nid, err; > > wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL); > @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj) > return err; > } > > - for_each_node_state(nid, N_POSSIBLE) { > + for_each_online_node(nid) { > + if (!node_state(nid, N_MEMORY)) Rather than online node, why not just add for each N_MEMORY node - regardless of if its memory is online or not? If the memory is offline, then it will be excluded from the weighted interleave mechanism by nature of the node being invalid for allocations anyway. > + continue; > + > err = add_weight_node(nid, wi_kobj); > if (err) { > pr_err("failed to add sysfs [node%d]\n", nid); > - break; > + goto err_out; > } > } > - if (err) > - kobject_put(wi_kobj); > + > + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI); > return 0; > + > +err_out: > + kobject_put(wi_kobj); > + return err; > } > > static void mempolicy_kobj_release(struct kobject *kobj) > -- > 2.34.1 >