From: "Huang, Ying"
To: "Zhijian Li (Fujitsu)"
Cc: Andrew Morton, Greg Kroah-Hartman, "rafael@kernel.org", "linux-mm@kvack.org", "Yasunori Gotou (Fujitsu)", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH RFC 1/4] drivers/base/node: Add demotion_nodes sys infterface
In-Reply-To: (Zhijian Li's message of "Thu, 2 Nov 2023 03:39:58 +0000")
References: <20231102025648.1285477-1-lizhijian@fujitsu.com> <20231102025648.1285477-2-lizhijian@fujitsu.com> <878r7g3ktj.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 02 Nov 2023 13:18:46 +0800
Message-ID: <87zfzw20nd.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Zhijian Li (Fujitsu)" writes:

>> We have /sys/devices/virtual/memory_tiering/memory_tier*/nodelist
>> already. A node in a higher tier can demote to any node in the lower
>> tiers. What's more need to be displayed in nodeX/demotion_nodes?
>
> IIRC, they are not the same. memory_tier[number], where the number is shared by
> the memory using the same memory driver(dax/kmem etc). Not reflect the actual distance
> across nodes(different distance will be grouped into the same memory_tier).
> But demotion will only select the nearest nodelist to demote.

In the following patchset, we will use the performance information from
HMAT to place nodes using the same memory driver into different memory
tiers.

https://lore.kernel.org/all/20230926060628.265989-1-ying.huang@intel.com/

The patchset is in the mm-stable tree.

> Below is an example, node0 node1 are DRAM, node2 node3 are PMEM, but distance to DRAM nodes
> are different.
>
> # numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0
> node 0 size: 964 MB
> node 0 free: 746 MB
> node 1 cpus: 1
> node 1 size: 685 MB
> node 1 free: 455 MB
> node 2 cpus:
> node 2 size: 896 MB
> node 2 free: 897 MB
> node 3 cpus:
> node 3 size: 896 MB
> node 3 free: 896 MB
> node distances:
> node   0   1   2   3
>   0:  10  20  20  25
>   1:  20  10  25  20
>   2:  20  25  10  20
>   3:  25  20  20  10
> # cat /sys/devices/system/node/node0/demotion_nodes
> 2

node 2 is only the preferred demotion target. In fact, memory in node 0
can be demoted to node 2 or node 3. Please check demote_folio_list() for
details.

--
Best Regards,
Huang, Ying

> # cat /sys/devices/system/node/node1/demotion_nodes
> 3
> # cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
> 2-3
>
> Thanks
> Zhijian
>
> (I hate the outlook native reply composition format.)
> ________________________________________
> From: Huang, Ying
> Sent: Thursday, November 2, 2023 11:17
> To: Li, Zhijian/李 智坚
> Cc: Andrew Morton; Greg Kroah-Hartman; rafael@kernel.org; linux-mm@kvack.org; Gotou, Yasunori/五島 康文; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH RFC 1/4] drivers/base/node: Add demotion_nodes sys infterface
>
> Li Zhijian writes:
>
>> It shows the demotion target nodes of a node. Export this information to
>> user directly.
>>
>> Below is an example where node0 node1 are DRAM, node3 is a PMEM node.
>> - Before PMEM is online, no demotion_nodes for node0 and node1.
>> $ cat /sys/devices/system/node/node0/demotion_nodes
>>
>> - After node3 is online as kmem
>> $ daxctl reconfigure-device --mode=system-ram --no-online dax0.0 && daxctl online-memory dax0.0
>> [
>>   {
>>     "chardev":"dax0.0",
>>     "size":1054867456,
>>     "target_node":3,
>>     "align":2097152,
>>     "mode":"system-ram",
>>     "online_memblocks":0,
>>     "total_memblocks":7
>>   }
>> ]
>> $ cat /sys/devices/system/node/node0/demotion_nodes
>> 3
>> $ cat /sys/devices/system/node/node1/demotion_nodes
>> 3
>> $ cat /sys/devices/system/node/node3/demotion_nodes
>>
>
> We have /sys/devices/virtual/memory_tiering/memory_tier*/nodelist
> already. A node in a higher tier can demote to any node in the lower
> tiers. What's more need to be displayed in nodeX/demotion_nodes?
>
> --
> Best Regards,
> Huang, Ying
>
>> Signed-off-by: Li Zhijian
>> ---
>>  drivers/base/node.c          | 13 +++++++++++++
>>  include/linux/memory-tiers.h |  6 ++++++
>>  mm/memory-tiers.c            |  8 ++++++++
>>  3 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 493d533f8375..27e8502548a7 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -7,6 +7,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -569,11 +570,23 @@ static ssize_t node_read_distance(struct device *dev,
>>  }
>>  static DEVICE_ATTR(distance, 0444, node_read_distance, NULL);
>>
>> +static ssize_t demotion_nodes_show(struct device *dev,
>> +				   struct device_attribute *attr, char *buf)
>> +{
>> +	int ret;
>> +	nodemask_t nmask = next_demotion_nodes(dev->id);
>> +
>> +	ret = sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&nmask));
>> +	return ret;
>> +}
>> +static DEVICE_ATTR_RO(demotion_nodes);
>> +
>>  static struct attribute *node_dev_attrs[] = {
>>  	&dev_attr_meminfo.attr,
>>  	&dev_attr_numastat.attr,
>>  	&dev_attr_distance.attr,
>>  	&dev_attr_vmstat.attr,
>> +	&dev_attr_demotion_nodes.attr,
>>  	NULL
>>  };
>>
>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>> index 437441cdf78f..8eb04923f965 100644
>> --- a/include/linux/memory-tiers.h
>> +++ b/include/linux/memory-tiers.h
>> @@ -38,6 +38,7 @@ void init_node_memory_type(int node, struct memory_dev_type *default_type);
>>  void clear_node_memory_type(int node, struct memory_dev_type *memtype);
>>  #ifdef CONFIG_MIGRATION
>>  int next_demotion_node(int node);
>> +nodemask_t next_demotion_nodes(int node);
>>  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>>  bool node_is_toptier(int node);
>>  #else
>> @@ -46,6 +47,11 @@ static inline int next_demotion_node(int node)
>>  	return NUMA_NO_NODE;
>>  }
>>
>> +static inline nodemask_t next_demotion_nodes(int node)
>> +{
>> +	return NODE_MASK_NONE;
>> +}
>> +
>>  static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
>>  {
>>  	*targets = NODE_MASK_NONE;
>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>> index 37a4f59d9585..90047f37d98a 100644
>> --- a/mm/memory-tiers.c
>> +++ b/mm/memory-tiers.c
>> @@ -282,6 +282,14 @@ void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
>>  	rcu_read_unlock();
>>  }
>>
>> +nodemask_t next_demotion_nodes(int node)
>> +{
>> +	if (!node_demotion)
>> +		return NODE_MASK_NONE;
>> +
>> +	return node_demotion[node].preferred;
>> +}
>> +
>>  /**
>>   * next_demotion_node() - Get the next node in the demotion path
>>   * @node: The starting node to lookup the next node
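
To make the preferred vs. allowed distinction from the reply above concrete,
here is a small userspace sketch. It is not kernel code and does not
reproduce demote_folio_list(). It assumes only two tiers (DRAM above PMEM)
and reuses the distance table from the quoted "numactl -H" output. The names
node_tier, allowed_targets() and preferred_targets() are made up for
illustration, and plain bitmasks stand in for nodemask_t.

```c
#include <limits.h>

#define NR_NODES 4

/* Distance table copied from the "numactl -H" output quoted in this thread. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 20, 25 },
	{ 20, 10, 25, 20 },
	{ 20, 25, 10, 20 },
	{ 25, 20, 20, 10 },
};

/* Assumed tier layout for this sketch: 0 = DRAM (top tier), 1 = PMEM. */
static const int node_tier[NR_NODES] = { 0, 0, 1, 1 };

/* Bitmask of every node in a lower tier, i.e. all places demotion may go. */
static unsigned int allowed_targets(int node)
{
	unsigned int mask = 0;
	int n;

	for (n = 0; n < NR_NODES; n++)
		if (node_tier[n] > node_tier[node])
			mask |= 1u << n;
	return mask;
}

/* Subset of allowed_targets() at the smallest distance from @node. */
static unsigned int preferred_targets(int node)
{
	unsigned int allowed = allowed_targets(node), mask = 0;
	int best = INT_MAX, n;

	for (n = 0; n < NR_NODES; n++) {
		if (!(allowed & (1u << n)))
			continue;
		if (node_distance[node][n] < best) {
			/* Strictly nearer node found: restart the mask. */
			best = node_distance[node][n];
			mask = 1u << n;
		} else if (node_distance[node][n] == best) {
			/* Tie on distance: both nodes are preferred. */
			mask |= 1u << n;
		}
	}
	return mask;
}
```

In this model preferred_targets(0) has only bit 2 set, which matches the "2"
shown by the proposed demotion_nodes file for node0, while allowed_targets(0)
has bits 2 and 3 set, which matches the "2-3" in the memory_tier22 nodelist.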