From: Bing Jiao <bingjiao@google.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
gourry@gourry.net, longman@redhat.com, hannes@cmpxchg.org,
mhocko@kernel.org, roman.gushchin@linux.dev,
shakeel.butt@linux.dev, muchun.song@linux.dev, tj@kernel.org,
mkoutny@suse.com, david@kernel.org, zhengqi.arch@bytedance.com,
lorenzo.stoakes@oracle.com, axelrasmussen@google.com,
chenridong@huaweicloud.com, yuanchu@google.com,
weixugc@google.com, cgroups@vger.kernel.org,
joshua.hahnjy@gmail.com, bingjiao@google.com
Subject: [PATCH v7 2/2] mm/vmscan: select the closest preferred node in demote_folio_list()
Date: Thu, 8 Jan 2026 03:32:47 +0000
Message-ID: <20260108033248.2791579-3-bingjiao@google.com>
In-Reply-To: <20260108033248.2791579-1-bingjiao@google.com>

The preferred demotion node (migration_target_control.nid) should be
the one closest to the source node, to minimize migration latency.
Currently, demote_folio_list() falls back to a randomly selected
allowed node whenever the preferred node returned by
next_demotion_node() is not set in mems_allowed, so the chosen target
may not be the closest one.

To address this, update next_demotion_node() to also return the full
set of preferred nodes, so that the caller can choose among them.
Also update demote_folio_list() to walk down the demotion hierarchy
until it finds a preferred node that is within mems_allowed.
This ensures that the selected demotion target is consistently
the closest available node.

Signed-off-by: Bing Jiao <bingjiao@google.com>
---
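Note: the selection loop described in the commit message can be modelled
in userspace. The following is a minimal sketch under stated assumptions,
not the kernel code. mask_t, NO_NODE, preferred_of[], first_node_in(),
next_demotion_node_sim() and pick_demotion_target() are hypothetical
stand-ins introduced for illustration, and the lowest set bit is taken
where the kernel calls node_random(), so the result is deterministic.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES 8
#define NO_NODE  (-1)		/* stands in for NUMA_NO_NODE */

typedef uint64_t mask_t;	/* stands in for nodemask_t; bit n = node n */

/* Hypothetical topology: preferred_of[n] = preferred demotion targets of n. */
static mask_t preferred_of[NR_NODES];

/* Lowest set bit in the mask; the kernel picks with node_random() instead. */
static int first_node_in(mask_t m)
{
	for (int n = 0; n < NR_NODES; n++)
		if (m & ((mask_t)1 << n))
			return n;
	return NO_NODE;
}

/* Model of next_demotion_node(): return one target, copy out the full set. */
static int next_demotion_node_sim(int node, mask_t *preferred)
{
	assert(node >= 0 && node < NR_NODES);
	if (preferred)
		*preferred = preferred_of[node];
	return first_node_in(preferred_of[node]);
}

/* Model of the loop this patch adds to demote_folio_list(). */
static int pick_demotion_target(int src, mask_t allowed)
{
	mask_t preferred;
	int target = next_demotion_node_sim(src, &preferred);

	while (target != NO_NODE && !(allowed & ((mask_t)1 << target))) {
		/* Keep only the preferred nodes that are also allowed. */
		preferred &= allowed;
		if (preferred) {
			target = first_node_in(preferred);
			break;
		}
		/* None allowed at this tier: step down to the next tier. */
		target = next_demotion_node_sim(target, &preferred);
	}
	return target;	/* NO_NODE if the hierarchy is exhausted */
}
```

With node 0 preferring {2,3} and nodes 2 and 3 both preferring {4}, an
allowed mask of {3} selects node 3, and an allowed mask of {4} selects
node 4 after one step down the hierarchy.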
include/linux/memory-tiers.h | 6 +++---
mm/memory-tiers.c | 11 +++++++----
mm/vmscan.c | 25 ++++++++++++++++++++++---
3 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 7a805796fcfd..87652042f2c2 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -53,11 +53,11 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
struct list_head *memory_types);
void mt_put_memory_types(struct list_head *memory_types);
#ifdef CONFIG_MIGRATION
-int next_demotion_node(int node);
+int next_demotion_node(int node, nodemask_t *preferred_nodes);
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
bool node_is_toptier(int node);
#else
-static inline int next_demotion_node(int node)
+static inline int next_demotion_node(int node, nodemask_t *preferred_nodes)
{
return NUMA_NO_NODE;
}
@@ -101,7 +101,7 @@ static inline void clear_node_memory_type(int node, struct memory_dev_type *memt
}
-static inline int next_demotion_node(int node)
+static inline int next_demotion_node(int node, nodemask_t *preferred_nodes)
{
return NUMA_NO_NODE;
}
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 864811fff409..286e4b5fa0e5 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -320,13 +320,14 @@ void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
/**
* next_demotion_node() - Get the next node in the demotion path
* @node: The starting node to lookup the next node
+ * @preferred_nodes: The pointer to nodemask of all preferred nodes to return
*
* Return: node id for next memory node in the demotion path hierarchy
- * from @node; NUMA_NO_NODE if @node is terminal. This does not keep
- * @node online or guarantee that it *continues* to be the next demotion
- * target.
+ * from @node; NUMA_NO_NODE if @node is terminal. Also returns all preferred
+ * nodes in @preferred_nodes. This does not keep @node online or guarantee
+ * that it *continues* to be the next demotion target.
*/
-int next_demotion_node(int node)
+int next_demotion_node(int node, nodemask_t *preferred_nodes)
{
struct demotion_nodes *nd;
int target;
@@ -357,6 +358,8 @@ int next_demotion_node(int node)
* target node randomly seems better until now.
*/
target = node_random(&nd->preferred);
+ if (preferred_nodes)
+ nodes_copy(*preferred_nodes, nd->preferred);
rcu_read_unlock();
return target;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 94ff5aa7c4fb..213ee75b3306 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1024,9 +1024,10 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
struct pglist_data *pgdat,
struct mem_cgroup *memcg)
{
- int target_nid = next_demotion_node(pgdat->node_id);
+ int target_nid;
unsigned int nr_succeeded;
nodemask_t allowed_mask;
+ nodemask_t preferred;
struct migration_target_control mtc = {
/*
@@ -1052,8 +1053,26 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
if (nodes_empty(allowed_mask))
return 0;
- if (!node_isset(target_nid, allowed_mask))
- target_nid = node_random(&allowed_mask);
+ target_nid = next_demotion_node(pgdat->node_id, &preferred);
+ while (target_nid != NUMA_NO_NODE &&
+ !node_isset(target_nid, allowed_mask)) {
+ /* Filter out preferred nodes that are not in allowed. */
+ nodes_and(preferred, preferred, allowed_mask);
+ if (!nodes_empty(preferred)) {
+ /* Randomly select one node from preferred. */
+ target_nid = node_random(&preferred);
+ break;
+ }
+ /*
+ * Preferred nodes in the lower tier are not set in allowed.
+ * Recursively get preferred from the next lower tier.
+ */
+ target_nid = next_demotion_node(target_nid, &preferred);
+ }
+
+ if (target_nid == NUMA_NO_NODE)
+ /* Nodes are gone (e.g., hot-unplugged). */
+ return 0;
mtc.nid = target_nid;
/* Demotion ignores all cpuset and mempolicy settings */
--
2.52.0.457.g6b5491de43-goog