linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Bing Jiao <bingjiao@google.com>
To: bingjiao@google.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@suse.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Gregory Price <gourry@gourry.net>,
	 Joshua Hahn <joshua.hahnjy@gmail.com>,
	chenridong@huaweicloud.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org
Subject: [PATCH v8 2/2] mm/vmscan: select the closest preferred node in demote_folio_list()
Date: Wed, 14 Jan 2026 06:59:49 +0000	[thread overview]
Message-ID: <20260114070053.2446770-2-bingjiao@google.com> (raw)
In-Reply-To: <20260114070053.2446770-1-bingjiao@google.com>

The preferred demotion node (migration_target_control.nid) should be the
one closest to the source node to minimize migration latency.  Currently,
a discrepancy exists where demote_folio_list() randomly selects an allowed
node if the preferred node from next_demotion_node() is not set in
mems_allowed.

To address it, update next_demotion_node() to return preferred nodes,
allowing the caller to select the preferred one.  Also update
demote_folio_list() to get the closest node from allowed if all
preferred nodes are not set in mems_allowed.

It ensures that the preferred demotion target is consistently the closest
available node to the source node.

Signed-off-by: Bing Jiao <bingjiao@google.com>
---
 include/linux/memory-tiers.h |  6 +++---
 mm/memory-tiers.c            | 11 +++++++----
 mm/vmscan.c                  | 30 +++++++++++++++++++++++-------
 3 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 7a805796fcfd..87652042f2c2 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -53,11 +53,11 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 						  struct list_head *memory_types);
 void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
-int next_demotion_node(int node);
+int next_demotion_node(int node, nodemask_t *preferred_nodes);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
 bool node_is_toptier(int node);
 #else
-static inline int next_demotion_node(int node)
+static inline int next_demotion_node(int node, nodemask_t *preferred_nodes)
 {
 	return NUMA_NO_NODE;
 }
@@ -101,7 +101,7 @@ static inline void clear_node_memory_type(int node, struct memory_dev_type *memt

 }

-static inline int next_demotion_node(int node)
+static inline int next_demotion_node(int node, nodemask_t *preferred_nodes)
 {
 	return NUMA_NO_NODE;
 }
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 20aab9c19c5e..c71a865e9dd8 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -320,13 +320,14 @@ void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
 /**
  * next_demotion_node() - Get the next node in the demotion path
  * @node: The starting node to lookup the next node
+ * @preferred_nodes: The pointer to nodemask of all preferred nodes to return
  *
  * Return: node id for next memory node in the demotion path hierarchy
- * from @node; NUMA_NO_NODE if @node is terminal.  This does not keep
- * @node online or guarantee that it *continues* to be the next demotion
- * target.
+ * from @node; NUMA_NO_NODE if @node is terminal. Also returns all preferred
+ * nodes in @preferred_nodes. This does not keep @node online or guarantee
+ * that it *continues* to be the next demotion target.
  */
-int next_demotion_node(int node)
+int next_demotion_node(int node, nodemask_t *preferred_nodes)
 {
 	struct demotion_nodes *nd;
 	int target;
@@ -357,6 +358,8 @@ int next_demotion_node(int node)
 	 * target node randomly seems better until now.
 	 */
 	target = node_random(&nd->preferred);
+	if (preferred_nodes)
+		nodes_copy(*preferred_nodes, nd->preferred);
 	rcu_read_unlock();

 	return target;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 71617891bcde..18f4425bd750 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1023,9 +1023,10 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
 				      struct pglist_data *pgdat,
 				      struct mem_cgroup *memcg)
 {
-	int target_nid = next_demotion_node(pgdat->node_id);
+	int target_nid;
 	unsigned int nr_succeeded;
 	nodemask_t allowed_mask;
+	nodemask_t preferred;

 	struct migration_target_control mtc = {
 		/*
@@ -1042,17 +1043,32 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
 	if (list_empty(demote_folios))
 		return 0;

-	if (target_nid == NUMA_NO_NODE)
-		/* No lower-tier nodes or nodes were hot-unplugged. */
-		return 0;
-
 	node_get_allowed_targets(pgdat, &allowed_mask);
 	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
 	if (nodes_empty(allowed_mask))
 		return 0;

-	if (!node_isset(target_nid, allowed_mask))
-		target_nid = node_random(&allowed_mask);
+	target_nid = next_demotion_node(pgdat->node_id, &preferred);
+	if (target_nid == NUMA_NO_NODE)
+		/* No lower-tier nodes (e.g., nodes were hot-unplugged) */
+		return 0;
+
+	if (!node_isset(target_nid, allowed_mask)) {
+		/* Filter out preferred nodes that are not in allowed. */
+		nodes_and(preferred, preferred, allowed_mask);
+		if (!nodes_empty(preferred)) {
+			/* Randomly select one node from preferred. */
+			target_nid = node_random(&preferred);
+		} else {
+			/* Get the closest node from allowed. */
+			nodes_complement(preferred, allowed_mask);
+			target_nid = find_next_best_node(pgdat->node_id,
+							 &preferred);
+			if (target_nid == NUMA_NO_NODE)
+				return 0;
+		}
+	}
+
 	mtc.nid = target_nid;

 	/* Demotion ignores all cpuset and mempolicy settings */
--
2.52.0.457.g6b5491de43-goog



      reply	other threads:[~2026-01-14  7:01 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-20  6:10 [PATCH] mm/vmscan: respect mems_effective " Bing Jiao
2025-12-20 19:20 ` Andrew Morton
2025-12-22  6:16   ` Bing Jiao
2025-12-21 12:07 ` Gregory Price
2025-12-22  6:28   ` Bing Jiao
2025-12-21 23:36 ` [PATCH v2 0/2] fix demotion targets checks in reclaim/demotion Bing Jiao
2025-12-21 23:36   ` [PATCH v2 1/2] mm/vmscan: respect mems_effective in demote_folio_list() Bing Jiao
2025-12-22  2:38     ` Chen Ridong
2025-12-22 21:56     ` kernel test robot
2025-12-22 22:18     ` kernel test robot
2025-12-21 23:36   ` [PATCH v2 2/2] mm/vmscan: check all allowed targets in can_demote() Bing Jiao
2025-12-22  2:51     ` Chen Ridong
2025-12-22  6:09       ` Bing Jiao
2025-12-22  8:28         ` Chen Ridong
2025-12-23 21:19   ` [PATCH v3] mm/vmscan: fix demotion targets checks in reclaim/demotion Bing Jiao
2025-12-23 21:38     ` Bing Jiao
2025-12-24  1:19     ` Gregory Price
2025-12-26 18:48       ` Bing Jiao
2026-01-05 21:57         ` Bing Jiao
2025-12-24  1:49     ` Chen Ridong
2025-12-26 18:58       ` Bing Jiao
2025-12-26 19:32     ` Waiman Long
2025-12-26 20:24     ` Waiman Long
2026-01-04  9:04       ` Bing Jiao
2026-01-04  8:54     ` [PATCH v4] " Bing Jiao
2026-01-04 18:27       ` Andrew Morton
2026-01-05  5:08         ` Bing Jiao
2026-01-05  2:48       ` Chen Ridong
2026-01-05  5:10         ` Bing Jiao
2026-01-05  5:01       ` [PATCH v5] " Bing Jiao
2026-01-05 15:54         ` Gregory Price
2026-01-05 21:34           ` Bing Jiao
2026-01-06  7:56         ` [PATCH v6] " Bing Jiao
2026-01-06 14:23           ` Gregory Price
2026-01-06 19:36           ` Andrew Morton
2026-01-07  1:27           ` Chen Ridong
2026-01-08  3:32           ` [PATCH v7 0/2] " Bing Jiao
2026-01-08  3:32             ` [PATCH v7 1/2] " Bing Jiao
2026-01-08  3:32             ` [PATCH v7 2/2] mm/vmscan: select the closest preferred node in demote_folio_list() Bing Jiao
2026-01-10  3:00               ` [PATCH] mm/vmscan: fix uninitialized variable " Bing Jiao
2026-01-10  3:38               ` Bing Jiao
2026-01-14  6:59             ` [PATCH v8 0/2] mm/vmscan: select the closest preferred node " Bing Jiao
2026-01-14  6:59               ` Bing Jiao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260114070053.2446770-2-bingjiao@google.com \
    --to=bingjiao@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=chenridong@huaweicloud.com \
    --cc=david@kernel.org \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=joshua.hahnjy@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=weixugc@google.com \
    --cc=yuanchu@google.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox