From: Fan Du <fan.du@intel.com>
To: akpm@linux-foundation.org, mhocko@suse.com,
	fengguang.wu@intel.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, xishi.qiuxishi@alibaba-inc.com,
	ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Fan Du <fan.du@intel.com>
Subject: [RFC PATCH 4/5] mm, page alloc: build fallback list on per node type basis
Date: Thu, 25 Apr 2019 09:21:34 +0800
Message-ID: <1556155295-77723-5-git-send-email-fan.du@intel.com>
In-Reply-To: <1556155295-77723-1-git-send-email-fan.du@intel.com>

On a box where both DRAM and PMEM are managed by the mm system,
nodes 0 and 1 are usually DRAM nodes and nodes 2 and 3 are PMEM nodes.
The nofallback list is the same as before. The fallback list is now
arranged on a per node type basis, i.e. an allocation request for a
DRAM page starting from node 0 will go through the
node0->node1->node2->node3 zonelists.

Signed-off-by: Fan Du <fan.du@intel.com>
---
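Note: the two-pass ordering described in the changelog can be sketched
in userspace as below. This is only an illustration, not the kernel
code. The distance table, the node_is_pmem[] array and the helper names
(same_type, next_best_node, build_node_order) are all hypothetical, and
the sketch leaves out the node_load round-robin penalty and the CPU
penalty that find_next_best_node() applies.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4

/* Hypothetical topology: nodes 0,1 are DRAM, nodes 2,3 are PMEM. */
static const bool node_is_pmem[NR_NODES] = { false, false, true, true };

/* Hypothetical SLIT-style distances, not taken from real hardware. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

static bool same_type(int a, int b)
{
	return node_is_pmem[a] == node_is_pmem[b];
}

/* Pick the nearest unused node whose type matches (or differs from) 'node'. */
static int next_best_node(int node, bool used[NR_NODES], bool need_same_type)
{
	int best = -1, min_val = INT_MAX;

	for (int n = 0; n < NR_NODES; n++) {
		if (used[n] || same_type(node, n) != need_same_type)
			continue;
		if (distance[node][n] < min_val) {
			min_val = distance[node][n];
			best = n;
		}
	}
	if (best >= 0)
		used[best] = true;
	return best;
}

/* Fill 'order' with the fallback order for 'local_node'; return its length. */
static int build_node_order(int local_node, int order[NR_NODES])
{
	bool used[NR_NODES] = { false };
	int nr = 0, node;

	/* Pass 1: same-type nodes; pass 0: nodes of the other type. */
	for (int pass = 1; pass >= 0; pass--)
		while ((node = next_best_node(local_node, used, pass)) >= 0)
			order[nr++] = node;
	return nr;
}
```

With these assumed distances, build_node_order(0, order) walks the DRAM
nodes first and then the PMEM nodes, even though node 2 is closer to
node 0 than node 1 is.
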
 include/linux/mmzone.h |  8 ++++++++
 mm/page_alloc.c        | 42 ++++++++++++++++++++++++++----------------
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d3ee9f9..8c37e1c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -939,6 +939,14 @@ static inline int is_node_dram(int nid)
 	return test_bit(PGDAT_DRAM, &pgdat->flags);
 }
 
+static inline int is_node_same_type(int nida, int nidb)
+{
+	if (node_isset(nida, numa_nodes_pmem))
+		return node_isset(nidb, numa_nodes_pmem);
+	else
+		return node_isset(nidb, numa_nodes_dram);
+}
+
 static inline void set_node_type(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c6ce20a..a408a91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5372,7 +5372,7 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  *
  * Return: node id of the found node or %NUMA_NO_NODE if no node is found.
  */
-static int find_next_best_node(int node, nodemask_t *used_node_mask)
+static int find_next_best_node(int node, nodemask_t *used_node_mask, int need_same_type)
 {
 	int n, val;
 	int min_val = INT_MAX;
@@ -5380,7 +5380,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	const struct cpumask *tmp = cpumask_of_node(0);
 
 	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	if (need_same_type && !node_isset(node, *used_node_mask)) {
 		node_set(node, *used_node_mask);
 		return node;
 	}
@@ -5391,6 +5391,12 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 		if (node_isset(n, *used_node_mask))
 			continue;
 
+		if (need_same_type && !is_node_same_type(node, n))
+			continue;
+
+		if (!need_same_type && is_node_same_type(node, n))
+			continue;
+
 		/* Use the distance array to find the distance */
 		val = node_distance(node, n);
 
@@ -5472,31 +5478,35 @@ static void build_zonelists(pg_data_t *pgdat)
 	int node, load, nr_nodes = 0;
 	nodemask_t used_mask;
 	int local_node, prev_node;
+	int need_same_type;
 
 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
 	load = nr_online_nodes;
 	prev_node = local_node;
-	nodes_clear(used_mask);
 
 	memset(node_order, 0, sizeof(node_order));
-	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
-		/*
-		 * We don't want to pressure a particular node.
-		 * So adding penalty to the first node in same
-		 * distance group to make it round-robin.
-		 */
-		if (node_distance(local_node, node) !=
-		    node_distance(local_node, prev_node))
-			node_load[node] = load;
+	for (need_same_type = 1; need_same_type >= 0; need_same_type--) {
+		nodes_clear(used_mask);
+		while ((node = find_next_best_node(local_node, &used_mask,
+				need_same_type)) >= 0) {
+			/*
+			 * We don't want to pressure a particular node.
+			 * So adding penalty to the first node in same
+			 * distance group to make it round-robin.
+			 */
+			if (node_distance(local_node, node) !=
+			    node_distance(local_node, prev_node))
+				node_load[node] = load;
 
-		node_order[nr_nodes++] = node;
-		prev_node = node;
-		load--;
+			node_order[nr_nodes++] = node;
+			prev_node = node;
+			load--;
+		}
 	}
-
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
 	build_thisnode_zonelists(pgdat);
+
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
-- 
1.8.3.1

