From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu <weixugc@google.com>, Huang Ying <ying.huang@intel.com>,
Greg Thelen <gthelen@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Brice Goglin <brice.goglin@gmail.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Feng Tang <feng.tang@intel.com>,
Jagdish Gediya <jvgediya@linux.ibm.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Rientjes <rientjes@google.com>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Subject: [PATCH v5 7/9] mm/demotion: Demote pages according to allocation fallback order
Date: Fri, 3 Jun 2022 19:12:35 +0530
Message-ID: <20220603134237.131362-8-aneesh.kumar@linux.ibm.com>
In-Reply-To: <20220603134237.131362-1-aneesh.kumar@linux.ibm.com>
From: Jagdish Gediya <jvgediya@linux.ibm.com>
Currently, pages on a higher tier node can only be demoted to selected
nodes on the next lower tier as defined by the demotion path,
not to any other node from any lower tier. This strict, hard-coded
demotion order does not work in all use cases (e.g. some use cases
may want to allow cross-socket demotion to another node in the same
demotion tier as a fallback when the preferred demotion node is out
of space). This demotion order is also inconsistent with the page
allocation fallback order when all the nodes in a higher tier are
out of space: page allocation can fall back to any node from any
lower tier, whereas the demotion order currently doesn't allow that.

This patch adds support for getting the mask of all allowed demotion
targets for a node. demote_page_list() is also modified to use this
allowed node mask by filling it into the migration_target_control
structure before passing it to migrate_pages().
Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
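To illustrate how the allowed mask is derived, here is a minimal
userspace sketch. It is not the kernel code: `nodemask`, `MAX_NODES`
and `build_allowed_masks()` are hypothetical stand-ins for nodemask_t,
MAX_NUMNODES and the new loop in establish_migration_targets(), and it
assumes the tiers are passed in ordered from the highest tier to the
lowest, matching the order in which the patch walks memory_tiers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64

/* Hypothetical stand-in for nodemask_t: bit n set means node n. */
typedef uint64_t nodemask;

/*
 * tiers[] holds one node mask per memory tier, highest tier first.
 * has_memory plays the role of node_states[N_MEMORY].
 * allowed_out[] must have MAX_NODES entries; only nodes that belong
 * to some tier are written.
 */
static void build_allowed_masks(const nodemask *tiers, int nr_tiers,
				nodemask has_memory, nodemask *allowed_out)
{
	nodemask allowed = 0;
	int t, node;

	assert(nr_tiers >= 0);

	/* Start with every node that belongs to any tier ... */
	for (t = 0; t < nr_tiers; t++)
		allowed |= tiers[t];
	/* ... and drop nodes that have no memory. */
	allowed &= has_memory;

	for (t = 0; t < nr_tiers; t++) {
		/* Remove the current tier, so only lower tiers remain. */
		allowed &= ~tiers[t];
		for (node = 0; node < MAX_NODES; node++)
			if (tiers[t] & ((nodemask)1 << node))
				allowed_out[node] = allowed;
	}
}
```

With three tiers holding nodes {0,1}, {2,3} and {4,5}, nodes 0 and 1
end up with the allowed set {2,3,4,5}, nodes 2 and 3 with {4,5}, and
the lowest tier with an empty set.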
include/linux/memory-tiers.h | 5 ++++
mm/memory-tiers.c | 49 ++++++++++++++++++++++++++++++++++--
mm/vmscan.c | 38 +++++++++++++---------------
3 files changed, 70 insertions(+), 22 deletions(-)
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 79bd8d26feb2..cd6e71f702ad 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -21,6 +21,7 @@ void node_remove_from_memory_tier(int node);
int node_get_memory_tier_id(int node);
int node_set_memory_tier(int node, int tier);
int node_reset_memory_tier(int node, int tier);
+void node_get_allowed_targets(int node, nodemask_t *targets);
#else
#define numa_demotion_enabled false
static inline int next_demotion_node(int node)
@@ -28,6 +29,10 @@ static inline int next_demotion_node(int node)
return NUMA_NO_NODE;
}
+static inline void node_get_allowed_targets(int node, nodemask_t *targets)
+{
+ *targets = NODE_MASK_NONE;
+}
#endif /* CONFIG_TIERED_MEMORY */
#endif
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index b4e72b672d4d..592d939ec28d 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -18,6 +18,7 @@ struct memory_tier {
struct demotion_nodes {
nodemask_t preferred;
+ nodemask_t allowed;
};
#define to_memory_tier(device) container_of(device, struct memory_tier, dev)
@@ -378,6 +379,25 @@ int node_set_memory_tier(int node, int tier)
}
EXPORT_SYMBOL_GPL(node_set_memory_tier);
+void node_get_allowed_targets(int node, nodemask_t *targets)
+{
+ /*
+ * node_demotion[] is updated without excluding this
+ * function from running.
+ *
+ * If a node is moving to a lower tier, the entries in
+ * node_demotion[] remain valid for this node. If a node
+ * is moving to a higher tier, the moving node may be used
+ * once more as a demotion target, which should be OK, so
+ * RCU should be enough here.
+ */
+ rcu_read_lock();
+
+ *targets = node_demotion[node].allowed;
+
+ rcu_read_unlock();
+}
+
/**
* next_demotion_node() - Get the next node in the demotion path
* @node: The starting node to lookup the next node
@@ -437,8 +457,10 @@ static void __disable_all_migrate_targets(void)
{
int node;
- for_each_node_mask(node, node_states[N_MEMORY])
+ for_each_node_mask(node, node_states[N_MEMORY]) {
node_demotion[node].preferred = NODE_MASK_NONE;
+ node_demotion[node].allowed = NODE_MASK_NONE;
+ }
}
static void disable_all_migrate_targets(void)
@@ -465,7 +487,7 @@ static void establish_migration_targets(void)
struct demotion_nodes *nd;
int target = NUMA_NO_NODE, node;
int distance, best_distance;
- nodemask_t used;
+ nodemask_t used, allowed = NODE_MASK_NONE;
if (!node_demotion)
return;
@@ -511,6 +533,29 @@ static void establish_migration_targets(void)
}
} while (1);
}
+ /*
+ * Now build the allowed mask for each node by collecting the node masks
+ * of all memory tiers below it. This allows us to fall back demotion page
+ * allocation to a set of nodes that is closer to the above selected
+ * preferred node.
+ */
+ list_for_each_entry(memtier, &memory_tiers, list)
+ nodes_or(allowed, allowed, memtier->nodelist);
+ /*
+ * Remove nodes not yet in N_MEMORY.
+ */
+ nodes_and(allowed, node_states[N_MEMORY], allowed);
+
+ list_for_each_entry(memtier, &memory_tiers, list) {
+ /*
+ * Keep removing the current tier from the allowed nodes.
+ * This removes all nodes in the current and higher
+ * memory tiers from the allowed mask.
+ */
+ nodes_andnot(allowed, allowed, memtier->nodelist);
+ for_each_node_mask(node, memtier->nodelist)
+ node_demotion[node].allowed = allowed;
+ }
}
/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3a8f78277f99..d424b7af2f26 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1460,23 +1460,6 @@ static void folio_check_dirty_writeback(struct folio *folio,
mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
}
-static struct page *alloc_demote_page(struct page *page, unsigned long node)
-{
- struct migration_target_control mtc = {
- /*
- * Allocate from 'node', or fail quickly and quietly.
- * When this happens, 'page' will likely just be discarded
- * instead of migrated.
- */
- .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
- __GFP_THISNODE | __GFP_NOWARN |
- __GFP_NOMEMALLOC | GFP_NOWAIT,
- .nid = node
- };
-
- return alloc_migration_target(page, (unsigned long)&mtc);
-}
-
/*
* Take pages on @demote_list and attempt to demote them to
* another node. Pages which are not demoted are left on
@@ -1487,6 +1470,19 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
{
int target_nid = next_demotion_node(pgdat->node_id);
unsigned int nr_succeeded;
+ nodemask_t allowed_mask;
+
+ struct migration_target_control mtc = {
+ /*
+ * Allocate from 'target_nid' or, as a fallback, from a node in
+ * 'allowed_mask'; otherwise fail quickly and quietly, in which
+ * case 'page' will likely just be discarded instead of migrated.
+ */
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | GFP_NOWAIT,
+ .nid = target_nid,
+ .nmask = &allowed_mask
+ };
if (list_empty(demote_pages))
return 0;
@@ -1494,10 +1490,12 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
if (target_nid == NUMA_NO_NODE)
return 0;
+ node_get_allowed_targets(pgdat->node_id, &allowed_mask);
+
/* Demotion ignores all cpuset and mempolicy settings */
- migrate_pages(demote_pages, alloc_demote_page, NULL,
- target_nid, MIGRATE_ASYNC, MR_DEMOTION,
- &nr_succeeded);
+ migrate_pages(demote_pages, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
+ &nr_succeeded);
if (current_is_kswapd())
__count_vm_events(PGDEMOTE_KSWAPD, nr_succeeded);
--
2.36.1
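
The effect of dropping __GFP_THISNODE and passing a node mask can be
sketched in userspace as follows. This is a simplified model under
stated assumptions, not the page allocator: `pick_demotion_target()`,
`node_isset()`, `free_pages[]` and `NO_NODE` are hypothetical names,
and the fallback here simply takes the lowest-numbered allowed node
with free pages, whereas the kernel walks the preferred node's
zonelist.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64
#define NO_NODE   (-1)

/* Hypothetical stand-in for nodemask_t: bit n set means node n. */
typedef uint64_t nodemask;

static int node_isset(int node, nodemask mask)
{
	assert(node >= 0 && node < MAX_NODES);
	return (mask >> node) & 1;
}

/*
 * Pick a target node for one demoted page. free_pages[] is a
 * hypothetical per-node free page count with MAX_NODES entries.
 */
static int pick_demotion_target(int preferred, nodemask allowed,
				const unsigned long *free_pages)
{
	int node;

	/* First choice: the preferred node, if allowed and not full. */
	if (preferred != NO_NODE && node_isset(preferred, allowed) &&
	    free_pages[preferred] > 0)
		return preferred;

	/* Fallback: any other allowed node that still has free pages. */
	for (node = 0; node < MAX_NODES; node++)
		if (node_isset(node, allowed) && free_pages[node] > 0)
			return node;

	/* Nothing available: give up quickly, without reclaim. */
	return NO_NODE;
}
```

Before this patch the model would stop after the first check, which
corresponds to the old __GFP_THISNODE behaviour. Nodes outside the
allowed mask are never chosen, even if they have free pages.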