* [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration
@ 2023-08-19 10:52 Baolin Wang
2023-08-19 10:52 ` [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep() Baolin Wang
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-19 10:52 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Hi,
Currently, on our ARM servers with NUMA enabled, we found that the cross-die
latency is a little higher and can significantly impact the workload's
performance. So on ARM servers we rely on NUMA balancing to avoid cross-die
accesses, and I posted a patchset[1] to support speculative numa fault to
improve NUMA balancing's performance according to the principle of data
locality. Moreover, Huang Ying's patchset[2] introduced batch migration as a
way to reduce the cost of TLB flushes, which will also benefit migrating
multiple pages all at once during NUMA balancing.
So we plan to further support batch migration in do_numa_page() to improve
NUMA balancing's performance. But before adding a complicated batch migration
algorithm for NUMA balancing, some cleanup and preparation work needs to be
done first, which is what this patch set does. In short, this patchset extends
the migrate_misplaced_page() interface to support batch migration, with no
functional changes intended.
[1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
[2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
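For illustration only, a possible (hypothetical, not part of this series)
batch caller built on top of the extended interface could look roughly like
the sketch below; the batching policy itself is left for a follow-up series,
and the function name numa_migrate_batch() is just a placeholder:

/*
 * Rough sketch only: gather several misplaced pages and migrate them
 * with a single migrate_misplaced_page() call. Error handling and the
 * surrounding fault-path details are omitted.
 */
static int numa_migrate_batch(struct vm_area_struct *vma, struct page **pages,
                              int nr, int source_nid, int target_nid)
{
        pg_data_t *pgdat = NODE_DATA(target_nid);
        LIST_HEAD(migratepages);
        int i;

        for (i = 0; i < nr; i++) {
                /* Skip pages that cannot be isolated right now */
                if (!numamigrate_isolate_page(pgdat, pages[i])) {
                        put_page(pages[i]);
                        continue;
                }
                list_add(&pages[i]->lru, &migratepages);
        }

        if (list_empty(&migratepages))
                return 0;

        /* Returns the number of pages migrated successfully */
        return migrate_misplaced_page(&migratepages, vma, source_nid, target_nid);
}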
Baolin Wang (4):
mm: migrate: move migration validation into numa_migrate_prep()
mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
mm: migrate: change migrate_misplaced_page() to support multiple pages
migration
mm: migrate: change to return the number of pages migrated
successfully
include/linux/migrate.h | 15 ++++++++---
mm/huge_memory.c | 19 +++++++++++---
mm/memory.c | 34 +++++++++++++++++++++++-
mm/migrate.c | 58 ++++++++---------------------------------
4 files changed, 71 insertions(+), 55 deletions(-)
--
2.39.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep()
2023-08-19 10:52 [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
@ 2023-08-19 10:52 ` Baolin Wang
2023-08-21 2:20 ` Huang, Ying
2023-08-19 10:52 ` [PATCH 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page() Baolin Wang
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2023-08-19 10:52 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
There are currently 3 places that validate whether a page can be migrated
or not, and some of these validations are performed after calling
numa_migrate_prep(), which wastes the CPU cycles spent in numa_migrate_prep()
for pages that will not be migrated anyway.
Thus we can move all the migration validations into numa_migrate_prep(),
which is more maintainable and also saves some CPU cycles. Another benefit
is that it serves as a preparation for supporting batch migration in
do_numa_page() in the future.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/memory.c | 19 +++++++++++++++++++
mm/migrate.c | 19 -------------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index d003076b218d..bee9b1e86ef0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4747,6 +4747,25 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
*flags |= TNF_FAULT_LOCAL;
}
+ /*
+ * Don't migrate file pages that are mapped in multiple processes
+ * with execute permissions as they are probably shared libraries.
+ */
+ if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
+ (vma->vm_flags & VM_EXEC))
+ return NUMA_NO_NODE;
+
+ /*
+ * Also do not migrate dirty pages as not all filesystems can move
+ * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
+ */
+ if (page_is_file_lru(page) && PageDirty(page))
+ return NUMA_NO_NODE;
+
+ /* Do not migrate THP mapped by multiple processes */
+ if (PageTransHuge(page) && total_mapcount(page) > 1)
+ return NUMA_NO_NODE;
+
return mpol_misplaced(page, vma, addr);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index e21d5a7e7447..9cc98fb1d6ec 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2485,10 +2485,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
- /* Do not migrate THP mapped by multiple processes */
- if (PageTransHuge(page) && total_mapcount(page) > 1)
- return 0;
-
/* Avoid migrating to a node that is nearly full */
if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
int z;
@@ -2533,21 +2529,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
LIST_HEAD(migratepages);
int nr_pages = thp_nr_pages(page);
- /*
- * Don't migrate file pages that are mapped in multiple processes
- * with execute permissions as they are probably shared libraries.
- */
- if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
- (vma->vm_flags & VM_EXEC))
- goto out;
-
- /*
- * Also do not migrate dirty pages as not all filesystems can move
- * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
- */
- if (page_is_file_lru(page) && PageDirty(page))
- goto out;
-
isolated = numamigrate_isolate_page(pgdat, page);
if (!isolated)
goto out;
--
2.39.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
2023-08-19 10:52 [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
2023-08-19 10:52 ` [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep() Baolin Wang
@ 2023-08-19 10:52 ` Baolin Wang
2023-08-19 10:52 ` [PATCH 3/4] mm: migrate: change migrate_misplaced_page() to support multiple pages migration Baolin Wang
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-19 10:52 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Move the numamigrate_isolate_page() call into do_numa_page() to simplify
migrate_misplaced_page(), which can then focus only on page migration.
This also serves as a preparation for supporting batch migration for
NUMA balancing.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/migrate.h | 6 ++++++
mm/huge_memory.c | 10 ++++++++++
mm/memory.c | 10 ++++++++++
mm/migrate.c | 16 ++++------------
4 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 711dd9412561..7c5189043707 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -144,12 +144,18 @@ const struct movable_operations *page_movable_ops(struct page *page)
#ifdef CONFIG_NUMA_BALANCING
int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
int node);
+int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page);
#else
static inline int migrate_misplaced_page(struct page *page,
struct vm_area_struct *vma, int node)
{
return -EAGAIN; /* can't migrate now */
}
+
+static inline int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+{
+ return -EAGAIN;
+}
#endif /* CONFIG_NUMA_BALANCING */
#ifdef CONFIG_MIGRATION
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cb4432792b88..b7cc6828ce9e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1496,6 +1496,8 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
bool migrated = false, writable = false;
int flags = 0;
+ pg_data_t *pgdat;
+ int isolated;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1540,11 +1542,19 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
spin_unlock(vmf->ptl);
writable = false;
+ pgdat = NODE_DATA(target_nid);
+ isolated = numamigrate_isolate_page(pgdat, page);
+ if (!isolated) {
+ put_page(page);
+ goto isolate_fail;
+ }
+
migrated = migrate_misplaced_page(page, vma, target_nid);
if (migrated) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
} else {
+isolate_fail:
flags |= TNF_MIGRATE_FAIL;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
diff --git a/mm/memory.c b/mm/memory.c
index bee9b1e86ef0..01b1980d4fb7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4779,6 +4779,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
int target_nid;
pte_t pte, old_pte;
int flags = 0;
+ pg_data_t *pgdat;
+ int isolated;
/*
* The "pte" at this point cannot be used safely without
@@ -4849,11 +4851,19 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
writable = false;
+ pgdat = NODE_DATA(target_nid);
+ isolated = numamigrate_isolate_page(pgdat, page);
+ if (!isolated) {
+ put_page(page);
+ goto isolate_fail;
+ }
+
/* Migrate to the requested node */
if (migrate_misplaced_page(page, vma, target_nid)) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
+isolate_fail:
flags |= TNF_MIGRATE_FAIL;
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
vmf->address, &vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index 9cc98fb1d6ec..5eeeb2cda21c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2478,7 +2478,7 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
return __folio_alloc_node(gfp, order, nid);
}
-static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
{
int nr_pages = thp_nr_pages(page);
int order = compound_order(page);
@@ -2523,16 +2523,12 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
int node)
{
pg_data_t *pgdat = NODE_DATA(node);
- int isolated;
+ int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
LIST_HEAD(migratepages);
int nr_pages = thp_nr_pages(page);
- isolated = numamigrate_isolate_page(pgdat, page);
- if (!isolated)
- goto out;
-
list_add(&page->lru, &migratepages);
nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
NULL, node, MIGRATE_ASYNC,
@@ -2544,7 +2540,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
page_is_file_lru(page), -nr_pages);
putback_lru_page(page);
}
- isolated = 0;
+ migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
@@ -2553,11 +2549,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
nr_succeeded);
}
BUG_ON(!list_empty(&migratepages));
- return isolated;
-
-out:
- put_page(page);
- return 0;
+ return migrated;
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
--
2.39.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/4] mm: migrate: change migrate_misplaced_page() to support multiple pages migration
2023-08-19 10:52 [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
2023-08-19 10:52 ` [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep() Baolin Wang
2023-08-19 10:52 ` [PATCH 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page() Baolin Wang
@ 2023-08-19 10:52 ` Baolin Wang
2023-08-19 10:52 ` [PATCH 4/4] mm: migrate: change to return the number of pages migrated successfully Baolin Wang
2023-08-21 2:29 ` [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Huang, Ying
4 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-19 10:52 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Expand the migrate_misplaced_page() function to take a list of pages, so
that multiple pages can be migrated in one call, as a preparation for
supporting batch migration for NUMA balancing.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/migrate.h | 9 +++++----
mm/huge_memory.c | 4 +++-
mm/memory.c | 4 +++-
mm/migrate.c | 26 ++++++++++----------------
4 files changed, 21 insertions(+), 22 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 7c5189043707..2599f95d6c9e 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -142,12 +142,13 @@ const struct movable_operations *page_movable_ops(struct page *page)
}
#ifdef CONFIG_NUMA_BALANCING
-int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
- int node);
+int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct *vma,
+ int source_nid, int target_nid);
int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page);
#else
-static inline int migrate_misplaced_page(struct page *page,
- struct vm_area_struct *vma, int node)
+static inline int migrate_misplaced_page(struct list_head *migratepages,
+ struct vm_area_struct *vma,
+ int source_nid, int target_nid)
{
return -EAGAIN; /* can't migrate now */
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7cc6828ce9e..53a9d63cfb1e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1498,6 +1498,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
int flags = 0;
pg_data_t *pgdat;
int isolated;
+ LIST_HEAD(migratepages);
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1549,7 +1550,8 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
goto isolate_fail;
}
- migrated = migrate_misplaced_page(page, vma, target_nid);
+ list_add(&page->lru, &migratepages);
+ migrated = migrate_misplaced_page(&migratepages, vma, page_nid, target_nid);
if (migrated) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
diff --git a/mm/memory.c b/mm/memory.c
index 01b1980d4fb7..973403a83797 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4781,6 +4781,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
int flags = 0;
pg_data_t *pgdat;
int isolated;
+ LIST_HEAD(migratepages);
/*
* The "pte" at this point cannot be used safely without
@@ -4858,8 +4859,9 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
goto isolate_fail;
}
+ list_add(&page->lru, &migratepages);
/* Migrate to the requested node */
- if (migrate_misplaced_page(page, vma, target_nid)) {
+ if (migrate_misplaced_page(&migratepages, vma, page_nid, target_nid)) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 5eeeb2cda21c..93d359471b95 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2519,36 +2519,30 @@ int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
* node. Caller is expected to have an elevated reference count on
* the page that will be dropped by this function before returning.
*/
-int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
- int node)
+int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct *vma,
+ int source_nid, int target_nid)
{
- pg_data_t *pgdat = NODE_DATA(node);
+ pg_data_t *pgdat = NODE_DATA(target_nid);
int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
- LIST_HEAD(migratepages);
- int nr_pages = thp_nr_pages(page);
- list_add(&page->lru, &migratepages);
- nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
- NULL, node, MIGRATE_ASYNC,
+ nr_remaining = migrate_pages(migratepages, alloc_misplaced_dst_folio,
+ NULL, target_nid, MIGRATE_ASYNC,
MR_NUMA_MISPLACED, &nr_succeeded);
if (nr_remaining) {
- if (!list_empty(&migratepages)) {
- list_del(&page->lru);
- mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
- page_is_file_lru(page), -nr_pages);
- putback_lru_page(page);
- }
+ if (!list_empty(migratepages))
+ putback_movable_pages(migratepages);
+
migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
- if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node))
+ if (!node_is_toptier(source_nid) && node_is_toptier(target_nid))
mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
nr_succeeded);
}
- BUG_ON(!list_empty(&migratepages));
+ BUG_ON(!list_empty(migratepages));
return migrated;
}
#endif /* CONFIG_NUMA_BALANCING */
--
2.39.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 4/4] mm: migrate: change to return the number of pages migrated successfully
2023-08-19 10:52 [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
` (2 preceding siblings ...)
2023-08-19 10:52 ` [PATCH 3/4] mm: migrate: change migrate_misplaced_page() to support multiple pages migration Baolin Wang
@ 2023-08-19 10:52 ` Baolin Wang
2023-08-21 2:29 ` [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Huang, Ying
4 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-19 10:52 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Change migrate_misplaced_page() to return the number of pages migrated
successfully, which can be used to calculate how many pages failed to
migrate for batch migration.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 7 ++++---
mm/memory.c | 5 +++--
mm/migrate.c | 5 +----
3 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 53a9d63cfb1e..a9c454160984 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1494,11 +1494,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
int page_nid = NUMA_NO_NODE;
int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
- bool migrated = false, writable = false;
+ bool writable = false;
int flags = 0;
pg_data_t *pgdat;
int isolated;
LIST_HEAD(migratepages);
+ int nr_succeeded = 0;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1551,8 +1552,8 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
}
list_add(&page->lru, &migratepages);
- migrated = migrate_misplaced_page(&migratepages, vma, page_nid, target_nid);
- if (migrated) {
+ nr_succeeded = migrate_misplaced_page(&migratepages, vma, page_nid, target_nid);
+ if (nr_succeeded) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
} else {
diff --git a/mm/memory.c b/mm/memory.c
index 973403a83797..edfd2d528e7e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4780,7 +4780,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
pte_t pte, old_pte;
int flags = 0;
pg_data_t *pgdat;
- int isolated;
+ int isolated, nr_succeeded;
LIST_HEAD(migratepages);
/*
@@ -4861,7 +4861,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
list_add(&page->lru, &migratepages);
/* Migrate to the requested node */
- if (migrate_misplaced_page(&migratepages, vma, page_nid, target_nid)) {
+ nr_succeeded = migrate_misplaced_page(&migratepages, vma, page_nid, target_nid);
+ if (nr_succeeded) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 93d359471b95..45f92376ba6f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2523,7 +2523,6 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
int source_nid, int target_nid)
{
pg_data_t *pgdat = NODE_DATA(target_nid);
- int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
@@ -2533,8 +2532,6 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
if (nr_remaining) {
if (!list_empty(migratepages))
putback_movable_pages(migratepages);
-
- migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
@@ -2543,7 +2540,7 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
nr_succeeded);
}
BUG_ON(!list_empty(migratepages));
- return migrated;
+ return nr_succeeded;
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
--
2.39.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep()
2023-08-19 10:52 ` [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep() Baolin Wang
@ 2023-08-21 2:20 ` Huang, Ying
2023-08-21 7:52 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2023-08-21 2:20 UTC (permalink / raw)
To: Baolin Wang; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> There are currently 3 places that validate whether a page can be migrated
> or not, and some of these validations are performed after calling
> numa_migrate_prep(), which wastes the CPU cycles spent in numa_migrate_prep()
> for pages that will not be migrated anyway.
>
> Thus we can move all the migration validations into numa_migrate_prep(),
> which is more maintainable and also saves some CPU cycles. Another benefit
> is that it serves as a preparation for supporting batch migration in
> do_numa_page() in the future.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/memory.c | 19 +++++++++++++++++++
> mm/migrate.c | 19 -------------------
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index d003076b218d..bee9b1e86ef0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4747,6 +4747,25 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
> *flags |= TNF_FAULT_LOCAL;
> }
>
> + /*
> + * Don't migrate file pages that are mapped in multiple processes
> + * with execute permissions as they are probably shared libraries.
> + */
> + if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
> + (vma->vm_flags & VM_EXEC))
> + return NUMA_NO_NODE;
> +
> + /*
> + * Also do not migrate dirty pages as not all filesystems can move
> + * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
> + */
> + if (page_is_file_lru(page) && PageDirty(page))
> + return NUMA_NO_NODE;
> +
> + /* Do not migrate THP mapped by multiple processes */
> + if (PageTransHuge(page) && total_mapcount(page) > 1)
> + return NUMA_NO_NODE;
> +
> return mpol_misplaced(page, vma, addr);
In mpol_misplaced()->should_numa_migrate_memory(), the accessing CPU and PID
will be recorded. So the code change above will introduce some behavior
change.
How about moving these checks into a separate function that is called
between numa_migrate_prep() and migrate_misplaced_page(), after unlocking
the PTL?
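For example, roughly something like the following; the helper name
numa_page_can_migrate() is only for illustration:

/* Sketch only; the exact name and placement are up to you. */
static bool numa_page_can_migrate(struct vm_area_struct *vma, struct page *page)
{
        /*
         * Don't migrate file pages that are mapped in multiple processes
         * with execute permissions as they are probably shared libraries.
         */
        if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
            (vma->vm_flags & VM_EXEC))
                return false;

        /*
         * Do not migrate dirty file pages, as not all filesystems can move
         * dirty pages in MIGRATE_ASYNC mode.
         */
        if (page_is_file_lru(page) && PageDirty(page))
                return false;

        /* Do not migrate THP mapped by multiple processes */
        if (PageTransHuge(page) && total_mapcount(page) > 1)
                return false;

        return true;
}

The caller would check it after unlocking the PTL and before calling
migrate_misplaced_page(), dropping the page reference if it returns false.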
--
Best Regards,
Huang, Ying
> }
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e21d5a7e7447..9cc98fb1d6ec 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2485,10 +2485,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>
> VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
>
> - /* Do not migrate THP mapped by multiple processes */
> - if (PageTransHuge(page) && total_mapcount(page) > 1)
> - return 0;
> -
> /* Avoid migrating to a node that is nearly full */
> if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
> int z;
> @@ -2533,21 +2529,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> LIST_HEAD(migratepages);
> int nr_pages = thp_nr_pages(page);
>
> - /*
> - * Don't migrate file pages that are mapped in multiple processes
> - * with execute permissions as they are probably shared libraries.
> - */
> - if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
> - (vma->vm_flags & VM_EXEC))
> - goto out;
> -
> - /*
> - * Also do not migrate dirty pages as not all filesystems can move
> - * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
> - */
> - if (page_is_file_lru(page) && PageDirty(page))
> - goto out;
> -
> isolated = numamigrate_isolate_page(pgdat, page);
> if (!isolated)
> goto out;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-19 10:52 [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
` (3 preceding siblings ...)
2023-08-19 10:52 ` [PATCH 4/4] mm: migrate: change to return the number of pages migrated successfully Baolin Wang
@ 2023-08-21 2:29 ` Huang, Ying
2023-08-21 8:10 ` Baolin Wang
4 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2023-08-21 2:29 UTC (permalink / raw)
To: Baolin Wang; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> Hi,
>
> Currently, on our ARM servers with NUMA enabled, we found that the cross-die
> latency is a little higher and can significantly impact the workload's
> performance. So on ARM servers we rely on NUMA balancing to avoid cross-die
> accesses, and I posted a patchset[1] to support speculative numa fault to
> improve NUMA balancing's performance according to the principle of data
> locality. Moreover, Huang Ying's patchset[2] introduced batch migration as a
> way to reduce the cost of TLB flushes, which will also benefit migrating
> multiple pages all at once during NUMA balancing.
>
> So we plan to further support batch migration in do_numa_page() to improve
> NUMA balancing's performance. But before adding a complicated batch migration
> algorithm for NUMA balancing, some cleanup and preparation work needs to be
> done first, which is what this patch set does. In short, this patchset extends
> the migrate_misplaced_page() interface to support batch migration, with no
> functional changes intended.
Will these cleanups benefit anything except batch migration? If not,
I suggest you post the whole series. In this way, people will be
clearer about why we need these cleanups.
--
Best Regards,
Huang, Ying
> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
>
> Baolin Wang (4):
> mm: migrate: move migration validation into numa_migrate_prep()
> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
> mm: migrate: change migrate_misplaced_page() to support multiple pages
> migration
> mm: migrate: change to return the number of pages migrated
> successfully
>
> include/linux/migrate.h | 15 ++++++++---
> mm/huge_memory.c | 19 +++++++++++---
> mm/memory.c | 34 +++++++++++++++++++++++-
> mm/migrate.c | 58 ++++++++---------------------------------
> 4 files changed, 71 insertions(+), 55 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] mm: migrate: move migration validation into numa_migrate_prep()
2023-08-21 2:20 ` Huang, Ying
@ 2023-08-21 7:52 ` Baolin Wang
0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-21 7:52 UTC (permalink / raw)
To: Huang, Ying; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
On 8/21/2023 10:20 AM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> There are currently 3 places that validate whether a page can be migrated
>> or not, and some of these validations are performed after calling
>> numa_migrate_prep(), which wastes the CPU cycles spent in numa_migrate_prep()
>> for pages that will not be migrated anyway.
>>
>> Thus we can move all the migration validations into numa_migrate_prep(),
>> which is more maintainable and also saves some CPU cycles. Another benefit
>> is that it serves as a preparation for supporting batch migration in
>> do_numa_page() in the future.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> mm/memory.c | 19 +++++++++++++++++++
>> mm/migrate.c | 19 -------------------
>> 2 files changed, 19 insertions(+), 19 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index d003076b218d..bee9b1e86ef0 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4747,6 +4747,25 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
>> *flags |= TNF_FAULT_LOCAL;
>> }
>>
>> + /*
>> + * Don't migrate file pages that are mapped in multiple processes
>> + * with execute permissions as they are probably shared libraries.
>> + */
>> + if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
>> + (vma->vm_flags & VM_EXEC))
>> + return NUMA_NO_NODE;
>> +
>> + /*
>> + * Also do not migrate dirty pages as not all filesystems can move
>> + * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
>> + */
>> + if (page_is_file_lru(page) && PageDirty(page))
>> + return NUMA_NO_NODE;
>> +
>> + /* Do not migrate THP mapped by multiple processes */
>> + if (PageTransHuge(page) && total_mapcount(page) > 1)
>> + return NUMA_NO_NODE;
>> +
>> return mpol_misplaced(page, vma, addr);
>
> In mpol_misplaced()->should_numa_migrate_memory(), the accessing CPU and PID
> will be recorded. So the code change above will introduce some behavior
> change.
Indeed.
>
> How about moving these checks into a separate function that is called
> between numa_migrate_prep() and migrate_misplaced_page(), after unlocking
> the PTL?
Sounds reasonable to me. Thanks for your input.
>
> --
> Best Regards,
> Huang, Ying
>
>> }
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e21d5a7e7447..9cc98fb1d6ec 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2485,10 +2485,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>>
>> VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
>>
>> - /* Do not migrate THP mapped by multiple processes */
>> - if (PageTransHuge(page) && total_mapcount(page) > 1)
>> - return 0;
>> -
>> /* Avoid migrating to a node that is nearly full */
>> if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
>> int z;
>> @@ -2533,21 +2529,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>> LIST_HEAD(migratepages);
>> int nr_pages = thp_nr_pages(page);
>>
>> - /*
>> - * Don't migrate file pages that are mapped in multiple processes
>> - * with execute permissions as they are probably shared libraries.
>> - */
>> - if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
>> - (vma->vm_flags & VM_EXEC))
>> - goto out;
>> -
>> - /*
>> - * Also do not migrate dirty pages as not all filesystems can move
>> - * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
>> - */
>> - if (page_is_file_lru(page) && PageDirty(page))
>> - goto out;
>> -
>> isolated = numamigrate_isolate_page(pgdat, page);
>> if (!isolated)
>> goto out;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-21 2:29 ` [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration Huang, Ying
@ 2023-08-21 8:10 ` Baolin Wang
2023-08-21 8:41 ` Huang, Ying
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2023-08-21 8:10 UTC (permalink / raw)
To: Huang, Ying; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
On 8/21/2023 10:29 AM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> Hi,
>>
>> Currently, on our ARM servers with NUMA enabled, we found that the cross-die
>> latency is a little higher and can significantly impact the workload's
>> performance. So on ARM servers we rely on NUMA balancing to avoid cross-die
>> accesses, and I posted a patchset[1] to support speculative numa fault to
>> improve NUMA balancing's performance according to the principle of data
>> locality. Moreover, Huang Ying's patchset[2] introduced batch migration as a
>> way to reduce the cost of TLB flushes, which will also benefit migrating
>> multiple pages all at once during NUMA balancing.
>>
>> So we plan to further support batch migration in do_numa_page() to improve
>> NUMA balancing's performance. But before adding a complicated batch migration
>> algorithm for NUMA balancing, some cleanup and preparation work needs to be
>> done first, which is what this patch set does. In short, this patchset extends
>> the migrate_misplaced_page() interface to support batch migration, with no
>> functional changes intended.
>
> Will these cleanups benefit anything except batch migration? If not,
I hope these cleanups can also benefit the compound page's NUMA
balancing, which was discussed in the thread[1]. IIUC, for compound page
NUMA balancing it is possible that only some of the pages are migrated
successfully, so it is necessary for migrate_misplaced_page() to return
the number of pages that were migrated successfully. (But I have not
looked into this in detail yet; please correct me if I missed something,
and I will find some time to do so.) That is why I think these cleanups
are useful beyond batch migration.
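Just to illustrate what I mean, a (hypothetical) caller handling a compound
page could then account for partial success along the lines of the fragment
below; nr_failed is only a placeholder for whatever accounting is needed:

        nr_succeeded = migrate_misplaced_page(&migratepages, vma,
                                              page_nid, target_nid);
        if (nr_succeeded) {
                flags |= TNF_MIGRATED;
                /* Only some of the pages may have moved; account the rest */
                nr_failed = nr_pages - nr_succeeded;
        } else {
                flags |= TNF_MIGRATE_FAIL;
        }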
Yes, I will post the batch migration patches after more polish and
testing, but I think these cleanups are separate and straightforward, so
I plan to submit the patches separately.
[1]
https://lore.kernel.org/all/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
> I suggest you post the whole series. In this way, people will be
> clearer about why we need these cleanups.
>
> --
> Best Regards,
> Huang, Ying
>
>> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
>> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
>>
>> Baolin Wang (4):
>> mm: migrate: move migration validation into numa_migrate_prep()
>> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
>> mm: migrate: change migrate_misplaced_page() to support multiple pages
>> migration
>> mm: migrate: change to return the number of pages migrated
>> successfully
>>
>> include/linux/migrate.h | 15 ++++++++---
>> mm/huge_memory.c | 19 +++++++++++---
>> mm/memory.c | 34 +++++++++++++++++++++++-
>> mm/migrate.c | 58 ++++++++---------------------------------
>> 4 files changed, 71 insertions(+), 55 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-21 8:10 ` Baolin Wang
@ 2023-08-21 8:41 ` Huang, Ying
2023-08-21 8:50 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2023-08-21 8:41 UTC (permalink / raw)
To: Baolin Wang; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> On 8/21/2023 10:29 AM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>
>>> Hi,
>>>
>>> Currently, on our ARM servers with NUMA enabled, we found that the cross-die
>>> latency is a little higher and can significantly impact the workload's
>>> performance. So on ARM servers we rely on NUMA balancing to avoid cross-die
>>> accesses, and I posted a patchset[1] to support speculative numa fault to
>>> improve NUMA balancing's performance according to the principle of data
>>> locality. Moreover, Huang Ying's patchset[2] introduced batch migration as a
>>> way to reduce the cost of TLB flushes, which will also benefit migrating
>>> multiple pages all at once during NUMA balancing.
>>>
>>> So we plan to further support batch migration in do_numa_page() to improve
>>> NUMA balancing's performance. But before adding a complicated batch migration
>>> algorithm for NUMA balancing, some cleanup and preparation work needs to be
>>> done first, which is what this patch set does. In short, this patchset extends
>>> the migrate_misplaced_page() interface to support batch migration, with no
>>> functional changes intended.
>> Will these cleanups benefit anything except batch migration? If
>> not,
>
> I hope these cleanups can also benefit the compound page's NUMA
> balancing, which was discussed in the thread[1]. IIUC, for compound page
> NUMA balancing it is possible that only some of the pages are migrated
> successfully, so it is necessary for migrate_misplaced_page() to return
> the number of pages that were migrated successfully. (But I have not
> looked into this in detail yet; please correct me if I missed something,
> and I will find some time to do so.) That is why I think these cleanups
> are useful beyond batch migration.
>
> Yes, I will post the batch migration patches after more polish and
> testing, but I think these cleanups are separate and straightforward,
> so I plan to submit the patches separately.
Then, please state the benefit explicitly in the patch description
instead of just the preparation for batch migration.
--
Best Regards,
Huang, Ying
> [1]
> https://lore.kernel.org/all/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
>> I suggest you post the whole series. In this way, people will be
>> clearer about why we need these cleanups.
>> --
>> Best Regards,
>> Huang, Ying
>>
>>> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
>>> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
>>>
>>> Baolin Wang (4):
>>> mm: migrate: move migration validation into numa_migrate_prep()
>>> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
>>> mm: migrate: change migrate_misplaced_page() to support multiple pages
>>> migration
>>> mm: migrate: change to return the number of pages migrated
>>> successfully
>>>
>>> include/linux/migrate.h | 15 ++++++++---
>>> mm/huge_memory.c | 19 +++++++++++---
>>> mm/memory.c | 34 +++++++++++++++++++++++-
>>> mm/migrate.c | 58 ++++++++---------------------------------
>>> 4 files changed, 71 insertions(+), 55 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-21 8:41 ` Huang, Ying
@ 2023-08-21 8:50 ` Baolin Wang
0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-21 8:50 UTC (permalink / raw)
To: Huang, Ying; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
On 8/21/2023 4:41 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> On 8/21/2023 10:29 AM, Huang, Ying wrote:
>>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>>
>>>> Hi,
>>>>
>>>> Currently, on our ARM servers with NUMA enabled, we found that the cross-die
>>>> latency is a little higher and can significantly impact the workload's
>>>> performance. So on ARM servers we rely on NUMA balancing to avoid cross-die
>>>> accesses, and I posted a patchset[1] to support speculative numa fault to
>>>> improve NUMA balancing's performance according to the principle of data
>>>> locality. Moreover, Huang Ying's patchset[2] introduced batch migration as a
>>>> way to reduce the cost of TLB flushes, which will also benefit migrating
>>>> multiple pages all at once during NUMA balancing.
>>>>
>>>> So we plan to further support batch migration in do_numa_page() to improve
>>>> NUMA balancing's performance. But before adding a complicated batch migration
>>>> algorithm for NUMA balancing, some cleanup and preparation work needs to be
>>>> done first, which is what this patch set does. In short, this patchset extends
>>>> the migrate_misplaced_page() interface to support batch migration, with no
>>>> functional changes intended.
>>> Will these cleanups benefit anything except batch migration? If
>>> not,
>>
>> I hope these cleanups can also benefit the compound page's NUMA
>> balancing, which was discussed in the thread[1]. IIUC, for compound page
>> NUMA balancing it is possible that only some of the pages are migrated
>> successfully, so it is necessary for migrate_misplaced_page() to return
>> the number of pages that were migrated successfully. (But I have not
>> looked into this in detail yet; please correct me if I missed something,
>> and I will find some time to do so.) That is why I think these cleanups
>> are useful beyond batch migration.
>>
>> Yes, I will post the batch migration patches after more polish and
>> testing, but I think these cleanups are separate and straightforward,
>> so I plan to submit the patches separately.
>
> Then, please state the benefit explicitly in the patch description
> instead of just the preparation for batch migration.
Sure, will do. Thanks.
^ permalink raw reply [flat|nested] 11+ messages in thread