* [PATCH v2 1/4] mm: migrate: factor out migration validation into numa_page_can_migrate()
2023-08-22 0:53 [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
@ 2023-08-22 0:53 ` Baolin Wang
2023-08-22 0:53 ` [PATCH v2 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page() Baolin Wang
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-22 0:53 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
There are now several places that validate whether a page can be migrated
or not, so factor out these validations into a new function to make them
more maintainable.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 6 ++++++
mm/internal.h | 1 +
mm/memory.c | 30 ++++++++++++++++++++++++++++++
mm/migrate.c | 19 -------------------
4 files changed, 37 insertions(+), 19 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4465915711c3..4a9b34a89854 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1540,11 +1540,17 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
spin_unlock(vmf->ptl);
writable = false;
+ if (!numa_page_can_migrate(vma, page)) {
+ put_page(page);
+ goto migrate_fail;
+ }
+
migrated = migrate_misplaced_page(page, vma, target_nid);
if (migrated) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
} else {
+migrate_fail:
flags |= TNF_MIGRATE_FAIL;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
diff --git a/mm/internal.h b/mm/internal.h
index f59a53111817..1e00b8a30910 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -933,6 +933,7 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end);
int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
unsigned long addr, int page_nid, int *flags);
+bool numa_page_can_migrate(struct vm_area_struct *vma, struct page *page);
void free_zone_device_page(struct page *page);
int migrate_device_coherent_page(struct page *page);
diff --git a/mm/memory.c b/mm/memory.c
index 12647d139a13..fc6f6b7a70e1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4735,6 +4735,30 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
return mpol_misplaced(page, vma, addr);
}
+bool numa_page_can_migrate(struct vm_area_struct *vma, struct page *page)
+{
+ /*
+ * Don't migrate file pages that are mapped in multiple processes
+ * with execute permissions as they are probably shared libraries.
+ */
+ if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
+ (vma->vm_flags & VM_EXEC))
+ return false;
+
+ /*
+ * Also do not migrate dirty pages as not all filesystems can move
+ * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
+ */
+ if (page_is_file_lru(page) && PageDirty(page))
+ return false;
+
+ /* Do not migrate THP mapped by multiple processes */
+ if (PageTransHuge(page) && total_mapcount(page) > 1)
+ return false;
+
+ return true;
+}
+
static vm_fault_t do_numa_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
@@ -4815,11 +4839,17 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
writable = false;
+ if (!numa_page_can_migrate(vma, page)) {
+ put_page(page);
+ goto migrate_fail;
+ }
+
/* Migrate to the requested node */
if (migrate_misplaced_page(page, vma, target_nid)) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
+migrate_fail:
flags |= TNF_MIGRATE_FAIL;
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
vmf->address, &vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index e21d5a7e7447..9cc98fb1d6ec 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2485,10 +2485,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
- /* Do not migrate THP mapped by multiple processes */
- if (PageTransHuge(page) && total_mapcount(page) > 1)
- return 0;
-
/* Avoid migrating to a node that is nearly full */
if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
int z;
@@ -2533,21 +2529,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
LIST_HEAD(migratepages);
int nr_pages = thp_nr_pages(page);
- /*
- * Don't migrate file pages that are mapped in multiple processes
- * with execute permissions as they are probably shared libraries.
- */
- if (page_mapcount(page) != 1 && page_is_file_lru(page) &&
- (vma->vm_flags & VM_EXEC))
- goto out;
-
- /*
- * Also do not migrate dirty pages as not all filesystems can move
- * dirty pages in MIGRATE_ASYNC mode which is a waste of cycles.
- */
- if (page_is_file_lru(page) && PageDirty(page))
- goto out;
-
isolated = numamigrate_isolate_page(pgdat, page);
if (!isolated)
goto out;
--
2.39.3
* [PATCH v2 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
2023-08-22 0:53 [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
2023-08-22 0:53 ` [PATCH v2 1/4] mm: migrate: factor out migration validation into numa_page_can_migrate() Baolin Wang
@ 2023-08-22 0:53 ` Baolin Wang
2023-08-22 9:02 ` Bharata B Rao
2023-08-22 0:53 ` [PATCH v2 3/4] mm: migrate: change migrate_misplaced_page() to support multiple pages migration Baolin Wang
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2023-08-22 0:53 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Move numamigrate_isolate_page() into do_numa_page() to simplify
migrate_misplaced_page(), which now focuses only on page migration, and it
also serves as a preparation for supporting batch migration in
migrate_misplaced_page().
While we are at it, change numamigrate_isolate_page() to return a boolean
to make the return value clearer.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/migrate.h | 6 ++++++
mm/huge_memory.c | 7 +++++++
mm/memory.c | 7 +++++++
mm/migrate.c | 22 +++++++---------------
4 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 711dd9412561..ddcd62ec2c12 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -144,12 +144,18 @@ const struct movable_operations *page_movable_ops(struct page *page)
#ifdef CONFIG_NUMA_BALANCING
int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
int node);
+bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page);
#else
static inline int migrate_misplaced_page(struct page *page,
struct vm_area_struct *vma, int node)
{
return -EAGAIN; /* can't migrate now */
}
+
+static inline bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+{
+ return false;
+}
#endif /* CONFIG_NUMA_BALANCING */
#ifdef CONFIG_MIGRATION
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4a9b34a89854..07149ead11e4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1496,6 +1496,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
bool migrated = false, writable = false;
int flags = 0;
+ pg_data_t *pgdat;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1545,6 +1546,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
goto migrate_fail;
}
+ pgdat = NODE_DATA(target_nid);
+ if (!numamigrate_isolate_page(pgdat, page)) {
+ put_page(page);
+ goto migrate_fail;
+ }
+
migrated = migrate_misplaced_page(page, vma, target_nid);
if (migrated) {
flags |= TNF_MIGRATED;
diff --git a/mm/memory.c b/mm/memory.c
index fc6f6b7a70e1..4e451b041488 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4769,6 +4769,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
int target_nid;
pte_t pte, old_pte;
int flags = 0;
+ pg_data_t *pgdat;
/*
* The "pte" at this point cannot be used safely without
@@ -4844,6 +4845,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
goto migrate_fail;
}
+ pgdat = NODE_DATA(target_nid);
+ if (!numamigrate_isolate_page(pgdat, page)) {
+ put_page(page);
+ goto migrate_fail;
+ }
+
/* Migrate to the requested node */
if (migrate_misplaced_page(page, vma, target_nid)) {
page_nid = target_nid;
diff --git a/mm/migrate.c b/mm/migrate.c
index 9cc98fb1d6ec..0b2b69a2a7ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2478,7 +2478,7 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
return __folio_alloc_node(gfp, order, nid);
}
-static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
{
int nr_pages = thp_nr_pages(page);
int order = compound_order(page);
@@ -2496,11 +2496,11 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
break;
}
wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
- return 0;
+ return false;
}
if (!isolate_lru_page(page))
- return 0;
+ return false;
mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
nr_pages);
@@ -2511,7 +2511,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
* disappearing underneath us during migration.
*/
put_page(page);
- return 1;
+ return true;
}
/*
@@ -2523,16 +2523,12 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
int node)
{
pg_data_t *pgdat = NODE_DATA(node);
- int isolated;
+ int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
LIST_HEAD(migratepages);
int nr_pages = thp_nr_pages(page);
- isolated = numamigrate_isolate_page(pgdat, page);
- if (!isolated)
- goto out;
-
list_add(&page->lru, &migratepages);
nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
NULL, node, MIGRATE_ASYNC,
@@ -2544,7 +2540,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
page_is_file_lru(page), -nr_pages);
putback_lru_page(page);
}
- isolated = 0;
+ migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
@@ -2553,11 +2549,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
nr_succeeded);
}
BUG_ON(!list_empty(&migratepages));
- return isolated;
-
-out:
- put_page(page);
- return 0;
+ return migrated;
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
--
2.39.3
* Re: [PATCH v2 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
2023-08-22 0:53 ` [PATCH v2 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page() Baolin Wang
@ 2023-08-22 9:02 ` Bharata B Rao
2023-08-24 3:14 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: Bharata B Rao @ 2023-08-22 9:02 UTC (permalink / raw)
To: Baolin Wang, akpm
Cc: mgorman, shy828301, david, ying.huang, linux-mm, linux-kernel
On 22-Aug-23 6:23 AM, Baolin Wang wrote:
> Move the numamigrate_isolate_page() into do_numa_page() to simplify the
> migrate_misplaced_page(), which now only focuses on page migration, and
> it also serves as a preparation for supporting batch migration for
> migrate_misplaced_page().
>
> While we are at it, change the numamigrate_isolate_page() to boolean
> type to make the return value more clear.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/migrate.h | 6 ++++++
> mm/huge_memory.c | 7 +++++++
> mm/memory.c | 7 +++++++
> mm/migrate.c | 22 +++++++---------------
> 4 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 711dd9412561..ddcd62ec2c12 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -144,12 +144,18 @@ const struct movable_operations *page_movable_ops(struct page *page)
> #ifdef CONFIG_NUMA_BALANCING
> int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> int node);
> +bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page);
> #else
> static inline int migrate_misplaced_page(struct page *page,
> struct vm_area_struct *vma, int node)
> {
> return -EAGAIN; /* can't migrate now */
> }
> +
> +static inline bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> +{
> + return false;
> +}
> #endif /* CONFIG_NUMA_BALANCING */
>
> #ifdef CONFIG_MIGRATION
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4a9b34a89854..07149ead11e4 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1496,6 +1496,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
> bool migrated = false, writable = false;
> int flags = 0;
> + pg_data_t *pgdat;
>
> vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
> @@ -1545,6 +1546,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> goto migrate_fail;
> }
>
> + pgdat = NODE_DATA(target_nid);
> + if (!numamigrate_isolate_page(pgdat, page)) {
> + put_page(page);
> + goto migrate_fail;
> + }
> +
> migrated = migrate_misplaced_page(page, vma, target_nid);
> if (migrated) {
> flags |= TNF_MIGRATED;
> diff --git a/mm/memory.c b/mm/memory.c
> index fc6f6b7a70e1..4e451b041488 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4769,6 +4769,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> int target_nid;
> pte_t pte, old_pte;
> int flags = 0;
> + pg_data_t *pgdat;
>
> /*
> * The "pte" at this point cannot be used safely without
> @@ -4844,6 +4845,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> goto migrate_fail;
> }
>
> + pgdat = NODE_DATA(target_nid);
> + if (!numamigrate_isolate_page(pgdat, page)) {
> + put_page(page);
> + goto migrate_fail;
> + }
> +
> /* Migrate to the requested node */
> if (migrate_misplaced_page(page, vma, target_nid)) {
> page_nid = target_nid;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 9cc98fb1d6ec..0b2b69a2a7ab 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2478,7 +2478,7 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
> return __folio_alloc_node(gfp, order, nid);
> }
>
> -static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> +bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> {
> int nr_pages = thp_nr_pages(page);
> int order = compound_order(page);
> @@ -2496,11 +2496,11 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> break;
> }
There is another s/return 0/return false/ change required here for this chunk:
	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
		return 0;
> wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
> - return 0;
> + return false;
> }
Looks like the whole section under the "Avoid migrating to a node that is
nearly full" check could be moved to numa_page_can_migrate(), as it can be
considered one more check (or action) to see whether the page can be migrated
or not. After that, numamigrate_isolate_page() will truly be about isolating
the page, for example as sketched below.
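An untested sketch of what that could look like, assuming
migrate_balanced_pgdat() is made non-static and the target nid is passed in
(the extra parameter and placement are just for illustration):

bool numa_page_can_migrate(struct vm_area_struct *vma, struct page *page,
			   int target_nid)
{
	pg_data_t *pgdat = NODE_DATA(target_nid);
	int nr_pages = thp_nr_pages(page);
	int order = compound_order(page);

	/* ... the existing exec/dirty/THP mapcount checks ... */

	/* Avoid migrating to a node that is nearly full */
	if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
		int z;

		if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
			return false;

		for (z = pgdat->nr_zones - 1; z >= 0; z--) {
			if (managed_zone(pgdat->node_zones + z))
				break;
		}
		wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
		return false;
	}

	return true;
}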
Regards,
Bharata.
* Re: [PATCH v2 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
2023-08-22 9:02 ` Bharata B Rao
@ 2023-08-24 3:14 ` Baolin Wang
0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-24 3:14 UTC (permalink / raw)
To: Bharata B Rao, akpm
Cc: mgorman, shy828301, david, ying.huang, linux-mm, linux-kernel
On 8/22/2023 5:02 PM, Bharata B Rao wrote:
> On 22-Aug-23 6:23 AM, Baolin Wang wrote:
>> Move the numamigrate_isolate_page() into do_numa_page() to simplify the
>> migrate_misplaced_page(), which now only focuses on page migration, and
>> it also serves as a preparation for supporting batch migration for
>> migrate_misplaced_page().
>>
>> While we are at it, change the numamigrate_isolate_page() to boolean
>> type to make the return value more clear.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/migrate.h | 6 ++++++
>> mm/huge_memory.c | 7 +++++++
>> mm/memory.c | 7 +++++++
>> mm/migrate.c | 22 +++++++---------------
>> 4 files changed, 27 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index 711dd9412561..ddcd62ec2c12 100644
>> --- a/include/linux/migrate.h
>> +++ b/include/linux/migrate.h
>> @@ -144,12 +144,18 @@ const struct movable_operations *page_movable_ops(struct page *page)
>> #ifdef CONFIG_NUMA_BALANCING
>> int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>> int node);
>> +bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page);
>> #else
>> static inline int migrate_misplaced_page(struct page *page,
>> struct vm_area_struct *vma, int node)
>> {
>> return -EAGAIN; /* can't migrate now */
>> }
>> +
>> +static inline bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>> +{
>> + return false;
>> +}
>> #endif /* CONFIG_NUMA_BALANCING */
>>
>> #ifdef CONFIG_MIGRATION
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 4a9b34a89854..07149ead11e4 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1496,6 +1496,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>> int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
>> bool migrated = false, writable = false;
>> int flags = 0;
>> + pg_data_t *pgdat;
>>
>> vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
>> @@ -1545,6 +1546,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>> goto migrate_fail;
>> }
>>
>> + pgdat = NODE_DATA(target_nid);
>> + if (!numamigrate_isolate_page(pgdat, page)) {
>> + put_page(page);
>> + goto migrate_fail;
>> + }
>> +
>> migrated = migrate_misplaced_page(page, vma, target_nid);
>> if (migrated) {
>> flags |= TNF_MIGRATED;
>> diff --git a/mm/memory.c b/mm/memory.c
>> index fc6f6b7a70e1..4e451b041488 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4769,6 +4769,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>> int target_nid;
>> pte_t pte, old_pte;
>> int flags = 0;
>> + pg_data_t *pgdat;
>>
>> /*
>> * The "pte" at this point cannot be used safely without
>> @@ -4844,6 +4845,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>> goto migrate_fail;
>> }
>>
>> + pgdat = NODE_DATA(target_nid);
>> + if (!numamigrate_isolate_page(pgdat, page)) {
>> + put_page(page);
>> + goto migrate_fail;
>> + }
>> +
>> /* Migrate to the requested node */
>> if (migrate_misplaced_page(page, vma, target_nid)) {
>> page_nid = target_nid;
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 9cc98fb1d6ec..0b2b69a2a7ab 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2478,7 +2478,7 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
>> return __folio_alloc_node(gfp, order, nid);
>> }
>>
>> -static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>> +bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>> {
>> int nr_pages = thp_nr_pages(page);
>> int order = compound_order(page);
>> @@ -2496,11 +2496,11 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
>> break;
>> }
>
> There is an other s/return 0/return false/ changed required here for this chunk:
>
> if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
> return 0;
Good catch.
>
>> wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
>> - return 0;
>> + return false;
>> }
>
> Looks like this whole section under "Avoiding migrating to a node that is nearly full"
> check could be moved to numa_page_can_migrate() as that can be considered as one more
> check (or action to) see if the page can be migrated or not. After that numamigrate_isolate_page()
> will truly be about isolating the page.
Good idea. Will do. Thanks.
* [PATCH v2 3/4] mm: migrate: change migrate_misplaced_page() to support multiple pages migration
2023-08-22 0:53 [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
2023-08-22 0:53 ` [PATCH v2 1/4] mm: migrate: factor out migration validation into numa_page_can_migrate() Baolin Wang
2023-08-22 0:53 ` [PATCH v2 2/4] mm: migrate: move the numamigrate_isolate_page() into do_numa_page() Baolin Wang
@ 2023-08-22 0:53 ` Baolin Wang
2023-08-22 0:53 ` [PATCH v2 4/4] mm: migrate: change to return the number of pages migrated successfully Baolin Wang
2023-08-22 2:47 ` [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Huang, Ying
4 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-22 0:53 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Expand the migrate_misplaced_page() function to take a list of pages, so that
it can migrate multiple pages at once, as a preparation for supporting batch
migration for NUMA balancing as well as compound page NUMA balancing in the
future.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/migrate.h | 9 +++++----
mm/huge_memory.c | 5 ++++-
mm/memory.c | 4 +++-
mm/migrate.c | 26 ++++++++++----------------
4 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ddcd62ec2c12..87edce8e939d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -142,12 +142,13 @@ const struct movable_operations *page_movable_ops(struct page *page)
}
#ifdef CONFIG_NUMA_BALANCING
-int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
- int node);
+int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct *vma,
+ int source_nid, int target_nid);
bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page);
#else
-static inline int migrate_misplaced_page(struct page *page,
- struct vm_area_struct *vma, int node)
+static inline int migrate_misplaced_page(struct list_head *migratepages,
+ struct vm_area_struct *vma,
+ int source_nid, int target_nid)
{
return -EAGAIN; /* can't migrate now */
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 07149ead11e4..4401a3493544 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1497,6 +1497,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
bool migrated = false, writable = false;
int flags = 0;
pg_data_t *pgdat;
+ LIST_HEAD(migratepages);
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1552,7 +1553,9 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
goto migrate_fail;
}
- migrated = migrate_misplaced_page(page, vma, target_nid);
+ list_add(&page->lru, &migratepages);
+ migrated = migrate_misplaced_page(&migratepages, vma,
+ page_nid, target_nid);
if (migrated) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
diff --git a/mm/memory.c b/mm/memory.c
index 4e451b041488..9e417e8dd5d5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4770,6 +4770,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
pte_t pte, old_pte;
int flags = 0;
pg_data_t *pgdat;
+ LIST_HEAD(migratepages);
/*
* The "pte" at this point cannot be used safely without
@@ -4851,8 +4852,9 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
goto migrate_fail;
}
+ list_add(&page->lru, &migratepages);
/* Migrate to the requested node */
- if (migrate_misplaced_page(page, vma, target_nid)) {
+ if (migrate_misplaced_page(&migratepages, vma, page_nid, target_nid)) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 0b2b69a2a7ab..fae7224b8e64 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2519,36 +2519,30 @@ bool numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
* node. Caller is expected to have an elevated reference count on
* the page that will be dropped by this function before returning.
*/
-int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
- int node)
+int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct *vma,
+ int source_nid, int target_nid)
{
- pg_data_t *pgdat = NODE_DATA(node);
+ pg_data_t *pgdat = NODE_DATA(target_nid);
int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
- LIST_HEAD(migratepages);
- int nr_pages = thp_nr_pages(page);
- list_add(&page->lru, &migratepages);
- nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
- NULL, node, MIGRATE_ASYNC,
+ nr_remaining = migrate_pages(migratepages, alloc_misplaced_dst_folio,
+ NULL, target_nid, MIGRATE_ASYNC,
MR_NUMA_MISPLACED, &nr_succeeded);
if (nr_remaining) {
- if (!list_empty(&migratepages)) {
- list_del(&page->lru);
- mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
- page_is_file_lru(page), -nr_pages);
- putback_lru_page(page);
- }
+ if (!list_empty(migratepages))
+ putback_movable_pages(migratepages);
+
migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
- if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node))
+ if (!node_is_toptier(source_nid) && node_is_toptier(target_nid))
mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
nr_succeeded);
}
- BUG_ON(!list_empty(&migratepages));
+ BUG_ON(!list_empty(migratepages));
return migrated;
}
#endif /* CONFIG_NUMA_BALANCING */
--
2.39.3
* [PATCH v2 4/4] mm: migrate: change to return the number of pages migrated successfully
2023-08-22 0:53 [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
` (2 preceding siblings ...)
2023-08-22 0:53 ` [PATCH v2 3/4] mm: migrate: change migrate_misplaced_page() to support multiple pages migration Baolin Wang
@ 2023-08-22 0:53 ` Baolin Wang
2023-08-22 2:47 ` [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Huang, Ying
4 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-22 0:53 UTC (permalink / raw)
To: akpm
Cc: mgorman, shy828301, david, ying.huang, baolin.wang, linux-mm,
linux-kernel
Change migrate_misplaced_page() to return the number of pages migrated
successfully, which will be used to calculate how many pages failed to
migrate for batch migration. For compound page NUMA balancing support, it is
possible that only some of the pages were successfully migrated, so it is
necessary for migrate_misplaced_page() to return the number of pages that
were successfully migrated.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 9 +++++----
mm/memory.c | 4 +++-
mm/migrate.c | 5 +----
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4401a3493544..951f73d6b5bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1494,10 +1494,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
int page_nid = NUMA_NO_NODE;
int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
- bool migrated = false, writable = false;
+ bool writable = false;
int flags = 0;
pg_data_t *pgdat;
LIST_HEAD(migratepages);
+ int nr_succeeded;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1554,9 +1555,9 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
}
list_add(&page->lru, &migratepages);
- migrated = migrate_misplaced_page(&migratepages, vma,
- page_nid, target_nid);
- if (migrated) {
+ nr_succeeded = migrate_misplaced_page(&migratepages, vma,
+ page_nid, target_nid);
+ if (nr_succeeded) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
} else {
diff --git a/mm/memory.c b/mm/memory.c
index 9e417e8dd5d5..2773cd804ee9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4771,6 +4771,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
int flags = 0;
pg_data_t *pgdat;
LIST_HEAD(migratepages);
+ int nr_succeeded;
/*
* The "pte" at this point cannot be used safely without
@@ -4854,7 +4855,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
list_add(&page->lru, &migratepages);
/* Migrate to the requested node */
- if (migrate_misplaced_page(&migratepages, vma, page_nid, target_nid)) {
+ nr_succeeded = migrate_misplaced_page(&migratepages, vma, page_nid, target_nid);
+ if (nr_succeeded) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index fae7224b8e64..5435cfb225ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2523,7 +2523,6 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
int source_nid, int target_nid)
{
pg_data_t *pgdat = NODE_DATA(target_nid);
- int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
@@ -2533,8 +2532,6 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
if (nr_remaining) {
if (!list_empty(migratepages))
putback_movable_pages(migratepages);
-
- migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
@@ -2543,7 +2540,7 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
nr_succeeded);
}
BUG_ON(!list_empty(migratepages));
- return migrated;
+ return nr_succeeded;
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
--
2.39.3
* Re: [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-22 0:53 [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Baolin Wang
` (3 preceding siblings ...)
2023-08-22 0:53 ` [PATCH v2 4/4] mm: migrate: change to return the number of pages migrated successfully Baolin Wang
@ 2023-08-22 2:47 ` Huang, Ying
2023-08-24 3:13 ` Baolin Wang
4 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2023-08-22 2:47 UTC (permalink / raw)
To: Baolin Wang; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> Hi,
>
> Currently, on our ARM servers with NUMA enabled, we found the cross-die latency
> is a little larger that will significantly impact the workload's performance.
> So on ARM servers we will rely on the NUMA balancing to avoid the cross-die
> accessing. And I posted a patchset[1] to support speculative numa fault to
> improve the NUMA balancing's performance according to the principle of data
> locality. Moreover, thanks to Huang Ying's patchset[2], which introduced batch
> migration as a way to reduce the cost of TLB flush, and it will also benefit
> the migration of multiple pages all at once during NUMA balancing.
>
> So we plan to continue to support batch migration in do_numa_page() to improve
> the NUMA balancing's performance, but before adding complicated batch migration
> algorithm for NUMA balancing, some cleanup and preparation work need to do firstly,
> which are done in this patch set. In short, this patchset extends the
> migrate_misplaced_page() interface to support batch migration, and no functional
> changes intended.
>
> In addition, these cleanup can also benefit the compound page's NUMA balancing,
> which was discussed in previous thread[3]. IIUC, for the compound page's NUMA
> balancing, it is possible that partial pages were successfully migrated, so it is
> necessary to return the number of pages that were successfully migrated from
> migrate_misplaced_page().
But I don't see the return number being used as anything other than a bool now.
Per my understanding, I still don't find much value in the changes
except as preparation for batch migration in NUMA balancing. So I still
think it's better to wait for the whole series, where we can check why
these changes are necessary for batch migration. And I think that you
will provide some numbers to justify the batch migration, including pros
and cons.
--
Best Regards,
Huang, Ying
> This series is based on the latest mm-unstable(d226b59b30cc).
>
> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
> [3] https://lore.kernel.org/all/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
> Changes from v1:
> - Move page validation into a new function suggested by Huang Ying.
> - Change numamigrate_isolate_page() to boolean type.
> - Update some commit message.
>
> Baolin Wang (4):
> mm: migrate: factor out migration validation into
> numa_page_can_migrate()
> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
> mm: migrate: change migrate_misplaced_page() to support multiple pages
> migration
> mm: migrate: change to return the number of pages migrated
> successfully
>
> include/linux/migrate.h | 15 +++++++---
> mm/huge_memory.c | 23 +++++++++++++--
> mm/internal.h | 1 +
> mm/memory.c | 43 ++++++++++++++++++++++++++-
> mm/migrate.c | 64 +++++++++--------------------------------
> 5 files changed, 88 insertions(+), 58 deletions(-)
* Re: [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-22 2:47 ` [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration Huang, Ying
@ 2023-08-24 3:13 ` Baolin Wang
2023-08-24 4:51 ` Huang, Ying
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2023-08-24 3:13 UTC (permalink / raw)
To: Huang, Ying; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
On 8/22/2023 10:47 AM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> Hi,
>>
>> Currently, on our ARM servers with NUMA enabled, we found the cross-die latency
>> is a little larger that will significantly impact the workload's performance.
>> So on ARM servers we will rely on the NUMA balancing to avoid the cross-die
>> accessing. And I posted a patchset[1] to support speculative numa fault to
>> improve the NUMA balancing's performance according to the principle of data
>> locality. Moreover, thanks to Huang Ying's patchset[2], which introduced batch
>> migration as a way to reduce the cost of TLB flush, and it will also benefit
>> the migration of multiple pages all at once during NUMA balancing.
>>
>> So we plan to continue to support batch migration in do_numa_page() to improve
>> the NUMA balancing's performance, but before adding complicated batch migration
>> algorithm for NUMA balancing, some cleanup and preparation work need to do firstly,
>> which are done in this patch set. In short, this patchset extends the
>> migrate_misplaced_page() interface to support batch migration, and no functional
>> changes intended.
>>
>> In addition, these cleanup can also benefit the compound page's NUMA balancing,
>> which was discussed in previous thread[3]. IIUC, for the compound page's NUMA
>> balancing, it is possible that partial pages were successfully migrated, so it is
>> necessary to return the number of pages that were successfully migrated from
>> migrate_misplaced_page().
>
> But I don't find the return number is used except as bool now.
As I said above, this is a preparation for batch migration and compound
page NUMA balancing in the future.
In addition, after looking into the THP NUMA migration, I found this
change is necessary for THP migration: since it is possible that only
some of the subpages were successfully migrated if the THP was split,
the THP numa fault statistics below are not always correct:
	if (page_nid != NUMA_NO_NODE)
		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
				flags);
I will try to fix this in the next version.
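Roughly something like the below, which is just an untested sketch to
illustrate the idea (the subpages that failed to migrate would still need
proper accounting):

	nr_succeeded = migrate_misplaced_page(&migratepages, vma,
					      page_nid, target_nid);
	if (nr_succeeded) {
		flags |= TNF_MIGRATED;
		page_nid = target_nid;
	}
	...
	if (page_nid != NUMA_NO_NODE)
		task_numa_fault(last_cpupid, page_nid,
				nr_succeeded ? nr_succeeded : HPAGE_PMD_NR,
				flags);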
> Per my understanding, I still don't find much value of the changes
> except as preparation for batch migration in NUMA balancing. So I still
IMO, only patch 3 is just a preparation for batch migration, while the other
patches are cleanups for migrate_misplaced_page(). I can drop the preparation
patches in this series and revise the commit message.
> think it's better to wait for the whole series. Where we can check why
> these changes are necessary for batch migration. And I think that you
> will provide some number to justify the batch migration, including pros
> and cons.
>
> --
> Best Regards,
> Huang, Ying
>
>> This series is based on the latest mm-unstable(d226b59b30cc).
>>
>> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
>> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
>> [3] https://lore.kernel.org/all/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>>
>> Changes from v1:
>> - Move page validation into a new function suggested by Huang Ying.
>> - Change numamigrate_isolate_page() to boolean type.
>> - Update some commit message.
>>
>> Baolin Wang (4):
>> mm: migrate: factor out migration validation into
>> numa_page_can_migrate()
>> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
>> mm: migrate: change migrate_misplaced_page() to support multiple pages
>> migration
>> mm: migrate: change to return the number of pages migrated
>> successfully
>>
>> include/linux/migrate.h | 15 +++++++---
>> mm/huge_memory.c | 23 +++++++++++++--
>> mm/internal.h | 1 +
>> mm/memory.c | 43 ++++++++++++++++++++++++++-
>> mm/migrate.c | 64 +++++++++--------------------------------
>> 5 files changed, 88 insertions(+), 58 deletions(-)
* Re: [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-24 3:13 ` Baolin Wang
@ 2023-08-24 4:51 ` Huang, Ying
2023-08-24 6:26 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: Huang, Ying @ 2023-08-24 4:51 UTC (permalink / raw)
To: Baolin Wang; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> On 8/22/2023 10:47 AM, Huang, Ying wrote:
>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>
>>> Hi,
>>>
>>> Currently, on our ARM servers with NUMA enabled, we found the cross-die latency
>>> is a little larger that will significantly impact the workload's performance.
>>> So on ARM servers we will rely on the NUMA balancing to avoid the cross-die
>>> accessing. And I posted a patchset[1] to support speculative numa fault to
>>> improve the NUMA balancing's performance according to the principle of data
>>> locality. Moreover, thanks to Huang Ying's patchset[2], which introduced batch
>>> migration as a way to reduce the cost of TLB flush, and it will also benefit
>>> the migration of multiple pages all at once during NUMA balancing.
>>>
>>> So we plan to continue to support batch migration in do_numa_page() to improve
>>> the NUMA balancing's performance, but before adding complicated batch migration
>>> algorithm for NUMA balancing, some cleanup and preparation work need to do firstly,
>>> which are done in this patch set. In short, this patchset extends the
>>> migrate_misplaced_page() interface to support batch migration, and no functional
>>> changes intended.
>>>
>>> In addition, these cleanup can also benefit the compound page's NUMA balancing,
>>> which was discussed in previous thread[3]. IIUC, for the compound page's NUMA
>>> balancing, it is possible that partial pages were successfully migrated, so it is
>>> necessary to return the number of pages that were successfully migrated from
>>> migrate_misplaced_page().
>> But I don't find the return number is used except as bool now.
>
> As I said above, this is a preparation for batch migration and
> compound page NUMA balancing in future.
>
> In addition, after looking into the THP' NUMA migration, I found this
> change is necessary for THP migration. Since it is possible that
> partial subpages were successfully migrated if the THP is split, so
> below THP numa fault statistics is not always correct:
>
> if (page_nid != NUMA_NO_NODE)
> task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
> flags);
>
> I will try to fix this in next version.
IIUC, THP will not be split for NUMA balancing. Please check the
nosplit logic in migrate_pages_batch().
bool nosplit = (reason == MR_NUMA_MISPLACED);
--
Best Regards,
Huang, Ying
>> Per my understanding, I still don't find much value of the changes
>> except as preparation for batch migration in NUMA balancing. So I still
>
> IMO, only patch 3 is just a preparation for batch migration, but other
> patches are some cleanups for migrate_misplaced_page(). I can drop the
> preparation patches in this series and revise the commit message.
>
>> think it's better to wait for the whole series. Where we can check why
>> these changes are necessary for batch migration. And I think that you
>> will provide some number to justify the batch migration, including pros
>> and cons.
>> --
>> Best Regards,
>> Huang, Ying
>>
>>> This series is based on the latest mm-unstable(d226b59b30cc).
>>>
>>> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
>>> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
>>> [3] https://lore.kernel.org/all/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>>>
>>> Changes from v1:
>>> - Move page validation into a new function suggested by Huang Ying.
>>> - Change numamigrate_isolate_page() to boolean type.
>>> - Update some commit message.
>>>
>>> Baolin Wang (4):
>>> mm: migrate: factor out migration validation into
>>> numa_page_can_migrate()
>>> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
>>> mm: migrate: change migrate_misplaced_page() to support multiple pages
>>> migration
>>> mm: migrate: change to return the number of pages migrated
>>> successfully
>>>
>>> include/linux/migrate.h | 15 +++++++---
>>> mm/huge_memory.c | 23 +++++++++++++--
>>> mm/internal.h | 1 +
>>> mm/memory.c | 43 ++++++++++++++++++++++++++-
>>> mm/migrate.c | 64 +++++++++--------------------------------
>>> 5 files changed, 88 insertions(+), 58 deletions(-)
* Re: [PATCH v2 0/4] Extend migrate_misplaced_page() to support batch migration
2023-08-24 4:51 ` Huang, Ying
@ 2023-08-24 6:26 ` Baolin Wang
0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2023-08-24 6:26 UTC (permalink / raw)
To: Huang, Ying; +Cc: akpm, mgorman, shy828301, david, linux-mm, linux-kernel
On 8/24/2023 12:51 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> On 8/22/2023 10:47 AM, Huang, Ying wrote:
>>> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>>>
>>>> Hi,
>>>>
>>>> Currently, on our ARM servers with NUMA enabled, we found the cross-die latency
>>>> is a little larger that will significantly impact the workload's performance.
>>>> So on ARM servers we will rely on the NUMA balancing to avoid the cross-die
>>>> accessing. And I posted a patchset[1] to support speculative numa fault to
>>>> improve the NUMA balancing's performance according to the principle of data
>>>> locality. Moreover, thanks to Huang Ying's patchset[2], which introduced batch
>>>> migration as a way to reduce the cost of TLB flush, and it will also benefit
>>>> the migration of multiple pages all at once during NUMA balancing.
>>>>
>>>> So we plan to continue to support batch migration in do_numa_page() to improve
>>>> the NUMA balancing's performance, but before adding complicated batch migration
>>>> algorithm for NUMA balancing, some cleanup and preparation work need to do firstly,
>>>> which are done in this patch set. In short, this patchset extends the
>>>> migrate_misplaced_page() interface to support batch migration, and no functional
>>>> changes intended.
>>>>
>>>> In addition, these cleanup can also benefit the compound page's NUMA balancing,
>>>> which was discussed in previous thread[3]. IIUC, for the compound page's NUMA
>>>> balancing, it is possible that partial pages were successfully migrated, so it is
>>>> necessary to return the number of pages that were successfully migrated from
>>>> migrate_misplaced_page().
>>> But I don't find the return number is used except as bool now.
>>
>> As I said above, this is a preparation for batch migration and
>> compound page NUMA balancing in future.
>>
>> In addition, after looking into the THP' NUMA migration, I found this
>> change is necessary for THP migration. Since it is possible that
>> partial subpages were successfully migrated if the THP is split, so
>> below THP numa fault statistics is not always correct:
>>
>> if (page_nid != NUMA_NO_NODE)
>> task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
>> flags);
>>
>> I will try to fix this in next version.
>
> IIUC, THP will not be split for NUMA balancing. Please check the
> nosplit logic in migrate_pages_batch().
>
> bool nosplit = (reason == MR_NUMA_MISPLACED);
Yes, I overlooked this. Thanks for the reminder.
>
>>> Per my understanding, I still don't find much value of the changes
>>> except as preparation for batch migration in NUMA balancing. So I still
>>
>> IMO, only patch 3 is just a preparation for batch migration, but other
>> patches are some cleanups for migrate_misplaced_page(). I can drop the
>> preparation patches in this series and revise the commit message.
>>
>>> think it's better to wait for the whole series. Where we can check why
>>> these changes are necessary for batch migration. And I think that you
>>> will provide some number to justify the batch migration, including pros
>>> and cons.
>>> --
>>> Best Regards,
>>> Huang, Ying
>>>
>>>> This series is based on the latest mm-unstable(d226b59b30cc).
>>>>
>>>> [1] https://lore.kernel.org/lkml/cover.1639306956.git.baolin.wang@linux.alibaba.com/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
>>>> [2] https://lore.kernel.org/all/20230213123444.155149-1-ying.huang@intel.com/T/#u
>>>> [3] https://lore.kernel.org/all/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>>>>
>>>> Changes from v1:
>>>> - Move page validation into a new function suggested by Huang Ying.
>>>> - Change numamigrate_isolate_page() to boolean type.
>>>> - Update some commit message.
>>>>
>>>> Baolin Wang (4):
>>>> mm: migrate: factor out migration validation into
>>>> numa_page_can_migrate()
>>>> mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
>>>> mm: migrate: change migrate_misplaced_page() to support multiple pages
>>>> migration
>>>> mm: migrate: change to return the number of pages migrated
>>>> successfully
>>>>
>>>> include/linux/migrate.h | 15 +++++++---
>>>> mm/huge_memory.c | 23 +++++++++++++--
>>>> mm/internal.h | 1 +
>>>> mm/memory.c | 43 ++++++++++++++++++++++++++-
>>>> mm/migrate.c | 64 +++++++++--------------------------------
>>>> 5 files changed, 88 insertions(+), 58 deletions(-)