[PATCH v3 0/3] do_numa_page(),do_huge_pmd_numa

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/3] do_numa_page(),do_huge_pmd_numa_page() fix and cleanup
@ 2024-08-09 14:59 Zi Yan
  2024-08-09 14:59 ` [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed Zi Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Zi Yan @ 2024-08-09 14:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Huang, Ying, Baolin Wang,
	Kefeng Wang, Yang Shi, Mel Gorman, linux-kernel, Zi Yan

Changes from v1[1] and v2(Patch 2 only)[2]
===
1. Patch 1: Separated do_numa_page() and do_huge_pmd_numa_page() fixes,
since the issues are introduced by two separate commits.

2. Patch 1: Moved migration failure branch code and called task_numa_fault()
and return immediately when migration succeedds. (per Huang, Ying)

3. Patch 2: change do_huge_pmd_numa_page() to match do_numa_page() in
terms of page table entry manipulation (per Huang, Ying)

4. Patch 1: Restructured the code (per Kefeng Wang)

5. Patch 1: Returned immediately when page table entries do not match instead
of using goto (per David Hildenbrand)

[1] https://lore.kernel.org/lkml/20240807184730.1266736-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20240808233728.1477034-1-ziy@nvidia.com/

Zi Yan (3):
  mm/numa: no task_numa_fault() call if PTE is changed
  mm/numa: no task_numa_fault() call if PMD is changed
  mm/migrate: move common code to numa_migrate_check (was
    numa_migrate_prep)

 mm/huge_memory.c | 56 ++++++++++++----------------
 mm/internal.h    |  5 ++-
 mm/memory.c      | 96 ++++++++++++++++++++++++------------------------
 3 files changed, 75 insertions(+), 82 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed
  2024-08-09 14:59 [PATCH v3 0/3] do_numa_page(),do_huge_pmd_numa_page() fix and cleanup Zi Yan
@ 2024-08-09 14:59 ` Zi Yan
  2024-08-09 15:29   ` David Hildenbrand
  2024-08-09 14:59 ` [PATCH v3 2/3] mm/numa: no task_numa_fault() call if PMD " Zi Yan
  2024-08-09 14:59 ` [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep) Zi Yan
  2 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2024-08-09 14:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Huang, Ying, Baolin Wang,
	Kefeng Wang, Yang Shi, Mel Gorman, linux-kernel, Zi Yan, stable

When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and did not avoid task_numa_fault() call in the second page
table check after a numa migration failure. Fix it by making all
!pte_same() return immediately.

This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).

Reported-by: "Huang, Ying" <ying.huang@intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Cc: <stable@vger.kernel.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/memory.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 67496dc5064f..bf791da57cab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5461,7 +5461,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 
 	if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		goto out;
+		return 0;
 	}
 
 	pte = pte_modify(old_pte, vma->vm_page_prot);
@@ -5523,23 +5523,19 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	if (!migrate_misplaced_folio(folio, vma, target_nid)) {
 		nid = target_nid;
 		flags |= TNF_MIGRATED;
-	} else {
-		flags |= TNF_MIGRATE_FAIL;
-		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					       vmf->address, &vmf->ptl);
-		if (unlikely(!vmf->pte))
-			goto out;
-		if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			goto out;
-		}
-		goto out_map;
+		task_numa_fault(last_cpupid, nid, nr_pages, flags);
+		return 0;
 	}
 
-out:
-	if (nid != NUMA_NO_NODE)
-		task_numa_fault(last_cpupid, nid, nr_pages, flags);
-	return 0;
+	flags |= TNF_MIGRATE_FAIL;
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+				       vmf->address, &vmf->ptl);
+	if (unlikely(!vmf->pte))
+		return 0;
+	if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		return 0;
+	}
 out_map:
 	/*
 	 * Make it present again, depending on how arch implements
@@ -5552,7 +5548,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 		numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
 					    writable);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
-	goto out;
+
+	if (nid != NUMA_NO_NODE)
+		task_numa_fault(last_cpupid, nid, nr_pages, flags);
+	return 0;
 }
 
 static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
-- 
2.43.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed
  2024-08-09 14:59 ` [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed Zi Yan
@ 2024-08-09 15:29   ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2024-08-09 15:29 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: Andrew Morton, Huang, Ying, Baolin Wang, Kefeng Wang, Yang Shi,
	Mel Gorman, linux-kernel, stable

On 09.08.24 16:59, Zi Yan wrote:
> When handling a numa page fault, task_numa_fault() should be called by a
> process that restores the page table of the faulted folio to avoid
> duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
> TLB flush via delaying mapping on hint page fault") restructured
> do_numa_page() and did not avoid task_numa_fault() call in the second page
> table check after a numa migration failure. Fix it by making all
> !pte_same() return immediately.
> 
> This issue can cause task_numa_fault() being called more than necessary
> and lead to unexpected numa balancing results (It is hard to tell whether
> the issue will cause positive or negative performance impact due to
> duplicated numa fault counting).
> 
> Reported-by: "Huang, Ying" <ying.huang@intel.com>
> Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
> Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/3] mm/numa: no task_numa_fault() call if PMD is changed
  2024-08-09 14:59 [PATCH v3 0/3] do_numa_page(),do_huge_pmd_numa_page() fix and cleanup Zi Yan
  2024-08-09 14:59 ` [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed Zi Yan
@ 2024-08-09 14:59 ` Zi Yan
  2024-08-09 15:29   ` David Hildenbrand
  2024-08-09 14:59 ` [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep) Zi Yan
  2 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2024-08-09 14:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Huang, Ying, Baolin Wang,
	Kefeng Wang, Yang Shi, Mel Gorman, linux-kernel, Zi Yan, stable

When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid
task_numa_fault() call in the second page table check after a numa
migration failure. Fix it by making all !pmd_same() return immediately.

This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).

Reported-by: "Huang, Ying" <ying.huang@intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Cc: <stable@vger.kernel.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0024266dea0a..666fa675e5b6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1681,7 +1681,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
 	if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
 		spin_unlock(vmf->ptl);
-		goto out;
+		return 0;
 	}
 
 	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1724,22 +1724,16 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	if (!migrate_misplaced_folio(folio, vma, target_nid)) {
 		flags |= TNF_MIGRATED;
 		nid = target_nid;
-	} else {
-		flags |= TNF_MIGRATE_FAIL;
-		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
-		if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
-			spin_unlock(vmf->ptl);
-			goto out;
-		}
-		goto out_map;
-	}
-
-out:
-	if (nid != NUMA_NO_NODE)
 		task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+		return 0;
+	}
 
-	return 0;
-
+	flags |= TNF_MIGRATE_FAIL;
+	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+		spin_unlock(vmf->ptl);
+		return 0;
+	}
 out_map:
 	/* Restore the PMD */
 	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1749,7 +1743,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
 	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 	spin_unlock(vmf->ptl);
-	goto out;
+
+	if (nid != NUMA_NO_NODE)
+		task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+	return 0;
 }
 
 /*
-- 
2.43.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/3] mm/numa: no task_numa_fault() call if PMD is changed
  2024-08-09 14:59 ` [PATCH v3 2/3] mm/numa: no task_numa_fault() call if PMD " Zi Yan
@ 2024-08-09 15:29   ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2024-08-09 15:29 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: Andrew Morton, Huang, Ying, Baolin Wang, Kefeng Wang, Yang Shi,
	Mel Gorman, linux-kernel, stable

On 09.08.24 16:59, Zi Yan wrote:
> When handling a numa page fault, task_numa_fault() should be called by a
> process that restores the page table of the faulted folio to avoid
> duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
> fault handling") restructured do_huge_pmd_numa_page() and did not avoid
> task_numa_fault() call in the second page table check after a numa
> migration failure. Fix it by making all !pmd_same() return immediately.
> 
> This issue can cause task_numa_fault() being called more than necessary
> and lead to unexpected numa balancing results (It is hard to tell whether
> the issue will cause positive or negative performance impact due to
> duplicated numa fault counting).
> 
> Reported-by: "Huang, Ying" <ying.huang@intel.com>
> Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
> Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep)
  2024-08-09 14:59 [PATCH v3 0/3] do_numa_page(),do_huge_pmd_numa_page() fix and cleanup Zi Yan
  2024-08-09 14:59 ` [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed Zi Yan
  2024-08-09 14:59 ` [PATCH v3 2/3] mm/numa: no task_numa_fault() call if PMD " Zi Yan
@ 2024-08-09 14:59 ` Zi Yan
  2024-08-09 15:32   ` David Hildenbrand
  2 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2024-08-09 14:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Huang, Ying, Baolin Wang,
	Kefeng Wang, Yang Shi, Mel Gorman, linux-kernel, Zi Yan

do_numa_page() and do_huge_pmd_numa_page() share a lot of common code. To
reduce redundancy, move common code to numa_migrate_prep() and rename
the function to numa_migrate_check() to reflect its functionality.

Now do_huge_pmd_numa_page() also checks shared folios to set TNF_SHARED
flag.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/huge_memory.c | 29 +++++++++-------------
 mm/internal.h    |  5 ++--
 mm/memory.c      | 63 +++++++++++++++++++++++++-----------------------
 3 files changed, 47 insertions(+), 50 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 666fa675e5b6..f2fd3aabb67b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1669,22 +1669,23 @@ static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	pmd_t oldpmd = vmf->orig_pmd;
-	pmd_t pmd;
 	struct folio *folio;
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	int nid = NUMA_NO_NODE;
-	int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
+	int target_nid, last_cpupid;
+	pmd_t pmd, old_pmd;
 	bool writable = false;
 	int flags = 0;
 
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
-	if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+	old_pmd = pmdp_get(vmf->pmd);
+
+	if (unlikely(!pmd_same(old_pmd, vmf->orig_pmd))) {
 		spin_unlock(vmf->ptl);
 		return 0;
 	}
 
-	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
+	pmd = pmd_modify(old_pmd, vma->vm_page_prot);
 
 	/*
 	 * Detect now whether the PMD could be writable; this information
@@ -1699,18 +1700,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	if (!folio)
 		goto out_map;
 
-	/* See similar comment in do_numa_page for explanation */
-	if (!writable)
-		flags |= TNF_NO_GROUP;
-
 	nid = folio_nid(folio);
-	/*
-	 * For memory tiering mode, cpupid of slow memory page is used
-	 * to record page access time.  So use default value.
-	 */
-	if (!folio_use_access_time(folio))
-		last_cpupid = folio_last_cpupid(folio);
-	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
+
+	target_nid = numa_migrate_check(folio, vmf, haddr, &flags, writable,
+					&last_cpupid);
 	if (target_nid == NUMA_NO_NODE)
 		goto out_map;
 	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
@@ -1730,13 +1723,13 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 
 	flags |= TNF_MIGRATE_FAIL;
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
-	if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+	if (unlikely(!pmd_same(pmdp_get(vmf->pmd), vmf->orig_pmd))) {
 		spin_unlock(vmf->ptl);
 		return 0;
 	}
 out_map:
 	/* Restore the PMD */
-	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
+	pmd = pmd_modify(pmdp_get(vmf->pmd), vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (writable)
 		pmd = pmd_mkwrite(pmd, vma);
diff --git a/mm/internal.h b/mm/internal.h
index 52f7fc4e8ac3..fb16e18c9761 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1191,8 +1191,9 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
 
 void __vunmap_range_noflush(unsigned long start, unsigned long end);
 
-int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
-		      unsigned long addr, int page_nid, int *flags);
+int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
+		      unsigned long addr, int *flags, bool writable,
+		      int *last_cpupid);
 
 void free_zone_device_folio(struct folio *folio);
 int migrate_device_coherent_page(struct page *page);
diff --git a/mm/memory.c b/mm/memory.c
index bf791da57cab..e4f27c0696cb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5368,16 +5368,43 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	return ret;
 }
 
-int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
-		      unsigned long addr, int page_nid, int *flags)
+int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
+		      unsigned long addr, int *flags,
+		      bool writable, int *last_cpupid)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
+	/*
+	 * Avoid grouping on RO pages in general. RO pages shouldn't hurt as
+	 * much anyway since they can be in shared cache state. This misses
+	 * the case where a mapping is writable but the process never writes
+	 * to it but pte_write gets cleared during protection updates and
+	 * pte_dirty has unpredictable behaviour between PTE scan updates,
+	 * background writeback, dirty balancing and application behaviour.
+	 */
+	if (!writable)
+		*flags |= TNF_NO_GROUP;
+
+	/*
+	 * Flag if the folio is shared between multiple address spaces. This
+	 * is later used when determining whether to group tasks together
+	 */
+	if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
+		*flags |= TNF_SHARED;
+	/*
+	 * For memory tiering mode, cpupid of slow memory page is used
+	 * to record page access time.  So use default value.
+	 */
+	if (folio_use_access_time(folio))
+		*last_cpupid = (-1 & LAST_CPUPID_MASK);
+	else
+		*last_cpupid = folio_last_cpupid(folio);
+
 	/* Record the current PID acceesing VMA */
 	vma_set_access_pid_bit(vma);
 
 	count_vm_numa_event(NUMA_HINT_FAULTS);
-	if (page_nid == numa_node_id()) {
+	if (folio_nid(folio) == numa_node_id()) {
 		count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
 		*flags |= TNF_FAULT_LOCAL;
 	}
@@ -5479,35 +5506,11 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	if (!folio || folio_is_zone_device(folio))
 		goto out_map;
 
-	/*
-	 * Avoid grouping on RO pages in general. RO pages shouldn't hurt as
-	 * much anyway since they can be in shared cache state. This misses
-	 * the case where a mapping is writable but the process never writes
-	 * to it but pte_write gets cleared during protection updates and
-	 * pte_dirty has unpredictable behaviour between PTE scan updates,
-	 * background writeback, dirty balancing and application behaviour.
-	 */
-	if (!writable)
-		flags |= TNF_NO_GROUP;
-
-	/*
-	 * Flag if the folio is shared between multiple address spaces. This
-	 * is later used when determining whether to group tasks together
-	 */
-	if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
-		flags |= TNF_SHARED;
-
 	nid = folio_nid(folio);
 	nr_pages = folio_nr_pages(folio);
-	/*
-	 * For memory tiering mode, cpupid of slow memory page is used
-	 * to record page access time.  So use default value.
-	 */
-	if (folio_use_access_time(folio))
-		last_cpupid = (-1 & LAST_CPUPID_MASK);
-	else
-		last_cpupid = folio_last_cpupid(folio);
-	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
+
+	target_nid = numa_migrate_check(folio, vmf, vmf->address, &flags,
+					writable, &last_cpupid);
 	if (target_nid == NUMA_NO_NODE)
 		goto out_map;
 	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
-- 
2.43.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep)
  2024-08-09 14:59 ` [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep) Zi Yan
@ 2024-08-09 15:32   ` David Hildenbrand
  2024-08-12  2:20     ` Huang, Ying
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2024-08-09 15:32 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: Andrew Morton, Huang, Ying, Baolin Wang, Kefeng Wang, Yang Shi,
	Mel Gorman, linux-kernel

On 09.08.24 16:59, Zi Yan wrote:
> do_numa_page() and do_huge_pmd_numa_page() share a lot of common code. To
> reduce redundancy, move common code to numa_migrate_prep() and rename
> the function to numa_migrate_check() to reflect its functionality.
> 
> Now do_huge_pmd_numa_page() also checks shared folios to set TNF_SHARED
> flag.

Yeah, I was also wondering why we didn't check that in the PMD case.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep)
  2024-08-09 15:32   ` David Hildenbrand
@ 2024-08-12  2:20     ` Huang, Ying
  2024-08-12  9:06       ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Huang, Ying @ 2024-08-12  2:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zi Yan, linux-mm, Andrew Morton, Baolin Wang, Kefeng Wang,
	Yang Shi, Mel Gorman, linux-kernel

David Hildenbrand <david@redhat.com> writes:

> On 09.08.24 16:59, Zi Yan wrote:
>> do_numa_page() and do_huge_pmd_numa_page() share a lot of common code. To
>> reduce redundancy, move common code to numa_migrate_prep() and rename
>> the function to numa_migrate_check() to reflect its functionality.
>> Now do_huge_pmd_numa_page() also checks shared folios to set
>> TNF_SHARED flag.
>
> Yeah, I was also wondering why we didn't check that in the PMD case.

IIUC, before commit 728be28fae8c ("mm: migrate: remove THP mapcount
check in numamigrate_isolate_page()"), we don't allow to migrate THP
with mapcount > 1.  So, it's unnecessary to set TNF_SHARED flag at that
time.  However, after that commit, we need to do it.

--
Best Regards,
Huang, Ying



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep)
  2024-08-12  2:20     ` Huang, Ying
@ 2024-08-12  9:06       ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2024-08-12  9:06 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Zi Yan, linux-mm, Andrew Morton, Baolin Wang, Kefeng Wang,
	Yang Shi, Mel Gorman, linux-kernel

On 12.08.24 04:20, Huang, Ying wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 09.08.24 16:59, Zi Yan wrote:
>>> do_numa_page() and do_huge_pmd_numa_page() share a lot of common code. To
>>> reduce redundancy, move common code to numa_migrate_prep() and rename
>>> the function to numa_migrate_check() to reflect its functionality.
>>> Now do_huge_pmd_numa_page() also checks shared folios to set
>>> TNF_SHARED flag.
>>
>> Yeah, I was also wondering why we didn't check that in the PMD case.
> 
> IIUC, before commit 728be28fae8c ("mm: migrate: remove THP mapcount
> check in numamigrate_isolate_page()"), we don't allow to migrate THP
> with mapcount > 1.  So, it's unnecessary to set TNF_SHARED flag at that
> time.  However, after that commit, we need to do it.

Thanks for the pointer, rings a bell. Indeed, the check was remove for a 
different purpose, so introducing a check for setting the flag makes sense.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-08-12  9:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-09 14:59 [PATCH v3 0/3] do_numa_page(),do_huge_pmd_numa_page() fix and cleanup Zi Yan
2024-08-09 14:59 ` [PATCH v3 1/3] mm/numa: no task_numa_fault() call if PTE is changed Zi Yan
2024-08-09 15:29   ` David Hildenbrand
2024-08-09 14:59 ` [PATCH v3 2/3] mm/numa: no task_numa_fault() call if PMD " Zi Yan
2024-08-09 15:29   ` David Hildenbrand
2024-08-09 14:59 ` [PATCH v3 3/3] mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep) Zi Yan
2024-08-09 15:32   ` David Hildenbrand
2024-08-12  2:20     ` Huang, Ying
2024-08-12  9:06       ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox