linux-mm.kvack.org archive mirror
* [PATCH v3 0/3] mm: some optimizations for prot numa
@ 2025-10-15 12:35 Kefeng Wang
  2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-15 12:35 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Liam.Howlett, Kefeng Wang

v3:
- use "&&" instead of "&" in patch2
- rename folio_skip_prot_numa() to folio_needs_prot_numa()
  and add kerneldoc
- add RB

v2:
- update comments for pte_protnone() in patch2
- rename prot_numa_skip() to folio_skip_prot_numa() and
  cleanup it a bit
- add RB/ACK


Kefeng Wang (3):
  mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  mm: mprotect: avoid unnecessary struct page accessing if
    pte_protnone()
  mm: huge_memory: use folio_needs_prot_numa() for pmd folio

 mm/huge_memory.c | 21 +++++---------
 mm/internal.h    |  2 ++
 mm/mprotect.c    | 72 ++++++++++++++++++++++++------------------------
 3 files changed, 45 insertions(+), 50 deletions(-)

-- 
2.27.0




* [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-15 12:35 [PATCH v3 0/3] mm: some optimizations for prot numa Kefeng Wang
@ 2025-10-15 12:35 ` Kefeng Wang
  2025-10-15 15:32   ` Lorenzo Stoakes
                     ` (2 more replies)
  2025-10-15 12:35 ` [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
  2025-10-15 12:35 ` [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio Kefeng Wang
  2 siblings, 3 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-15 12:35 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Liam.Howlett, Kefeng Wang, Sidhartha Kumar

If the folio(even not CoW folio) is dma pinned, it can't be
migrated, so always skip pinned folio to avoid a waste of cycles
when folios migration.

Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mprotect.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 113b48985834..bb59a42809b8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -136,9 +136,12 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
 	if (folio_is_zone_device(folio) || folio_test_ksm(folio))
 		goto skip;
 
-	/* Also skip shared copy-on-write pages */
-	if (is_cow_mapping(vma->vm_flags) &&
-	    (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
+	/* Also skip shared copy-on-write folios */
+	if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
+		goto skip;
+
+	/* Folios are pinned and can't be migrated */
+	if (folio_maybe_dma_pinned(folio))
 		goto skip;
 
 	/*
-- 
2.27.0




* [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 12:35 [PATCH v3 0/3] mm: some optimizations for prot numa Kefeng Wang
  2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
@ 2025-10-15 12:35 ` Kefeng Wang
  2025-10-15 15:43   ` Lorenzo Stoakes
                     ` (2 more replies)
  2025-10-15 12:35 ` [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio Kefeng Wang
  2 siblings, 3 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-15 12:35 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Liam.Howlett, Kefeng Wang, Sidhartha Kumar

If the pte_protnone() is true, we could avoid unnecessary struct page
accessing and reduce cache footprint when scanning page tables for prot
numa, the performance test of pmbench memory accessing benchmark
should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
footprint when scanning page tables").

Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mprotect.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index bb59a42809b8..ed44aadb7aaa 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
 	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
 }
 
-static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
-			   pte_t oldpte, pte_t *pte, int target_node,
-			   struct folio *folio)
+static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
+		struct folio *folio)
 {
 	bool ret = true;
 	bool toptier;
 	int nid;
 
-	/* Avoid TLB flush if possible */
-	if (pte_protnone(oldpte))
-		goto skip;
-
 	if (!folio)
 		goto skip;
 
@@ -307,23 +302,24 @@ static long change_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 			pte_t ptent;
 
+			/* Already in the desired state. */
+			if (prot_numa && pte_protnone(oldpte))
+				continue;
+
 			page = vm_normal_page(vma, addr, oldpte);
 			if (page)
 				folio = page_folio(page);
+
 			/*
 			 * Avoid trapping faults against the zero or KSM
 			 * pages. See similar comment in change_huge_pmd.
 			 */
-			if (prot_numa) {
-				int ret = prot_numa_skip(vma, addr, oldpte, pte,
-							 target_node, folio);
-				if (ret) {
-
-					/* determine batch to skip */
-					nr_ptes = mprotect_folio_pte_batch(folio,
-						  pte, oldpte, max_nr_ptes, /* flags = */ 0);
-					continue;
-				}
+			if (prot_numa && prot_numa_skip(vma, target_node,
+							folio)) {
+				/* determine batch to skip */
+				nr_ptes = mprotect_folio_pte_batch(folio,
+					  pte, oldpte, max_nr_ptes, /* flags = */ 0);
+				continue;
 			}
 
 			nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
-- 
2.27.0




* [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio
  2025-10-15 12:35 [PATCH v3 0/3] mm: some optimizations for prot numa Kefeng Wang
  2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
  2025-10-15 12:35 ` [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
@ 2025-10-15 12:35 ` Kefeng Wang
  2025-10-16 19:19   ` David Hildenbrand
  2025-10-17 10:07   ` Lorenzo Stoakes
  2 siblings, 2 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-15 12:35 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Liam.Howlett, Kefeng Wang, Sidhartha Kumar

Rename prot_numa_skip() to folio_needs_prot_numa(), and remove
ret by directly return value instead of goto style.

The folio checks for prot numa should be suitable for pmd folio
too, which helps to avoid unnecessary pmd change and folio
migration attempts.

Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/huge_memory.c | 21 +++++++--------------
 mm/internal.h    |  2 ++
 mm/mprotect.c    | 45 +++++++++++++++++++++++----------------------
 3 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1d1b74950332..c7364dcb96c1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2395,8 +2395,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 #endif
 
 	if (prot_numa) {
-		struct folio *folio;
-		bool toptier;
+		int target_node = NUMA_NO_NODE;
 		/*
 		 * Avoid trapping faults against the zero page. The read-only
 		 * data is likely to be read-cached on the local CPU and
@@ -2408,19 +2407,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (pmd_protnone(*pmd))
 			goto unlock;
 
-		folio = pmd_folio(*pmd);
-		toptier = node_is_toptier(folio_nid(folio));
-		/*
-		 * Skip scanning top tier node if normal numa
-		 * balancing is disabled
-		 */
-		if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
-		    toptier)
-			goto unlock;
+		/* Get target node for single threaded private VMAs */
+		if (!(vma->vm_flags & VM_SHARED) &&
+		    atomic_read(&vma->vm_mm->mm_users) == 1)
+			target_node = numa_node_id();
 
-		if (folio_use_access_time(folio))
-			folio_xchg_access_time(folio,
-					       jiffies_to_msecs(jiffies));
+		if (!folio_needs_prot_numa(pmd_folio(*pmd), vma, target_node))
+			goto unlock;
 	}
 	/*
 	 * In case prot_numa, we are under mmap_read_lock(mm). It's critical
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..5f63d5c049b1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1378,6 +1378,8 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
 
 void __vunmap_range_noflush(unsigned long start, unsigned long end);
 
+bool folio_needs_prot_numa(struct folio *folio, struct vm_area_struct *vma,
+		int target_node);
 int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 		      unsigned long addr, int *flags, bool writable,
 		      int *last_cpupid);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ed44aadb7aaa..0ae8f4a277b2 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -118,26 +118,30 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
 	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
 }
 
-static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
-		struct folio *folio)
+/**
+ * folio_needs_prot_numa() - Whether the folio needs prot numa
+ * @folio: The folio.
+ * @vma: The VMA mapping.
+ * @target_node: The numa node being accessed.
+ *
+ * Return: Returns true if folio needs prot numa and the access time of
+ *	   folio is adjusted. Returns false otherwise.
+ */
+bool folio_needs_prot_numa(struct folio *folio, struct vm_area_struct *vma,
+		int target_node)
 {
-	bool ret = true;
-	bool toptier;
 	int nid;
 
-	if (!folio)
-		goto skip;
-
-	if (folio_is_zone_device(folio) || folio_test_ksm(folio))
-		goto skip;
+	if (!folio || folio_is_zone_device(folio) || folio_test_ksm(folio))
+		return false;
 
 	/* Also skip shared copy-on-write folios */
 	if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
-		goto skip;
+		return false;
 
 	/* Folios are pinned and can't be migrated */
 	if (folio_maybe_dma_pinned(folio))
-		goto skip;
+		return false;
 
 	/*
 	 * While migration can move some dirty pages,
@@ -145,7 +149,7 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
 	 * context.
 	 */
 	if (folio_is_file_lru(folio) && folio_test_dirty(folio))
-		goto skip;
+		return false;
 
 	/*
 	 * Don't mess with PTEs if page is already on the node
@@ -153,23 +157,20 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
 	 */
 	nid = folio_nid(folio);
 	if (target_node == nid)
-		goto skip;
-
-	toptier = node_is_toptier(nid);
+		return false;
 
 	/*
 	 * Skip scanning top tier node if normal numa
 	 * balancing is disabled
 	 */
-	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && toptier)
-		goto skip;
+	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
+	    node_is_toptier(nid))
+		return false;
 
-	ret = false;
 	if (folio_use_access_time(folio))
 		folio_xchg_access_time(folio, jiffies_to_msecs(jiffies));
 
-skip:
-	return ret;
+	return true;
 }
 
 /* Set nr_ptes number of ptes, starting from idx */
@@ -314,8 +315,8 @@ static long change_pte_range(struct mmu_gather *tlb,
 			 * Avoid trapping faults against the zero or KSM
 			 * pages. See similar comment in change_huge_pmd.
 			 */
-			if (prot_numa && prot_numa_skip(vma, target_node,
-							folio)) {
+			if (prot_numa && !folio_needs_prot_numa(folio, vma,
+								target_node)) {
 				/* determine batch to skip */
 				nr_ptes = mprotect_folio_pte_batch(folio,
 					  pte, oldpte, max_nr_ptes, /* flags = */ 0);
-- 
2.27.0




* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
@ 2025-10-15 15:32   ` Lorenzo Stoakes
  2025-10-16  1:00     ` Kefeng Wang
  2025-10-16  1:14   ` Zi Yan
  2025-10-16 17:53   ` Dev Jain
  2 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-15 15:32 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar

On Wed, Oct 15, 2025 at 08:35:14PM +0800, Kefeng Wang wrote:
> If the folio(even not CoW folio) is dma pinned, it can't be
> migrated, so always skip pinned folio to avoid a waste of cycles
> when folios migration.

NIT: 'when folios migration' -> 'when folios are migrated'.

Maybe worth saying 'due to elevated reference count?'

>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

LGTM though so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  mm/mprotect.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 113b48985834..bb59a42809b8 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -136,9 +136,12 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
>  	if (folio_is_zone_device(folio) || folio_test_ksm(folio))
>  		goto skip;
>
> -	/* Also skip shared copy-on-write pages */
> -	if (is_cow_mapping(vma->vm_flags) &&
> -	    (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
> +	/* Also skip shared copy-on-write folios */
> +	if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
> +		goto skip;
> +
> +	/* Folios are pinned and can't be migrated */
> +	if (folio_maybe_dma_pinned(folio))
>  		goto skip;
>
>  	/*
> --
> 2.27.0
>
>



* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 12:35 ` [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
@ 2025-10-15 15:43   ` Lorenzo Stoakes
  2025-10-15 17:45     ` David Hildenbrand
  2025-10-16  1:28   ` Zi Yan
  2025-10-16 18:02   ` Dev Jain
  2 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-15 15:43 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar

On Wed, Oct 15, 2025 at 08:35:15PM +0800, Kefeng Wang wrote:
> If the pte_protnone() is true, we could avoid unnecessary struct page
> accessing and reduce cache footprint when scanning page tables for prot
> numa, the performance test of pmbench memory accessing benchmark
> should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
> footprint when scanning page tables").

Hm should benefit? But you've not tested it? Seems like rather than
guessing you should actually give data. Otherwise I don't know why we're
referencing a benchmark.

>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

This generally looks good to me but we need to update the commit message.

> ---
>  mm/mprotect.c | 30 +++++++++++++-----------------
>  1 file changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index bb59a42809b8..ed44aadb7aaa 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
>  	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
>  }
>
> -static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
> -			   pte_t oldpte, pte_t *pte, int target_node,
> -			   struct folio *folio)
> +static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> +		struct folio *folio)
>  {
>  	bool ret = true;
>  	bool toptier;
>  	int nid;
>
> -	/* Avoid TLB flush if possible */
> -	if (pte_protnone(oldpte))
> -		goto skip;
> -
>  	if (!folio)
>  		goto skip;
>
> @@ -307,23 +302,24 @@ static long change_pte_range(struct mmu_gather *tlb,
>  			struct page *page;
>  			pte_t ptent;
>
> +			/* Already in the desired state. */
> +			if (prot_numa && pte_protnone(oldpte))
> +				continue;
> +
>  			page = vm_normal_page(vma, addr, oldpte);
>  			if (page)
>  				folio = page_folio(page);
> +
>  			/*
>  			 * Avoid trapping faults against the zero or KSM
>  			 * pages. See similar comment in change_huge_pmd.
>  			 */
> -			if (prot_numa) {
> -				int ret = prot_numa_skip(vma, addr, oldpte, pte,
> -							 target_node, folio);
> -				if (ret) {

Yikes this was gross, decl without blank line after, for no reason...

> -
> -					/* determine batch to skip */
> -					nr_ptes = mprotect_folio_pte_batch(folio,
> -						  pte, oldpte, max_nr_ptes, /* flags = */ 0);
> -					continue;
> -				}
> +			if (prot_numa && prot_numa_skip(vma, target_node,
> +							folio)) {
> +				/* determine batch to skip */
> +				nr_ptes = mprotect_folio_pte_batch(folio,
> +					  pte, oldpte, max_nr_ptes, /* flags = */ 0);
> +				continue;
>  			}
>
>  			nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
> --
> 2.27.0
>
>



* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 15:43   ` Lorenzo Stoakes
@ 2025-10-15 17:45     ` David Hildenbrand
  2025-10-16  1:07       ` Kefeng Wang
  2025-10-17  8:46       ` Lorenzo Stoakes
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-10-15 17:45 UTC (permalink / raw)
  To: Lorenzo Stoakes, Kefeng Wang
  Cc: Andrew Morton, linux-mm, Zi Yan, Baolin Wang, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Liam.Howlett, Sidhartha Kumar

On 15.10.25 17:43, Lorenzo Stoakes wrote:
> On Wed, Oct 15, 2025 at 08:35:15PM +0800, Kefeng Wang wrote:
>> If the pte_protnone() is true, we could avoid unnecessary struct page
>> accessing and reduce cache footprint when scanning page tables for prot
>> numa, the performance test of pmbench memory accessing benchmark
>> should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
>> footprint when scanning page tables").
> 
> Hm should benefit? But you've not tested it? Seems like rather than
> guessing you should actually give data. Otherwise I don't know why we're
> referencing a benchmark.

I guess it might be harder to quantify, but theoretically it makes sense 
to me to just skip in any case.

... and it enables patch #3, which is nice :)

-- 
Cheers

David / dhildenb




* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-15 15:32   ` Lorenzo Stoakes
@ 2025-10-16  1:00     ` Kefeng Wang
  2025-10-16 21:10       ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2025-10-16  1:00 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar



On 2025/10/15 23:32, Lorenzo Stoakes wrote:
> On Wed, Oct 15, 2025 at 08:35:14PM +0800, Kefeng Wang wrote:
>> If the folio(even not CoW folio) is dma pinned, it can't be
>> migrated, so always skip pinned folio to avoid a waste of cycles
>> when folios migration.
> 
> NIT: 'when folios migration' -> 'when folios are migrated'.
> 
> Maybe worth saying 'due to elevated reference count?'
> 

Hope Andrew could help to update the changelog.

>>
>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> LGTM though so:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks all.

> 
>> ---
>>   mm/mprotect.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 113b48985834..bb59a42809b8 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -136,9 +136,12 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
>>   	if (folio_is_zone_device(folio) || folio_test_ksm(folio))
>>   		goto skip;
>>
>> -	/* Also skip shared copy-on-write pages */
>> -	if (is_cow_mapping(vma->vm_flags) &&
>> -	    (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
>> +	/* Also skip shared copy-on-write folios */
>> +	if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
>> +		goto skip;
>> +
>> +	/* Folios are pinned and can't be migrated */
>> +	if (folio_maybe_dma_pinned(folio))
>>   		goto skip;
>>
>>   	/*
>> --
>> 2.27.0
>>
>>
> 




* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 17:45     ` David Hildenbrand
@ 2025-10-16  1:07       ` Kefeng Wang
  2025-10-17  8:47         ` Lorenzo Stoakes
  2025-10-17  8:46       ` Lorenzo Stoakes
  1 sibling, 1 reply; 22+ messages in thread
From: Kefeng Wang @ 2025-10-16  1:07 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: Andrew Morton, linux-mm, Zi Yan, Baolin Wang, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Liam.Howlett, Sidhartha Kumar



On 2025/10/16 1:45, David Hildenbrand wrote:
> On 15.10.25 17:43, Lorenzo Stoakes wrote:
>> On Wed, Oct 15, 2025 at 08:35:15PM +0800, Kefeng Wang wrote:
>>> If the pte_protnone() is true, we could avoid unnecessary struct page
>>> accessing and reduce cache footprint when scanning page tables for prot
>>> numa, the performance test of pmbench memory accessing benchmark
>>> should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
>>> footprint when scanning page tables").
>>
>> Hm should benefit? But you've not tested it? Seems like rather than
>> guessing you should actually give data. Otherwise I don't know why we're
>> referencing a benchmark.
> 

Yes, I didn't test it; I just found the above commit, which does the same
thing and shows a benefit in a benchmark.

> I guess it might be harder to quantify, but theoretically it makes sense 
> to me to just skip in any case.
> 
> ... and it enables patch #3, which is nice :)
> 

My initial goal was to skip unnecessary steps for pmd folios during prot numa.



* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
  2025-10-15 15:32   ` Lorenzo Stoakes
@ 2025-10-16  1:14   ` Zi Yan
  2025-10-16 17:53   ` Dev Jain
  2 siblings, 0 replies; 22+ messages in thread
From: Zi Yan @ 2025-10-16  1:14 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm,
	Baolin Wang, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar

On 15 Oct 2025, at 8:35, Kefeng Wang wrote:

> If the folio(even not CoW folio) is dma pinned, it can't be
> migrated, so always skip pinned folio to avoid a waste of cycles
> when folios migration.
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/mprotect.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
Makes sense. Acked-by: Zi Yan <ziy@nvidia.com>


Best Regards,
Yan, Zi



* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 12:35 ` [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
  2025-10-15 15:43   ` Lorenzo Stoakes
@ 2025-10-16  1:28   ` Zi Yan
  2025-10-16  1:35     ` Kefeng Wang
  2025-10-16 18:02   ` Dev Jain
  2 siblings, 1 reply; 22+ messages in thread
From: Zi Yan @ 2025-10-16  1:28 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm,
	Baolin Wang, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar

On 15 Oct 2025, at 8:35, Kefeng Wang wrote:

> If the pte_protnone() is true, we could avoid unnecessary struct page
> accessing and reduce cache footprint when scanning page tables for prot
> numa, the performance test of pmbench memory accessing benchmark
> should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
> footprint when scanning page tables").
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/mprotect.c | 30 +++++++++++++-----------------
>  1 file changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index bb59a42809b8..ed44aadb7aaa 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
>  	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
>  }
>
> -static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
> -			   pte_t oldpte, pte_t *pte, int target_node,
> -			   struct folio *folio)
> +static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> +		struct folio *folio)
>  {
>  	bool ret = true;
>  	bool toptier;
>  	int nid;
>
> -	/* Avoid TLB flush if possible */
> -	if (pte_protnone(oldpte))
> -		goto skip;
> -
>  	if (!folio)
>  		goto skip;
>
> @@ -307,23 +302,24 @@ static long change_pte_range(struct mmu_gather *tlb,
>  			struct page *page;
>  			pte_t ptent;
>
> +			/* Already in the desired state. */
> +			if (prot_numa && pte_protnone(oldpte))
> +				continue;

For a folio pte batch, instead of calculating nr_ptes to skip, this just
skips PTEs one by one. Looking at folio_pte_batch_flags(), this way looks
better.

> +
>  			page = vm_normal_page(vma, addr, oldpte);
>  			if (page)
>  				folio = page_folio(page);
> +
>  			/*
>  			 * Avoid trapping faults against the zero or KSM
>  			 * pages. See similar comment in change_huge_pmd.
>  			 */
> -			if (prot_numa) {
> -				int ret = prot_numa_skip(vma, addr, oldpte, pte,
> -							 target_node, folio);
> -				if (ret) {
> -
> -					/* determine batch to skip */
> -					nr_ptes = mprotect_folio_pte_batch(folio,
> -						  pte, oldpte, max_nr_ptes, /* flags = */ 0);
> -					continue;
> -				}
> +			if (prot_numa && prot_numa_skip(vma, target_node,
> +							folio)) {
> +				/* determine batch to skip */
> +				nr_ptes = mprotect_folio_pte_batch(folio,
> +					  pte, oldpte, max_nr_ptes, /* flags = */ 0);
> +				continue;
>  			}
>
>  			nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);

Acked-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi



* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-16  1:28   ` Zi Yan
@ 2025-10-16  1:35     ` Kefeng Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-16  1:35 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm,
	Baolin Wang, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar



On 2025/10/16 9:28, Zi Yan wrote:
> On 15 Oct 2025, at 8:35, Kefeng Wang wrote:
> 
>> If the pte_protnone() is true, we could avoid unnecessary struct page
>> accessing and reduce cache footprint when scanning page tables for prot
>> numa, the performance test of pmbench memory accessing benchmark
>> should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
>> footprint when scanning page tables").
>>
>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   mm/mprotect.c | 30 +++++++++++++-----------------
>>   1 file changed, 13 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index bb59a42809b8..ed44aadb7aaa 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
>>   	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
>>   }
>>
>> -static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
>> -			   pte_t oldpte, pte_t *pte, int target_node,
>> -			   struct folio *folio)
>> +static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
>> +		struct folio *folio)
>>   {
>>   	bool ret = true;
>>   	bool toptier;
>>   	int nid;
>>
>> -	/* Avoid TLB flush if possible */
>> -	if (pte_protnone(oldpte))
>> -		goto skip;
>> -
>>   	if (!folio)
>>   		goto skip;
>>
>> @@ -307,23 +302,24 @@ static long change_pte_range(struct mmu_gather *tlb,
>>   			struct page *page;
>>   			pte_t ptent;
>>
>> +			/* Already in the desired state. */
>> +			if (prot_numa && pte_protnone(oldpte))
>> +				continue;
> 
> For a folio pte batch, instead of calculating nr_ptes to skip, this just
> skips PTEs one by one. Looking at folio_pte_batch_flags(), this way looks
> better.

folio_pte_batch_flags() needs a folio, which we want to avoid here;
maybe use pte_batch_hint() instead.
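
For illustration only, a rough sketch of that idea (hypothetical, not part
of this series), reusing the existing variables of the change_pte_range()
loop:

	/*
	 * Hypothetical sketch: skip an already-protnone entry without
	 * touching struct page.  pte_batch_hint() is only an arch-provided
	 * hint (e.g. arm64 contpte blocks) and defaults to 1, so in the
	 * worst case this still skips a single PTE per iteration.
	 */
	if (prot_numa && pte_protnone(oldpte)) {
		nr_ptes = min_t(int, pte_batch_hint(pte, oldpte), max_nr_ptes);
		continue;
	}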

> 
>> +
>>   			page = vm_normal_page(vma, addr, oldpte);
>>   			if (page)
>>   				folio = page_folio(page);
>> +
>>   			/*
>>   			 * Avoid trapping faults against the zero or KSM
>>   			 * pages. See similar comment in change_huge_pmd.
>>   			 */
>> -			if (prot_numa) {
>> -				int ret = prot_numa_skip(vma, addr, oldpte, pte,
>> -							 target_node, folio);
>> -				if (ret) {
>> -
>> -					/* determine batch to skip */
>> -					nr_ptes = mprotect_folio_pte_batch(folio,
>> -						  pte, oldpte, max_nr_ptes, /* flags = */ 0);
>> -					continue;
>> -				}
>> +			if (prot_numa && prot_numa_skip(vma, target_node,
>> +							folio)) {
>> +				/* determine batch to skip */
>> +				nr_ptes = mprotect_folio_pte_batch(folio,
>> +					  pte, oldpte, max_nr_ptes, /* flags = */ 0);
>> +				continue;
>>   			}
>>
>>   			nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
> 
> Acked-by: Zi Yan <ziy@nvidia.com>

Thanks.
> 
> --
> Best Regards,
> Yan, Zi
> 




* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
  2025-10-15 15:32   ` Lorenzo Stoakes
  2025-10-16  1:14   ` Zi Yan
@ 2025-10-16 17:53   ` Dev Jain
  2 siblings, 0 replies; 22+ messages in thread
From: Dev Jain @ 2025-10-16 17:53 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar


On 15/10/25 6:05 pm, Kefeng Wang wrote:
> If the folio(even not CoW folio) is dma pinned, it can't be
> migrated, so always skip pinned folio to avoid a waste of cycles
> when folios migration.
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---

Reviewed-by: Dev Jain <dev.jain@arm.com>




* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 12:35 ` [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
  2025-10-15 15:43   ` Lorenzo Stoakes
  2025-10-16  1:28   ` Zi Yan
@ 2025-10-16 18:02   ` Dev Jain
  2 siblings, 0 replies; 22+ messages in thread
From: Dev Jain @ 2025-10-16 18:02 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar


On 15/10/25 6:05 pm, Kefeng Wang wrote:
> If the pte_protnone() is true, we could avoid unnecessary struct page
> accessing and reduce cache footprint when scanning page tables for prot
> numa, the performance test of pmbench memory accessing benchmark
> should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
> footprint when scanning page tables").
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   mm/mprotect.c | 30 +++++++++++++-----------------
>   1 file changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index bb59a42809b8..ed44aadb7aaa 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
>   	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
>   }
>   
> -static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
> -			   pte_t oldpte, pte_t *pte, int target_node,
> -			   struct folio *folio)
> +static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> +		struct folio *folio)
>   {

Wow, so while refactoring stuff, I had introduced two unused parameters :)
Thanks.

Reviewed-by: Dev Jain <dev.jain@arm.com>




* Re: [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio
  2025-10-15 12:35 ` [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio Kefeng Wang
@ 2025-10-16 19:19   ` David Hildenbrand
  2025-10-17 10:07   ` Lorenzo Stoakes
  1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-10-16 19:19 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton, Lorenzo Stoakes, linux-mm
  Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Liam.Howlett, Sidhartha Kumar

On 15.10.25 14:35, Kefeng Wang wrote:
> Rename prot_numa_skip() to folio_needs_prot_numa(), and remove
> ret by directly return value instead of goto style.
> 
> The folio checks for prot numa should be suitable for pmd folio
> too, which helps to avoid unnecessary pmd change and folio
> migration attempts.

It would be good to describe here which additional checks we are now 
performing in the PMD case.

Because that's a real change.
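
(For reference, the folio checks that the PMD path newly picks up via
folio_needs_prot_numa(), summarized from the function introduced in this
patch; within this prot_numa block the old code only tested pmd_protnone()
and node_is_toptier() before updating the access time:)

	if (!folio || folio_is_zone_device(folio) || folio_test_ksm(folio))
		return false;
	if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
		return false;
	if (folio_maybe_dma_pinned(folio))
		return false;
	if (folio_is_file_lru(folio) && folio_test_dirty(folio))
		return false;
	if (target_node == folio_nid(folio))
		return false;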

> 
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   mm/huge_memory.c | 21 +++++++--------------
>   mm/internal.h    |  2 ++
>   mm/mprotect.c    | 45 +++++++++++++++++++++++----------------------
>   3 files changed, 32 insertions(+), 36 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1d1b74950332..c7364dcb96c1 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2395,8 +2395,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   #endif
>   
>   	if (prot_numa) {
> -		struct folio *folio;
> -		bool toptier;
> +		int target_node = NUMA_NO_NODE;
>   		/*
>   		 * Avoid trapping faults against the zero page. The read-only
>   		 * data is likely to be read-cached on the local CPU and
> @@ -2408,19 +2407,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   		if (pmd_protnone(*pmd))
>   			goto unlock;
>   
> -		folio = pmd_folio(*pmd);
> -		toptier = node_is_toptier(folio_nid(folio));
> -		/*
> -		 * Skip scanning top tier node if normal numa
> -		 * balancing is disabled
> -		 */
> -		if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> -		    toptier)
> -			goto unlock;
> +		/* Get target node for single threaded private VMAs */
> +		if (!(vma->vm_flags & VM_SHARED) &&
> +		    atomic_read(&vma->vm_mm->mm_users) == 1)
> +			target_node = numa_node_id();
>   
> -		if (folio_use_access_time(folio))
> -			folio_xchg_access_time(folio,
> -					       jiffies_to_msecs(jiffies));
> +		if (!folio_needs_prot_numa(pmd_folio(*pmd), vma, target_node))
> +			goto unlock;
>   	}
>   	/*
>   	 * In case prot_numa, we are under mmap_read_lock(mm). It's critical
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..5f63d5c049b1 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1378,6 +1378,8 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
>   
>   void __vunmap_range_noflush(unsigned long start, unsigned long end);
>   
> +bool folio_needs_prot_numa(struct folio *folio, struct vm_area_struct *vma,
> +		int target_node);
>   int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
>   		      unsigned long addr, int *flags, bool writable,
>   		      int *last_cpupid);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ed44aadb7aaa..0ae8f4a277b2 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,26 +118,30 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
>   	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
>   }
>   
> -static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> -		struct folio *folio)
> +/**
> + * folio_needs_prot_numa() - Whether the folio needs prot numa
> + * @folio: The folio.
> + * @vma: The VMA mapping.
> + * @target_node: The numa node being accessed.
> + *
> + * Return: Returns true if folio needs prot numa and the access time of
> + *	   folio is adjusted. Returns false otherwise.

I guess you could drop the other Returns like

Return: true if .... Otherwise false.

> + */
> +bool folio_needs_prot_numa(struct folio *folio, struct vm_area_struct *vma,
> +		int target_node)

[...]

>   
>   /* Set nr_ptes number of ptes, starting from idx */
> @@ -314,8 +315,8 @@ static long change_pte_range(struct mmu_gather *tlb,
>   			 * Avoid trapping faults against the zero or KSM
>   			 * pages. See similar comment in change_huge_pmd.
>   			 */
> -			if (prot_numa && prot_numa_skip(vma, target_node,
> -							folio)) {
> +			if (prot_numa && !folio_needs_prot_numa(folio, vma,
> +								target_node)) {

You can fit that into a single line to improve readability. Alternatively

if (prot_numa &&
     !folio_needs ...)

>   				/* determine batch to skip */
>   				nr_ptes = mprotect_folio_pte_batch(folio,
>   					  pte, oldpte, max_nr_ptes, /* flags = */ 0);


-- 
Cheers

David / dhildenb




* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-16  1:00     ` Kefeng Wang
@ 2025-10-16 21:10       ` Andrew Morton
  2025-10-17  8:46         ` Lorenzo Stoakes
  2025-10-17 14:12         ` Kefeng Wang
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2025-10-16 21:10 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Lorenzo Stoakes, David Hildenbrand, linux-mm, Zi Yan,
	Baolin Wang, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar

On Thu, 16 Oct 2025 09:00:34 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:

> On 2025/10/15 23:32, Lorenzo Stoakes wrote:
> > On Wed, Oct 15, 2025 at 08:35:14PM +0800, Kefeng Wang wrote:
> >> If the folio(even not CoW folio) is dma pinned, it can't be
> >> migrated, so always skip pinned folio to avoid a waste of cycles
> >> when folios migration.
> > 
> > NIT: 'when folios migration' -> 'when folios are migrated'.
> > 
> > Maybe worth saying 'due to elevated reference count?'
> > 
> 
> Hope Andrew could help to update the changelog.
> 

I changed it to this:

: If the folio (even not CoW folio) is dma pinned, it can't be migrated
: due to the elevated reference count.  So always skip a pinned folio to
: avoid wasting cycles when folios are migrated.



* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-16 21:10       ` Andrew Morton
@ 2025-10-17  8:46         ` Lorenzo Stoakes
  2025-10-17 14:12         ` Kefeng Wang
  1 sibling, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-17  8:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kefeng Wang, David Hildenbrand, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar

On Thu, Oct 16, 2025 at 02:10:51PM -0700, Andrew Morton wrote:
> On Thu, 16 Oct 2025 09:00:34 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> > On 2025/10/15 23:32, Lorenzo Stoakes wrote:
> > > On Wed, Oct 15, 2025 at 08:35:14PM +0800, Kefeng Wang wrote:
> > >> If the folio(even not CoW folio) is dma pinned, it can't be
> > >> migrated, so always skip pinned folio to avoid a waste of cycles
> > >> when folios migration.
> > >
> > > NIT: 'when folios migration' -> 'when folios are migrated'.
> > >
> > > Maybe worth saying 'due to elevated reference count?'
> > >
> >
> > Hope Andrew could help to update the changelog.
> >
>
> I changed it to this:
>
> : If the folio (even not CoW folio) is dma pinned, it can't be migrated
> : due to the elevated reference count.  So always skip a pinned folio to
> : avoid wasting cycles when folios are migrated.
>

LGTM from my side, thanks! :)



* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-15 17:45     ` David Hildenbrand
  2025-10-16  1:07       ` Kefeng Wang
@ 2025-10-17  8:46       ` Lorenzo Stoakes
  1 sibling, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-17  8:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Kefeng Wang, Andrew Morton, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar

On Wed, Oct 15, 2025 at 07:45:24PM +0200, David Hildenbrand wrote:
> On 15.10.25 17:43, Lorenzo Stoakes wrote:
> > On Wed, Oct 15, 2025 at 08:35:15PM +0800, Kefeng Wang wrote:
> > > If the pte_protnone() is true, we could avoid unnecessary struct page
> > > accessing and reduce cache footprint when scanning page tables for prot
> > > numa, the performance test of pmbench memory accessing benchmark
> > > should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
> > > footprint when scanning page tables").
> >
> > Hm should benefit? But you've not tested it? Seems like rather than
> > guessing you should actually give data. Otherwise I don't know why we're
> > referencing a benchmark.
>
> I guess it might be harder to quantify, but theoretically it makes sense to
> me to just skip in any case.

Right, yeah let's just drop that then.

>
> ... and it enables patch #3, which is nice :)

:)

>
> --
> Cheers
>
> David / dhildenb
>
>

Cheers, Lorenzo



* Re: [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
  2025-10-16  1:07       ` Kefeng Wang
@ 2025-10-17  8:47         ` Lorenzo Stoakes
  0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-17  8:47 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: David Hildenbrand, Andrew Morton, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar

On Thu, Oct 16, 2025 at 09:07:13AM +0800, Kefeng Wang wrote:
>
>
> On 2025/10/16 1:45, David Hildenbrand wrote:
> > On 15.10.25 17:43, Lorenzo Stoakes wrote:
> > > On Wed, Oct 15, 2025 at 08:35:15PM +0800, Kefeng Wang wrote:
> > > > If the pte_protnone() is true, we could avoid unnecessary struct page
> > > > accessing and reduce cache footprint when scanning page tables for prot
> > > > numa, the performance test of pmbench memory accessing benchmark
> > > > should be benifit, see more commit a818f5363a0e ("autonuma: reduce cache
> > > > footprint when scanning page tables").
> > >
> > > Hm should benefit? But you've not tested it? Seems like rather than
> > > guessing you should actually give data. Otherwise I don't know why we're
> > > referencing a benchmark.
> >
>
> Yes, I didn't test it; I just found the above commit, which does the same
> thing and shows a benefit in a benchmark.

Right, so let's just drop this from the commit message please, otherwise
it's a bit weird - saying that you hope it'll be an improvement without
providing any details is not really useful here.



* Re: [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio
  2025-10-15 12:35 ` [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio Kefeng Wang
  2025-10-16 19:19   ` David Hildenbrand
@ 2025-10-17 10:07   ` Lorenzo Stoakes
  2025-10-17 14:13     ` Kefeng Wang
  1 sibling, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-10-17 10:07 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar

On Wed, Oct 15, 2025 at 08:35:16PM +0800, Kefeng Wang wrote:
> Rename prot_numa_skip() to folio_needs_prot_numa(), and remove
> ret by directly return value instead of goto style.
>
> The folio checks for prot numa should be suitable for pmd folio
> too, which helps to avoid unnecessary pmd change and folio
> migration attempts.
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Yeah you're doing too much at once here.

You seem to be combining refactoring with changing actual functionality which is
just not helpful, could you split this into two patches please?

Otherwise on bisect if you break the PMD stuff we'll end up reverting the
refactoring too.

Plus it makes review harder.

Thanks!

> ---
>  mm/huge_memory.c | 21 +++++++--------------
>  mm/internal.h    |  2 ++
>  mm/mprotect.c    | 45 +++++++++++++++++++++++----------------------
>  3 files changed, 32 insertions(+), 36 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1d1b74950332..c7364dcb96c1 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2395,8 +2395,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  #endif
>
>  	if (prot_numa) {
> -		struct folio *folio;
> -		bool toptier;
> +		int target_node = NUMA_NO_NODE;
>  		/*
>  		 * Avoid trapping faults against the zero page. The read-only
>  		 * data is likely to be read-cached on the local CPU and
> @@ -2408,19 +2407,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		if (pmd_protnone(*pmd))
>  			goto unlock;
>
> -		folio = pmd_folio(*pmd);
> -		toptier = node_is_toptier(folio_nid(folio));
> -		/*
> -		 * Skip scanning top tier node if normal numa
> -		 * balancing is disabled
> -		 */
> -		if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> -		    toptier)
> -			goto unlock;
> +		/* Get target node for single threaded private VMAs */
> +		if (!(vma->vm_flags & VM_SHARED) &&
> +		    atomic_read(&vma->vm_mm->mm_users) == 1)
> +			target_node = numa_node_id();
>
> -		if (folio_use_access_time(folio))
> -			folio_xchg_access_time(folio,
> -					       jiffies_to_msecs(jiffies));
> +		if (!folio_needs_prot_numa(pmd_folio(*pmd), vma, target_node))
> +			goto unlock;
>  	}
>  	/*
>  	 * In case prot_numa, we are under mmap_read_lock(mm). It's critical
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..5f63d5c049b1 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1378,6 +1378,8 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
>
>  void __vunmap_range_noflush(unsigned long start, unsigned long end);
>
> +bool folio_needs_prot_numa(struct folio *folio, struct vm_area_struct *vma,
> +		int target_node);
>  int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
>  		      unsigned long addr, int *flags, bool writable,
>  		      int *last_cpupid);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ed44aadb7aaa..0ae8f4a277b2 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,26 +118,30 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
>  	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
>  }
>
> -static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> -		struct folio *folio)
> +/**
> + * folio_needs_prot_numa() - Whether the folio needs prot numa
> + * @folio: The folio.
> + * @vma: The VMA mapping.
> + * @target_node: The numa node being accessed.
> + *
> + * Return: Returns true if folio needs prot numa and the access time of
> + *	   folio is adjusted. Returns false otherwise.
> + */
> +bool folio_needs_prot_numa(struct folio *folio, struct vm_area_struct *vma,
> +		int target_node)
>  {
> -	bool ret = true;
> -	bool toptier;
>  	int nid;
>
> -	if (!folio)
> -		goto skip;
> -
> -	if (folio_is_zone_device(folio) || folio_test_ksm(folio))
> -		goto skip;
> +	if (!folio || folio_is_zone_device(folio) || folio_test_ksm(folio))
> +		return false;
>
>  	/* Also skip shared copy-on-write folios */
>  	if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
> -		goto skip;
> +		return false;
>
>  	/* Folios are pinned and can't be migrated */
>  	if (folio_maybe_dma_pinned(folio))
> -		goto skip;
> +		return false;
>
>  	/*
>  	 * While migration can move some dirty pages,
> @@ -145,7 +149,7 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
>  	 * context.
>  	 */
>  	if (folio_is_file_lru(folio) && folio_test_dirty(folio))
> -		goto skip;
> +		return false;
>
>  	/*
>  	 * Don't mess with PTEs if page is already on the node
> @@ -153,23 +157,20 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
>  	 */
>  	nid = folio_nid(folio);
>  	if (target_node == nid)
> -		goto skip;
> -
> -	toptier = node_is_toptier(nid);
> +		return false;
>
>  	/*
>  	 * Skip scanning top tier node if normal numa
>  	 * balancing is disabled
>  	 */
> -	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && toptier)
> -		goto skip;
> +	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> +	    node_is_toptier(nid))
> +		return false;
>
> -	ret = false;
>  	if (folio_use_access_time(folio))
>  		folio_xchg_access_time(folio, jiffies_to_msecs(jiffies));
>
> -skip:
> -	return ret;
> +	return true;
>  }
>
>  /* Set nr_ptes number of ptes, starting from idx */
> @@ -314,8 +315,8 @@ static long change_pte_range(struct mmu_gather *tlb,
>  			 * Avoid trapping faults against the zero or KSM
>  			 * pages. See similar comment in change_huge_pmd.
>  			 */
> -			if (prot_numa && prot_numa_skip(vma, target_node,
> -							folio)) {
> +			if (prot_numa && !folio_needs_prot_numa(folio, vma,
> +								target_node)) {
>  				/* determine batch to skip */
>  				nr_ptes = mprotect_folio_pte_batch(folio,
>  					  pte, oldpte, max_nr_ptes, /* flags = */ 0);
> --
> 2.27.0
>
>



* Re: [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
  2025-10-16 21:10       ` Andrew Morton
  2025-10-17  8:46         ` Lorenzo Stoakes
@ 2025-10-17 14:12         ` Kefeng Wang
  1 sibling, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-17 14:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lorenzo Stoakes, David Hildenbrand, linux-mm, Zi Yan,
	Baolin Wang, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Liam.Howlett, Sidhartha Kumar



On 2025/10/17 5:10, Andrew Morton wrote:
> On Thu, 16 Oct 2025 09:00:34 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> 
>> On 2025/10/15 23:32, Lorenzo Stoakes wrote:
>>> On Wed, Oct 15, 2025 at 08:35:14PM +0800, Kefeng Wang wrote:
>>>> If the folio(even not CoW folio) is dma pinned, it can't be
>>>> migrated, so always skip pinned folio to avoid a waste of cycles
>>>> when folios migration.
>>>
>>> NIT: 'when folios migration' -> 'when folios are migrated'.
>>>
>>> Maybe worth saying 'due to elevated reference count?'
>>>
>>
>> Hope Andrew could help to update the changelog.
>>
> 
> I changed it to this:
> 
> : If the folio (even not CoW folio) is dma pinned, it can't be migrated
> : due to the elevated reference count.  So always skip a pinned folio to
> : avoid wasting cycles when folios are migrated.
> 

Thank you Andrew. Since Lorenzo/David have some comments, I think I need to
send a new version, but thank you for all the help.




* Re: [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio
  2025-10-17 10:07   ` Lorenzo Stoakes
@ 2025-10-17 14:13     ` Kefeng Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Kefeng Wang @ 2025-10-17 14:13 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Zi Yan, Baolin Wang,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Liam.Howlett,
	Sidhartha Kumar



On 2025/10/17 18:07, Lorenzo Stoakes wrote:
> On Wed, Oct 15, 2025 at 08:35:16PM +0800, Kefeng Wang wrote:
>> Rename prot_numa_skip() to folio_needs_prot_numa(), and remove
>> ret by directly return value instead of goto style.
>>
>> The folio checks for prot numa should be suitable for pmd folio
>> too, which helps to avoid unnecessary pmd change and folio
>> migration attempts.
>>
>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> Yeah you're doing too much at once here.
> 
> You seem to be combining refactoring with changing actual functionality which is
> just not helpful, could you split this into two patches please?
> 
> Otherwise on bisect if you break the PMD stuff we'll end up reverting the
> refactoring too.

OK, the refactoring will come first; I will split it out.
> 
> Plus it makes review harder.
> 
Thanks.




end of thread

Thread overview: 22+ messages
2025-10-15 12:35 [PATCH v3 0/3] mm: some optimizations for prot numa Kefeng Wang
2025-10-15 12:35 ` [PATCH v3 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
2025-10-15 15:32   ` Lorenzo Stoakes
2025-10-16  1:00     ` Kefeng Wang
2025-10-16 21:10       ` Andrew Morton
2025-10-17  8:46         ` Lorenzo Stoakes
2025-10-17 14:12         ` Kefeng Wang
2025-10-16  1:14   ` Zi Yan
2025-10-16 17:53   ` Dev Jain
2025-10-15 12:35 ` [PATCH v3 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
2025-10-15 15:43   ` Lorenzo Stoakes
2025-10-15 17:45     ` David Hildenbrand
2025-10-16  1:07       ` Kefeng Wang
2025-10-17  8:47         ` Lorenzo Stoakes
2025-10-17  8:46       ` Lorenzo Stoakes
2025-10-16  1:28   ` Zi Yan
2025-10-16  1:35     ` Kefeng Wang
2025-10-16 18:02   ` Dev Jain
2025-10-15 12:35 ` [PATCH v3 3/3] mm: huge_memory: use folio_needs_prot_numa() for pmd folio Kefeng Wang
2025-10-16 19:19   ` David Hildenbrand
2025-10-17 10:07   ` Lorenzo Stoakes
2025-10-17 14:13     ` Kefeng Wang
