* [PATCH v2 0/3] mm: some optimizations for prot numa
@ 2025-10-14 11:33 Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Kefeng Wang @ 2025-10-14 11:33 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Liam.Howlett, Kefeng Wang
v2
- update comments for pte_protnone() in patch2
- rename prot_numa_skip() to folio_skip_prot_numa() and
clean it up a bit
- add RB/ACK
Kefeng Wang (3):
mm: mprotect: always skip dma pinned folio in prot_numa_skip()
mm: mprotect: avoid unnecessary struct page accessing if
pte_protnone()
mm: huge_memory: use folio_skip_prot_numa() for pmd folio
mm/huge_memory.c | 21 ++++++----------
mm/internal.h | 2 ++
mm/mprotect.c | 63 +++++++++++++++++++++---------------------------
3 files changed, 36 insertions(+), 50 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
2025-10-14 11:33 [PATCH v2 0/3] mm: some optimizations for prot numa Kefeng Wang
@ 2025-10-14 11:33 ` Kefeng Wang
2025-10-14 22:20 ` Barry Song
2025-10-14 11:33 ` [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio Kefeng Wang
2 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2025-10-14 11:33 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Liam.Howlett, Kefeng Wang, Sidhartha Kumar
If a folio is DMA-pinned (even when it is not a CoW folio), it cannot
be migrated, so always skip pinned folios to avoid wasting cycles on
migration attempts that are bound to fail.
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/mprotect.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 113b48985834..bb59a42809b8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -136,9 +136,12 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
if (folio_is_zone_device(folio) || folio_test_ksm(folio))
goto skip;
- /* Also skip shared copy-on-write pages */
- if (is_cow_mapping(vma->vm_flags) &&
- (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
+ /* Also skip shared copy-on-write folios */
+ if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
+ goto skip;
+
+ /* Folios are pinned and can't be migrated */
+ if (folio_maybe_dma_pinned(folio))
goto skip;
/*
--
2.27.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
2025-10-14 11:33 [PATCH v2 0/3] mm: some optimizations for prot numa Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
@ 2025-10-14 11:33 ` Kefeng Wang
2025-10-14 12:13 ` David Hildenbrand
2025-10-15 7:32 ` Dev Jain
2025-10-14 11:33 ` [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio Kefeng Wang
2 siblings, 2 replies; 14+ messages in thread
From: Kefeng Wang @ 2025-10-14 11:33 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Liam.Howlett, Kefeng Wang, Sidhartha Kumar
If pte_protnone() is true, we can avoid the unnecessary struct page
access and reduce the cache footprint when scanning page tables for
prot numa. The pmbench memory-access benchmark should benefit; see
commit a818f5363a0e ("autonuma: reduce cache footprint when scanning
page tables").
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/mprotect.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index bb59a42809b8..7affa88a6de7 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
}
-static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
- pte_t oldpte, pte_t *pte, int target_node,
- struct folio *folio)
+static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
+ struct folio *folio)
{
bool ret = true;
bool toptier;
int nid;
- /* Avoid TLB flush if possible */
- if (pte_protnone(oldpte))
- goto skip;
-
if (!folio)
goto skip;
@@ -307,23 +302,23 @@ static long change_pte_range(struct mmu_gather *tlb,
struct page *page;
pte_t ptent;
+ /* Already in the desired state. */
+ if (prot_numa && pte_protnone(oldpte))
+ continue;
+
page = vm_normal_page(vma, addr, oldpte);
if (page)
folio = page_folio(page);
+
/*
* Avoid trapping faults against the zero or KSM
* pages. See similar comment in change_huge_pmd.
*/
- if (prot_numa) {
- int ret = prot_numa_skip(vma, addr, oldpte, pte,
- target_node, folio);
- if (ret) {
-
- /* determine batch to skip */
- nr_ptes = mprotect_folio_pte_batch(folio,
- pte, oldpte, max_nr_ptes, /* flags = */ 0);
- continue;
- }
+ if (prot_numa & prot_numa_skip(vma, target_node, folio)) {
+ /* determine batch to skip */
+ nr_ptes = mprotect_folio_pte_batch(folio,
+ pte, oldpte, max_nr_ptes, /* flags = */ 0);
+ continue;
}
nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
--
2.27.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-14 11:33 [PATCH v2 0/3] mm: some optimizations for prot numa Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
@ 2025-10-14 11:33 ` Kefeng Wang
2025-10-15 7:30 ` Dev Jain
2 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2025-10-14 11:33 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Liam.Howlett, Kefeng Wang, Sidhartha Kumar
Rename prot_numa_skip() to folio_skip_prot_numa(), and drop the local
ret variable by returning directly instead of using the goto style.
The folio skip checks for prot numa are suitable for pmd folios too,
which helps avoid unnecessary pmd changes and folio migration attempts.
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/huge_memory.c | 21 +++++++--------------
mm/internal.h | 2 ++
mm/mprotect.c | 35 ++++++++++++++---------------------
3 files changed, 23 insertions(+), 35 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1d1b74950332..8ae17e0aacb9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2395,8 +2395,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
#endif
if (prot_numa) {
- struct folio *folio;
- bool toptier;
+ int target_node = NUMA_NO_NODE;
/*
* Avoid trapping faults against the zero page. The read-only
* data is likely to be read-cached on the local CPU and
@@ -2408,19 +2407,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (pmd_protnone(*pmd))
goto unlock;
- folio = pmd_folio(*pmd);
- toptier = node_is_toptier(folio_nid(folio));
- /*
- * Skip scanning top tier node if normal numa
- * balancing is disabled
- */
- if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
- toptier)
- goto unlock;
+ /* Get target node for single threaded private VMAs */
+ if (!(vma->vm_flags & VM_SHARED) &&
+ atomic_read(&vma->vm_mm->mm_users) == 1)
+ target_node = numa_node_id();
- if (folio_use_access_time(folio))
- folio_xchg_access_time(folio,
- jiffies_to_msecs(jiffies));
+ if (folio_skip_prot_numa(pmd_folio(*pmd), vma, target_node))
+ goto unlock;
}
/*
* In case prot_numa, we are under mmap_read_lock(mm). It's critical
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..55daceab3682 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1378,6 +1378,8 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
void __vunmap_range_noflush(unsigned long start, unsigned long end);
+bool folio_skip_prot_numa(struct folio *folio, struct vm_area_struct *vma,
+ int target_node);
int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
unsigned long addr, int *flags, bool writable,
int *last_cpupid);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7affa88a6de7..cec4c80eb46d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -118,26 +118,21 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
}
-static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
- struct folio *folio)
+bool folio_skip_prot_numa(struct folio *folio, struct vm_area_struct *vma,
+ int target_node)
{
- bool ret = true;
- bool toptier;
int nid;
- if (!folio)
- goto skip;
-
- if (folio_is_zone_device(folio) || folio_test_ksm(folio))
- goto skip;
+ if (!folio || folio_is_zone_device(folio) || folio_test_ksm(folio))
+ return true;
/* Also skip shared copy-on-write folios */
if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
- goto skip;
+ return true;
/* Folios are pinned and can't be migrated */
if (folio_maybe_dma_pinned(folio))
- goto skip;
+ return true;
/*
* While migration can move some dirty pages,
@@ -145,7 +140,7 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
* context.
*/
if (folio_is_file_lru(folio) && folio_test_dirty(folio))
- goto skip;
+ return true;
/*
* Don't mess with PTEs if page is already on the node
@@ -153,23 +148,20 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
*/
nid = folio_nid(folio);
if (target_node == nid)
- goto skip;
-
- toptier = node_is_toptier(nid);
+ return true;
/*
* Skip scanning top tier node if normal numa
* balancing is disabled
*/
- if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && toptier)
- goto skip;
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
+ node_is_toptier(nid))
+ return true;
- ret = false;
if (folio_use_access_time(folio))
folio_xchg_access_time(folio, jiffies_to_msecs(jiffies));
-skip:
- return ret;
+ return false;
}
/* Set nr_ptes number of ptes, starting from idx */
@@ -314,7 +306,8 @@ static long change_pte_range(struct mmu_gather *tlb,
* Avoid trapping faults against the zero or KSM
* pages. See similar comment in change_huge_pmd.
*/
- if (prot_numa & prot_numa_skip(vma, target_node, folio)) {
+ if (prot_numa & folio_skip_prot_numa(folio, vma,
+ target_node)) {
/* determine batch to skip */
nr_ptes = mprotect_folio_pte_batch(folio,
pte, oldpte, max_nr_ptes, /* flags = */ 0);
--
2.27.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
2025-10-14 11:33 ` [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
@ 2025-10-14 12:13 ` David Hildenbrand
2025-10-15 7:32 ` Dev Jain
1 sibling, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-10-14 12:13 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Liam.Howlett, Sidhartha Kumar
On 14.10.25 13:33, Kefeng Wang wrote:
> If pte_protnone() is true, we can avoid the unnecessary struct page
> access and reduce the cache footprint when scanning page tables for
> prot numa. The pmbench memory-access benchmark should benefit; see
> commit a818f5363a0e ("autonuma: reduce cache footprint when scanning
> page tables").
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip()
2025-10-14 11:33 ` [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
@ 2025-10-14 22:20 ` Barry Song
0 siblings, 0 replies; 14+ messages in thread
From: Barry Song @ 2025-10-14 22:20 UTC (permalink / raw)
To: wangkefeng.wang
Cc: Liam.Howlett, akpm, baohua, baolin.wang, david, dev.jain,
lance.yang, linux-mm, lorenzo.stoakes, ryan.roberts,
sidhartha.kumar, ziy
> If a folio is DMA-pinned (even when it is not a CoW folio), it cannot
> be migrated, so always skip pinned folios to avoid wasting cycles on
> migration attempts that are bound to fail.
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
LGTM,
Reviewed-by: Barry Song <baohua@kernel.org>
> ---
> mm/mprotect.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-14 11:33 ` [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio Kefeng Wang
@ 2025-10-15 7:30 ` Dev Jain
2025-10-15 9:21 ` Kefeng Wang
0 siblings, 1 reply; 14+ messages in thread
From: Dev Jain @ 2025-10-15 7:30 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 14/10/25 5:03 pm, Kefeng Wang wrote:
> Rename prot_numa_skip() to folio_skip_prot_numa(), and drop the local
> ret variable by returning directly instead of using the goto style.
>
> The folio skip checks for prot numa are suitable for pmd folios too,
> which helps avoid unnecessary pmd changes and folio migration attempts.
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
In the review of my mprotect pte batching series, reviewers had
noted that the branch "if (folio_use_access_time(folio))" in
folio_skip_prot_numa() did not belong there - it should be done
outside of the function. But I see that that would duplicate a line
now that this function has two users. So in light of that, would you
mind changing the name of this function to folio_skip_or_process_prot_numa()?
> ---
> mm/huge_memory.c | 21 +++++++--------------
> mm/internal.h | 2 ++
> mm/mprotect.c | 35 ++++++++++++++---------------------
> 3 files changed, 23 insertions(+), 35 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1d1b74950332..8ae17e0aacb9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2395,8 +2395,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> #endif
>
> if (prot_numa) {
> - struct folio *folio;
> - bool toptier;
> + int target_node = NUMA_NO_NODE;
> /*
> * Avoid trapping faults against the zero page. The read-only
> * data is likely to be read-cached on the local CPU and
> @@ -2408,19 +2407,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> if (pmd_protnone(*pmd))
> goto unlock;
>
> - folio = pmd_folio(*pmd);
> - toptier = node_is_toptier(folio_nid(folio));
> - /*
> - * Skip scanning top tier node if normal numa
> - * balancing is disabled
> - */
> - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> - toptier)
> - goto unlock;
> + /* Get target node for single threaded private VMAs */
> + if (!(vma->vm_flags & VM_SHARED) &&
> + atomic_read(&vma->vm_mm->mm_users) == 1)
> + target_node = numa_node_id();
>
> - if (folio_use_access_time(folio))
> - folio_xchg_access_time(folio,
> - jiffies_to_msecs(jiffies));
> + if (folio_skip_prot_numa(pmd_folio(*pmd), vma, target_node))
> + goto unlock;
> }
> /*
> * In case prot_numa, we are under mmap_read_lock(mm). It's critical
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..55daceab3682 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1378,6 +1378,8 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
>
> void __vunmap_range_noflush(unsigned long start, unsigned long end);
>
> +bool folio_skip_prot_numa(struct folio *folio, struct vm_area_struct *vma,
> + int target_node);
> int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
> unsigned long addr, int *flags, bool writable,
> int *last_cpupid);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 7affa88a6de7..cec4c80eb46d 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,26 +118,21 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
> return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
> }
>
> -static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> - struct folio *folio)
> +bool folio_skip_prot_numa(struct folio *folio, struct vm_area_struct *vma,
> + int target_node)
> {
> - bool ret = true;
> - bool toptier;
> int nid;
>
> - if (!folio)
> - goto skip;
> -
> - if (folio_is_zone_device(folio) || folio_test_ksm(folio))
> - goto skip;
> + if (!folio || folio_is_zone_device(folio) || folio_test_ksm(folio))
> + return true;
>
> /* Also skip shared copy-on-write folios */
> if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
> - goto skip;
> + return true;
>
> /* Folios are pinned and can't be migrated */
> if (folio_maybe_dma_pinned(folio))
> - goto skip;
> + return true;
>
> /*
> * While migration can move some dirty pages,
> @@ -145,7 +140,7 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> * context.
> */
> if (folio_is_file_lru(folio) && folio_test_dirty(folio))
> - goto skip;
> + return true;
>
> /*
> * Don't mess with PTEs if page is already on the node
> @@ -153,23 +148,20 @@ static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> */
> nid = folio_nid(folio);
> if (target_node == nid)
> - goto skip;
> -
> - toptier = node_is_toptier(nid);
> + return true;
>
> /*
> * Skip scanning top tier node if normal numa
> * balancing is disabled
> */
> - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && toptier)
> - goto skip;
> + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> + node_is_toptier(nid))
> + return true;
>
> - ret = false;
> if (folio_use_access_time(folio))
> folio_xchg_access_time(folio, jiffies_to_msecs(jiffies));
>
> -skip:
> - return ret;
> + return false;
> }
>
> /* Set nr_ptes number of ptes, starting from idx */
> @@ -314,7 +306,8 @@ static long change_pte_range(struct mmu_gather *tlb,
> * Avoid trapping faults against the zero or KSM
> * pages. See similar comment in change_huge_pmd.
> */
> - if (prot_numa & prot_numa_skip(vma, target_node, folio)) {
> + if (prot_numa & folio_skip_prot_numa(folio, vma,
> + target_node)) {
> /* determine batch to skip */
> nr_ptes = mprotect_folio_pte_batch(folio,
> pte, oldpte, max_nr_ptes, /* flags = */ 0);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
2025-10-14 11:33 ` [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
2025-10-14 12:13 ` David Hildenbrand
@ 2025-10-15 7:32 ` Dev Jain
2025-10-15 8:50 ` Kefeng Wang
1 sibling, 1 reply; 14+ messages in thread
From: Dev Jain @ 2025-10-15 7:32 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 14/10/25 5:03 pm, Kefeng Wang wrote:
> If pte_protnone() is true, we can avoid the unnecessary struct page
> access and reduce the cache footprint when scanning page tables for
> prot numa. The pmbench memory-access benchmark should benefit; see
> commit a818f5363a0e ("autonuma: reduce cache footprint when scanning
> page tables").
>
> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
Thanks.
> mm/mprotect.c | 29 ++++++++++++-----------------
> 1 file changed, 12 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index bb59a42809b8..7affa88a6de7 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
> return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
> }
>
> -static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
> - pte_t oldpte, pte_t *pte, int target_node,
> - struct folio *folio)
> +static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
> + struct folio *folio)
> {
> bool ret = true;
> bool toptier;
> int nid;
>
> - /* Avoid TLB flush if possible */
> - if (pte_protnone(oldpte))
> - goto skip;
> -
> if (!folio)
> goto skip;
>
> @@ -307,23 +302,23 @@ static long change_pte_range(struct mmu_gather *tlb,
> struct page *page;
> pte_t ptent;
>
> + /* Already in the desired state. */
> + if (prot_numa && pte_protnone(oldpte))
> + continue;
> +
> page = vm_normal_page(vma, addr, oldpte);
> if (page)
> folio = page_folio(page);
> +
Unrelated change but I guess this is needed since we usually leave a line
before starting a comment.
> /*
> * Avoid trapping faults against the zero or KSM
> * pages. See similar comment in change_huge_pmd.
> */
> - if (prot_numa) {
> - int ret = prot_numa_skip(vma, addr, oldpte, pte,
> - target_node, folio);
> - if (ret) {
> -
> - /* determine batch to skip */
> - nr_ptes = mprotect_folio_pte_batch(folio,
> - pte, oldpte, max_nr_ptes, /* flags = */ 0);
> - continue;
> - }
> + if (prot_numa & prot_numa_skip(vma, target_node, folio)) {
Why not "&&" instead of "&"?
> + /* determine batch to skip */
> + nr_ptes = mprotect_folio_pte_batch(folio,
> + pte, oldpte, max_nr_ptes, /* flags = */ 0);
> + continue;
> }
>
> nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone()
2025-10-15 7:32 ` Dev Jain
@ 2025-10-15 8:50 ` Kefeng Wang
0 siblings, 0 replies; 14+ messages in thread
From: Kefeng Wang @ 2025-10-15 8:50 UTC (permalink / raw)
To: Dev Jain, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 2025/10/15 15:32, Dev Jain wrote:
>
> On 14/10/25 5:03 pm, Kefeng Wang wrote:
>> If pte_protnone() is true, we can avoid the unnecessary struct page
>> access and reduce the cache footprint when scanning page tables for
>> prot numa. The pmbench memory-access benchmark should benefit; see
>> commit a818f5363a0e ("autonuma: reduce cache footprint when scanning
>> page tables").
>>
>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>
> Thanks.
>
>> mm/mprotect.c | 29 ++++++++++++-----------------
>> 1 file changed, 12 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index bb59a42809b8..7affa88a6de7 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -118,18 +118,13 @@ static int mprotect_folio_pte_batch(struct folio
>> *folio, pte_t *ptep,
>> return folio_pte_batch_flags(folio, NULL, ptep, &pte,
>> max_nr_ptes, flags);
>> }
>> -static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long
>> addr,
>> - pte_t oldpte, pte_t *pte, int target_node,
>> - struct folio *folio)
>> +static bool prot_numa_skip(struct vm_area_struct *vma, int target_node,
>> + struct folio *folio)
>> {
>> bool ret = true;
>> bool toptier;
>> int nid;
>> - /* Avoid TLB flush if possible */
>> - if (pte_protnone(oldpte))
>> - goto skip;
>> -
>> if (!folio)
>> goto skip;
>> @@ -307,23 +302,23 @@ static long change_pte_range(struct mmu_gather
>> *tlb,
>> struct page *page;
>> pte_t ptent;
>> + /* Already in the desired state. */
>> + if (prot_numa && pte_protnone(oldpte))
>> + continue;
>> +
>> page = vm_normal_page(vma, addr, oldpte);
>> if (page)
>> folio = page_folio(page);
>> +
>
> Unrelated change but I guess this is needed since we usually leave a line
> before starting a comment.
>
>> /*
>> * Avoid trapping faults against the zero or KSM
>> * pages. See similar comment in change_huge_pmd.
>> */
>> - if (prot_numa) {
>> - int ret = prot_numa_skip(vma, addr, oldpte, pte,
>> - target_node, folio);
>> - if (ret) {
>> -
>> - /* determine batch to skip */
>> - nr_ptes = mprotect_folio_pte_batch(folio,
>> - pte, oldpte, max_nr_ptes, /* flags = */ 0);
>> - continue;
>> - }
>> + if (prot_numa & prot_numa_skip(vma, target_node, folio)) {
>
> Why not "&&" instead of "&"?
oops, will fix it.
>
>> + /* determine batch to skip */
>> + nr_ptes = mprotect_folio_pte_batch(folio,
>> + pte, oldpte, max_nr_ptes, /* flags = */ 0);
>> + continue;
>> }
>> nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte,
>> max_nr_ptes, flags);
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-15 7:30 ` Dev Jain
@ 2025-10-15 9:21 ` Kefeng Wang
2025-10-15 9:54 ` David Hildenbrand
0 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2025-10-15 9:21 UTC (permalink / raw)
To: Dev Jain, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 2025/10/15 15:30, Dev Jain wrote:
>
> On 14/10/25 5:03 pm, Kefeng Wang wrote:
>> Rename prot_numa_skip() to folio_skip_prot_numa(), and drop the local
>> ret variable by returning directly instead of using the goto style.
>>
>> The folio skip checks for prot numa are suitable for pmd folios too,
>> which helps avoid unnecessary pmd changes and folio migration attempts.
>>
>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> In the review of my mprotect pte batching series, reviewers had
> noted that the branch "if (folio_use_access_time(folio))" in
> folio_skip_prot_numa() did not belong there - it should be done
> outside of the function. But I see that that would duplicate a line
> now that this function has two users. So in light of that, would you
> mind changing the name of this function to
> folio_skip_or_process_prot_numa()?
>
The name is a bit long, and it only updates access_time without
changing the pte, so maybe we should leave it as is?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-15 9:21 ` Kefeng Wang
@ 2025-10-15 9:54 ` David Hildenbrand
2025-10-15 11:04 ` Kefeng Wang
0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-10-15 9:54 UTC (permalink / raw)
To: Kefeng Wang, Dev Jain, Andrew Morton, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 15.10.25 11:21, Kefeng Wang wrote:
>
>
> On 2025/10/15 15:30, Dev Jain wrote:
>>
>> On 14/10/25 5:03 pm, Kefeng Wang wrote:
>>> Rename prot_numa_skip() to folio_skip_prot_numa(), and drop the local
>>> ret variable by returning directly instead of using the goto style.
>>>
>>> The folio skip checks for prot numa are suitable for pmd folios too,
>>> which helps avoid unnecessary pmd changes and folio migration attempts.
>>>
>>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>
>> In the review of my mprotect pte batching series, reviewers had
>> noted that the branch "if (folio_use_access_time(folio))" in
>> folio_skip_prot_numa() did not belong there - it should be done
>> outside of the function. But I see that that would duplicate a line
>> now that this function has two users. So in light of that, would you
>> mind changing the name of this function to
>> folio_skip_or_process_prot_numa()?
>>
>
> The name is a bit long, and it only updates access_time without
> changing the pte, so maybe we should leave it as is?
Any such name might make the return value weird (it indicates whether
to skip), I'm afraid.
We could invert the meaning and call it something like
folio_apply_prot_numa()
And return whether we have to protect it.
Maybe that's better? Other naming suggestions welcome :)
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-15 9:54 ` David Hildenbrand
@ 2025-10-15 11:04 ` Kefeng Wang
2025-10-15 11:30 ` David Hildenbrand
0 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2025-10-15 11:04 UTC (permalink / raw)
To: David Hildenbrand, Dev Jain, Andrew Morton, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 2025/10/15 17:54, David Hildenbrand wrote:
> On 15.10.25 11:21, Kefeng Wang wrote:
>>
>>
>> On 2025/10/15 15:30, Dev Jain wrote:
>>>
>>> On 14/10/25 5:03 pm, Kefeng Wang wrote:
>>>> Rename prot_numa_skip() to folio_skip_prot_numa(), and drop the local
>>>> ret variable by returning directly instead of using the goto style.
>>>>
>>>> The folio skip checks for prot numa are suitable for pmd folios too,
>>>> which helps avoid unnecessary pmd changes and folio migration attempts.
>>>>
>>>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>
>>> In the review of my mprotect pte batching series, reviewers had
>>> noted that the branch "if (folio_use_access_time(folio))" in
>>> folio_skip_prot_numa() did not belong there - it should be done
>>> outside of the function. But I see that that would duplicate a line
>>> now that this function has two users. So in light of that, would you
>>> mind changing the name of this function to
>>> folio_skip_or_process_prot_numa()?
>>>
>>
>> The name is a bit long, and it only updates access_time without
>> changing the pte, so maybe we should leave it as is?
>
> Any such name might make the return value weird (which indicates whether
> to skip) I'm afraid.
>
> We could invert the meaning and call it something like
>
> folio_apply_prot_numa()
>
> And return whether we have to protect it.
>
> Maybe that's better? Other naming suggestions welcome :)
>
That's better, or folio_needs_prot_numa(), as there are already some
similar names with the folio_needs_ prefix?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-15 11:04 ` Kefeng Wang
@ 2025-10-15 11:30 ` David Hildenbrand
2025-10-15 12:37 ` Kefeng Wang
0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-10-15 11:30 UTC (permalink / raw)
To: Kefeng Wang, Dev Jain, Andrew Morton, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 15.10.25 13:04, Kefeng Wang wrote:
>
>
> On 2025/10/15 17:54, David Hildenbrand wrote:
>> On 15.10.25 11:21, Kefeng Wang wrote:
>>>
>>>
>>> On 2025/10/15 15:30, Dev Jain wrote:
>>>>
>>>> On 14/10/25 5:03 pm, Kefeng Wang wrote:
>>>>> Rename prot_numa_skip() to folio_skip_prot_numa(), and drop the local
>>>>> ret variable by returning directly instead of using the goto style.
>>>>>
>>>>> The folio skip checks for prot numa are suitable for pmd folios too,
>>>>> which helps avoid unnecessary pmd changes and folio migration attempts.
>>>>>
>>>>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>
>>>> In the review of my mprotect pte batching series, reviewers had
>>>> noted that the branch "if (folio_use_access_time(folio))" in
>>>> folio_skip_prot_numa() did not belong there - it should be done
>>>> outside of the function. But I see that that would duplicate a line
>>>> now that this function has two users. So in light of that, would you
>>>> mind changing the name of this function to
>>>> folio_skip_or_process_prot_numa()?
>>>>
>>>
>>> The name is a bit long, and it only updates access_time without changing
>>> the pte, so maybe we leave it as is?
>>
>> Any such name might make the return value weird (which indicates whether
>> to skip) I'm afraid.
>>
>> We could invert the meaning and call it something like
>>
>> folio_apply_prot_numa()
>>
>> And return whether we have to protect it.
>>
>> Maybe that's better? Other naming suggestions welcome :)
>>
>
> That's better, or folio_needs_prot_numa(), as there are already some
> similar names with the folio_needs_ prefix?
Yeah, why not. Then we can also add kerneldoc to describe that it will
adjust the access time if the function returns true.
--
Cheers
David / dhildenb
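The inverted-return variant that David and Kefeng converge on above could look roughly like this. Again a stand-in sketch under the same assumptions (placeholder struct folio and time argument, not the kernel's real definitions), showing a folio_needs_prot_numa() that returns whether to protect and documents its access-time side effect in kerneldoc, as suggested:

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct folio. */
struct folio {
	bool maybe_dma_pinned;
	bool use_access_time;
	unsigned long access_time;
};

/**
 * folio_needs_prot_numa - should this folio get NUMA-hinting protection?
 * @folio: folio under consideration
 * @now: current time stamp (placeholder for the kernel's notion of time)
 *
 * Return: true if the caller should apply the protection; false if the
 * folio must be left alone (e.g. it is dma-pinned and cannot migrate).
 *
 * Side effect: refreshes @folio's access time when the folio tracks it,
 * which is the behaviour the thread agrees belongs in the kerneldoc.
 */
static bool folio_needs_prot_numa(struct folio *folio, unsigned long now)
{
	if (folio->maybe_dma_pinned)
		return false;

	if (folio->use_access_time)
		folio->access_time = now;

	return true;
}
```

The inversion keeps the return value natural ("does it need protection?") while the kerneldoc, rather than the name, carries the side effect.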
* Re: [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio
2025-10-15 11:30 ` David Hildenbrand
@ 2025-10-15 12:37 ` Kefeng Wang
0 siblings, 0 replies; 14+ messages in thread
From: Kefeng Wang @ 2025-10-15 12:37 UTC (permalink / raw)
To: David Hildenbrand, Dev Jain, Andrew Morton, Lorenzo Stoakes, linux-mm
Cc: Zi Yan, Baolin Wang, Ryan Roberts, Barry Song, Lance Yang,
Liam.Howlett, Sidhartha Kumar
On 2025/10/15 19:30, David Hildenbrand wrote:
> On 15.10.25 13:04, Kefeng Wang wrote:
>>
>>
>> On 2025/10/15 17:54, David Hildenbrand wrote:
>>> On 15.10.25 11:21, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2025/10/15 15:30, Dev Jain wrote:
>>>>>
>>>>> On 14/10/25 5:03 pm, Kefeng Wang wrote:
>>>>>> Rename prot_numa_skip() to folio_skip_prot_numa(), and remove
>>>>>> ret by returning the value directly instead of using the goto style.
>>>>>>
>>>>>> The folio skip checks for prot numa should be suitable for pmd
>>>>>> folios too, which helps avoid unnecessary pmd changes and folio
>>>>>> migration attempts.
>>>>>>
>>>>>> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>
>>>>> In the review of my mprotect pte batching series, reviewers had
>>>>> noted that the branch "if (folio_use_access_time(folio))" in
>>>>> folio_skip_prot_numa() did not belong there - it should be done
>>>>> outside of the function. But I see that that would duplicate a line
>>>>> now that this function has two users. So in light of that, would you
>>>>> mind changing the name of this function to
>>>>> folio_skip_or_process_prot_numa()?
>>>>>
>>>>
>>>> The name is a bit long, and it only updates access_time without changing
>>>> the pte, so maybe we leave it as is?
>>>
>>> Any such name might make the return value weird (which indicates whether
>>> to skip) I'm afraid.
>>>
>>> We could invert the meaning and call it something like
>>>
>>> folio_apply_prot_numa()
>>>
>>> And return whether we have to protect it.
>>>
>>> Maybe that's better? Other naming suggestions welcome :)
>>>
>>
>> That's better, or folio_needs_prot_numa(), as there are already some
>> similar names with the folio_needs_ prefix?
>
> Yeah, why not. Then we can also add kerneldoc to describe that it will
> adjust the access time if the function returns true.
>
Done in v3.
Thread overview: 14+ messages
2025-10-14 11:33 [PATCH v2 0/3] mm: some optimizations for prot numa Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 1/3] mm: mprotect: always skip dma pinned folio in prot_numa_skip() Kefeng Wang
2025-10-14 22:20 ` Barry Song
2025-10-14 11:33 ` [PATCH v2 2/3] mm: mprotect: avoid unnecessary struct page accessing if pte_protnone() Kefeng Wang
2025-10-14 12:13 ` David Hildenbrand
2025-10-15 7:32 ` Dev Jain
2025-10-15 8:50 ` Kefeng Wang
2025-10-14 11:33 ` [PATCH v2 3/3] mm: huge_memory: use folio_skip_prot_numa() for pmd folio Kefeng Wang
2025-10-15 7:30 ` Dev Jain
2025-10-15 9:21 ` Kefeng Wang
2025-10-15 9:54 ` David Hildenbrand
2025-10-15 11:04 ` Kefeng Wang
2025-10-15 11:30 ` David Hildenbrand
2025-10-15 12:37 ` Kefeng Wang