* [PATCH v2 0/2] Expand scope of khugepaged anonymous collapse
From: Dev Jain @ 2025-09-08 7:50 UTC
To: akpm, david, kas, willy, hughd
Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
ryan.roberts, baohua, richard.weiyang, linux-mm, linux-kernel,
Dev Jain
Currently khugepaged does not collapse an anonymous region unless at least
one of its ptes is writable. This is wasteful, since a region mapped only
with non-writable ptes, for example a non-writable VMA mapped by the
application, can never benefit from THP collapse.
An additional consequence of this constraint is that MADV_COLLAPSE does not
collapse a non-writable VMA, and this restriction is not documented in the
man page. The restriction itself sounds wrong to me: the user knows the
protection of the memory it has mapped, so collapsing read-only memory via
madvise() should be the user's choice and should not be overridden by the
kernel.
Therefore, remove this constraint.
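A minimal userspace sketch of the MADV_COLLAPSE case described above. It
assumes a 2 MiB PMD size (4K base pages, as on typical x86-64/arm64 setups)
and falls back to the generic UAPI value 25 when the installed headers
predate MADV_COLLAPSE. It populates one PMD-sized anonymous range, drops
write permission so that no pte in the range is writable, and then requests
a synchronous collapse. Without this series the request is expected to be
refused because no pte is writable; with it applied, that reason goes away,
although the collapse can still fail for other reasons (for example, if a
huge page cannot be allocated). ASSUMED_PMD_SIZE, align_up() and
collapse_readonly_anon() are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* generic Linux UAPI value (assumption for old headers) */
#endif

/* Assumption: 4K base pages, so one PMD maps 2 MiB. */
#define ASSUMED_PMD_SIZE (2UL << 20)

/* Round addr up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	return (addr + align - 1) & ~(uintptr_t)(align - 1);
}

/*
 * Populate one PMD-aligned anonymous range, make it read-only, then ask
 * for a synchronous collapse. Returns 0 on success or -errno on failure.
 */
static int collapse_readonly_anon(void)
{
	/* Over-allocate so that a PMD-aligned range fits inside the mapping. */
	size_t len = 2 * ASSUMED_PMD_SIZE;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -errno;

	char *pmd = (char *)align_up((uintptr_t)raw, ASSUMED_PMD_SIZE);
	assert(((uintptr_t)pmd & (ASSUMED_PMD_SIZE - 1)) == 0);

	/* Fault in real anonymous pages rather than the shared zero page. */
	memset(pmd, 0x5a, ASSUMED_PMD_SIZE);

	/* After this, every pte in the range is non-writable. */
	int ret = mprotect(pmd, ASSUMED_PMD_SIZE, PROT_READ);
	if (ret == 0)
		ret = madvise(pmd, ASSUMED_PMD_SIZE, MADV_COLLAPSE);
	if (ret != 0)
		ret = -errno;	/* capture errno before munmap() */

	munmap(raw, len);
	return ret;
}
```

Comparing the return value of collapse_readonly_anon() on kernels with and
without this series shows the behaviour change. The result is only a hint,
since any other collapse failure is reported the same way.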
On an arm64 bare-metal machine, compared with vanilla 6.17-rc2, some
mmtests benchmarks, particularly hackbench, improve by about 5% on average,
with a maximum improvement of 12%. Values are the relative change in the
reported metric (seconds or usec), so lower is better. In the following
table, (I) denotes a statistically significant improvement and (R) a
statistically significant regression.
+-------------------------+--------------------------------+---------------+
| mmtests/hackbench       | process-pipes-1 (seconds)      |        -0.06% |
|                         | process-pipes-4 (seconds)      |        -0.27% |
|                         | process-pipes-7 (seconds)      |   (I) -12.13% |
|                         | process-pipes-12 (seconds)     |    (I) -5.32% |
|                         | process-pipes-21 (seconds)     |    (I) -2.87% |
|                         | process-pipes-30 (seconds)     |    (I) -3.39% |
|                         | process-pipes-48 (seconds)     |    (I) -5.65% |
|                         | process-pipes-79 (seconds)     |    (I) -6.74% |
|                         | process-pipes-110 (seconds)    |    (I) -6.26% |
|                         | process-pipes-141 (seconds)    |    (I) -4.99% |
|                         | process-pipes-172 (seconds)    |    (I) -4.45% |
|                         | process-pipes-203 (seconds)    |    (I) -3.65% |
|                         | process-pipes-234 (seconds)    |    (I) -3.45% |
|                         | process-pipes-256 (seconds)    |    (I) -3.47% |
|                         | process-sockets-1 (seconds)    |         2.13% |
|                         | process-sockets-4 (seconds)    |         1.02% |
|                         | process-sockets-7 (seconds)    |        -0.26% |
|                         | process-sockets-12 (seconds)   |        -1.24% |
|                         | process-sockets-21 (seconds)   |         0.01% |
|                         | process-sockets-30 (seconds)   |        -0.15% |
|                         | process-sockets-48 (seconds)   |         0.15% |
|                         | process-sockets-79 (seconds)   |         1.45% |
|                         | process-sockets-110 (seconds)  |        -1.64% |
|                         | process-sockets-141 (seconds)  |    (I) -4.27% |
|                         | process-sockets-172 (seconds)  |         0.30% |
|                         | process-sockets-203 (seconds)  |        -1.71% |
|                         | process-sockets-234 (seconds)  |        -1.94% |
|                         | process-sockets-256 (seconds)  |        -0.71% |
|                         | thread-pipes-1 (seconds)       |         0.66% |
|                         | thread-pipes-4 (seconds)       |         1.66% |
|                         | thread-pipes-7 (seconds)       |        -0.17% |
|                         | thread-pipes-12 (seconds)      |    (I) -4.12% |
|                         | thread-pipes-21 (seconds)      |    (I) -2.13% |
|                         | thread-pipes-30 (seconds)      |    (I) -3.78% |
|                         | thread-pipes-48 (seconds)      |    (I) -5.77% |
|                         | thread-pipes-79 (seconds)      |    (I) -5.31% |
|                         | thread-pipes-110 (seconds)     |    (I) -6.12% |
|                         | thread-pipes-141 (seconds)     |    (I) -4.00% |
|                         | thread-pipes-172 (seconds)     |    (I) -3.01% |
|                         | thread-pipes-203 (seconds)     |    (I) -2.62% |
|                         | thread-pipes-234 (seconds)     |    (I) -2.00% |
|                         | thread-pipes-256 (seconds)     |    (I) -2.30% |
|                         | thread-sockets-1 (seconds)     |     (R) 2.39% |
+-------------------------+--------------------------------+---------------+
| mmtests/sysbench-mutex  | sysbenchmutex-1 (usec)         |        -0.02% |
|                         | sysbenchmutex-4 (usec)         |        -0.02% |
|                         | sysbenchmutex-7 (usec)         |         0.00% |
|                         | sysbenchmutex-12 (usec)        |         0.12% |
|                         | sysbenchmutex-21 (usec)        |        -0.40% |
|                         | sysbenchmutex-30 (usec)        |         0.08% |
|                         | sysbenchmutex-48 (usec)        |         2.59% |
|                         | sysbenchmutex-79 (usec)        |        -0.80% |
|                         | sysbenchmutex-110 (usec)       |        -3.87% |
|                         | sysbenchmutex-128 (usec)       |    (I) -4.46% |
+-------------------------+--------------------------------+---------------+
---
Based on today's mm-new.
v1->v2:
- Replace non-writable VMAs with non-writable PTEs to be more specific
- Add cover letter
RFC->v1:
- Drop writable references from tracepoints
RFC:
- https://lore.kernel.org/all/20250901074817.73012-1-dev.jain@arm.com/
Dev Jain (2):
mm: Enable khugepaged anonymous collapse on non-writable regions
mm: Drop all references of writable and SCAN_PAGE_RO
include/trace/events/huge_memory.h | 19 ++++++-------------
mm/khugepaged.c | 23 +++++------------------
2 files changed, 11 insertions(+), 31 deletions(-)
--
2.30.2
* [PATCH v2 1/2] mm: Enable khugepaged anonymous collapse on non-writable regions
From: Dev Jain @ 2025-09-08 7:50 UTC
To: akpm, david, kas, willy, hughd
Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
ryan.roberts, baohua, richard.weiyang, linux-mm, linux-kernel,
Dev Jain
Currently khugepaged does not collapse an anonymous region unless at least
one of its ptes is writable. This is wasteful, since a region mapped only
with non-writable ptes, for example a non-writable VMA mapped by the
application, can never benefit from THP collapse.
An additional consequence of this constraint is that MADV_COLLAPSE does not
collapse a non-writable VMA, and this restriction is not documented in the
man page. The restriction itself sounds wrong to me: the user knows the
protection of the memory it has mapped, so collapsing read-only memory via
madvise() should be the user's choice and should not be overridden by the
kernel.
Therefore, remove this restriction by not honouring SCAN_PAGE_RO.
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
mm/khugepaged.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4ec324a4c1fe..a0f1df2a7ae6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -676,9 +676,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
writable = true;
}
- if (unlikely(!writable)) {
- result = SCAN_PAGE_RO;
- } else if (unlikely(cc->is_khugepaged && !referenced)) {
+ if (unlikely(cc->is_khugepaged && !referenced)) {
result = SCAN_LACK_REFERENCED_PAGE;
} else {
result = SCAN_SUCCEED;
@@ -1421,9 +1419,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
mmu_notifier_test_young(vma->vm_mm, _address)))
referenced++;
}
- if (!writable) {
- result = SCAN_PAGE_RO;
- } else if (cc->is_khugepaged &&
+ if (cc->is_khugepaged &&
(!referenced ||
(unmapped && referenced < HPAGE_PMD_NR / 2))) {
result = SCAN_LACK_REFERENCED_PAGE;
@@ -2830,7 +2826,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
case SCAN_PMD_NULL:
case SCAN_PTE_NON_PRESENT:
case SCAN_PTE_UFFD_WP:
- case SCAN_PAGE_RO:
case SCAN_LACK_REFERENCED_PAGE:
case SCAN_PAGE_NULL:
case SCAN_PAGE_COUNT:
--
2.30.2
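A simplified model of the result selection at the end of
hpage_collapse_scan_pmd() touched by the second hunk above, before and after
this patch (the __collapse_huge_page_isolate() hunk has the same shape
without the unmapped term). It deliberately ignores everything the pte scan
loop checks earlier (none/zero, swap, shared and uffd-wp limits), uses a
hypothetical three-value enum in place of the real enum scan_result, and
assumes HPAGE_PMD_NR is 512. The point it illustrates: MADV_COLLAPSE
(is_khugepaged == false) on an all-read-only range used to stop at
SCAN_PAGE_RO, while khugepaged's referenced-page heuristic is unchanged.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR 512	/* assumption: 4K base pages, 2M PMD */

/* Hypothetical subset of enum scan_result, for illustration only. */
enum sketch_result { R_SUCCEED, R_PAGE_RO, R_LACK_REFERENCED_PAGE };

/* Tail of hpage_collapse_scan_pmd() before this patch (simplified). */
static enum sketch_result decide_old(bool writable, bool is_khugepaged,
				     int referenced, int unmapped)
{
	if (!writable)
		return R_PAGE_RO;	/* no writable pte anywhere in the range */
	if (is_khugepaged &&
	    (!referenced || (unmapped && referenced < HPAGE_PMD_NR / 2)))
		return R_LACK_REFERENCED_PAGE;
	return R_SUCCEED;
}

/* The same decision after this patch: writability no longer matters. */
static enum sketch_result decide_new(bool is_khugepaged,
				     int referenced, int unmapped)
{
	if (is_khugepaged &&
	    (!referenced || (unmapped && referenced < HPAGE_PMD_NR / 2)))
		return R_LACK_REFERENCED_PAGE;
	return R_SUCCEED;
}
```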
* [PATCH v2 2/2] mm: Drop all references of writable and SCAN_PAGE_RO
From: Dev Jain @ 2025-09-08 7:50 UTC
To: akpm, david, kas, willy, hughd
Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
ryan.roberts, baohua, richard.weiyang, linux-mm, linux-kernel,
Dev Jain
Now that nothing acts on the result of the pte_write() checks, drop the
writable tracking, its tracepoint fields, and SCAN_PAGE_RO.
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
include/trace/events/huge_memory.h | 19 ++++++-------------
mm/khugepaged.c | 14 +++-----------
2 files changed, 9 insertions(+), 24 deletions(-)
diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 2305df6cb485..dd94d14a2427 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -19,7 +19,6 @@
EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \
EM( SCAN_PTE_UFFD_WP, "pte_uffd_wp") \
EM( SCAN_PTE_MAPPED_HUGEPAGE, "pte_mapped_hugepage") \
- EM( SCAN_PAGE_RO, "no_writable_page") \
EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \
EM( SCAN_PAGE_NULL, "page_null") \
EM( SCAN_SCAN_ABORT, "scan_aborted") \
@@ -55,15 +54,14 @@ SCAN_STATUS
TRACE_EVENT(mm_khugepaged_scan_pmd,
- TP_PROTO(struct mm_struct *mm, struct folio *folio, bool writable,
+ TP_PROTO(struct mm_struct *mm, struct folio *folio,
int referenced, int none_or_zero, int status, int unmapped),
- TP_ARGS(mm, folio, writable, referenced, none_or_zero, status, unmapped),
+ TP_ARGS(mm, folio, referenced, none_or_zero, status, unmapped),
TP_STRUCT__entry(
__field(struct mm_struct *, mm)
__field(unsigned long, pfn)
- __field(bool, writable)
__field(int, referenced)
__field(int, none_or_zero)
__field(int, status)
@@ -73,17 +71,15 @@ TRACE_EVENT(mm_khugepaged_scan_pmd,
TP_fast_assign(
__entry->mm = mm;
__entry->pfn = folio ? folio_pfn(folio) : -1;
- __entry->writable = writable;
__entry->referenced = referenced;
__entry->none_or_zero = none_or_zero;
__entry->status = status;
__entry->unmapped = unmapped;
),
- TP_printk("mm=%p, scan_pfn=0x%lx, writable=%d, referenced=%d, none_or_zero=%d, status=%s, unmapped=%d",
+ TP_printk("mm=%p, scan_pfn=0x%lx, referenced=%d, none_or_zero=%d, status=%s, unmapped=%d",
__entry->mm,
__entry->pfn,
- __entry->writable,
__entry->referenced,
__entry->none_or_zero,
__print_symbolic(__entry->status, SCAN_STATUS),
@@ -117,15 +113,14 @@ TRACE_EVENT(mm_collapse_huge_page,
TRACE_EVENT(mm_collapse_huge_page_isolate,
TP_PROTO(struct folio *folio, int none_or_zero,
- int referenced, bool writable, int status),
+ int referenced, int status),
- TP_ARGS(folio, none_or_zero, referenced, writable, status),
+ TP_ARGS(folio, none_or_zero, referenced, status),
TP_STRUCT__entry(
__field(unsigned long, pfn)
__field(int, none_or_zero)
__field(int, referenced)
- __field(bool, writable)
__field(int, status)
),
@@ -133,15 +128,13 @@ TRACE_EVENT(mm_collapse_huge_page_isolate,
__entry->pfn = folio ? folio_pfn(folio) : -1;
__entry->none_or_zero = none_or_zero;
__entry->referenced = referenced;
- __entry->writable = writable;
__entry->status = status;
),
- TP_printk("scan_pfn=0x%lx, none_or_zero=%d, referenced=%d, writable=%d, status=%s",
+ TP_printk("scan_pfn=0x%lx, none_or_zero=%d, referenced=%d, status=%s",
__entry->pfn,
__entry->none_or_zero,
__entry->referenced,
- __entry->writable,
__print_symbolic(__entry->status, SCAN_STATUS))
);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a0f1df2a7ae6..af5f5c80fe4e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -39,7 +39,6 @@ enum scan_result {
SCAN_PTE_NON_PRESENT,
SCAN_PTE_UFFD_WP,
SCAN_PTE_MAPPED_HUGEPAGE,
- SCAN_PAGE_RO,
SCAN_LACK_REFERENCED_PAGE,
SCAN_PAGE_NULL,
SCAN_SCAN_ABORT,
@@ -557,7 +556,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
struct folio *folio = NULL;
pte_t *_pte;
int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
- bool writable = false;
for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, address += PAGE_SIZE) {
@@ -671,9 +669,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm,
address)))
referenced++;
-
- if (pte_write(pteval))
- writable = true;
}
if (unlikely(cc->is_khugepaged && !referenced)) {
@@ -681,13 +676,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
} else {
result = SCAN_SUCCEED;
trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
- referenced, writable, result);
+ referenced, result);
return result;
}
out:
release_pte_pages(pte, _pte, compound_pagelist);
trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
- referenced, writable, result);
+ referenced, result);
return result;
}
@@ -1280,7 +1275,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
unsigned long _address;
spinlock_t *ptl;
int node = NUMA_NO_NODE, unmapped = 0;
- bool writable = false;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
@@ -1344,8 +1338,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
result = SCAN_PTE_UFFD_WP;
goto out_unmap;
}
- if (pte_write(pteval))
- writable = true;
page = vm_normal_page(vma, _address, pteval);
if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
@@ -1435,7 +1427,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
*mmap_locked = false;
}
out:
- trace_mm_khugepaged_scan_pmd(mm, folio, writable, referenced,
+ trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
none_or_zero, result, unmapped);
return result;
}
--
2.30.2
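This patch changes the text format of the mm_khugepaged_scan_pmd and
mm_collapse_huge_page_isolate tracepoints: the writable=%d field disappears,
so tooling that parses these lines by field position will see every later
field shift. A hedged sketch of parsing by field name instead, which works
with both the old and the new format. trace_field_int() is a hypothetical
helper, and the sample line used with it is illustrative, not captured
trace output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key=<int>" in a trace line by name rather
 * than by position. Returns 1 and stores the value on success, 0 if the
 * field is absent (e.g. "writable" on kernels with this patch).
 */
static int trace_field_int(const char *line, const char *key, int *out)
{
	char pat[64];
	const char *p = line;
	size_t klen;

	if (snprintf(pat, sizeof(pat), "%s=", key) >= (int)sizeof(pat))
		return 0;	/* key too long for the pattern buffer */
	klen = strlen(pat);

	while ((p = strstr(p, pat)) != NULL) {
		/* Accept only whole field names, so "zero" != "none_or_zero". */
		if (p == line || p[-1] == ' ')
			return sscanf(p + klen, "%d", out) == 1;
		p += klen;
	}
	return 0;
}
```

Looking up "referenced" returns the same value in either format, while
looking up "writable" now reports the field as absent instead of silently
reading a neighbouring column.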
* Re: [PATCH v2 1/2] mm: Enable khugepaged anonymous collapse on non-writable regions
From: Zach O'Keefe @ 2025-09-09 18:49 UTC
To: Dev Jain
Cc: akpm, david, kas, willy, hughd, ziy, baolin.wang,
lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, baohua,
richard.weiyang, linux-mm, linux-kernel
On Mon, Sep 8, 2025 at 12:51 AM Dev Jain <dev.jain@arm.com> wrote:
>
> Currently khugepaged does not collapse an anonymous region which does not
> have a single writable pte. This is wasteful since a region mapped with
> non-writable ptes, for example, non-writable VMAs mapped by the
> application, won't benefit from THP collapse.
>
> An additional consequence of this constraint is that MADV_COLLAPSE does not
> perform a collapse on a non-writable VMA, and this restriction is nowhere
> to be found on the manpage - the restriction itself sounds wrong to me
> since the user knows the protection of the memory it has mapped, so
> collapsing read-only memory via madvise() should be a choice of the
> user which shouldn't be overridden by the kernel.
Sorry; late to the party. Certainly agree wrt MADV_COLLAPSE.
Ditto for khugepaged. The check was added when support for
non-writable pages was added to khugepaged in 10359213d05a
("mm: incorporate read-only pages into transparent huge pages"),
which retained the heuristic that at least one pte should be
writable, and which predates max_ptes_swap.
> Therefore, remove this restriction by not honouring SCAN_PAGE_RO.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zach O'Keefe <zokeefe@google.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
* Re: [PATCH v2 2/2] mm: Drop all references of writable and SCAN_PAGE_RO
From: Zach O'Keefe @ 2025-09-09 18:51 UTC
To: Dev Jain
Cc: akpm, david, kas, willy, hughd, ziy, baolin.wang,
lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, baohua,
richard.weiyang, linux-mm, linux-kernel
Thanks, Dev.
On Mon, Sep 8, 2025 at 12:51 AM Dev Jain <dev.jain@arm.com> wrote:
>
> Now that all actionable outcomes from checking pte_write() are gone,
> drop the related references.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zach O'Keefe <zokeefe@google.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
* Re: [PATCH v2 1/2] mm: Enable khugepaged anonymous collapse on non-writable regions
From: Anshuman Khandual @ 2025-09-10 4:03 UTC
To: Dev Jain, akpm, david, kas, willy, hughd
Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
ryan.roberts, baohua, richard.weiyang, linux-mm, linux-kernel
On 08/09/25 1:20 PM, Dev Jain wrote:
> Currently khugepaged does not collapse an anonymous region which does not
> have a single writable pte. This is wasteful since a region mapped with
> non-writable ptes, for example, non-writable VMAs mapped by the
> application, won't benefit from THP collapse.
>
> An additional consequence of this constraint is that MADV_COLLAPSE does not
> perform a collapse on a non-writable VMA, and this restriction is nowhere
> to be found on the manpage - the restriction itself sounds wrong to me
> since the user knows the protection of the memory it has mapped, so
> collapsing read-only memory via madvise() should be a choice of the
> user which shouldn't be overridden by the kernel.
Agreed. Dropping this constraint makes sense both for MADV_COLLAPSE
via madvise() and for khugepaged-based collapse.
>
> Therefore, remove this restriction by not honouring SCAN_PAGE_RO.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
* Re: [PATCH v2 2/2] mm: Drop all references of writable and SCAN_PAGE_RO
From: Anshuman Khandual @ 2025-09-10 4:06 UTC
To: Dev Jain, akpm, david, kas, willy, hughd
Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
ryan.roberts, baohua, richard.weiyang, linux-mm, linux-kernel
On 08/09/25 1:20 PM, Dev Jain wrote:
> Now that all actionable outcomes from checking pte_write() are gone,
> drop the related references.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>