* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 2:14 [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages Lance Yang
@ 2025-09-22 2:36 ` Zi Yan
2025-09-22 3:36 ` Lance Yang
2025-09-22 7:41 ` David Hildenbrand
` (3 subsequent siblings)
4 siblings, 1 reply; 22+ messages in thread
From: Zi Yan @ 2025-09-22 2:36 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, usamaarif642, yuzhao, baolin.wang,
baohua, voidice, Liam.Howlett, catalin.marinas,
cerasuolodomenico, hannes, kaleshsingh, npache, riel,
roman.gushchin, rppt, ryan.roberts, dev.jain, ryncsn,
shakeel.butt, surenb, hughd, willy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, ying.huang, apopple, qun-wei.lin,
Andrew.Yang, casper.li, chinwen.chang, linux-arm-kernel,
linux-kernel, linux-mediatek, linux-mm, ioworker0, stable
On 21 Sep 2025, at 22:14, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> When both THP and MTE are enabled, splitting a THP and replacing its
> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
> faults in userspace.
>
> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
> zeropage has a fixed tag of zero, which may not match the tag expected by
> the userspace pointer.
>
> KSM already avoids this problem by using memcmp_pages(), which on arm64
> intentionally reports MTE-tagged pages as non-identical to prevent unsafe
> merging.
>
> As suggested by David[1], this patch adopts the same pattern, replacing the
> memchr_inv() byte-level check with a call to pages_identical(). This
> leverages existing architecture-specific logic to determine if a page is
> truly identical to the shared zeropage.
>
> Having both the THP shrinker and KSM rely on pages_identical() makes the
> design more future-proof, IMO. Instead of handling quirks in generic code,
> we just let the architecture decide what makes two pages identical.
>
> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> Tested on x86_64 and on QEMU for arm64 (with and without MTE support),
> and the fix works as expected.
From [1], I see you mentioned RISC-V also has the address masking feature.
Is it affected by this? And memcmp_pages() is only implemented by ARM64
for MTE. Should any arch with address masking always implement it to avoid
the same issue?
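For reference, the pieces involved look roughly like this (simplified sketch:
the generic memcmp_pages() in mm/util.c is a weak symbol that arm64 overrides
for MTE, and pages_identical() is just its negation):
```c
/* Generic fallback, roughly as in mm/util.c: a plain byte comparison. */
int __weak memcmp_pages(struct page *page1, struct page *page2)
{
	char *addr1, *addr2;
	int ret;

	addr1 = kmap_local_page(page1);
	addr2 = kmap_local_page(page2);
	ret = memcmp(addr1, addr2, PAGE_SIZE);
	kunmap_local(addr2);
	kunmap_local(addr1);
	return ret;
}

static inline bool pages_identical(struct page *page1, struct page *page2)
{
	return !memcmp_pages(page1, page2);
}
```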
>
> mm/huge_memory.c | 15 +++------------
> mm/migrate.c | 8 +-------
> 2 files changed, 4 insertions(+), 19 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 32e0ec2dde36..28d4b02a1aa5 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4104,29 +4104,20 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
> static bool thp_underused(struct folio *folio)
> {
> int num_zero_pages = 0, num_filled_pages = 0;
> - void *kaddr;
> int i;
>
> for (i = 0; i < folio_nr_pages(folio); i++) {
> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
> - num_zero_pages++;
> - if (num_zero_pages > khugepaged_max_ptes_none) {
> - kunmap_local(kaddr);
> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
> + if (++num_zero_pages > khugepaged_max_ptes_none)
> return true;
> - }
> } else {
> /*
> * Another path for early exit once the number
> * of non-zero filled pages exceeds threshold.
> */
> - num_filled_pages++;
> - if (num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none) {
> - kunmap_local(kaddr);
> + if (++num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none)
> return false;
> - }
> }
> - kunmap_local(kaddr);
> }
> return false;
> }
> diff --git a/mm/migrate.c b/mm/migrate.c
> index aee61a980374..ce83c2c3c287 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -300,9 +300,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
> unsigned long idx)
> {
> struct page *page = folio_page(folio, idx);
> - bool contains_data;
> pte_t newpte;
> - void *addr;
>
> if (PageCompound(page))
> return false;
> @@ -319,11 +317,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
> * this subpage has been non present. If the subpage is only zero-filled
> * then map it to the shared zeropage.
> */
> - addr = kmap_local_page(page);
> - contains_data = memchr_inv(addr, 0, PAGE_SIZE);
> - kunmap_local(addr);
> -
> - if (contains_data)
> + if (!pages_identical(page, ZERO_PAGE(0)))
> return false;
>
> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
> --
> 2.49.0
The changes look good to me. Thanks.
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 2:36 ` Zi Yan
@ 2025-09-22 3:36 ` Lance Yang
0 siblings, 0 replies; 22+ messages in thread
From: Lance Yang @ 2025-09-22 3:36 UTC (permalink / raw)
To: Zi Yan
Cc: akpm, david, lorenzo.stoakes, usamaarif642, yuzhao, baolin.wang,
baohua, voidice, Liam.Howlett, catalin.marinas,
cerasuolodomenico, hannes, kaleshsingh, npache, riel,
roman.gushchin, rppt, ryan.roberts, dev.jain, ryncsn,
shakeel.butt, surenb, hughd, willy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, ying.huang, apopple, qun-wei.lin,
Andrew.Yang, casper.li, chinwen.chang, linux-arm-kernel,
linux-kernel, linux-mediatek, linux-mm, ioworker0, stable,
linux-riscv, palmer, samuel.holland, charlie
Cc: RISC-V folks
On 2025/9/22 10:36, Zi Yan wrote:
> On 21 Sep 2025, at 22:14, Lance Yang wrote:
>
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> When both THP and MTE are enabled, splitting a THP and replacing its
>> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
>> faults in userspace.
>>
>> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
>> zeropage has a fixed tag of zero, which may not match the tag expected by
>> the userspace pointer.
>>
>> KSM already avoids this problem by using memcmp_pages(), which on arm64
>> intentionally reports MTE-tagged pages as non-identical to prevent unsafe
>> merging.
>>
>> As suggested by David[1], this patch adopts the same pattern, replacing the
>> memchr_inv() byte-level check with a call to pages_identical(). This
>> leverages existing architecture-specific logic to determine if a page is
>> truly identical to the shared zeropage.
>>
>> Having both the THP shrinker and KSM rely on pages_identical() makes the
>> design more future-proof, IMO. Instead of handling quirks in generic code,
>> we just let the architecture decide what makes two pages identical.
>>
>> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>>
>> Cc: <stable@vger.kernel.org>
>> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
>> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> Tested on x86_64 and on QEMU for arm64 (with and without MTE support),
>> and the fix works as expected.
>
> From [1], I see you mentioned RISC-V also has the address masking feature.
> Is it affected by this? And memcmp_pages() is only implemented by ARM64
> for MTE. Should any arch with address masking always implement it to avoid
> the same issue?
Yeah, I'm new to RISC-V, but it seems to have a similar feature, as
described in Documentation/arch/riscv/uabi.rst: the Supm
(Supervisor-mode Pointer Masking) extension.
```
Pointer masking
---------------

Support for pointer masking in userspace (the Supm extension) is provided
via the ``PR_SET_TAGGED_ADDR_CTRL`` and ``PR_GET_TAGGED_ADDR_CTRL``
``prctl()`` operations. Pointer masking is disabled by default. To enable
it, userspace must call ``PR_SET_TAGGED_ADDR_CTRL`` with the ``PR_PMLEN``
field set to the number of mask/tag bits needed by the application.
``PR_PMLEN`` is interpreted as a lower bound; if the kernel is unable to
satisfy the request, the ``PR_SET_TAGGED_ADDR_CTRL`` operation will fail.
The actual number of tag bits is returned in ``PR_PMLEN`` by the
``PR_GET_TAGGED_ADDR_CTRL`` operation.
```
But, IIUC, Supm by itself only ensures that the upper bits are ignored on
memory access :)
So, RISC-V today would likely not be affected. However, once it implements
full hardware tag checking, it will face the exact same zero-page problem.
Anyway, any architecture with a feature like MTE in the future will need
its own memcmp_pages() to prevent unsafe merges ;)
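To make that concrete, such an override would presumably mirror the arm64
one. A purely hypothetical sketch (page_is_hw_tagged() is a made-up helper,
not an existing API):
```c
/* Hypothetical arch override of the weak memcmp_pages() in mm/util.c. */
int memcmp_pages(struct page *page1, struct page *page2)
{
	char *addr1 = page_address(page1);
	char *addr2 = page_address(page2);
	int ret = memcmp(addr1, addr2, PAGE_SIZE);

	/*
	 * Even if the data matches, refuse to report the pages as
	 * identical when either one carries hardware tags, so KSM and
	 * the THP shrinker never merge a tagged page with an untagged
	 * (or zero-tagged) one.
	 */
	if (page_is_hw_tagged(page1) || page_is_hw_tagged(page2))
		return addr1 != addr2;

	return ret;
}
```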
>
>>
>> mm/huge_memory.c | 15 +++------------
>> mm/migrate.c | 8 +-------
>> 2 files changed, 4 insertions(+), 19 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 32e0ec2dde36..28d4b02a1aa5 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4104,29 +4104,20 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
>> static bool thp_underused(struct folio *folio)
>> {
>> int num_zero_pages = 0, num_filled_pages = 0;
>> - void *kaddr;
>> int i;
>>
>> for (i = 0; i < folio_nr_pages(folio); i++) {
>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>> - num_zero_pages++;
>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>> - kunmap_local(kaddr);
>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>> return true;
>> - }
>> } else {
>> /*
>> * Another path for early exit once the number
>> * of non-zero filled pages exceeds threshold.
>> */
>> - num_filled_pages++;
>> - if (num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none) {
>> - kunmap_local(kaddr);
>> + if (++num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none)
>> return false;
>> - }
>> }
>> - kunmap_local(kaddr);
>> }
>> return false;
>> }
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index aee61a980374..ce83c2c3c287 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -300,9 +300,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>> unsigned long idx)
>> {
>> struct page *page = folio_page(folio, idx);
>> - bool contains_data;
>> pte_t newpte;
>> - void *addr;
>>
>> if (PageCompound(page))
>> return false;
>> @@ -319,11 +317,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>> * this subpage has been non present. If the subpage is only zero-filled
>> * then map it to the shared zeropage.
>> */
>> - addr = kmap_local_page(page);
>> - contains_data = memchr_inv(addr, 0, PAGE_SIZE);
>> - kunmap_local(addr);
>> -
>> - if (contains_data)
>> + if (!pages_identical(page, ZERO_PAGE(0)))
>> return false;
>>
>> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
>> --
>> 2.49.0
>
> The changes look good to me. Thanks.
> Acked-by: Zi Yan <ziy@nvidia.com>
Cheers!
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 2:14 [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages Lance Yang
2025-09-22 2:36 ` Zi Yan
@ 2025-09-22 7:41 ` David Hildenbrand
2025-09-22 8:24 ` Usama Arif
` (2 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-09-22 7:41 UTC (permalink / raw)
To: Lance Yang, akpm, lorenzo.stoakes
Cc: usamaarif642, yuzhao, ziy, baolin.wang, baohua, voidice,
Liam.Howlett, catalin.marinas, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 22.09.25 04:14, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> When both THP and MTE are enabled, splitting a THP and replacing its
> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
> faults in userspace.
>
> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
> zeropage has a fixed tag of zero, which may not match the tag expected by
> the userspace pointer.
>
> KSM already avoids this problem by using memcmp_pages(), which on arm64
> intentionally reports MTE-tagged pages as non-identical to prevent unsafe
> merging.
>
> As suggested by David[1], this patch adopts the same pattern, replacing the
> memchr_inv() byte-level check with a call to pages_identical(). This
> leverages existing architecture-specific logic to determine if a page is
> truly identical to the shared zeropage.
>
> Having both the THP shrinker and KSM rely on pages_identical() makes the
> design more future-proof, IMO. Instead of handling quirks in generic code,
> we just let the architecture decide what makes two pages identical.
>
> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
LGTM, thanks
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 2:14 [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages Lance Yang
2025-09-22 2:36 ` Zi Yan
2025-09-22 7:41 ` David Hildenbrand
@ 2025-09-22 8:24 ` Usama Arif
2025-09-22 17:24 ` Catalin Marinas
2025-09-23 2:10 ` Wei Yang
4 siblings, 0 replies; 22+ messages in thread
From: Usama Arif @ 2025-09-22 8:24 UTC (permalink / raw)
To: Lance Yang, akpm, david, lorenzo.stoakes
Cc: yuzhao, ziy, baolin.wang, baohua, voidice, Liam.Howlett,
catalin.marinas, cerasuolodomenico, hannes, kaleshsingh, npache,
riel, roman.gushchin, rppt, ryan.roberts, dev.jain, ryncsn,
shakeel.butt, surenb, hughd, willy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, ying.huang, apopple, qun-wei.lin,
Andrew.Yang, casper.li, chinwen.chang, linux-arm-kernel,
linux-kernel, linux-mediatek, linux-mm, ioworker0, stable
On 22/09/2025 03:14, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> When both THP and MTE are enabled, splitting a THP and replacing its
> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
> faults in userspace.
>
> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
> zeropage has a fixed tag of zero, which may not match the tag expected by
> the userspace pointer.
>
> KSM already avoids this problem by using memcmp_pages(), which on arm64
> intentionally reports MTE-tagged pages as non-identical to prevent unsafe
> merging.
>
> As suggested by David[1], this patch adopts the same pattern, replacing the
> memchr_inv() byte-level check with a call to pages_identical(). This
> leverages existing architecture-specific logic to determine if a page is
> truly identical to the shared zeropage.
>
> Having both the THP shrinker and KSM rely on pages_identical() makes the
> design more future-proof, IMO. Instead of handling quirks in generic code,
> we just let the architecture decide what makes two pages identical.
>
> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> Tested on x86_64 and on QEMU for arm64 (with and without MTE support),
> and the fix works as expected.
>
> mm/huge_memory.c | 15 +++------------
> mm/migrate.c | 8 +-------
> 2 files changed, 4 insertions(+), 19 deletions(-)
>
Thanks for the fix!
Acked-by: Usama Arif <usamaarif642@gmail.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 2:14 [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages Lance Yang
` (2 preceding siblings ...)
2025-09-22 8:24 ` Usama Arif
@ 2025-09-22 17:24 ` Catalin Marinas
2025-09-22 17:59 ` David Hildenbrand
2025-09-23 2:10 ` Wei Yang
4 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2025-09-22 17:24 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> When both THP and MTE are enabled, splitting a THP and replacing its
> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
> faults in userspace.
>
> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
> zeropage has a fixed tag of zero, which may not match the tag expected by
> the userspace pointer.
>
> KSM already avoids this problem by using memcmp_pages(), which on arm64
> intentionally reports MTE-tagged pages as non-identical to prevent unsafe
> merging.
>
> As suggested by David[1], this patch adopts the same pattern, replacing the
> memchr_inv() byte-level check with a call to pages_identical(). This
> leverages existing architecture-specific logic to determine if a page is
> truly identical to the shared zeropage.
>
> Having both the THP shrinker and KSM rely on pages_identical() makes the
> design more future-proof, IMO. Instead of handling quirks in generic code,
> we just let the architecture decide what makes two pages identical.
>
> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
Functionally, the patch looks fine, both with and without MTE.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 32e0ec2dde36..28d4b02a1aa5 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4104,29 +4104,20 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
> static bool thp_underused(struct folio *folio)
> {
> int num_zero_pages = 0, num_filled_pages = 0;
> - void *kaddr;
> int i;
>
> for (i = 0; i < folio_nr_pages(folio); i++) {
> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
> - num_zero_pages++;
> - if (num_zero_pages > khugepaged_max_ptes_none) {
> - kunmap_local(kaddr);
> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
> + if (++num_zero_pages > khugepaged_max_ptes_none)
> return true;
I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
former will need to read from two places. If it's noticeable, it would
affect architectures that don't have an MTE equivalent.
Alternatively we could introduce something like folio_has_metadata()
which on arm64 simply checks PG_mte_tagged.
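Something along these lines (hypothetical sketch only; folio_has_metadata()
does not exist in the tree today):
```c
/*
 * Hypothetical helper as suggested above. The generic version would
 * return false; the arm64 override would just check the MTE page
 * flag, avoiding the page-content compare entirely (per-subpage
 * checks for large folios elided).
 */
static inline bool folio_has_metadata(struct folio *folio)
{
	return page_mte_tagged(folio_page(folio, 0));
}
```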
--
Catalin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 17:24 ` Catalin Marinas
@ 2025-09-22 17:59 ` David Hildenbrand
2025-09-23 1:48 ` Lance Yang
2025-09-23 11:52 ` Catalin Marinas
0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-09-22 17:59 UTC (permalink / raw)
To: Catalin Marinas, Lance Yang
Cc: akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy, baolin.wang,
baohua, voidice, Liam.Howlett, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 22.09.25 19:24, Catalin Marinas wrote:
> On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> When both THP and MTE are enabled, splitting a THP and replacing its
>> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
>> faults in userspace.
>>
>> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
>> zeropage has a fixed tag of zero, which may not match the tag expected by
>> the userspace pointer.
>>
>> KSM already avoids this problem by using memcmp_pages(), which on arm64
>> intentionally reports MTE-tagged pages as non-identical to prevent unsafe
>> merging.
>>
>> As suggested by David[1], this patch adopts the same pattern, replacing the
>> memchr_inv() byte-level check with a call to pages_identical(). This
>> leverages existing architecture-specific logic to determine if a page is
>> truly identical to the shared zeropage.
>>
>> Having both the THP shrinker and KSM rely on pages_identical() makes the
>> design more future-proof, IMO. Instead of handling quirks in generic code,
>> we just let the architecture decide what makes two pages identical.
>>
>> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>>
>> Cc: <stable@vger.kernel.org>
>> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
>> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>
> Functionally, the patch looks fine, both with and without MTE.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 32e0ec2dde36..28d4b02a1aa5 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4104,29 +4104,20 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
>> static bool thp_underused(struct folio *folio)
>> {
>> int num_zero_pages = 0, num_filled_pages = 0;
>> - void *kaddr;
>> int i;
>>
>> for (i = 0; i < folio_nr_pages(folio); i++) {
>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>> - num_zero_pages++;
>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>> - kunmap_local(kaddr);
>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>> return true;
>
> I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
> former will need to read from two places. If it's noticeable, it would
> affect architectures that don't have an MTE equivalent.
>
> Alternatively we could introduce something like folio_has_metadata()
> which on arm64 simply checks PG_mte_tagged.
We discussed something similar in the other thread (I suggested
page_is_mergable()). I'd prefer to use pages_identical() for now, so we
have the same logic here and in ksm code.
(this patch here almost looks like a cleanup :) )
If this becomes a problem, what we could do in pages_identical() is
simply do the memchr_inv() in case is_zero_pfn(). KSM might benefit
from that as well when merging with the shared zeropage through
try_to_merge_with_zero_page().
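Roughly this shape, as a sketch of the idea only (not part of this patch;
page_has_metadata() stands in for the folio_has_metadata()-style arch hook
quoted above and doesn't exist today):
```c
/* Sketch: compare a page against the shared zeropage with a single scan. */
static bool page_identical_to_zeropage(struct page *page)
{
	void *addr;
	bool zero_filled;

	/* The arch still gets to veto, e.g. for MTE-tagged pages. */
	if (page_has_metadata(page))	/* hypothetical hook */
		return false;

	addr = kmap_local_page(page);
	zero_filled = !memchr_inv(addr, 0, PAGE_SIZE);
	kunmap_local(addr);

	return zero_filled;
}
```
pages_identical() could then take this fast path whenever one side
satisfies is_zero_pfn().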
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 17:59 ` David Hildenbrand
@ 2025-09-23 1:48 ` Lance Yang
2025-09-23 11:52 ` Catalin Marinas
1 sibling, 0 replies; 22+ messages in thread
From: Lance Yang @ 2025-09-23 1:48 UTC (permalink / raw)
To: Catalin Marinas, David Hildenbrand
Cc: akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy, baolin.wang,
baohua, voidice, Liam.Howlett, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 2025/9/23 01:59, David Hildenbrand wrote:
> On 22.09.25 19:24, Catalin Marinas wrote:
>> On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> When both THP and MTE are enabled, splitting a THP and replacing its
>>> zero-filled subpages with the shared zeropage can cause MTE tag mismatch
>>> faults in userspace.
>>>
>>> Remapping zero-filled subpages to the shared zeropage is unsafe, as the
>>> zeropage has a fixed tag of zero, which may not match the tag expected
>>> by the userspace pointer.
>>>
>>> KSM already avoids this problem by using memcmp_pages(), which on arm64
>>> intentionally reports MTE-tagged pages as non-identical to prevent
>>> unsafe merging.
>>>
>>> As suggested by David[1], this patch adopts the same pattern, replacing
>>> the memchr_inv() byte-level check with a call to pages_identical(). This
>>> leverages existing architecture-specific logic to determine if a page is
>>> truly identical to the shared zeropage.
>>>
>>> Having both the THP shrinker and KSM rely on pages_identical() makes the
>>> design more future-proof, IMO. Instead of handling quirks in generic
>>> code, we just let the architecture decide what makes two pages identical.
>>>
>>> [1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>>>
>>> Cc: <stable@vger.kernel.org>
>>> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
>>> Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
>>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage
>>> when splitting isolated thp")
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>
>> Functionally, the patch looks fine, both with and without MTE.
>>
>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Thanks for taking time to review!
>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 32e0ec2dde36..28d4b02a1aa5 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -4104,29 +4104,20 @@ static unsigned long
>>> deferred_split_count(struct shrinker *shrink,
>>> static bool thp_underused(struct folio *folio)
>>> {
>>> int num_zero_pages = 0, num_filled_pages = 0;
>>> - void *kaddr;
>>> int i;
>>> for (i = 0; i < folio_nr_pages(folio); i++) {
>>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>> - num_zero_pages++;
>>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>>> - kunmap_local(kaddr);
>>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>>> return true;
>>
>> I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
>> former will need to read from two places. If it's noticeable, it would
>> affect architectures that don't have an MTE equivalent.
>>
>> Alternatively we could introduce something like folio_has_metadata()
>> which on arm64 simply checks PG_mte_tagged.
>
> We discussed something similar in the other thread (I suggested
> page_is_mergable()). I'd prefer to use pages_identical() for now, so we
> have the same logic here and in ksm code.
>
> (this patch here almost looks like a cleanup :) )
Yeah, let's keep it as-is for now.
Using the same pages_identical() pattern as KSM makes the logic
consistent.
And it's simple enough to be easily backported to stable trees ;)
>
> If this becomes a problem, what we could do in pages_identical() is
> simply do the memchr_inv() in case is_zero_pfn(). KSM might benefit
> from that as well when merging with the shared zeropage through
> try_to_merge_with_zero_page().
Right, there is room for that optimization. I will look into it as a
follow-up patch after this one is settled and backported, especially if
the performance overhead turns out to be a real concern :)
Cheers,
Lance
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 17:59 ` David Hildenbrand
2025-09-23 1:48 ` Lance Yang
@ 2025-09-23 11:52 ` Catalin Marinas
2025-09-23 12:00 ` David Hildenbrand
2025-09-23 16:14 ` Catalin Marinas
1 sibling, 2 replies; 22+ messages in thread
From: Catalin Marinas @ 2025-09-23 11:52 UTC (permalink / raw)
To: David Hildenbrand
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On Mon, Sep 22, 2025 at 07:59:00PM +0200, David Hildenbrand wrote:
> On 22.09.25 19:24, Catalin Marinas wrote:
> > On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 32e0ec2dde36..28d4b02a1aa5 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -4104,29 +4104,20 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
> > > static bool thp_underused(struct folio *folio)
> > > {
> > > int num_zero_pages = 0, num_filled_pages = 0;
> > > - void *kaddr;
> > > int i;
> > > for (i = 0; i < folio_nr_pages(folio); i++) {
> > > - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
> > > - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
> > > - num_zero_pages++;
> > > - if (num_zero_pages > khugepaged_max_ptes_none) {
> > > - kunmap_local(kaddr);
> > > + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
> > > + if (++num_zero_pages > khugepaged_max_ptes_none)
> > > return true;
> >
> > I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
> > former will need to read from two places. If it's noticeable, it would
> > affect architectures that don't have an MTE equivalent.
> >
> > Alternatively we could introduce something like folio_has_metadata()
> > which on arm64 simply checks PG_mte_tagged.
>
> We discussed something similar in the other thread (I suggested
> page_is_mergable()). I'd prefer to use pages_identical() for now, so we have
> the same logic here and in ksm code.
>
> (this patch here almost looks like a cleanup :) )
>
> If this becomes a problem, what we could do in pages_identical() is simply
> do the memchr_inv() in case is_zero_pfn(). KSM might benefit from that as
> well when merging with the shared zeropage through
> try_to_merge_with_zero_page().
Yes, we can always optimise it later.
I just realised that on arm64 with MTE we won't get any merging with the
zero page even if the user page isn't mapped with PROT_MTE. In
cpu_enable_mte() we zero the tags in the zero page and set
PG_mte_tagged. The reason is that we want to use the zero page with
PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
memcmp_pages() messed up KSM merging with the zero page even before this
patch.
The MTE tag setting evolved a bit over time with some locking using PG_*
flags to avoid a set_pte_at() race trying to initialise the tags on the
same page. We also moved the swap restoring to arch_swap_restore()
rather than the set_pte_at() path. So it is safe now to merge with the
zero page if the other page isn't tagged. A subsequent set_pte_at()
attempting to clear the tags would notice that the zero page is already
tagged.
We could go a step further and add tag comparison (I had some code
around) but I think the quick fix is to just not treat the zero page as
tagged. Not fully tested yet:
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index e5e773844889..72a1dfc54659 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page *page2)
{
char *addr1, *addr2;
int ret;
+ bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
+ bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
addr1 = page_address(page1);
addr2 = page_address(page2);
@@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page *page2)
/*
* If the page content is identical but at least one of the pages is
- * tagged, return non-zero to avoid KSM merging. If only one of the
- * pages is tagged, __set_ptes() may zero or change the tags of the
- * other page via mte_sync_tags().
+ * tagged, return non-zero to avoid KSM merging. Ignore the zero page
+ * since it is always tagged with the tags cleared.
*/
- if (page_mte_tagged(page1) || page_mte_tagged(page2))
+ if (page1_tagged || page2_tagged)
return addr1 != addr2;
return ret;
--
Catalin
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 11:52 ` Catalin Marinas
@ 2025-09-23 12:00 ` David Hildenbrand
2025-09-23 12:04 ` Lance Yang
` (2 more replies)
2025-09-23 16:14 ` Catalin Marinas
1 sibling, 3 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-09-23 12:00 UTC (permalink / raw)
To: Catalin Marinas
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 23.09.25 13:52, Catalin Marinas wrote:
> On Mon, Sep 22, 2025 at 07:59:00PM +0200, David Hildenbrand wrote:
>> On 22.09.25 19:24, Catalin Marinas wrote:
>>> On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 32e0ec2dde36..28d4b02a1aa5 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -4104,29 +4104,20 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
>>>> static bool thp_underused(struct folio *folio)
>>>> {
>>>> int num_zero_pages = 0, num_filled_pages = 0;
>>>> - void *kaddr;
>>>> int i;
>>>> for (i = 0; i < folio_nr_pages(folio); i++) {
>>>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>> - num_zero_pages++;
>>>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>>>> - kunmap_local(kaddr);
>>>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>>>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>>>> return true;
>>>
>>> I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
>>> former will need to read from two places. If it's noticeable, it would
>>> affect architectures that don't have an MTE equivalent.
>>>
>>> Alternatively we could introduce something like folio_has_metadata()
>>> which on arm64 simply checks PG_mte_tagged.
>>
>> We discussed something similar in the other thread (I suggested
>> page_is_mergable()). I'd prefer to use pages_identical() for now, so we have
>> the same logic here and in ksm code.
>>
>> (this patch here almost looks like a cleanup :) )
>>
>> If this becomes a problem, what we could do in pages_identical() is simply
>> do the memchr_inv() in case is_zero_pfn(). KSM might benefit from that as
>> well when merging with the shared zeropage through
>> try_to_merge_with_zero_page().
>
> Yes, we can always optimise it later.
>
> I just realised that on arm64 with MTE we won't get any merging with the
> zero page even if the user page isn't mapped with PROT_MTE. In
> cpu_enable_mte() we zero the tags in the zero page and set
> PG_mte_tagged. The reason is that we want to use the zero page with
> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
> memcmp_pages() messed up KSM merging with the zero page even before this
> patch.
>
> The MTE tag setting evolved a bit over time with some locking using PG_*
> flags to avoid a set_pte_at() race trying to initialise the tags on the
> same page. We also moved the swap restoring to arch_swap_restore()
> rather than the set_pte_at() path. So it is safe now to merge with the
> zero page if the other page isn't tagged. A subsequent set_pte_at()
> attempting to clear the tags would notice that the zero page is already
> tagged.
>
> We could go a step further and add tag comparison (I had some code
> around) but I think the quick fix is to just not treat the zero page as
> tagged.
I assume any tag changes would result in CoW.
It would be interesting to know if there are use cases with VMs or other
workloads where that could be beneficial with KSM.
> Not fully tested yet:
>
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index e5e773844889..72a1dfc54659 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page *page2)
> {
> char *addr1, *addr2;
> int ret;
> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>
> addr1 = page_address(page1);
> addr2 = page_address(page2);
> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page *page2)
>
> /*
> * If the page content is identical but at least one of the pages is
> - * tagged, return non-zero to avoid KSM merging. If only one of the
> - * pages is tagged, __set_ptes() may zero or change the tags of the
> - * other page via mte_sync_tags().
> + * tagged, return non-zero to avoid KSM merging. Ignore the zero page
> + * since it is always tagged with the tags cleared.
> */
> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
> + if (page1_tagged || page2_tagged)
> return addr1 != addr2;
That looks reasonable to me.
@Lance as you had a test setup, could you give this a try as well with
KSM shared zeropage deduplication enabled whether it now works as
expected as well?
Then, this should likely be an independent fix.
For KSM you likely have to enable it first through
/sys/kernel/mm/ksm/use_zero_pages.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 12:00 ` David Hildenbrand
@ 2025-09-23 12:04 ` Lance Yang
2025-09-23 12:51 ` Catalin Marinas
2025-09-23 17:20 ` Lance Yang
2 siblings, 0 replies; 22+ messages in thread
From: Lance Yang @ 2025-09-23 12:04 UTC (permalink / raw)
To: David Hildenbrand, Catalin Marinas
Cc: akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy, baolin.wang,
baohua, voidice, Liam.Howlett, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 2025/9/23 20:00, David Hildenbrand wrote:
> On 23.09.25 13:52, Catalin Marinas wrote:
>> On Mon, Sep 22, 2025 at 07:59:00PM +0200, David Hildenbrand wrote:
>>> On 22.09.25 19:24, Catalin Marinas wrote:
>>>> On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 32e0ec2dde36..28d4b02a1aa5 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -4104,29 +4104,20 @@ static unsigned long
>>>>> deferred_split_count(struct shrinker *shrink,
>>>>> static bool thp_underused(struct folio *folio)
>>>>> {
>>>>> int num_zero_pages = 0, num_filled_pages = 0;
>>>>> - void *kaddr;
>>>>> int i;
>>>>> for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>> - num_zero_pages++;
>>>>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>> - kunmap_local(kaddr);
>>>>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>>>>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>>>>> return true;
>>>>
>>>> I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
>>>> former will need to read from two places. If it's noticeable, it would
>>>> affect architectures that don't have an MTE equivalent.
>>>>
>>>> Alternatively we could introduce something like folio_has_metadata()
>>>> which on arm64 simply checks PG_mte_tagged.
>>>
>>> We discussed something similar in the other thread (I suggested
>>> page_is_mergable()). I'd prefer to use pages_identical() for now, so we
>>> have the same logic here and in ksm code.
>>>
>>> (this patch here almost looks like a cleanup :) )
>>>
>>> If this becomes a problem, what we could do in pages_identical() is
>>> simply do the memchr_inv() in case is_zero_pfn(). KSM might benefit
>>> from that as well when merging with the shared zeropage through
>>> try_to_merge_with_zero_page().
>>
>> Yes, we can always optimise it later.
>>
>> I just realised that on arm64 with MTE we won't get any merging with the
>> zero page even if the user page isn't mapped with PROT_MTE. In
>> cpu_enable_mte() we zero the tags in the zero page and set
>> PG_mte_tagged. The reason is that we want to use the zero page with
>> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
>> memcmp_pages() messed up KSM merging with the zero page even before this
>> patch.
>>
>> The MTE tag setting evolved a bit over time with some locking using PG_*
>> flags to avoid a set_pte_at() race trying to initialise the tags on the
>> same page. We also moved the swap restoring to arch_swap_restore()
>> rather than the set_pte_at() path. So it is safe now to merge with the
>> zero page if the other page isn't tagged. A subsequent set_pte_at()
>> attempting to clear the tags would notice that the zero page is already
>> tagged.
>>
>> We could go a step further and add tag comparison (I had some code
>> around) but I think the quick fix is to just not treat the zero page as
>> tagged.
>
> I assume any tag changes would result in CoW.
>
> It would be interesting to know if there are use cases with VMs or other
> workloads where that could be beneficial with KSM.
>
>> Not fully tested yet:
>>
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index e5e773844889..72a1dfc54659 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page
>> *page2)
>> {
>> char *addr1, *addr2;
>> int ret;
>> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
>> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>> addr1 = page_address(page1);
>> addr2 = page_address(page2);
>> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page
>> *page2)
>> /*
>> * If the page content is identical but at least one of the
>> pages is
>> - * tagged, return non-zero to avoid KSM merging. If only one of the
>> - * pages is tagged, __set_ptes() may zero or change the tags of the
>> - * other page via mte_sync_tags().
>> + * tagged, return non-zero to avoid KSM merging. Ignore the zero
>> page
>> + * since it is always tagged with the tags cleared.
>> */
>> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
>> + if (page1_tagged || page2_tagged)
>> return addr1 != addr2;
>
> That looks reasonable to me.
Yeah, looks good to me as well.
>
> @Lance as you had a test setup, could you give this a try as well with
> KSM shared zeropage deduplication enabled whether it now works as
> expected as well?
Sure. I'll test that and get back to you ;)
>
> Then, this should likely be an independent fix.
>
> For KSM you likely have to enable it first through /sys/kernel/mm/ksm/
> use_zero_pages.
Got it.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 12:00 ` David Hildenbrand
2025-09-23 12:04 ` Lance Yang
@ 2025-09-23 12:51 ` Catalin Marinas
2025-09-23 17:20 ` Lance Yang
2 siblings, 0 replies; 22+ messages in thread
From: Catalin Marinas @ 2025-09-23 12:51 UTC (permalink / raw)
To: David Hildenbrand
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On Tue, Sep 23, 2025 at 02:00:01PM +0200, David Hildenbrand wrote:
> On 23.09.25 13:52, Catalin Marinas wrote:
> > I just realised that on arm64 with MTE we won't get any merging with the
> > zero page even if the user page isn't mapped with PROT_MTE. In
> > cpu_enable_mte() we zero the tags in the zero page and set
> > PG_mte_tagged. The reason is that we want to use the zero page with
> > PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
> > memcmp_pages() messed up KSM merging with the zero page even before this
> > patch.
> >
> > The MTE tag setting evolved a bit over time with some locking using PG_*
> > flags to avoid a set_pte_at() race trying to initialise the tags on the
> > same page. We also moved the swap restoring to arch_swap_restore()
> > rather than the set_pte_at() path. So it is safe now to merge with the
> > zero page if the other page isn't tagged. A subsequent set_pte_at()
> > attempting to clear the tags would notice that the zero page is already
> > tagged.
> >
> > We could go a step further and add tag comparison (I had some code
> > around) but I think the quick fix is to just not treat the zero page as
> > tagged.
>
> I assume any tag changes would result in CoW.
Yes.
> It would be interesting to know if there are use cases with VMs or other
> workloads where that could be beneficial with KSM.
With VMs, if MTE is allowed in the guest, we currently treat any VM page
as tagged. In the initial version of the MTE spec, we did not have any
fine-grained control at the stage 2 page table over whether MTE is in use
by the guest (just a big knob in a control register). We later got
FEAT_MTE_PERM which allows stage 2 to trap tag accesses in a VM on a
page-by-page basis, though we haven't added KVM support for it yet.
If we add full tag comparison, VMs may be able to share more pages. For
example, code pages are never tagged in a VM but the hypervisor doesn't
know this, so it just avoids sharing. I posted tag comparison some years
ago but dropped it eventually to keep things simple:
https://lore.kernel.org/all/20200421142603.3894-9-catalin.marinas@arm.com/
However, it needs a bit of tidying up since at the time we assumed
everything was tagged. I can respin the above (on top of the fix below),
though I don't see many vendors rushing to deploy MTE in a multi-VM
scenario (Android + pKVM maybe but not sure they enable KSM due to power
constraints).
> > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > index e5e773844889..72a1dfc54659 100644
> > --- a/arch/arm64/kernel/mte.c
> > +++ b/arch/arm64/kernel/mte.c
> > @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page *page2)
> > {
> > char *addr1, *addr2;
> > int ret;
> > + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
> > + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
> > addr1 = page_address(page1);
> > addr2 = page_address(page2);
> > @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page *page2)
> > /*
> > * If the page content is identical but at least one of the pages is
> > - * tagged, return non-zero to avoid KSM merging. If only one of the
> > - * pages is tagged, __set_ptes() may zero or change the tags of the
> > - * other page via mte_sync_tags().
> > + * tagged, return non-zero to avoid KSM merging. Ignore the zero page
> > + * since it is always tagged with the tags cleared.
> > */
> > - if (page_mte_tagged(page1) || page_mte_tagged(page2))
> > + if (page1_tagged || page2_tagged)
> > return addr1 != addr2;
>
> That looks reasonable to me.
>
> @Lance as you had a test setup, could you give this a try as well with KSM
> shared zeropage deduplication enabled whether it now works as expected as
> well?
Thanks!
> Then, this should likely be an independent fix.
Yes, I'll add a proper commit message. We could do a cc stable, though
it's more of an optimisation.
--
Catalin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 12:00 ` David Hildenbrand
2025-09-23 12:04 ` Lance Yang
2025-09-23 12:51 ` Catalin Marinas
@ 2025-09-23 17:20 ` Lance Yang
2 siblings, 0 replies; 22+ messages in thread
From: Lance Yang @ 2025-09-23 17:20 UTC (permalink / raw)
To: David Hildenbrand, Catalin Marinas
Cc: akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy, baolin.wang,
baohua, voidice, Liam.Howlett, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 2025/9/23 20:00, David Hildenbrand wrote:
> On 23.09.25 13:52, Catalin Marinas wrote:
>> On Mon, Sep 22, 2025 at 07:59:00PM +0200, David Hildenbrand wrote:
>>> On 22.09.25 19:24, Catalin Marinas wrote:
>>>> On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 32e0ec2dde36..28d4b02a1aa5 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -4104,29 +4104,20 @@ static unsigned long
>>>>> deferred_split_count(struct shrinker *shrink,
>>>>> static bool thp_underused(struct folio *folio)
>>>>> {
>>>>> int num_zero_pages = 0, num_filled_pages = 0;
>>>>> - void *kaddr;
>>>>> int i;
>>>>> for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>> - num_zero_pages++;
>>>>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>> - kunmap_local(kaddr);
>>>>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>>>>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>>>>> return true;
>>>>
>>>> I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
>>>> former will need to read from two places. If it's noticeable, it would
>>>> affect architectures that don't have an MTE equivalent.
>>>>
>>>> Alternatively we could introduce something like folio_has_metadata()
>>>> which on arm64 simply checks PG_mte_tagged.
>>>
>>> We discussed something similar in the other thread (I suggested
>>> page_is_mergable()). I'd prefer to use pages_identical() for now, so we
>>> have the same logic here and in ksm code.
>>>
>>> (this patch here almost looks like a cleanup :) )
>>>
>>> If this becomes a problem, what we could do in pages_identical() is
>>> simply do the memchr_inv() in case is_zero_pfn(). KSM might benefit
>>> from that as well when merging with the shared zeropage through
>>> try_to_merge_with_zero_page().
>>
>> Yes, we can always optimise it later.
>>
>> I just realised that on arm64 with MTE we won't get any merging with the
>> zero page even if the user page isn't mapped with PROT_MTE. In
>> cpu_enable_mte() we zero the tags in the zero page and set
>> PG_mte_tagged. The reason is that we want to use the zero page with
>> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
>> memcmp_pages() messed up KSM merging with the zero page even before this
>> patch.
>>
>> The MTE tag setting evolved a bit over time with some locking using PG_*
>> flags to avoid a set_pte_at() race trying to initialise the tags on the
>> same page. We also moved the swap restoring to arch_swap_restore()
>> rather than the set_pte_at() path. So it is safe now to merge with the
>> zero page if the other page isn't tagged. A subsequent set_pte_at()
>> attempting to clear the tags would notice that the zero page is already
>> tagged.
>>
>> We could go a step further and add tag comparison (I had some code
>> around) but I think the quick fix is to just not treat the zero page as
>> tagged.
>
> I assume any tag changes would result in CoW.
Right, I did a test and confirmed that any attempt to change the tags
on the shared zero page triggers a copy-on-write, as expected ;)
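For reference, a minimal reproducer for that check looks something like this
(arm64-only sketch; feature detection and error handling trimmed, PROT_MTE
comes from the arm64 <asm/mman.h>):
```c
#include <stdint.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <linux/prctl.h>
#include <asm/mman.h>		/* PROT_MTE */

int main(void)
{
	/* Enable tagged addressing and synchronous MTE tag check faults. */
	prctl(PR_SET_TAGGED_ADDR_CTRL,
	      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
	      (0xfffeUL << PR_MTE_TAG_SHIFT), 0, 0, 0);

	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	volatile char c = p[0];	/* read fault may map the shared zeropage */
	(void)c;

	/* Storing a non-zero tag must trigger CoW, never tag the zeropage. */
	char *tagged = (char *)((uintptr_t)p | (1UL << 56));
	asm volatile("stg %0, [%0]" : : "r"(tagged) : "memory");

	return tagged[0];	/* tagged load now hits the private copy */
}
```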
>
> It would be interesting to know if there are use cases with VMs or other
> workloads where that could be beneficial with KSM.
>
>> Not fully tested yet:
>>
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index e5e773844889..72a1dfc54659 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page
>> *page2)
>> {
>> char *addr1, *addr2;
>> int ret;
>> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
>> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>> addr1 = page_address(page1);
>> addr2 = page_address(page2);
>> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page
>> *page2)
>> /*
>> * If the page content is identical but at least one of the
>> pages is
>> - * tagged, return non-zero to avoid KSM merging. If only one of the
>> - * pages is tagged, __set_ptes() may zero or change the tags of the
>> - * other page via mte_sync_tags().
>> + * tagged, return non-zero to avoid KSM merging. Ignore the zero
>> page
>> + * since it is always tagged with the tags cleared.
>> */
>> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
>> + if (page1_tagged || page2_tagged)
>> return addr1 != addr2;
>
> That looks reasonable to me.
>
> @Lance as you had a test setup, could you give this a try as well with
> KSM shared zeropage deduplication enabled whether it now works as
> expected as well?
This works as expected. Both KSM (with use_zero_pages enabled) and
the THP shrinker are now able to successfully merge zero-filled
pages with the shared zero page, as long as those pages are not
mapped with PROT_MTE.
>
> Then, this should likely be an independent fix.
@Catalin could you send a real patch?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 11:52 ` Catalin Marinas
2025-09-23 12:00 ` David Hildenbrand
@ 2025-09-23 16:14 ` Catalin Marinas
2025-09-23 16:40 ` David Hildenbrand
2025-09-24 2:49 ` Lance Yang
1 sibling, 2 replies; 22+ messages in thread
From: Catalin Marinas @ 2025-09-23 16:14 UTC (permalink / raw)
To: David Hildenbrand
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On Tue, Sep 23, 2025 at 12:52:06PM +0100, Catalin Marinas wrote:
> I just realised that on arm64 with MTE we won't get any merging with the
> zero page even if the user page isn't mapped with PROT_MTE. In
> cpu_enable_mte() we zero the tags in the zero page and set
> PG_mte_tagged. The reason is that we want to use the zero page with
> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
> memcmp_pages() messed up KSM merging with the zero page even before this
> patch.
[...]
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index e5e773844889..72a1dfc54659 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page *page2)
> {
> char *addr1, *addr2;
> int ret;
> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>
> addr1 = page_address(page1);
> addr2 = page_address(page2);
> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page *page2)
>
> /*
> * If the page content is identical but at least one of the pages is
> - * tagged, return non-zero to avoid KSM merging. If only one of the
> - * pages is tagged, __set_ptes() may zero or change the tags of the
> - * other page via mte_sync_tags().
> + * tagged, return non-zero to avoid KSM merging. Ignore the zero page
> + * since it is always tagged with the tags cleared.
> */
> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
> + if (page1_tagged || page2_tagged)
> return addr1 != addr2;
>
> return ret;
Unrelated to this discussion, I got an internal report that Linux hangs
during boot with CONFIG_DEFERRED_STRUCT_PAGE_INIT because
try_page_mte_tagging() locks up on uninitialised page flags.
Since we (always?) map the zero page as pte_special(), set_pte_at()
won't check if the tags have to be initialised, so we can skip the
PG_mte_tagged altogether. We actually had this code for some time until
we introduced the pte_special() check in set_pte_at().
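To make the dependency explicit, the tag initialisation path looks
roughly like this (a simplified sketch, not the literal arm64 code):
static void sketch_sync_tags(pte_t pte, struct page *page)
{
	/* pte_special() covers the zero page, so we never get here for it */
	if (!pte_present(pte) || !pte_tagged(pte) || pte_special(pte))
		return;
	/* this is the part that hangs on uninitialised page flags with
	 * CONFIG_DEFERRED_STRUCT_PAGE_INIT */
	if (try_page_mte_tagging(page)) {
		mte_clear_page_tags(page_address(page));
		set_page_mte_tagged(page);
	}
}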
So here's an alternative patch that also fixes the deferred struct page init
(on the assumption that the zero page is always mapped as pte_special()):
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7b78c95a9017..e325ba34f45c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2419,17 +2419,21 @@ static void bti_enable(const struct arm64_cpu_capabilities *__unused)
#ifdef CONFIG_ARM64_MTE
static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
{
+ static bool cleared_zero_page = false;
+
sysreg_clear_set(sctlr_el1, 0, SCTLR_ELx_ATA | SCTLR_EL1_ATA0);
mte_cpu_setup();
/*
* Clear the tags in the zero page. This needs to be done via the
- * linear map which has the Tagged attribute.
+ * linear map which has the Tagged attribute. Since this page is
+ * always mapped as pte_special(), set_pte_at() will not attempt to
+ * clear the tags or set PG_mte_tagged.
*/
- if (try_page_mte_tagging(ZERO_PAGE(0))) {
+ if (!cleared_zero_page) {
+ cleared_zero_page = true;
mte_clear_page_tags(lm_alias(empty_zero_page));
- set_page_mte_tagged(ZERO_PAGE(0));
}
kasan_init_hw_tags_cpu();
--
Catalin
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 16:14 ` Catalin Marinas
@ 2025-09-23 16:40 ` David Hildenbrand
2025-09-24 2:49 ` Lance Yang
1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-09-23 16:40 UTC (permalink / raw)
To: Catalin Marinas
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 23.09.25 18:14, Catalin Marinas wrote:
> On Tue, Sep 23, 2025 at 12:52:06PM +0100, Catalin Marinas wrote:
>> I just realised that on arm64 with MTE we won't get any merging with the
>> zero page even if the user page isn't mapped with PROT_MTE. In
>> cpu_enable_mte() we zero the tags in the zero page and set
>> PG_mte_tagged. The reason is that we want to use the zero page with
>> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
>> memcmp_pages() messed up KSM merging with the zero page even before this
>> patch.
> [...]
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index e5e773844889..72a1dfc54659 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page *page2)
>> {
>> char *addr1, *addr2;
>> int ret;
>> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
>> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>>
>> addr1 = page_address(page1);
>> addr2 = page_address(page2);
>> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page *page2)
>>
>> /*
>> * If the page content is identical but at least one of the pages is
>> - * tagged, return non-zero to avoid KSM merging. If only one of the
>> - * pages is tagged, __set_ptes() may zero or change the tags of the
>> - * other page via mte_sync_tags().
>> + * tagged, return non-zero to avoid KSM merging. Ignore the zero page
>> + * since it is always tagged with the tags cleared.
>> */
>> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
>> + if (page1_tagged || page2_tagged)
>> return addr1 != addr2;
>>
>> return ret;
>
> Unrelated to this discussion, I got an internal report that Linux hangs
> during boot with CONFIG_DEFERRED_STRUCT_PAGE_INIT because
> try_page_mte_tagging() locks up on uninitialised page flags.
>
> Since we (always?) map the zero page as pte_special(), set_pte_at()
Yes (if pte_special() is implemented by the arch, of course).
> won't check if the tags have to be initialised, so we can skip the
> PG_mte_tagged altogether. We actually had this code for some time until
> we introduced the pte_special() check in set_pte_at().
>
> So here's an alternative patch that also fixes the deferred struct page init
> (on the assumption that the zero page is always mapped as pte_special()):
>
LGTM!
--
Cheers
David / dhildenb
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-23 16:14 ` Catalin Marinas
2025-09-23 16:40 ` David Hildenbrand
@ 2025-09-24 2:49 ` Lance Yang
2025-09-24 8:50 ` Catalin Marinas
1 sibling, 1 reply; 22+ messages in thread
From: Lance Yang @ 2025-09-24 2:49 UTC (permalink / raw)
To: Catalin Marinas, David Hildenbrand
Cc: akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy, baolin.wang,
baohua, voidice, Liam.Howlett, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 2025/9/24 00:14, Catalin Marinas wrote:
> On Tue, Sep 23, 2025 at 12:52:06PM +0100, Catalin Marinas wrote:
>> I just realised that on arm64 with MTE we won't get any merging with the
>> zero page even if the user page isn't mapped with PROT_MTE. In
>> cpu_enable_mte() we zero the tags in the zero page and set
>> PG_mte_tagged. The reason is that we want to use the zero page with
>> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
>> memcmp_pages() messed up KSM merging with the zero page even before this
>> patch.
> [...]
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index e5e773844889..72a1dfc54659 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page *page2)
>> {
>> char *addr1, *addr2;
>> int ret;
>> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
>> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>>
>> addr1 = page_address(page1);
>> addr2 = page_address(page2);
>> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page *page2)
>>
>> /*
>> * If the page content is identical but at least one of the pages is
>> - * tagged, return non-zero to avoid KSM merging. If only one of the
>> - * pages is tagged, __set_ptes() may zero or change the tags of the
>> - * other page via mte_sync_tags().
>> + * tagged, return non-zero to avoid KSM merging. Ignore the zero page
>> + * since it is always tagged with the tags cleared.
>> */
>> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
>> + if (page1_tagged || page2_tagged)
>> return addr1 != addr2;
>>
>> return ret;
>
> Unrelated to this discussion, I got an internal report that Linux hangs
> during boot with CONFIG_DEFERRED_STRUCT_PAGE_INIT because
> try_page_mte_tagging() locks up on uninitialised page flags.
>
> Since we (always?) map the zero page as pte_special(), set_pte_at()
> won't check if the tags have to be initialised, so we can skip the
> PG_mte_tagged altogether. We actually had this code for some time until
> we introduced the pte_special() check in set_pte_at().
>
> So here's an alternative patch that also fixes the deferred struct page init
> (on the assumption that the zero page is always mapped as pte_special()):
I can confirm that this alternative patch also works correctly; my tests
for MTE all pass ;)
This looks like a better fix since it solves the boot hang issue too.
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 7b78c95a9017..e325ba34f45c 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2419,17 +2419,21 @@ static void bti_enable(const struct arm64_cpu_capabilities *__unused)
> #ifdef CONFIG_ARM64_MTE
> static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
> {
> + static bool cleared_zero_page = false;
> +
> sysreg_clear_set(sctlr_el1, 0, SCTLR_ELx_ATA | SCTLR_EL1_ATA0);
>
> mte_cpu_setup();
>
> /*
> * Clear the tags in the zero page. This needs to be done via the
> - * linear map which has the Tagged attribute.
> + * linear map which has the Tagged attribute. Since this page is
> + * always mapped as pte_special(), set_pte_at() will not attempt to
> + * clear the tags or set PG_mte_tagged.
> */
> - if (try_page_mte_tagging(ZERO_PAGE(0))) {
> + if (!cleared_zero_page) {
> + cleared_zero_page = true;
> mte_clear_page_tags(lm_alias(empty_zero_page));
> - set_page_mte_tagged(ZERO_PAGE(0));
> }
>
> kasan_init_hw_tags_cpu();
>
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-24 2:49 ` Lance Yang
@ 2025-09-24 8:50 ` Catalin Marinas
2025-09-24 9:13 ` David Hildenbrand
0 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2025-09-24 8:50 UTC (permalink / raw)
To: Lance Yang
Cc: David Hildenbrand, akpm, lorenzo.stoakes, usamaarif642, yuzhao,
ziy, baolin.wang, baohua, voidice, Liam.Howlett,
cerasuolodomenico, hannes, kaleshsingh, npache, riel,
roman.gushchin, rppt, ryan.roberts, dev.jain, ryncsn,
shakeel.butt, surenb, hughd, willy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, ying.huang, apopple, qun-wei.lin,
Andrew.Yang, casper.li, chinwen.chang, linux-arm-kernel,
linux-kernel, linux-mediatek, linux-mm, ioworker0, stable
On Wed, Sep 24, 2025 at 10:49:27AM +0800, Lance Yang wrote:
> On 2025/9/24 00:14, Catalin Marinas wrote:
> > So here's an alternative patch that also fixes the deferred struct page init
> > (on the assumption that the zero page is always mapped as pte_special()):
>
> I can confirm that this alternative patch also works correctly; my tests
> for MTE all pass ;)
Thanks Lance for testing. I'll post one of the variants today.
> This looks like a better fix since it solves the boot hang issue too.
In principle, yes, until I tracked down why I changed it in the first
place - 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to
the zero page"). ptrace() can read tags from PROT_MTE mappings and we
want to allow reading zeroes as well if the page points to the zero
page. Not flagging the page as PG_mte_tagged caused issues.
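For context, the interface in question is used roughly like this from
the tracer side (minimal sketch; PTRACE_PEEKMTETAGS returns one tag
byte per 16-byte granule, and request number 33 is arm64-specific):
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#ifndef PTRACE_PEEKMTETAGS
#define PTRACE_PEEKMTETAGS 33
#endif
static long read_mte_tags(pid_t pid, void *addr, unsigned char *tags,
			  size_t n)
{
	struct iovec iov = { .iov_base = tags, .iov_len = n };
	/* With the zero page flagged PG_mte_tagged, a zeropage-backed
	 * PROT_MTE mapping in the tracee reads back as zero tags here
	 * instead of failing. */
	if (ptrace(PTRACE_PEEKMTETAGS, pid, addr, &iov))
		return -1;
	return iov.iov_len;	/* number of tags actually read */
}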
I can change the logic in the ptrace() code, I just need to figure out
what happens to the huge zero page. Ideally we should treat both in the
same way but, AFAICT, we don't use pmd_mkspecial() on the huge zero
page, so it gets flagged with PG_mte_tagged.
If I go with the first fix for the page merging, I'll look to defer the
zero page initialisation post page_alloc_init_late() in a separate
patch.
--
Catalin
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-24 8:50 ` Catalin Marinas
@ 2025-09-24 9:13 ` David Hildenbrand
2025-09-24 9:34 ` Catalin Marinas
0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2025-09-24 9:13 UTC (permalink / raw)
To: Catalin Marinas, Lance Yang
Cc: akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy, baolin.wang,
baohua, voidice, Liam.Howlett, cerasuolodomenico, hannes,
kaleshsingh, npache, riel, roman.gushchin, rppt, ryan.roberts,
dev.jain, ryncsn, shakeel.butt, surenb, hughd, willy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 24.09.25 10:50, Catalin Marinas wrote:
> On Wed, Sep 24, 2025 at 10:49:27AM +0800, Lance Yang wrote:
>> On 2025/9/24 00:14, Catalin Marinas wrote:
>>> So here's an alternative patch that also fixes the deferred struct page init
>>> (on the assumption that the zero page is always mapped as pte_special()):
>>
>> I can confirm that this alternative patch also works correctly; my tests
>> for MTE all pass ;)
>
> Thanks Lance for testing. I'll post one of the variants today.
>
>> This looks like a better fix since it solves the boot hang issue too.
>
> In principle, yes, until I tracked down why I changed it in the first
> place - 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to
> the zero page"). ptrace() can read tags from PROT_MTE mappings and we
> want to allow reading zeroes as well if the page points to the zero
> page. Not flagging the page as PG_mte_tagged caused issues.
>
> I can change the logic in the ptrace() code, I just need to figure out
> what happens to the huge zero page. Ideally we should treat both in the
> same way but, AFAICT, we don't use pmd_mkspecial() on the huge zero
> page, so it gets flagged with PG_mte_tagged.
I changed that recently :) The huge zero folio will now always have
pmd_special() set.
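For reference, the mapping side now does roughly the below (a
simplified sketch of the relevant lines, names approximate):
	pmd_t entry = folio_mk_pmd(zero_folio, vma->vm_page_prot);
	entry = pmd_mkspecial(entry);	/* the new part */
	set_pmd_at(mm, haddr, pmd, entry);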
--
Cheers
David / dhildenb
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-24 9:13 ` David Hildenbrand
@ 2025-09-24 9:34 ` Catalin Marinas
2025-09-24 9:44 ` David Hildenbrand
0 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2025-09-24 9:34 UTC (permalink / raw)
To: David Hildenbrand
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On Wed, Sep 24, 2025 at 11:13:18AM +0200, David Hildenbrand wrote:
> On 24.09.25 10:50, Catalin Marinas wrote:
> > On Wed, Sep 24, 2025 at 10:49:27AM +0800, Lance Yang wrote:
> > > On 2025/9/24 00:14, Catalin Marinas wrote:
> > > > So here's an alternative patch that also fixes the deferred struct page init
> > > > (on the assumption that the zero page is always mapped as pte_special()):
> > >
> > > I can confirm that this alternative patch also works correctly; my tests
> > > for MTE all pass ;)
> >
> > Thanks Lance for testing. I'll post one of the variants today.
> >
> > > This looks like a better fix since it solves the boot hang issue too.
> >
> > In principle, yes, until I tracked down why I changed it in the first
> > place - 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to
> > the zero page"). ptrace() can read tags from PROT_MTE mappings and we
> > want to allow reading zeroes as well if the page points to the zero
> > page. Not flagging the page as PG_mte_tagged caused issues.
> >
> > I can change the logic in the ptrace() code, I just need to figure out
> > what happens to the huge zero page. Ideally we should treat both in the
> > same way but, AFAICT, we don't use pmd_mkspecial() on the huge zero
> > page, so it gets flagged with PG_mte_tagged.
>
> I changed that recently :) The huge zero folio will now always have
> pmd_special() set.
Oh, which commit was this? It means that we can end up with
uninitialised tags if we have a PROT_MTE huge zero page since
set_pmd_at/set_pte_at() skips mte_sync_tags().
--
Catalin
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-24 9:34 ` Catalin Marinas
@ 2025-09-24 9:44 ` David Hildenbrand
2025-09-24 9:59 ` Catalin Marinas
0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2025-09-24 9:44 UTC (permalink / raw)
To: Catalin Marinas
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On 24.09.25 11:34, Catalin Marinas wrote:
> On Wed, Sep 24, 2025 at 11:13:18AM +0200, David Hildenbrand wrote:
>> On 24.09.25 10:50, Catalin Marinas wrote:
>>> On Wed, Sep 24, 2025 at 10:49:27AM +0800, Lance Yang wrote:
>>>> On 2025/9/24 00:14, Catalin Marinas wrote:
>>>>> So here's an alternative patch that also fixes the deferred struct page init
>>>>> (on the assumption that the zero page is always mapped as pte_special()):
>>>>
>>>> I can confirm that this alternative patch also works correctly; my tests
>>>> for MTE all pass ;)
>>>
>>> Thanks Lance for testing. I'll post one of the variants today.
>>>
>>>> This looks like a better fix since it solves the boot hang issue too.
>>>
>>> In principle, yes, until I tracked down why I changed it in the first
>>> place - 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to
>>> the zero page"). ptrace() can read tags from PROT_MTE mappings and we
>>> want to allow reading zeroes as well if the page points to the zero
>>> page. Not flagging the page as PG_mte_tagged caused issues.
>>>
>>> I can change the logic in the ptrace() code, I just need to figure out
>>> what happens to the huge zero page. Ideally we should treat both in the
>>> same way but, AFAICT, we don't use pmd_mkspecial() on the huge zero
>>> page, so it gets flagged with PG_mte_tagged.
>>
>> I changed that recently :) The huge zero folio will now always have
>> pmd_special() set.
>
> Oh, which commit was this? It means that we can end up with
> uninitialised tags if we have a PROT_MTE huge zero page since
> set_pmd_at/set_pte_at() skips mte_sync_tags().
>
This one:
commit d82d09e482199e6bbc204df10b2082f764cbe1f4
Author: David Hildenbrand <david@redhat.com>
Date: Mon Aug 11 13:26:25 2025 +0200
mm/huge_memory: mark PMD mappings of the huge zero folio special
The huge zero folio is refcounted (+mapcounted -- is that a word?)
differently than "normal" folios, similarly (but different) to the
ordinary shared zeropage.
It should be in mm-stable, to go upstream in the upcoming merge window.
It's been lurking in -next for a while now.
As it behaves just like the ordinary shared zeropage now, would we have
to zero/initialize the tags after allocating it?
--
Cheers
David / dhildenb
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-24 9:44 ` David Hildenbrand
@ 2025-09-24 9:59 ` Catalin Marinas
0 siblings, 0 replies; 22+ messages in thread
From: Catalin Marinas @ 2025-09-24 9:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: Lance Yang, akpm, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, cerasuolodomenico,
hannes, kaleshsingh, npache, riel, roman.gushchin, rppt,
ryan.roberts, dev.jain, ryncsn, shakeel.butt, surenb, hughd,
willy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, qun-wei.lin, Andrew.Yang, casper.li,
chinwen.chang, linux-arm-kernel, linux-kernel, linux-mediatek,
linux-mm, ioworker0, stable
On Wed, Sep 24, 2025 at 11:44:19AM +0200, David Hildenbrand wrote:
> On 24.09.25 11:34, Catalin Marinas wrote:
> > On Wed, Sep 24, 2025 at 11:13:18AM +0200, David Hildenbrand wrote:
> > > On 24.09.25 10:50, Catalin Marinas wrote:
> > > > On Wed, Sep 24, 2025 at 10:49:27AM +0800, Lance Yang wrote:
> > > > > On 2025/9/24 00:14, Catalin Marinas wrote:
> > > > > > So here's an alternative patch that also fixes the deferred struct page init
> > > > > > (on the assumption that the zero page is always mapped as pte_special()):
> > > > >
> > > > > I can confirm that this alternative patch also works correctly; my tests
> > > > > for MTE all pass ;)
> > > >
> > > > Thanks Lance for testing. I'll post one of the variants today.
> > > >
> > > > > This looks like a better fix since it solves the boot hang issue too.
> > > >
> > > > In principle, yes, until I tracked down why I changed it in the first
> > > > place - 68d54ceeec0e ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to
> > > > the zero page"). ptrace() can read tags from PROT_MTE mappings and we
> > > > want to allow reading zeroes as well if the page points to the zero
> > > > page. Not flagging the page as PG_mte_tagged caused issues.
> > > >
> > > > I can change the logic in the ptrace() code, I just need to figure out
> > > > what happens to the huge zero page. Ideally we should treat both in the
> > > > same way but, AFAICT, we don't use pmd_mkspecial() on the huge zero
> > > > page, so it gets flagged with PG_mte_tagged.
> > >
> > > I changed that recently :) The huge zero folio will now always have
> > > pmd_special() set.
> >
> > Oh, which commit was this? It means that we can end up with
> > uninitialised tags if we have a PROT_MTE huge zero page since
> > set_pmd_at/set_pte_at() skips mte_sync_tags().
>
> This one:
>
> commit d82d09e482199e6bbc204df10b2082f764cbe1f4
> Author: David Hildenbrand <david@redhat.com>
> Date: Mon Aug 11 13:26:25 2025 +0200
>
> mm/huge_memory: mark PMD mappings of the huge zero folio special
>
> The huge zero folio is refcounted (+mapcounted -- is that a word?)
> differently than "normal" folios, similarly (but different) to the
> ordinary shared zeropage.
>
>
> It should be in mm-stable, to go upstream in the upcoming merge window. It's
> been lurking in -next for a while now.
Thanks. At least it's something to address in the next kernel version. I
need to improve the MTE kselftests to catch the zero page scenarios.
> As it behaves just like the ordinary shared zeropage now, would we have to
> zero/initialize the tags after allocating it?
Yes. Before pmd_special(), it was done lazily via set_pmd_at(). I think
it just needs __GFP_ZEROTAGS. The only other place we use this flag is
in vma_alloc_zeroed_movable_folio(), as an optimisation to avoid a
separate loop for zeroing the tags after the data.
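So the huge zero folio side would be something like the below (untested
sketch, assuming the allocation still goes through folio_alloc()):
	/* clear the MTE tags together with the zeroed data */
	zero_folio = folio_alloc((GFP_TRANSHUGE | __GFP_ZERO | __GFP_ZEROTAGS) &
				 ~__GFP_MOVABLE, HPAGE_PMD_ORDER);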
--
Catalin
* Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
2025-09-22 2:14 [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages Lance Yang
` (3 preceding siblings ...)
2025-09-22 17:24 ` Catalin Marinas
@ 2025-09-23 2:10 ` Wei Yang
4 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2025-09-23 2:10 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, lorenzo.stoakes, usamaarif642, yuzhao, ziy,
baolin.wang, baohua, voidice, Liam.Howlett, catalin.marinas,
cerasuolodomenico, hannes, kaleshsingh, npache, riel,
roman.gushchin, rppt, ryan.roberts, dev.jain, ryncsn,
shakeel.butt, surenb, hughd, willy, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, ying.huang, apopple, qun-wei.lin,
Andrew.Yang, casper.li, chinwen.chang, linux-arm-kernel,
linux-kernel, linux-mediatek, linux-mm, ioworker0, stable
On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>From: Lance Yang <lance.yang@linux.dev>
>
>When both THP and MTE are enabled, splitting a THP and replacing its
>zero-filled subpages with the shared zeropage can cause MTE tag mismatch
>faults in userspace.
>
>Remapping zero-filled subpages to the shared zeropage is unsafe, as the
>zeropage has a fixed tag of zero, which may not match the tag expected by
>the userspace pointer.
>
>KSM already avoids this problem by using memcmp_pages(), which on arm64
>intentionally reports MTE-tagged pages as non-identical to prevent unsafe
>merging.
>
>As suggested by David[1], this patch adopts the same pattern, replacing the
>memchr_inv() byte-level check with a call to pages_identical(). This
>leverages existing architecture-specific logic to determine if a page is
>truly identical to the shared zeropage.
>
>Having both the THP shrinker and KSM rely on pages_identical() makes the
>design more future-proof, IMO. Instead of handling quirks in generic code,
>we just let the architecture decide what makes two pages identical.
>
>[1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
>
>Cc: <stable@vger.kernel.org>
>Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
>Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@mediatek.com
>Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
>Suggested-by: David Hildenbrand <david@redhat.com>
>Signed-off-by: Lance Yang <lance.yang@linux.dev>
Nice catch.
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
--
Wei Yang
Help you, Help me