* [PATCH v2] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
From: Piotr Jaroszynski @ 2026-03-05 23:26 UTC
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Ryan Roberts,
Mark Rutland, Andrew Morton
Cc: linux-arm-kernel, linux-kernel, linux-mm, John Hubbard, Zi Yan,
Breno Leitao, stable, Alistair Popple, James Houghton,
Piotr Jaroszynski, Jason Gunthorpe

contpte_ptep_set_access_flags() compared the gathered ptep_get() value
against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
from all sub-PTEs in the CONT block, so a dirty sibling can make the
target appear already-dirty. When the gathered value matches entry, the
function returns 0 even though the target sub-PTE still has PTE_RDONLY
set in hardware.

For a CPU with FEAT_HAFDBS this gathered view is fine, since hardware may
set AF/dirty on any sub-PTE and CPU TLB behavior is effectively gathered
across the CONT range. But page-table walkers that evaluate each
descriptor individually (e.g. a CPU without DBM support, or an SMMU
without HTTU, or with HA/HD disabled in CD.TCR) can keep faulting on the
unchanged target sub-PTE, causing an infinite fault loop.

Gathering can therefore cause false no-ops when only a sibling has been
updated:
- write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
- read faults: target still lacks PTE_AF

Fix by checking each sub-PTE against the requested AF/dirty/write state
(the same bits consumed by __ptep_set_access_flags()), using raw
per-PTE values rather than the gathered ptep_get() view, before
returning no-op. Keep using the raw target PTE for the write-bit unfold
decision.

Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
range may become the effective cached translation and software must
maintain consistent attributes across the range.

Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
---
Changes in v2:
- Clarify commit message/comments: issue affects per-descriptor walkers
(CPU without DBM support, or SMMU without HTTU / with HA/HD disabled).
- Clarify sub-PTE comparison semantics: use raw per-PTE values and match
bits consumed by __ptep_set_access_flags() (AF, DIRTY, write permission).
- Add Reviewed-by/Tested-by trailers from the v1 thread.
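
Illustration (not part of the patch): the false no-op described in the
commit message can be reproduced in a small user-space model. This is
only a sketch under stated assumptions. The bit positions, the model_*
helpers and the old_noop_check()/new_noop_check() names are hypothetical
stand-ins rather than kernel definitions, the gather mimics the OR-ing
of AF/dirty across sub-PTEs described above, and only the four compared
bits are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: bit positions are arbitrary, not the arm64 layout. */
typedef uint64_t pteval_t;
#define CONT_PTES  16
#define PTE_AF     (UINT64_C(1) << 0)
#define PTE_RDONLY (UINT64_C(1) << 1)
#define PTE_WRITE  (UINT64_C(1) << 2)
#define PTE_DIRTY  (UINT64_C(1) << 3)
#define CMP_MASK   (PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY)

/* Assumption: dirty means sw-dirty, or writable with PTE_RDONLY clear. */
static bool model_pte_dirty(pteval_t pte)
{
	return (pte & PTE_DIRTY) || ((pte & PTE_WRITE) && !(pte & PTE_RDONLY));
}

/* Assumption: marking a writable PTE dirty also clears PTE_RDONLY. */
static pteval_t model_pte_mkdirty(pteval_t pte)
{
	pte |= PTE_DIRTY;
	if (pte & PTE_WRITE)
		pte &= ~PTE_RDONLY;
	return pte;
}

/* Gathered view of block[target]: AF/dirty OR-ed in from every sub-PTE. */
static pteval_t model_gather(const pteval_t *block, int target)
{
	pteval_t pte;
	int i;

	assert(target >= 0 && target < CONT_PTES);
	pte = block[target];
	for (i = 0; i < CONT_PTES; i++) {
		if (model_pte_dirty(block[i]))
			pte = model_pte_mkdirty(pte);
		if (block[i] & PTE_AF)
			pte |= PTE_AF;
	}
	return pte;
}

/* Old check: compare the gathered view of the target with the request. */
static bool old_noop_check(const pteval_t *block, int target, pteval_t entry)
{
	return (model_gather(block, target) & CMP_MASK) == (entry & CMP_MASK);
}

/* New check: every raw sub-PTE must already match the request. */
static bool new_noop_check(const pteval_t *block, pteval_t entry)
{
	int i;

	for (i = 0; i < CONT_PTES; i++) {
		if ((block[i] & CMP_MASK) != (entry & CMP_MASK))
			return false;
	}
	return true;
}
```

In this model, take a block where every sub-PTE is AF | WRITE | RDONLY
except one sibling that has lost PTE_RDONLY (as a hardware dirty update
would leave it). For a write-fault entry of AF | WRITE | DIRTY,
old_noop_check() reports a no-op for any other target, while
new_noop_check() reports a mismatch because the target still carries
PTE_RDONLY.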
---
arch/arm64/mm/contpte.c | 53 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 49 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index b929a455103f..1519d090d5ea 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -599,6 +599,27 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
 
+static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
+{
+	pte_t *cont_ptep = contpte_align_down(ptep);
+	/*
+	 * PFNs differ per sub-PTE. Match only bits consumed by
+	 * __ptep_set_access_flags(): AF, DIRTY and write permission.
+	 */
+	const pteval_t cmp_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
+	pteval_t entry_cmp = pte_val(entry) & cmp_mask;
+	int i;
+
+	for (i = 0; i < CONT_PTES; i++) {
+		pteval_t pte_cmp = pte_val(__ptep_get(cont_ptep + i)) & cmp_mask;
+
+		if (pte_cmp != entry_cmp)
+			return false;
+	}
+
+	return true;
+}
+
 int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 					unsigned long addr, pte_t *ptep,
 					pte_t entry, int dirty)
@@ -608,13 +629,37 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 	int i;
 
 	/*
-	 * Gather the access/dirty bits for the contiguous range. If nothing has
-	 * changed, its a noop.
+	 * Check whether all sub-PTEs in the CONT block already match the
+	 * requested access flags/write permission, using raw per-PTE values
+	 * rather than the gathered ptep_get() view.
+	 *
+	 * __ptep_set_access_flags() can update AF, dirty and write
+	 * permission, but only to make the mapping more permissive.
+	 *
+	 * ptep_get() gathers AF/dirty state across the whole CONT block,
+	 * which is correct for a CPU with FEAT_HAFDBS. But page-table
+	 * walkers that evaluate each descriptor individually (e.g. a CPU
+	 * without DBM support, or an SMMU without HTTU, or with HA/HD
+	 * disabled in CD.TCR) can keep faulting on the target sub-PTE if
+	 * only a sibling has been updated. Gathering can therefore cause
+	 * false no-ops when only a sibling has been updated:
+	 * - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
+	 * - read faults: target still lacks PTE_AF
+	 *
+	 * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
+	 * become the effective cached translation, so all entries must have
+	 * consistent attributes. Check the full CONT block before returning
+	 * no-op, and when any sub-PTE mismatches, proceed to update the whole
+	 * range.
 	 */
-	orig_pte = pte_mknoncont(ptep_get(ptep));
-	if (pte_val(orig_pte) == pte_val(entry))
+	if (contpte_all_subptes_match_access_flags(ptep, entry))
 		return 0;
 
+	/*
+	 * Use raw target pte (not gathered) for write-bit unfold decision.
+	 */
+	orig_pte = pte_mknoncont(__ptep_get(ptep));
+
 	/*
 	 * We can fix up access/dirty bits without having to unfold the contig
 	 * range. But if the write bit is changing, we must unfold.
---
base-commit: c107785c7e8dbabd1c18301a1c362544b5786282
change-id: 20260305-contpte-fault-loop-76ed911b01c0
Best regards,
--
Piotr Jaroszynski <pjaroszynski@nvidia.com>
* Re: [PATCH v2] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
From: Balbir Singh @ 2026-03-06 10:48 UTC
To: Piotr Jaroszynski, Catalin Marinas, Will Deacon, Ard Biesheuvel,
Ryan Roberts, Mark Rutland, Andrew Morton
Cc: linux-arm-kernel, linux-kernel, linux-mm, John Hubbard, Zi Yan,
Breno Leitao, stable, Alistair Popple, James Houghton,
Jason Gunthorpe
On 3/6/26 10:26, Piotr Jaroszynski wrote:
> contpte_ptep_set_access_flags() compared the gathered ptep_get() value
> against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
> from all sub-PTEs in the CONT block, so a dirty sibling can make the
> target appear already-dirty. When the gathered value matches entry, the
> function returns 0 even though the target sub-PTE still has PTE_RDONLY
> set in hardware.
>
> [...]
Looks good
Acked-by: Balbir Singh <balbirs@nvidia.com>
Balbir
* Re: [PATCH v2] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
From: Will Deacon @ 2026-03-06 12:26 UTC
To: Catalin Marinas, Ard Biesheuvel, Ryan Roberts, Mark Rutland,
Andrew Morton, Piotr Jaroszynski
Cc: kernel-team, Will Deacon, linux-arm-kernel, linux-kernel,
linux-mm, John Hubbard, Zi Yan, Breno Leitao, stable,
Alistair Popple, James Houghton, Jason Gunthorpe
On Thu, 05 Mar 2026 15:26:29 -0800, Piotr Jaroszynski wrote:
> contpte_ptep_set_access_flags() compared the gathered ptep_get() value
> against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
> from all sub-PTEs in the CONT block, so a dirty sibling can make the
> target appear already-dirty. When the gathered value matches entry, the
> function returns 0 even though the target sub-PTE still has PTE_RDONLY
> set in hardware.
>
> [...]
Applied to arm64 (for-next/fixes), thanks!
[1/1] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
https://git.kernel.org/arm64/c/97c5550b7631
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev