From: Catalin Marinas <catalin.marinas@arm.com>
To: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
Will Deacon <will@kernel.org>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
Alistair Popple <apopple@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>,
Breno Leitao <leitao@debian.org>,
stable@vger.kernel.org
Subject: Re: [PATCH] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
Date: Wed, 4 Mar 2026 11:17:08 +0000 [thread overview]
Message-ID: <aagUtDTca5d0le2Y@arm.com> (raw)
In-Reply-To: <20260303063751.2531716-1-pjaroszynski@nvidia.com>
On Mon, Mar 02, 2026 at 10:37:51PM -0800, Piotr Jaroszynski wrote:
> contpte_ptep_set_access_flags() compared the gathered ptep_get() value
> against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
> from all sub-PTEs in the CONT block, so a dirty sibling can make the
> target appear already-dirty. When the gathered value matches entry, the
> function returns 0 even though the target sub-PTE still has PTE_RDONLY
> set in hardware.
>
> For CPU page-table walks this is benign: with FEAT_HAFDBS the hardware
> may set AF/dirty on any sub-PTE and the CPU TLB treats the gathered
> result as authoritative for the entire range. But an SMMU without HTTU
> (or with HA/HD disabled in CD.TCR) evaluates each descriptor
> individually and will keep raising F_PERMISSION on the unchanged target
> sub-PTE, causing an infinite fault loop.
This can also happen if not all CPUs support the hardware updates of the
AF/dirty bits.
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index bcac4f55f9c1..9868bfe4607c 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -390,6 +390,23 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
> }
> EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
>
> +static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
> +{
> + pte_t *cont_ptep = contpte_align_down(ptep);
> + const pteval_t access_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
> + pteval_t entry_access = pte_val(entry) & access_mask;
> + int i;
> +
> + for (i = 0; i < CONT_PTES; i++) {
> + pteval_t pte_access = pte_val(__ptep_get(cont_ptep + i)) & access_mask;
> +
> + if (pte_access != entry_access)
> + return false;
> + }
> +
> + return true;
> +}
> +
> int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t entry, int dirty)
> @@ -399,13 +416,35 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> int i;
>
> /*
> - * Gather the access/dirty bits for the contiguous range. If nothing has
> - * changed, its a noop.
> + * Check whether all sub-PTEs in the CONT block already have the
> + * requested access flags, using raw per-PTE values rather than the
> + * gathered ptep_get() view.
> + *
> + * ptep_get() gathers AF/dirty state across the whole CONT block,
> + * which is correct for CPU TLB semantics: with FEAT_HAFDBS the
> + * hardware may set AF/dirty on any sub-PTE and the CPU TLB treats
> + * the gathered result as authoritative for the entire range. But an
> + * SMMU without HTTU (or with HA/HD disabled in CD.TCR) evaluates
> + * each descriptor individually and will keep faulting on the target
> + * sub-PTE if its flags haven't actually been updated. Gathering can
> + * therefore cause false no-ops when only a sibling has been updated:
> + * - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
> + * - read faults: target still lacks PTE_AF
> + *
> + * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
> + * become the effective cached translation, so all entries must have
> + * consistent attributes. Check the full CONT block before returning
> + * no-op, and when any sub-PTE mismatches, proceed to update the whole
> + * range.
> */
> - orig_pte = pte_mknoncont(ptep_get(ptep));
> - if (pte_val(orig_pte) == pte_val(entry))
> + if (contpte_all_subptes_match_access_flags(ptep, entry))
> return 0;
Actually, do we need to loop over all the ptes? I think it sufficient to
only check the one at ptep since it is the one that triggered the fault.
Instead of ptep_get(ptep), use __ptep_get(ptep). The rest of the
function sets the flags correctly for all the ptes in the contig range.
--
Catalin
next prev parent reply other threads:[~2026-03-04 11:17 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-03 6:37 Piotr Jaroszynski
2026-03-03 7:19 ` James Houghton
2026-03-03 12:45 ` Jason Gunthorpe
2026-03-03 21:40 ` Piotr Jaroszynski
2026-03-03 8:38 ` Ryan Roberts
2026-03-03 18:40 ` Piotr Jaroszynski
2026-03-03 19:12 ` Jason Gunthorpe
2026-03-04 12:20 ` Ryan Roberts
2026-03-04 13:44 ` Jason Gunthorpe
2026-03-04 11:17 ` Catalin Marinas [this message]
2026-03-04 13:43 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aagUtDTca5d0le2Y@arm.com \
--to=catalin.marinas@arm.com \
--cc=apopple@nvidia.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=leitao@debian.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=pjaroszynski@nvidia.com \
--cc=ryan.roberts@arm.com \
--cc=stable@vger.kernel.org \
--cc=will@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox