linux-mm.kvack.org archive mirror
From: Balbir Singh <balbirs@nvidia.com>
To: Piotr Jaroszynski <pjaroszynski@nvidia.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Breno Leitao <leitao@debian.org>,
	stable@vger.kernel.org, Alistair Popple <apopple@nvidia.com>,
	James Houghton <jthoughton@google.com>,
	Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: [PATCH v2] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
Date: Fri, 6 Mar 2026 21:48:39 +1100
Message-ID: <32970563-29bf-4569-befd-a8f4c5a3e689@nvidia.com>
In-Reply-To: <20260305-contpte-fault-loop-v2-1-0216f0026d7f@nvidia.com>

On 3/6/26 10:26, Piotr Jaroszynski wrote:
> contpte_ptep_set_access_flags() compared the gathered ptep_get() value
> against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
> from all sub-PTEs in the CONT block, so a dirty sibling can make the
> target appear already-dirty. When the gathered value matches entry, the
> function returns 0 even though the target sub-PTE still has PTE_RDONLY
> set in hardware.
> 
> For a CPU with FEAT_HAFDBS this gathered view is fine, since hardware may
> set AF/dirty on any sub-PTE and CPU TLB behavior is effectively gathered
> across the CONT range. But page-table walkers that evaluate each
> descriptor individually (e.g. a CPU without DBM support, or an SMMU
> without HTTU, or with HA/HD disabled in CD.TCR) can keep faulting on the
> unchanged target sub-PTE, causing an infinite fault loop.
> 
> Gathering can therefore cause false no-ops when only a sibling has been
> updated:
>  - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
>  - read faults:  target still lacks PTE_AF
> 
> Fix by checking each sub-PTE against the requested AF/dirty/write state
> (the same bits consumed by __ptep_set_access_flags()), using raw
> per-PTE values rather than the gathered ptep_get() view, before
> returning no-op. Keep using the raw target PTE for the write-bit unfold
> decision.
> 
> Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
> range may become the effective cached translation and software must
> maintain consistent attributes across the range.
> 
> Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Breno Leitao <leitao@debian.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Tested-by: Breno Leitao <leitao@debian.org>
> Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
> ---
> Changes in v2:
> - Clarify commit message/comments: issue affects per-descriptor walkers
>   (CPU without DBM support, or SMMU without HTTU / with HA/HD disabled).
> - Clarify sub-PTE comparison semantics: use raw per-PTE values and match
>   bits consumed by __ptep_set_access_flags() (AF, DIRTY, write permission).
> - Add Reviewed-by/Tested-by trailers from the v1 thread.
> ---
>  arch/arm64/mm/contpte.c | 53 +++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index b929a455103f..1519d090d5ea 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -599,6 +599,27 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
>  
> +static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
> +{
> +	pte_t *cont_ptep = contpte_align_down(ptep);
> +	/*
> +	 * PFNs differ per sub-PTE. Match only bits consumed by
> +	 * __ptep_set_access_flags(): AF, DIRTY and write permission.
> +	 */
> +	const pteval_t cmp_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
> +	pteval_t entry_cmp = pte_val(entry) & cmp_mask;
> +	int i;
> +
> +	for (i = 0; i < CONT_PTES; i++) {
> +		pteval_t pte_cmp = pte_val(__ptep_get(cont_ptep + i)) & cmp_mask;
> +
> +		if (pte_cmp != entry_cmp)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
>  					unsigned long addr, pte_t *ptep,
>  					pte_t entry, int dirty)
> @@ -608,13 +629,37 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
>  	int i;
>  
>  	/*
> -	 * Gather the access/dirty bits for the contiguous range. If nothing has
> -	 * changed, its a noop.
> +	 * Check whether all sub-PTEs in the CONT block already match the
> +	 * requested access flags/write permission, using raw per-PTE values
> +	 * rather than the gathered ptep_get() view.
> +	 *
> +	 * __ptep_set_access_flags() can update AF, dirty and write
> +	 * permission, but only to make the mapping more permissive.
> +	 *
> +	 * ptep_get() gathers AF/dirty state across the whole CONT block,
> +	 * which is correct for a CPU with FEAT_HAFDBS. But page-table
> +	 * walkers that evaluate each descriptor individually (e.g. a CPU
> +	 * without DBM support, or an SMMU without HTTU, or with HA/HD
> +	 * disabled in CD.TCR) can keep faulting on the target sub-PTE if
> +	 * only a sibling has been updated. Gathering can therefore cause
> +	 * false no-ops when only a sibling has been updated:
> +	 *  - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
> +	 *  - read faults:  target still lacks PTE_AF
> +	 *
> +	 * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
> +	 * become the effective cached translation, so all entries must have
> +	 * consistent attributes. Check the full CONT block before returning
> +	 * no-op, and when any sub-PTE mismatches, proceed to update the whole
> +	 * range.
>  	 */
> -	orig_pte = pte_mknoncont(ptep_get(ptep));
> -	if (pte_val(orig_pte) == pte_val(entry))
> +	if (contpte_all_subptes_match_access_flags(ptep, entry))
>  		return 0;
>  
> +	/*
> +	 * Use raw target pte (not gathered) for write-bit unfold decision.
> +	 */
> +	orig_pte = pte_mknoncont(__ptep_get(ptep));
> +
>  	/*
>  	 * We can fix up access/dirty bits without having to unfold the contig
>  	 * range. But if the write bit is changing, we must unfold.
> 
Looks good

Acked-by: Balbir Singh <balbirs@nvidia.com>

Balbir


Thread overview: 3+ messages
2026-03-05 23:26 Piotr Jaroszynski
2026-03-06 10:48 ` Balbir Singh [this message]
2026-03-06 12:26 ` Will Deacon