From: Ryan Roberts <ryan.roberts@arm.com>
To: Anshuman Khandual <anshuman.khandual@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Muchun Song <muchun.song@linux.dev>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>, Dev Jain <dev.jain@arm.com>,
	Alexandre Ghiti <alexghiti@rivosinc.com>,
	Steve Capper <steve.capper@linaro.org>,
	Kevin Brodsky <kevin.brodsky@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 08/16] arm64/mm: Hoist barriers out of ___set_ptes() loop
Date: Wed, 12 Feb 2025 16:00:25 +0000
Message-ID: <6b3ad437-04df-41c6-81f4-c0fec5a099ec@arm.com>
In-Reply-To: <e0400886-d4bc-40f5-8375-fc3d515fd386@arm.com>

On 07/02/2025 10:38, Ryan Roberts wrote:
> On 07/02/2025 05:35, Anshuman Khandual wrote:
>>
>>
>> On 2/5/25 20:39, Ryan Roberts wrote:
>>> ___set_ptes() previously called __set_pte() for each PTE in the range,
>>> which would conditionally issue a DSB and ISB to make the new PTE value
>>> immediately visible to the table walker if the new PTE was valid and for
>>> kernel space.
>>>
>>> We can do better than this; let's hoist those barriers out of the loop
>>> so that they are only issued once at the end of the loop, reducing the
>>> barrier cost by a factor of the number of PTEs in the range.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  arch/arm64/include/asm/pgtable.h | 14 ++++++++++----
>>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index 3b55d9a15f05..1d428e9c0e5a 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>>>  	WRITE_ONCE(*ptep, pte);
>>>  }
>>>  
>>> -static inline void __set_pte(pte_t *ptep, pte_t pte)
>>> +static inline void __set_pte_complete(pte_t pte)
>>>  {
>>> -	__set_pte_nosync(ptep, pte);
>>> -
>>>  	/*
>>>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>>>  	 * or update_mmu_cache() have the necessary barriers.
>>> @@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
>>>  	}
>>>  }
>>>  
>>> +static inline void __set_pte(pte_t *ptep, pte_t pte)
>>> +{
>>> +	__set_pte_nosync(ptep, pte);
>>> +	__set_pte_complete(pte);
>>> +}
>>> +
>>>  static inline pte_t __ptep_get(pte_t *ptep)
>>>  {
>>>  	return READ_ONCE(*ptep);
>>> @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pte,
>>>  
>>>  	for (;;) {
>>>  		__check_safe_pte_update(mm, ptep, pte);
>>> -		__set_pte(ptep, pte);
>>> +		__set_pte_nosync(ptep, pte);
>>>  		if (--nr == 0)
>>>  			break;
>>>  		ptep++;
>>>  		pte = pte_advance_pfn(pte, stride);
>>>  	}
>>> +
>>> +	__set_pte_complete(pte);
>>
>> Given that the loop now iterates over a number of page table entries without a
>> corresponding dsb/isb sync per entry, could something else get scheduled on the
>> cpu before __set_pte_complete() is called, leaving the entire block of page table
>> entries without the desired mapping effect? IOW, how is __set_pte_complete()
>> guaranteed to execute once the loop above completes? Otherwise this change LGTM.
> 
> I don't think this changes the model. Previously, __set_pte() was called, which
> writes the pte to the pgtable, then issues the barriers. So there is still a
> window where the thread could be unscheduled after the write but before the
> barriers. Yes, my change makes that window bigger, but if it is a bug now, it
> was a bug before.
> 
> Additionally, the spec for set_ptes() is:
> 
> /**
>  * set_ptes - Map consecutive pages to a contiguous range of addresses.
>  * @mm: Address space to map the pages into.
>  * @addr: Address to map the first page at.
>  * @ptep: Page table pointer for the first entry.
>  * @pte: Page table entry for the first page.
>  * @nr: Number of pages to map.
>  *
>  * When nr==1, initial state of pte may be present or not present, and new state
>  * may be present or not present. When nr>1, initial state of all ptes must be
>  * not present, and new state must be present.
>  *
>  * May be overridden by the architecture, or the architecture can define
>  * set_pte() and PFN_PTE_SHIFT.
>  *
>  * Context: The caller holds the page table lock.  The pages all belong
>  * to the same folio.  The PTEs are all in the same PMD.
>  */
> 
> Note that the caller is required to hold the page table lock. That's a spinlock,
> so we should be non-preemptible at this point (perhaps not for RT?).
> 
> Although actually, vmalloc doesn't hold a lock when calling these helpers; it
> has a lock when allocating the VA space, then drops it.
> 
> So yes, I think there is a chance of preemption after writing the pgtable entry
> but before issuing the barriers.
> 
> But in that case, we get saved by the DSB in the context switch path. There is
> no guarantee of an ISB in that path (AFAIU). But the need for an ISB is a bit
> woolly anyway. My rough understanding is that the ISB is there to prevent
> previous speculation from determining that a given translation was invalid and
> "caching" that determination in the pipeline. That could still (theoretically)
> happen on remote CPUs I think, and we have the spurious fault handler to detect
> that. Anyway, once you context switch, the local CPU becomes remote and we don't
> have the ISB there, so what's the difference... There's a high chance I've
> misunderstood a bunch of this.

I thought about this some more: the ISB is there to ensure that the "speculative
invalid translation marker" cached in the pipeline gets removed before any code
that runs after set_ptes() returns and accesses an address now mapped by the
newly set pte. Even if preemption occurs, the ISB will still execute when the
thread runs again, before the return from set_ptes(). So all is well.
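
To make that concrete (a sketch only, reconstructed from the context that the
hunks above elide), the barrier logic that __set_pte_complete() ends up owning
is roughly:

static inline void __set_pte_complete(pte_t pte)
{
	/*
	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
	 * or update_mmu_cache() have the necessary barriers.
	 */
	if (pte_valid_not_user(pte)) {
		dsb(ishst);	/* publish the pte write(s) to the table walker */
		isb();		/* discard stale "invalid" speculation on this CPU */
	}
}

So whichever CPU ends up running the tail of ___set_ptes() executes both
barriers before the caller can rely on the new mapping.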

> 
> 
> In conclusion, I don't think I've made things any worse.
> 
> Thanks,
> Ryan
> 
>>
>>>  }
>>>  
>>>  static inline void __set_ptes(struct mm_struct *mm,
> 
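
P.S. For anyone skimming, the resulting shape of ___set_ptes() is roughly the
below (sketch), i.e. one dsb/isb pair per batch instead of one per PTE:

	for (;;) {
		__check_safe_pte_update(mm, ptep, pte);
		__set_pte_nosync(ptep, pte);	/* plain WRITE_ONCE(), no barriers */
		if (--nr == 0)
			break;
		ptep++;
		pte = pte_advance_pfn(pte, stride);
	}

	__set_pte_complete(pte);	/* dsb(ishst) + isb(), once for the whole range */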


