Message-ID: <6b3ad437-04df-41c6-81f4-c0fec5a099ec@arm.com>
Date: Wed, 12 Feb 2025 16:00:25 +0000
Subject: Re: [PATCH v1 08/16] arm64/mm: Hoist barriers out of ___set_ptes() loop
From: Ryan Roberts <ryan.roberts@arm.com>
To: Anshuman Khandual, Catalin Marinas, Will Deacon, Muchun Song,
 Pasha Tatashin, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
 Mark Rutland, Ard Biesheuvel, Dev Jain, Alexandre Ghiti, Steve Capper,
 Kevin Brodsky
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20250205151003.88959-1-ryan.roberts@arm.com>
 <20250205151003.88959-9-ryan.roberts@arm.com>
 <858ecac5-9ba7-48da-8f34-ffda28d17609@arm.com>
In-Reply-To:

On 07/02/2025 10:38, Ryan Roberts wrote:
> On 07/02/2025 05:35, Anshuman Khandual wrote:
>>
>>
>> On 2/5/25 20:39, Ryan Roberts wrote:
>>> ___set_ptes() previously called __set_pte() for each PTE in the range,
>>> which would conditionally issue a DSB and ISB to make the new PTE value
>>> immediately visible to the table walker if the new PTE was valid and for
>>> kernel space.
>>>
>>> We can do better than this; let's hoist those barriers out of the loop
>>> so that they are only issued once at the end of the loop. We then reduce
>>> the cost by the number of PTEs in the range.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  arch/arm64/include/asm/pgtable.h | 14 ++++++++++----
>>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index 3b55d9a15f05..1d428e9c0e5a 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>>>  	WRITE_ONCE(*ptep, pte);
>>>  }
>>>
>>> -static inline void __set_pte(pte_t *ptep, pte_t pte)
>>> +static inline void __set_pte_complete(pte_t pte)
>>>  {
>>> -	__set_pte_nosync(ptep, pte);
>>> -
>>>  	/*
>>>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>>>  	 * or update_mmu_cache() have the necessary barriers.
>>> @@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
>>>  	}
>>>  }
>>>
>>> +static inline void __set_pte(pte_t *ptep, pte_t pte)
>>> +{
>>> +	__set_pte_nosync(ptep, pte);
>>> +	__set_pte_complete(pte);
>>> +}
>>> +
>>>  static inline pte_t __ptep_get(pte_t *ptep)
>>>  {
>>>  	return READ_ONCE(*ptep);
>>> @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pte,
>>>
>>>  	for (;;) {
>>>  		__check_safe_pte_update(mm, ptep, pte);
>>> -		__set_pte(ptep, pte);
>>> +		__set_pte_nosync(ptep, pte);
>>>  		if (--nr == 0)
>>>  			break;
>>>  		ptep++;
>>>  		pte = pte_advance_pfn(pte, stride);
>>>  	}
>>> +
>>> +	__set_pte_complete(pte);
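
(For reference, since the hunk context above elides it: after this change,
__set_pte_complete() carries just the barrier tail of the old __set_pte().
Roughly this -- a sketch from memory of the file, not a verbatim excerpt:)

	static inline void __set_pte_complete(pte_t pte)
	{
		/*
		 * Only if the new pte is valid and kernel, otherwise TLB
		 * maintenance or update_mmu_cache() have the necessary
		 * barriers.
		 */
		if (pte_valid_not_user(pte)) {
			dsb(ishst);	/* make the PTE write(s) visible to the walker */
			isb();		/* resync the local instruction stream */
		}
	}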
>>
>> Given that the loop now iterates over a number of page table entries without a
>> corresponding dsb/isb sync for each one, could there be a situation where
>> something else gets scheduled on the cpu before __set_pte_complete() is called,
>> leaving the entire block of page table entries without the desired mapping
>> effect? IOW, how is __set_pte_complete() guaranteed to execute once the loop
>> above completes? Otherwise this change LGTM.
>
> I don't think this changes the model. Previously, __set_pte() was called, which
> writes the pte to the pgtable, then issues the barriers. So there is still a
> window where the thread could be unscheduled after the write but before the
> barriers. Yes, my change makes that window bigger, but if it is a bug now, it
> was a bug before.
>
> Additionally, the spec for set_ptes() is:
>
> /**
>  * set_ptes - Map consecutive pages to a contiguous range of addresses.
>  * @mm: Address space to map the pages into.
>  * @addr: Address to map the first page at.
>  * @ptep: Page table pointer for the first entry.
>  * @pte: Page table entry for the first page.
>  * @nr: Number of pages to map.
>  *
>  * When nr==1, initial state of pte may be present or not present, and new state
>  * may be present or not present. When nr>1, initial state of all ptes must be
>  * not present, and new state must be present.
>  *
>  * May be overridden by the architecture, or the architecture can define
>  * set_pte() and PFN_PTE_SHIFT.
>  *
>  * Context: The caller holds the page table lock. The pages all belong
>  * to the same folio. The PTEs are all in the same PMD.
>  */
>
> Note that the caller is required to hold the page table lock. That's a spinlock,
> so we should be non-preemptible at this point (perhaps not for RT?).
>
> Although actually, vmalloc doesn't hold a lock when calling these helpers; it
> has a lock when allocating the VA space, then drops it.
>
> So yes, I think there is a chance of preemption after writing the pgtable entry
> but before issuing the barriers.
>
> But in that case, we get saved by the DSB in the context switch path. There is
> no guarantee of an ISB in that path (AFAIU). But the need for an ISB is a bit
> woolly anyway. My rough understanding is that the ISB is there to prevent
> previous speculation from determining that a given translation was invalid and
> "caching" that determination in the pipeline. That could still (theoretically)
> happen on remote CPUs I think, and we have the spurious fault handler to detect
> that. Anyway, once you context switch, the local CPU becomes remote and we don't
> have the ISB there, so what's the difference... There's a high chance I've
> misunderstood a bunch of this.
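
To spell out the ordering model in code first (a sketch of what __set_pte()
boils down to for a valid kernel pte, using the existing barrier helpers):

	WRITE_ONCE(*ptep, pte);	/* publish the new entry to memory */
	dsb(ishst);		/* complete the store before any subsequent walk */
	isb();			/* discard stale translation results the local
				 * pipeline may have speculatively cached */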
I thought about this some more; the ISB is there to ensure that the
"speculative invalid translation marker" cached in the pipeline gets removed
prior to any code that runs after set_ptes() returns and accesses an address
now mapped by the pte that was set. Even if preemption occurs, the ISB will
still execute when the thread runs again, before the return from set_ptes().
So all is well.

>
>
> In conclusion, I don't think I've made things any worse.
>
> Thanks,
> Ryan
>
>>
>>>  }
>>>
>>>  static inline void __set_ptes(struct mm_struct *mm,
>