From: Ryan Roberts <ryan.roberts@arm.com>
Date: Mon, 14 Apr 2025 19:28:46 +0100
Subject: Re: [PATCH v3 11/11] arm64/mm: Batch barriers when updating kernel mappings
To: Catalin Marinas
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
 Christoph Hellwig, David Hildenbrand, "Matthew Wilcox (Oracle)",
 Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20250304150444.3788920-1-ryan.roberts@arm.com> <20250304150444.3788920-12-ryan.roberts@arm.com>
Content-Type: text/plain; charset=UTF-8
On 14/04/2025 18:38, Catalin Marinas wrote:
> On Tue, Mar 04, 2025 at 03:04:41PM +0000, Ryan Roberts wrote:
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 1898c3069c43..149df945c1ab 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -40,6 +40,55 @@
>>  #include
>>  #include
>>
>> +static inline void emit_pte_barriers(void)
>> +{
>> +	/*
>> +	 * These barriers are emitted under certain conditions after a pte entry
>> +	 * was modified (see e.g. __set_pte_complete()). The dsb makes the store
>> +	 * visible to the table walker. The isb ensures that any previous
>> +	 * speculative "invalid translation" marker that is in the CPU's
>> +	 * pipeline gets cleared, so that any access to that address after
>> +	 * setting the pte to valid won't cause a spurious fault.
>> +	 * If the thread
>> +	 * gets preempted after storing to the pgtable but before emitting these
>> +	 * barriers, __switch_to() emits a dsb which ensures the walker gets to
>> +	 * see the store. There is no guarantee of an isb being issued though.
>> +	 * This is safe because it will still get issued (albeit on a
>> +	 * potentially different CPU) when the thread starts running again,
>> +	 * before any access to the address.
>> +	 */
>> +	dsb(ishst);
>> +	isb();
>> +}
>> +
>> +static inline void queue_pte_barriers(void)
>> +{
>> +	if (test_thread_flag(TIF_LAZY_MMU))
>> +		set_thread_flag(TIF_LAZY_MMU_PENDING);
>
> As we can have lots of calls here, it might be slightly cheaper to test
> TIF_LAZY_MMU_PENDING and avoid setting it unnecessarily.

Yes, good point.

>
> I haven't checked - does the compiler generate multiple mrs from sp_el0
> for subsequent test_thread_flag()?

It emits a single mrs but it loads from the pointer twice. I think v3 is
the version we want?

void TEST_queue_pte_barriers_v1(void)
{
	if (test_thread_flag(TIF_LAZY_MMU))
		set_thread_flag(TIF_LAZY_MMU_PENDING);
	else
		emit_pte_barriers();
}

void TEST_queue_pte_barriers_v2(void)
{
	if (test_thread_flag(TIF_LAZY_MMU) &&
	    !test_thread_flag(TIF_LAZY_MMU_PENDING))
		set_thread_flag(TIF_LAZY_MMU_PENDING);
	else
		emit_pte_barriers();
}

void TEST_queue_pte_barriers_v3(void)
{
	unsigned long flags = read_thread_flags();

	if ((flags & (_TIF_LAZY_MMU | _TIF_LAZY_MMU_PENDING)) == _TIF_LAZY_MMU)
		set_thread_flag(TIF_LAZY_MMU_PENDING);
	else
		emit_pte_barriers();
}

000000000000101c <TEST_queue_pte_barriers_v1>:
    101c:	d5384100	mrs	x0, sp_el0
    1020:	f9400001	ldr	x1, [x0]
    1024:	37f80081	tbnz	w1, #31, 1034
    1028:	d5033a9f	dsb	ishst
    102c:	d5033fdf	isb
    1030:	d65f03c0	ret
    1034:	14000004	b	1044
    1038:	d2c00021	mov	x1, #0x100000000	// #4294967296
    103c:	f821301f	stset	x1, [x0]
    1040:	d65f03c0	ret
    1044:	f9800011	prfm	pstl1strm, [x0]
    1048:	c85f7c01	ldxr	x1, [x0]
    104c:	b2600021	orr	x1, x1, #0x100000000
    1050:	c8027c01	stxr	w2, x1, [x0]
    1054:	35ffffa2	cbnz	w2, 1048
    1058:	d65f03c0	ret
000000000000105c <TEST_queue_pte_barriers_v2>:
    105c:	d5384100	mrs	x0, sp_el0
    1060:	f9400001	ldr	x1, [x0]
    1064:	37f80081	tbnz	w1, #31, 1074
    1068:	d5033a9f	dsb	ishst
    106c:	d5033fdf	isb
    1070:	d65f03c0	ret
    1074:	f9400001	ldr	x1, [x0]
    1078:	b707ff81	tbnz	x1, #32, 1068
    107c:	14000004	b	108c
    1080:	d2c00021	mov	x1, #0x100000000	// #4294967296
    1084:	f821301f	stset	x1, [x0]
    1088:	d65f03c0	ret
    108c:	f9800011	prfm	pstl1strm, [x0]
    1090:	c85f7c01	ldxr	x1, [x0]
    1094:	b2600021	orr	x1, x1, #0x100000000
    1098:	c8027c01	stxr	w2, x1, [x0]
    109c:	35ffffa2	cbnz	w2, 1090
    10a0:	d65f03c0	ret

00000000000010a4 <TEST_queue_pte_barriers_v3>:
    10a4:	d5384101	mrs	x1, sp_el0
    10a8:	f9400020	ldr	x0, [x1]
    10ac:	d2b00002	mov	x2, #0x80000000	// #2147483648
    10b0:	92610400	and	x0, x0, #0x180000000
    10b4:	eb02001f	cmp	x0, x2
    10b8:	54000080	b.eq	10c8	// b.none
    10bc:	d5033a9f	dsb	ishst
    10c0:	d5033fdf	isb
    10c4:	d65f03c0	ret
    10c8:	14000004	b	10d8
    10cc:	d2c00020	mov	x0, #0x100000000	// #4294967296
    10d0:	f820303f	stset	x0, [x1]
    10d4:	d65f03c0	ret
    10d8:	f9800031	prfm	pstl1strm, [x1]
    10dc:	c85f7c20	ldxr	x0, [x1]
    10e0:	b2600000	orr	x0, x0, #0x100000000
    10e4:	c8027c20	stxr	w2, x0, [x1]
    10e8:	35ffffa2	cbnz	w2, 10dc
    10ec:	d65f03c0	ret

>
>> +	else
>> +		emit_pte_barriers();
>> +}
>> +
>> +#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>> +static inline void arch_enter_lazy_mmu_mode(void)
>> +{
>> +	VM_WARN_ON(in_interrupt());
>> +	VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>> +
>> +	set_thread_flag(TIF_LAZY_MMU);
>> +}
>> +
>> +static inline void arch_flush_lazy_mmu_mode(void)
>> +{
>> +	if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
>> +		emit_pte_barriers();
>> +}
>> +
>> +static inline void arch_leave_lazy_mmu_mode(void)
>> +{
>> +	arch_flush_lazy_mmu_mode();
>> +	clear_thread_flag(TIF_LAZY_MMU);
>> +}
>> +
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
>>
>> @@ -323,10 +372,8 @@ static inline void __set_pte_complete(pte_t pte)
>>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>>  	 * has the necessary barriers.
>>  	 */
>> -	if (pte_valid_not_user(pte)) {
>> -		dsb(ishst);
>> -		isb();
>> -	}
>> +	if (pte_valid_not_user(pte))
>> +		queue_pte_barriers();
>>  }
>
> I think this scheme works, I couldn't find a counter-example unless
> __set_pte() gets called in an interrupt context. You could add
> VM_WARN_ON(in_interrupt()) in queue_pte_barriers() as well.
>
> With preemption, the newly mapped range shouldn't be used before
> arch_flush_lazy_mmu_mode() is called, so it looks safe as well. I think
> x86 uses a per-CPU variable to track this but per-thread is easier to
> reason about if there's no nesting.
>
>>  static inline void __set_pte(pte_t *ptep, pte_t pte)
>> @@ -778,10 +825,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>>
>>  	WRITE_ONCE(*pmdp, pmd);
>>
>> -	if (pmd_valid(pmd)) {
>> -		dsb(ishst);
>> -		isb();
>> -	}
>> +	if (pmd_valid(pmd))
>> +		queue_pte_barriers();
>>  }
>
> We discussed on a previous series - for pmd/pud we end up with barriers
> even for user mappings but they are at a much coarser granularity (and I
> wasn't keen on 'user' attributes for the table entries).
>
> Reviewed-by: Catalin Marinas

Thanks!

Ryan
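[Editor's note] For readers following the thread, here is a rough user-space model of the batching scheme being reviewed above. Everything in it is a stand-in for illustration: the flag bits, the global `thread_flags` variable (the real code uses per-thread `thread_info` flags), and a counter in place of the actual `dsb(ishst); isb()` barriers. The queue check uses the simple v1-style test; it is a sketch of the control flow, not the kernel implementation.

```c
#include <assert.h>

/* Illustrative bit positions, not the kernel's TIF_* values. */
#define TIF_LAZY_MMU		(1UL << 0)
#define TIF_LAZY_MMU_PENDING	(1UL << 1)

static unsigned long thread_flags;	/* stand-in for thread_info->flags */
static int barriers_emitted;		/* counts dsb+isb pairs */

static void emit_pte_barriers(void)
{
	/* Real code: dsb(ishst); isb(); here we just count emissions. */
	barriers_emitted++;
}

static void queue_pte_barriers(void)
{
	/* In lazy MMU mode, defer the barriers; otherwise emit now. */
	if (thread_flags & TIF_LAZY_MMU)
		thread_flags |= TIF_LAZY_MMU_PENDING;
	else
		emit_pte_barriers();
}

static void arch_enter_lazy_mmu_mode(void)
{
	thread_flags |= TIF_LAZY_MMU;
}

static void arch_flush_lazy_mmu_mode(void)
{
	/* Emit the deferred barriers once, if any updates were queued. */
	if (thread_flags & TIF_LAZY_MMU_PENDING) {
		thread_flags &= ~TIF_LAZY_MMU_PENDING;
		emit_pte_barriers();
	}
}

static void arch_leave_lazy_mmu_mode(void)
{
	arch_flush_lazy_mmu_mode();
	thread_flags &= ~TIF_LAZY_MMU;
}
```

The point of the patch falls out of the model: outside lazy MMU mode every kernel pte update pays a dsb/isb pair, while inside a lazy-mmu section an arbitrary number of updates collapse into a single pair emitted at `arch_leave_lazy_mmu_mode()`.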