From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Apr 2025 18:38:19 +0100
From: Catalin Marinas
To: Ryan Roberts
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, David Hildenbrand, "Matthew Wilcox (Oracle)",
	Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 11/11] arm64/mm: Batch barriers when updating kernel mappings
References: <20250304150444.3788920-1-ryan.roberts@arm.com>
 <20250304150444.3788920-12-ryan.roberts@arm.com>
In-Reply-To: <20250304150444.3788920-12-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Mar 04, 2025 at 03:04:41PM +0000, Ryan Roberts wrote:
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 1898c3069c43..149df945c1ab 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -40,6 +40,55 @@
>  #include
>  #include
> 
> +static inline void emit_pte_barriers(void)
> +{
> +	/*
> +	 * These barriers are emitted under certain conditions after a pte
> +	 * entry was modified (see e.g. __set_pte_complete()). The dsb makes
> +	 * the store visible to the table walker. The isb ensures that any
> +	 * previous speculative "invalid translation" marker that is in the
> +	 * CPU's pipeline gets cleared, so that any access to that address
> +	 * after setting the pte to valid won't cause a spurious fault. If
> +	 * the thread gets preempted after storing to the pgtable but before
> +	 * emitting these barriers, __switch_to() emits a dsb which ensures
> +	 * the walker gets to see the store. There is no guarantee of an isb
> +	 * being issued, though. This is safe because it will still get
> +	 * issued (albeit on a potentially different CPU) when the thread
> +	 * starts running again, before any access to the address.
> +	 */
> +	dsb(ishst);
> +	isb();
> +}
> +
> +static inline void queue_pte_barriers(void)
> +{
> +	if (test_thread_flag(TIF_LAZY_MMU))
> +		set_thread_flag(TIF_LAZY_MMU_PENDING);

As we can have lots of calls here, it might be slightly cheaper to test
TIF_LAZY_MMU_PENDING and avoid setting it unnecessarily.
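Something like this minimal sketch, reusing the flag names from the
patch (untested, not meant as the final implementation):

	static inline void queue_pte_barriers(void)
	{
		if (test_thread_flag(TIF_LAZY_MMU)) {
			/* Skip the atomic set_bit() if a flush is already pending. */
			if (!test_thread_flag(TIF_LAZY_MMU_PENDING))
				set_thread_flag(TIF_LAZY_MMU_PENDING);
		} else {
			emit_pte_barriers();
		}
	}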
I haven't checked - does the compiler generate multiple mrs reads from
sp_el0 for subsequent test_thread_flag() calls?

> +	else
> +		emit_pte_barriers();
> +}
> +
> +#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
> +static inline void arch_enter_lazy_mmu_mode(void)
> +{
> +	VM_WARN_ON(in_interrupt());
> +	VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
> +
> +	set_thread_flag(TIF_LAZY_MMU);
> +}
> +
> +static inline void arch_flush_lazy_mmu_mode(void)
> +{
> +	if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
> +		emit_pte_barriers();
> +}
> +
> +static inline void arch_leave_lazy_mmu_mode(void)
> +{
> +	arch_flush_lazy_mmu_mode();
> +	clear_thread_flag(TIF_LAZY_MMU);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
> 
> @@ -323,10 +372,8 @@ static inline void __set_pte_complete(pte_t pte)
>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>  	 * has the necessary barriers.
>  	 */
> -	if (pte_valid_not_user(pte)) {
> -		dsb(ishst);
> -		isb();
> -	}
> +	if (pte_valid_not_user(pte))
> +		queue_pte_barriers();
>  }

I think this scheme works; I couldn't find a counter-example unless
__set_pte() gets called in an interrupt context. You could add
VM_WARN_ON(in_interrupt()) in queue_pte_barriers() as well.

With preemption, the newly mapped range shouldn't be used before
arch_flush_lazy_mmu_mode() is called, so it looks safe as well. I think
x86 uses a per-CPU variable to track this, but per-thread is easier to
reason about if there's no nesting.

>  static inline void __set_pte(pte_t *ptep, pte_t pte)
> @@ -778,10 +825,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
> 
>  	WRITE_ONCE(*pmdp, pmd);
> 
> -	if (pmd_valid(pmd)) {
> -		dsb(ishst);
> -		isb();
> -	}
> +	if (pmd_valid(pmd))
> +		queue_pte_barriers();
>  }

We discussed this on a previous series - for pmd/pud we end up with
barriers even for user mappings, but they are at a much coarser
granularity (and I wasn't keen on 'user' attributes for the table
entries).
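For illustration, the suggested assertion could slot in as below - a
sketch only, reusing the patch's helpers; whether in_interrupt() is the
right predicate and where exactly it should live is for the author to
decide:

	static inline void queue_pte_barriers(void)
	{
		/*
		 * pte updates from interrupt context would bypass the
		 * per-thread lazy-MMU tracking, so catch them under
		 * CONFIG_DEBUG_VM.
		 */
		VM_WARN_ON(in_interrupt());

		if (test_thread_flag(TIF_LAZY_MMU))
			set_thread_flag(TIF_LAZY_MMU_PENDING);
		else
			emit_pte_barriers();
	}

Reviewed-by: Catalin Marinas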