From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 5 Feb 2025 14:51:56 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Rik van Riel
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
	nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
	linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
	mhklinux@outlook.com, andrew.cooper3@citrix.com, Manali Shukla,
	David.Kaplan@amd.com
Subject: Re: [PATCH v8 10/12] x86/mm: do targeted broadcast flushing from tlbbatch code
Message-ID: <20250205135156.GI14028@noisy.programming.kicks-ass.net>
References: <20250205014033.3626204-1-riel@surriel.com>
	<20250205014033.3626204-11-riel@surriel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20250205014033.3626204-11-riel@surriel.com>
On Tue, Feb 04, 2025 at 08:39:59PM -0500, Rik van Riel wrote:

> @@ -1657,12 +1655,65 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>  		local_irq_enable();
>  	}
>  
> +	/*
> +	 * If we issued (asynchronous) INVLPGB flushes, wait for them here.
> +	 * The cpumask above contains only CPUs that were running tasks
> +	 * not using broadcast TLB flushing.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->used_invlpgb) {
> +		tlbsync();
> +		migrate_enable();
> +		batch->used_invlpgb = false;
> +	}
> +
>  	cpumask_clear(&batch->cpumask);
>  
>  	put_flush_tlb_info();
>  	put_cpu();
>  }
>  
> +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> +			       struct mm_struct *mm,
> +			       unsigned long uaddr)
> +{
> +	u16 asid = mm_global_asid(mm);
> +
> +	if (asid) {
> +		/*
> +		 * Queue up an asynchronous invalidation. The corresponding
> +		 * TLBSYNC is done in arch_tlbbatch_flush(), and must be done
> +		 * on the same CPU.
> +		 */
> +		if (!batch->used_invlpgb) {
> +			batch->used_invlpgb = true;
> +			migrate_disable();
> +		}

How about we do something like this instead? This keeps all the TLBSYNC
in the same task as the INVLPGB, without making things complicated and
allowing random CR3 writes in between them -- which makes my head hurt.

---
--- a/arch/x86/include/asm/tlbbatch.h
+++ b/arch/x86/include/asm/tlbbatch.h
@@ -10,7 +10,6 @@ struct arch_tlbflush_unmap_batch {
 	 * the PFNs being flushed..
 	 */
 	struct cpumask cpumask;
-	bool used_invlpgb;
 };
 
 #endif /* _ARCH_X86_TLBBATCH_H */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -106,6 +106,7 @@ struct tlb_state {
 	 * need to be invalidated.
 	 */
 	bool invalidate_other;
+	bool need_tlbsync;
 
 #ifdef CONFIG_ADDRESS_MASKING
 	/*
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -266,6 +266,37 @@ static void choose_new_asid(struct mm_st
 	*need_flush = true;
 }
 
+static inline void tlbsync(void)
+{
+	if (!this_cpu_read(cpu_tlbstate.need_tlbsync))
+		return;
+	__tlbsync();
+	this_cpu_write(cpu_tlbstate.need_tlbsync, false);
+}
+
+static inline void invlpgb_flush_user_nr_nosync(unsigned long pcid,
+						unsigned long addr,
+						u16 nr, bool pmd_stride)
+{
+	__invlpgb_flush_user_nr(pcid, addr, nr, pmd_stride);
+	if (!this_cpu_read(cpu_tlbstate.need_tlbsync))
+		this_cpu_write(cpu_tlbstate.need_tlbsync, true);
+}
+
+static inline void invlpgb_flush_single_pcid_nosync(unsigned long pcid)
+{
+	__invlpgb_flush_single_pcid(pcid);
+	if (!this_cpu_read(cpu_tlbstate.need_tlbsync))
+		this_cpu_write(cpu_tlbstate.need_tlbsync, true);
+}
+
+static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
+{
+	__invlpgb_flush_addr(addr, nr);
+	if (!this_cpu_read(cpu_tlbstate.need_tlbsync))
+		this_cpu_write(cpu_tlbstate.need_tlbsync, true);
+}
+
 #ifdef CONFIG_X86_BROADCAST_TLB_FLUSH
 /*
  * Logic for broadcast TLB invalidation.
@@ -793,6 +824,8 @@ void switch_mm_irqs_off(struct mm_struct
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
 
+	tlbsync();
+
 	/*
 	 * Verify that CR3 is what we think it is.  This will catch
 	 * hypothetical buggy code that directly switches to swapper_pg_dir
@@ -968,6 +1001,8 @@ void switch_mm_irqs_off(struct mm_struct
  */
 void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
+	tlbsync();
+
 	if (this_cpu_read(cpu_tlbstate.loaded_mm) == &init_mm)
 		return;
 
@@ -1623,11 +1658,8 @@ void arch_tlbbatch_flush(struct arch_tlb
 	 * The cpumask above contains only CPUs that were running tasks
 	 * not using broadcast TLB flushing.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->used_invlpgb) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		tlbsync();
-		migrate_enable();
-		batch->used_invlpgb = false;
-	}
 
 	cpumask_clear(&batch->cpumask);
 
@@ -1647,10 +1679,6 @@ void arch_tlbbatch_add_pending(struct ar
 		 * TLBSYNC is done in arch_tlbbatch_flush(), and must be done
 		 * on the same CPU.
 		 */
-		if (!batch->used_invlpgb) {
-			batch->used_invlpgb = true;
-			migrate_disable();
-		}
 		invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
 		/* Do any CPUs supporting INVLPGB need PTI? */
 		if (static_cpu_has(X86_FEATURE_PTI))
--- a/arch/x86/include/asm/invlpgb.h
+++ b/arch/x86/include/asm/invlpgb.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_INVLPGB
 
 #include
+#include
 #include
 #include
 
@@ -31,9 +32,8 @@ static inline void __invlpgb(unsigned lo
 }
 
 /* Wait for INVLPGB originated by this CPU to complete. */
-static inline void tlbsync(void)
+static inline void __tlbsync(void)
 {
-	cant_migrate();
 	/* TLBSYNC: supported in binutils >= 0.36. */
 	asm volatile(".byte 0x0f, 0x01, 0xff" ::: "memory");
 }
@@ -61,19 +61,19 @@ static inline void invlpgb_flush_user(un
 					  unsigned long addr)
 {
 	__invlpgb(0, pcid, addr, 0, 0, INVLPGB_PCID | INVLPGB_VA);
-	tlbsync();
+	__tlbsync();
 }
 
-static inline void invlpgb_flush_user_nr_nosync(unsigned long pcid,
-						unsigned long addr,
-						u16 nr,
-						bool pmd_stride)
+static inline void __invlpgb_flush_user_nr(unsigned long pcid,
+					   unsigned long addr,
+					   u16 nr,
+					   bool pmd_stride)
 {
 	__invlpgb(0, pcid, addr, nr - 1, pmd_stride, INVLPGB_PCID | INVLPGB_VA);
 }
 
 /* Flush all mappings for a given PCID, not including globals. */
-static inline void invlpgb_flush_single_pcid_nosync(unsigned long pcid)
+static inline void __invlpgb_flush_single_pcid(unsigned long pcid)
 {
 	__invlpgb(0, pcid, 0, 0, 0, INVLPGB_PCID);
 }
@@ -82,11 +82,11 @@ static inline void invlpgb_flush_single_
 static inline void invlpgb_flush_all(void)
 {
 	__invlpgb(0, 0, 0, 0, 0, INVLPGB_INCLUDE_GLOBAL);
-	tlbsync();
+	__tlbsync();
 }
 
 /* Flush addr, including globals, for all PCIDs. */
-static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
+static inline void __invlpgb_flush_addr(unsigned long addr, u16 nr)
 {
 	__invlpgb(0, 0, addr, nr - 1, 0, INVLPGB_INCLUDE_GLOBAL);
 }
 
@@ -95,7 +95,7 @@ static inline void invlpgb_flush_addr_no
 static inline void invlpgb_flush_all_nonglobals(void)
 {
 	__invlpgb(0, 0, 0, 0, 0, 0);
-	tlbsync();
+	__tlbsync();
 }
 
 #endif /* _ASM_X86_INVLPGB */