From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, kernel-team@meta.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	nadav.amit@gmail.com, Rik van Riel
Subject: [RFC v2 9/9] x86/mm: userspace & pageout flushing using Intel RAR
Date: Mon, 19 May 2025 21:02:34 -0400
Message-ID: <20250520010350.1740223-10-riel@surriel.com>
In-Reply-To: <20250520010350.1740223-1-riel@surriel.com>
References: <20250520010350.1740223-1-riel@surriel.com>
From: Rik van Riel

Use Intel RAR to flush userspace mappings.

Because RAR flushes are targeted using a CPU bitmap, the rules are
slightly different than for true broadcast TLB invalidation.

With true broadcast TLB invalidation, as done with AMD INVLPGB, a
global ASID always has up-to-date TLB entries on every CPU. The
context switch code never has to flush the TLB when switching to a
global ASID on any CPU with INVLPGB.

With RAR, the TLB mappings for a global ASID are kept up to date only
on the CPUs in the mm_cpumask, which lazily follows the threads around
the system. The context switch code does not need to flush the TLB if
the CPU is in the mm_cpumask and the PCID used stays the same.

However, a CPU that falls outside of the mm_cpumask can have out of
date TLB mappings for this task. When switching to that task on a CPU
not in the mm_cpumask, the TLB does need to be flushed.
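To make the context switch rule above concrete, here is a small,
self-contained C sketch that models the decision. It is illustrative
only and not part of the patch below; need_flush_on_switch and
enum flush_mode are hypothetical stand-ins, not kernel identifiers.

/*
 * Illustrative model of the "flush on switch?" rule for a task
 * using a global ASID. Not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

enum flush_mode { MODE_INVLPGB, MODE_RAR };

/*
 * Switching to a task that uses a global ASID: does this CPU
 * need to flush the TLB first?
 */
static bool need_flush_on_switch(enum flush_mode mode, bool cpu_in_mm_cpumask)
{
	/* True broadcast (INVLPGB): every CPU is always up to date. */
	if (mode == MODE_INVLPGB)
		return false;

	/* RAR: only CPUs in the mm_cpumask were kept up to date. */
	return !cpu_in_mm_cpumask;
}

int main(void)
{
	printf("INVLPGB, CPU outside mm_cpumask: flush=%d\n",
	       need_flush_on_switch(MODE_INVLPGB, false));
	printf("RAR, CPU in mm_cpumask:          flush=%d\n",
	       need_flush_on_switch(MODE_RAR, true));
	printf("RAR, CPU outside mm_cpumask:     flush=%d\n",
	       need_flush_on_switch(MODE_RAR, false));
	return 0;
}

With INVLPGB the answer is always "no flush"; with RAR it depends
solely on whether this CPU was in the mm_cpumask and therefore
received the RAR invalidations.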
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/tlbflush.h |   9 ++-
 arch/x86/mm/tlb.c               | 121 ++++++++++++++++++++++++++------
 2 files changed, 104 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cc9935bbbd45..bdde3ce6c9b1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -276,7 +276,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return 0;
 
 	asid = smp_load_acquire(&mm->context.global_asid);
@@ -289,7 +290,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 
 static inline void mm_init_global_asid(struct mm_struct *mm)
 {
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		mm->context.global_asid = 0;
 		mm->context.asid_transition = false;
 	}
@@ -313,7 +315,8 @@ static inline void mm_clear_asid_transition(struct mm_struct *mm)
 
 static inline bool mm_in_asid_transition(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;
 
 	return mm && READ_ONCE(mm->context.asid_transition);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 35489df811dc..51658bdaa0b3 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -203,7 +203,8 @@ struct new_asid {
 	unsigned int need_flush : 1;
 };
 
-static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
+static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
+				       bool new_cpu)
 {
 	struct new_asid ns;
 	u16 asid;
@@ -216,14 +217,22 @@ static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
 
 	/*
 	 * TLB consistency for global ASIDs is maintained with hardware assisted
-	 * remote TLB flushing. Global ASIDs are always up to date.
+	 * remote TLB flushing. Global ASIDs are always up to date with INVLPGB,
+	 * and up to date for CPUs in the mm_cpumask with RAR.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		u16 global_asid = mm_global_asid(next);
 
 		if (global_asid) {
 			ns.asid = global_asid;
 			ns.need_flush = 0;
+			/*
+			 * If the CPU fell out of the cpumask, it can be
+			 * out of date with RAR, and should be flushed.
+			 */
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				ns.need_flush = new_cpu;
 			return ns;
 		}
 	}
@@ -281,7 +290,14 @@ static void reset_global_asid_space(void)
 {
 	lockdep_assert_held(&global_asid_lock);
 
-	invlpgb_flush_all_nonglobals();
+	/*
+	 * The global flush ensures that a freshly allocated global ASID
+	 * has no entries in any TLB, and can be used immediately.
+	 * With Intel RAR, the TLB may still need to be flushed at context
+	 * switch time when dealing with a CPU that was not in the mm_cpumask
+	 * for the process, and may have missed flushes along the way.
+	 */
+	flush_tlb_all();
 
 	/*
 	 * The TLB flush above makes it safe to re-use the previously
@@ -358,7 +374,7 @@ static void use_global_asid(struct mm_struct *mm)
 {
 	u16 asid;
 
-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);
 
 	/* This process is already using broadcast TLB invalidation. */
 	if (mm_global_asid(mm))
@@ -384,13 +400,14 @@ static void use_global_asid(struct mm_struct *mm)
 
 void mm_free_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;
 
 	if (!mm_global_asid(mm))
 		return;
 
-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);
 
 	/* The global ASID can be re-used only after flush at wrap-around. */
 #ifdef CONFIG_BROADCAST_TLB_FLUSH
@@ -408,7 +425,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
 {
 	u16 global_asid = mm_global_asid(mm);
 
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;
 
 	/* Process is transitioning to a global ASID */
@@ -426,13 +444,17 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
  */
 static void consider_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;
 
 	/* Check every once in a while. */
 	if ((current->pid & 0x1f) != (jiffies & 0x1f))
 		return;
 
+	if (mm == &init_mm)
+		return;
+
 	/*
 	 * Assign a global ASID if the process is active on
 	 * 4 or more CPUs simultaneously.
@@ -480,7 +502,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 	mm_clear_asid_transition(mm);
 }
 
-static void broadcast_tlb_flush(struct flush_tlb_info *info)
+static void invlpgb_tlb_flush(struct flush_tlb_info *info)
 {
 	bool pmd = info->stride_shift == PMD_SHIFT;
 	unsigned long asid = mm_global_asid(info->mm);
@@ -511,8 +533,6 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
 		addr += nr << info->stride_shift;
 	} while (addr < info->end);
 
-	finish_asid_transition(info);
-
 	/* Wait for the INVLPGBs kicked off above to finish. */
 	__tlbsync();
 }
@@ -840,7 +860,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		/* Check if the current mm is transitioning to a global ASID */
 		if (mm_needs_global_asid(next, prev_asid)) {
 			next_tlb_gen = atomic64_read(&next->context.tlb_gen);
-			ns = choose_new_asid(next, next_tlb_gen);
+			ns = choose_new_asid(next, next_tlb_gen, true);
 			goto reload_tlb;
 		}
 
@@ -878,6 +898,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		ns.asid = prev_asid;
 		ns.need_flush = true;
 	} else {
+		bool new_cpu = false;
 		/*
 		 * Apply process to process speculation vulnerability
 		 * mitigations if applicable.
@@ -892,20 +913,25 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 
-		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
-		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
+		/* Start receiving IPIs and RAR invalidations */
+		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next))) {
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				new_cpu = true;
+		}
+
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 
-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, new_cpu);
 	}
 
 reload_tlb:
 	new_lam = mm_lam_cr3_mask(next);
 	if (ns.need_flush) {
-		VM_WARN_ON_ONCE(is_global_asid(ns.asid));
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		if (is_dyn_asid(ns.asid)) {
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		}
 		load_new_mm_cr3(next->pgd, ns.asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -1122,8 +1148,13 @@ static void flush_tlb_func(void *info)
 		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 	}
 
-	/* Broadcast ASIDs are always kept up to date with INVLPGB. */
-	if (is_global_asid(loaded_mm_asid))
+	/*
+	 * Broadcast ASIDs are always kept up to date with INVLPGB; with
+	 * Intel RAR, IPI based flushes are used periodically to trim the
+	 * mm_cpumask, and flushes that get here should be processed.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    is_global_asid(loaded_mm_asid))
 		return;
 
 	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
@@ -1358,6 +1389,35 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
 static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
 #endif
 
+static void rar_tlb_flush(struct flush_tlb_info *info)
+{
+	unsigned long asid = mm_global_asid(info->mm);
+	u16 pcid = kern_pcid(asid);
+
+	/* Flush the remote CPUs. */
+	smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
+
+	/* Flush the local TLB, if needed. */
+	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(info->mm))) {
+		lockdep_assert_irqs_enabled();
+		local_irq_disable();
+		flush_tlb_func(info);
+		local_irq_enable();
+	}
+}
+
+static void broadcast_tlb_flush(struct flush_tlb_info *info)
+{
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		invlpgb_tlb_flush(info);
+	else /* Intel RAR */
+		rar_tlb_flush(info);
+
+	finish_asid_transition(info);
+}
+
 static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 			unsigned long start, unsigned long end,
 			unsigned int stride_shift, bool freed_tables,
@@ -1418,15 +1478,22 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
 				  new_tlb_gen);
 
+	/*
+	 * IPIs and RAR can be targeted to a cpumask. Periodically trim that
+	 * mm_cpumask by sending TLB flush IPIs, even when most TLB flushes
+	 * are done with RAR.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) || !mm_global_asid(mm))
+		info->trim_cpumask = should_trim_cpumask(mm);
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (mm_global_asid(mm)) {
+	if (mm_global_asid(mm) && !info->trim_cpumask) {
 		broadcast_tlb_flush(info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
@@ -1737,6 +1804,14 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->unmapped_pages) {
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
+	} else if (cpu_feature_enabled(X86_FEATURE_RAR) && cpumask_any(&batch->cpumask) < nr_cpu_ids) {
+		rar_full_flush(&batch->cpumask);
+		if (cpumask_test_cpu(cpu, &batch->cpumask)) {
+			lockdep_assert_irqs_enabled();
+			local_irq_disable();
+			invpcid_flush_all_nonglobals();
+			local_irq_enable();
+		}
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
-- 
2.49.0