Date: Tue, 11 Feb 2025 14:03:08 +0000
From: Mark Rutland
To: Valentin Schneider
Cc: Jann Horn, linux-kernel@vger.kernel.org, x86@kernel.org, virtualization@lists.linux.dev, linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev, linux-riscv@lists.infradead.org, linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-arch@vger.kernel.org, rcu@vger.kernel.org, linux-hardening@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, bcm-kernel-feedback-list@broadcom.com, Juergen Gross, Ajay Kaher, Alexey Makhalov, Russell King, Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui, Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, "Liang, Kan", Boris Ostrovsky, Josh Poimboeuf, Pawan Gupta, Sean Christopherson, Paolo Bonzini, Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker, "Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Juri Lelli, Clark Williams, Yair Podemsky, Tomas Glozar, Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook, Andrew Morton, Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda, Alice Ryhl, "Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu, Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed, "Kirill A. Shutemov", "Masami Hiramatsu (Google)", Jinghao Jia, Luis Chamberlain, Randy Dunlap, Tiezhu Yang
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
References: <20250114175143.81438-1-vschneid@redhat.com> <20250114175143.81438-30-vschneid@redhat.com>

On Tue, Feb 11, 2025 at 02:33:51PM +0100, Valentin Schneider wrote:
> On 10/02/25 23:08, Jann Horn wrote:
> > On Mon, Feb 10, 2025 at 7:36 PM Valentin Schneider wrote:
> >> What if isolated CPUs unconditionally did a TLBi as late as possible in
> >> the stack right before returning to userspace? This would mean that upon
> >> re-entering the kernel, an isolated CPU's TLB wouldn't contain any kernel
> >> range translation - with the exception of whatever lies between the
> >> last-minute flush and the actual userspace entry, which should be feasible
> >> to vet? Then AFAICT there wouldn't be any work/flush to defer, the IPI
> >> could be entirely silenced if it targets an isolated CPU.
> >
> > Two issues with that:
>
> Firstly, thank you for entertaining the idea :-)
>
> > 1. I think the "Common not Private" feature Will Deacon referred to is
> > incompatible with this idea:
> >
> > says "When the CnP bit is set, the software promises to use the ASIDs
> > and VMIDs in the same way on all processors, which allows the TLB
> > entries that are created by one processor to be used by another"
>
> Sorry for being obtuse - I can understand inconsistent TLB states (old vs
> new translations being present in separate TLBs) due to not sending the
> flush IPI causing an issue with that, but not "flushing early". Even if TLB
> entries can be shared/accessed between CPUs, a CPU should be allowed not to
> have a shared entry in its TLB - what am I missing?
>
> > 2. It's wrong to assume that TLB entries are only populated for
> > addresses you access - thanks to speculative execution, you have to
> > assume that the CPU might be populating random TLB entries all over
> > the place.
>
> Gotta love speculation. Now it is supposed to be limited to genuinely
> accessible data & code, right? Say theoretically we have a full TLBi as
> literally the last thing before doing the return-to-userspace, speculation
> should be limited to executing maybe bits of the return-from-userspace
> code?

I think it's easier to ignore speculation entirely, and just assume that
the MMU can arbitrarily fill TLB entries from any page table entries
which are valid/accessible in the active page tables. Hardware
prefetchers can do that regardless of the specific path of speculative
execution. Thus TLB fills are not limited to VAs which would be used on
that return-to-userspace path.

> Furthermore, I would hope that once a CPU is executing in userspace, it's
> not going to populate the TLB with kernel address translations - AIUI the
> whole vulnerability mitigation debacle was about preventing this sort of
> thing.

The CPU can definitely do that; the vulnerability mitigations are all
about what userspace can observe rather than what the CPU can do in the
background. Additionally, there are features like SPE and TRBE that use
kernel addresses while the CPU is executing userspace instructions.

The latest ARM Architecture Reference Manual (ARM DDI 0487 L.a) is
fairly clear about that in section D8.16 "Translation Lookaside
Buffers", where it says (among other things):

  When address translation is enabled, if a translation table entry
  meets all of the following requirements, then that translation table
  entry is permitted to be cached in a TLB or intermediate TLB caching
  structure at any time:

  • The translation table entry itself does not generate a Translation
    fault, an Address size fault, or an Access flag fault.

  • The translation table entry is not from a translation regime
    configured by an Exception level that is lower than the current
    Exception level.

Here "permitted to be cached in a TLB" also implies that the HW is
allowed to fetch the translation table entry (which is what ARM call
page table entries).

The PDF can be found at:

  https://developer.arm.com/documentation/ddi0487/la/?lang=en

Mark.