From: Peter Zijlstra
Date: Tue, 25 Jul 2023 15:21:55 +0200
To: Dave Hansen
Cc: Valentin Schneider, Nadav Amit, Linux Kernel Mailing List,
 "linux-trace-kernel@vger.kernel.org", "linux-doc@vger.kernel.org",
 "kvm@vger.kernel.org", linux-mm, bpf, the arch/x86 maintainers,
 "rcu@vger.kernel.org", "linux-kselftest@vger.kernel.org",
 Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski,
 Frederic Weisbecker, "Paul E. McKenney", Neeraj Upadhyay,
 Joel Fernandes, Josh Triplett, Boqun Feng, Mathieu Desnoyers,
 Lai Jiangshan, Zqiang, Andrew Morton, Uladzislau Rezki,
 Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf, Jason Baron,
 Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin,
 Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov",
 Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek,
 "Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky,
 Dionna Glaze, Thomas Weißschuh, Juri Lelli,
 Daniel Bristot de Oliveira, Marcelo Tosatti, Yair Podemsky
Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer
 flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
Message-ID: <20230725132155.GJ3765278@hirez.programming.kicks-ass.net>
In-Reply-To: <2284d0db-f94a-e059-7bd0-bab4f112ed35@intel.com>
References: <20230720163056.2564824-1-vschneid@redhat.com>
 <20230720163056.2564824-21-vschneid@redhat.com>
 <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com>
 <2284d0db-f94a-e059-7bd0-bab4f112ed35@intel.com>

On Mon, Jul 24, 2023 at 10:40:04AM -0700, Dave Hansen wrote:

> TLB flushes for freed page tables are another game entirely. The CPU is
> free to cache any part of the paging hierarchy it wants at any time.
> It's also free to set accessed and dirty bits at any time, even for
> instructions that may never execute architecturally.
>
> That basically means that if you have *ANY* freed page table page
> *ANYWHERE* in the page table hierarchy of any CPU at any time ... you're
> screwed.
>
> There's no reasoning about accesses or ordering. As soon as the CPU
> does *anything*, it's out to get you.
>
> You're going to need to do something a lot more radical to deal with
> free page table pages.

Ha! IIRC the only thing we can reasonably do there is to have strict
per-cpu page-tables such that NOHZ_FULL CPUs can be isolated. That is,
as long as the per-cpu tables do not contain -- and have never
contained -- a particular table page, we can avoid flushing it. Because
if it never was there, the CPU also couldn't have speculatively loaded
it.

Now, x86 doesn't really do per-cpu page tables easily (otherwise we'd
have done them ages ago) and doing them is going to be *major* surgery
and pain.

Other than that, we must take the TLBI-IPI when freeing
page-table-pages.

But yeah, I think Nadav is right, vmalloc.c never frees page-tables (or
at least, I couldn't find it in a hurry either), but if we're going to
be doing this, then that file must include a very prominent comment
explaining it must never actually do so either.

Not being able to free page-tables might be a 'problem' if we're going
to be doing more of HUGE_VMALLOC, because that means it becomes rather
hard to swizzle from small to large pages.
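
To make the "never contained" rule concrete, here is a toy userspace
model of the bookkeeping it would imply. This is only a sketch and not
kernel code: struct pt_page, pt_link(), pt_free_flush_mask() and
pt_free_spares() are hypothetical names that exist nowhere in the tree,
and a plain 32-bit mask stands in for a cpumask.

```c
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

/* One page-table page plus the set of CPUs that ever had it linked. */
struct pt_page {
	uint32_t ever_mapped;	/* bit n: CPU n's tables contained this page at some point */
};

/*
 * Link the page into cpu's private page tables. The bit is sticky and is
 * never cleared on unlink: once the page was reachable, the CPU may have
 * cached it speculatively.
 */
static void pt_link(struct pt_page *pt, unsigned int cpu)
{
	pt->ever_mapped |= UINT32_C(1) << cpu;
}

/* CPUs that must take the flush IPI before this page can be freed. */
static uint32_t pt_free_flush_mask(const struct pt_page *pt)
{
	return pt->ever_mapped;
}

/* Nonzero if freeing pt needs no IPI to any CPU in the isolated set. */
static int pt_free_spares(const struct pt_page *pt, uint32_t isolated)
{
	return (pt_free_flush_mask(pt) & isolated) == 0;
}
```

In this model a page linked only on CPUs 0 and 1 can be freed without
disturbing isolated CPUs 6 and 7, but a single pt_link() on CPU 6 puts
CPU 6 in the flush mask for good.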