From: Valentin Schneider
To: Dave Hansen, Jann Horn
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	virtualization@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	loongarch@lists.linux.dev, linux-riscv@lists.infradead.org,
	linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org,
	kvm@vger.kernel.org, linux-arch@vger.kernel.org, rcu@vger.kernel.org,
	linux-hardening@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com,
	Juergen Gross, Ajay Kaher, Alexey Makhalov, Russell King,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, "H. Peter Anvin", Peter Zijlstra,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, "Liang, Kan",
	Boris Ostrovsky, Josh Poimboeuf, Pawan Gupta, Sean Christopherson,
	Paolo Bonzini, Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker,
	"Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Juri Lelli,
	Clark Williams, Yair Podemsky, Tomas Glozar, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook, Andrew Morton,
	Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda, Alice Ryhl,
	"Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu,
	Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed,
	"Kirill A. Shutemov", "Masami Hiramatsu (Google)", Jinghao Jia,
	Luis Chamberlain, Randy Dunlap, Tiezhu Yang
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
In-Reply-To: <408ebd8b-4bfb-4c4f-b118-7fe853c6e897@intel.com>
References: <20250114175143.81438-1-vschneid@redhat.com>
	<20250114175143.81438-30-vschneid@redhat.com>
	<352317e3-c7dc-43b4-b4cb-9644489318d0@intel.com>
	<408ebd8b-4bfb-4c4f-b118-7fe853c6e897@intel.com>
Date: Wed, 26 Feb 2025 17:52:50 +0100

On 20/02/25 09:38, Dave Hansen wrote:
> On 2/20/25 09:10, Valentin Schneider wrote:
>>> The LDT and maybe the PEBS buffers are the only implicit supervisor
>>> accesses to vmalloc()'d memory that I can think of. But those are both
>>> handled specially and shouldn't ever get zapped while in use. The LDT
>>> replacement has its own IPIs separate from TLB flushing.
>>>
>>> But I'm actually not all that worried about accesses while actually
>>> running userspace. It's that "danger zone" in the kernel between entry
>>> and when the TLB might have dangerous garbage in it.
>>>
>> So say we have kPTI, thus no vmalloc() mapped in CR3 when running
>> userspace, and do a full TLB flush right before switching to userspace -
>> could the TLB still end up with vmalloc()-range-related entries when we're
>> back in the kernel and going through the danger zone?
>
> Yes, because the danger zone includes the switch back to the kernel CR3
> with vmalloc() fully mapped. All bets are off about what's in the TLB
> the moment that CR3 write occurs.
>
> Actually, you could probably use that.
>
> If a mapping is in the PTI user page table, you can't defer the flushes
> for it. Basically the same rule for text poking in the danger zone.
>
> If there's a deferred flush pending, make sure that all of the
> SWITCH_TO_KERNEL_CR3's fully flush the TLB. You'd need something similar
> to user_pcid_flush_mask.
>

Right, that's what I (roughly) had in mind...

> But, honestly, I'm still not sure this is worth all the trouble. If
> folks want to avoid IPIs for TLB flushes, there are hardware features
> that *DO* that. Just get new hardware instead of adding this complicated
> pile of software that we have to maintain forever. In 10 years, we'll
> still have this software *and* 95% of our hardware has the hardware
> feature too.

... But yeah, it pretty much circumvents arch_context_tracking_work, or at
the very least adds an early(er) flushing of the context tracking work...
Urgh.

Thank you for grounding my wild ideas into reality. I'll try to think some
more and see if I see any other way out (other than "buy hardware that does
what you want and ditch the one that doesn't").
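
For reference, the "deferred flush pending, so fully flush on the switch to
the kernel CR3" idea discussed above can be modelled in plain userspace C.
This is only a rough sketch under stated assumptions, not kernel code and
not what the series implements: cpu_state, CPU_IN_USER, CPU_FLUSH_PENDING,
try_defer_flush(), flush_kernel_range_on(), kernel_entry() and kernel_exit()
are all hypothetical names, the counters merely stand in for a real TLB
flush and a real IPI, and the model says nothing about how early in the
entry path the check would have to sit to cover the "danger zone".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/*
 * Hypothetical per-CPU state word (illustration only):
 *   bit 0: the CPU is currently running userspace
 *   bit 1: a kernel-range TLB flush was deferred for this CPU
 */
#define CPU_IN_USER        (1u << 0)
#define CPU_FLUSH_PENDING  (1u << 1)

static _Atomic unsigned int cpu_state[NR_CPUS]; /* zero: in kernel, nothing pending */
static unsigned int full_flushes[NR_CPUS];      /* stand-in for a full TLB flush */
static unsigned int ipis_sent[NR_CPUS];         /* stand-in for a flush IPI */

/* Sender side: try to mark the flush as pending instead of sending an IPI. */
static bool try_defer_flush(int cpu, bool mapped_in_user_pgtable)
{
	unsigned int old;

	assert(cpu >= 0 && cpu < NR_CPUS);

	/* A mapping present in the PTI user page table is never deferred. */
	if (mapped_in_user_pgtable)
		return false;

	old = atomic_load(&cpu_state[cpu]);
	while (old & CPU_IN_USER) {
		/* On failure 'old' is refreshed and CPU_IN_USER is re-checked. */
		if (atomic_compare_exchange_weak(&cpu_state[cpu], &old,
						 old | CPU_FLUSH_PENDING))
			return true;
	}
	return false; /* the CPU is in the kernel: it needs the IPI */
}

static void flush_kernel_range_on(int cpu, bool mapped_in_user_pgtable)
{
	if (!try_defer_flush(cpu, mapped_in_user_pgtable))
		ipis_sent[cpu]++;
}

/* Entry side: models the point right after the switch to the kernel CR3. */
static void kernel_entry(int cpu)
{
	unsigned int old;

	assert(cpu >= 0 && cpu < NR_CPUS);

	/* Leave user state and consume any pending flush in one atomic step. */
	old = atomic_exchange(&cpu_state[cpu], 0);
	if (old & CPU_FLUSH_PENDING)
		full_flushes[cpu]++;
}

/* Exit side: the CPU goes back to userspace with nothing pending. */
static void kernel_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&cpu_state[cpu], CPU_IN_USER);
}
```

In this model, calling kernel_exit(1) followed by
flush_kernel_range_on(1, false) sends no IPI, and the next kernel_entry(1)
performs one full flush. The same request against a CPU that is in the
kernel, or for a mapping flagged as present in the user page table, falls
back to the IPI. Keeping the "in user" bit and the "pending" bit in one
atomic word is what lets the sender decide without racing against a
concurrent kernel entry.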